Topic: Ideas for formatting raw data to display.

KnacK

  • Global Moderator
  • Autococker
  • Posts: 3039
Ideas for formatting raw data to display.
« on: December 13, 2007, 07:27:03 AM »
I've got some logs from a communications server that I need to format into human-readable form, and possibly import into a spreadsheet or database.

Here is a snippet:

Code:
MGR_NUM=10ENTRY_DATE=20071212ENTRY_DATE_DISP=ENTRY_TIME=0000000012043100E-
NTRY_UNIQUE_NUM=1ENTRY_TYPE=99ENTRY_KEY=UPDATE_DATE=2007121212043100U_VER-
SION=!ENTRY_LOG=12:04:31.84W:SHUTDOWN
MGR_NUM=998ENTRY_DATE=20071212ENTRY_DATE_DISP=ENTRY_TIME=0000000012041600-
ENTRY_UNIQUE_NUM=1ENTRY_TYPE=99ENTRY_KEY=UPDATE_DATE=2007121212041600U_VE-
RSION=!ENTRY_LOG=12:04:16.75W:SHUTDOWN
MGR_NUM=998ENTRY_DATE=20071212ENTRY_DATE_DISP=ENTRY_TIME=0000000012082600-
ENTRY_UNIQUE_NUM=1ENTRY_TYPE=17ENTRY_KEY=UPDATE_DATE=2007121212082600U_VE-
RSION=!ENTRY_LOG=12:08:26.15W:S00090020005<cr>12:08:26.15W:<ak>0<cr>
MGR_NUM=998ENTRY_DATE=20071212ENTRY_DATE_DISP=ENTRY_TIME=0000000012082600-
ENTRY_UNIQUE_NUM=2ENTRY_TYPE=17ENTRY_KEY=UPDATE_DATE=2007121212082600U_VE-
RSION=!ENTRY_LOG=12:08:26.45W:S00090030000<cr>12:08:26.46W:<ak>0<cr>
MGR_NUM=998ENTRY_DATE=20071212ENTRY_DATE_DISP=ENTRY_TIME=0000000012160800-
ENTRY_UNIQUE_NUM=1ENTRY_TYPE=17ENTRY_KEY=UPDATE_DATE=2007121212160800U_VE

The character boxes in the data are "esc" (escape) characters. Each "-" at the end of a line marks where the text was word-wrapped.

I've never done any kind of filtering/formatting and I figured that one of you guys might be able to shed light on it.

Zorchenhimer

  • Autococker
  • Posts: 2614
Re: Ideas for formatting raw data to display.
« Reply #1 on: December 13, 2007, 01:32:43 PM »
I wrote a script last week to turn this:

Quote
                              Index of /files/maps

   [ICO] [1]Name [2]Last modified [3]Size [4]Description
     __________________________________________________________________

   [ ] [5]Parent Directory   -
   [ ] [6]2fort5.bsp 12-Jul-2000 23:05 1.2M
   [ ] [7]airtime.bsp 23-Apr-2005 00:11 601K
   [ ] [8]airtime2.bsp 01-Feb-2006 22:45 784K
   [ ] [9]anthills.bsp 02-Jun-2005 23:49 166K
   [ ] [10]anthills.ent 23-Jul-2005 14:57 3.2K
   [ ] [11]antioch.bsp 09-Nov-2002 23:23 3.0M
   [ ] [12]antioch2.bsp 13-Jun-2003 16:48 2.9M
   [ ] [13]antioch3.bsp 01-Jul-2004 20:50 1.5M
   [ ] [14]antioch3.ent 15-Apr-2004 21:54 6.8K
   [ ] [15]anubis.bsp 14-May-2007 13:40 4.8M
   [ ] [16]arctic2.bsp 11-May-2001 21:03 1.7M
   [TXT] [17]arctic2.txt 22-May-2001 19:33 2.3K
   [ ] [18]arctic3.bsp 11-May-2001 21:04 883K

To this:

Quote
2fort5.bsp
airtime.bsp
airtime2.bsp
anthills.bsp
antioch.bsp
antioch2.bsp
antioch3.bsp
anubis.bsp
arctic2.bsp
arctic3.bsp

Here is what I used:

Quote
lynx -dump http://dplogin.com/files/maps/ | tail -374 | head -181 | awk '/\.bsp/ { print $0 }' | awk '{print $3}' | sed 's/\[/ /g;s/\]/ /g' | awk '{print $2}' > maps_dplogin.tmp

The sed 's/\[/ /g;s/\]/ /g' | awk '{print $2}' part is what will help you the most.  It finds all the "[" and "]" characters and changes them into spaces.  Then the awk command prints only the second column, which in my case is the name of the map file.
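For anyone who would rather not chain awk and sed, the same bracket trick can be sketched in Python. This is only an illustration of the steps described above, not the script that was actually used; the function name `map_names` and the sample listing are made up for the example, and it checks for a literal ".bsp" rather than an awk regex.

```python
def map_names(listing):
    """Pull the .bsp file names out of a lynx-style directory dump."""
    names = []
    for line in listing.splitlines():
        # Same idea as the first awk filter: keep only lines mentioning .bsp
        if ".bsp" not in line:
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        # Third column looks like "[6]2fort5.bsp"; turn the brackets into
        # spaces (the sed step), then keep the second piece (the last awk step).
        pieces = fields[2].replace("[", " ").replace("]", " ").split()
        if len(pieces) >= 2:
            names.append(pieces[1])
    return names


# A few lines copied from the listing quoted above.
listing = """\
   [ ] [6]2fort5.bsp 12-Jul-2000 23:05 1.2M
   [ ] [10]anthills.ent 23-Jul-2005 14:57 3.2K
   [TXT] [17]arctic2.txt 22-May-2001 19:33 2.3K
   [ ] [18]arctic3.bsp 11-May-2001 21:04 883K
"""
print("\n".join(map_names(listing)))
```

The .ent and .txt lines drop out at the first filter, so only the map names are left.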

Hope some of that helps you out a bit.

XtremeBain

  • Developer
  • Autococker
  • Posts: 1470
Re: Ideas for formatting raw data to display.
« Reply #2 on: December 14, 2007, 11:11:13 AM »
1. Strip all instances of '-\n'
2. Replace '<cr>^U^[' with '<cr>'
3. Strip all remaining '^U^['

Now each entry is on its own line, and each parameter/value pair is separated by ^[.

4. Cycle through each entry by splitting the log on '\n'.
5. Cycle through each pair in the entry by splitting on '^['.

Each parameter is the string before '=' and the value is the string after '='.

6. Make an SQL query or append a line to a CSV file with the values of each parm.
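Steps 1 to 6 could be sketched roughly like this in Python. Treat it as a starting point under some assumptions rather than a finished parser: the control characters did not survive the forum paste, so it assumes ^[ is ESC (0x1b) and ^U is NAK (0x15), and the names `parse_log`, `to_csv` and `COLUMNS` plus the sample line are made up for the example.

```python
import csv
import io

ESC = "\x1b"  # ^[ : assumed separator between NAME=value pairs
NAK = "\x15"  # ^U : assumed to sit in front of some ESC characters

# Column order follows the field names visible in the posted snippet.
COLUMNS = ["MGR_NUM", "ENTRY_DATE", "ENTRY_DATE_DISP", "ENTRY_TIME",
           "ENTRY_UNIQUE_NUM", "ENTRY_TYPE", "ENTRY_KEY", "UPDATE_DATE",
           "U_VERSION", "ENTRY_LOG"]


def parse_log(raw):
    """Return one dict of parameter -> value per log entry."""
    text = raw.replace("\r\n", "\n")
    text = text.replace("-\n", "")                   # step 1: undo the word wrap
    text = text.replace("<cr>" + NAK + ESC, "<cr>")  # step 2
    text = text.replace(NAK + ESC, "")               # step 3
    entries = []
    for line in text.split("\n"):                    # step 4: one entry per line
        if not line.strip():
            continue
        entry = {}
        for pair in line.split(ESC):                 # step 5: one pair per ESC
            if "=" not in pair:
                continue
            name, _, value = pair.partition("=")     # split on the first '=' only
            entry[name] = value
        entries.append(entry)
    return entries


def to_csv(entries):
    """Step 6: render the entries as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="",
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(entries)
    return buf.getvalue()


# Hypothetical input: the first entry from the snippet, rebuilt with ESC
# between the pairs and a "-" word wrap after 70 characters.
pairs = ["MGR_NUM=10", "ENTRY_DATE=20071212", "ENTRY_DATE_DISP=",
         "ENTRY_TIME=0000000012043100", "ENTRY_UNIQUE_NUM=1", "ENTRY_TYPE=99",
         "ENTRY_KEY=", "UPDATE_DATE=2007121212043100", "U_VERSION=!",
         "ENTRY_LOG=12:04:31.84W:SHUTDOWN"]
line = ESC.join(pairs)
sample = line[:70] + "-\n" + line[70:] + "\n"
entries = parse_log(sample)
print(to_csv(entries))
```

One caveat: stripping every '-\n' would also eat a real "-" that happens to end an entry, so check that against the actual log files before trusting the output.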

Parms:
MGR_NUM int
ENTRY_DATE date(%Y%m%d)
ENTRY_DATE_DISP null
ENTRY_TIME 00000000 + date(%k%M%S) + 00, or date(%Y%m%d%k%M%S) + 00 (the last two digits are hundredths of a second; this field only shows the time, so the date digits are all zeroed)
ENTRY_UNIQUE_NUM int
ENTRY_TYPE int
ENTRY_KEY null
UPDATE_DATE date(%Y%m%d%k%M%S) + 00 (the date values are here in this one)
U_VERSION '!'
ENTRY_LOG string
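If the values need real types before they go into a database, the field list above might translate to something like the following Python sketch. The function names are invented for the example, Python's %H is used in place of %k, and it assumes the hour is always two digits, as it is in the posted snippet.

```python
from datetime import date, datetime, time


def parse_entry_date(value):
    """ENTRY_DATE, e.g. '20071212' -> date(2007, 12, 12)."""
    return datetime.strptime(value, "%Y%m%d").date()


def parse_update_date(value):
    """UPDATE_DATE: 14 digits of date and time, then 2 digits of hundredths."""
    stamp = datetime.strptime(value[:14], "%Y%m%d%H%M%S")
    return stamp.replace(microsecond=int(value[14:16]) * 10_000)


def parse_entry_time(value):
    """ENTRY_TIME: 8 zeroed date digits, HHMMSS, then 2 digits of hundredths."""
    clock = datetime.strptime(value[8:14], "%H%M%S").time()
    return clock.replace(microsecond=int(value[14:16]) * 10_000)


def convert(entry):
    """Return a copy of a parsed entry (dict of strings) with typed values."""
    typed = dict(entry)
    for name in ("MGR_NUM", "ENTRY_UNIQUE_NUM", "ENTRY_TYPE"):
        typed[name] = int(entry[name])
    typed["ENTRY_DATE"] = parse_entry_date(entry["ENTRY_DATE"])
    typed["ENTRY_TIME"] = parse_entry_time(entry["ENTRY_TIME"])
    typed["UPDATE_DATE"] = parse_update_date(entry["UPDATE_DATE"])
    # The two "null" fields become None when they are empty.
    for name in ("ENTRY_DATE_DISP", "ENTRY_KEY"):
        typed[name] = entry.get(name) or None
    return typed


print(parse_entry_date("20071212"))
print(parse_entry_time("0000000012043100"))
print(parse_update_date("2007121212043100"))
```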

ENTRY_LOG can be further tokenized by splitting it on '<cr>'.  Each of these tokens follows the format: date(%k:%M:%S).(first two decimal places of the second)'W'(not sure what this is, maybe W for Wednesday, doubt it...)':'MESSAGESTRING

MESSAGESTRING sometimes contains <cr> and <ak>; you probably just want to keep those as is.
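Splitting ENTRY_LOG might look like this in Python. It is a sketch under a few assumptions: the name `split_entry_log` is made up, the single letter after the hundredths is captured as-is since its meaning is unknown, <ak> stays in the message untouched, and the '<cr>' used as the delimiter is consumed rather than kept.

```python
import re
from datetime import time

# HH:MM:SS.hh, one flag letter (the mystery 'W'), a colon, then the message.
TOKEN = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})\.(\d{2})([A-Z]):(.*)", re.DOTALL)


def split_entry_log(entry_log):
    """Return a list of (time, flag, message) tuples, one per '<cr>' token."""
    events = []
    for token in entry_log.split("<cr>"):
        if not token:
            continue  # a trailing '<cr>' leaves an empty token behind
        match = TOKEN.fullmatch(token)
        if match is None:
            # Keep anything unexpected instead of silently dropping it.
            events.append((None, None, token))
            continue
        hh, mm, ss, hundredths, flag, message = match.groups()
        stamp = time(int(hh), int(mm), int(ss), int(hundredths) * 10_000)
        events.append((stamp, flag, message))
    return events


# ENTRY_LOG value taken from the third entry in the posted snippet.
for event in split_entry_log("12:08:26.15W:S00090020005<cr>12:08:26.15W:<ak>0<cr>"):
    print(event)
```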

The date() I used is the POSIX one that comes with UNIX installs, and won't work verbatim with PHP and possibly Perl, Windows, etc.  If you figure that you can't make a parser for this yourself, then I might have enough time over the weekend to whip something up if you don't mind showing my paypal some love. ;)

KnacK

  • Global Moderator
  • Autococker
  • Posts: 3039
Re: Ideas for formatting raw data to display.
« Reply #3 on: December 14, 2007, 11:22:49 AM »
Thanks Bain,

Essentially, what I want to be able to do is feed a program this raw data and have it formatted so that it's easy on the eyes and easy to decipher.
PM me what you want deposited in your paypal account and we can negotiate from there :D