Friday, February 22, 2013

Convert Apache Log Format to W3C format

A friend of mine came to me with a problem. He needs to convert his Apache-format log files to W3C format so that he can send them in to make sure royalties are getting collected properly.

Now why they demand the files in W3C format, I don't know.

After looking for, and not finding, a pre-made solution, it was time to knuckle down and script that mother out.

Here's a sample of one line from the Apache log:

127.0.0.1 - - [07/Feb/2013:00:00:16 -0600] "GET /admin.cgi ICY/1.0" 200 155 "-" "ShoutcastDSP (Mozilla Compatible)" 0

Here's what the W3C format of that log looks like:

127.0.0.1    127.0.0.1    02/07/2013    00:00:16    /admin.cgi    200    ShoutcastDSP (Mozilla Compatible)    155    0
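
If you want to sanity-check the converted output somewhere other than cmd, here is a rough Python sketch of the same field shuffle. It is not the batch script from this post, just a cross-check. The regular expression, the names `LINE_RE` and `convert_line`, and the four-space separator are all my assumptions, based only on the one sample line above, so real logs may need tweaks.

```python
import re

# Month abbreviations as they appear in Apache timestamps.
MONTHS = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}

# Assumed layout: the sample line from this post (request, status, size,
# referer, user agent, then a trailing duration field).
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<day>\d{2})/(?P<mon>\w{3})/(?P<year>\d{4}):'
    r'(?P<time>\d{2}:\d{2}:\d{2}) [^\]]*\] '
    r'"\S+ (?P<uri>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "(?P<ua>[^"]*)" (?P<dur>\S+)'
)


def convert_line(line, sep="    "):
    """Return the reordered line, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Apache writes day/month-name/year; the output wants month/day/year.
    date = f'{MONTHS[m["mon"]]}/{m["day"]}/{m["year"]}'
    fields = [m["ip"], m["ip"], date, m["time"], m["uri"],
              m["status"], m["ua"], m["size"], m["dur"]]
    return sep.join(fields)


sample = ('127.0.0.1 - - [07/Feb/2013:00:00:16 -0600] "GET /admin.cgi ICY/1.0" '
          '200 155 "-" "ShoutcastDSP (Mozilla Compatible)" 0')
print(convert_line(sample))
```

Because the pattern matches the quoted user agent directly, it does not need the character-swapping step the batch version relies on.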

So we've got some cutting, splicing, and re-ordering to get done:


-=Script=-

@echo off
setlocal enabledelayedexpansion
for /f "tokens=1,2,3,4,5,6,7,8,9,10,11,*" %%a in (Apache_LogFile_Format.txt) do (

rem date: token 4 looks like [07/Feb/2013:00:00:16
set datetime=%%d
set "d=!datetime:~1,2!"
set "m=!datetime:~4,3!"
set "y=!datetime:~8,4!"
rem map the month abbreviation to its number
if "!m!"=="Jan" set mn=01
if "!m!"=="Feb" set mn=02
if "!m!"=="Mar" set mn=03
if "!m!"=="Apr" set mn=04
if "!m!"=="May" set mn=05
if "!m!"=="Jun" set mn=06
if "!m!"=="Jul" set mn=07
if "!m!"=="Aug" set mn=08
if "!m!"=="Sep" set mn=09
if "!m!"=="Oct" set mn=10
if "!m!"=="Nov" set mn=11
if "!m!"=="Dec" set mn=12

rem time: hours, minutes, and seconds sit at fixed offsets in the same token
set "hh=!datetime:~13,2!"
set "mm=!datetime:~16,2!"
set "ss=!datetime:~19,2!"


rem replace every double quote in the last token with #
for /f "tokens=* usebackq" %%w in ('%%l') do (
set tk=%%w
set tk=!tk:"=#!
)

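rem split on # to separate the user agent from the duration
for /f "tokens=1,2 delims=#" %%x in ("!tk!") do (
set ua=%%x
set dur=%%y
)
rem the duration token keeps its leading space, so strip spaces from it
set dur=!dur: =!

rem redirect first so no trailing space is written after the duration
>>output.txt echo %%a    %%a    !mn!/!d!/!y!    !hh!:!mm!:!ss!    %%g    %%i    !ua!    %%j    !dur!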
)

-=End Script=-

Everything is pretty straightforward until we have to parse out the user agent and duration:
"ShoutcastDSP (Mozilla Compatible)" 0

The user agent string can be just about anything, of any length, so we can't delimit on the spaces or the parentheses. It sure would be nice to delimit on the double quotes, but for /f doesn't make that easy.
So I picked up a new trick: using set to replace specific characters in a variable.
We use set tk=!tk:"=#! and the important part is "=#, which is where every " gets changed to #. The # we can use as a delimiter, and suddenly splitting the user agent from the duration is easy.
Instead of this:

"ShoutcastDSP (Mozilla Compatible)" 0
we end up with this:
#ShoutcastDSP (Mozilla Compatible)# 0
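
To see the two steps in isolation, here is a small Python sketch that mimics them: swap the quotes for #, then split on #. This is an illustration only, not part of the batch script, and the function name `split_agent_and_duration` is made up for the example.

```python
def split_agent_and_duration(tail):
    """Mimic the batch trick on the tail of a log line."""
    # Step 1: swap every double quote for '#', like set tk=!tk:"=#! does.
    tk = tail.replace('"', '#')
    # Step 2: split on '#', dropping empty pieces, since for /f also
    # skips over leading delimiters.
    parts = [p for p in tk.split('#') if p]
    ua = parts[0]
    # The duration piece still has the space that followed the closing quote.
    dur = parts[1].strip()
    return tk, ua, dur


tk, ua, dur = split_agent_and_duration('"ShoutcastDSP (Mozilla Compatible)" 0')
print(tk)   # the intermediate string with # in place of "
print(ua)   # the user agent on its own
print(dur)  # the duration on its own
```

One caveat applies to both versions: if a user agent ever contains a # of its own, the split lands in the wrong place, so pick a stand-in character you don't expect to see in your logs.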

That's it, kids. Hope this helps you out.
