Tuesday, March 30, 2010

Extracting Log Messages by Category

I was recently faced with the question of how to ease log analysis for our production system. We run a cluster of sorts, with applications running on four Tomcat front ends, and the site in question serves millions of requests per day. So it is no surprise that we are flooded with logs. When you log 400 MB of statements per hour, it gets really tough to monitor them and see, for example, which WARN messages occur most often.

So I came up with a script that might help you too if you are looking for some quick log filtering.

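Our log lines have the following format: date, timestamp, log level, message, a pipe, the class that logged the message, and the HTTP processor thread in square brackets:

DATE TIMESTAMP LEVEL MESSAGE | CLASS [HTTP-PROCESSOR]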

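To make the field layout concrete, here is a small sketch that applies the script's own regex to one invented line. Everything in the sample (the date, the message, the class name `com.example.web.LoginFilter` and the thread name `http-8080-3`) is made up for illustration, and the helper `parse_line` is hypothetical, not part of the filtering script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same pattern the filtering script uses: six non-greedy fields.
my $w       = "(.+?)";
my $pattern = qr/^$w $w $w $w \| $w \[$w\]/;

# Hypothetical helper: split one log line into named fields,
# or return undef if the line does not match (e.g. a stack trace line).
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ $pattern;
    return undef unless @fields;
    my %record;
    @record{qw(date timeStamp logLevel message classLocation httpProcessor)} = @fields;
    return \%record;
}

# Invented sample line, for illustration only.
my $sample = "2010-03-30 14:02:11,532 WARN Session timed out for user 42"
           . " | com.example.web.LoginFilter [http-8080-3]";

my $record = parse_line($sample);
for my $field (qw(date timeStamp logLevel message classLocation httpProcessor)) {
    print "$field => $record->{$field}\n";
}
```

Because every field is non-greedy, the message capture grows only until the first " | " it meets, so the message ends up as "Session timed out for user 42" and the class name as "com.example.web.LoginFilter".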

Here's the script:

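#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# Read -file and -level from the command line.
my ($opt_file, $opt_level);
GetOptions('file=s' => \$opt_file, 'level=s' => \$opt_level);

if ($opt_file && $opt_level) {
    # Non-greedy capture group, reused for each field of the log line.
    my $w = "(.+?)";
    my $distilled_logfile = "distilled-" . $opt_level . "-" . $opt_file;

    open(my $in, '<', $opt_file) or die("Could not open log file $opt_file: $!");
    open(my $out, '>', $distilled_logfile) or die("Could not create filtered log file: $!");

    while (my $line = <$in>) {
        # Skip lines that don't match (stack traces etc.); otherwise $1..$6
        # would still hold the captures from the previous matching line.
        next unless $line =~ m/^$w $w $w $w \| $w \[$w\]/;
        my ($date, $timeStamp, $logLevel, $message, $classLocation, $httpProcessor)
            = ($1, $2, $3, $4, $5, $6);

        # Keep only the requested level: level, message and class, tab-separated.
        if ($logLevel eq $opt_level) {
            print $out "$logLevel\t$message\t$classLocation\n";
        }
    }
    close($in);
    close($out);
} else {
    print STDOUT "You didn't select a file!\n";
}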

This is how to use it:

filter_logfile.pl -file someTomcatlogFile -level logLevel

What it does is pretty simple: it matches each line against a regex pattern and checks whether the line has the log level you passed on the command line. If it does, it copies the log level, the message and the component that logged the message, separated by tabs, to a new file named distilled-LOGLEVEL-[YOUR-INPUT-LOG-FILE-NAME].
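Since the distilled file is tab-separated, answering the original question (which WARN messages occur most often) only takes a short second pass. The sketch below is not part of the script above; it assumes the LEVEL, MESSAGE, CLASS layout that the script writes, and the script name `top_messages.pl` and the function `top_messages` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally how often each (message, class) pair occurs in a distilled file whose
# lines look like "LEVEL<TAB>MESSAGE<TAB>CLASS", and return the $limit most
# frequent as [count, message, class] array refs.
sub top_messages {
    my ($fh, $limit) = @_;
    my (%count, %parts);
    while (my $line = <$fh>) {
        chomp $line;
        my (undef, $message, $class) = split /\t/, $line, 3;
        next unless defined $class;    # skip malformed lines
        my $key = "$message\t$class";
        $count{$key}++;
        $parts{$key} ||= [ $message, $class ];
    }
    # Highest count first; ties broken alphabetically so output is stable.
    my @keys = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    splice(@keys, $limit) if @keys > $limit;
    return map { [ $count{$_}, @{ $parts{$_} } ] } @keys;
}

# Usage (hypothetical script name): perl top_messages.pl DISTILLED_FILE [N]
if (@ARGV) {
    my $file  = $ARGV[0];
    my $limit = defined $ARGV[1] ? $ARGV[1] : 10;
    open(my $fh, '<', $file) or die("Could not open $file: $!");
    for my $row (top_messages($fh, $limit)) {
        my ($n, $message, $class) = @$row;
        printf "%6d  %s  (%s)\n", $n, $message, $class;
    }
    close($fh);
}
```

Counting on message plus class, rather than on the message alone, keeps identical texts logged by different components apart. On a Unix box, `cut -f2,3 DISTILLED_FILE | sort | uniq -c | sort -rn | head` gives much the same answer without any Perl.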

Happy Log filtering!
