Rick Wagner's Blog: Visualizing a Log's Timeline (without going blind reading text)

How to Visualize a Log Timeline

Have you ever tried to read a massive log file? It can be difficult. Logs from multithreaded servers (or worse, clusters) can be miserable to work with. Sometimes a little visualization can help.

Log reading in small single threaded applications is easy. You just read through the log until you see the ERROR, then back up from there to see what went wrong. But what if your server is multithreaded? Then it gets a little more difficult. Or if you're dealing with distributed components like messaging servers and users of those servers. Things can also be ugly there.

Sometimes it's just fun to visualize data, looking for patterns. So how can you do this easily? I like to use R.

Let's take an example. Say you're working with JBoss AS 7, and your log looks something like this:

2011-03-20 21:22:38,854 DEBUG [org.jboss.logging.Log4jService] Installed System.out adapter

2011-03-20 21:22:38,855 DEBUG [org.jboss.logging.Log4jService] Installed System.err adapter

2011-03-20 21:22:38,859 DEBUG [org.jboss.logging.Log4jService] Added ThrowableListener: org.jboss.logging.Log4jService$ThrowableListenerLoggingAdapter@1815338

Pretty dry, right? Working with this log, I'd probably go about visualizing it by computing 2 fields and extracting one for convenience:

A timestamp field, formatted so it's easily sortable. You probably could do this in R, or even with shell script utilities, but I took the easy road for this one and used python to work my log for me.

A numeric value that maps to the log severity. 0 for 'TRACE', 1 for 'DEBUG', and so on for INFO, WARN and ERROR.

For readability, I also extracted the text value of the log level. Shame on me for carrying duplicate data, but it makes the extract file more convient to read.

So the first 4 lines of my extract file look like this:

TimeStamp LogCode LogLevel

76958854 2 DEBUG

76958855 2 DEBUG

76958859 2 DEBUG

The header line is necessary for the R script. I'm sure you would've figured this out, but the first field was calculated by taking the 'hours' value (21 in the first line above) and multiplying it by 60 * 60 * 1000. That was added to the 'minutes' value (22 above) by 60 * 1000. That was added to 38 * 1000, and that was added to 854. As I said above, python works great for that. If you're able to do that quickly in a bash script, my hat's off to you.

But I digress. So we've got our extract file, now to visualize it. Just run it past an R script that reads something like this:

logData <- read.table("my3FiledExtractFile.txt", header=TRUE)

summary (logData)

png("Graph.png", res=200, height=1200, width=1200)

plot (jitter(logData$TimeStamp), logData$LogLevel,

col="blue",

xlab="Time",

ylab="Log Level", yaxt='n')

axis(2, c(1, 2, 3, 4, 5))

rug(logData$TimeStamp)

dev.off()

q()

First, the summary() output will provide you with some interesting statistics:

summary (logData)
   TimeStamp           LogCode       LogLevel    
 Min.   :76958854   Min.   :2.000   DEBUG:31126  
 1st Qu.:76997498   1st Qu.:2.000   ERROR:   61  
 Median :77043075   Median :2.000   INFO :  647  
 Mean   :77096907   Mean   :2.027   WARN :    7  
 3rd Qu.:77236122   3rd Qu.:2.000                
 Max.   :77320903   Max.   :5.000

So in an instant we can see what's in this log we're dealing with. But that's not the visualization we're looking for, that's what we see below.

Isn't that cool? We can see we start out with a bunch of DEBUG messages (the heavy line at the bottom), then get our first WARN (the lonely first dot on level 3) , a bunch of more DEBUGs, a bunch of WARNs and then our first ERROR. The timeline reads from left to right, the error levels progress upward for the various levels.

I think data visualization is cool, and I intend to learn more about it to help me draw information from raw data sources. To that end, I've been working with Packt's "R Graphs Cookbook", and will provide a book review here soon.

Happy Visualizing!

Rick Wagner's Blog

Thursday, April 7, 2011

Visualizing a Log's Timeline (without going blind reading text)

1 comment:

Search This Blog

BlogCritics

Blog Archive

Daily Chess Puzzle

About Me