Sunday, October 4, 2009

Statistics 'R' easy!

A while back I read an early review of 'R in Action' from Manning. This gave me some ideas about what the R language can be used for. To be specific, it makes statistics easy! With just a few lines, you can compute the mean, standard deviation, and quartiles of a data set, draw histograms and other graphs, and much more.
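To give a feel for what "a few lines" means, here is a minimal sketch. The `latency` vector is made-up sample data for illustration only, not numbers from any real test; the functions are all part of base R.

```r
# Hypothetical latency sample in milliseconds (made-up values, illustration only)
latency <- c(12, 15, 11, 14, 13, 18, 22, 16, 12, 17)

mean(latency)       # arithmetic mean
sd(latency)         # sample standard deviation
quantile(latency)   # minimum, quartiles, and maximum
summary(latency)    # min, quartiles, median, mean, max in one call

# Histogram of the values
hist(latency, main = "Latency distribution", xlab = "Latency (ms)")

# Values in the order they were recorded, drawn as a line
plot(latency, type = "l", xlab = "Sample", ylab = "Latency (ms)")
```

In an interactive session the two plotting calls open a graphics window; run as a script, R writes the plots to a file instead.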

I've recently had the opportunity to use this new-found knowledge. At work we're building a new sharded-MySQL back end, and we've just started performance testing. We ran our first performance test and started looking at the latency data. I poured the data into R, ran a few stats, and printed a few graphs. Instantly, one graph showed a repeating pattern of incremental slowdowns. (It's a saw-tooth pattern, much like what you see in JVM garbage collection graphs: latency starts out low, builds to an unacceptable level, then drops back down to the initial decent value.) Cool!
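For anyone who hasn't seen a saw-tooth in a latency graph, the sketch below fakes one. Everything in it is an assumption made for illustration: the data is synthetic, and the cycle length, baseline, slope, and noise level are arbitrary choices, not values from our performance test.

```r
# Synthetic data only: latency climbs within each cycle, then resets.
set.seed(1)                      # make the random noise reproducible
n       <- 300                   # number of simulated requests (arbitrary)
cycle   <- 50                    # assumed cycle length, in requests
request <- seq_len(n)

# Baseline of 20 ms, plus 2 ms per request since the last reset, plus noise
latency_ms <- 20 + 2 * ((request - 1) %% cycle) + rnorm(n, sd = 3)

summary(latency_ms)              # the summary alone hides the pattern

# Plotting in request order is what makes the saw-tooth visible
plot(request, latency_ms, type = "l",
     xlab = "Request number", ylab = "Latency (ms)",
     main = "Synthetic saw-tooth latency")
```

The summary statistics only say the spread is wide; it takes the plot in request order to show that the slowdowns repeat.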

R pointed out the problem; now to go dig up the root cause...