Saturday, April 3, 2010

Performance testing tip -- Visualize your data to help reveal patterns!

When you're doing performance testing, take a little extra time and examine your data for patterns. Sometimes the patterns aren't easy to see, but with the help of some tools they'll jump right out at you. Let's look at an example.

Here's some sample data that represents the amount of time it takes for a test driver to invoke the server. Do you see the pattern?

latency
0.7630
0.0626
0.0662
0.0705
0.0833
0.0631
0.0656
0.0712
0.0841
0.0623
0.0671
0.0699
0.0840
0.0619
0.0659
0.0720
0.0835
0.0630
0.0659
0.0706
0.0835
0.0619
0.0671


I can't see the pattern immediately. So let's have a look using 'R', the excellent open source statistical language. (If you're not using R yet, you might consider looking at it. It really is easy to use, and it gives you excellent statistical capabilities. There are lots of nice web tutorials on how to use it, as well as the excellent Manning title "R in Action") Here's the R script:

timingticks <- read.table("/home/rick/Blog_Temp/My.dat", header=TRUE)
attach(timingticks)
'Count'
length(latency)
summary(timingticks)
plot(latency, col='blue')
axis(2, tck=1)
q()


The script is pretty self-explanatory, but what we're saying is roughly this:
- Use my data file.
- Get me the number of records (length) of that file, call it 'Count'
- Give me a statistical summary of the file
- Make a plot of the file, using units of '1' on the left axis. THIS IS THE PART THAT SOMETIMES MAKES THE PATTERNS JUMP OUT!

So let's run the script then have a look at the results. First, the statistical output:

[1] "Count"
[1] 23
latency
Min. :0.06190
1st Qu.:0.06435
Median :0.06710
Mean :0.10036
3rd Qu.:0.07765
Max. :0.76300
> proc.time()
user system elapsed
0.732 0.028 0.747

This alone makes using R worthwhile. With almost no effort, I got some good information about my application timing performance. But that's not the real pay-dirt in this case! Let's have a look at the plot:



Now that's more interesting! Let's ignore the top dot, which represents a 'first transaction'. These are often much longer than the following entries due to 'warm up' issues. But look at the little repeating saw-tooth at the bottom! This is indicative of some sort of repeated behaviour.

In a real-world scenario, this was exactly what I saw when we found some server-side code that was trying to improve performance with a cache. The code was caching a value in a list, then when the list got to a certain size it tossed the cache and started over again. (In this case, the list size would be 4. See how the data goes in patterns, the timing gets a little longer each time for 4 cycles, then it 'busts' and starts over again?) It turned out our "performance-enhancing" cache was actually costing us performance! We removed the cache.

This is just one sample of how data visualisation can be used to help you find patterns in performance data. It doesn't take much time, and it can lead you to valuable insites. It's also fun!

Happy Coding!

No comments: