Saturday, January 30, 2016

How to read a performance test report and use it to create a reliable SLA (by example)

The analysis presented in this post is mostly suited to the performance of web pages and services (especially web services), but it can also give a hint of how to approach performance report analysis for other application types.

If you have configured your performance tests well, you will end up with a rich report containing many useful graphs and numerical data. This post explains how to work with such a report. It also shows what data a report should contain for good application analysis, so you can use it to decide what extra widgets to add to your own report.

In the last post I described how to create decent performance tests that result in a pretty report. Let's use this Loadosophia report as an example of a good performance test report and describe the parts I find most useful in everyday work.

(In the screenshots below I am using the old Loadosophia report layout, as it has all of the graphs on one tab, which makes the legend below easier to follow. All graphs described below are also present in the new layout and can be found easily.)


Although the whole report is filled with interesting and important statistical data worth analysing, there are five things that should be analysed first (numbered in the picture above).

The summary, and why it is not enough to create a reliable SLA

The test summary information (1) is the data you normally end up with after running tests with typical test tools, and it should only be a starting point in a well-configured report. This information is really only useful for comparing results with previous runs. It gives some overview of the system, but it does not explain why the values are what they are, so it is not sufficient to create an SLA. Why? Such a document should be a contract stating what conditions we guarantee to the client and with what probability.
For example, an SLA should state that we ensure 99% of requests will finish in at most 100 ms. Saying that the average response time will be 100 ms says nothing, because it can mean that 50% of requests take 190 ms and 50% take 10 ms. If the client then sets a timeout of 110 ms, they will have problems in 50% of cases, which is in most cases unacceptable and against the SLA we agreed to.
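To make the difference concrete, here is a minimal Python sketch (synthetic, illustrative numbers only, not data from any real report) comparing the mean with percentiles for exactly such a split traffic pattern:

import random
import statistics

random.seed(42)

# Synthetic, bimodal response times (ms): half the requests are fast (~10 ms),
# half are slow (~190 ms). Purely illustrative values, not real measurements.
samples = ([random.gauss(10, 2) for _ in range(5000)]
           + [random.gauss(190, 10) for _ in range(5000)])

def percentile(data, p):
    """p-th percentile (0 < p <= 100) using the nearest-rank method."""
    ordered = sorted(data)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

print(f"mean: {statistics.mean(samples):5.1f} ms")   # ~100 ms - looks fine on paper
print(f"p50 : {percentile(samples, 50):5.1f} ms")    # well under 100 ms - the fast half
print(f"p99 : {percentile(samples, 99):5.1f} ms")    # over 200 ms - breaks a 110 ms client timeout

The mean looks perfectly acceptable, while the percentiles reveal that a 110 ms client timeout would be broken by roughly half of the traffic.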

Response time diagnosis

To create a meaningful SLA, or to check application performance in terms of response time, we need to see how those values behaved over the duration of the test and what their distribution looks like.

The first big input for creating an SLA is the Overall Response Time Distribution & Quantiles section (2). This graph can literally fill in the "X% of requests will finish in Y ms" part of such a document.
For example, based on our test, Google could say that 90% of requests will finish within 400 ms. Or that 87% of requests will finish within 300 ms. Or even that 92% of requests will finish within 500 ms, but 50% will finish in less than 250 ms.
How to phrase it in the document depends on the client profile. This data lets you adjust the SLA to the client's needs, but it also gives you important information about your application's performance.
In this case one could ask why 7.5% of requests are so far from the general distribution. Most requests finish in 200-400 ms here, but the other group suddenly takes 500-600 ms. Such a big deviation can mean application, hardware or even network problems. It is not rare that diagnosing it improves general performance, not only the performance of those 7.5% of requests.
I am not saying that you must fight for the performance of those 7.5% of requests; maybe even 600 ms is absolutely acceptable for your application. However, you should always be able to answer why those 7.5% deviate in response time.
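If you want to reproduce such numbers straight from the raw results, a rough sketch like the one below will do. It assumes results were saved as a CSV .jtl file with a header row and the default 'elapsed' column in milliseconds (the exact columns depend on your jmeter.save.saveservice settings), and the results path is just a placeholder:

import csv

def quantiles_from_jtl(path, marks=(50, 87, 90, 92, 99)):
    """Print 'X% of requests finished within Y ms' lines from a JMeter CSV
    results file. Assumes a header row with an 'elapsed' column (ms)."""
    with open(path, newline="") as f:
        elapsed = sorted(int(row["elapsed"]) for row in csv.DictReader(f))
    for p in marks:
        rank = max(1, round(p / 100 * len(elapsed)))   # nearest-rank percentile
        print(f"{p}% of requests finished within {elapsed[rank - 1]} ms")

quantiles_from_jtl("results/google.jtl")   # placeholder path - point it at your own file

The printed lines map directly onto the "X% will finish in Y ms" statements above.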
Additional input for response time analysis comes from the Response Times Distribution Over Time graph (3). It can additionally tell you when (in test time) the response time deviations happened. In our example we can see that the long response times appeared at the start of the test and then disappeared. That can be an important hint that whatever caused those response times either went away over time or only appears from time to time.

Response codes to the rescue

The Response Codes Over Time graph (4) is a simple but powerful tool for diagnosing web applications. First, it can tell you whether your application is working correctly (200 response codes). If error pages appear, it will tell you when. It should also be compared with transactions per second (TPS), which can be really helpful.
For example, we can imagine that errors (say, code 500) appear once the test exceeds 15 TPS and overloads our system (for example through backend timeouts).
In another case our application may experience occasional error codes. This can indicate temporary backend problems or network instability. Sometimes it is absolutely acceptable, but sometimes it foreshadows future trouble. So, again, it is really important to know why those occasional errors appear, in order to predict whether they will grow and at what scale.
If we expect some percentage of response codes to be errors (for example when a cyclic backend operation makes our services unavailable for a couple of hours), we need to write that into our SLA as well.
Measuring response codes is extremely important in web application performance tests and is unfortunately often ignored. More than once I have seen developers measure the performance of an application and be extremely happy about an unexpected performance gain. They did not notice that it was caused by a large percentage of errors due to a backend malfunction. The simple report they were using (average TPS and response time) gave no indication of response code problems.
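A good report tool draws this graph for you, but even a quick sketch along these lines catches the trap described above. It makes the same assumptions as before: a CSV .jtl with the default 'timeStamp' (epoch ms) and 'responseCode' columns, and a placeholder path:

import csv
from collections import Counter, defaultdict

def response_codes_over_time(path, bucket_seconds=10):
    """Count response codes per time bucket from a JMeter CSV results file."""
    buckets = defaultdict(Counter)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            second = int(row["timeStamp"]) // 1000
            buckets[second - second % bucket_seconds][row["responseCode"]] += 1
    start = min(buckets)
    for bucket in sorted(buckets):
        codes = ", ".join(f"{code}: {count}" for code, count in sorted(buckets[bucket].items()))
        print(f"t+{bucket - start:>4}s  {codes}")

response_codes_over_time("results/google.jtl")   # placeholder path

A sudden rise of non-200 codes in one bucket, combined with a TPS jump at the same moment, is exactly the "unexpected performance gain" situation described above.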

The story of TPS

The Transactions per Second graph (5) is the cherry on top of these graphs.

In most cases an SLA covers not only response times, but also states how many concurrent requests our application is able to handle at any given time.
For such an SLA we need to pick a safe number based on this graph. In this example 8-10 TPS would probably be a safe pick (keeping in mind that the drop at the end is just caused by threads finishing their testing).
Remember not to over-advertise your service. If you guarantee 9 TPS and the customer gets 15, they will be positively surprised and may hold your quality of service in high regard. On the other hand, if you promise 15 TPS and deliver 9, you will probably be dealing with a lot of bug reports and customer dissatisfaction.

In our example, Response Count By Type (5) additionally gives you the number of Virtual Users at any given time. You can use that and give your test a long ramp-up: for example, you can set the test to run 64 threads with a ramp-up time of 640 seconds, so that every 10 seconds a new virtual user is added, showing you how each additional concurrent user affects TPS.

This graph should also be used to determine TPS stability. If the VU number is stable, is TPS stable as well? Obviously TPS will fluctuate within some range (say +/- 5%). But if we notice more drastic jumps, like running at a pace of 100 TPS and suddenly dropping to 5 TPS, we can be sure we are dealing with something serious (network issues, software problems, some process killing the backend from time to time).
Of course, no matter how wide the TPS range is, you need to ask yourself yet again the simple question: why?
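The same results file can give you a rough TPS-over-time view and make such drops obvious. As before, this is only a sketch assuming a CSV .jtl with the default 'timeStamp' column (the sample start time, epoch ms) and a placeholder path:

import csv
from collections import Counter

def tps_over_time(path):
    """Count samples per second from a JMeter CSV results file, bucketing by
    the 'timeStamp' column, and print a min/max summary so drops stand out."""
    per_second = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            per_second[int(row["timeStamp"]) // 1000] += 1
    start = min(per_second)
    for second in sorted(per_second):
        print(f"t+{second - start:>4}s  {per_second[second]:>4} TPS")
    print(f"min {min(per_second.values())} TPS, max {max(per_second.values())} TPS")

tps_over_time("results/google.jtl")   # placeholder path

If the per-second numbers swing from around 100 TPS down to single digits while the VU count stays flat, that is exactly the kind of drop worth chasing.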

The real treasure

The real fun begins when you create such reports systematically (for example every week at the same hour). Such tests should also be run after every big (or not so big) software or hardware change. Having this historical data opens the door to brand new possibilities: observing how changes in the application, the infrastructure, the data stored in the backend and even the user load affect your application's performance.

The important thing here is to never be afraid to ask questions. There should never be any voodoo in good report interpretation. Every team member must know why the numbers look the way they do and, if there are errors, understand whether and when they will become a problem.

This report gives you great tools for detecting abnormalities in your application. It also gives you realistic data for creating an SLA for your clients.

This short article is only the tip of the iceberg of possibilities that good performance test reports bring. I hope it will be a good start for you and an inspiration to dig further.

Wednesday, February 12, 2014

Performance Tests made easy


Why performance tests?

When you are working on a project that has a web interface (like a web service, web page or hosting server), you will eventually come to a point where you need to start doing performance tests. There are several reasons for this:
  • to know how much traffic you can serve without problems
  • to create a realistic Service Level Agreement (SLA)
  • to check how well you scale while adding new servers
  • to make a baseline for performance improvements
Knowing how much traffic your product can handle is crucial when deciding whether to invest time in new functionality to attract more users, or in scalability and performance changes to handle the users you already have.
It is always important to have performance tests in place before you attempt any performance-related changes. Without a baseline from before the changes, you will have no idea whether your work changed anything. It is not rare for developers to spend weeks on something that in the end turns out to be a micro-optimization (or even makes the product slower), because they did not take something into account. To avoid that scenario you need to check whether your optimization idea works, and to do that you need the baseline and a rerun of the performance tests to compare against.
On the other hand, these tests cannot take forever to create. You need a way to create them quickly, so they do not delay product development. If it took days, some teams might sacrifice such tests (accepting the risks) and ignore them until it is too late. If it takes up to an hour, though, every sane developer team will want to do it and stay on the safe side.

Creating performance tests for a web interface

While creating performance tests, you need to make sure they have some crucial properties. They must:
  • be easy to rerun - you should be able to run them one after another without problems, at any time of day or night. This implies test automation. It is always good to ask yourself whether you could add your test script to cron on some server and only look at the results after the run.
  • be configurable - things such as the host address, number of threads, number of retries etc. must be read from configuration, so you can easily change them and reuse the tests in different environments.
  • have results that are easy to analyse - dumping results to a file or calculating only the simplest statistics is rarely enough. You need to shape the test output so that it explains not only what happened but also why. For example, if the mean request duration is around 100 ms, does that mean all requests take around 90-110 ms, or are most requests around 20 ms while some take over 10 seconds and skew the statistic?
This tutorial will show how to quickly create configurable, automated tests and how to visualise their results so that they are easy to analyse. It will take less than half an hour!

Creating a test

Let's test Google! Our test will show us the performance of the http://www.google.pl/search?q={query} web interface. As we want our test to be easily configurable, we will look at it as http://{host}/search?q={query}.

First, we need to get the newest JMeter. Then we need to download the jmeter-plugins zip (the Standard set is OK) and copy the contents of its lib/ext to apache-jmeter-X.Y\lib\ext. Jmeter-plugins is a great set of JMeter extensions.

With JMeter configured with jmeter-plugins, we run it in graphical (default) mode using the script you can find in the apache-jmeter-X.Y\bin folder. First we will create a Thread Group for our tests:





We are using the ${__P(name)} function to read values from a property file (it also accepts a default value as a second argument, ${__P(name,default)}), where:
  • ${__P(test.thread.max)} - the number of threads we want to use
  • ${__P(test.thread.rampUp)} - the ramp-up time, i.e. how long to take to spawn the threads
  • ${__P(test.baseCount)} - the number of requests each thread will make
Then we will add a sampler to our Thread Group - an HTTP Request:


where we use two additional configurable properties:
  • ${__P(google.host)} - to specify the tested host
  • ${__P(google.query)} - to specify the test query
Let's save our test plan as google.jmx. Then let's create google.properties:
test.thread.max=4
test.thread.rampUp=1
test.baseCount=10
google.host=google.pl
google.query=loadosophia
Now we can run our test! Let's do it from the console (after all, that is how cron would run it each night):
java -jar ApacheJMeter.jar -n -t /path/to/google.jmx -q /path/to/google.properties
Created the tree successfully using google.jmx
Starting the test @ Fri Oct 04 11:54:10 CEST 2013 (1380880450276)
Waiting for possible shutdown message on port 4445
Tidying up ...    @ Fri Oct 04 11:54:20 CEST 2013 (1380880460120)
... end of run
The tests have run! Let's see the results! If you have any problems you can consult the files in the repo.

Analysing the results with loadosophia.org

Loadosophia is a great site where you can upload JMeter results and receive a rich graphical report that helps you analyse them. It has a Pay What You Want policy, so you can try it and pay if you feel like it. All operations there run over HTTPS and you sign in with your Google account, so we can say it is decently safe to use.
We will cover Loadosophia in detail soon, but for now you only need to get your Upload Token and create a new project named GoogleQueryTest in your workspace.
Now let's open google.jmx in the JMeter graphical interface and add the Loadosophia.org Uploader:


We change the project name to the one we have just created, GoogleQueryTest, and paste in the token from Your Upload Token. I also like to specify a folder to save the results in (relative to where the tests are run from) and to name the test after the time it was run, using ${__time(yyyy-MM-dd-HH:mm:ss)}. That helps when comparing many tests over time.
Let's save the file and open a console. Let's create the results directory and run the tests:
$ mkdir results
$ java -jar ApacheJMeter.jar -n -t /path/to/google.jmx -q /path/to/google.properties
Created the tree successfully using google.jmx
Starting the test @ Fri Oct 04 14:25:58 CEST 2013 (1380889558228)
Waiting for possible shutdown message on port 4445
Tidying up ...    @ Fri Oct 04 14:26:22 CEST 2013 (1380889582299)
... end of run
Now we can see our results (I have made these results public for you to see; normally you need to be a member of a project to see its results).


What's next

This tutorial gives you the basic knowledge to create performance tests in a matter of minutes. The next post will cover tips on how to read a Loadosophia report and how to use it to make your application better.


Graphic courtesy of lasvegassportsperformance.com