Sunday, June 2, 2013

Short jhat tutorial: diagnosing OutOfMemoryError by example

Last time we've learned what can be the reason of OutOfMemoryErrors and what tool-chain can we use to diagnose it. Today we will learn by example how to use the most important one: jhat.
I've prepared a sample project for this exercise, which can be cloned from github. The code is really simple and its  problem is obvious, but this simplicity will make it easy to get to know jhat.
First, we need to run our program. We will use small heap size, for two reasons:
  1. Program will throw the exception faster
  2. jhat will start more quickly, as the heap dump will be smaller
$ git clone https://github.com/petermodzelewski/snippets.git
$ cd snippets/OutOfMemoryErrorGenerator/
$ mvn package
$ java -Xmx128m -Xms128m -jar target/OutOfMemoryErrorGenerator-1.0.jar
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at pl.keyer.oome.generator.Sleepyhead(Sleepyhead.java:6)
        at pl.keyer.oome.generator.App.main(App.java:11)

We can notice, that the program is still running. We will need another console to run jhat. We are using the following commands:
$ jps -l
752 target/OutOfMemoryErrorGenerator-1.0.jar
4480 sun.tools.jps.Jps
$ jmap -dump:file=dump.map 752
$ jhat -port 7401 dump.map
Reading from dump.map...
Dump file created Sat Jun 01 23:25:55 CEST 2013
Snapshot read, resolving...
Resolving 561438 objects...
Chasing references, expect 112 dots................................................................................................................
Eliminating duplicate references................................................................................................................
Snapshot resolved.
Started HTTP server on port 7401
Server is ready.

Important notes about that process:
  • All commands must be executed by the same user: the java process owner
  • The "expect X dots" message is not a joke. While processing bigger heap dumps one can check the number of dots there in editor to see the progress, as it can take quite a while to process such a file.
  • When processing bigger dumps one must watch heap size of jhat itself. This depends on the case, but to be safe (provided with enough resources) jhat should have 2-4 times more heap size, than process heap it will diagnose. If memory size for jhat is too small it will just crush after using it and the process will need to be repeated with bigger amount of memory. For example to provide jhat with 4 gigs the command will be:
    $ jhat -port 7401 -J-mx4G dump.map
  • Diagnosed process may be terminated after dumping heap with jmap. 
  • Obviously jhat can be run on any machine where the dump will be present. On many occasions developers choose to zip the dump and move the debug process to machine more accessible for them and with enough ram. 
After executing the commands we can visit http://localhost:7401/

When facing with jhat You will quickly realize that this tool is from times, where such tools were designed without consideration of prettiness or usability. This tutorial will show how to navigate it in most cases - all it's features are cool and necessary, but everyday programmer will use only subset of them to quickly diagnose where the OOME came from.
jhat main page can be divided into sections:

  1. List of all classes in your program (excluding platform - that is, all that is not from standard library). This list is normally really long and in most cases it is not necessary. Normally You will scroll down to "Other Queries" section right away. 
  2. More options for listing classes
  3. Bread and butter of memory debugging, we will use them in a moment
  4. More tools for debugging, but not as helpfull as section 3.
    • Heap histogram is sometimes useful to compare quantity vs size of objects
    • When you become jhat ninja, you sometimes could use OQL to diagnose the application. It is a SQL-like language for searching heap and calculating it's statistics. 
Firstly, lets see heap histogram.


This view illustrates very important fact, that jhat does not compute "deep" object size. That's why on top of our process' memory consumption histogram we see class [B which is really an array of bytes: byte[]. Array of bytes is often on top of such histogram and what's more - in most cases it doesn't mean that there is a problem. For example, if your program processes lot's of strings it will naturally have lot of byte arrays, as each string object contains reference to some byte array. Similar problem will manifest when we follow the "Show instance counts for all classes (including platform)".


That view is very similar to histogram sorted by quantity. Normally during OOMR we will look for "our" classes. We will need to exclude platform classes to easily detect abnormalities in our classes quantity (or size). A good start is to follow "Show instance counts for all classes (excluding platform)".

Things to look: unnaturally big numbers of objects of some class. In our example the number of workers. Our example illustrates the common problem, when producer creates tasks faster, than consumer handles them, and the producer - consumer queue will not block after limit is reached. 
Unfortunately in most cases it is not as easy. Diagnosing that objects of some class are eating too much memory is one thing. Diagnosing why and where are they allocated is another. To do so, we need to track the objects references to something "active", for example thread, controller, etc. When given that "active" object we can than analyze the algorithm and find out why so many objects are created. 
To illustrate such process lets track class [B references.



jhat enables going through references as long as one needs. Unfortunately when you click on class "page" it will display all it's instances. To dodge that time consuming view, you can copy link of the class page (for example from histogram view) and construct reference by Type view link. For example class [B page can have the following url:
http://localhost:7401/class/0x3881d790
so the reference by type summery will have url as follow:
http://localhost:7401/refsByType/0x3881d790

Those methods normally are all you need to detect the memory trouble in the process. Additionally in case you are working with application container (like tomcat) and having problems to see what application (or pool) is leaking the objects, you should diagnose the Classloader section of class page:


That's it. All you need to know about jhat to start your own memory problem debug. Hope it'll help.

One last tip: Many developers after solving their memory problem while running new application version, are taking heap dump and running jhat with it - just in case. They are often terrified that despite their efforts object count is still grand but somehow OOME is not appearing. Of course they were so focused on fighting the memory leakage that they've forget how GC works. Remember: Always trigger GC collection (for example with visual vm) to clean old gen from unnecessary objects before taking the dump and analyzing it with jhat.