CSI Java Edition - Locating & Terminating Memory Leaks

Although Java's garbage collector frees the programmer from manual memory management, it is still possible to defeat it: any object that stays reachable, for example from a long-lived collection, can never be reclaimed, and the result is a memory leak. This becomes more likely as your application pulls in multiple third-party libraries, which may have hidden bugs of their own. The worst part is that the problem is most likely to show up in production, where you have no access to the environment and limited support for troubleshooting. One way to mitigate this is to tell the Java VM to dump the heap when it runs out of memory. You can enable this via the:

'-XX:+HeapDumpOnOutOfMemoryError'

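flag to the JVM. Note the plus sign: '-XX:+' turns a boolean option on, while '-XX:-' turns it off. If the JVM throws an OutOfMemoryError, it will write an HPROF file, by default named java_pid<pid>.hprof in the JVM's working directory; use '-XX:HeapDumpPath' to choose a different location. You may also be interested in some of the other JVM options.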
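Since production JVMs are often started by scripts you don't control, it can be worth confirming that the flag actually took effect. The following is a minimal sketch, assuming a HotSpot-based JVM (the com.sun.management API it uses is HotSpot-specific); the class and method names are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; not part of any library mentioned in this post.
class HeapDumpFlagCheck {

    /** Returns the current value of a HotSpot VM option as a string. */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the JVM has no option with this name.
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // "true" when the JVM was started with -XX:+HeapDumpOnOutOfMemoryError
        System.out.println("HeapDumpOnOutOfMemoryError = "
                + vmOption("HeapDumpOnOutOfMemoryError"));
        // Empty unless -XX:HeapDumpPath was given
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

Run it with the same JVM options as the application to see what the VM ended up with.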

The generated HPROF file is a snapshot of the heap, which lets you examine the state of the JVM at the moment it ran out of memory. One of several useful tools for this is Eclipse Memory Analyzer. Load the HPROF file in the tool and you will be offered the option of creating a 'Leak Suspects Report', which shows which objects are taking up the most space on the heap.

Using this mechanism, we found multiple instances of the following class held in several java.util.Vector data structures:

com.sun.org.apache.xml.resolver.CatalogManager

which together appeared to be taking up 300MB of the total heap. Note the interesting package name, which seems to indicate a mixed Apache/Sun heritage.

Anyway, googling around revealed that other people had run into this issue. The class is actually used by JAX-WS:

http://www.java.net/forum/topic/glassfish/metro-and-jaxb/memory-leaks-consuming-web-services

The suggestion in the thread above was to use a -D flag to stop this class from using a static data structure. We applied it and ran the application under a profiler to confirm that the heap was no longer growing over time. Voila - problem solved!
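To make the "static data structure" point concrete, here is a minimal sketch of that kind of leak, together with a crude check for heap growth over time. It is an illustration only, not the CatalogManager code: the REGISTRY field, the leakyRequest method and the sizes are all invented, and a real profiler gives a far better picture than sampling used heap by hand.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.Vector;

// Hypothetical demo class; names and sizes are assumptions for illustration.
class HeapGrowthCheck {

    // Stand-in for a static collection that is added to but never cleared.
    // Everything reachable from it stays alive for the life of the class.
    static final Vector<byte[]> REGISTRY = new Vector<>();

    /** Used heap in bytes, sampled after requesting a GC so most garbage is excluded. */
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // only a request, equivalent to System.gc()
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Simulates one unit of work that leaves 1 MB reachable from the static registry. */
    static void leakyRequest() {
        REGISTRY.add(new byte[1024 * 1024]);
    }

    public static void main(String[] args) {
        long baseline = usedHeapAfterGc();
        for (int round = 1; round <= 5; round++) {
            for (int i = 0; i < 10; i++) {
                leakyRequest();
            }
            long grownMb = (usedHeapAfterGc() - baseline) / (1024 * 1024);
            System.out.printf("round %d: about %d MB over baseline%n", round, grownMb);
        }
    }
}
```

In a healthy application the post-GC figure levels off after warm-up; a number that keeps climbing from round to round, as it does here, is the signature of objects being retained.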