Monday, June 30, 2008

Java and Memory Leaks

People are often surprised that I prefer Java as the primary language for coding serious applications. The assume that it must be ignorance of other languages, enforced slavery by my employer, or simple insanity. I assure you, however, that while I have experience in programming with 15 different programming languages and I enjoy many of them, I still choose Java for doing real work.

This is because while first class functions, closures, and metaobjects are all very cool and fun, I don't think these are the important factors when writing, say, a web application that you need to scale up to lots of users. What really matters are the libraries and the tools. These will save you more time than not having to type semi-colons at the ends of lines.

An example of what I mean is memory profiling. I recently wrote a handy load testing tool for Ringlight which generates an up-to-date google sitemap by hitting every URL on the website, comparing hashes to see if it's changed, and updating the sitemap's lastmod field. There are currently around 200,000 pages and hitting them all at once is a good test of the memory cache, database responsiveness, average page load time, etc. Interestingly enough, this process causes the server to run out of memory and crash.

Excellent! The load test revealed a memory leak, just what a load test should do. If the application was in, say, python, I would do is run the code under the primitive profiler that's available. This would spit out a stats file. I could then write some code using the bstats.py library to sort the stats in various ways looking for area which are consuming lots of memory.

Luckily, the server is in Java, so I can get a 15-day trial of YourKit Java Profiler (there are free ones as well, but YourKit is the best). My code runs on the server, the user interface runs on my local machine. They automatically communicate over the network so that I can get realtime graphs of memory consumption as my app runs. I can take snapshots of the memory state, run tests scenarious, compare the snapshots to see only the memory retained between tests, drill down through paths of method calls that look suspicious, check for common memory allocation gotchas, etc. It's an excellent tool and it makes finding memory leaks easy.

In this case, the memory leak seems to be inside the Java database access layer (JDBC). It appears that this is because the MySQL JDBC driver intentionally does not garbage collect results. You must wrap the use of any ResultSet objects with a finally clause that will attempt to close them. Of course, this is just good coding style anyway and I had already done this in my applications database access methods. Unfortunately, I later decided that I didn't like the way one of these methods was called and so added an additional database access method and this time I had forgotton to add the finally clause. As this new method because more prevelant in my code, the memory leak got worse.

Of course I'm sure that you, dear readers, would never be guilty of such inconsistent coding practices. This memory leak is a result of my fast and loose coding style. Some might say that it is this style which leads me to use languages with good tools. Others I know prefer to write their own memory profilers, object graph inspectors, and even syntax style checkers from scratch. Personally, I prefer to spend my time writing applications, at least while engaged in the professional business of writing applications. When not at work, I enjoy inventing my own impractical languages as much as anyone.