A few years ago I wrote a little app named ‘exception log introspect’ (ELI) that sent a signal to a running java process to obtain the stack trace.
I then analyzed the stack trace to find threads with the exact same fingerprint and group duplicates. Then I would score threads by number of duplicates and print them by descending score.
I have since updated it for Java 1.5’s jstack to get a running snapshot of the threads in a VM.
The first pass of ELI assumed that blocked threads would have identical stack traces (which is correct). It would then print out the following result:
-------------
1. References: 105
-------------
java.lang.Thread.State: BLOCKED (on object monitor)
at blogindex.robot.ObjectQueueManager.getNextXToProcess(ObjectQueueManager.java:73)
- waiting to lock (a blogindex.robot.RemoteFeedQueue)
at blogindex.robot.TaskThread.doProcessTask(TaskThread.java:303)
at blogindex.robot.TaskThread.run(TaskThread.java:107)
This is our robot code blocked on our queue server waiting for work.
This wasn’t very helpful because what I REALLY care about is the thread holding the lock named 0×00002aaab5450fb8.
… after thinking about this for a while I realized that this was just a graph clustering problem.
Threads are nodes, with lock waits to lock holds being the edges between nodes.
… so I applied a trivial graph clustering algorithm to this problem and I now have ELI render the following tree:
-------------
1. References: 105
-------------
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
- locked (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1000)
- locked (a sun.net.www.protocol.http.HttpURLConnection)
at org.apache.xmlrpc.DefaultXmlRpcTransport.sendXmlRpc(DefaultXmlRpcTransport.java:87)
at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:72)
at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:194)
at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:185)
at org.apache.xmlrpc.XmlRpcClient.execute(XmlRpcClient.java:178)
at blogindex.queue.RemoteQueue.refresh1(RemoteQueue.java:138)
at blogindex.queue.RemoteQueue.refresh(RemoteQueue.java:73)
at blogindex.robot.ObjectQueueManager.getNextXToProcess(ObjectQueueManager.java:84)
- locked (a blogindex.robot.RemoteFeedQueue)
at blogindex.robot.TaskThread.doProcessTask(TaskThread.java:303)
at blogindex.robot.TaskThread.run(TaskThread.java:107)
105 blocked threads:
--------
java.lang.Thread.State: BLOCKED (on object monitor)
at blogindex.robot.ObjectQueueManager.getNextXToProcess(ObjectQueueManager.java:73)
- waiting to lock (a blogindex.robot.RemoteFeedQueue)
at blogindex.robot.TaskThread.doProcessTask(TaskThread.java:303)
at blogindex.robot.TaskThread.run(TaskThread.java:107)
This is much more manageable! Now I get to see the thread that’s blocking out these 105 threads so I can try (hopefully) to make that code faster.
What’s next?
It would be nice to migrate ELI to use JMX and become more of a Java ‘top’-style utility.












July 7, 2008 at 1:33 pm
We have something similar in the internal diagnostics dump of our application (but via JMX), and one of the guys on the team took it a step further and wrote out the thread information from JMX into .dot form, which makes it *very* easy to visualize who’s waiting on what, once you bring up the file in graphviz.
July 7, 2008 at 1:53 pm
Hey Mark.
If you could post more details I’d appreciate it…..
graphviz is a bit primitive but it’s starting to become a bit better for graph visualization in recent years.
If you could publish any of this code I’d appreciate it…
I was thinking of diving into JMX to see what I could write with it……
Seems like there is a lot of potential here.