Say you've got a good-sized chunk of code, in production, that doesn't always act as expected but it does so often enough that everyone's willing to keep using it (including your customers). You have a lot on your to-do list, and you keep busy enough just handling the serious meltdowns, so much so that you don't really have time to investigate that occasional failed parse-and-load, or that mysterious stack trace that's supposed to be harmless. Besides: your application does so much; statistically, it can't get everything right all the time, can it?

For an issue that isn't easily repeatable but is considered to be a significant problem, running in the debugger can be demoralizing. What if you could produce the equivalent of a "robo-debugger", a process that would run the debugger for you, continuously, and wait with infinite patience for that rare occurrence? And then have the common sense to collect information off the stack and -- gasp -- even tell you about it? If this does not sound revolutionary, then good for you. Why do humans ever sit in front of a monitor, stepping through a debugger manually anyway? We've "manualized" an operation that should be automated.

Of course, logging could do the same thing for you. My interest in this idea arose when I was supporting an application for which I had the source code, but I was not permitted to modify it. I was allowed to recompile the application with the debug switch on, however, and I was allowed to attach with a debugger. After I wrote a JDI-based monitor for this application, I realized it had one additional advantage -- you don't have to add a lot of logging statements for an issue that might only need to be debugged once. Also note that code like this could be embedded into another application (for example, it could be a VisualVM extension) and be used to generate events on demand, another reason to skip the embedded logging statements.

Here's the general approach:
  1. Ensure your target application is compiled with the -g switch.

  2. Start the targeted application as usual, but listening on a port for a debugger connection.

  3. Start your robo-debugger and attach to the target JVM.

  4. Read a list of breakpoint specifications, each of which contains the following information:

    • Class name and source line number.

    • List of variables on the stack that you want to inspect.

    • Optional message in the form of a formatted String with placeholders for said variables retrieved from the stack.

    • Optional list of key-value pairs, values again being retrieved from the stack.


  5. At each defined breakpoint, halt execution (briefly!) and generate an event, realized as a log (to file, JDBC, etc) message, JMS message, etc.

  6. Mine the event stream from your application to solve all the issues that have been nagging you since you went to production.

This post will cover everything except the last item, which is the hard part. I also won't write logging or JMS code, as that's not relevant to the discussion. My example will generate some output to stdout.
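Before moving on, here is a minimal sketch of what one breakpoint specification from step 4 might look like in code. The class name BreakpointSpec, its fields, and the render method are all hypothetical names chosen for illustration; the message is assumed to be a java.util.Formatter pattern whose placeholders line up with the variable list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for one breakpoint specification (illustrative names only). */
public class BreakpointSpec
{
    public final String className;        // class containing the line, e.g. "JarView"
    public final int lineNumber;          // source line to break on
    public final List<String> variables;  // stack variables to inspect, in placeholder order
    public final String messageFormat;    // optional format pattern; may be null

    public BreakpointSpec(String className, int lineNumber,
                          List<String> variables, String messageFormat)
    {
        this.className = className;
        this.lineNumber = lineNumber;
        this.variables = variables;
        this.messageFormat = messageFormat;
    }

    /** Builds the event text from values that were already read off the stack. */
    public String render(Map<String, String> valuesByName)
    {
        StringBuilder sb = new StringBuilder(className).append(':').append(lineNumber);
        if (messageFormat != null)
        {
            // Fill the placeholders in the order the variables were listed.
            Object[] args = new Object[variables.size()];
            for (int i = 0; i < args.length; i++)
            {
                args[i] = valuesByName.get(variables.get(i));
            }
            return sb.append(' ').append(String.format(messageFormat, args)).toString();
        }
        // No message given: fall back to key=value pairs.
        for (String name : variables)
        {
            sb.append(' ').append(name).append('=').append(valuesByName.get(name));
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        BreakpointSpec spec = new BreakpointSpec(
            "JarView", 863, Arrays.asList("fileName"), "adding file '%s'");
        Map<String, String> values = new LinkedHashMap<String, String>();
        values.put("fileName", "BreakpointEvent.class");
        // prints: JarView:863 adding file 'BreakpointEvent.class'
        System.out.println(spec.render(values));
    }
}
```

Keeping the rendering separate from the JDI calls means the same specification could feed a log line, a JMS message, or anything else.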

To get started, pick a target application. I'll be using an application I wrote called "JarView" (just a simple Swing application to search through a directory of .jar files to find a missing class file).

Start the target application

There are two primary transports in JPDA (Java Platform Debugger Architecture): socket-based and shared-memory-based. I'll start my application using the socket-based transport (transport=dt_socket), instruct it to wait for a debugger to attach to it (server=y), and tell it not to suspend while waiting for a connection (suspend=n):
     c:\JarView>java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -cp jarview.jar JarView
You'll see a launch message like the following:
     Listening for transport dt_socket at address: 50069
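Since the port in that message is chosen by the JVM, a robo-debugger that launches or watches the target needs to pick it up somehow. Below is a small sketch that pulls the port out of the launch message with a regular expression. The class and method names are hypothetical, and the pattern assumes the message has exactly the numeric form shown above. Alternatively, the jdwp agent accepts an address= suboption (for example address=8000) if you would rather pin the port yourself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the debug port from the JDWP launch message. */
public class JdwpPortParser
{
    // Assumes the message format shown in this post: "... at address: <port>"
    private static final Pattern LISTENING =
        Pattern.compile("Listening for transport (\\S+) at address: (\\d+)");

    /** Returns the port number, or -1 if the line is not a launch message. */
    public static int parsePort(String line)
    {
        Matcher m = LISTENING.matcher(line);
        return m.find() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args)
    {
        // prints: 50069
        System.out.println(parsePort("Listening for transport dt_socket at address: 50069"));
    }
}
```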
Attach to the target

Write a program to
  1. Use the JDI Bootstrap class to get an instance of a VirtualMachineManager.

  2. Iterate over the VirtualMachineManager's list of AttachingConnectors until you find a connector supporting transport dt_socket.

  3. Get the port Connector.Argument of the AttachingConnector and set it to the port on which your target application is listening.

  4. Attach to the AttachingConnector and get an instance of a VirtualMachine.
An example piece of code (bare minimum, with no special exception handling) that will perform the above steps follows. You will need to compile and run with the JDK's lib/tools.jar on the classpath (this is not found in the JRE, by the way).
import java.util.List;
import java.util.Map;
import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.VirtualMachineManager;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;

public class JDIDemo
{
    public static void main(String[] args) throws Exception
    {
        VirtualMachineManager vmMgr = Bootstrap.virtualMachineManager();

        // Find the attaching connector that uses the socket transport.
        AttachingConnector socketConnector = null;
        List<AttachingConnector> attachingConnectors = vmMgr.attachingConnectors();
        for (AttachingConnector ac: attachingConnectors)
        {
            if (ac.transport().name().equals("dt_socket"))
            {
                socketConnector = ac;
                break;
            }
        }
        if (socketConnector != null)
        {
            // Reuse the connector's default arguments and override only the port.
            Map<String, Connector.Argument> paramsMap = socketConnector.defaultArguments();
            Connector.IntegerArgument portArg = (Connector.IntegerArgument)paramsMap.get("port");
            portArg.setValue(Integer.parseInt(args[0]));
            VirtualMachine vm = socketConnector.attach(paramsMap);
            System.out.println("Attached to process '" + vm.name() + "'");
        }
    }
}
It is a lot easier to get a Connector.Argument from an existing data structure (as above) than it is to create one from scratch. Also note, there are very few (if any) constructors in this API; just about every reference you get is retrieved ultimately by going through the Bootstrap class and working your way into the API. In my example, there were 3 AttachingConnectors, representing transports dt_socket, dt_shmem, and local. When I run the above example, I see the following output:
    Attached to process 'Java HotSpot(TM) 64-Bit Server VM'
Note that when this program exits, the target VM changes the port on which it is listening, something you should remember if you run again. I don't remember this behavior on Java 5, but it has been a while since I've written a JDI application.
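If you are curious which connectors and Connector.Argument keys your own JDK offers, a short sketch like the following lists them without needing a target VM at all. The class and method names are hypothetical; the output varies by platform and JDK version, so none is shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import com.sun.jdi.Bootstrap;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;

/** Hypothetical helper: lists the attaching connectors and their argument names. */
public class ListConnectors
{
    /** Describes each connector as "name (transport) [argument names]". */
    public static List<String> describeConnectors()
    {
        List<String> lines = new ArrayList<String>();
        for (AttachingConnector ac : Bootstrap.virtualMachineManager().attachingConnectors())
        {
            // defaultArguments() is keyed by argument name, e.g. "port"
            Map<String, Connector.Argument> defaults = ac.defaultArguments();
            lines.add(ac.name() + " (" + ac.transport().name() + ") " + defaults.keySet());
        }
        return lines;
    }

    public static void main(String[] args)
    {
        for (String line : describeConnectors())
        {
            System.out.println(line);
        }
    }
}
```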

Pause at a breakpoint and generate an event

To conclude this post in a reasonable length, I will just pick a line of code in my target that I know well and give some example code that will pick a variable off the stack and output it to stdout. The details of logging or sending a JMS message aren't really relevant to this topic.

For this example, I want to break at line 863, where I'm about to add the name of a file to my Swing table. This is a file whose name at least partially matches an input class name. Below is a segment of the source:
849:    if (fullName.lastIndexOf("/") > -1)
850:    {
851:        directoryName = fullName.substring(0, fullName.lastIndexOf("/"));
852:        fileName = fullName.substring(directoryName.length()+1, fullName.length());
853:    }
854:    else
855:    {
856:        fileName = fullName;
857:    }
858:    if (fileName.indexOf(searchForTextField.getText()) > -1)
859:    {
860:        Vector nextRow = new Vector();
861:        nextRow.add(archive.getAbsolutePath());
862:        nextRow.add(fileName);
863:        rowData.add(nextRow);
864:    }

I'd like to print a short message at line 863 which outputs the value of fileName.

How do you specify a breakpoint in JDI? You have to know what you're asking for. Normally you would look for a class, maybe a method, and a line number. My target application is a Swing application with a lot of anonymous inner classes, so rather than figure out which one is the one I want, I'm just going to search on line number. You probably want to call a constructor to create a breakpoint for a line number, but there is no constructor; you'll have to search through a lot of metadata and "find" the description of this line of code, then request a breakpoint using that description and a factory method in the EventRequestManager. To make a long story somewhat shorter:
  1. Get a list of all classes (as ReferenceTypes).

  2. For each class, get all line locations (Location).

  3. At the line location corresponding to line 863, break out of the search loop.

  4. Get an instance of the EventRequestManager from the VirtualMachine.

  5. Create a BreakpointRequest in the EventRequestManager, using the Location object for line 863.

  6. Get the EventQueue instance from the VirtualMachine.

  7. Create a while(true) loop on the EventQueue, calling its remove() method.

  8. For each EventSet removed from the queue, process each Event.

  9. For each Event, check to see if it is a BreakpointEvent, and if the line number matches the breakpoint we're interested in, process the Event further.

  10. For a matching Event, get the top StackFrame of the event's thread, get all visible variables in that frame, find the one whose name matches the variable you are looking for, and dig through the API for the correct chain of method calls to extract its value.
This is probably easier shown with code. Below is an updated version of the first cut of the example code (note: please refactor out of main for a real application!):
import java.util.List;
import java.util.Map;
import com.sun.jdi.AbsentInformationException;
import com.sun.jdi.Bootstrap;
import com.sun.jdi.LocalVariable;
import com.sun.jdi.Location;
import com.sun.jdi.ReferenceType;
import com.sun.jdi.StackFrame;
import com.sun.jdi.StringReference;
import com.sun.jdi.ThreadReference;
import com.sun.jdi.Value;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.VirtualMachineManager;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.event.BreakpointEvent;
import com.sun.jdi.event.Event;
import com.sun.jdi.event.EventIterator;
import com.sun.jdi.event.EventQueue;
import com.sun.jdi.event.EventSet;
import com.sun.jdi.request.BreakpointRequest;
import com.sun.jdi.request.EventRequest;
import com.sun.jdi.request.EventRequestManager;

public class JDIDemo
{
    public static void main(String[] args) throws Exception
    {
        if (args.length != 3)
        {
            System.out.println("Usage: java JDIDemo debugPortNumber sourceLineNumber variableName");
            System.exit(-1);
        }
        int debugPort = Integer.parseInt(args[0]);
        int lineNumber = Integer.parseInt(args[1]);
        String varName = args[2];

        // Find the attaching connector that uses the socket transport.
        VirtualMachineManager vmMgr = Bootstrap.virtualMachineManager();
        AttachingConnector socketConnector = null;
        List<AttachingConnector> attachingConnectors = vmMgr.attachingConnectors();
        for (AttachingConnector ac: attachingConnectors)
        {
            if (ac.transport().name().equals("dt_socket"))
            {
                socketConnector = ac;
                break;
            }
        }

        if (socketConnector != null)
        {
            // Reuse the connector's default arguments and override only the port.
            Map<String, Connector.Argument> paramsMap = socketConnector.defaultArguments();
            Connector.IntegerArgument portArg = (Connector.IntegerArgument)paramsMap.get("port");
            portArg.setValue(debugPort);
            VirtualMachine vm = socketConnector.attach(paramsMap);
            System.out.println("Attached to process '" + vm.name() + "'");

            // Search every loaded class for the first Location on the requested line.
            List<ReferenceType> refTypes = vm.allClasses();
            Location breakpointLocation = null;
            for (ReferenceType refType: refTypes)
            {
                if (breakpointLocation != null)
                {
                    break;
                }
                List<Location> locs;
                try
                {
                    locs = refType.allLineLocations();
                }
                catch (AbsentInformationException aie)
                {
                    // No line number information for this class; skip it.
                    continue;
                }
                for (Location loc: locs)
                {
                    if (loc.lineNumber() == lineNumber)
                    {
                        breakpointLocation = loc;
                        break;
                    }
                }
            }

            if (breakpointLocation != null)
            {
                EventRequestManager evtReqMgr = vm.eventRequestManager();
                BreakpointRequest bReq = evtReqMgr.createBreakpointRequest(breakpointLocation);
                bReq.setSuspendPolicy(BreakpointRequest.SUSPEND_ALL);
                bReq.enable();
                EventQueue evtQueue = vm.eventQueue();
                // Block on the event queue until the target hits the breakpoint.
                while(true)
                {
                    EventSet evtSet = evtQueue.remove();
                    EventIterator evtIter = evtSet.eventIterator();
                    while (evtIter.hasNext())
                    {
                        try
                        {
                            Event evt = evtIter.next();
                            EventRequest evtReq = evt.request();
                            if (evtReq instanceof BreakpointRequest)
                            {
                                BreakpointRequest bpReq = (BreakpointRequest)evtReq;
                                if (bpReq.location().lineNumber() == lineNumber)
                                {
                                    System.out.println("Breakpoint at line " + lineNumber + ": ");
                                    BreakpointEvent brEvt = (BreakpointEvent)evt;
                                    ThreadReference threadRef = brEvt.thread();
                                    // Frame 0 is the frame in which the breakpoint was hit.
                                    StackFrame stackFrame = threadRef.frame(0);
                                    List<LocalVariable> visVars = stackFrame.visibleVariables();
                                    for (LocalVariable visibleVar: visVars)
                                    {
                                        if (visibleVar.name().equals(varName))
                                        {
                                            Value val = stackFrame.getValue(visibleVar);
                                            if (val instanceof StringReference)
                                            {
                                                String varNameValue = ((StringReference)val).value();
                                                System.out.println(varName + " = '" + varNameValue + "'");
                                            }
                                        }
                                    }
                                }
                            }
                        }
                        catch (AbsentInformationException aie)
                        {
                            System.out.println("AbsentInformationException: did you compile your target application with -g option?");
                        }
                        catch (Exception exc)
                        {
                            System.out.println(exc.getClass().getName() + ": " + exc.getMessage());
                        }
                        finally
                        {
                            // Always resume, or the target stays suspended.
                            evtSet.resume();
                        }
                    }
                }
            }
        }
    }
}
When I run this application with a command line like:
     java -cp c:\jdk1.6.0_20\lib\tools.jar;. JDIDemo 56485 863 fileName
I get the following output:
    Attached to process 'Java HotSpot(TM) 64-Bit Server VM'
    Breakpoint at line 863:
    fileName = 'BreakpointEvent.class'
    Breakpoint at line 863:
    fileName = 'EventSetImpl$BreakpointEventImpl.class'
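The listing above scans every loaded class for a matching line number, which is convenient here but would pick the wrong location if two classes happen to have code on the same line. When the class name is known, a narrower lookup is possible. The sketch below uses VirtualMachine.classesByName and ReferenceType.locationsOfLine; the class name LocationFinder and the method find are hypothetical. Note that classesByName only sees classes that are already loaded, and that anonymous inner classes have compiler-generated names such as JarView$1.

```java
import java.util.List;
import com.sun.jdi.AbsentInformationException;
import com.sun.jdi.Location;
import com.sun.jdi.ReferenceType;
import com.sun.jdi.VirtualMachine;

/** Hypothetical helper: resolves className + lineNumber to a JDI Location. */
public class LocationFinder
{
    /** Returns the first Location on that line in a loaded class, or null if none. */
    public static Location find(VirtualMachine vm, String className, int lineNumber)
    {
        for (ReferenceType refType : vm.classesByName(className))
        {
            try
            {
                List<Location> locs = refType.locationsOfLine(lineNumber);
                if (!locs.isEmpty())
                {
                    return locs.get(0);
                }
            }
            catch (AbsentInformationException aie)
            {
                // This class carries no line number information; try the next one.
            }
        }
        return null;
    }
}
```

The returned Location can be handed to createBreakpointRequest exactly as in the listing above.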
Pointers

You might have noticed the line above referencing the AbsentInformationException. You will get this if your target application has not been compiled with the debug (-g) switch. If you cannot compile the code with the debug switch, you will be able to set a breakpoint, but there won't be any local variable information available on the stack when you get there.
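When the variable information is there, the value you get back is not always a String, and the example in this post silently ignores everything else. Here is a rough sketch of a formatter that copes with a few more cases. ValueFormatter and describe are hypothetical names, and the primitive branch relies on the JDI implementation's toString(), which is an assumption rather than something the interface promises.

```java
import com.sun.jdi.ObjectReference;
import com.sun.jdi.PrimitiveValue;
import com.sun.jdi.StringReference;
import com.sun.jdi.Value;

/** Hypothetical helper: turns a JDI Value into printable text. */
public class ValueFormatter
{
    public static String describe(Value val)
    {
        if (val == null)
        {
            // JDI mirrors a null reference in the target as a plain Java null.
            return "null";
        }
        if (val instanceof StringReference)
        {
            return "'" + ((StringReference) val).value() + "'";
        }
        if (val instanceof PrimitiveValue)
        {
            // Assumption: the implementation's toString() prints the primitive.
            return val.toString();
        }
        if (val instanceof ObjectReference)
        {
            // For other objects, report the type and the debugger's unique id.
            ObjectReference obj = (ObjectReference) val;
            return obj.referenceType().name() + "@" + obj.uniqueID();
        }
        return val.toString();
    }
}
```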

Some JDI operations are more expensive than others. The last time I wrote a JDI application, I noticed that "method-entry" and "method-exit" breakpoints were enormously more expensive than simple line breakpoints. Now that I have a working example, I'll investigate these issues in a later post to see how things are in the current update of Java 6.
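For reference, method-entry events are requested through the same EventRequestManager as breakpoints. The sketch below shows one way to set one up with a class filter, so that events are only generated for classes matching a pattern; whether that brings the cost down to something acceptable is exactly what would need measuring. MethodEntryFilter and watchMethodEntries are hypothetical names, and the pattern "JarView*" mentioned in the comment is only an example.

```java
import com.sun.jdi.request.EventRequest;
import com.sun.jdi.request.EventRequestManager;
import com.sun.jdi.request.MethodEntryRequest;

/** Hypothetical helper: creates a method-entry request limited by a class filter. */
public class MethodEntryFilter
{
    public static MethodEntryRequest watchMethodEntries(EventRequestManager mgr,
                                                        String classPattern)
    {
        MethodEntryRequest req = mgr.createMethodEntryRequest();
        // Patterns are exact names or names that begin or end with '*', e.g. "JarView*".
        req.addClassFilter(classPattern);
        // Suspend only the thread that entered the method, not the whole VM.
        req.setSuspendPolicy(EventRequest.SUSPEND_EVENT_THREAD);
        req.enable();
        return req;
    }
}
```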

For nearly two years, I've been trying to branch out and add another programming language to my brain.  I read and blogged about Seven Languages in Seven Weeks, by Bruce Tate, an excellent book that I blasted through in seven days to save a little time.  If you read my blog, you'll know that I finally settled on Haskell, started posting about my experience as an object-oriented programmer writing in a functional language, and then things kind of fizzled out.

I really like Haskell.  However, I think I'm one of those people who tend to learn better when under pressure.  Since I didn't have a job requirement to learn Haskell or an otherwise motivating situation, I never really quite got into it.  I still plan to, some day.

But, I have finally picked the "new" language I want to learn, and that is R (I say "new" because of course R is not a new language).  I had a number of reasons to do so:
  • Big Data is all the buzzword-rage right now, and R figures prominently in many big-data scenarios.
  • I'm taking MOOCs at Coursera, and the ones I'm taking use R as the programming platform, ensuring that I must have more than a superficial understanding of the language.  I had actually looked at R once before and never stuck with it for the same reasons I did not stick with Haskell -- no looming deadlines!
  • As I learn more about R, I become more impressed by how handily it performs tasks that require a lot of boilerplate code in any other language I've used, so that experience provides me more motivation to keep learning.
  • I am currently working at a bank, and I'm already starting to use R not only to greatly speed up some tasks that I need to perform, but also to perform analyses that would have required so much Java code that they would have gone on the "back burner."
I'm also happy to report there has been some convergence, for me, among big data, R, Haskell and my recent exposure to functional programming.  R is an interesting language.  I don't have an especially formal computer-science background (instead, I'm from physics, math, and electrical engineering), so I probably would not be the best person to articulate how R checks (and does not check) boxes for functional and object-oriented languages.  But all that Haskell investigation helped a lot when I started learning MapReduce, and seeing functional features in R that also fit well into the MapReduce paradigm makes me feel - as all curious types should - that all that investigation was worthwhile.

I'll still blog about Java occasionally, but my posts for the near future will be focused on my self-training to fill in gaps in my skill set related to big data.  I have started a new blog on this topic, called Data Scientist in Training.  If you read me on DZone, you don't have to do much to find me, as my posts from both blogs will continue to find their way to DZone (the big-data posts go to a microzone called Big Data/BI Zone).  If you read me directly on Blogger, then please bookmark the link above if you're interested in what I'm doing.  At the least, please check out my Welcome! post, where I explain my path and reference some resources that you may want to check out if you want to learn more about big data.

My posts about R on Data Scientist in Training will not explicitly say anything in the title like "Java developer struggles with R data frames", but it will still be obvious that my approach to R is that of a developer who has used Java for about 90% of his coding for the last 15 years.  If you're a Java developer and are learning R, I hope there will be some content there of special use to you.  As I've searched online while learning R, I've noticed helpful responders trying to explain how to move from the "use a for-loop to iterate and then build your model in rows" approach to "use a mapping function to create your new column of data, then add it to your data frame".  (In fact, this reminds me of another feature I like about R -- R data frames remind me of tables in the column-oriented databases used extensively in big data).  I'm going to blog in near-real-time so I don't forget those dead ends I encountered as I was trying to map Java onto R, and that perspective is the one I think will be most helpful to fellow Java/OO developers.
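For Java developers, the shift from "loop and build rows" to "map a function over a column" can be pictured in Java itself. The sketch below is only an analogy, not R: it computes the same derived column both ways, using Java 8 streams for the mapping version. The class name, method names, and the price/rate data are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical analogy: deriving a new "column" by loop versus by mapping. */
public class ColumnMapping
{
    /** Row-at-a-time: iterate and append each derived value. */
    public static List<Double> withLoop(List<Double> prices, double rate)
    {
        List<Double> converted = new ArrayList<Double>();
        for (double price : prices)
        {
            converted.add(price * rate);
        }
        return converted;
    }

    /** Column-at-a-time: apply one function to the whole column. */
    public static List<Double> withMap(List<Double> prices, double rate)
    {
        return prices.stream().map(p -> p * rate).collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<Double> prices = Arrays.asList(1.0, 2.0, 4.0);
        // Both print: [2.0, 4.0, 8.0]
        System.out.println(withLoop(prices, 2.0));
        System.out.println(withMap(prices, 2.0));
    }
}
```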

There are a few posts on Data Scientist in Training already.  The next one will be specifically about R -- I hope you check it out when it arrives!

About Me
I'm a software architect/consultant in Boulder, Colorado.