This page contains miscellaneous tips for how to debug Impala.
Diagnosing classpath issues
If a class that appears to be the wrong version is getting loaded, add the JVM's verbose class-loading flag to your startup flags (and get ready for a lot of output). Grep the output for the class you're looking for; the JVM will report the originating jar once the class has been loaded.
If you are using JNI, add the flag to JAVA_TOOL_OPTIONS; if you are using Maven, set it in MAVEN_OPTS.
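As a sketch, the flag in question is presumably the JVM's standard `-verbose:class` option (an assumption based on the description above); setting it via the environment might look like:

```shell
# -verbose:class makes the JVM log every class it loads and the jar it came from.
# JAVA_TOOL_OPTIONS covers JVMs you don't launch directly (e.g. embedded via JNI);
# MAVEN_OPTS covers JVMs forked by Maven.
export JAVA_TOOL_OPTIONS="-verbose:class"
export MAVEN_OPTS="-verbose:class"
```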
Debugging a JVM hosted in a C binary
When the Impala cluster is started using the start-impala-cluster.py script, all services that use an embedded JVM are launched with open debug ports that allow you to conveniently attach JDB or Eclipse. The convention for the ports is as follows:
- Impalad - debug port is 30000 + x, where x is the index of the impalad. If 3 daemons are started, the ports are 30000-30002.
- Catalogd - debug port is 30030.
As soon as the JVM is started, it is possible to connect using the above ports. The simplest debugger is jdb.
(jdb is not too convenient by default, so it is recommended to wrap jdb in the rlwrap command, e.g.: alias jdb='rlwrap jdb')
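The port convention above can be captured in a small helper; the jdb invocation is shown as a comment since it needs a running cluster:

```shell
# Debug port convention from above: impalad x listens on 30000 + x, catalogd on 30030.
impalad_debug_port() { echo $((30000 + $1)); }
catalogd_debug_port() { echo 30030; }

# Attach to the first impalad (wrap jdb in rlwrap for usable line editing):
# rlwrap jdb -attach "localhost:$(impalad_debug_port 0)"
```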
Using Eclipse to attach to a running Impalad JVM
From the Eclipse UI do:
- Run->Debug Configurations
- In the dialog pane, add (if it doesn't exist) a new "Remote Java Application" with the following fields:
- Connection Type: Standard (Socket Attach)
- Host: localhost
- Port: 30000-30002 (for a cluster of 3 nodes)
- Click on Debug
Additional JVM options can be passed to the start-impala-cluster.py script using the --jvm_args flag. If you need to attach the debugger at startup time, set JAVA_TOOL_OPTIONS in the environment of the C process as follows:
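A sketch of such an environment setup, using the standard JDWP debug agent (the port is an example; this is not necessarily the exact command the page originally showed):

```shell
# server=y: the JVM listens for a debugger; suspend=y: it blocks until one attaches,
# so breakpoints can be set before any application code runs.
export JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=30000"
```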
Debugging a C library called from a JVM
The easiest thing to do is to arrange for a sleep to occur once the C library is called, long enough for you to attach gdb to the Java process id and set a breakpoint. Ugly, but effective.
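A sketch of the workflow; the process name matched by the helper is an assumption, and the gdb steps are shown as comments since they need a live process:

```shell
# Print the pid of a JVM whose main class matches the given name (via jps).
java_pid() { jps | awk -v name="$1" '$2 == name {print $1}'; }

# 1. Add something like sleep(30) as the first statement of the native function.
# 2. While the library sleeps, attach:   gdb -p "$(java_pid Impalad)"
# 3. Set a breakpoint in the C library, then 'continue'.
```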
Verbose impalad logging can be enabled by:
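One way to do this, assuming impalad's logging goes through glog (the variable below is the standard glog knob, not necessarily what this page originally showed):

```shell
# GLOG_v raises the verbosity level of glog-based VLOG() output in the daemon;
# set it in the daemon's environment before starting it.
export GLOG_v=2
```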
The backend/frontend language split makes round-trip debugging a bit tricky. To debug the backend, use your favorite C++ debugger (e.g. gdb); the frontend can be debugged using Eclipse. If you need to debug an issue that starts in the frontend but fails in the backend, start the frontend in Eclipse and set a breakpoint there. Then use jps to find the RemoteTestRunner process and attach to it with the C++ debugger. Note that the JVM will generate segmentation faults that you should just continue from: it always generates one at startup and sometimes others, apparently at random. More recently there are always two segmentation faults at startup, after which it hangs; if you interrupt it (^C) and then continue, you can debug.
It is best to just have gdb ignore the SIGSEGV traps and let java handle them. This gdb command will do that:
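Presumably the command in question is gdb's standard signal-handling directive, sketched here:

```
handle SIGSEGV nostop noprint pass
```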
Note that after you do this, sometimes gdb will be unable to evaluate functions (it will say something like "The program being debugged was signaled while in a function called from GDB."). To run functions from gdb while the program is halted, undo the above command:
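To make gdb stop on the signal again (so that calling functions from the prompt works), something like:

```
handle SIGSEGV stop print nopass
```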
You can then issue the nostop command again before you continue running the program.
As an alternative you can insert the following lines into your .gdbinit to have gdb automatically switch between both modes:
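A sketch of such .gdbinit lines, using gdb's user-defined command hooks (`hook-call` and `hookpost-call` run before and after the `call` command):

```
# Pass SIGSEGV to the program while it runs...
handle SIGSEGV nostop noprint pass
# ...but stop on it around manual function calls from the prompt.
define hook-call
  handle SIGSEGV stop print nopass
end
define hookpost-call
  handle SIGSEGV nostop noprint pass
end
```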
The following bash function is useful for getting the stack traces of all the threads (there are usually plenty of them):
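A minimal sketch of such a function, using gdb in batch mode (the function name is an assumption):

```shell
# Dump the backtrace of every thread in the process with the given pid.
all_stacks() {
  gdb -p "$1" --batch -ex 'thread apply all bt'
}
```

Usage would look like `all_stacks $(pgrep impalad)`.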
Saving common GDB settings
Common commands (such as ignoring SIGSEGV traps) can be added to your .gdbinit file. GDB looks for this file in the current directory and also in your home directory. Note: GDB will only load the .gdbinit file if the file is owned by the current user. An example gdbinit looks like:
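A sketch of such a file (the pretty-print line is an extra nicety, not necessarily from the original page):

```
# ~/.gdbinit
handle SIGSEGV nostop noprint pass
set print pretty on
```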
Additional .gdbinit fun can be had by using some of the "pretty printer" functions available for STL/Boost libraries. These are each added directly into the .gdbinit file as functions, or (with GDB v7+) using Python extensions. This helps to output data structures in a much easier to read format. Some examples are:
STL via Python
Boost pretty printers for GDB
Debugging on a cluster machine
Debugging ImpalaD and the State Store on a cluster requires first setting up environment variables like JAVA_HOME, LD_LIBRARY_PATH, etc. This is already done by the existing wrapper scripts under /usr/bin/impalad. The easiest way to get started is to copy one of those scripts locally (e.g. to ~/start-impalad-gdb.sh) and then modify it to look like this:
Since ImpalaD takes a large number of arguments, it is easier to save them in a file and pass them to the GDB start script. An ImpalaD debug session can then be started by simply doing:
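A sketch of both pieces: the modified wrapper and the invocation. All paths, filenames, and values below are placeholders; copy the real ones from the existing wrapper under /usr/bin/impalad.

```shell
#!/bin/bash
# ~/start-impalad-gdb.sh (sketch): environment values are placeholders.
export JAVA_HOME=/usr/java/default
export LD_LIBRARY_PATH="$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH"

# Launch impalad under gdb, forwarding all arguments (uncomment to use):
# exec gdb --args impalad "$@"
```

With the daemon's arguments saved one-per-line in a file (filename is an assumption), a debug session might then be started with `~/start-impalad-gdb.sh $(cat ~/impalad.flags)`.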
Resolving ASCII stack traces
Certain bad statuses will result in a stack trace being logged. On release builds, this appears without symbol information and looks something like:
We can resolve those symbols manually in gdb. I've set up the binary and source on c1419 (/home/nong/beta-binary)
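A sketch of the manual resolution (the address and binary path are examples; `info symbol` and `info line` map a raw address back to a symbol and source line in a binary built with symbols):

```
$ gdb /home/nong/beta-binary/impalad
(gdb) info symbol 0x84e0a4
(gdb) info line *0x84e0a4
```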
TODO: can we script this?