This page contains miscellaneous tips for how to debug Impala.
Diagnosing classpath issues
If you have a class getting loaded that appears to be the wrong version, add:
-verbose:class
to your JVM startup flags. (And get ready for a lot of output). Grep for the class you're looking for, and the JVM will tell you the originating jar once it's been loaded.
If you are using JNI, add this to JAVA_TOOL_OPTIONS. If you are using Maven, set MAVEN_OPTS.
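The grep step can also be scripted. The following is a minimal sketch, not part of Impala, that scans captured -verbose:class output for one class and reports where it was loaded from. It assumes the two line formats commonly printed by HotSpot JVMs (the bracketed "Loaded ... from ..." form on Java 8 and earlier, and the unified-logging "class,load" form on Java 9 and later); other JVMs may print something different. The function name find_class_origin is hypothetical.

```python
import re

# Assumed line formats (they can vary by JVM vendor and version):
#   Java 8 and earlier:  [Loaded com.example.Foo from file:/path/to/foo.jar]
#   Java 9 and later:    [0.123s][info][class,load] com.example.Foo source: file:/path/to/foo.jar
_LEGACY = re.compile(r"\[Loaded (?P<cls>\S+) from (?P<src>.+)\]")
_UNIFIED = re.compile(r"\[class,load\s*\]\s+(?P<cls>\S+) source: (?P<src>.+)")


def find_class_origin(log_lines, class_name):
    """Return every source reported for class_name, in log order."""
    origins = []
    for line in log_lines:
        match = _LEGACY.search(line) or _UNIFIED.search(line)
        if match and match.group("cls") == class_name:
            origins.append(match.group("src").strip())
    return origins
```

More than one entry in the result suggests the class was loaded from several places, for example by different class loaders, which is worth checking when the wrong version appears to be in use.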
Debugging a JVM hosted in a C binary
When the Impala cluster is started using the start-impala-cluster.py script, all services that use an embedded JVM are launched with open debug ports, which makes it convenient to attach using JDB or Eclipse. The port convention is as follows:
- Impalad - debug port is 30000 + x, where x is the zero-based running number of the impalad. If 3 daemons are started, the ports are 30000 through 30002.
- Catalogd - debug port is 30030.
As soon as the JVM is started, it is possible to connect using the above ports. The simplest debugger is jdb:
jdb -attach localhost:30030
(jdb is not very convenient by default, so it is recommended to wrap it in the rlwrap command, e.g.: alias jdb='rlwrap jdb')
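As a small illustration of the port convention, the sketch below computes the debug ports for a local cluster and builds the matching jdb command lines. The constants come from the convention described on this page; the helper names are hypothetical and are not part of start-impala-cluster.py.

```python
# Port convention described on this page for clusters started with
# start-impala-cluster.py. The helper names below are illustrative only.
IMPALAD_BASE_DEBUG_PORT = 30000
CATALOGD_DEBUG_PORT = 30030


def impalad_debug_ports(num_impalads):
    """Debug ports for impalads 0 .. num_impalads - 1."""
    return [IMPALAD_BASE_DEBUG_PORT + i for i in range(num_impalads)]


def jdb_attach_command(port, host="localhost"):
    """Argument list for attaching jdb to a JVM debug port."""
    return ["jdb", "-attach", f"{host}:{port}"]


if __name__ == "__main__":
    for port in impalad_debug_ports(3) + [CATALOGD_DEBUG_PORT]:
        print(" ".join(jdb_attach_command(port)))
```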
Using Eclipse to attach to a running Impalad JVM
From the Eclipse UI do:
- Run->Debug Configurations
- In the dialog pane, add (if it doesn't exist) a new "Remote Java Application" with the following fields:
- Connection Type: Standard (Socket Attach)
- Host: localhost
- Port: 3000[0-2] (for a cluster of 3 nodes)
- Click on Debug
Additional JVM options can be passed to the start-impala-cluster.py script using the --jvm_args flag. If you need to attach the debugger at startup time, set JAVA_TOOL_OPTIONS in the environment of the C process as follows:
JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend=y -Xcheck:jni"
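If the C process is launched from a script, the same setting can be assembled programmatically. This is a rough sketch under the assumption that the process is started by a Python launcher you control; jdwp_tool_options and debug_environment are hypothetical helper names. The option string itself mirrors the one shown above.

```python
import os


def jdwp_tool_options(address="localhost:9009", suspend=True, check_jni=True):
    """Build a JAVA_TOOL_OPTIONS value that opens a JDWP debug socket."""
    agent = "-agentlib:jdwp=transport=dt_socket,address={},server=y,suspend={}".format(
        address, "y" if suspend else "n"
    )
    return agent + (" -Xcheck:jni" if check_jni else "")


def debug_environment(base_env=None, **options):
    """Copy an environment and add JAVA_TOOL_OPTIONS for the embedded JVM."""
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_TOOL_OPTIONS"] = jdwp_tool_options(**options)
    return env
```

The resulting dictionary can be passed as the env argument of subprocess.Popen. With suspend=y the JVM waits for a debugger to attach before it continues, so the process will appear to hang until you connect.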
Debugging a C library called from a JVM
The easiest approach is to arrange for a sleep to occur once the C library is called, long enough for you to attach gdb to the Java process ID and set a breakpoint. Ugly, but effective.
Log Verbosity
Verbose impalad logging can be enabled by:
export GLOG_v=2
Using GDB
The backend/frontend language split makes round-trip debugging a bit tricky. To debug the backend, use your favorite C++ debugger (e.g. gdb). The frontend can be debugged using Eclipse. If you need to debug an issue that starts in the frontend but fails in the backend, start the frontend in Eclipse and stop it at a breakpoint. Then use jps to find the RemoteTestRunner process and attach to it with the C++ debugger. Note that the JVM generates segmentation faults that you should simply continue from. It always generates one at startup and sometimes others, apparently at random. More recently, we have seen two segmentation faults at startup followed by a hang. If you interrupt it (^C) and then continue, you can debug.
It is best to just have gdb ignore the SIGSEGV traps and let java handle them. This gdb command will do that:
handle SIGSEGV nostop noprint pass
Note that after you do this, sometimes gdb will be unable to evaluate functions (it will say something like "The program being debugged was signaled while in a function called from GDB."). To run functions from gdb while the program is halted, undo the above command:
handle SIGSEGV stop print pass
You can then issue the nostop command again before you continue running the program.
As an alternative you can insert the following lines into your .gdbinit to have gdb automatically switch between both modes:
define hook-stop
  handle SIGSEGV stop print pass
end
define hook-run
  handle SIGSEGV nostop noprint pass
end
define hook-continue
  handle SIGSEGV nostop noprint pass
end
The following bash function is useful for getting the stack traces of all the threads (there are usually plenty of them).
stack() {
  gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" $1 $2
}
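The same idea can be wrapped in Python when the traces need to be captured or post-processed. This is a minimal sketch that assumes the two positional arguments are the executable and a process ID or core file, which is how gdb interprets them; the function names are hypothetical.

```python
import subprocess


def stack_command(program, pid_or_core):
    """gdb invocation that prints a full backtrace of every thread, then exits."""
    return [
        "gdb", "--batch", "--quiet",
        "-ex", "thread apply all bt full",
        "-ex", "quit",
        str(program), str(pid_or_core),
    ]


def dump_stacks(program, pid_or_core):
    """Run gdb and return its standard output as text."""
    result = subprocess.run(
        stack_command(program, pid_or_core),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero, e.g. if the process has gone away
    )
    return result.stdout
```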
Saving common GDB settings
Common commands (such as ignoring SIGSEGV traps) can be added to your .gdbinit file. GDB looks for this file in the current directory and also in your home directory. Note: GDB will only load the .gdbinit file if the file is owned by the current user. An example gdbinit looks like:
echo Loading gdbinit\n
# Used to get full backtrace of all threads
define bt-thread
  thread apply all bt full
end
# These help improve GDB output, but make the output take up more screen space.
set print pretty on
set print array on
# By default this will log to 'gdb.log'
set logging on
#set logging file your-gdb-logfile.log
handle SIGSEGV nostop noprint pass
Additional .gdbinit fun can be had by using some of the "pretty printer" functions available for STL/Boost libraries. These are each added directly into the .gdbinit file as functions, or (with GDB v7+) using Python extensions. This helps to output data structures in a much easier to read format. Some examples are:
STL support
STL via Python
Boost pretty printers for GDB
Debugging on a cluster machine
Debugging ImpalaD and State Store on a cluster requires first setting up environment variables like JAVA_HOME, LD_LIBRARY_PATH, etc. This is already being done by existing wrapper scripts under /usr/bin/impalad. The easiest way to get started is to copy one of those scripts locally (e.g. ~/start-impalad-gdb.sh), then modify it to look like this:
# exec $IMPALA_BIN/impalad "$@"
gdb --args $IMPALA_BIN/impalad "$@"
Since Impalad takes a large number of arguments, it is easier to save them in a file and pass them to the GDB start script. An ImpalaD debug session can then be started by simply running:
./start-impalad-gdb.sh `cat impalad-arg-file`
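When the argument file contains quoted values or comment lines, plain shell word splitting of the cat output may not do what you want. The sketch below is one possible alternative: it parses the file contents with shlex and builds the gdb command line directly. The function name and the flags in the usage example are hypothetical, and this parsing is an assumption about how the file is laid out, not an Impala convention.

```python
import shlex


def gdb_command(impalad_binary, arg_file_text):
    """Build a 'gdb --args' command line from the contents of an argument file.

    Quoted values are kept together and '#' comments are skipped, which
    differs from the unquoted `cat` expansion in the shell.
    """
    return ["gdb", "--args", impalad_binary] + shlex.split(arg_file_text, comments=True)


if __name__ == "__main__":
    # Hypothetical flags, for illustration only.
    text = "--flag_a=1\n# a comment line\n--flag_b='x y'\n"
    print(gdb_command("/path/to/impalad", text))
```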
Resolving ascii stack traces
Certain bad statuses will result in a stack trace being logged. On release builds, the trace shows up without symbol information. It looks something like:
I1105 20:19:56.888319 16516 status.cc:24] Bad sync hash
    @ 0x75ea41 (unknown)
    @ 0x830da0 (unknown)
    @ 0x831156 (unknown)
    @ 0x83216d (unknown)
    @ 0x8335aa (unknown)
    @ 0x818a07 (unknown)
    @ 0x7f12d7489d97 (unknown)
    @ 0x375b2077f1 (unknown)
    @ 0x375aee592d (unknown)
We can resolve those symbols manually in gdb. I've set up the binary and source on c1419 (/home/nong/beta-binary):
[nong@c1419 binaries]$ gdb usr/lib/impala/sbin-retail/impalad
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/nong/beta-binary/impala-0.1-SNAPSHOT/binaries/usr/lib/impala/sbin-retail/impalad...Missing separate debuginfo for /home/nong/beta-binary/impala-0.1-SNAPSHOT/binaries/usr/lib/impala/sbin-retail/impalad
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/d1/f60b313fc3e45912ad919c51cb744dbe1a9399.debug
(no debugging symbols found)...done.
(gdb) set debug-file-directory usr/lib/debug/usr/lib/impala/
(gdb) info sym 0x75ea41
impala::Status::Status(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 193 in section .text
(gdb) info sym 0x830da0
impala::HdfsSequenceScanner::ReadCompressedBlock() + 1296 in section .text
(gdb) info sym 0x831156
impala::HdfsSequenceScanner::ProcessBlockCompressedScanRange() + 118 in section .text
(gdb) info sym 0x83216d
impala::HdfsSequenceScanner::ProcessRange() + 925 in section .text
TODO: can we script this?
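One possible way to script it, offered as a sketch rather than a tested tool: pull the frame addresses out of the log text and hand them to gdb in batch mode, one "info symbol" command per address. The function names are hypothetical, and the regular expression assumes frames are logged as "@ 0x... (unknown)" as in the example above. Addresses that fall inside shared libraries will most likely not resolve against the impalad binary alone.

```python
import re
import subprocess

# Assumes frames are logged as "@ 0x<hex> (unknown)", as in the trace above.
_FRAME = re.compile(r"@\s+(0x[0-9a-fA-F]+)")


def extract_addresses(log_text):
    """Return the frame addresses found in a logged stack trace, in order."""
    return _FRAME.findall(log_text)


def gdb_resolve_command(binary, addresses, debug_dir=None):
    """Build a batch-mode gdb command that runs 'info symbol' on each address."""
    cmd = ["gdb", "--batch", "--quiet"]
    if debug_dir:
        cmd += ["-ex", f"set debug-file-directory {debug_dir}"]
    for addr in addresses:
        cmd += ["-ex", f"info symbol {addr}"]
    cmd.append(binary)
    return cmd


def resolve(binary, log_text, debug_dir=None):
    """Run gdb and return its output, one line per resolved address."""
    cmd = gdb_resolve_command(binary, extract_addresses(log_text), debug_dir)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

Calling resolve("usr/lib/impala/sbin-retail/impalad", trace_text) should print the same kind of lines as the manual session, provided gdb is on the PATH and the binary matches the build that produced the trace.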