Impala Debugging Tips

This page collects miscellaneous tips for debugging Impala.

Diagnosing classpath issues

If you have a class getting loaded that appears to be the wrong version, add:

-verbose:class

to your JVM startup flags (and expect a lot of output). Grep the output for the class you are looking for; the JVM reports the originating jar as each class is loaded.

If the JVM is started through JNI, add the flag to JAVA_TOOL_OPTIONS. If you are running under Maven, set it in MAVEN_OPTS.
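
As a rough sketch of the grep step, the helper below filters class-loading output read from standard input. The function name `class_origin` is hypothetical, and the sample in the usage note assumes the Java 8 style `[Loaded <class> from <jar>]` line format; newer JVMs print a different layout, but the class name and its source still appear on one line.

```shell
# Print only the class-loading lines that mention the given class name.
# -F treats the name as a literal string, so the dots are not wildcards.
class_origin() {
  grep -F -- "$1"
}
```

For example, with a hypothetical application jar: `java -verbose:class -jar app.jar | class_origin org.apache.hadoop.conf.Configuration`.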

Debugging a JVM hosted in a C binary

When the Impala cluster is started with the start-impala-cluster.py script, every service that embeds a JVM is launched with an open debug port, so you can conveniently attach using jdb or Eclipse. The port convention is as follows:

Impalad - debug port is 30000 + x, where x is the zero-based index of the impalad in start order. If 3 daemons are started, the ports are 30000 through 30002.
Catalogd - debug port is 30030.

As soon as the JVM is started, it is possible to connect using the above ports. The simplest debugger is jdb:

jdb -attach localhost:30030

(jdb is not very convenient by default, so it is recommended to wrap it in the rlwrap command, e.g.: alias jdb='rlwrap jdb')
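
The port convention above can be captured in a small helper. This is only a sketch: the function name `impalad_debug_port` is made up for illustration, and it assumes the impalads are numbered from 0 in the order they were started.

```shell
# Print the JVM debug port of the Nth impalad (0-based) started by
# start-impala-cluster.py, following the 30000 + x convention.
impalad_debug_port() {
  echo $((30000 + $1))
}
```

With it, `jdb -attach localhost:$(impalad_debug_port 2)` attaches to the third impalad.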

Using Eclipse to attach to a running Impalad JVM

From the Eclipse UI do:

Run->Debug Configurations
In the dialog pane, add (if it doesn't exist) a new "Remote Java Application" with the following fields:
1. Connection Type: Standard (Socket Attach)
2. Host: localhost
3. Port: one of 30000 through 30002 (for a cluster of 3 nodes)
Click on Debug

Additional JVM options can be passed to the start-impala-cluster.py script using the --jvm_args flag. If you need to attach the debugger at startup time, set JAVA_TOOL_OPTIONS in the environment of the C process as follows:

JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend=y -Xcheck:jni"
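
A sketch of how the option string might be parameterized, for example to pick a different port per process. The function name `jdwp_tool_options` and its defaults are illustrative, not part of Impala. With suspend=y the JVM waits at startup until a debugger attaches.

```shell
# Build a JAVA_TOOL_OPTIONS value that opens a JDWP socket on localhost.
#   $1: port (default 9009)
#   $2: suspend at startup, y or n (default y)
jdwp_tool_options() {
  port=${1:-9009}
  suspend=${2:-y}
  echo "-agentlib:jdwp=transport=dt_socket,address=localhost:${port},server=y,suspend=${suspend} -Xcheck:jni"
}
```

Usage: run `export JAVA_TOOL_OPTIONS="$(jdwp_tool_options 9009 y)"` before launching the binary, then `jdb -attach localhost:9009` from another terminal.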

Debugging a C library called from a JVM

The easiest approach is to arrange for a sleep once the C library is called, long enough for you to attach gdb to the Java process ID and set a breakpoint. Ugly, but effective.

Log Verbosity

Verbose impalad logging can be enabled by setting:

export GLOG_v=2

Using GDB

The backend/frontend language split makes round-trip debugging a bit tricky. To debug the backend, use your favorite C++ debugger (e.g. gdb). The frontend can be debugged using Eclipse. If you need to debug an issue that starts in the frontend but fails in the backend, start the frontend in Eclipse and stop it at a breakpoint. Then use jps to find the RemoteTestRunner process and attach the C++ debugger to it.

Note that the JVM generates segmentation faults that you should simply continue from. It always generates one at startup and sometimes others, apparently at random. More recently there are always two segmentation faults at startup, after which the process hangs. If you interrupt it (^C) and then continue, you can debug.
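
The jps lookup described above can be sketched as a small filter. The function name `jvm_pid` is hypothetical; it assumes the default jps output of one `<pid> <main class>` pair per line.

```shell
# Read jps output on stdin and print the pid of every JVM whose
# main class name equals the given name.
jvm_pid() {
  awk -v name="$1" '$2 == name { print $1 }'
}
```

For example, `gdb -p "$(jps | jvm_pid RemoteTestRunner)"` attaches gdb to the test runner, assuming exactly one such process is running.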

It is best to have gdb ignore the SIGSEGV signals and let Java handle them. This gdb command does that:

handle SIGSEGV nostop noprint pass

Note that after you do this, sometimes gdb will be unable to evaluate functions (it will say something like "The program being debugged was signaled while in a function called from GDB."). To run functions from gdb while the program is halted, undo the above command:

handle SIGSEGV stop print pass

You can then issue the nostop command again before you continue running the program.

As an alternative you can insert the following lines into your .gdbinit to have gdb automatically switch between both modes:

.gdbinit

define hook-stop
  handle SIGSEGV stop print pass
end
define hook-run
  handle SIGSEGV nostop noprint pass
end
define hook-continue
  handle SIGSEGV nostop noprint pass
end

The following bash function is useful for getting the stack traces of all the threads (there are usually plenty of them).

stack() {
  # Usage: stack <executable> <pid-or-core-file>
  gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" "$@"
}

Saving common GDB settings

Common commands (such as ignoring SIGSEGV traps) can be added to your .gdbinit file. GDB looks for this file in the current directory and also in your home directory. Note: GDB will only load the .gdbinit file if the file is owned by the current user. An example .gdbinit looks like:

echo Loading gdbinit\n

# Used to get full backtrace of all threads
define bt-thread
  thread apply all bt full
end

# These help improve GDB output, but make the output take up more screen space.
set print pretty on
set print array on

# By default this will log to 'gdb.log'. To log elsewhere, set the file
# before logging is turned on:
#set logging file your-gdb-logfile.log
set logging on

handle SIGSEGV nostop noprint pass

Additional .gdbinit fun can be had by using some of the "pretty printer" functions available for STL/Boost libraries. These are either added directly to the .gdbinit file as functions or (with GDB v7+) loaded as Python extensions. They print data structures in a much more readable format. Some examples are:
STL support
STL via Python
Boost pretty printers for GDB

Debugging on a cluster machine

Debugging impalad and the State Store on a cluster requires first setting up environment variables such as JAVA_HOME and LD_LIBRARY_PATH. The existing wrapper scripts, such as /usr/bin/impalad, already do this. The easiest way to get started is to copy one of those scripts locally (e.g. ~/start-impalad-gdb.sh) and then modify it to look like this:

# exec $IMPALA_BIN/impalad  "$@"
gdb --args $IMPALA_BIN/impalad  "$@"

Since impalad takes a large number of arguments, it is easier to save them in a file and pass the file's contents to the GDB start script. An impalad debug session can then be started by simply doing:

./start-impalad-gdb.sh `cat impalad-arg-file`
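
The backtick form splits the file on any whitespace, so an argument that itself contains a space would be broken apart. An alternative sketch, assuming the file holds one argument per line, is shown below; the function name `run_with_args_file` is hypothetical.

```shell
# Run a command with one argument per non-empty line of a file.
# Usage: run_with_args_file <command> <arg-file>
run_with_args_file() {
  cmd=$1
  file=$2
  # Rebuild the positional parameters from the file, line by line.
  set --
  while IFS= read -r line; do
    if [ -n "$line" ]; then
      set -- "$@" "$line"
    fi
  done < "$file"
  "$cmd" "$@"
}
```

Usage: `run_with_args_file ./start-impalad-gdb.sh impalad-arg-file`.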

Resolving ascii stack traces

Certain bad statuses result in a stack trace being logged. In release builds, the trace shows up without symbol information. It looks something like:

I1105 20:19:56.888319 16516 status.cc:24] Bad sync hash
    @           0x75ea41  (unknown)
    @           0x830da0  (unknown)
    @           0x831156  (unknown)
    @           0x83216d  (unknown)
    @           0x8335aa  (unknown)
    @           0x818a07  (unknown)
    @     0x7f12d7489d97  (unknown)
    @       0x375b2077f1  (unknown)
    @       0x375aee592d  (unknown)

We can resolve those symbols manually in gdb. I've set up the binary and source on c1419 (/home/nong/beta-binary):

[nong@c1419 binaries]$ gdb usr/lib/impala/sbin-retail/impalad 
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/nong/beta-binary/impala-0.1-SNAPSHOT/binaries/usr/lib/impala/sbin-retail/impalad...Missing separate debuginfo for /home/nong/beta-binary/impala-0.1-SNAPSHOT/binaries/usr/lib/impala/sbin-retail/impalad
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/d1/f60b313fc3e45912ad919c51cb744dbe1a9399.debug
(no debugging symbols found)...done.
(gdb) set debug-file-directory usr/lib/debug/usr/lib/impala/
(gdb) info sym 0x75ea41
impala::Status::Status(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 193 in section .text
(gdb) info sym 0x830da0
impala::HdfsSequenceScanner::ReadCompressedBlock() + 1296 in section .text
(gdb) info sym 0x831156
impala::HdfsSequenceScanner::ProcessBlockCompressedScanRange() + 118 in section .text
(gdb) info sym 0x83216d
impala::HdfsSequenceScanner::ProcessRange() + 925 in section .text

TODO: can we script this?
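
One possible way to script this, sketched under assumptions: the helper names `stack_addrs` and `resolve_stack` are hypothetical, the log is assumed to use the `@ 0x... (unknown)` frame layout shown above, and gdb is assumed to be on the PATH. Frames that lie in shared libraries (the 0x7f... addresses) generally cannot be resolved against the impalad binary alone.

```shell
# Print the address of every unresolved stack frame read from stdin,
# one per line (matches lines such as "    @   0x75ea41  (unknown)").
stack_addrs() {
  sed -n 's/^[[:space:]]*@[[:space:]]*\(0x[0-9a-fA-F]*\)[[:space:]]*(unknown).*$/\1/p'
}

# Resolve the unresolved frames of a log file against a binary.
# Usage: resolve_stack <impalad-binary> <log-file>
resolve_stack() {
  binary=$1
  log=$2
  # Rebuild the positional parameters as the gdb command line,
  # adding one "info symbol" command per address.
  set -- --batch --quiet
  for addr in $(stack_addrs < "$log"); do
    set -- "$@" -ex "info symbol $addr"
  done
  gdb "$@" "$binary"
}
```

For example, `resolve_stack usr/lib/impala/sbin-retail/impalad impalad.INFO`, where impalad.INFO stands in for whichever log file contains the trace.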

Debugging Impala Core Dumps on another System

It is possible to debug Impala core dumps generated on a different OS; see: Debugging Impala Core Dumps on Another System
