
This page provides a collection of tips for debugging Trafodion code.


Debugging an mxosrvr Process

For debugging compiler and executor code, running sqlci under gdb is the simplest debugging environment.

However, occasionally you may be debugging an issue that occurs only via ODBC/JDBC and cannot be reproduced via sqlci. For these issues, you may need to debug in the mxosrvr process. These are persistent server processes on the Trafodion cluster that service ODBC/JDBC connections.

Finding the Right mxosrvr Process

You can debug an mxosrvr process by starting gdb on the proper node and using the gdb attach command. To do this, you need to know which mxosrvr process your client is connected to and that process's Linux pid. If you are using trafci (the type-4 JDBC interactive client for Trafodion), the "show remoteprocess" command reports the process, as in the following example:

SQL>show remoteprocess;
REMOTE PROCESS \venkatsentry-2.novalocal:1.$Z0112LJ

In the output above, the node hosting the mxosrvr process is venkatsentry-2.novalocal, and the Trafodion process name is $Z0112LJ.

If you now start a shell on that node, you can run "sqps | grep Z0112LJ" to see the Linux pid of the process.
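
The lookup above can be scripted. The following is a minimal Python sketch, not part of Trafodion, that pulls the node and process name out of a "show remoteprocess" output line and builds the matching sqps filter. The function names and the regular expression are illustrative assumptions based only on the single sample line shown above; other output formats may need a different pattern.

```python
import re

# Pattern inferred from the one sample line on this page (an assumption):
#   REMOTE PROCESS \<node>:<number>.$<process name>
_REMOTE_PROCESS = re.compile(
    r"REMOTE PROCESS\s+\\(?P<node>[^:\s]+):\d+\.\$(?P<name>\w+)"
)


def parse_remote_process(line):
    """Return (node, process_name) from a 'show remoteprocess' output line."""
    match = _REMOTE_PROCESS.search(line)
    if match is None:
        raise ValueError("unrecognized remote process line: %r" % line)
    return match.group("node"), match.group("name")


def sqps_filter(process_name):
    """Build the shell pipeline to run on the hosting node."""
    return "sqps | grep " + process_name


if __name__ == "__main__":
    sample = r"REMOTE PROCESS \venkatsentry-2.novalocal:1.$Z0112LJ"
    node, name = parse_remote_process(sample)
    print("node:", node)
    print("run on that node:", sqps_filter(name))
```

The sketch only formats the command; it does not log in to the node or run sqps itself.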

Dealing with Timeouts

The mxosrvr process is designed to be persistent and relies on ZooKeeper and the DCS Master process for this purpose. Timeout logic determines whether an mxosrvr process is still alive. If mxosrvr is unresponsive for longer than the timeout, it may kill itself or be killed (if it still exists), and a new mxosrvr process is created. This can be a problem when debugging: slowly stepping through code in gdb can cause one or another timeout to be exceeded. To mitigate this, set the timeouts to higher values. For example, add the following to the conf/dcs-site.xml file on each node in the cluster:

   <property>
      <name>dcs.server.user.program.zookeeper.session.timeout</name>
      <value>3600</value>
   </property>
   <property>
      <name>zookeeper.session.timeout</name>
      <value>3600000</value>
   </property>

After changing conf/dcs-site.xml, you will need to stop and restart DCS (use the "dcsstop" and "dcsstart" scripts) in order for the change to take effect.
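
Because the file has to be edited on every node, it can be worth confirming that the properties read back as intended. Below is a minimal Python sketch for that check. It assumes dcs-site.xml wraps its properties in a `<configuration>` root element (an assumption; the code itself works with any root name), and the helper name `read_properties` is hypothetical. Note that the two sample values differ by a factor of 1000, which suggests different units; the DCS reference guide is the place to confirm that.

```python
import xml.etree.ElementTree as ET

# Property names and values are taken from this page.
# The <configuration> root element is an assumption.
SAMPLE = """<configuration>
   <property>
      <name>dcs.server.user.program.zookeeper.session.timeout</name>
      <value>3600</value>
   </property>
   <property>
      <name>zookeeper.session.timeout</name>
      <value>3600000</value>
   </property>
</configuration>"""


def read_properties(xml_text):
    """Map each <property> name to its value, whatever the root element is."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


if __name__ == "__main__":
    for key, value in sorted(read_properties(SAMPLE).items()):
        print(key, "=", value)
```

To check a real file, pass its contents instead of SAMPLE, for example `read_properties(open("conf/dcs-site.xml").read())`.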

For more detailed information about mxosrvr configuration parameters, see the Trafodion Data Connectivity Services Reference Guide at http://trafodion.apache.org/docs/dcs_reference/index.html.

Turning off Repository Writes

If you are debugging a compiler or executor issue in an mxosrvr process, you may find that your breakpoints are hit by writes to the Trafodion Repository tables. A separate thread in mxosrvr periodically flushes statistical data to the Repository using SQL DML statements, and those statements pass through the same compiler and executor code. This can be annoying. You can turn off Repository writes by adding the following to conf/dcs-site.xml:

   <property>
      <name>dcs.server.user.program.statistics.enabled</name>
      <value>false</value>
   </property>

Debugging Mixed C++/Java Processes

Many Trafodion processes (such as sqlci and mxosrvr) have a C++ main and substantial amounts of Java code invoked via JNI.

You can debug the C++ parts using a debugger such as gdb. One gotcha is that JVM threads often raise signals such as SIGSEGV as part of their normal processing. (The HotSpot JVM, for example, is reputed to use SIGSEGV to trigger lazy compilation of Java methods.) Unfortunately, gdb catches the signals first and stops the process each time. This can be quite annoying.

A way to work around this is to enter the following command into gdb:

handle SIGSEGV pass noprint nostop

Alternatively, place this command in your .gdbinit file.
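
If you would rather not touch .gdbinit, the command can also be passed when gdb starts. The following Python sketch builds such a command line using gdb's standard -p (attach to a pid) and -ex (execute a command) options. The helper name `gdb_attach_argv` and the script name in the usage message are hypothetical, and the pid is whatever you found for your mxosrvr process.

```python
import subprocess
import sys

# The gdb command quoted on this page.
SIGSEGV_RULE = "handle SIGSEGV pass noprint nostop"


def gdb_attach_argv(pid):
    """Build a gdb command line that attaches to pid and applies the rule."""
    pid = int(pid)
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -p attaches to a running process; -ex executes one gdb command.
    return ["gdb", "-p", str(pid), "-ex", SIGSEGV_RULE]


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Hands the terminal over to an interactive gdb session.
        subprocess.call(gdb_attach_argv(sys.argv[1]))
    else:
        print("usage: attach_gdb.py <pid>")
```

Keeping the rule in one constant makes it easy to add further `-ex` entries if other signals turn out to be noisy in your environment.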
