...

Hive ODBC Driver

...

...

The Hive ODBC Driver is a software library that implements the Open Database Connectivity (ODBC) API standard for the Hive database management system, enabling ODBC-compliant applications to interact seamlessly (ideally) with Hive through a standard interface. This driver is NOT built as part of the typical Hive build process and must be compiled separately according to the instructions below.

...

This guide assumes you are already familiar with the following:

  • Hive
  • Hive Server
  • Thrift
  • ODBC API
  • unixODBC

    Software Requirements

    The following software components are needed for the successful compilation and operation of the Hive ODBC driver:
  • Hive Server - a service through which clients may remotely issue Hive commands and requests. The Hive ODBC driver depends on Hive Server to perform the core set of database interactions. Hive Server is built as part of the Hive build process. More information regarding Hive Server usage can be found here.
  • Apache Thrift - a scalable cross-language software framework that enables the Hive ODBC driver (specifically the Hive client) to communicate with the Hive Server. See this link for the details on Thrift Installation. The Hive ODBC driver was developed with Thrift trunk version r790732, but the latest revision should also be fine. Make sure you note the Thrift install path during the Thrift build process as this information will be needed during the Hive client build process. The Thrift install path will be referred to as THRIFT_HOME.

    Driver Architecture

    Internally, the Hive ODBC Driver contains two separate components: the Hive client and the unixODBC API wrapper.
  • Hive client - provides a set of C-compatible library functions to interact with Hive Server in a pattern similar to those dictated by the ODBC specification. However, Hive client was designed to be independent of unixODBC or any ODBC specific headers, allowing it to be used in any number of generic cases beyond ODBC.
  • unixODBC API wrapper - provides a layer on top of Hive client that directly implements the ODBC API standard. The unixODBC API wrapper will be compiled into a shared object library, which will be the final form of the Hive ODBC driver. The wrapper files will remain file attachments on the associated JIRA issues until they can be checked into the unixODBC code repository: HIVE-187, HIVE-1101.

...

  1. Check out and set up the latest version of Apache Hive. For more details, see Getting Started with Hive. From this point onwards, the path to the Hive root directory will be referred to as HIVE_HOME.
  2. Build the Hive client by running the following command from HIVE_HOME. This will compile and copy the libraries and header files to HIVE_HOME/build/odbc/. Please keep in mind that all paths should be fully specified (no relative paths). If you encounter an "undefined reference to vtables" error, make sure that you have specified the absolute path for thrift.home.
    Code Block
     $ ant compile-cpp -Dthrift.home=<THRIFT_HOME>
     
    You can optionally force Hive client to compile into a non-native bit architecture by specifying the additional parameter (assuming you have the proper compilation libraries):
    Code Block
     $ ant compile-cpp -Dthrift.home=<THRIFT_HOME> -Dword.size=<32 or 64>
     
    You can verify the entire Hive compilation by running the Hive test suite from HIVE_HOME. Specifying the argument '-Dthrift.home=<THRIFT_HOME>' enables the tests for the Hive client. If you do NOT specify thrift.home, the Hive client tests are skipped and simply report success.
    Code Block
     $ ant test -Dthrift.home=<THRIFT_HOME>
     
    You can specifically execute the Hive client tests by running the above command from HIVE_HOME/odbc/. NOTE: Hive client tests require that a local Hive Server be operating on port 10000.
  3. To install the Hive client libraries onto your machine, run the following command from HIVE_HOME/odbc/. NOTE: The install path defaults to /usr/local. While there is no current way to change this default directory from the ant build process, a manual install may be performed by skipping the command below and copying out the contents of HIVE_HOME/build/odbc/lib and HIVE_HOME/build/odbc/include into their local file system counterparts.
    Code Block
     $ sudo ant install -Dthrift.home=<THRIFT_HOME>
     
    NOTE: The compiled static library, libhiveclient.a, requires linking with stdc++ as well as thrift libraries to function properly.
    NOTE: Currently, there is no way to specify non-system library and header directories to the unixODBC build process. Thus, the Hive client libraries and headers MUST be installed to a default system location in order for the unixODBC build process to detect these files. This issue may be remedied in the future.
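
The Hive client steps above depend on thrift.home being an absolute path and on word.size being 32 or 64. The following is a minimal sketch that validates both and then prints the ant commands from this guide instead of running them. The helper names `is_absolute_path` and `print_hive_client_build` and the example path /opt/thrift are assumptions made for illustration; they are not part of Hive or Thrift.

```shell
#!/bin/sh
# Sketch only: validates the inputs and prints the build commands from the
# steps above instead of running them. Helper names are hypothetical.

# Succeeds only if the argument is an absolute path.
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Prints the ant commands for a given Thrift install path and an optional
# word size (32 or 64). Returns non-zero on invalid input.
print_hive_client_build() {
    thrift_home="$1"
    word_size="$2"

    if ! is_absolute_path "$thrift_home"; then
        echo "error: thrift.home must be an absolute path: $thrift_home" >&2
        return 1
    fi

    compile_cmd="ant compile-cpp -Dthrift.home=$thrift_home"
    case "$word_size" in
        "")    ;;
        32|64) compile_cmd="$compile_cmd -Dword.size=$word_size" ;;
        *)     echo "error: word size must be 32 or 64: $word_size" >&2
               return 1 ;;
    esac

    # \$HIVE_HOME is printed literally so the output can be reviewed first.
    echo "cd \$HIVE_HOME && $compile_cmd"
    echo "cd \$HIVE_HOME && ant test -Dthrift.home=$thrift_home"
    echo "cd \$HIVE_HOME/odbc && sudo ant install -Dthrift.home=$thrift_home"
}

# /opt/thrift is a hypothetical THRIFT_HOME used only for this example.
print_hive_client_build /opt/thrift 64
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without ant or Thrift; a relative path such as `thrift` is rejected before any command is produced.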

    unixODBC API Wrapper Build/Setup

    After you have built and installed the Hive client, you can now install the unixODBC API wrapper:
  1. In the unixODBC root directory, run the following command:
    Code Block
     $ ./configure --enable-gui=no --prefix=<unixODBC_INSTALL_DIR>
     
    If you encounter the errors "redefinition of 'struct _hist_entry'" or "previous declaration of 'add_history' was here", re-run configure with the following command:
    Code Block
     $ ./configure --enable-gui=no --enable-readline=no --prefix=<unixODBC_INSTALL_DIR>
     
    To force the compilation of the unixODBC API wrapper into a non-native bit architecture, modify the CC and CXX environment variables to include the appropriate flags. For example:
    Code Block
     $ CC="gcc -m32" CXX="g++ -m32" ./configure --enable-gui=no --enable-readline=no --prefix=<unixODBC_INSTALL_DIR>
     
  2. Compile the unixODBC API wrapper with the following:
    Code Block
     $ make
     
  3. If you want to completely install unixODBC and all related drivers:
    a. Run the following from the unixODBC root directory:
    Code Block
      $ sudo make install
      
    b. If your system complains about undefined symbols during unixODBC testing (such as with isql or odbcinst) after installation, try running ldconfig to update your dynamic linker's cache of runtime libraries.
  4. If you only want to obtain the Hive ODBC driver shared object library:
    a. After compilation, the driver will be located at <unixODBC_BUILD_DIR>/Drivers/hive/.libs/libodbchive.so.1.0.0.
    b. This may be copied to any other location as desired. Keep in mind that the Hive ODBC driver depends on the Hive client and Thrift shared object libraries: libhiveclient.so and libthrift.so.0.
    c. You can manually install the unixODBC API wrapper by doing the following:
    Code Block
      $ cp <unixODBC_BUILD_DIR>/Drivers/hive/.libs/libodbchive.so.1.0.0 <SYSTEM_INSTALL_DIR>
      $ cd <SYSTEM_INSTALL_DIR>
      $ ln -s libodbchive.so.1.0.0 libodbchive.so
      $ ldconfig
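
The manual install above amounts to copying the versioned library and creating an unversioned symlink next to it. As a minimal sketch, the copy and link steps could be wrapped in a function that checks its inputs first. The name `install_hive_odbc_driver` is hypothetical, and the sketch deliberately leaves the ldconfig step to the operator because it typically requires root.

```shell
#!/bin/sh
# Sketch only: copies a versioned shared library into a target directory and
# creates the unversioned symlink, mirroring the manual steps above.
# install_hive_odbc_driver is a hypothetical helper name.

install_hive_odbc_driver() {
    src="$1"        # path to libodbchive.so.1.0.0 in the unixODBC build tree
    dest_dir="$2"   # the directory this guide calls <SYSTEM_INSTALL_DIR>

    if [ ! -f "$src" ]; then
        echo "error: driver library not found: $src" >&2
        return 1
    fi
    if [ ! -d "$dest_dir" ]; then
        echo "error: install directory not found: $dest_dir" >&2
        return 1
    fi

    versioned_name="$(basename "$src")"
    cp "$src" "$dest_dir/$versioned_name" || return 1

    # Use a relative link target, as in the guide; -f replaces a link left
    # over from a previous install.
    (cd "$dest_dir" && ln -sf "$versioned_name" libodbchive.so) || return 1

    echo "installed $dest_dir/$versioned_name"
    echo "run ldconfig (as root) if $dest_dir is a system library directory"
}
```

It would be called with the two paths from the steps above, for example `install_hive_odbc_driver "$BUILD/Drivers/hive/.libs/libodbchive.so.1.0.0" "$INSTALL_DIR"`, where `BUILD` and `INSTALL_DIR` are placeholder variables for <unixODBC_BUILD_DIR> and <SYSTEM_INSTALL_DIR>.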
      

    Connecting the Driver to a Driver Manager

    This portion assumes that you have already built and installed both the Hive client and the unixODBC API wrapper shared libraries on the current machine. To connect the Hive ODBC driver to a previously installed Driver Manager (such as the one provided by unixODBC or a separate application):
  1. Locate the odbc.ini file associated with the Driver Manager (DM):
    a. If you are installing the driver on the system DM, then you can run the following command to print the locations of DM configuration files.
      Code Block
        $ odbcinst -j
        unixODBC 2.2.14
        DRIVERS............: /usr/local/etc/odbcinst.ini
        SYSTEM DATA SOURCES: /usr/local/etc/odbc.ini
        FILE DATA SOURCES..: /usr/local/etc/ODBCDataSources
        USER DATA SOURCES..: /home/ehwang/.odbc.ini
        SQLULEN Size.......: 8
        SQLLEN Size........: 8
        SQLSETPOSIROW Size.: 8
        
    b. If you are installing the driver on an application DM, you will have to locate its configuration files yourself. Hint: try looking in the installation directory of your application.
      • Keep in mind that an application's DM can exist simultaneously with the system DM and will likely use its own configuration files, such as odbc.ini.
      • Also, note that some applications do not have their own DMs and simply use the system DM.
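
When scripting against the system DM, the path to odbc.ini can be pulled out of the `odbcinst -j` listing shown above rather than hard-coded. The following is a minimal sketch that assumes the output keeps the "LABEL....: value" layout from that listing; `odbcinst_path` is a hypothetical helper name, not a unixODBC command.

```shell
#!/bin/sh
# Sketch only: reads `odbcinst -j` style output on standard input and prints
# the value for one label. odbcinst_path is a hypothetical helper name.

odbcinst_path() {
    label="$1"   # for example "SYSTEM DATA SOURCES" or "DRIVERS"
    awk -v label="$label" '
        {
            sep = index($0, ": ")
            if (sep == 0) next                # skip lines such as the version banner
            key = substr($0, 1, sep - 1)
            sub(/^[ \t]+/, "", key)           # drop leading indentation
            sub(/\.+$/, "", key)              # drop dotted padding after the label
            if (key == label) { print substr($0, sep + 2); exit }
        }'
}
```

With the sample output from this guide, `odbcinst -j | odbcinst_path "SYSTEM DATA SOURCES"` would print /usr/local/etc/odbc.ini. An unknown label prints nothing, so callers should check for an empty result.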

...