To compile and package Tajo for CDH, first identify which CDH version you are using. Tajo currently provides repositories for Apache Hadoop 2.2.0 and Apache Hadoop 2.3.0. CDH4, however, is based on the Hadoop 2.0.x alpha release, so it needs its own repository on GitHub. CDH5 is based on Hadoop 2.3.0, so you can build Tajo for CDH5 from the source code in the Apache Hadoop 2.3.0 repository.
Step 1. Clone the Git Source Code
If you wish to use the current trunk, clone the following source:
If you wish to use the branch for tajo-0.8.0, clone this source instead:
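As a rough sketch of the two clone variants in this step, the helper below prints the `git clone` command for either the trunk or a named branch so it can be reviewed before running. The repository URL (the Apache mirror on GitHub), the `TAJO_REPO_URL` and `TAJO_BRANCH` variables, and the `tajo` target directory are assumptions for illustration; substitute the repository and branch name that apply to your CDH version.

```shell
#!/bin/sh
# Assumption: the Apache Tajo mirror on GitHub; replace with the repository you need.
TAJO_REPO_URL="${TAJO_REPO_URL:-https://github.com/apache/tajo.git}"
# Assumption: leave empty for the current trunk, or set to the release branch name.
TAJO_BRANCH="${TAJO_BRANCH:-}"

# Print the clone command for a URL ($1) and an optional branch ($2).
clone_cmd() {
  if [ -n "$2" ]; then
    echo "git clone --branch $2 $1 tajo"
  else
    echo "git clone $1 tajo"
  fi
}

clone_cmd "$TAJO_REPO_URL" "$TAJO_BRANCH"
```

With `TAJO_BRANCH` unset, the script prints the trunk clone command; setting it to the branch name used for tajo-0.8.0 prints the branch variant instead.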
Step 2. Configure the Project POM File with the CDH5 Maven Repository
Next, you will need to add the CDH5 repository to the Tajo POM file ($TAJO_HOME/tajo-project/pom.xml) as follows (see the inline comment):
As you will notice, the Apache and Eclipse repositories are already included.
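Before moving on, it can help to confirm that the repository entry actually landed in the POM. The check below is a minimal sketch that assumes the CDH5 repository URL points at Cloudera's public host `repository.cloudera.com`; the `has_cdh_repo` helper is a hypothetical name, and `TAJO_HOME` is expected to point at your source checkout.

```shell
#!/bin/sh
# Return success if the given POM file mentions the Cloudera repository host.
# Assumption: the CDH5 repository entry uses repository.cloudera.com.
has_cdh_repo() {
  grep -q 'repository\.cloudera\.com' "$1"
}

POM="${TAJO_HOME:-.}/tajo-project/pom.xml"
if [ -f "$POM" ] && has_cdh_repo "$POM"; then
  echo "CDH repository found in $POM"
else
  echo "CDH repository not found in $POM"
fi
```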
Step 3. Configure Catalog for Hive Integration
You can integrate Tajo with Apache Hive through Hive’s HCatalogStore by configuring Tajo’s POM files.
First, add the following profile to Tajo’s catalog POM file ($TAJO_HOME/tajo-catalog/tajo-catalog-drivers/pom.xml):
Then, add the following profile to Tajo’s dist POM file ($TAJO_HOME/tajo-dist/pom.xml):
As you will note, this configuration applies to hive-0.12.0-cdh5.0.0, the version of Hive bundled with CDH5.
Step 4. Complete Your Build
If you wish to put together a build without HCatalogStore, the build command is as follows:
If you wish to put together a build with HCatalogStore, use this build command instead:
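The two build variants can be sketched as follows. This assumes Tajo's usual Maven invocation (`-Pdist` to assemble the distribution and `-Dtar` to produce a tarball, with tests skipped); treat the exact goals and flags as assumptions and check them against your source tree. `HCATALOG_PROFILE` is a hypothetical variable that should hold the `<id>` of the profile added in Step 3. The helper only prints the command so you can review it before running it.

```shell
#!/bin/sh
# Print the Maven build command; $1 is an optional HCatalog profile id.
# Assumption: base flags follow Tajo's usual distribution build.
build_cmd() {
  cmd="mvn clean package -DskipTests -Pdist -Dtar"
  if [ -n "$1" ]; then
    # Activate the HCatalog profile defined in the catalog and dist POMs.
    cmd="$cmd -P$1"
  fi
  echo "$cmd"
}

# Without HCATALOG_PROFILE set, this prints the plain build command.
build_cmd "${HCATALOG_PROFILE:-}"
```

Run the printed command from `$TAJO_HOME` once it matches the profile id in your POM files.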
Once that is done, your Apache Tajo build for CDH5 will be well on its way!
Should you have further questions on the build process, please email the Apache Tajo team at email@example.com.