If you wish to compile and package Tajo for CDH, first identify which CDH version you are using. Tajo currently provides repositories for Apache Hadoop 2.2.0 and Apache Hadoop 2.3.0. CDH4 is based on an alpha release of Hadoop 2.0.x, so it requires a separate repository on GitHub. CDH5 is based on Hadoop 2.3.0, so you can build Tajo for CDH5 from the Apache Hadoop 2.3.0 source code, as described in the steps below.
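As a minimal sketch of the version check, the function below pulls the CDH major version out of a Hadoop version string. It assumes the string has the form used in the build commands in this guide (for example 2.3.0-cdh5.0.0); the function name cdh_major is hypothetical and is not part of Tajo or Hadoop.

```shell
# Extract the CDH major version from a Hadoop version string such as
# "2.3.0-cdh5.0.0". Prints nothing if the string has no "-cdh" suffix.
cdh_major() {
  printf '%s\n' "$1" | sed -n 's/.*-cdh\([0-9][0-9]*\).*/\1/p'
}

# Example with a hard-coded version string (an assumption for illustration).
cdh_major "2.3.0-cdh5.0.0"
```

On a cluster node, the version string would typically come from the output of the hadoop version command rather than being hard-coded.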


Step 1.  Clone the Git Source Code


If you wish to use the current trunk, clone the following source: 

 

git clone https://git-wip-us.apache.org/repos/asf/tajo.git 

 

If you wish to use the branch for tajo-0.8.0, clone this source instead: 

 

git clone -b branch-0.8.0 https://git-wip-us.apache.org/repos/asf/tajo.git

 

Step 2.  Configure the Project POM File with the CDH5 Maven Repository


Next, you will need to add the CDH5 repository to the Tajo POM file ($TAJO_HOME/tajo-project/pom.xml) as follows (see the inline comment): 

<repositories>
    <repository>
      <id>apache.snapshots</id>
      <url>http://repository.apache.org/snapshots</url>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>eclipse-jetty</id>
      <url>http://repo2.maven.org/maven2/org/eclipse/jetty/jetty-distribution/</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
 
    <!-- note the following -->
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>

    </repository>
  </repositories>

Note that the Apache and Eclipse repositories are already included; only the cloudera repository entry is new.


Step 3.  Configure Catalog for Hive Integration


You can integrate Tajo with Apache Hive through Hive’s HCatalogStore by configuring Tajo’s POM files.

First, add the following profile to Tajo’s catalog POM file ($TAJO_HOME/tajo-catalog/tajo-catalog-drivers/pom.xml):

<profile>
      <id>hcatalog-cdh5.0.0</id>
      <activation>
        <activeByDefault>false</activeByDefault>
      </activation>
      <modules>
        <module>tajo-hcatalog</module>
      </modules>
    </profile>


Then, add the following lines (a shell fragment rather than a Maven profile) to the shell script embedded in Tajo’s dist POM file ($TAJO_HOME/tajo-dist/pom.xml):

 

if [ -f $ROOT/tajo-catalog/tajo-catalog-drivers/tajo-hcatalog/target/lib/hive-hcatalog-core-*-cdh*.jar ]
then
run cp -r $ROOT/tajo-catalog/tajo-catalog-drivers/tajo-hcatalog/target/lib/hive-hcatalog-core-*-cdh*.jar lib/
fi
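The fragment above tests a glob pattern with [ -f ... ], which behaves as intended when exactly one jar matches. If several jars match, the test receives too many arguments and fails. The following is a hedged sketch of a loop-based alternative, not the form used in Tajo's own script; the function name copy_existing is hypothetical, and it calls cp directly instead of the run helper seen above, which is assumed to be defined elsewhere in that script.

```shell
# Copy each existing regular file among the arguments after the first into
# the directory named by the first argument. A glob pattern that matches
# nothing is passed through literally by the shell and is then skipped by
# the -f test.
copy_existing() {
  dest=$1
  shift
  for f in "$@"; do
    if [ -f "$f" ]; then
      cp "$f" "$dest/"
    fi
  done
}
```

With this helper, the copy step could be written as copy_existing lib $ROOT/tajo-catalog/tajo-catalog-drivers/tajo-hcatalog/target/lib/hive-hcatalog-core-*-cdh*.jar, which copies every matching jar and does nothing when no jar is present.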

 

Finally, add the following profile to Tajo’s HCatalog POM file ($TAJO_HOME/tajo-catalog/tajo-catalog-drivers/tajo-hcatalog/pom.xml):

 

<profile>
      <repositories>
        <repository>
          <id>cloudera</id>
          <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
          <snapshots>
            <enabled>true</enabled>
          </snapshots>
        </repository>
      </repositories>
      <id>hcatalog-cdh5.0.0</id>
      <activation>
        <activeByDefault>false</activeByDefault>
      </activation>
      <properties>
        <hive.version>0.12.0-cdh5.0.0</hive.version>
      </properties>
      <dependencies>
        <dependency>
          <groupId>javax.jdo</groupId>
          <artifactId>jdo2-api</artifactId>
          <version>2.3-eb</version>
          <scope>provided</scope>
        </dependency>
        <dependency>
          <groupId>org.apache.hive</groupId>
          <artifactId>hive-exec</artifactId>
          <version>${hive.version}</version>
          <scope>provided</scope>
          <exclusions>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-common</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-contrib</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-hbase-handler</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-metastore</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-serde</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-shims</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-testutils</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.thrift</groupId>
              <artifactId>libfb303</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.thrift</groupId>
              <artifactId>libthrift</artifactId>
            </exclusion>
            <exclusion>
              <groupId>javax.jdo</groupId>
              <artifactId>jdo2-api</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.hive</groupId>
          <artifactId>hive-metastore</artifactId>
          <version>${hive.version}</version>
          <scope>provided</scope>
          <exclusions>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-common</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-serde</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-shims</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.thrift</groupId>
              <artifactId>libfb303</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.thrift</groupId>
              <artifactId>libthrift</artifactId>
            </exclusion>
            <exclusion>
              <groupId>javax.jdo</groupId>
              <artifactId>jdo2-api</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.hive</groupId>
          <artifactId>hive-cli</artifactId>
          <version>${hive.version}</version>
          <scope>provided</scope>
          <exclusions>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-common</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-exec</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-metastore</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-serde</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-service</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-shims</artifactId>
            </exclusion>
            <exclusion>
              <groupId>javax.jdo</groupId>
              <artifactId>jdo2-api</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.hive.hcatalog</groupId>
          <artifactId>hive-hcatalog-core</artifactId>
          <version>${hive.version}</version>
          <scope>provided</scope>
          <exclusions>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-cli</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-common</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-exec</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-metastore</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-serde</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-service</artifactId>
            </exclusion>
            <exclusion>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-shims</artifactId>
            </exclusion>
            <exclusion>
              <groupId>javax.jdo</groupId>
              <artifactId>jdo2-api</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-mapreduce-client-core</artifactId>
          <version>${hadoop.version}</version>
          <scope>provided</scope>
        </dependency>
      </dependencies>
    </profile>

As you will note, this configuration applies to hive-0.12.0-cdh5.0.0, the version of Hive bundled with CDH5. 
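The profile id hcatalog-cdh5.0.0 appears in both the catalog drivers POM and the HCatalog POM, and the same id is later passed to Maven with -P, so it is worth confirming that both files declare it. The sketch below does a plain text search rather than a real XML parse; the function names are hypothetical, and TAJO_HOME is assumed to point at the root of the cloned source tree.

```shell
# Return success if the POM file "$1" contains an <id> element whose text is
# exactly "$2". This is a fixed-string text match, not an XML parse.
declares_id() {
  grep -qF "<id>$2</id>" "$1"
}

# Report, for each of the two POM files edited in Step 3, whether it
# declares the given profile id. Assumes TAJO_HOME is set.
check_hcatalog_profile() {
  id=$1
  for pom in \
    "$TAJO_HOME/tajo-catalog/tajo-catalog-drivers/pom.xml" \
    "$TAJO_HOME/tajo-catalog/tajo-catalog-drivers/tajo-hcatalog/pom.xml"
  do
    if declares_id "$pom" "$id"; then
      echo "ok: $pom"
    else
      echo "missing $id: $pom"
    fi
  done
}
```

Running check_hcatalog_profile hcatalog-cdh5.0.0 after editing the POM files should print an ok line for each file.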


Step 4.  Complete Your Build

 

If you wish to put together a build without HCatalogStore, the build command is as follows:

 

mvn clean install package  -DskipTests -Pdist -Dhadoop.version=2.3.0-cdh5.0.0

 

If you wish to put together a build with HCatalogStore, use this build command instead:

 

mvn clean install package  -DskipTests -Pdist -Dhadoop.version=2.3.0-cdh5.0.0 -Phcatalog-cdh5.0.0
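The two build commands differ only in the trailing -P flag that activates the HCatalog profile. As a small sketch of that relationship, the hypothetical helper below prints (but does not run) the Maven command for a given Hadoop version and an optional extra profile id.

```shell
# Print, without executing, the Maven command for a CDH5 build of Tajo.
# $1: Hadoop version string; $2 (optional): extra Maven profile id.
tajo_build_cmd() {
  cmd="mvn clean install package -DskipTests -Pdist -Dhadoop.version=$1"
  if [ -n "${2:-}" ]; then
    cmd="$cmd -P$2"
  fi
  printf '%s\n' "$cmd"
}

# Build without HCatalogStore, then with it.
tajo_build_cmd 2.3.0-cdh5.0.0
tajo_build_cmd 2.3.0-cdh5.0.0 hcatalog-cdh5.0.0
```

Printing the command first makes it easy to review the flags before running the build from the root of the source tree.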

 

Once that is done, your Apache Tajo build for CDH5 will be well on its way!



Should you have further questions on the build process, please email the Apache Tajo team at dev@tajo.apache.org.