Install and test Giraph 0.20 on CDH3u1

Preparation

We start with a simple installation of a Linux distribution, preferably CentOS or RedHat, but others like openSUSE work well too.

Before we can start with the installation, we have to check that all prerequisites are fulfilled. Look here for more details ...
To use the yum package management tool, we have to define the repository as explained here ...
Next we need maven3 to compile the Giraph source code and run all tests. The installation of maven3 on CentOS is explained here
Finally, we install subversion (and git, which might be useful for other community projects as well) via the package manager.

Installation of Hadoop (version 0.20.203)

Let's prepare a single-node system for development and tests first. Each of the following packages is installed via sudo yum install $PACKAGE, where $PACKAGE stands for the package name:

hadoop-0.20
hadoop-0.20-namenode
hadoop-0.20-secondarynamenode
hadoop-0.20-datanode
hadoop-0.20-tasktracker
hadoop-0.20-jobtracker
hadoop-0.20-conf-pseudo
hadoop-zookeeper
hadoop-zookeeper-server
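
Instead of typing the command once per package, the list above can be installed in a loop. The following is only a minimal sketch: the DRY_RUN switch and the function name install_packages are made up for this example, and -y simply lets yum answer its confirmation prompts with "yes". While DRY_RUN is 1 (the default here) the commands are only printed, so we can check them before anything is installed.

```shell
#!/bin/sh
# Packages taken from the list above (single-node, pseudo-distributed setup).
PACKAGES="hadoop-0.20 hadoop-0.20-namenode hadoop-0.20-secondarynamenode \
hadoop-0.20-datanode hadoop-0.20-tasktracker hadoop-0.20-jobtracker \
hadoop-0.20-conf-pseudo hadoop-zookeeper hadoop-zookeeper-server"

# DRY_RUN is a hypothetical switch for this sketch: 1 = print only.
DRY_RUN="${DRY_RUN:-1}"

install_packages() {
    for pkg in $PACKAGES; do
        if [ "$DRY_RUN" = "1" ]; then
            # Show the command that would be executed.
            echo "sudo yum install -y $pkg"
        else
            # Stop at the first package that fails to install.
            sudo yum install -y "$pkg" || return 1
        fi
    done
}

install_packages
```

Running the script with DRY_RUN=0 performs the real installation.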

Optionally, we can install the following packages as well:

hadoop-hive
hadoop-pig
hadoop-hue
sqoop

The services are started with the command:

 
$ for service in /etc/init.d/hadoop-0.20*
> do
>   sudo $service start
> done
...
$ sudo /sbin/service hadoop-zookeeper-server start
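
To see whether all daemons really came up, we can compare the output of jps (part of the JDK) with the list of daemons we expect. This is a sketch under a few assumptions: the helper name missing_daemons is made up for this example, the class names are the ones the Hadoop and ZooKeeper daemons normally show in jps, and jps has to be run with sufficient rights (for example via sudo) because the daemons run under their own service accounts.

```shell
#!/bin/sh
# Daemon class names as they normally appear in `jps` output.
EXPECTED="NameNode SecondaryNameNode DataNode JobTracker TaskTracker QuorumPeerMain"

# missing_daemons (hypothetical helper) reads `jps`-style lines
# ("<pid> <name>") on stdin and prints every expected daemon
# that does not show up in that list.
missing_daemons() {
    running=$(awk '{print $2}')
    for d in $EXPECTED; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}

# Typical use on the test machine:
#   sudo jps | missing_daemons
```

An empty result means that all six daemons were found.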

Now we are ready to fetch Giraph via git and compile it.

$ git clone git://git.apache.org/giraph.git
$ cd giraph
$ mvn test

Check the output for error messages and for the number of failed tests to verify that everything was installed correctly.
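
Scanning a long Maven log by eye is tedious, so the check can be scripted. The sketch below assumes the usual Surefire output, where the overall totals appear as a line of the form "Tests run: N, Failures: N, Errors: N, Skipped: N" without a "Time elapsed" suffix. The function names test_summary and tests_ok and the log file name mvn-test.log are made up for this example. In a multi-module build only the last summary is evaluated.

```shell
#!/bin/sh
# test_summary (hypothetical helper) reads Maven output on stdin and prints
# the last "Tests run:" line that has no "Time elapsed" suffix, which is
# where Surefire reports the overall totals.
test_summary() {
    grep 'Tests run:' | grep -v 'Time elapsed' | tail -n 1
}

# tests_ok succeeds only if that summary reports zero failures and errors.
tests_ok() {
    test_summary | grep -q 'Failures: 0, Errors: 0,'
}

# Typical use:
#   mvn test | tee mvn-test.log
#   tests_ok < mvn-test.log && echo "all tests passed"
```

Maven itself also exits with a non-zero status when tests fail, so the exit code of mvn test is a second, simpler indicator.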

Let's give it a try ...

First we have to upload some data to the HDFS of our test cluster.

... see wiki for details ... (link comes soon)
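
Until the wiki link is available, here is a rough sketch of how such an upload could look with the hadoop fs shell. The local file name graph-input.txt is purely hypothetical, and the sketch says nothing about the required file format; only the HDFS path shortestPathsInputGraph is taken from the job call on this page. The helpers run and upload_input and the DRY_RUN switch are made up for this example; with DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# INPUT_FILE is a hypothetical local file; replace it with your own data.
INPUT_FILE="${INPUT_FILE:-graph-input.txt}"
# HDFS input path as used by the job call on this page
# (relative paths end up in the user's HDFS home directory).
HDFS_INPUT_DIR="shortestPathsInputGraph"
# DRY_RUN is a hypothetical switch for this sketch: 1 = print only.
DRY_RUN="${DRY_RUN:-1}"

# run prints the command in dry-run mode and executes it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

upload_input() {
    run hadoop fs -mkdir "$HDFS_INPUT_DIR" &&
    run hadoop fs -put "$INPUT_FILE" "$HDFS_INPUT_DIR/" &&
    run hadoop fs -ls "$HDFS_INPUT_DIR"
}

upload_input
```

The final -ls call is only there to confirm that the file arrived in HDFS.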

$ cd target
$ hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsVertex \
-if org.apache.giraph.lib.JsonBase64VertexInputFormat \
-ip shortestPathsInputGraph \
-of org.apache.giraph.lib.JsonBase64VertexOutputFormat \
-op shortestPathsOutputGraph \
-w 3

The input and output format classes are not chosen correctly here. This example still has to be improved to make it useful, but that is work in progress.
