We start with a simple installation of a Linux distribution, preferably CentOS or Red Hat; others such as openSUSE work well too.
- Before we start the installation, we have to check that all prerequisites are fulfilled. Look here for more details ...
- To use the yum package management tool, we have to define the repository as explained here ...
- Next we need Maven 3 to compile the Giraph source code and run all tests. The installation of Maven 3 on CentOS is explained here.
- Finally, we install Subversion (and Git, which may be useful for other community projects as well) via the package manager.
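Once the tools from the list above are installed, a quick check that they are actually on the PATH can save a failed build later. The following is only a minimal sketch: the helper name `check_tools` is made up for this illustration, and the tool list (java, mvn, svn, git) is an assumption derived from the steps above.

```shell
#!/bin/sh
# check_tools: hypothetical helper. Prints every tool that cannot be found
# on the PATH. Returns 0 if all tools were found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The tool list is an assumption based on the steps above.
check_tools java mvn svn git || echo "install the missing tools before continuing"
```

The function only tests whether a command can be found; it does not check versions, so a Maven 2 installation would still pass.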
Installation of Hadoop (version 0.20.203)
Let's prepare a single-node system for development and tests first. The packages are installed via sudo yum install $PACKAGE
Optionally, we can install the following packages as well:
The services are started with the command:
$ for service in /etc/init.d/hadoop-0.20*
> do
>   sudo $service start
> done
...
$ sudo /sbin/service hadoop-zookeeper-server start
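To confirm that the daemons really came up, the list of running Java processes can be compared against the expected daemon names. This is a sketch under assumptions: the helper `missing_daemons` is hypothetical, and the expected names (NameNode, DataNode, JobTracker, TaskTracker, and QuorumPeerMain for ZooKeeper) assume a single-node setup with those services installed. Note that `jps` only lists the processes visible to the calling user, so it may have to be run with elevated rights.

```shell
#!/bin/sh
# missing_daemons: hypothetical helper. Reads jps-style output
# ("<pid> <ClassName>" per line) on stdin and prints every expected
# name (given as arguments) that does not appear in it.
missing_daemons() {
    running=$(awk '{print $2}')
    for name in "$@"; do
        echo "$running" | grep -qxF "$name" || echo "not running: $name"
    done
}

# Usage (the daemon names are assumptions for a single-node setup):
#   jps | missing_daemons NameNode DataNode JobTracker TaskTracker QuorumPeerMain
```

No output means every expected daemon was found.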
Now we are ready to install and compile Giraph via git.
$ git clone git://git.apache.org/giraph.git
$ cd giraph
$ mvn test
Check the output for error messages and for the number of failed tests to verify that everything was installed correctly.
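Scanning a long Maven log by eye is error-prone, so the test totals can be pulled out with grep. The sketch below relies on the "Tests run: ..., Failures: ..., Errors: ..., Skipped: ..." lines that the Surefire plugin prints. The helper names and the log file name `mvn-test.log` are made up for this illustration, and it assumes a single-module build, where the last such line is the overall total.

```shell
#!/bin/sh
# test_summary: hypothetical helper. Prints the last "Tests run:" line of a
# Maven log; in a single-module build this is the overall Surefire total.
test_summary() {
    grep 'Tests run:' "$1" | tail -n 1
}

# has_failures: succeeds if that summary line reports a failure or an error.
has_failures() {
    test_summary "$1" | grep -Eq 'Failures: [1-9]|Errors: [1-9]'
}

# Usage (the log file name is an assumption):
#   mvn test 2>&1 | tee mvn-test.log
#   test_summary mvn-test.log
#   has_failures mvn-test.log && echo "some tests failed"
```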
Let's give it a try ...
First we have to upload some data to the HDFS of our test cluster.
... see wiki for details ... (link comes soon)
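Until the wiki page is available, an upload can be sketched with the Hadoop file system shell. Everything specific here is an assumption: the helper `upload_input` is hypothetical, the local file name `my-graph.txt` is a placeholder, and the file content must match whatever input format the job is run with. The target directory `shortestPathsInputGraph` is the input path used by the job in this section.

```shell
#!/bin/sh
# HADOOP_CMD can be overridden (e.g. with "echo" for a dry run);
# it defaults to the hadoop command-line tool.
HADOOP_CMD=${HADOOP_CMD:-hadoop}

# upload_input: hypothetical helper. Copies a local file into an HDFS
# directory and lists the directory afterwards.
upload_input() {
    local_file=$1
    hdfs_dir=$2
    # mkdir may fail if the directory already exists; that is only reported.
    $HADOOP_CMD fs -mkdir "$hdfs_dir" ||
        echo "note: could not create $hdfs_dir (it may already exist)" >&2
    $HADOOP_CMD fs -put "$local_file" "$hdfs_dir"/ &&
        $HADOOP_CMD fs -ls "$hdfs_dir"
}

# Usage (the local file name is a placeholder):
#   upload_input my-graph.txt shortestPathsInputGraph
```

Setting `HADOOP_CMD=echo` before the call prints the commands instead of running them, which is handy for checking the paths first.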
$ cd target
$ hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.SimpleShortestPathsVertex \
    -if org.apache.giraph.lib.JsonBase64VertexInputFormat \
    -ip shortestPathsInputGraph \
    -of org.apache.giraph.lib.JsonBase64VertexOutputFormat \
    -op shortestPathsOutputGraph \
    -w 3
The input and output format classes are not selected correctly here. I have to improve this example to make it useful, but this is in progress.
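While experimenting with the job, two small steps come up repeatedly: listing what the job wrote, and removing the output directory before a rerun, since Hadoop jobs normally refuse to write into an existing output directory. The helpers below are a hypothetical sketch; `-rmr` is the recursive delete of the 0.20 file system shell, and `shortestPathsOutputGraph` is the output path from the command above.

```shell
#!/bin/sh
# HADOOP_CMD can be overridden (e.g. with "echo" for a dry run).
HADOOP_CMD=${HADOOP_CMD:-hadoop}

# list_output: hypothetical helper. Lists the files the job wrote.
list_output() {
    $HADOOP_CMD fs -ls "$1"
}

# clean_output: hypothetical helper. Recursively deletes a stale HDFS
# output directory so the job can be run again. This is destructive.
clean_output() {
    $HADOOP_CMD fs -rmr "$1"
}

# Usage:
#   list_output shortestPathsOutputGraph
#   clean_output shortestPathsOutputGraph
```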