Building
Build requirements:
- Java 1.6
- Maven 3 or higher
- Download Giraph from git://git.apache.org/giraph.git. One easy way to do this is:
git clone git://git.apache.org/giraph.git
- In the base path, run 'mvn compile' to build the Giraph jar (it will be generated as giraph/target/giraph-{version}-jar-with-dependencies.jar). To build the jar and also run the unit tests, use 'mvn package' instead.
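The clone and build steps above can be wrapped in a few shell functions. This is only a sketch: GIRAPH_DIR and the function names fetch_giraph, build_giraph, and giraph_jar are illustrative and not part of Giraph, and the jar path simply follows the naming pattern given above.

```shell
#!/bin/sh
# Sketch only: wraps the clone and build steps described above.
# GIRAPH_DIR and all function names are illustrative assumptions.
GIRAPH_DIR=${GIRAPH_DIR:-giraph}

fetch_giraph() {
  # Clone the repository named in the build instructions.
  git clone git://git.apache.org/giraph.git "$GIRAPH_DIR"
}

build_giraph() {
  # Runs "mvn compile" by default; pass "package" to also run the unit tests.
  (cd "$GIRAPH_DIR" && mvn "${1:-compile}")
}

giraph_jar() {
  # Print the expected jar path for a version string such as 0.1.
  printf '%s/target/giraph-%s-jar-with-dependencies.jar\n' "$GIRAPH_DIR" "$1"
}
```

A typical sequence would be fetch_giraph, then build_giraph (or build_giraph package), then giraph_jar 0.1 to print where the jar is expected to be.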
Running an example:
Hadoop requirements:
- Hadoop 0.20.203 or higher (must contain the necessary security changes)
- Build the Giraph jar with dependencies as described above.
- This example runs the PageRank benchmark included with Giraph, org.apache.giraph.benchmark.PageRankBenchmark. To list its options, run the following command, replacing the jar file name with the location of your generated jar:
hadoop jar giraph-0.1-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -h
usage: org.apache.giraph.benchmark.PageRankBenchmark [-e <arg>] [-h] [-s <arg>] [-v] [-V <arg>] [-w <arg>]
 -e,--edgesPerVertex <arg>      Edges per vertex
 -h,--help                      Help
 -s,--supersteps <arg>          Supersteps to execute before finishing
 -v,--verbose                   Verbose
 -V,--aggregateVertices <arg>   Aggregate vertices
 -w,--workers <arg>             Number of workers
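To check the flags before submitting a job, it can help to print the full command first. The helper below is a hypothetical dry-run wrapper, not something shipped with Giraph; it uses only the options listed in the usage output and prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Giraph): prints the benchmark command
# for the given jar, edges per vertex, supersteps, vertices, and workers.
pagerank_cmd() {
  jar=$1
  edges=$2
  supersteps=$3
  vertices=$4
  workers=$5
  printf 'hadoop jar %s org.apache.giraph.benchmark.PageRankBenchmark -e %s -s %s -v -V %s -w %s\n' \
    "$jar" "$edges" "$supersteps" "$vertices" "$workers"
}
```

For instance, pagerank_cmd giraph-0.1-jar-with-dependencies.jar 1 2 1000 4 prints a command for a small run with 1000 vertices, 2 supersteps, and 4 workers, which can then be pasted into a terminal.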
Example page rank benchmark run with 5 million vertices, 3 supersteps, and 30 workers:
$ hadoop jar giraph-0.1-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5000000 -w 30
11/08/01 20:40:35 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 3635750 for user
11/08/01 20:40:35 INFO security.TokenCache: Got dt for user
…
11/08/01 20:40:35 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
11/08/01 20:40:38 INFO mapred.JobClient: Running job: job_201107180643_176350
11/08/01 20:40:39 INFO mapred.JobClient: map 0% reduce 0%
11/08/01 20:41:06 INFO mapred.JobClient: map 100% reduce 0%
11/08/01 20:41:38 INFO mapred.JobClient: Job complete: job_201107180643_176350
11/08/01 20:41:38 INFO mapred.JobClient: Counters: 30
11/08/01 20:41:38 INFO mapred.JobClient:   Job Counters
11/08/01 20:41:38 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1306584
11/08/01 20:41:38 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/08/01 20:41:38 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/08/01 20:41:38 INFO mapred.JobClient:     Launched map tasks=31
11/08/01 20:41:38 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
11/08/01 20:41:38 INFO mapred.JobClient:   Giraph Timers
11/08/01 20:41:38 INFO mapred.JobClient:     Total (milliseconds)=38320
11/08/01 20:41:38 INFO mapred.JobClient:     Superstep 3 (milliseconds)=8607
11/08/01 20:41:38 INFO mapred.JobClient:     Setup (milliseconds)=2190
11/08/01 20:41:38 INFO mapred.JobClient:     Shutdown (milliseconds)=230
11/08/01 20:41:38 INFO mapred.JobClient:     Superstep 0 (milliseconds)=2320
11/08/01 20:41:38 INFO mapred.JobClient:     Superstep 4 (milliseconds)=5664
11/08/01 20:41:38 INFO mapred.JobClient:     Superstep 5 (milliseconds)=3181
11/08/01 20:41:38 INFO mapred.JobClient:     Superstep 2 (milliseconds)=6108
11/08/01 20:41:38 INFO mapred.JobClient:     Superstep 1 (milliseconds)=10016
11/08/01 20:41:38 INFO mapred.JobClient:   Giraph Stats
11/08/01 20:41:38 INFO mapred.JobClient:     Aggregate edges=5000000
11/08/01 20:41:38 INFO mapred.JobClient:     Superstep=6
11/08/01 20:41:38 INFO mapred.JobClient:     Current workers=30
11/08/01 20:41:38 INFO mapred.JobClient:     Sent messages=0
11/08/01 20:41:38 INFO mapred.JobClient:     Aggregate finished vertices=5000000
11/08/01 20:41:38 INFO mapred.JobClient:     Aggregate vertices=5000000
11/08/01 20:41:38 INFO mapred.JobClient:   File Output Format Counters
11/08/01 20:41:38 INFO mapred.JobClient:     Bytes Written=0
11/08/01 20:41:38 INFO mapred.JobClient:   FileSystemCounters
11/08/01 20:41:38 INFO mapred.JobClient:     FILE_BYTES_READ=7470
11/08/01 20:41:38 INFO mapred.JobClient:     HDFS_BYTES_READ=1364
11/08/01 20:41:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=948218
11/08/01 20:41:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=830011427
11/08/01 20:41:38 INFO mapred.JobClient:   File Input Format Counters
11/08/01 20:41:38 INFO mapred.JobClient:     Bytes Read=0
11/08/01 20:41:38 INFO mapred.JobClient:   Map-Reduce Framework
11/08/01 20:41:38 INFO mapred.JobClient:     Map input records=31
11/08/01 20:41:38 INFO mapred.JobClient:     Spilled Records=0
11/08/01 20:41:38 INFO mapred.JobClient:     Map output records=0
11/08/01 20:41:38 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1364
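The Giraph Timers counters in the job output show how the total run time splits across supersteps. As a rough sketch, assuming the job output has been saved to a file with one log entry per line, the per-superstep timers can be summed with awk. The function name sum_superstep_ms and the file name run.log are illustrative only.

```shell
#!/bin/sh
# Sketch: sum the "Superstep N (milliseconds)=X" counters read from stdin.
# The "Superstep=6" stats counter is skipped because it has no superstep
# number followed by "(milliseconds)".
sum_superstep_ms() {
  awk -F= '/Superstep [0-9]+ [(]milliseconds[)]=/ { total += $2 }
           END { print total + 0 }'
}
```

With the output saved to a file, sum_superstep_ms < run.log prints the combined superstep time in milliseconds. For the run shown here that is 35896 ms of the reported 38320 ms total; the Setup and Shutdown timers account for most of the remainder. The Hadoop client usually writes this log to standard error, so capturing it may require redirecting with 2>&1.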