- This example uses the SSSP (Single Source Shortest Paths) algorithm described in the Google Pregel paper.
- It introduces I/O usage, partitioning based on hashing of the vertex ID, and collective communication.
- The SSSP implementation can be found in ShortestPath.
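To illustrate what partitioning by hashing the vertex ID means, here is a minimal Python sketch. It is not Hama's partitioner: the function names are hypothetical, `zlib.crc32` is only a stand-in for whatever hash function the framework applies, and the city names other than Berlin are made up for illustration. The idea it shows is that each vertex is assigned to the task whose index equals the hash of its ID modulo the number of tasks.

```python
import zlib


def partition_for(vertex_id: str, num_tasks: int) -> int:
    """Map a vertex ID to a task index in the range [0, num_tasks)."""
    # Assumption: crc32 stands in for the framework's hash function. It is
    # deterministic across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(vertex_id.encode("utf-8")) % num_tasks


def partition_vertices(vertex_ids, num_tasks):
    """Group vertex IDs by the task that would own them."""
    partitions = {task: [] for task in range(num_tasks)}
    for vertex_id in vertex_ids:
        partitions[partition_for(vertex_id, num_tasks)].append(vertex_id)
    return partitions


if __name__ == "__main__":
    # Hypothetical vertex names, split across three tasks.
    print(partition_vertices(["Berlin", "Paris", "London", "Rome"], 3))
```

Because the assignment depends only on the vertex ID and the task count, every task can work out which peer owns a given vertex without consulting a lookup table.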
Short summary of the algorithm
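As a rough illustration of the Pregel-style approach, the sketch below simulates the supersteps in plain Python on a single machine. It is a sketch under stated assumptions, not the Hama implementation: the function name, the dictionary-based graph representation, and the example graph are all hypothetical, and edge costs are assumed to be non-negative. In each superstep a vertex takes the minimum of its incoming messages, and if that improves its known distance it stores the new value and sends `distance + edge cost` to each neighbor. The computation stops when no messages are left.

```python
import math


def sssp_supersteps(graph, source):
    """Simulate Pregel-style single source shortest paths.

    graph: dict mapping each vertex to a list of (neighbor, cost) pairs.
    Returns a dict mapping each vertex to its minimal cost from source
    (math.inf for vertices that cannot be reached).
    """
    distance = {vertex: math.inf for vertex in graph}
    # Superstep 0: only the source has a pending message, with cost 0.
    inbox = {source: [0]}
    while inbox:  # halt once no messages are in flight
        outbox = {}
        for vertex, messages in inbox.items():
            candidate = min(messages)
            if candidate < distance[vertex]:
                # A shorter path was found: keep it and notify neighbors.
                distance[vertex] = candidate
                for neighbor, cost in graph[vertex]:
                    outbox.setdefault(neighbor, []).append(candidate + cost)
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = outbox
    return distance


if __name__ == "__main__":
    # Hypothetical example graph; costs are illustrative only.
    example = {
        "Berlin": [("Paris", 4), ("London", 1)],
        "London": [("Paris", 2)],
        "Paris": [],
    }
    print(sssp_supersteps(example, "Berlin"))
```

In the example graph, Paris is first reached directly at cost 4 and then improved to 3 via London one superstep later, which shows why a vertex may update its value more than once.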
bin/hama jar ../hama-0.4x.0-examples.jar sssp <start vertex> <input path> <output path> [number of tasks]
Make sure that every vertex that appears as an outlink target also appears somewhere in the file as a key. Otherwise the job will fail with hard-to-diagnose NullPointerExceptions.
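A check like this can be run before submitting the job. The sketch below is a hypothetical helper, not part of Hama, and it assumes the input has already been parsed into a dictionary that maps each key vertex to a list of `(target, cost)` pairs. The actual file format is not assumed here. It returns every outlink target that never appears as a key.

```python
def missing_vertices(graph):
    """Return outlink targets that never appear as a key, sorted by name.

    graph: dict mapping each key vertex to a list of (target, cost) pairs.
    An empty result means every outlink target has its own entry.
    """
    targets = {target for edges in graph.values() for target, _cost in edges}
    return sorted(targets - set(graph))


if __name__ == "__main__":
    # Hypothetical input: Paris is referenced but has no entry of its own.
    broken = {"Berlin": [("Paris", 4)]}
    print(missing_vertices(broken))
```

A vertex without outgoing edges still needs its own entry with an empty edge list, for example `"Paris": []` in this representation.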
Now you need to transform the text file using:
Then you can run sssp on it with:
bin/hama jar ../hama-0.4x.0-examples.jar sssp Berlin /tmp/outinput.txt /tmp/sssp-output
Note that, depending on your configuration, these paths refer either to HDFS or to the local disk.
In the output, the left side shows the vertex name and the right side shows the cost of reaching that vertex from the start vertex. The output sequence file contains org.apache.hadoop.io.Text (key) and org.apache.hadoop.io.IntWritable (value) pairs that carry exactly this name and cost.