Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

3/5/2013 UPDATE: Lots of new API changes and new Graph representation options, especially around Edges, Edge weights/values, and Multigraph support. More docs to follow!

Due to the distributed BSP nature of Giraph, all graphs are represented by application code as a Vertex implementation. By subclassing existing Vertex implementations (strongly suggested at this stage of Giraph development) the user can decide on trade offs in memory and speed vs. feature set of that Vertex. This is also important because common, provided base classes in Giraph handle the underlying mechanisms for messaging vertices that are not local to the worker node hosting the sender vertex, etc. Only very experienced users are likely to have good luck re-implementing low-level Vertex bases such as those in graph/ – New users are encouraged to extend or emulate user-ready templates in the example/ directory instead.

...

Weighted Graphs: A weighted graph can be supplied as input to Giraph when the user (1) Chooses or supplies a VertexInputFormat/VertexReader combo that expects out-edge destination data to also include out-edge value data, and (2) chooses the Writable subclass that the user wishes to represent the edge values and to applies it to the generic E parameter in application code.

Multigraphs: Giraph does support multigraphs! A multigraph in Giraph can be represented by careful manipulation of the generic parameter E. Let's define a multigraph as a graph in which a vertex can contain more than one out-edge to the same neighbor while maintaining different state about those edges (there's more than one road from Boston to New York, but with different costs, for instance.) In Giraph, this functionality can be implemented as follows:

In a Vertex<I,V,E,M> the generic parameter E can be a Java type (must subclass Hadoop's Writable) that is not a single object but a tuple. The only tuple type supplied in Giraph currently (more to follow on this soon) is the org.apache.giraph.comm.ArrayListWritable which is not to be confused with ArrayWritable from Hadoop. Once implemented, an ArrayListWritable behaves like, and supports a subset of the API of, a regular Java ArrayList. Because this collection is ordered, a careful application implementor can keep track of the individual state data of each out-edge that targets the same remote neighbor vertex.

A single Edge<I,E> object, for instance, could feature a Text (a form of String Writable, maybe "Boston") vertex id as its I datatype, and an IntArrayListWritable E value representing (in order) each state value (perhaps miles that each alternate road takes to the same destination vertex, or some other form of user-defined edge cost.)

A Vertex with I parameter Text as its vertex ID datatype could hold the ID data "Boston." It could possess an out-edge such as this one (one of many, presumably):

Edge<Text,IntArrayListWritable> == ["New York" [ 100, 243, 453, ... ]]

More state could be represented by making the type stored by the ArrayListWritable subclass yet another tuple type (also must be Writable) that can hold more data per out-edge from the same source. The implementor is free to use non-Writable state fields within the local confines of the vertex they implement to keep track of which index of the ArrayListWritable represents which edge in the multigraph. This is a temporary rather than an ideal solution to multigraph implementation, but has proven workable for application authors so far. Extensions and contributions from the community are welcome!

See SimpleTriangleClosingVertex.java in the examples/ directory or Giraph's MsgList.java file in the comm/ directory for examples of how to subclass the ArrayListWritable and supply it with the strongly typed Writable that will be stored within the array. MsgList.java uses ArrayListWritable for internal transport of vertex messages. SimpleTriangleClosingVertex uses an IntArrayListWritable to store tuples of IntWritables as its V parameter, the vertex values, which are the output of the application runnow features native support for Multigraphs! More to follow...