The Pregel Programming Model with Spark GraphX

GraphX is Apache Spark's API for distributed graph processing. In this talk we'll introduce the Pregel programming model and focus on how to write graph algorithms with this paradigm.

Andrea Iacono

April 30, 2016

Transcript

  1. Agenda
     - GraphX introduction
     - Pregel programming model
     - Code examples
     The main focus will be on the programming model.
  2. GraphX is a graph processing system built on top of Apache Spark:
     - property graph representation
     - based on RDDs
     - user-defined partitioning of RDDs
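To make "property graph" concrete, here is a minimal plain-Scala sketch (no Spark dependency) in which both vertices and edges carry user-defined attributes, plus a triplet view that joins each edge with the attributes of its two endpoints. `Edge`, `Triplet` and `PropertyGraph` are hypothetical stand-ins chosen for illustration; they are not the GraphX classes, which are distributed and RDD-backed.

```scala
// Hypothetical plain-Scala stand-ins (not GraphX classes) for a property graph.
type VertexId = Long

// An edge carries its own user-defined attribute (e.g. a distance).
final case class Edge[ED](srcId: VertexId, dstId: VertexId, attr: ED)

// A triplet joins an edge with the attributes of both endpoints.
final case class Triplet[VD, ED](
    srcId: VertexId, srcAttr: VD,
    dstId: VertexId, dstAttr: VD,
    attr: ED)

// Vertices map ids to user-defined attributes (e.g. a city name).
final case class PropertyGraph[VD, ED](vertices: Map[VertexId, VD], edges: Seq[Edge[ED]]) {
  // Build the triplet view by looking up both endpoint attributes.
  def triplets: Seq[Triplet[VD, ED]] =
    edges.map(e => Triplet(e.srcId, vertices(e.srcId), e.dstId, vertices(e.dstId), e.attr))
}

// Usage with made-up data: two vertices and one weighted edge.
val g = PropertyGraph(Map(1L -> "A", 2L -> "B"), Seq(Edge(1L, 2L, 10.0)))
g.triplets.foreach(t => println(s"${t.srcAttr} -> ${t.dstAttr}: ${t.attr}"))
```

The triplet view is the shape of data that a Pregel `sendMsg` function works on.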
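  3. Pregel programming model (https://kowshik.github.io/JPregel/pregel_paper.pdf)
     - vertex-centric: the same user-defined function runs on every vertex
     - vertices send messages to, and receive messages from, their neighbours
     - computation proceeds in supersteps: messages sent in one superstep are received in the next
     - each vertex is either active or inactive; an inactive vertex is woken up by an incoming message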
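The model can be simulated on a single machine to see how supersteps and active/inactive vertices interact. This is a sketch under simplifying assumptions, not GraphX code: `runPregel` is a hypothetical helper, all vertices start active, a vertex stays active only while it receives messages, and the run stops when no messages are sent (or after `maxSupersteps`).

```scala
// Hypothetical single-machine simulation of the Pregel model (not a Spark API).
type VertexId = Long

def runPregel[V, M](
    values: Map[VertexId, V],
    edges: Seq[(VertexId, VertexId)],     // directed (src, dst) pairs
    compute: (V, M) => V,                 // new vertex value from old value + message
    message: (V, V) => Option[M],         // message along an edge, given src/dst values
    merge: (M, M) => M,                   // combines messages sent to the same vertex
    maxSupersteps: Int = 100): (Map[VertexId, V], Int) = {
  var state = values
  var active: Set[VertexId] = values.keySet // every vertex starts active
  var superstep = 0
  while (active.nonEmpty && superstep < maxSupersteps) {
    // Active vertices send messages along their out-edges.
    val outbox = for {
      (src, dst) <- edges if active(src)
      m <- message(state(src), state(dst))
    } yield dst -> m
    // Messages to the same vertex are merged, then applied: this models
    // delivery at the start of the next superstep.
    val inbox = outbox.groupBy(_._1).map { case (v, ms) => v -> ms.map(_._2).reduce(merge) }
    state = state ++ inbox.map { case (v, m) => v -> compute(state(v), m) }
    // Vertices that received nothing become inactive; a message reactivates them.
    active = inbox.keySet
    superstep += 1
  }
  (state, superstep)
}

// Usage: hop counts from vertex 1 on a hypothetical chain 1 -> 2 -> 3 -> 4.
val inf = Int.MaxValue
val (hops, steps) = runPregel[Int, Int](
  Map(1L -> 0, 2L -> inf, 3L -> inf, 4L -> inf),
  Seq(1L -> 2L, 2L -> 3L, 3L -> 4L),
  compute = (old, m) => math.min(old, m),
  message = (src, dst) => if (src != inf && src + 1 < dst) Some(src + 1) else None,
  merge = (a, b) => math.min(a, b))
println(s"hops = ${hops.toSeq.sorted}, supersteps = $steps")
```

Here the run takes one superstep per hop plus a final superstep in which no messages are sent, after which every vertex is inactive.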
  4. GraphX implementation of Pregel
     It uses three functions:
     - vprog computes the new vertex value
     - sendMsg decides which neighbours to send the new value to
     - mergeMsg merges the incoming messages for a vertex into a single value
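The type aliases below mirror the parameter types of GraphX's `Graph.pregel` (`vprog: (VertexId, VD, A) => VD`, `sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)]`, `mergeMsg: (A, A) => A`, where `A` is the message type). The `Triplet` case class and the `deliver` helper are hypothetical local stand-ins that sketch one round: messages addressed to the same vertex are combined with `mergeMsg` before `vprog` sees them.

```scala
type VertexId = Long

// Hypothetical local stand-in for GraphX's EdgeTriplet.
final case class Triplet[VD, ED](srcId: VertexId, srcAttr: VD, dstId: VertexId, dstAttr: VD, attr: ED)

// Shapes of the three user functions (A is the message type).
type VProg[VD, A] = (VertexId, VD, A) => VD
type SendMsg[VD, ED, A] = Triplet[VD, ED] => Iterator[(VertexId, A)]
type MergeMsg[A] = (A, A) => A

// One hypothetical round: sendMsg on every triplet, mergeMsg per
// destination, then vprog once for each vertex that received something.
def deliver[VD, ED, A](
    attrs: Map[VertexId, VD],
    triplets: Seq[Triplet[VD, ED]],
    vprog: VProg[VD, A],
    sendMsg: SendMsg[VD, ED, A],
    mergeMsg: MergeMsg[A]): Map[VertexId, VD] = {
  val messages = triplets.flatMap(sendMsg)
  val merged = messages.groupBy(_._1).map { case (id, ms) => id -> ms.map(_._2).reduce(mergeMsg) }
  attrs.map { case (id, attr) => id -> merged.get(id).fold(attr)(m => vprog(id, attr, m)) }
}

// Usage with made-up values: vertex 3 receives messages from 1 and 2.
val before = Map(1L -> 5, 2L -> 9, 3L -> 0)
val after = deliver[Int, Unit, Int](
  before,
  Seq(Triplet(1L, 5, 3L, 0, ()), Triplet(2L, 9, 3L, 0, ())),
  vprog = (_, current, incoming) => math.max(current, incoming),
  sendMsg = t => if (t.srcAttr > t.dstAttr) Iterator(t.dstId -> t.srcAttr) else Iterator.empty,
  mergeMsg = (a, b) => math.max(a, b))
println(after)
```

Vertex 3 receives 5 and 9; `mergeMsg` reduces them to 9, so `vprog` runs once per receiving vertex rather than once per message.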
  5. Max Value implementation

        import org.apache.spark.graphx.{EdgeDirection, EdgeTriplet}

        graph.pregel(
          initialMsg = Int.MinValue,
          maxIterations = Int.MaxValue,
          activeDirection = EdgeDirection.Out
        )(
          // vprog: keep the larger of the current and incoming values
          (vertexId: Long, currentVertexAttr: Int, newVertexAttr: Int) =>
            if (newVertexAttr > currentVertexAttr) newVertexAttr else currentVertexAttr,
          // sendMsg: push the source value to the destination if it is larger
          (edgeTriplet: EdgeTriplet[Int, Int]) => {
            if (edgeTriplet.srcAttr > edgeTriplet.dstAttr)
              Iterator((edgeTriplet.dstId, edgeTriplet.srcAttr))
            else
              Iterator.empty
          },
          // mergeMsg: keep the larger of two incoming messages
          (attribute1: Int, attribute2: Int) =>
            if (attribute1 > attribute2) attribute1 else attribute2
        )
  6. Max Value implementation results:
     Graph initial state
       Node [1]: 3
       Node [2]: 6
       Node [3]: 2
       Node [4]: 1
     Graph final state
       Node [1]: 6
       Node [2]: 6
       Node [3]: 6
       Node [4]: 6
     Max value of the graph is 6.
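The talk does not list the edges of this four-node graph, so the sketch below assumes a hypothetical directed cycle 1 -> 2 -> 3 -> 4 -> 1 and re-runs the slide's three functions in a local loop. The loop imitates the documented GraphX behaviour for `EdgeDirection.Out`: `initialMsg` is applied to every vertex first, and afterwards only out-edges of vertices that received a message in the previous superstep run `sendMsg`. `Triplet` and `maxValue` are hypothetical names.

```scala
type VertexId = Long

// Hypothetical local stand-in for EdgeTriplet[Int, Int] (edge attribute unused).
final case class Triplet(srcId: VertexId, srcAttr: Int, dstId: VertexId, dstAttr: Int)

// The three functions from the slide.
val vprog = (vertexId: VertexId, currentVertexAttr: Int, newVertexAttr: Int) =>
  if (newVertexAttr > currentVertexAttr) newVertexAttr else currentVertexAttr
val sendMsg = (t: Triplet) =>
  if (t.srcAttr > t.dstAttr) Iterator((t.dstId, t.srcAttr)) else Iterator.empty
val mergeMsg = (a: Int, b: Int) => if (a > b) a else b

// Local loop imitating pregel(initialMsg = Int.MinValue, activeDirection = EdgeDirection.Out).
def maxValue(initial: Map[VertexId, Int], edges: Seq[(VertexId, VertexId)]): Map[VertexId, Int] = {
  var attrs = initial.map { case (id, v) => id -> vprog(id, v, Int.MinValue) }
  var received: Set[VertexId] = attrs.keySet // in the first round every vertex sends
  while (received.nonEmpty) {
    // Only out-edges of vertices that received a message run sendMsg.
    val msgs = edges.collect { case (s, d) if received(s) => Triplet(s, attrs(s), d, attrs(d)) }
      .flatMap(sendMsg)
    val merged = msgs.groupBy(_._1).map { case (id, ms) => id -> ms.map(_._2).reduce(mergeMsg) }
    attrs = attrs ++ merged.map { case (id, m) => id -> vprog(id, attrs(id), m) }
    received = merged.keySet
  }
  attrs
}

// Vertex values from the slide; the edges are an assumption.
val result = maxValue(Map(1L -> 3, 2L -> 6, 3L -> 2, 4L -> 1), Seq(1L -> 2L, 2L -> 3L, 3L -> 4L, 4L -> 1L))
result.toSeq.sortBy(_._1).foreach { case (id, v) => println(s"Node [$id]: $v") }
```

With these assumed edges the value 6 travels around the cycle in three supersteps and the loop stops when vertex 1 no longer has anything larger to forward.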
  7. Dijkstra's algorithm implementation: type definitions

        type VertexId = scala.Long

        case class City(name: String, id: VertexId)

        case class VertexAttribute(cityName: String, distance: Double, path: List[City])
  8. Dijkstra's algorithm implementation

        val shortestPathGraph = initialGraph.pregel(
          initialMsg = VertexAttribute("", Double.PositiveInfinity, List[City]()),
          maxIterations = Int.MaxValue,
          activeDirection = EdgeDirection.Out
        )(vprog, sendMsg, mergeMsg)
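The construction of `initialGraph` is not shown. One plausible way to prepare its vertex attributes, sketched here in plain Scala with a hypothetical `initialAttributes` helper, is to give the source a distance of 0.0 and a path containing only itself (consistent with the printed result "Washington [1]" for the source), and every other vertex an infinite distance and an empty path. In GraphX the same per-vertex logic could be applied with `Graph.mapVertices`.

```scala
// Types from the slides.
type VertexId = scala.Long
case class City(name: String, id: VertexId)
case class VertexAttribute(cityName: String, distance: Double, path: List[City])

// Hypothetical helper: initial attribute of every vertex for a given source.
def initialAttributes(cities: Map[VertexId, String], sourceId: VertexId): Map[VertexId, VertexAttribute] =
  cities.map { case (id, name) =>
    if (id == sourceId) id -> VertexAttribute(name, 0.0, List(City(name, id)))   // source: distance 0, path = itself
    else id -> VertexAttribute(name, Double.PositiveInfinity, List[City]())      // not reached yet
  }

// Usage: vertex 1 is the source.
val attrs = initialAttributes(Map(1L -> "Washington", 2L -> "Baltimore"), sourceId = 1L)
attrs.toSeq.sortBy(_._1).foreach(println)
```

Starting unreached vertices at `Double.PositiveInfinity` matches the `initialMsg` above, so the first `vprog` call leaves every attribute unchanged.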
  9. Dijkstra's algorithm implementation

        // vprog: keep whichever attribute has the shorter distance
        val vprog = (vertexId: VertexId, currentVertexAttr: VertexAttribute, newVertexAttr: VertexAttribute) =>
          if (currentVertexAttr.distance <= newVertexAttr.distance) { currentVertexAttr } else { newVertexAttr }

        // mergeMsg: of two incoming messages, keep the one with the shorter distance
        val mergeMsg = (attribute1: VertexAttribute, attribute2: VertexAttribute) =>
          if (attribute1.distance < attribute2.distance) { attribute1 } else { attribute2 }
  10. Dijkstra's algorithm implementation

        val sendMsg = (edgeTriplet: EdgeTriplet[VertexAttribute, Double]) => {
          // Relax the edge: send only if going through the source is strictly shorter
          if (edgeTriplet.srcAttr.distance < (edgeTriplet.dstAttr.distance - edgeTriplet.attr)) {
            Iterator((
              edgeTriplet.dstId,
              VertexAttribute(
                edgeTriplet.dstAttr.cityName,
                edgeTriplet.srcAttr.distance + edgeTriplet.attr,
                edgeTriplet.srcAttr.path :+ City(edgeTriplet.dstAttr.cityName, edgeTriplet.dstId)
              )
            ))
          } else Iterator.empty
        }
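The condition `srcAttr.distance < dstAttr.distance - attr` is the edge-relaxation test `src + w < dst` rearranged. It also handles unreached vertices: an infinite destination accepts any finite offer, while an infinite source never sends, because `Infinity < Infinity - w` is false. The check below runs the same `sendMsg` logic against a local `Triplet` (a hypothetical stand-in for GraphX's `EdgeTriplet`) on made-up vertices A, B and C.

```scala
type VertexId = scala.Long
case class City(name: String, id: VertexId)
case class VertexAttribute(cityName: String, distance: Double, path: List[City])

// Hypothetical local stand-in for EdgeTriplet[VertexAttribute, Double].
case class Triplet(srcId: VertexId, srcAttr: VertexAttribute,
                   dstId: VertexId, dstAttr: VertexAttribute, attr: Double)

// Same logic as the slide's sendMsg.
val sendMsg = (t: Triplet) =>
  if (t.srcAttr.distance < t.dstAttr.distance - t.attr)
    Iterator((t.dstId, VertexAttribute(t.dstAttr.cityName, t.srcAttr.distance + t.attr,
      t.srcAttr.path :+ City(t.dstAttr.cityName, t.dstId))))
  else Iterator.empty

val a = VertexAttribute("A", 0.0, List(City("A", 1L)))          // source
val b = VertexAttribute("B", Double.PositiveInfinity, Nil)      // not reached yet
val c = VertexAttribute("C", Double.PositiveInfinity, Nil)      // not reached yet

// Finite source, infinite destination: the edge is relaxed.
val improved = sendMsg(Triplet(1L, a, 2L, b, 10.0)).toList
// Destination already at 10.0 via this edge: no strictly shorter offer, no message.
val bReached = VertexAttribute("B", 10.0, List(City("A", 1L), City("B", 2L)))
val noGain = sendMsg(Triplet(1L, a, 2L, bReached, 10.0)).toList
// Infinite source: Infinity < Infinity - w is false, so nothing is sent.
val fromUnreached = sendMsg(Triplet(3L, c, 2L, b, 1.0)).toList
println(improved)
```

The strict `<` is what guarantees termination: a vertex is only messaged when its distance would actually decrease.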
  11. Dijkstra's algorithm implementation results:
      Going from Washington to Chicago has a distance of 105.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5] => Chicago [4]
      Going from Washington to Washington has a distance of 0.0 km. Path is: Washington [1]
      Going from Washington to Philadelphia has a distance of 91.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5] => Philadelphia [6]
      Going from Washington to Detroit has a distance of 62.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3]
      Going from Washington to NewYork has a distance of 76.0 km. Path is: Washington [1] => Baltimore [2] => Detroit [3] => NewYork [5]
      Going from Washington to Baltimore has a distance of 27.0 km. Path is: Washington [1] => Baltimore [2]
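The edges and weights behind these results are not listed, so the sketch below uses a hypothetical four-city graph A, B, C, D with made-up weights and runs the slides' `vprog`, `sendMsg` and `mergeMsg` in a local loop (`Triplet` and `shortestPaths` are hypothetical names), printing results in the same format. Because every superstep relaxes all out-edges of the vertices that just changed, with no priority queue, the computation behaves like a parallel Bellman-Ford relaxation rather than textbook Dijkstra; for non-negative weights both reach the same shortest distances.

```scala
type VertexId = scala.Long
case class City(name: String, id: VertexId)
case class VertexAttribute(cityName: String, distance: Double, path: List[City])
// Hypothetical local stand-in for EdgeTriplet[VertexAttribute, Double].
case class Triplet(srcId: VertexId, srcAttr: VertexAttribute,
                   dstId: VertexId, dstAttr: VertexAttribute, attr: Double)

// The slides' three functions.
val vprog = (vertexId: VertexId, current: VertexAttribute, incoming: VertexAttribute) =>
  if (current.distance <= incoming.distance) current else incoming
val mergeMsg = (a: VertexAttribute, b: VertexAttribute) =>
  if (a.distance < b.distance) a else b
val sendMsg = (t: Triplet) =>
  if (t.srcAttr.distance < t.dstAttr.distance - t.attr)
    Iterator((t.dstId, VertexAttribute(t.dstAttr.cityName, t.srcAttr.distance + t.attr,
      t.srcAttr.path :+ City(t.dstAttr.cityName, t.dstId))))
  else Iterator.empty

// Local loop imitating initialGraph.pregel(initialMsg, activeDirection = EdgeDirection.Out).
def shortestPaths(init: Map[VertexId, VertexAttribute],
                  edges: Seq[(VertexId, VertexId, Double)]): Map[VertexId, VertexAttribute] = {
  val initialMsg = VertexAttribute("", Double.PositiveInfinity, List[City]())
  var attrs = init.map { case (id, v) => id -> vprog(id, v, initialMsg) }
  var received: Set[VertexId] = attrs.keySet // in the first round every vertex sends
  while (received.nonEmpty) {
    val msgs = edges.collect {
      case (s, d, w) if received(s) => Triplet(s, attrs(s), d, attrs(d), w)
    }.flatMap(sendMsg)
    val merged = msgs.groupBy(_._1).map { case (id, ms) => id -> ms.map(_._2).reduce(mergeMsg) }
    attrs = attrs ++ merged.map { case (id, m) => id -> vprog(id, attrs(id), m) }
    received = merged.keySet
  }
  attrs
}

// Hypothetical graph: A (1) is the source; weights are made up.
val names = Map(1L -> "A", 2L -> "B", 3L -> "C", 4L -> "D")
val initial = names.map { case (id, n) =>
  id -> (if (id == 1L) VertexAttribute(n, 0.0, List(City(n, id)))
         else VertexAttribute(n, Double.PositiveInfinity, Nil))
}
val edges = Seq((1L, 2L, 5.0), (1L, 3L, 20.0), (2L, 3L, 4.0), (3L, 4L, 1.0))
val result = shortestPaths(initial, edges)

for ((_, v) <- result.toSeq.sortBy(_._1))
  println(s"Going from A to ${v.cityName} has a distance of ${v.distance} km. " +
    s"Path is: ${v.path.map(c => s"${c.name} [${c.id}]").mkString(" => ")}")
```

C is first reached directly at 20.0 and then improved to 9.0 through B in a later superstep, which is exactly the kind of correction a priority-queue Dijkstra would avoid.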