PROCESSING LARGE-SCALE GRAPHS WITH GOOGLE(TM) PREGEL MICHAEL HACKSTEIN FRONT END AND GRAPH SPECIALIST ARANGODB

Michael Hackstein, @mchacki
November 17th
www.arangodb.com

Michael Hackstein
- ArangoDB Core Team
  - Web Frontend
  - Graph visualisation
  - Graph features
- Host of cologne.js
- Master’s Degree (spec. Databases and Information Systems)

Graph Algorithms
- Pattern matching
  - Search through the entire graph
  - Identify similar components
  ⇒ Touch all vertices and their neighbourhoods
- Traversals
  - Define a specific start point
  - Iteratively explore the graph
  ⇒ History of steps is known
- Global measurements
  - Compute one value for the graph, based on all its vertices or edges
  - Compute one value for each vertex or edge
  ⇒ Often require a global view of the graph

Pregel
- A framework to query distributed, directed graphs
- Known as “Map-Reduce” for graphs
  - Uses the same phases
  - Has several iterations
- Aims at:
  - Operating all servers at full capacity
  - Reducing network traffic
- Good at calculations touching all vertices
- Bad at calculations touching a very small number of vertices

Example – Connected Components
[Figure: animated graph of seven vertices. Legend: active vertex, inactive vertex, forward message, backward message.]
Vertex values as far as they can be read from the successive frames (vertex:value):
  Start:   1:1  2:2  3:3  4:4  5:5  6:6  7:7
  Round 1: 1:1  2:2  3:3  4:4  5:5  6:5  7:6
  Round 2: 1:1  2:1  3:2  4:2  5:5  6:5  7:5
  Round 3: 1:1  2:1  3:1  4:1  5:5  6:5  7:5
In the last frame no messages remain.
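The idea behind this example can be sketched in plain JavaScript. This is a minimal simulation, not the ArangoDB implementation: every vertex starts with its own ID as label, sends its label to its neighbours, and adopts the smallest label it receives. The edge list is hypothetical (the edges of the slide's graph are not recoverable from the text), chosen only so that the run ends with two components labelled 1 and 5. Sending labels in both edge directions stands in for the forward and backward messages of the figure, so intermediate rounds may differ from the slide.

```javascript
// Hypothetical directed edges [from, to]; not the graph from the slide.
const vertexIds = [1, 2, 3, 4, 5, 6, 7];
const edges = [[2, 1], [3, 2], [4, 2], [6, 5], [7, 6]];

function connectedComponents(vertexIds, edges) {
  // Every vertex starts with its own ID as component label.
  const label = new Map(vertexIds.map((id) => [id, id]));

  // Labels travel forward and backward along each edge.
  const neighbours = new Map(vertexIds.map((id) => [id, []]));
  for (const [from, to] of edges) {
    neighbours.get(from).push(to);
    neighbours.get(to).push(from);
  }

  let active = new Set(vertexIds);
  let steps = 0;
  while (active.size > 0) {
    // Each active vertex sends its label; the inbox keeps only the
    // minimum per receiver (a MIN combiner).
    const inbox = new Map();
    for (const id of active) {
      for (const n of neighbours.get(id)) {
        const current = inbox.has(n) ? inbox.get(n) : Infinity;
        inbox.set(n, Math.min(current, label.get(id)));
      }
    }
    // Only vertices that received a smaller label change and stay active.
    active = new Set();
    for (const [id, candidate] of inbox) {
      if (candidate < label.get(id)) {
        label.set(id, candidate);
        active.add(id);
      }
    }
    steps += 1;
  }
  return { label, steps };
}

const result = connectedComponents(vertexIds, edges);
console.log([...result.label.entries()]);
```

With the assumed edges, vertices 1 to 4 end with label 1 and vertices 5 to 7 with label 5.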

Pregel – Sequence
[Figure: Pregel execution sequence]

Worker ≙ Map
- “Map” a user-defined algorithm over all vertices
- Output: set of messages to other vertices
- Available parameters:
  - The current vertex and its outbound edges
  - All incoming messages
  - Global values
- Allows modifications on the vertex:
  - Attach a result to this vertex and its outgoing edges
  - Delete the vertex and its outgoing edges
  - Deactivate the vertex

Combine ≙ Reduce
- “Reduce” all generated messages
- Output: an aggregated message for each vertex
- Executed on the sender as well as the receiver
- Available parameters:
  - One new message for a vertex
  - The stored aggregate for this vertex
- Typical combiners are SUM, MIN or MAX
- Reduces network traffic
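A small sketch of why a combiner reduces network traffic: if the sender folds all messages for the same target into one aggregate, only one message per target has to cross the network. The helper `combineOutbox` and the message shape `{ target, data }` are hypothetical names for illustration; only the two-argument combiner signature (new message, stored aggregate) follows the slide.

```javascript
// Combiners take one new message and the stored aggregate for a vertex.
const sum = (message, oldMessage) => message + oldMessage;
const min = (message, oldMessage) => Math.min(message, oldMessage);

// Hypothetical helper: fold an outbox into one aggregate per target vertex.
function combineOutbox(messages, combiner) {
  const aggregate = new Map();
  for (const { target, data } of messages) {
    aggregate.set(
      target,
      aggregate.has(target) ? combiner(data, aggregate.get(target)) : data
    );
  }
  return aggregate;
}

// Four messages are generated, but only two targets are addressed.
const outbox = [
  { target: "v1", data: 0.25 },
  { target: "v2", data: 0.5 },
  { target: "v1", data: 0.25 },
  { target: "v1", data: 0.125 },
];

const summed = combineOutbox(outbox, sum);
const smallest = combineOutbox(outbox, min);
console.log(outbox.length, "messages combined into", summed.size);
```

The same function can run again on the receiver to merge the aggregates arriving from different servers, which works because SUM, MIN and MAX give the same result regardless of how the messages are grouped.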

Activity ≙ Termination
- Execute several rounds of Map/Reduce
- Count active vertices and messages
- Start the next round if one of the following is true:
  - At least one vertex is active
  - At least one message was sent
- Terminate if no vertex is active and no messages were sent
- Store all non-deleted vertices and edges as the resulting graph
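The termination rule can be sketched as a loop that counts active vertices and sent messages after every round. This is a single-process illustration under stated assumptions, not the ArangoDB runner: `runRounds`, the vertex shape `{ value, out, active }` and the `send` callback are hypothetical. It also assumes, as in Google's description of Pregel, that an incoming message reactivates a deactivated vertex. The worker shown propagates the maximum value through the graph.

```javascript
// Hypothetical single-process runner illustrating the termination rule.
function runRounds(vertices, worker, combiner) {
  let inbox = new Map();
  let step = 0;
  let activeCount;
  let messageCount;
  do {
    const outbox = new Map();
    messageCount = 0;
    const send = (target, data) => {
      messageCount += 1;
      // Combine on the fly: keep one aggregate per target.
      outbox.set(
        target,
        outbox.has(target) ? combiner(data, outbox.get(target)) : data
      );
    };
    for (const [id, vertex] of vertices) {
      const message = inbox.get(id);
      // Assumption: a message wakes up a deactivated vertex.
      if (message !== undefined) vertex.active = true;
      if (vertex.active) worker(vertex, message, { step, send });
    }
    activeCount = [...vertices.values()].filter((v) => v.active).length;
    inbox = outbox;
    step += 1;
    // Next round only if a vertex is active or a message was sent.
  } while (activeCount > 0 || messageCount > 0);
  return step;
}

// Worker: adopt a larger incoming value, forward changes, then deactivate.
const maxValue = (vertex, message, global) => {
  const changed = message !== undefined && message > vertex.value;
  if (changed) vertex.value = message;
  if (global.step === 0 || changed) {
    for (const target of vertex.out) global.send(target, vertex.value);
  }
  vertex.active = false;
};

const graph = new Map([
  ["a", { value: 3, out: ["b"], active: true }],
  ["b", { value: 6, out: ["a", "c"], active: true }],
  ["c", { value: 2, out: ["b"], active: true }],
]);

const rounds = runRounds(graph, maxValue, Math.max);
console.log(rounds, [...graph.values()].map((v) => v.value));
```

Every vertex deactivates itself in each round, so the run continues only as long as messages are still in flight.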

Pregel at ArangoDB
- Started as a side project in free hack time
- Experimental on operational database
- Implemented as an alternative to traversals
- Makes use of the flexibility of JavaScript:
  - No strict type system
  - No pre-compilation, on-the-fly queries
  - Native JSON documents
  - Really fast development

Pagerank for TinkerPop3

    public class PageRankVertexProgram implements VertexProgram<Double> {
        private MessageType.Local messageType = MessageType.Local.of(() -> GraphTraversal.of().outE());
        public static final String PAGE_RANK = Graph.Key.hide("gremlin.pageRank");
        public static final String EDGE_COUNT = Graph.Key.hide("gremlin.edgeCount");
        private static final String VERTEX_COUNT = "gremlin.pageRankVertexProgram.vertexCount";
        private static final String ALPHA = "gremlin.pageRankVertexProgram.alpha";
        private static final String TOTAL_ITERATIONS = "gremlin.pageRankVertexProgram.totalIterations";
        private static final String INCIDENT_TRAVERSAL = "gremlin.pageRankVertexProgram.incidentTraversal";
        private double vertexCountAsDouble = 1;
        private double alpha = 0.85d;
        private int totalIterations = 30;
        private static final Set COMPUTE_KEYS = new HashSet<>(Arrays.asList(PAGE_RANK, EDGE_COUNT));

        private PageRankVertexProgram() {}

        @Override
        public void loadState(final Configuration configuration) {
            this.vertexCountAsDouble = configuration.getDouble(VERTEX_COUNT, 1.0d);
            this.alpha = configuration.getDouble(ALPHA, 0.85d);
            this.totalIterations = configuration.getInt(TOTAL_ITERATIONS, 30);
            try {
                if (configuration.containsKey(INCIDENT_TRAVERSAL)) {
                    final SSupplier traversalSupplier = VertexProgramHelper.deserialize(configuration, INCIDENT_TRAVERSAL);
                    VertexProgramHelper.verifyReversibility(traversalSupplier.get());
                    this.messageType = MessageType.Local.of((SSupplier) traversalSupplier);
                }
            } catch (final Exception e) {
                throw new IllegalStateException(e.getMessage(), e);
            }
        }

        @Override
        public void storeState(final Configuration configuration) {
            configuration.setProperty(GraphComputer.VERTEX_PROGRAM, PageRankVertexProgram.class.getName());
            configuration.setProperty(VERTEX_COUNT, this.vertexCountAsDouble);
            configuration.setProperty(ALPHA, this.alpha);
            configuration.setProperty(TOTAL_ITERATIONS, this.totalIterations);
            try {
                VertexProgramHelper.serialize(this.messageType.getIncidentTraversal(), configuration, INCIDENT_TRAVERSAL);
            } catch (final Exception e) {
                throw new IllegalStateException(e.getMessage(), e);
            }
        }

        @Override
        public Set getElementComputeKeys() {
            return COMPUTE_KEYS;
        }

        @Override
        public void setup(final Memory memory) {

        }

        @Override
        public void execute(final Vertex vertex, Messenger messenger, final Memory memory) {
            if (memory.isInitialIteration()) {
                double initialPageRank = 1.0d / this.vertexCountAsDouble;
                double edgeCount = Double.valueOf((Long) this.messageType.edges(vertex).count().next());
                vertex.singleProperty(PAGE_RANK, initialPageRank);
                vertex.singleProperty(EDGE_COUNT, edgeCount);
                messenger.sendMessage(this.messageType, initialPageRank / edgeCount);
            } else {
                double newPageRank = StreamFactory.stream(messenger.receiveMessages(this.messageType)).reduce(0.0d, (a, b) -> a + b);
                newPageRank = (this.alpha * newPageRank) + ((1.0d - this.alpha) / this.vertexCountAsDouble);
                vertex.singleProperty(PAGE_RANK, newPageRank);
                messenger.sendMessage(this.messageType, newPageRank / vertex.property(EDGE_COUNT).orElse(0.0d));
            }
        }

        @Override
        public boolean terminate(final Memory memory) {
            return memory.getIteration() >= this.totalIterations;
        }
    }

Pagerank for ArangoDB

    // Worker: executed once per active vertex in every step.
    var pageRank = function (vertex, message, global) {
      var total, rank, edgeCount, send, edge, alpha, sum;
      total = global.vertexCount;
      edgeCount = vertex._outEdges.length;
      alpha = global.alpha;
      sum = 0;
      if (global.step > 0) {
        // Sum up the rank shares received from the inbound neighbours.
        while (message.hasNext()) {
          sum += message.next().data;
        }
        rank = alpha * sum + (1 - alpha) / total;
      } else {
        // First step: every vertex starts with the same rank.
        rank = 1 / total;
      }
      vertex._setResult(rank);
      if (global.step < global.MAX_STEPS) {
        // Distribute the rank evenly over all outgoing edges.
        send = rank / edgeCount;
        while (vertex._outEdges.hasNext()) {
          edge = vertex._outEdges.next();
          message.sendTo(edge._getTarget(), send);
        }
      } else {
        vertex._deactivate();
      }
    };

    // Combiner: rank shares for the same vertex are simply added up (SUM).
    var combiner = function (message, oldMessage) {
      return message + oldMessage;
    };

    var Runner = require("org/arangodb/pregelRunner").Runner;
    var runner = new Runner();
    runner.setWorker(pageRank);
    runner.setCombiner(combiner);
    runner.start("myGraph");
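To see the update rule from the worker in isolation, the following sketch applies it to a tiny in-memory graph without any Pregel runtime. It uses the same formula as the slide (initial rank 1 / total, then alpha * sum + (1 - alpha) / total), but the function `pageRankLocal`, the adjacency-list format and the four-edge graph are assumptions made for illustration. It also assumes every vertex has at least one outgoing edge.

```javascript
// Hypothetical stand-alone PageRank using the update rule from the slide.
// outEdges maps each vertex ID to the IDs its outgoing edges point to.
function pageRankLocal(outEdges, alpha, maxSteps) {
  const ids = Object.keys(outEdges);
  const total = ids.length;

  // Step 0: every vertex starts with the same rank.
  let rank = Object.fromEntries(ids.map((id) => [id, 1 / total]));

  for (let step = 1; step <= maxSteps; step += 1) {
    // "Messages": each vertex sends rank / edgeCount along every out-edge;
    // the SUM combiner adds up whatever arrives at a vertex.
    const sum = Object.fromEntries(ids.map((id) => [id, 0]));
    for (const id of ids) {
      const share = rank[id] / outEdges[id].length;
      for (const target of outEdges[id]) {
        sum[target] += share;
      }
    }
    // Worker update for every vertex.
    rank = Object.fromEntries(
      ids.map((id) => [id, alpha * sum[id] + (1 - alpha) / total])
    );
  }
  return rank;
}

// Assumed example graph: a -> b, a -> c, b -> c, c -> a.
const ranks = pageRankLocal({ a: ["b", "c"], b: ["c"], c: ["a"] }, 0.85, 30);
console.log(ranks);
```

Because no vertex in this graph lacks outgoing edges, the ranks keep summing to 1 in every step, and `c` ends up ranked above `b` since it receives shares from both `a` and `b`.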