Giraph, PowerGraph, GraphLab, GraphChi, Chaos, …
…but existing graph processing engines offer little support for dynamic graphs
– Some specialized systems exist, e.g., Kineograph and Chronos, but they are not generic enough
generation and computation
• Window operations on snapshots
• Mix data and graph parallel computations
Existing solutions do not satisfy all the requirements
convergence
E.g. PageRank: pause iterating, update snapshot, continue iterating
[Figure: three panels, "Iteration N", "Pause & Shift", and "Continue from N", showing per-vertex PageRank values on a small graph. The values are carried over unchanged by Pause & Shift and are updated after continuing on the new snapshot.]
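The pause, shift, continue idea can be sketched in plain Scala. This is a minimal illustration only, not GraphTau or GraphX code: `WarmStartPageRank`, `step`, and `run` are hypothetical names, the graph is an adjacency map that must list every vertex as a key, and the update rule 0.15 + 0.85 * msgSum is the one used in the deck's PageRank listing.

```scala
object WarmStartPageRank {
  // Hypothetical toy representation: vertex -> out-neighbours.
  // Every vertex must appear as a key, even if it has no out-edges.
  type Graph = Map[String, Seq[String]]

  // One synchronous PageRank step: each vertex sends rank / outDegree
  // along its out-edges, then applies 0.15 + 0.85 * (sum of messages).
  def step(g: Graph, ranks: Map[String, Double]): Map[String, Double] = {
    val msgs = for {
      (src, outs) <- g.toSeq
      dst <- outs
    } yield dst -> (ranks(src) / outs.size)
    val sums = msgs.groupBy(_._1).map { case (v, ms) => v -> ms.map(_._2).sum }
    g.keys.map(v => v -> (0.15 + 0.85 * sums.getOrElse(v, 0.0))).toMap
  }

  // Iterate until the largest per-vertex change is below `tol`.
  // `init` holds ranks carried over from an earlier snapshot; vertices
  // missing from it start at 1.0. Assumes a non-empty graph.
  // Returns the final ranks and the number of iterations used.
  def run(g: Graph, init: Map[String, Double],
          tol: Double = 1e-3, maxIter: Int = 100): (Map[String, Double], Int) = {
    var ranks = g.keys.map(v => v -> init.getOrElse(v, 1.0)).toMap
    var iter = 0
    var delta = Double.MaxValue
    while (iter < maxIter && delta >= tol) {
      val next = step(g, ranks)
      delta = g.keys.map(v => math.abs(next(v) - ranks(v))).max
      ranks = next
      iter += 1
    }
    (ranks, iter)
  }
}
```

To mimic "pause & shift", call `run` on the first snapshot, then pass the returned ranks as `init` when calling `run` on the updated snapshot, instead of starting again from an empty map.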
[Figure: graph 1 with per-vertex labels (X, Y), including (0.977, 0.968), (0.571, 0.556), and (2.33, 2.39); a "Transition" arrow to graph 2, whose vertices carry the single values 1.224, 0.849, 0.502, 2.07, 0.849, 0.502.]
(X, Y): X is the PageRank after 10 iterations, Y is the PageRank after 23 iterations
After 11 iterations on graph 2, both converge to 3-digit precision
Need to keep per-vertex state to handle changes
Connected components on an evolving graph can be done if each vertex stores its component
[Figure: a small graph before and after an edge is removed (marked "x"), labeled "Use vertex state".]
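A minimal sketch of connected components with stored per-vertex state, in plain Scala. All names here (`IncrementalCC`, `propagate`, `addEdge`, `removeEdge`) are hypothetical and this is not GraphTau's implementation. Assumptions: edges are undirected but stored with a fixed orientation, and each vertex's stored label is the smallest vertex id in its component.

```scala
object IncrementalCC {
  type Edge = (Int, Int)

  // Min-label propagation: sweep the edges until no label changes.
  def propagate(edges: Set[Edge], labels: Map[Int, Int]): Map[Int, Int] = {
    var cur = labels
    var changed = true
    while (changed) {
      changed = false
      for ((u, v) <- edges) {
        val m = math.min(cur(u), cur(v))
        if (cur(u) != m || cur(v) != m) {
          cur = cur + (u -> m) + (v -> m)
          changed = true
        }
      }
    }
    cur
  }

  // Edge added: keep all stored labels, give unseen vertices their own id,
  // and let propagation relabel only the component that was merged.
  def addEdge(edges: Set[Edge], labels: Map[Int, Int], e: Edge): (Set[Edge], Map[Int, Int]) = {
    val withNew = labels ++ Seq(e._1, e._2).filterNot(labels.contains).map(v => v -> v)
    val es = edges + e
    (es, propagate(es, withNew))
  }

  // Edge removed: the component may split, so reset labels only inside the
  // affected component and re-propagate; other components keep their state.
  // `e` must use the same orientation it was stored with.
  def removeEdge(edges: Set[Edge], labels: Map[Int, Int], e: Edge): (Set[Edge], Map[Int, Int]) = {
    val es = edges - e
    val comp = labels(e._1)
    val reset = labels.map { case (v, c) => if (c == comp) v -> v else v -> c }
    (es, propagate(es, reset))
  }
}
```

The design choice is that additions reuse stored labels directly, while deletions fall back to recomputation restricted to one component; a production system would likely do something more refined.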
GraphStream }
func: User-provided function to do bulk operations on vertices and edges to create a new graph; allows aggregations over vertices and edges
transform: Applies func over each snapshot
[Figure: Graphs in a GraphStream]
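A toy model of `transform`, assuming a stream is just a sequence of snapshots. The `Graph` and `GraphStream` case classes below are hypothetical stand-ins written for this sketch, not GraphTau's actual classes, and `pruneIsolated` is an invented example of a user-provided `func`.

```scala
// Toy stand-ins for illustration only; not GraphTau's classes.
final case class Graph(vertices: Set[Int], edges: Set[(Int, Int)])

final case class GraphStream(snapshots: Seq[Graph]) {
  // Apply a user-provided bulk function to every snapshot,
  // producing a new stream of transformed graphs.
  def transform(func: Graph => Graph): GraphStream =
    GraphStream(snapshots.map(func))
}

object TransformExample {
  // Example func: drop self-loops, then drop vertices left without any edge.
  def pruneIsolated(g: Graph): Graph = {
    val es = g.edges.filter { case (u, v) => u != v }
    val touched = es.flatMap { case (u, v) => Set(u, v) }
    Graph(g.vertices.intersect(touched), es)
  }
}
```

Usage under these assumptions: `stream.transform(TransformExample.pruneIsolated)` returns a new stream in which each snapshot has been pruned independently.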
available
class GraphStream { def StreamingBSP(..., iterationFunc, ...): GraphStream }
Combine previous results with new snapshot, continue iterating
Continue until convergence
[Figure: timeline of snapshots at T = 1, T = 2, T = 3]
by a simple call

def pageRankEvolGraph(gs: GraphStream) = {
  // Vertex program: new rank from the sum of incoming messages
  def vprog(v: VertexId, msgSum: Double) = 0.15 + 0.85 * msgSum
  gs.StreamingBSP(1, 100, EdgeDirection.Out, "10s")(
    vprog,
    triplet => triplet.src.pr / triplet.src.outDeg,  // message sent along each edge
    (msgA, msgB) => msgA + msgB)                     // combine messages per vertex
}

Listing 3: Page Rank Computation on Time-Evolving Graphs

4.4 Live Graph State Tracking
Streaming graph applications may want to keep track of live graph state. For example, social network applications may keep track of

Faster convergence than running PageRank from scratch on every snapshot
- 1 part = full graph
- 5 parts = 20% of graph in each part
Comparison:
- Time to complete PageRank in GraphX on full graph
- Time to complete streaming PageRank in GraphTau when the graph is streamed in parts
GraphX on the whole graph could not converge!
GraphTau converged fast when 20% of the graph is streamed at a time
Smaller batches lead to faster convergence
Detection of persistent hotspots using incremental connected components
- Built specialized system to do temporal analysis
Re-implemented on general system GraphTau
- Uses mergeByWindow for sliding window analysis
- Strawman (baseline) runs non-incremental connected components on whole window of snapshots
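The sliding-window step can be illustrated with a small plain-Scala sketch. The deck does not show the signature of `mergeByWindow`, so the function below is a hypothetical stand-in that shares only the name: it assumes snapshots are edge sets and that "merging" a window means taking the union of its edges.

```scala
object WindowSketch {
  type Edge = (Int, Int)
  type Snapshot = Set[Edge]

  // Hypothetical stand-in for mergeByWindow: union the edges of every
  // `size` consecutive snapshots, sliding forward by `slide` snapshots.
  // If fewer than `size` snapshots remain, the final window is partial.
  def mergeByWindow(snapshots: Seq[Snapshot], size: Int, slide: Int = 1): Seq[Snapshot] =
    snapshots
      .sliding(size, slide)
      .map(window => window.reduce((a, b) => a union b))
      .toSeq
}
```

Under these assumptions, a strawman would rerun connected components from scratch on each merged window, whereas an incremental version would carry component labels from one window to the next.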
[Figure: Analysis Time (s) vs. Window Size (m) for Strawman, GraphTau, and CellIQ]
GraphTau managed to get performance comparable to the specialized system, without domain-specific optimizations