Slide 1

Slide 1 text

Time-Evolving Graph Processing at Scale
Anand Iyer#, Li Erran Li+, Tathagata Das*, Ion Stoica#*
#UC Berkeley, +Uber Technologies, *Databricks

Slide 2

Slide 2 text

Motivation
Dynamically evolving graphs prevalent in many domains
– Social networks (e.g., Twitter, Facebook)
– Communication networks (e.g., cellular networks)
– Internet-of-Things

Slide 3

Slide 3 text

Motivation
Many applications need to leverage the evolution characteristics
– Product recommendations
– Network troubleshooting
– Real-time ad placement

Slide 4

Slide 4 text

Motivation
Lots of interest in distributed graph processing…
– GraphX, Giraph, PowerGraph, GraphLab, GraphChi, Chaos, …
…but existing graph processing engines offer little support for dynamic graphs
– Some specialized systems exist (e.g., Kineograph, Chronos), but they are not generic enough

Slide 5

Slide 5 text

Challenges
• Consistent & fault-tolerant snapshot generation
• Co-ordinate snapshot generation and computation
• Window operations on snapshots
• Mix data and graph parallel computations
Existing solutions do not satisfy all the requirements

Slide 6

Slide 6 text

GraphTau
Abstraction
Computational Model
[Figure: PageRank example with panels "Iteration N", "Pause & Shift", and "Continue from N"; connected-components example labeled "Use vertex state"; a sequence of graph snapshots]

Slide 7

Slide 7 text

GraphTau
GraphTau represents time-evolving graphs as a series of consistent graph snapshots
[Figure: three snapshots of an evolving graph at t1, t2, and t3]

Slide 8

Slide 8 text

New Computational Models
Two new models for processing time-evolving graphs
– Pause-Shift-Resume
– Online Rectification

Slide 9

Slide 9 text

Pause-Shift-Resume
Many graph algorithms robust to changes in graph before convergence
E.g., PageRank: pause iterating, update snapshot, continue iterating
[Figure: per-vertex PageRank values in three panels: "Iteration N", "Pause & Shift", "Continue from N"]
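
The pause, shift, and resume steps can be sketched on a toy in-memory graph. This is only an illustration under stated assumptions, not GraphTau's implementation: the object `PauseShiftResume` and its `step`, `run`, and `shift` helpers are hypothetical names, graphs are plain sets of directed edges, new vertices are assumed to start at rank 1.0, and the update rule 0.15 + 0.85 * msgSum is the one used in the deck's PageRank listing.

```scala
// Illustrative sketch only; all names here are hypothetical, not GraphTau's API.
object PauseShiftResume {
  type Edges = Set[(Int, Int)] // directed (src, dst) pairs

  // One synchronous PageRank step: rank = 0.15 + 0.85 * (sum of incoming messages).
  def step(edges: Edges, ranks: Map[Int, Double]): Map[Int, Double] = {
    val outDeg = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    // Convert to Seq first so equal-valued messages to one vertex are not deduplicated.
    val msgs = edges.toSeq
      .map { case (s, d) => d -> ranks(s) / outDeg(s) }
      .groupBy(_._1)
      .map { case (v, ms) => v -> ms.map(_._2).sum }
    ranks.map { case (v, _) => v -> (0.15 + 0.85 * msgs.getOrElse(v, 0.0)) }
  }

  // Iterate until no rank changes by more than tol, or maxIter steps have run.
  // Returns the ranks and the number of iterations used. Assumes a non-empty graph.
  def run(edges: Edges, init: Map[Int, Double],
          tol: Double = 1e-6, maxIter: Int = 1000): (Map[Int, Double], Int) = {
    var ranks = init
    var i = 0
    var delta = Double.MaxValue
    while (delta > tol && i < maxIter) {
      val next = step(edges, ranks)
      delta = next.map { case (v, r) => math.abs(r - ranks(v)) }.max
      ranks = next
      i += 1
    }
    (ranks, i)
  }

  // Shift: carry ranks over to the new snapshot. Vertices that appear in the new
  // edge set keep their old rank if they had one; new vertices start at 1.0 (assumption).
  def shift(ranks: Map[Int, Double], edges: Edges): Map[Int, Double] = {
    val vertices = edges.flatMap { case (s, d) => Set(s, d) }
    vertices.map(v => v -> ranks.getOrElse(v, 1.0)).toMap
  }
}
```

To mimic the slide, call `run` on the first snapshot with a small `maxIter` (pause), pass the result through `shift` with the second snapshot's edges, and call `run` again (resume). A cold start on the second snapshot reaches the same ranks up to the tolerance; only the starting point differs.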

Slide 10

Slide 10 text

Pause-Shift-Resume
[Figure: two graphs over vertices A to F joined by a "Transition" arrow, with vertices annotated by (X, Y) pairs such as (0.977, 0.968) and by single values such as 1.224]
(X, Y): X is the PageRank after 10 iterations, Y is the PageRank after 23 iterations
After 11 iterations on graph 2, both converge to 3-digit precision

Slide 11

Slide 11 text

Online Rectification Model
Many graph algorithms not resilient to changes
Need to keep per-vertex state to handle changes
Connected components on an evolving graph can be done if each vertex stores its component
[Figure: connected-components example on an evolving graph, labeled "Use vertex state"]
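
The idea of keeping a component label per vertex can be sketched as follows. This is a minimal illustration that assumes only edge insertions (deletions need more bookkeeping and are not covered) and labels each component by its smallest vertex id. The names `OnlineCC`, `addEdge`, and `addEdges` are hypothetical and not part of GraphTau.

```scala
// Illustrative sketch only; handles edge insertions, not deletions.
object OnlineCC {
  // Per-vertex state: vertex id -> component id (smallest vertex id in the component).
  type State = Map[Int, Int]

  // Rectify the stored labels after one edge insertion instead of recomputing from scratch.
  def addEdge(state: State, u: Int, v: Int): State = {
    // Previously unseen endpoints start as singleton components labeled by themselves.
    val withNew = state ++ Seq(u, v).filterNot(state.contains).map(x => x -> x)
    val (cu, cv) = (withNew(u), withNew(v))
    if (cu == cv) withNew // already in the same component, nothing to fix
    else {
      val (keep, drop) = (math.min(cu, cv), math.max(cu, cv))
      // Relabel every vertex of the dropped component with the surviving label.
      withNew.map { case (x, c) => x -> (if (c == drop) keep else c) }
    }
  }

  // Apply a batch of insertions, e.g. the edges added between two snapshots.
  def addEdges(state: State, edges: Seq[(Int, Int)]): State =
    edges.foldLeft(state) { case (s, (u, v)) => addEdge(s, u, v) }
}
```

The relabeling pass is linear in the number of vertices; a union-find structure would be the usual choice if that cost mattered.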

Slide 12

Slide 12 text

Abstraction
GraphStream[V,E]: Represents a series of Graph[V,E] snapshots where V = vertices, E = edges
[Figure: a GraphStream[V,E] drawn as Graph[V,E] snapshots at T = 1, T = 2, T = 3, T = 4]

Slide 13

Slide 13 text

Operations: transform
class GraphStream {
  def transform(func: Graph => Graph): GraphStream
}
func: User-provided function that does bulk operations on vertices and edges to create a new graph; allows aggregations over vertices and edges
transform: Applies func to each snapshot Graph in a GraphStream

Slide 14

Slide 14 text

Operations: transform
class GraphStream {
  def transform(func: Graph => Graph): GraphStream
}
[Figure: func applied to each snapshot (T = 1 to T = 4) of the original GraphStream, producing the transformed GraphStream]
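
A toy version of transform may help make the per-snapshot semantics concrete. The sketch below is not GraphTau code: `Graph` and `GraphStream` are simplified in-memory stand-ins (their field names are assumptions), and `outDegrees` is a hypothetical example of a user-provided func that aggregates over edges.

```scala
// Illustrative sketch only; simplified stand-ins for the slides' Graph and GraphStream.
object TransformSketch {
  final case class Graph(vertices: Map[Int, Double], edges: Set[(Int, Int)])

  final case class GraphStream(snapshots: Vector[Graph]) {
    // Apply func independently to every snapshot, preserving snapshot order.
    def transform(func: Graph => Graph): GraphStream =
      GraphStream(snapshots.map(func))
  }

  // Example func: replace each vertex attribute with that vertex's out-degree.
  def outDegrees(g: Graph): Graph = {
    val deg = g.edges.groupBy(_._1).map { case (v, es) => v -> es.size.toDouble }
    g.copy(vertices = g.vertices.map { case (v, _) => v -> deg.getOrElse(v, 0.0) })
  }
}
```

Calling `gs.transform(TransformSketch.outDegrees)` on a stream yields a new stream with the same number of snapshots, each rewritten by the function.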

Slide 15

Slide 15 text

Operations: sliding windows
class GraphStream {
  def mergeWindows(aggregationFuncs, windowLength, slidingInterval): GraphStream
}
[Figure: snapshots T = 1 to T = 4 of the original GraphStream merged into a windowed GraphStream, annotated with aggregationFuncs, windowLen, and slidingInterval]
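
One way to read the window operation is as a merge of consecutive snapshots. The sketch below makes several assumptions that the slide does not state: window length and sliding interval are counted in snapshots rather than time, edges are merged by set union, and a single vertex aggregation function `vAgg` stands in for aggregationFuncs. The object `WindowSketch` and its `merge` helper are hypothetical.

```scala
// Illustrative sketch only; window sizes are counted in snapshots (assumption).
object WindowSketch {
  final case class Graph(vertices: Map[Int, Double], edges: Set[(Int, Int)])

  // Merge two snapshots: union the edges, combine attributes of shared vertices with vAgg.
  def merge(vAgg: (Double, Double) => Double)(a: Graph, b: Graph): Graph = {
    val vs = (a.vertices.keySet ++ b.vertices.keySet).map { v =>
      val attr = (a.vertices.get(v), b.vertices.get(v)) match {
        case (Some(x), Some(y)) => vAgg(x, y)
        case (x, y)             => x.orElse(y).get // v comes from the union, so one side is defined
      }
      v -> attr
    }.toMap
    Graph(vs, a.edges ++ b.edges)
  }

  // One merged graph per window of windowLength snapshots, advancing by slidingInterval.
  // Note: Scala's sliding may emit a shorter final window when snapshots run out.
  def mergeWindows(snapshots: Vector[Graph], vAgg: (Double, Double) => Double,
                   windowLength: Int, slidingInterval: Int): Vector[Graph] =
    snapshots
      .sliding(windowLength, slidingInterval)
      .map(_.reduce((a, b) => merge(vAgg)(a, b)))
      .toVector
}
```

With four snapshots, a window length of 2, and a sliding interval of 2, this produces two merged graphs; a sliding interval of 1 would produce three overlapping ones.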

Slide 16

Slide 16 text

Differential Computation: Pause-shift-resume and Online Rectification incorporated into an efficient Pregel-style computation implementation Effectively an extension of the Pregel iterative processing model for time-evolving graphs

Slide 17

Slide 17 text

Operations: StreamingBSP
class GraphStream {
  def StreamingBSP(..., iterationFunc, ...): GraphStream
}
– Apply Pregel iterationFunc until next snapshot is available
– Combine previous results with new snapshot, continue iterating
– Continue until convergence
[Figure: GraphStream snapshots at T = 1, T = 2, T = 3]

Slide 18

Slide 18 text

PageRank using StreamingBSP
PageRank computation on streaming graphs easily achieved by a simple call
def pageRankEvolGraph(gs: GraphStream) = {
  // Vertex program: new rank from the sum of incoming messages
  def vprog(v: VertexId, msgSum: Double) = 0.15 + 0.85 * msgSum
  gs.StreamingBSP(1, 100, EdgeDirection.Out, "10s")(
    vprog,
    triplet => triplet.src.pr / triplet.src.outDeg, // message sent along each edge
    (msgA, msgB) => msgA + msgB)                    // combine messages per vertex
}
Listing 3: PageRank Computation on Time-Evolving Graphs
Faster convergence than running PageRank from scratch on every snapshot

Slide 19

Slide 19 text

Operations: updateLocalState
class GraphStream {
  def updateLocalState(stateUpdateFunc, initialState): LocalStateStream
}
Keep updating non-graph "state" as graph evolves
[Figure: stateUpdateFunc applied at each GraphStream snapshot (T = 1, T = 2, T = 3), starting from initialState]
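
The state-threading behaviour described here resembles a running fold over the snapshots. The sketch below is an assumption about the semantics rather than GraphTau's implementation: it emits one state value per snapshot, uses a plain `Vector` in place of a LocalStateStream, and the `LocalStateSketch` object and its simplified `Graph` type are hypothetical.

```scala
// Illustrative sketch only; a Vector of states stands in for LocalStateStream.
object LocalStateSketch {
  final case class Graph(vertices: Set[Int], edges: Set[(Int, Int)])

  // Thread a non-graph state value through the snapshots, emitting one state per snapshot.
  def updateLocalState[S](snapshots: Vector[Graph], initialState: S)(
      stateUpdateFunc: (S, Graph) => S): Vector[S] =
    // scanLeft keeps every intermediate state; tail drops the initial one.
    snapshots.scanLeft(initialState)(stateUpdateFunc).tail
}
```

For example, passing `(s, g) => math.max(s, g.edges.size)` with an initial state of 0 tracks the largest edge count seen so far as the graph evolves.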

Slide 20

Slide 20 text

Implementation
Implemented on Apache Spark platform
- Spark Streaming: stream processing engine
- GraphX: graph processing engine
GraphTau implemented by combining Spark Streaming and GraphX
- Novel optimizations to implement the GraphStream abstraction

Slide 21

Slide 21 text

Other Benefits
Spark Streaming and GraphX are built on Spark's RDDs
RDDs guarantee fault-tolerance and consistency of datasets
In addition, building on RDDs allows mixing data and graph parallel computations in GraphStream

Slide 22

Slide 22 text

Preliminary Results
• Algorithms:
  – PageRank
  – Connected Components
• Setup: 16 Amazon EC2 instances
• Datasets:
  – Twitter follow graph: 41M vertices, ~1.5B edges
  – Live LTE network: 2M vertices, variable edges

Slide 23

Slide 23 text

Preliminary Results: PageRank
Dataset: Twitter
Graph broken into parts:
- 1 part = full graph
- 5 parts = 20% of graph in each part
Comparison:
- Time to complete PageRank in GraphX on full graph
- Time to complete streaming PageRank in GraphTau when the graph is streamed in parts

Slide 24

Slide 24 text

Preliminary Results: PageRank
GraphX on whole graph could not converge!
GraphTau converged fast when 20% of the graph is streamed at a time
Smaller batches lead to faster convergence

Slide 25

Slide 25 text

Preliminary Results: CellIQ
CellIQ (NSDI 2015): Prior work
- Detection of persistent hotspots using incremental connected components
- Built specialized system to do temporal analysis
Re-implemented on general system GraphTau
- Uses mergeByWindow for sliding window analysis
- Strawman (baseline) runs non-incremental connected components on whole window of snapshots

Slide 26

Slide 26 text

Preliminary Results: CellIQ
[Figure: Analysis Time (s) versus Window Size (m) for Strawman, GraphTau, and CellIQ]
GraphTau managed to get performance comparable to specialized system, without domain-specific optimizations

Slide 27

Slide 27 text

Takeaways
GraphTau: general-purpose processing engine for time-evolving graphs
GraphStream abstraction that provides
– Consistent & fault-tolerant snapshot generation
– Co-ordinate snapshotting and computation
– Sliding window operations
– Mix data and graph parallel computations