Slide 1

Slide 1 text

Monarch: Gaining Command on Geo-Distributed Graph Analytics
Anand Iyer ⋆, Aurojit Panda ▪, Mosharaf Chowdhury ▴, Aditya Akella ⬩, Scott Shenker ⋆, Ion Stoica ⋆
⋆ UC Berkeley ▪ NYU ⬩ University of Wisconsin ▴ University of Michigan
HotCloud, July 09, 2018

Slide 2

Slide 2 text

Graph Analytics Popular

Slide 3

Slide 3 text

Graph Analytics Popular Assume graph is aggregated to a single DC

Slide 4

Slide 4 text

Social Networks

Slide 5

Slide 5 text

Cellular Network Analytics

Slide 6

Slide 6 text

Financial Network Analytics Image courtesy: Neo4J

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Generate data in a geo-distributed fashion

Slide 9

Slide 9 text

Can benefit from timely analysis Generate data in a geo-distributed fashion

Slide 10

Slide 10 text

How do we perform efficient geo-distributed graph analytics?

Slide 11

Slide 11 text

Apply query on samples of the input data Geo-Distributed Analytics (GDA) Slide courtesy: Clarinet authors

Slide 12

Slide 12 text

Apply query on samples of the input data Geo-Distributed Analytics (GDA) Clarinet [OSDI ‘16] Slide courtesy: Clarinet authors

Slide 13

Slide 13 text

Geo-Distributed Analytics on Graphs
Can we use the same idea on graphs?
§ GDA focuses on simple task placement/queries
§ Graph analytics iterative in nature
§ Flexibility over data placement and join sites
§ Graph partitioning difficult
§ Estimating intermediate data
§ Difficult in graph algorithms

Slide 14

Slide 14 text

Geo-Distributed Analytics on Graphs
Can we use the same idea on graphs?
§ GDA focuses on simple task placement/queries
§ Graph analytics iterative in nature
§ Flexibility over data placement and join sites
§ Graph partitioning difficult
§ Estimating intermediate data
§ Difficult in graph algorithms
Key: Optimizing iterative graph-parallel processing

Slide 15

Slide 15 text

Graph Parallel Processing

Slide 16

Slide 16 text

Graph Parallel Processing Gather: Accumulate information from neighborhood

Slide 17

Slide 17 text

Graph Parallel Processing Gather: Accumulate information from neighborhood Apply: Apply the accumulated value

Slide 18

Slide 18 text

Graph Parallel Processing Gather: Accumulate information from neighborhood Apply: Apply the accumulated value Scatter: Update adjacent edges & vertices with new value
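Note (illustration, not from the slides): the three phases above can be sketched as one loop per superstep. The example below uses PageRank as the vertex program on a single machine; the function name `gas_pagerank`, the edge-list input, and the damping value are assumptions made for this sketch, not part of Monarch or GraphX.

```python
from collections import defaultdict


def gas_pagerank(edges, num_iters=20, damping=0.85):
    """Toy single-machine PageRank written as gather/apply/scatter supersteps."""
    vertices = sorted({v for edge in edges for v in edge})
    out_degree = defaultdict(int)
    in_neighbors = defaultdict(list)
    for src, dst in edges:
        out_degree[src] += 1
        in_neighbors[dst].append(src)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    def share(v):
        # Value a vertex exposes on each of its out-edges.
        return rank[v] / out_degree[v] if out_degree[v] else 0.0

    contribution = {v: share(v) for v in vertices}

    for _ in range(num_iters):
        # Gather: accumulate information from the in-neighborhood.
        gathered = {v: sum(contribution[u] for u in in_neighbors[v])
                    for v in vertices}
        # Apply: fold the accumulated value into the vertex state.
        rank = {v: (1.0 - damping) / n + damping * gathered[v]
                for v in vertices}
        # Scatter: refresh the value visible on adjacent edges.
        contribution = {v: share(v) for v in vertices}
    return rank


# Usage: a three-vertex directed cycle.
ranks = gas_pagerank([(0, 1), (1, 2), (2, 0)])
```

In a geo-distributed setting every gather across a cut edge turns into WAN traffic, which is the cost the following slides target.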

Slide 19

Slide 19 text

Our Proposal: Monarch DC 1 DC 3 DC 2 DC 4

Slide 20

Slide 20 text

Our Proposal: Monarch DC 1 DC 3 DC 2 DC 4 Sparsification

Slide 21

Slide 21 text

Our Proposal: Monarch DC 1 DC 3 DC 2 DC 4 Sparsification Execution Model

Slide 22

Slide 22 text

Our Proposal: Monarch DC 1 DC 3 DC 2 DC 4 Sparsification Execution Model WAN Awareness

Slide 23

Slide 23 text

Graph Sparsification
§ Sparsification extensively studied in graph theory
§ Idea: approximate the graph using a sparse, much smaller graph
§ Drop edges/vertices
§ Sparsify without accuracy loss
§ Only worry about reducing cross-DC entities
§ Leverage graph-parallel model and algorithm properties
[Figure: two drawings of a five-vertex graph (vertices 0-4) spanning DC1 and DC2]

Slide 24

Slide 24 text

Graph Sparsification
§ Sparsification extensively studied in graph theory
§ Idea: approximate the graph using a sparse, much smaller graph
§ Drop edges/vertices
§ Sparsify without accuracy loss
§ Only worry about reducing cross-DC entities
§ Leverage graph-parallel model and algorithm properties
[Figure: two drawings of a five-vertex graph (vertices 0-4) spanning DC1 and DC2]
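Note (illustration, not from the slides): the slides do not say which sparsifier Monarch uses. As one hedged example of dropping cross-DC edges without accuracy loss by using an algorithm property, the sketch below targets connected components: once local components are known, only one cross-DC edge per pair of local components is needed. The function `sparsify_cross_dc`, the `placement` map, and the sample graph are hypothetical.

```python
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def sparsify_cross_dc(edges, placement):
    """Drop cross-DC edges that cannot change connected components.

    edges:     list of undirected (u, v) pairs
    placement: hypothetical map from vertex to the DC that stores it
    Returns (local_edges, kept_cross_edges).
    """
    parent = {v: v for v in placement}
    local, cross = [], []
    for u, v in edges:
        if placement[u] == placement[v]:
            local.append((u, v))
            # Merge the endpoints into one local component.
            parent[find(parent, u)] = find(parent, v)
        else:
            cross.append((u, v))

    kept, seen = [], set()
    for u, v in cross:
        # Two cross-DC edges linking the same pair of local components
        # are interchangeable for connectivity, so keep only the first.
        key = frozenset((find(parent, u), find(parent, v)))
        if key not in seen:
            seen.add(key)
            kept.append((u, v))
    return local, kept


# Usage: five vertices over two DCs (assumed layout, not the slide's figure).
placement = {0: "DC1", 1: "DC1", 2: "DC2", 3: "DC2", 4: "DC2"}
edges = [(0, 1), (2, 3), (3, 4), (0, 2), (1, 3), (1, 4)]
local_edges, kept_cross = sparsify_cross_dc(edges, placement)
# kept_cross == [(0, 2)]: three cross-DC edges shrink to one.
```

The rule is exact only for connectivity. Other algorithms, such as shortest paths or PageRank, would need their own criterion.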

Slide 25

Slide 25 text

Geo-Distributed Graph Computation Model DC 1 DC 3 DC 2 DC 4

Slide 26

Slide 26 text

Geo-Distributed Graph Computation Model DC 1 DC 3 DC 2 DC 4 Bootstrap

Slide 27

Slide 27 text

Geo-Distributed Graph Computation Model DC 1 DC 3 DC 2 DC 4 Bootstrap

Slide 28

Slide 28 text

Geo-Distributed Graph Computation Model DC 1 DC 3 DC 2 DC 4 Bootstrap Global Sync

Slide 29

Slide 29 text

Geo-Distributed Graph Computation Model DC 1 DC 3 DC 2 DC 4 Bootstrap Global Sync

Slide 30

Slide 30 text

Geo-Distributed Graph Computation Model DC 1 DC 3 DC 2 DC 4 Bootstrap Global Sync iGAS

Slide 31

Slide 31 text

Geo-Distributed Graph Computation Model DC 1 DC 3 DC 2 DC 4 Bootstrap Global Sync iGAS

Slide 32

Slide 32 text

Incremental GAS Model D C E F B A

Slide 33

Slide 33 text

Incremental GAS Model D C E F B A

Slide 34

Slide 34 text

Incremental GAS Model D C E F B A

Slide 35

Slide 35 text

Incremental GAS Model D C E F B A

Slide 36

Slide 36 text

Incremental GAS Model D C E F B A

Slide 37

Slide 37 text

Incremental GAS Model D C E F B A

Slide 38

Slide 38 text

Incremental GAS Model D C E F B A

Slide 39

Slide 39 text

Incremental GAS Model D C E F B A

Slide 40

Slide 40 text

Incremental GAS Model D C E F B A Which graph algorithms can use the iGAS model? How much state needs to be kept at the entities for accuracy?
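Note (illustration, not from the slides): the slides leave the details of iGAS open. A minimal sketch of change-driven propagation in that spirit, assuming min-label propagation (connected components) as the algorithm: only vertices whose label changed scatter to their neighbors, and the only state kept per vertex is its current label. The function `incremental_min_label` and the A-F adjacency below are assumptions for this example.

```python
from collections import deque


def incremental_min_label(adjacency, labels, updates):
    """Propagate label updates, touching only vertices whose value changes.

    adjacency: map from vertex to its neighbor list (undirected)
    labels:    converged labels from an earlier (bootstrap) run
    updates:   smaller labels learned from elsewhere, e.g. another DC
    Returns (new_labels, touched_vertices).
    """
    labels = dict(labels)  # leave the caller's state intact
    active = deque()
    for v, new_label in updates.items():
        if new_label < labels[v]:
            labels[v] = new_label
            active.append(v)
    touched = set(active)

    while active:
        u = active.popleft()
        # Scatter only from changed vertices; a neighbor applies the value
        # (and becomes active) only if it improves its own label.
        for w in adjacency[u]:
            if labels[u] < labels[w]:
                labels[w] = labels[u]
                active.append(w)
                touched.add(w)
    return labels, touched


# Usage: vertices A-F form two locally converged components.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"],
             "D": ["E"], "E": ["D", "F"], "F": ["E"]}
labels = {"A": "A", "B": "A", "C": "A", "D": "D", "E": "D", "F": "D"}
new_labels, touched = incremental_min_label(adjacency, labels, {"D": "A"})
# Only D, E and F are recomputed; A, B and C are never visited.
```

Min-label propagation fits this pattern because labels only decrease. The two questions on the slide concern algorithms without such a monotone update, where more per-vertex state may be needed.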

Slide 41

Slide 41 text

Geo-Distributed Graph Computation Model DC 1 DC 3 DC 2 DC 4 Bootstrap Global Sync iGAS

Slide 42

Slide 42 text

Geo-Distributed Graph Computation Model DC 1 DC 3 DC 2 DC 4 Bootstrap Global Sync iGAS

Slide 43

Slide 43 text

Geo-Distributed Graph Computation Model DC 1 DC 3 DC 2 DC 4 Bootstrap Global Sync iGAS

Slide 44

Slide 44 text

Geo-Distributed Graph Computation Model DC 1 DC 3 DC 2 DC 4 Bootstrap Global Sync iGAS Apply GDA techniques on task placement and data movement

Slide 45

Slide 45 text

Evaluation of Potential
§ 16 node Apache Spark cluster across 4 regions
§ Modified GraphX to incorporate the proposed model
[Figure: bar charts of Execution Time (s) and Data Transferred (MB) for 4 regions and 2 regions; series labels not recoverable]

Slide 46

Slide 46 text

Other Open Questions
§ Convergence properties due to our modified execution model
§ Better execution models at bootstrap stage
§ How would the global sync work?
§ Multi-tenancy
§ Would it provide opportunities to leverage existing GDA techniques?
§ Graph updates
§ What is an incremental model in this case?

Slide 47

Slide 47 text

Conclusion
§ Several emerging applications produce graph data in a geo-distributed fashion
§ Can benefit from geo-distributed graph analytics.
§ Our proposal Monarch:
§ Early attempt at bringing geo-distributed analytics to graph processing.
§ Initial results are encouraging.
http://www.cs.berkeley.edu/~api
[email protected]