Monarch: Gaining Command on Geo-Distributed Graph Analytics

Anand Iyer

July 09, 2018

Transcript

  1. Monarch: Gaining Command on Geo-Distributed Graph Analytics

    Anand Iyer ⋆, Aurojit Panda ▪, Mosharaf Chowdhury ▴, Aditya Akella ⬩, Scott Shenker ⋆, Ion Stoica ⋆
    ⋆ UC Berkeley ▪ NYU ⬩ University of Wisconsin ▴ University of Michigan
    HotCloud, July 09, 2018
  2. Graph Analytics Popular

  3. Graph Analytics Popular

    Assume graph is aggregated to a single DC
  4. Social Networks

  5. Cellular Network Analytics

  6. Financial Network Analytics Image courtesy: Neo4J

  7. None
  8. Generate data in a geo-distributed fashion

  9. Generate data in a geo-distributed fashion

    Can benefit from timely analysis
  10. How do we perform efficient geo-distributed graph analytics?

  11. Geo-Distributed Analytics (GDA)

    Apply query on samples of the input data
    Slide courtesy: Clarinet authors
  12. Geo-Distributed Analytics (GDA)

    Apply query on samples of the input data
    Clarinet [OSDI '16]
    Slide courtesy: Clarinet authors
  13. Geo-Distributed Analytics on Graphs

    Can we use the same idea on graphs?
    § GDA focuses on simple task placement/queries
    § Graph analytics iterative in nature
    § Flexibility over data placement and join sites
    § Graph partitioning difficult
    § Estimating intermediate data
    § Difficult in graph algorithms
  14. Geo-Distributed Analytics on Graphs

    Can we use the same idea on graphs?
    § GDA focuses on simple task placement/queries
    § Graph analytics iterative in nature
    § Flexibility over data placement and join sites
    § Graph partitioning difficult
    § Estimating intermediate data
    § Difficult in graph algorithms
    Key: Optimizing iterative graph-parallel processing
  15. Graph Parallel Processing

  16. Graph Parallel Processing

    Gather: Accumulate information from neighborhood
  17. Graph Parallel Processing

    Gather: Accumulate information from neighborhood
    Apply: Apply the accumulated value
  18. Graph Parallel Processing

    Gather: Accumulate information from neighborhood
    Apply: Apply the accumulated value
    Scatter: Update adjacent edges & vertices with new value
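
The three phases on the Graph Parallel Processing slides can be sketched as a small single-machine loop. This is only an illustration of the generic Gather-Apply-Scatter pattern using PageRank as the example algorithm; it is not code from Monarch or GraphX. The function name `pagerank_gas`, its parameters, and the three-vertex test graph are all made up for this sketch.

```python
from collections import defaultdict


def pagerank_gas(edges, num_iters=20, damping=0.85):
    """Toy synchronous Gather-Apply-Scatter loop (hypothetical helper)."""
    vertices = sorted({v for edge in edges for v in edge})
    out_degree = defaultdict(int)
    in_neighbors = defaultdict(list)
    for src, dst in edges:
        out_degree[src] += 1
        in_neighbors[dst].append(src)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    def scatter(current_rank):
        # Scatter: publish each vertex's new value along its out-edges.
        return {
            v: current_rank[v] / out_degree[v] if out_degree[v] else 0.0
            for v in vertices
        }

    contrib = scatter(rank)
    for _ in range(num_iters):
        # Gather: accumulate information from the in-neighborhood.
        gathered = {v: sum(contrib[u] for u in in_neighbors[v]) for v in vertices}
        # Apply: fold the accumulated value into the vertex state.
        rank = {v: (1 - damping) / n + damping * gathered[v] for v in vertices}
        contrib = scatter(rank)
    return rank


ranks = pagerank_gas([(0, 1), (1, 2), (2, 0)])
```

On a directed three-vertex cycle every vertex is symmetric, so each rank stays at one third. In a geo-distributed setting, the gather and scatter steps are where values would cross data-center boundaries.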
  19. Our Proposal: Monarch

    [Diagram: DC 1, DC 2, DC 3, DC 4]
  20. Our Proposal: Monarch

    [Diagram: DC 1, DC 2, DC 3, DC 4] Sparsification
  21. Our Proposal: Monarch

    [Diagram: DC 1, DC 2, DC 3, DC 4] Sparsification, Execution Model
  22. Our Proposal: Monarch

    [Diagram: DC 1, DC 2, DC 3, DC 4] Sparsification, Execution Model, WAN Awareness
  23. Graph Sparsification

    § Sparsification extensively studied in graph theory
    § Idea: approximate the graph using a sparse, much smaller graph
    § Drop edges/vertices
    § Sparsify without accuracy loss
    § Only worry about reducing cross-DC entities
    § Leverage graph-parallel model and algorithm properties
    [Diagram: two copies of a graph on vertices 0 to 4, split across DC1 and DC2]
  24. Graph Sparsification

    § Sparsification extensively studied in graph theory
    § Idea: approximate the graph using a sparse, much smaller graph
    § Drop edges/vertices
    § Sparsify without accuracy loss
    § Only worry about reducing cross-DC entities
    § Leverage graph-parallel model and algorithm properties
    [Diagram: two copies of a graph on vertices 0 to 4, split across DC1 and DC2]
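
One way to picture "sparsify without accuracy loss" while touching only cross-DC edges is the sketch below. It assumes the target algorithm is connected components, where any cross-DC edge that joins two already-connected vertices can be dropped without changing the answer. This is an illustrative technique chosen for the example, not necessarily the sparsifier Monarch uses. The names `sparsify_cross_dc` and `dc_of`, and the example graph, are hypothetical.

```python
def sparsify_cross_dc(edges, dc_of):
    """Drop cross-DC edges that are redundant for connectivity.

    Assumption: the downstream algorithm only depends on which vertices
    are connected (e.g. connected components).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # already connected, edge is redundant
        parent[root_a] = root_b
        return True

    local = [(u, v) for u, v in edges if dc_of[u] == dc_of[v]]
    cross = [(u, v) for u, v in edges if dc_of[u] != dc_of[v]]

    # Intra-DC edges are cheap, so keep all of them.
    for u, v in local:
        union(u, v)
    # Keep a cross-DC edge only if it connects two separate components.
    kept_cross = [(u, v) for u, v in cross if union(u, v)]
    return local + kept_cross


dc_of = {0: "DC1", 1: "DC1", 2: "DC2", 3: "DC2", 4: "DC2"}
edges = [(0, 1), (2, 3), (3, 4), (0, 2), (1, 3), (1, 4)]
sparse = sparsify_cross_dc(edges, dc_of)
```

In this made-up example, three cross-DC edges shrink to one while all five vertices remain in a single component. Other algorithms would need a different redundancy rule, which is what "leverage algorithm properties" suggests.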
  25. Geo-Distributed Graph Computation Model

    [Diagram: DC 1, DC 2, DC 3, DC 4]
  26. Geo-Distributed Graph Computation Model

    [Diagram: DC 1, DC 2, DC 3, DC 4] Bootstrap
  27. Geo-Distributed Graph Computation Model

    [Diagram: DC 1, DC 2, DC 3, DC 4] Bootstrap
  28. Geo-Distributed Graph Computation Model

    [Diagram: DC 1, DC 2, DC 3, DC 4] Bootstrap, Global Sync
  29. Geo-Distributed Graph Computation Model

    [Diagram: DC 1, DC 2, DC 3, DC 4] Bootstrap, Global Sync
  30. Geo-Distributed Graph Computation Model

    [Diagram: DC 1, DC 2, DC 3, DC 4] Bootstrap, Global Sync, iGAS
  31. Geo-Distributed Graph Computation Model

    [Diagram: DC 1, DC 2, DC 3, DC 4] Bootstrap, Global Sync, iGAS
  32. Incremental GAS Model D C E F B A

  33. Incremental GAS Model D C E F B A

  34. Incremental GAS Model D C E F B A

  35. Incremental GAS Model D C E F B A

  36. Incremental GAS Model D C E F B A

  37. Incremental GAS Model D C E F B A

  38. Incremental GAS Model D C E F B A

  39. Incremental GAS Model D C E F B A

  40. Incremental GAS Model

    [Diagram: vertices A, B, C, D, E, F]
    Which graph algorithms can use the iGAS model?
    How much state needs to be kept at the entities for accuracy?
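
The slides do not spell out how iGAS works, so the following is only one plausible reading: after a change, only the affected vertices scatter, and the update spreads outward from them instead of recomputing every vertex. The sketch uses minimum-label propagation (connected components) because each vertex needs to keep just one label as state. The function `incremental_min_label`, the edge list, and the choice of which edge is "new" are all hypothetical and are not taken from the talk's diagram.

```python
from collections import defaultdict


def incremental_min_label(edges, labels, changed):
    """Propagate minimum labels starting only from `changed` vertices.

    `labels` holds the converged result of a previous run; `edges` is the
    updated undirected edge list. Illustrative sketch, not Monarch's iGAS.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    labels = dict(labels)  # do not mutate the caller's state
    frontier = set(changed)
    while frontier:
        next_frontier = set()
        for u in frontier:
            for v in neighbors[u]:
                # Only vertices whose label actually improves stay active.
                if labels[u] < labels[v]:
                    labels[v] = labels[u]
                    next_frontier.add(v)
        frontier = next_frontier
    return labels


# Two previously separate components, then a new edge C-D arrives.
old_labels = {"A": "A", "B": "A", "C": "A", "D": "D", "E": "D", "F": "D"}
edges = [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F"), ("C", "D")]
new_labels = incremental_min_label(edges, old_labels, changed={"C", "D"})
```

Here vertices A and B never change their value, which is the kind of saving an incremental model aims for. Algorithms whose vertex state cannot be updated from a neighbor's delta alone would need more per-vertex state, which relates to the second question on the slide.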
  41. Geo-Distributed Graph Computation Model

    [Diagram: DC 1, DC 2, DC 3, DC 4] Bootstrap, Global Sync, iGAS
  42. Geo-Distributed Graph Computation Model

    [Diagram: DC 1, DC 2, DC 3, DC 4] Bootstrap, Global Sync, iGAS
  43. Geo-Distributed Graph Computation Model

    [Diagram: DC 1, DC 2, DC 3, DC 4] Bootstrap, Global Sync, iGAS
  44. Geo-Distributed Graph Computation Model

    [Diagram: DC 1, DC 2, DC 3, DC 4] Bootstrap, Global Sync, iGAS
    Apply GDA techniques on task placement and data movement
  45. Evaluation of Potential

    § 16-node Apache Spark cluster across 4 regions
    § Modified GraphX to incorporate the proposed model
    [Plot: Execution Time (s) and Data Transferred (MB) for 4 regions and 2 regions]
  46. Other Open Questions

    § Convergence properties due to our modified execution model
    § Better execution models at bootstrap stage
    § How would the global sync work?
    § Multi-tenancy
    § Would it provide opportunities to leverage existing GDA techniques?
    § Graph updates
    § What is an incremental model in this case?
  47. Conclusion

    § Several emerging applications produce graph data in a geo-distributed fashion
    § Can benefit from geo-distributed graph analytics.
    § Our proposal Monarch:
    § Early attempt at bringing geo-distributed analytics to graph processing.
    § Initial results are encouraging.
    http://www.cs.berkeley.edu/~api
    api@cs.berkeley.edu