Monarch: Gaining Command on Geo-Distributed Graph Analytics

Anand Iyer

July 09, 2018

Transcript

  1. Monarch: Gaining Command on
    Geo-Distributed Graph Analytics
    Anand Iyer ⋆, Aurojit Panda ▪, Mosharaf Chowdhury ▴,
    Aditya Akella ⬩, Scott Shenker ⋆, Ion Stoica ⋆
    ⋆ UC Berkeley ▪ NYU ⬩ University of Wisconsin ▴ University of Michigan
    HotCloud, July 09, 2018

  2. Graph Analytics Popular

  3. Graph Analytics Popular
    Assume graph is aggregated to a single DC

  4. Social Networks

  5. Cellular Network Analytics

  6. Financial Network Analytics
    Image courtesy: Neo4J

  7. (no transcribed text)

  8. Generate data in a geo-distributed
    fashion

  9. Can benefit from timely analysis
    Generate data in a geo-distributed
    fashion

  10. How do we perform efficient
    geo-distributed graph analytics?

  11. Apply query on samples of the input data
    Geo-Distributed Analytics (GDA)
    Slide courtesy: Clarinet authors

  12. Apply query on samples of the input data
    Geo-Distributed Analytics (GDA)
    Clarinet [OSDI ’16]
    Slide courtesy: Clarinet authors

  13. Can we use the same idea on graphs?
    § GDA focuses on simple task placement/queries
    § Graph analytics iterative in nature
    § Flexibility over data placement and join sites
    § Graph partitioning difficult
    § Estimating intermediate data
    § Difficult in graph algorithms
    Geo-Distributed Analytics on Graphs

  14. Can we use the same idea on graphs?
    § GDA focuses on simple task placement/queries
    § Graph analytics iterative in nature
    § Flexibility over data placement and join sites
    § Graph partitioning difficult
    § Estimating intermediate data
    § Difficult in graph algorithms
    Geo-Distributed Analytics on Graphs
    Key: Optimizing iterative
    graph-parallel processing

  15. Graph Parallel Processing

  16. Graph Parallel Processing
    Gather: Accumulate information from neighborhood

  17. Graph Parallel Processing
    Gather: Accumulate information from neighborhood
    Apply: Apply the accumulated value

  18. Graph Parallel Processing
    Gather: Accumulate information from neighborhood
    Apply: Apply the accumulated value
    Scatter: Update adjacent edges & vertices with new
    value
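
The three phases above can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the GraphX implementation used in the talk: PageRank is chosen here only as a familiar example, and the function name `gas_pagerank`, the damping factor, and the iteration count are all assumptions.

```python
from collections import defaultdict


def gas_pagerank(edges, num_iters=20, damping=0.85):
    """PageRank written as explicit gather / apply / scatter phases.

    `edges` is a list of directed (src, dst) pairs. Hypothetical helper,
    for illustration only.
    """
    vertices = sorted({v for edge in edges for v in edge})
    out_degree = defaultdict(int)
    in_neighbors = defaultdict(list)
    for src, dst in edges:
        out_degree[src] += 1
        in_neighbors[dst].append(src)

    def scatter(rank):
        # Scatter: publish each vertex's new value on its out-edges.
        return {
            v: rank[v] / out_degree[v] if out_degree[v] else 0.0
            for v in vertices
        }

    rank = {v: 1.0 for v in vertices}
    edge_msg = scatter(rank)
    for _ in range(num_iters):
        # Gather: accumulate information from the in-neighborhood.
        gathered = {
            v: sum(edge_msg[u] for u in in_neighbors[v]) for v in vertices
        }
        # Apply: fold the accumulated value into the vertex state.
        rank = {v: (1.0 - damping) + damping * gathered[v] for v in vertices}
        # Scatter: make the updated value visible to neighbors.
        edge_msg = scatter(rank)
    return rank
```

Each iteration reads only neighbor values, which is why cross-DC edges translate directly into WAN traffic when the graph is spread over several data centers.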

  19. Our Proposal: Monarch
    DC 1
    DC 3
    DC 2
    DC 4

  20. Our Proposal: Monarch
    DC 1
    DC 3
    DC 2
    DC 4
    Sparsification

  21. Our Proposal: Monarch
    DC 1
    DC 3
    DC 2
    DC 4
    Sparsification
    Execution Model

  22. Our Proposal: Monarch
    DC 1
    DC 3
    DC 2
    DC 4
    Sparsification
    Execution Model
    WAN Awareness

  23. Graph Sparsification
    § Sparsification extensively studied in graph theory
    § Idea: approximate the graph using a sparse, much smaller graph
    § Drop edges/vertices
    § Sparsify without accuracy loss
    § Only worry about reducing cross-DC entities
    § Leverage graph-parallel model and
    algorithm properties
    [Figure: example graph on vertices 0-4, drawn twice, split across DC1 and DC2]

  24. Graph Sparsification
    § Sparsification extensively studied in graph theory
    § Idea: approximate the graph using a sparse, much smaller graph
    § Drop edges/vertices
    § Sparsify without accuracy loss
    § Only worry about reducing cross-DC entities
    § Leverage graph-parallel model and
    algorithm properties
    [Figure: example graph on vertices 0-4, drawn twice, split across DC1 and DC2]
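
One way to read "sparsify without accuracy loss" by leveraging algorithm properties is shown below for connected components. The sketch is an assumption about how such a rule could look, not the procedure from the talk: for connectivity, once two locally connected groups in different DCs share one cross-DC edge, any further cross-DC edge between the same two groups is redundant. The names `sparsify_cross_dc`, `components`, and `dc_of` are hypothetical.

```python
def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a


def sparsify_cross_dc(edges, dc_of):
    """Drop cross-DC edges that are redundant for connectivity.

    `dc_of` maps every vertex to its data center. Intra-DC edges are
    always kept; only cross-DC edges are candidates for removal.
    """
    parent = {v: v for v in dc_of}
    # Step 1: merge vertices joined by intra-DC edges (purely local work).
    for u, v in edges:
        if dc_of[u] == dc_of[v]:
            union(parent, u, v)
    # Step 2: keep one cross-DC edge per pair of local components.
    kept, seen = [], set()
    for u, v in edges:
        if dc_of[u] == dc_of[v]:
            kept.append((u, v))
            continue
        key = frozenset((find(parent, u), find(parent, v)))
        if key not in seen:
            seen.add(key)
            kept.append((u, v))
    return kept


def components(edges, vertices):
    """Connected components as a set of frozensets (used to check the result)."""
    parent = {v: v for v in vertices}
    for u, v in edges:
        union(parent, u, v)
    groups = {}
    for v in vertices:
        groups.setdefault(find(parent, v), set()).add(v)
    return {frozenset(g) for g in groups.values()}
```

On a made-up five-vertex graph with vertices 0-2 in DC1 and 3-4 in DC2, three cross-DC edges between the two local groups collapse to one, and `components` returns the same answer before and after. Other algorithms would need their own redundancy rule, so this sketch does not generalize as-is.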

  25. Geo-Distributed Graph Computation Model
    DC 1
    DC 3
    DC 2
    DC 4

  26. Geo-Distributed Graph Computation Model
    DC 1
    DC 3
    DC 2
    DC 4
    Bootstrap

  27. Geo-Distributed Graph Computation Model
    DC 1
    DC 3
    DC 2
    DC 4
    Bootstrap

  28. Geo-Distributed Graph Computation Model
    DC 1
    DC 3
    DC 2
    DC 4
    Bootstrap
    Global Sync

  29. Geo-Distributed Graph Computation Model
    DC 1
    DC 3
    DC 2
    DC 4
    Bootstrap
    Global Sync

  30. Geo-Distributed Graph Computation Model
    DC 1
    DC 3
    DC 2
    DC 4
    Bootstrap
    Global Sync
    iGAS

  31. Geo-Distributed Graph Computation Model
    DC 1
    DC 3
    DC 2
    DC 4
    Bootstrap
    Global Sync
    iGAS

  32. Incremental GAS Model
    D
    C
    E F
    B A

  33. Incremental GAS Model
    D
    C
    E F
    B A

  34. Incremental GAS Model
    D
    C
    E F
    B A

  35. Incremental GAS Model
    D
    C
    E F
    B A

  36. Incremental GAS Model
    D
    C
    E F
    B A

  37. Incremental GAS Model
    D
    C
    E F
    B A

  38. Incremental GAS Model
    D
    C
    E F
    B A

  39. Incremental GAS Model
    D
    C
    E F
    B A

  40. Incremental GAS Model
    D
    C
    E F
    B A
    Which graph algorithms can use the iGAS model?
    How much state needs to be kept at the entities for accuracy?
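
The slide text does not define the iGAS model, so the following is only one common reading of "incremental" graph-parallel processing: keep the previous result as per-vertex state and re-run gather/apply/scatter only on vertices affected by a change. The example uses min-label propagation for connected components; the function names and the choice of algorithm are assumptions, not the authors' design.

```python
def build_adjacency(vertices, edges):
    """Undirected adjacency sets for the given vertices and edges."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def propagate(adj, label, active):
    """Push labels from active vertices until nothing changes.

    Returns the set of vertices that were ever active, as a rough
    measure of how much of the graph had to be revisited.
    """
    touched = set(active)
    while active:
        next_active = set()
        for v in active:
            for u in adj[v]:
                # A neighbor adopts the smaller label and becomes
                # active so that it forwards the change in turn.
                if label[v] < label[u]:
                    label[u] = label[v]
                    next_active.add(u)
        touched |= next_active
        active = next_active
    return touched


def add_edge_incrementally(adj, label, u, v):
    """Insert edge (u, v) and update labels starting only from its endpoints."""
    adj[u].add(v)
    adj[v].add(u)
    return propagate(adj, label, {u, v})
```

Here the only state kept per vertex is its current label, which is enough for edge insertions in this particular algorithm. Deletions, or algorithms whose values are not monotone, would need more state, which is the kind of trade-off the second question on the slide raises.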

  41. Geo-Distributed Graph Computation Model
    DC 1
    DC 3
    DC 2
    DC 4
    Bootstrap
    Global Sync
    iGAS

  42. Geo-Distributed Graph Computation Model
    DC 1
    DC 3
    DC 2
    DC 4
    Bootstrap
    Global Sync
    iGAS

  43. Geo-Distributed Graph Computation Model
    DC 1
    DC 3
    DC 2
    DC 4
    Bootstrap
    Global Sync
    iGAS

  44. Geo-Distributed Graph Computation Model
    DC 1
    DC 3
    DC 2
    DC 4
    Bootstrap
    Global Sync
    iGAS
    Apply GDA techniques on task
    placement and data movement
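
As a rough illustration of WAN-aware placement, the sketch below picks the data center that should host a global sync step so that the slowest incoming transfer finishes earliest. It is a simplified stand-in for the GDA techniques the slide refers to, not a description of Monarch or Clarinet; the function name, the cost model (each DC uploads in parallel over its own link), and all numbers are assumptions.

```python
def best_sync_site(data_mb, bandwidth_mbps):
    """Choose the sync site that minimizes the slowest upload.

    data_mb:        {dc: megabytes that dc must ship to the sync site}
    bandwidth_mbps: {(src_dc, dst_dc): link bandwidth in megabits/second}
    Returns (site, seconds needed by the slowest upload to that site).
    """
    def slowest_upload_seconds(site):
        return max(
            (
                data_mb[dc] * 8 / bandwidth_mbps[(dc, site)]
                for dc in data_mb
                if dc != site
            ),
            default=0.0,
        )

    # Sort first so ties are broken deterministically by name.
    site = min(sorted(data_mb), key=slowest_upload_seconds)
    return site, slowest_upload_seconds(site)
```

With hypothetical inputs of 100 MB at DC1, 400 MB at DC2, 100 MB at DC3, and uniform 100 Mbps links, the function selects DC2: syncing where most of the data already lives avoids shipping the largest share over the WAN.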

  45. Evaluation of Potential
    § 16-node Apache Spark cluster across 4 regions
    § Modified GraphX to incorporate the proposed model

    [Figure: Execution Time (s) and Data Transferred (MB), 4 regions vs. 2 regions]

  46. Other Open Questions
    § Convergence properties due to our modified
    execution model
    § Better execution models at bootstrap stage
    § How would the global sync work?
    § Multi-tenancy
    § Would it provide opportunities to leverage existing GDA techniques?
    § Graph updates
    § What is an incremental model in this case?

  47. Conclusion
    § Several emerging applications produce graph data
    in a geo-distributed fashion
    § Can benefit from geo-distributed graph analytics.
    § Our proposal, Monarch:
    § Early attempt at bringing geo-distributed analytics to graph
    processing.
    § Initial results are encouraging.
    http://www.cs.berkeley.edu/~api
    [email protected]
