An Interval-Centric Model for Distributed Computing over Temporal Graphs

Algorithms for temporal property graphs may be time-dependent (TD), navigating the structure and time concurrently, or time-independent (TI), operating separately on different snapshots. Currently, there is no unified and scalable programming abstraction to design TI and TD algorithms over large temporal graphs.

We propose an interval-centric computing model (ICM) for distributed and iterative processing of temporal graphs, where a vertex’s time-interval is a unit of data-parallel computation. It introduces a unique time-warp operator for temporal partitioning and grouping of messages that hides the complexity of designing temporal algorithms, while avoiding redundancy in user logic calls and messages sent.

GRAPHITE is our implementation of ICM over Apache Giraph, and we use it to design 12 TI and TD algorithms from literature. We rigorously evaluate its performance for diverse real-world temporal graphs – as large as 131M vertices and 5.5B edges, and as long as 219 snapshots. Our comparison with 4 baseline platforms on a 10-node commodity cluster shows that ICM shares compute and messaging across intervals to out-perform them by up to 25×, and matches them even in worst-case scenarios. GRAPHITE also exhibits weak-scaling with near-perfect efficiency.

These slides were presented at the 36th IEEE International Conference on Data Engineering (ICDE).

Swapnil Gandhi

April 17, 2020

Transcript

  1. An Interval-Centric Model for Distributed Computing over Temporal Graphs

    Yogesh Simmhan, *Swapnil Gandhi
    DREAM:Lab (Distributed Research on Emerging Applications & Machines), Department of Computational & Data Sciences, Indian Institute of Science, Bangalore
  2. Graphs are everywhere…

    § Web & Social Networks
      • Web graph, Citation Networks, Twitter, Facebook
    § Cybersecurity
      • Telecom call logs, financial transactions, Malware
    § Internet of Things
      • Transport, Power, Water networks
    § Bioinformatics
      • Gene sequencing, Gene expression networks, Protein-Protein Interactions, Brain Networks…
  3. Plenty of interest in processing them

    § Many open-source and research prototypes for distributed graph processing: Pregel, Apache Giraph, GraphX, GraphLab, Blogel, GoFFish, …
  4. Temporal Graphs

    § Temporal Graphs represent state at various points in time
      • Vertices and edges added/removed
      • Attribute values updated
    § Ability to collect and store large volumes of data
      • Available data has fine granularity
    § Additional information associated with graph entities
      • Gives rise to new concepts, new problems and new computational challenges
    [Figure: example temporal graph over vertices A, B, C, D; each edge carries a validity interval and a value: [3,4) 1, [1,4) 4, [2,3) 3, [1,2) 3, [2,4) 4]
  5. Temporal Graph Analytics

    § Broadly classified into two kinds:
      1. Time-Independent (TI) algorithms
      2. Time-Dependent (TD) algorithms
  6. Time-Independent Algorithms

    § Study the evolution of graph properties
    § Well-known existing static graph algorithms are applied to snapshots of temporal graphs
    [Figure: the temporal graph over vertices A, B, C, D with edge labels [3,4) 1, [1,4) 4, [2,3) 3, [1,2) 3, [2,4) 4, and its Snapshot S1, which retains only the edges with values 4 and 3]
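The snapshot idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GRAPHITE code: the edge endpoints are hypothetical (the figure residue preserves only the interval and value labels), and `TemporalEdge` and `snapshot` are names introduced here.

```python
from typing import NamedTuple


class TemporalEdge(NamedTuple):
    src: str
    dst: str
    start: int  # inclusive start of the validity interval
    end: int    # exclusive end of the validity interval
    value: int


# Hypothetical endpoints; intervals and values follow the slide's labels.
EDGES = [
    TemporalEdge("A", "B", 3, 4, 1),
    TemporalEdge("A", "D", 1, 4, 4),
    TemporalEdge("B", "C", 2, 3, 3),
    TemporalEdge("A", "C", 1, 2, 3),
    TemporalEdge("C", "D", 2, 4, 4),
]


def snapshot(edges, t):
    """Return the static edges (src, dst, value) that are alive at time-point t."""
    return [(e.src, e.dst, e.value) for e in edges if e.start <= t < e.end]


s1 = snapshot(EDGES, 1)  # only the [1,4) and [1,2) edges are alive at t = 1
```

A time-independent algorithm would then run an ordinary static graph algorithm on each such snapshot in turn.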
  7. Time-Dependent Algorithms

    § Discover patterns which respect temporal ordering
    § Temporal ordering is maintained by a sequence of temporal edges if
      • Consecutive edges share a common vertex
      • Time-points of temporal edges are non-decreasing
      • Intuitively, a piece of information can propagate only if the interaction respects temporal ordering
  8. Time-Dependent Algorithms

    [Figure: two paths over vertices A, B, C, D with edge labels [2,3) 1, [1,2) 2, [1,2) 1, [3,4) 3; one is marked "Follows Temporal Ordering", the other "Does not follow Temporal Ordering"]
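The two temporal-ordering conditions can be checked mechanically. The sketch below is a simplification under assumptions: each edge carries a single time-point rather than an interval, edges are directed so that "share a common vertex" means the destination of one edge is the source of the next, and the example paths are hypothetical rather than read off the slide's figure.

```python
def follows_temporal_ordering(path):
    """path: list of (src, dst, time) temporal edges in traversal order."""
    for (_, v1, t1), (u2, _, t2) in zip(path, path[1:]):
        if v1 != u2:  # consecutive edges must share a common vertex
            return False
        if t2 < t1:   # time-points must be non-decreasing
            return False
    return True


# Hypothetical example paths.
ordered = [("A", "B", 1), ("B", "D", 3)]    # time goes 1 -> 3
unordered = [("A", "C", 2), ("C", "D", 1)]  # time goes 2 -> 1

ok = follows_temporal_ordering(ordered)
bad = follows_temporal_ordering(unordered)
```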
  9. Interval-Centric Model (ICM)

    § New system and execution model
      • Think like an interval
      • Exposes time as a first-class citizen
      • Purpose-built for Distributed Temporal Graph Processing
    § Contributions
      • Simple and Generic API
      • High Performance
      • Distributed and Scalable
  10. Challenges

    [Figure: Existing Systems, Custom Algorithms and the Interval-Centric Model (This Work) compared on three criteria: Easy to Code, Efficient Implementation, Transparent Distribution]
  11. Think like a Vertex

    § Vertex-Centric Programming Model
      • Program written from the perspective of a vertex
      • Executed on all vertices; conceptually in parallel
      • Computation of a vertex depends on its prior state and the state of its neighbors
      • Vertices know about
        • Their own state
        • Their out-going edges
        • Messages received in the previous superstep
    [Figure: vertices A, B, C exchanging messages between Superstep Si and Superstep Si+1]
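A toy, single-process version of this superstep loop may make the model concrete. It is a sketch under assumptions, not how Pregel or Giraph are implemented: `run_supersteps` and `bfs_compute` are names introduced here, the graph is hypothetical, and a vertex is re-activated only when it receives a message.

```python
import math


def run_supersteps(out_edges, init_state, compute, max_supersteps=50):
    """Synchronous vertex-centric loop.

    out_edges: {vertex: [out-neighbor, ...]}, with every vertex present as a key.
    compute(state, messages) -> (new_state, message_to_send_or_None).
    """
    state = dict(init_state)
    inbox = {v: [] for v in out_edges}
    active = set(out_edges)  # every vertex runs in the first superstep
    for _ in range(max_supersteps):
        if not active:
            break
        outbox = {v: [] for v in out_edges}
        for v in active:
            # A vertex sees only its own state and last superstep's messages.
            state[v], msg = compute(state[v], inbox[v])
            if msg is not None:
                for n in out_edges[v]:
                    outbox[n].append(msg)
        inbox = outbox  # messages become visible in the next superstep
        active = {v for v, msgs in inbox.items() if msgs}
    return state


def bfs_compute(state, messages):
    """Hop distance from the source: keep the minimum, forward it plus one."""
    best = min([state] + messages)
    first_superstep = not messages
    if math.isfinite(best) and (first_superstep or best < state):
        return best, best + 1
    return best, None


# Hypothetical graph; A is the source.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
INIT = {"A": 0, "B": math.inf, "C": math.inf, "D": math.inf}
distances = run_supersteps(GRAPH, INIT, bfs_compute)
```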
  12. Avoiding Redundancy in ICM

    Observation: To compute its state, a vertex program depends only on the messages received from other vertices and its prior state.
    [Figure: vertex A applies the vertex program to prior state S and messages M1, M2; vertex B applies it to prior state S and messages M3, M4]
    The vertex program returns equivalent output for A and B, and one of them can be identified as redundant.
  13. Avoiding Redundancy in ICM

    [Figure: snapshots S1, S2, S3 of the graph over vertices A, B, C, D along a time axis; the same state update ∞ → 1 is repeated in each snapshot]
  14. Avoiding Redundancy in ICM

    [Figure: snapshots S1, S2, S3 of the graph over vertices A, B, C, D along a time axis; the update ∞ → 1 repeated per snapshot is shown alongside a single update over the interval [1, 4)]
    This results in an order of magnitude reduction in computation calls and messages sent.
  15. TimeWarp

    § Allows user-logic to consistently operate over multiple vertex sub-intervals in parallel
    § Transparently performs temporal alignment, re-partitioning, replication and grouping of messages
      • Minimizes compute calls, avoiding redundant computation
      • Uses a one-pass algorithm; supports online aggregation
    [Figure: Time Join and Time Warp of vertex states S with messages M]
    States S: [0,5) s1; [5,9) s2; [9,10) s3
    Messages M: [0,4) m1; [2,7) m2; [5,7) m3; [5,9) m4; [9,10) m5
    Time Join S ⋈ M (t = s ⋂ m): [0,4) s1 m1; [2,5) s1 m2; [5,7) s2 m2; [5,7) s2 m3; [5,9) s2 m4; [9,10) s3 m5
    Time Warp TW: w1 [0,2) s1 {m1}; w2 [2,4) s1 {m1, m2}; w3 [4,5) s1 {m2}; w4 [5,7) s2 {m2, m3, m4}; w5 [7,9) s2 {m4}; w6 [9,10) s3 {m5}
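The grouping on this slide can be reproduced with a short Python sketch. It is a naive illustration of the time join followed by the warp, not the one-pass algorithm the slide mentions and not GRAPHITE's implementation; `time_warp` and its tuple layout are assumptions introduced here. The inputs are the S and M values from the slide.

```python
def time_warp(states, messages):
    """Group messages by the maximal sub-intervals of each state interval.

    states:   [(start, end, state)], non-overlapping, half-open [start, end)
    messages: [(start, end, message)], half-open [start, end)
    Returns   [(start, end, state, [message, ...])]
    """
    out = []
    for s_start, s_end, s in states:
        # Time join: clip every overlapping message to the state interval.
        joined = [
            (max(s_start, m_start), min(s_end, m_end), m)
            for m_start, m_end, m in messages
            if max(s_start, m_start) < min(s_end, m_end)
        ]
        # The set of overlapping messages can only change at these boundaries.
        points = sorted({p for a, b, _ in joined for p in (a, b)})
        for lo, hi in zip(points, points[1:]):
            group = [m for a, b, m in joined if a <= lo and hi <= b]
            if group:  # skip sub-intervals that received no message
                out.append((lo, hi, s, group))
    return out


S = [(0, 5, "s1"), (5, 9, "s2"), (9, 10, "s3")]
M = [(0, 4, "m1"), (2, 7, "m2"), (5, 7, "m3"), (5, 9, "m4"), (9, 10, "m5")]
warped = time_warp(S, M)
```

Each returned tuple corresponds to one compute call, so user logic runs once per sub-interval with all of that sub-interval's messages grouped together, instead of once per time-point.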
  16. GRAPHITE API

    Temporal Reachability [Wu et al., ICDE 2016] using ICM
    Scatter permits the user to send a message with interval validity to the vertex forming the sink of edge e. The function has read-only access to the vertex state associated with the interval and to the edge properties.
    Compute is executed at every active interval of a vertex in each superstep. Compute can inspect/modify the state associated with the active interval and has access to all interval messages received from the previous superstep.
  17. Experimental Setup

    § In-house cluster of 8 servers:
      • Each server: 14 hyperthreads @ 2.1 GHz, 60 GB RAM
      • Interconnected via 1 Gigabit Ethernet
      • Java 8.0, Hadoop 3.1.1 and Giraph 1.3.0
    § 12 Algorithms [VLDB’14, SIGMOD’15, ICDE’16, VLDB’18]
      • Traversals: TSSSP, EAT, FAST, LD, TMST, RH, BFS
      • Clustering: WCC, SCC, LCC
      • Centrality: PR
      • Graph Mining: TC
      (4 Time-Independent, 8 Time-Dependent; colour-coded orange and brown on the slide)
  18. Baselines

    § Time-Independent Analytics
      • MSB – Multi-Snapshot Baseline
      • CHL – Chlonos [1] [EuroSys’14]
    § Time-Dependent Analytics
      • TGB – Transformed Graph Baseline [2] [VLDB’14]
      • GOF – Vertex-centric GoFFishTS [3] [IPDPS’15]
    [1] Han et al., “Chronos: a graph engine for temporal graph analysis,” in EuroSys 2014
    [2] Wu et al., “Path Problems in Temporal Graphs,” in VLDB 2014
    [3] Simmhan et al., “Distributed programming over time-series graphs,” in IPDPS 2015
  19. Dataset

    Graph    Domain     #Snapshots  Temporal Graph    Average Lifespan       Avg. # Property
                                    |V|      |E|      V      E      Prop.    Changes per Edge
    GPlus    Social     4           28.9M    462M     2.6    1      1        1
    USRN     Transport  96          24M      58M      96     96     4.82     20
    Reddit   Social     121         9.1M     523M     6.6    1.2    1.12     1.03
    MAG      Citation   219         116M     1B       20.9   15.8   5.26     2.98
    Twitter  Social     30          43.9M    2.1B     29     28.4   14.8     2
    WebUK    Web        12          131M     5.5B     9.9    9.4    4.7      2
  20. Performance Summary

                 GPlus    Reddit   USRN     Twitter  MAG      WebUK
    TI  MSB      0.95     1.14     0.97     24.79    12.99    5.80
    TI  Chlonos  0.96     1.08     0.98     13.29    10.89    6.27
    TD  TGB      0.95     1.13     2.32     19.90    DNL      DNL
    TD  GoFFish  0.96     1.05     6.49     6.75     4.60     3.71

    Ratio of makespan time for baseline platforms over Graphite, averaged for TI and TD algorithms. 1× means same performance and >1× means Graphite out-performs the baseline.
  21. Evaluation (1/3)

    [Figure: log-log scale scatter plot of #Compute Calls and #Messages, and their contribution to makespan time]
    Reduced #ComputeCalls and #Messages → Reduced Makespan
  22. Evaluation (2/3) [GPlus, 29M/462M]

    Graphite performs 3%-17% slower than the baseline due to additional book-keeping costs and interval-message overheads.
    Graphite and the baseline (BL) require an equal number of compute calls.
  23. Summary

    § Temporal Graph processing is challenging
    § Existing approaches are not ideal
    § ICM is a scalable & distributed abstraction for programming arbitrary temporal graph algorithms
      • General and Simple API for users
    § Out-performs existing solutions by up to 25×, and matches them even in worst-case scenarios
  24. [email protected] https://github.com/dream-lab/graphite

    DREAM:Lab, Department of Computational & Data Sciences, Indian Institute of Science, Bangalore