An Interval-Centric Model for Distributed Computing over Temporal Graphs

Algorithms for temporal property graphs may be time-dependent (TD), navigating the structure and time concurrently, or time-independent (TI), operating separately on different snapshots. Currently, there is no unified and scalable programming abstraction to design TI and TD algorithms over large temporal graphs.

We propose an interval-centric computing model (ICM) for distributed and iterative processing of temporal graphs, where a vertex’s time-interval is a unit of data-parallel computation. It introduces a unique time-warp operator for temporal partitioning and grouping of messages that hides the complexity of designing temporal algorithms, while avoiding redundancy in user logic calls and messages sent.

GRAPHITE is our implementation of ICM over Apache Giraph, and we use it to design 12 TI and TD algorithms from the literature. We rigorously evaluate its performance for diverse real-world temporal graphs – as large as 131M vertices and 5.5B edges, and as long as 219 snapshots. Our comparison with 4 baseline platforms on a 10-node commodity cluster shows that ICM shares compute and messaging across intervals to outperform them by up to 25×, and matches them even in worst-case scenarios. GRAPHITE also exhibits weak scaling with near-perfect efficiency.

These slides were presented at the 36th IEEE International Conference on Data Engineering (ICDE).

Swapnil Gandhi

April 17, 2020

Transcript

  1. DISTRIBUTED RESEARCH ON EMERGING APPLICATIONS & MACHINES
    Department of Computational & Data Sciences
    Indian Institute of Science, Bangalore DREAM:Lab
    DREAM:Lab
    An Interval-Centric Model for
    Distributed Computing over
    Temporal Graphs
    Yogesh Simmhan
    *Swapnil Gandhi

  2. DREAM:Lab
    Interconnected World
    2
    Social Network
    Road Network
    Mumbai Rail Network

  3. DREAM:Lab
    Graphs are everywhere…
    § Web & Social Networks
    • Web graph, Citation Networks, Twitter, Facebook
    § Cybersecurity
    • Telecom call logs, financial transactions, Malware
    § Internet of Things
    • Transport, Power, Water networks
    § Bioinformatics
    • Gene sequencing, Gene expression networks, Protein-Protein Interactions, Brain Networks…
    3

  4. DREAM:Lab
    Plenty of interest in processing them
    § Many open-source and research prototypes
    for distributed graph processing:
    Pregel, Apache Giraph, GraphX, GraphLab,
    Blogel, GoFFish, …
    4

  5. DREAM:Lab
    But graphs vary over time…
    5
    Transportation Network Social Network

  6. DREAM:Lab
    Temporal Graphs
    § Temporal Graphs represent state at various
    points in time
    • Vertices and edges added/removed
    • Attribute values updated
    § Ability to collect and store large volumes of data
    • Available data has fine granularity
    § Additional information associated
    with graph entities
    • Gives rise to new concepts, new problems and new
    computational challenges
    6
    [Figure: example temporal property graph over vertices A, B, C, D; each edge carries a validity interval ([1,2), [2,3), [2,4), [3,4), [1,4)) and a weight property.]
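
    Not on the slide: as a minimal illustration of how such a temporal property graph can be represented, the sketch below models an edge with a half-open validity interval [start, end) and a weight. All class names here are hypothetical.

    final class Interval {
        final int start;  // inclusive
        final int end;    // exclusive
        Interval(int start, int end) {
            if (start >= end) throw new IllegalArgumentException("empty interval");
            this.start = start;
            this.end = end;
        }
        boolean contains(int t) { return t >= start && t < end; }
        // Intersection of two intervals, or null if they do not overlap.
        Interval intersect(Interval o) {
            int s = Math.max(start, o.start), e = Math.min(end, o.end);
            return s < e ? new Interval(s, e) : null;
        }
    }

    final class TemporalEdge {
        final String src, dst;    // endpoint vertex ids
        final Interval lifespan;  // when the edge exists, e.g. [1,4)
        final int weight;         // property value during the lifespan
        TemporalEdge(String src, String dst, Interval lifespan, int weight) {
            this.src = src; this.dst = dst; this.lifespan = lifespan; this.weight = weight;
        }
    }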

  7. DREAM:Lab
    Temporal Graph Analytics
    § Broadly classified into two kinds:
    1. Time-Independent (TI) algorithms
    2. Time-Dependent (TD) algorithms
    7

  8. DREAM:Lab
    Time-Independent Algorithms
    § Studies evolution of graph properties
    § Well-known existing static graph algorithms
    applied to snapshots of temporal graphs
    8
    [Figure: the example temporal graph and its snapshot S1, containing vertices A, B, C, D and only the edges (weights 4 and 3) that are alive in that snapshot.]
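
    Not on the slide: a minimal sketch of how a TI algorithm obtains a snapshot, reusing the hypothetical Interval/TemporalEdge types from the earlier sketch. A static algorithm (WCC, PageRank, SSSP, ...) can then run on the filtered edge list.

    import java.util.List;
    import java.util.stream.Collectors;

    final class SnapshotExtractor {
        // Keep only the temporal edges whose lifespan contains time t.
        static List<TemporalEdge> snapshotAt(List<TemporalEdge> edges, int t) {
            return edges.stream()
                        .filter(e -> e.lifespan.contains(t))
                        .collect(Collectors.toList());
        }
    }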

  9. DREAM:Lab
    Time-Dependent Algorithms
    § Discover patterns which respect temporal
    ordering
    § A sequence of temporal edges maintains temporal ordering if
    • Consecutive edges share a common vertex
    • Time-points of the temporal edges are non-decreasing
    • Intuitively, a piece of information can propagate only if the
    interactions respect temporal ordering (see the sketch after the next slide)
    9

  10. DREAM:Lab
    Time-Dependent Algorithms
    10
    [Figure: two edge sequences in an example temporal graph with intervals [1,2), [2,3), [3,4); one sequence follows temporal ordering, the other does not.]
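
    Not on the slides: a small sketch of the temporal-ordering check defined on the previous slide, again using the hypothetical TemporalEdge type. For illustration, each edge is assumed to be traversed at the start of its lifespan.

    import java.util.List;

    final class TemporalOrdering {
        // A path is time-respecting if consecutive edges share a vertex and
        // their traversal times are non-decreasing.
        static boolean isTimeRespecting(List<TemporalEdge> path) {
            for (int i = 1; i < path.size(); i++) {
                TemporalEdge prev = path.get(i - 1), cur = path.get(i);
                boolean shareVertex = prev.dst.equals(cur.src);
                boolean nonDecreasing = cur.lifespan.start >= prev.lifespan.start;
                if (!shareVertex || !nonDecreasing) return false;
            }
            return true;
        }
    }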

  11. DREAM:Lab
    Challenges
    [Figure: three challenges: Easy to Code, Efficient Implementation, Transparent Distribution.]
    11

  12. DREAM:Lab
    Challenges
    [Figure: the same three challenges (Easy to Code, Efficient Implementation, Transparent Distribution), now with Existing Systems.]
    12

  13. DREAM:Lab
    Challenges
    [Figure: the same three challenges, now with Existing Systems and Custom Algorithms.]
    13

  14. DREAM:Lab
    Interval-Centric Model (ICM)
    § New system and execution model
    • Think like an interval
    • Exposes time as a first-class citizen
    • Purpose-built for Distributed Temporal Graph Processing
    § Contributions
    • Simple and Generic API
    • High Performance
    • Distributed and Scalable
    14

  15. DREAM:Lab
    Challenges
    [Figure: the three challenges (Easy to Code, Efficient Implementation, Transparent Distribution) with Existing Systems, Custom Algorithms, and the Interval-Centric Model (this work).]
    15

  16. DREAM:Lab
    Interval-Centric
    Computation
    A scalable & distributed abstraction for
    temporal graph processing
    16

  17. DREAM:Lab
    Think like a Vertex
    § Vertex-Centric Programming Model
    • Program written from the perspective of a vertex
    • Executed on all vertices; conceptually in parallel
    • Computation of a vertex depends on its prior state
    and state of its neighbors
    • Vertices know about
    • Their own state
    • Their out-going edges
    • Messages received in previous superstep
    17
    [Figure: vertices A, B, C exchanging messages between superstep Si and superstep Si+1.]
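
    Not on the slide: for concreteness, a vertex-centric single-source shortest paths program in the style of Giraph's bundled SimpleShortestPathsComputation example (assuming the Giraph 1.x API); the source-vertex id 0 is an arbitrary choice.

    import java.io.IOException;
    import org.apache.giraph.edge.Edge;
    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.LongWritable;

    // Each vertex keeps its best-known distance, takes the minimum over incoming
    // messages, and on improvement propagates new distances along its out-edges.
    // Vertices vote to halt; incoming messages re-activate them.
    public class SsspComputation extends BasicComputation<
        LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

        private static final long SOURCE_ID = 0L;  // arbitrary source vertex

        @Override
        public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
                            Iterable<DoubleWritable> messages) throws IOException {
            if (getSuperstep() == 0) {
                vertex.setValue(new DoubleWritable(Double.MAX_VALUE));
            }
            double minDist = vertex.getId().get() == SOURCE_ID ? 0d : Double.MAX_VALUE;
            for (DoubleWritable message : messages) {
                minDist = Math.min(minDist, message.get());
            }
            if (minDist < vertex.getValue().get()) {
                vertex.setValue(new DoubleWritable(minDist));
                for (Edge<LongWritable, FloatWritable> edge : vertex.getEdges()) {
                    sendMessage(edge.getTargetVertexId(),
                                new DoubleWritable(minDist + edge.getValue().get()));
                }
            }
            vertex.voteToHalt();
        }
    }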

  18. DREAM:Lab
    Avoiding Redundancy in ICM
    Observation:
    To compute its state, a vertex program depends only on the
    messages received from other vertices and its prior state.
    18
    [Figure: vertices A and B, each applying the vertex program ∮ to its prior state S and its received messages (M1, M2 for A; M3, M4 for B). The vertex program returns equivalent output for A and B, so one of the two calls can be identified as redundant.]
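
    A toy illustration of this observation (not ICM's actual mechanism, which groups intervals rather than caching): if the user logic is a pure function of (prior state, received messages), invocations with identical inputs can share a single call. Class and method names here are hypothetical.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.BiFunction;

    final class SharedCompute<S, M> {
        private final Map<List<Object>, S> cache = new HashMap<>();
        private final BiFunction<S, List<M>, S> userLogic;

        SharedCompute(BiFunction<S, List<M>, S> userLogic) {
            this.userLogic = userLogic;
        }

        // Assumes S and M have value-based equals()/hashCode(); identical
        // (state, messages) inputs trigger only one user-logic invocation.
        S compute(S priorState, List<M> messages) {
            List<Object> key = Arrays.<Object>asList(priorState, messages);
            return cache.computeIfAbsent(key, k -> userLogic.apply(priorState, messages));
        }
    }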

  19. DREAM:Lab
    Avoiding Redundancy in ICM
    19
    [Figure: three snapshots S1, S2, S3 over vertices A, B, C, D laid out over time; the same state update (∞ → 1) and the same messages are repeated in every snapshot.]

  20. DREAM:Lab
    Avoiding Redundancy in ICM
    20
    [Figure: the same snapshots processed as one interval: the state update is applied once over [1,4) (∞ → 1) and a single interval message ([1,4), 2) replaces the per-snapshot messages.]
    This results in an order of magnitude reduction in computation calls and messages sent.

  21. DREAM:Lab
    TimeWarp
    § Allows user logic to consistently operate over
    multiple vertex sub-intervals in parallel
    § Transparently performs temporal alignment,
    re-partitioning, replication and grouping of messages
    • Minimizes compute calls, avoiding redundant computation
    • Uses a one-pass algorithm; supports online aggregation
    21
    [Figure: worked example of the time-warp operator.]
    Vertex state intervals S: s1 [0,5), s2 [5,9), s3 [9,10)
    Incoming messages M: m1 [0,4), m2 [2,7), m3 [5,7), m4 [5,9), m5 [9,10)

    Time Join (S ⋈ M, each result interval t = s ⋂ m):
    [0,4) s1 m1 | [2,5) s1 m2 | [5,7) s2 m2 | [5,7) s2 m3 | [5,9) s2 m4 | [9,10) s3 m5

    Time Warp (TW): messages regrouped onto disjoint sub-intervals of the state:
    w1 [0,2) s1 {m1} | w2 [2,4) s1 {m1, m2} | w3 [4,5) s1 {m2} |
    w4 [5,7) s2 {m2, m3, m4} | w5 [7,9) s2 {m4} | w6 [9,10) s3 {m5}
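
    Not from the paper: a sketch that reproduces the alignment in the example above, splitting the vertex's state intervals at every interval boundary and attaching to each sub-interval the overlapping messages. It reuses the hypothetical Interval type from the earlier sketch and, for clarity, uses a simple quadratic sweep rather than the one-pass algorithm the slides mention.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeSet;

    final class TimeWarpSketch {
        static final class Warped {
            final Interval span;
            final int stateIndex;           // which state interval applies
            final List<Integer> messageIds; // indices of overlapping messages
            Warped(Interval span, int stateIndex, List<Integer> messageIds) {
                this.span = span; this.stateIndex = stateIndex; this.messageIds = messageIds;
            }
        }

        // Split the timeline at every interval endpoint, then attach to each
        // elementary sub-interval its state and the messages overlapping it.
        static List<Warped> warp(List<Interval> states, List<Interval> messages) {
            TreeSet<Integer> cuts = new TreeSet<>();
            for (Interval s : states)   { cuts.add(s.start); cuts.add(s.end); }
            for (Interval m : messages) { cuts.add(m.start); cuts.add(m.end); }

            List<Warped> result = new ArrayList<>();
            Integer prev = null;
            for (int cut : cuts) {
                if (prev != null) {
                    Interval span = new Interval(prev, cut);
                    for (int si = 0; si < states.size(); si++) {
                        if (states.get(si).intersect(span) == null) continue;
                        List<Integer> ms = new ArrayList<>();
                        for (int mi = 0; mi < messages.size(); mi++) {
                            if (messages.get(mi).intersect(span) != null) ms.add(mi);
                        }
                        // Sub-intervals with no messages are dropped, as in the figure.
                        if (!ms.isEmpty()) result.add(new Warped(span, si, ms));
                    }
                }
                prev = cut;
            }
            return result;
        }
    }

    Running warp() on the states {[0,5), [5,9), [9,10)} and messages {[0,4), [2,7), [5,7), [5,9), [9,10)} yields exactly the six sub-intervals w1–w6 listed above.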

  22. DREAM:Lab
    GRAPHITE API
    22
    Temporal Reachability [Wu et al., ICDE 2016] using ICM
    [Figure: GRAPHITE code listing for temporal reachability, annotated as follows.]
    Scatter permits the user to send a message, with an interval validity, to the
    vertex forming the sink of edge e. The function has read-only access to the
    vertex state associated with the interval and to the edge properties.
    The Compute method is executed at every active interval of a vertex in each
    superstep. Compute can inspect and modify the state associated with the active
    interval, and has access to all interval messages received from the previous
    superstep. (A hypothetical sketch of this API follows below.)
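
    The actual GRAPHITE interfaces are not reproduced in this transcript, so the sketch below is a hypothetical, heavily simplified rendering of the compute/scatter contract described above; all names and signatures are illustrative only. It reuses the hypothetical Interval and TemporalEdge types from the earlier sketches.

    import java.util.List;

    // compute() runs once per active (vertex, interval) in a superstep, over the
    // state of that interval and the time-warped messages grouped onto it;
    // scatter() runs per out-edge and may emit a message tagged with a validity
    // interval for the edge's sink vertex.
    interface IntervalComputation<S, M> {

        void compute(Interval active, S state, List<M> warpedMessages);

        // Return null to send nothing along this edge.
        TimedMessage<M> scatter(Interval active, S state, TemporalEdge edge);
    }

    // A message payload together with the interval over which it is valid.
    final class TimedMessage<M> {
        final Interval validity;
        final M payload;
        TimedMessage(Interval validity, M payload) {
            this.validity = validity;
            this.payload = payload;
        }
    }

    For temporal reachability, the per-interval state could be a "reached" flag that compute sets when any warped message marks the interval reachable, and scatter would then forward reachability along out-edges whose lifespan overlaps the reached interval.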

  23. DREAM:Lab
    Experimental
    Evaluation
    23

  24. DREAM:Lab
    Experimental Setup
    § In-house cluster of 8 servers:
    • Each server: 14 hyperthreads @ 2.1 GHz, 60 GB RAM
    • Interconnected via 1 Gigabit Ethernet
    • Java 8, Hadoop 3.1.1 and Giraph 1.3.0
    § 12 Algorithms [VLDB’14, SIGMOD’15, ICDE’16, VLDB’18]
    • Traversals: TSSSP, EAT, FAST, LD, TMST, RH, BFS
    • Clustering: WCC, SCC, LCC
    • Centrality: PR
    • Graph Mining: TC
    (4 Time-Independent, 8 Time-Dependent)
    24

  25. DREAM:Lab
    Baselines
    § Time-Independent Analytics
    • MSB – Multi-Snapshot Baseline
    • CHL – Chronos[1] [EuroSys’14]
    § Time-Dependent Analytics
    • TGB – Transformed Graph Baseline[2] [VLDB’14]
    • GOF – Vertex-centric GoFFishTS [3] [IPDPS’15]
    25
    [1] Han et al., “Chronos: a graph engine for temporal graph analysis”, in EuroSys 2014
    [2] Wu et al., “Path Problems in Temporal Graphs”, in VLDB 2014
    [3] Simmhan et al., “Distributed programming over time-series graphs”, in IPDPS 2015

  26. DREAM:Lab
    Dataset
    26
    Graph   | Domain    | #Snapshots | #Vertices | #Edges | Avg. Lifespan (V / E / Prop.) | Avg. # Property Changes per Edge
    GPlus   | Social    | 4          | 28.9M     | 462M   | 2.6 / 1 / 1                   | 1
    USRN    | Transport | 96         | 24M       | 58M    | 96 / 96 / 4.82                | 20
    Reddit  | Social    | 121        | 9.1M      | 523M   | 6.6 / 1.2 / 1.12              | 1.03
    MAG     | Citation  | 219        | 116M      | 1B     | 20.9 / 15.8 / 5.26            | 2.98
    Twitter | Social    | 30         | 43.9M     | 2.1B   | 29 / 28.4 / 14.8              | 2
    WebUK   | Web       | 12         | 131M      | 5.5B   | 9.9 / 9.4 / 4.7               | 2

  27. DREAM:Lab
    Performance Summary
    27
    .            | GPlus | Reddit | USRN | Twitter | MAG   | WebUK
    TI | MSB     | 0.95  | 1.14   | 0.97 | 24.79   | 12.99 | 5.80
    TI | Chronos | 0.96  | 1.08   | 0.98 | 13.29   | 10.89 | 6.27
    TD | TGB     | 0.95  | 1.13   | 2.32 | 19.90   | DNL   | DNL
    TD | GoFFish | 0.96  | 1.05   | 6.49 | 6.75    | 4.60  | 3.71
    Ratio of makespan time for the baseline platforms over Graphite, averaged over
    the TI and TD algorithms. 1× means same performance; >1× means Graphite
    outperforms the baseline.

  28. DREAM:Lab
    Evaluation (1/3)
    28
    [Figure: log-log scatter plot of #compute calls and #messages, and their contribution to makespan time.]
    Reduced #compute calls and #messages → reduced makespan

  29. DREAM:Lab
    Evaluation (2/3) [GPlus, 29M/462M]
    29
    Graphite performs 3%-17% slower than the baselines due to additional
    book-keeping costs and interval-message overheads; here, Graphite and the
    baselines (BL) require an equal number of compute calls.

  30. DREAM:Lab
    Evaluation (3/3) [Twitter, 44M/2B]
    30
    Graphite outperforms the baselines by a factor of 8×-24×, with 26× fewer compute calls.

  31. DREAM:Lab
    Weak Scaling
    31
    Each machine holds ≈ 10M vertices, ≈ 100M edges.

  32. DREAM:Lab
    Summary
    § Temporal Graph processing is challenging
    § Existing approaches not ideal
    § ICM is a scalable & distributed abstraction for
    programming arbitrary temporal graph algorithms
    • General and Simple API for users
    § Outperforms existing solutions by up to 25×, and
    matches them even in worst-case scenarios
    32

  33. 33
    [email protected]
    https://github.com/dream-lab/graphite
    DISTRIBUTED RESEARCH ON EMERGING APPLICATIONS & MACHINES
    Department of Computational & Data Sciences
    Indian Institute of Science, Bangalore DREAM:Lab
    DREAM:Lab
