
VENUS: Vertex-Centric Streamlined Graph Computation on a Single PC

lqhl
April 16, 2015

Presentation @ ICDE 2015. Seoul, Korea.

Transcript

  1. VENUS: Vertex-Centric Streamlined Graph Computation on a Single PC
     Jiefeng Cheng¹, Qin Liu², Zhenguo Li¹, Wei Fan¹, John C.S. Lui², Cheng He¹
     ¹Huawei Noah's Ark Lab  ²The Chinese University of Hong Kong
     ICDE'15
  2. Graph is everywhere
     Ø We have large graphs
     •  Web graph
     •  Social graph
     •  User-movie ratings graph
     •  …
     Ø Graph computation
     •  PageRank
     •  Community detection
     •  ALS for collaborative filtering
     •  …
  3. Mining from Big Graphs: two feasible ways
     Ø Distributed systems
     •  Pregel [SIGMOD'10], GraphLab [OSDI'12], GraphX [OSDI'14], Giraph, ...
     •  Expensive clusters, complex setup, writing distributed programs
     Ø Single-machine systems
     •  Disk: GraphChi [OSDI'12], X-Stream [SOSP'13]
     •  SSD: TurboGraph [KDD'13], FlashGraph [FAST'15]
     •  Computation time close to distributed systems
        •  PageRank on the Twitter graph (41M nodes, 1.4B edges)
        •  Spark: 8.1 min with 50 machines (each with 2 CPUs, 7.5 GB RAM) [Stanton KDD'12]
        •  VENUS: 8 min on a single machine with a quad-core CPU and 16 GB RAM
     •  Affordable, easy to program/debug
  4. Existing Systems
     Ø Vertex-centric programming model: popularized by Pregel / GraphLab / GraphChi
     •  Each vertex updates itself based on its neighborhood
     Ø GraphChi
     •  Updated data on each vertex must be propagated to its neighbors through disk
     •  Extensive disk I/O
     Ø X-Stream
     •  Different API: edge-centric programming
     •  Less expressive; common algorithms must be re-implemented
     •  Also uses disk to propagate updates
  5. Our Contributions
     Ø Design and implement a disk-based system, VENUS
     •  A new vertex-centric streamlined processing model
     •  Separates mutable vertex data from immutable edge data
     •  Reads/writes less data than other systems
     Ø Evaluation on large graphs
     •  Outperforms GraphChi and X-Stream
     •  Verifies that our design reduces data access
  6. Vertex-Centric Programming
     Ø Consider GraphChi:

        for each iteration:
          for each vertex v:
            update(v)

        void update(v):
          fetch data from each in-edge
          update data on v
          spread data to each out-edge

     [Figure: vertex v with its in- and out-edges; edge data is duplicated
      between neighboring vertices]
  7. Vertex-Centric Programming
     Ø VENUS:
     •  Only store mutable values on vertices (a code sketch follows)

        void update(v):
          fetch data from each in-neighbor
          update data on v

     Ø Pros
     •  Less data access
     •  Enables ``streamlined'' processing
     Ø Cons
     •  Limited expressiveness
  8. VENUS Architecture
     Ø Disk storage (offline)
     •  Sharding
     •  Separation of edge data and vertex data
     Ø Computing model (online)
     •  Load edge data sequentially
     •  Execute the update function on each vertex
     •  How to load vertex data and propagate updates?
  9. Sharding
     Ø Graph cannot fit in RAM? Split the graph into shards
     Ø Each shard corresponds to an interval of vertices:
     •  G-shard: immutable structure of the graph
        •  In-edges of the nodes in the interval
     •  V-shard: mutable vertex values
        •  Vertex values of all vertices in the shard
     Ø Structure table: all g-shards
     Ø Value table: all vertex data

     [Figure: value table with one data slot per vertex ID, 1–12]
  10. Sharding Example

      Interval   I1 = [1,4]          I2 = [5,8]         I3 = [9,12]
      G-shard    7,9,10 → 1          6,7,8,11 → 5       2,3,4,10,11 → 9
                 6,10 → 2            1,10 → 6           11 → 10
                 1,2,6 → 3           3,10,11 → 7        4,6 → 11
                 1,2,6,7,10 → 4      3,6,11 → 8         2,3,9,10,11 → 12
      V-shard    I1 ∪ {6,7,9,10}     I2 ∪ {1,3,10,11}   I3 ∪ {2,3,4,6}

      [Figure: the 12-vertex example graph behind this sharding table]
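As an illustration of how the table above could be produced, here is a small in-memory sketch (hypothetical names; the real system builds shards on disk): each g-shard collects the in-edges of its interval, and each v-shard is the interval plus every source vertex that appears in that g-shard.

```python
from collections import defaultdict

def build_shards(edges, intervals):
    """edges: iterable of (src, dst) pairs; intervals: list of (lo, hi),
    inclusive. Returns a (g_shard, v_shard) pair per interval. Sketch only."""
    g_shards = [defaultdict(list) for _ in intervals]
    for src, dst in edges:
        for i, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:              # in-edge belongs to dst's interval
                g_shards[i][dst].append(src)
                break
    shards = []
    for (lo, hi), g in zip(intervals, g_shards):
        sources = {s for srcs in g.values() for s in srcs}
        v_shard = set(range(lo, hi + 1)) | sources  # interval ∪ referenced sources
        shards.append((g, v_shard))
    return shards
```

On the example graph with intervals [1,4], [5,8], [9,12], this reproduces the v-shards in the table, e.g. I1 ∪ {6,7,9,10} for the first shard.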
  11. Vertex-Centric Streamlined Processing
      Ø V-shards are much smaller than g-shards
      •  Load each v-shard entirely into memory
      Ø Scan each g-shard sequentially
      •  Execute the update function in parallel

      [Figure: v-shard 1 (vertices 1, 2, 6, 7, 9, 10) held in memory while the
       corresponding g-shard streams from disk]
  12. Execution
      •  Load v-shard 1; stream g-shard 1 (7,9,10 → 1; 6,10 → 2; 1,2,6 → 3;
         1,2,6,7,10 → 4); update v-shard 1
      •  Load v-shard 2; stream g-shard 2 (6,7,8,11 → 5; 1,10 → 6; 3,10,11 → 7;
         3,6,11 → 8); update v-shard 2
      •  Load v-shard 3; stream g-shard 3 (2,3,4,10,11 → 9; 11 → 10; 4,6 → 11;
         2,3,9,10,11 → 12); update v-shard 3
      Ø Parallelize execution and loading (see the sketch below)
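One way to overlap the two stages is plain double buffering: while the update function runs over v-shard i, a background thread prefetches v-shard i+1. The thread layout below is my assumption for illustration, not VENUS's actual scheduler; `load_vshard` and `stream_and_update` stand in for the real disk and compute paths.

```python
import threading

def run_pipeline(num_shards, load_vshard, stream_and_update):
    """Overlap I/O and compute across consecutive shards (sketch only)."""
    current = load_vshard(0)
    for i in range(num_shards):
        prefetched = {}
        loader = None
        if i + 1 < num_shards:
            loader = threading.Thread(
                target=lambda j=i + 1: prefetched.update(vshard=load_vshard(j)))
            loader.start()                # load the next v-shard in background
        stream_and_update(i, current)     # scan g-shard i, run update()
        if loader is not None:
            loader.join()
            current = prefetched["vshard"]
```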
  13. Load and Update V-shards
      Ø Two I/O-efficient algorithms
      •  Algorithm 1: extension of PSW in GraphChi (skipped)
      •  Algorithm 2: Merge-Join (sketched below)
         •  Load: merge-join between the value table and the v-shard
         •  Update: write the values of interval [1,4] back to the value table
      Ø Use a value buffer to cache the value table

      [Figure: value table (IDs 1–12) on disk, merge-joined with the ID list of
       v-shard 1 (1, 2, 3, 4, 6, 7, 9, 10) to produce the loaded v-shard]
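A minimal sketch of the merge-join load and the write-back, assuming the on-disk value table and the v-shard's ID list are both sorted by vertex ID; the names and layout here are illustrative, not VENUS's on-disk format.

```python
def merge_join_load(value_table, vshard_ids):
    """value_table: sorted list of (vertex_id, value) pairs; vshard_ids:
    sorted IDs of the v-shard (assumed to all appear in the table).
    One sequential pass over both inputs."""
    loaded, i = {}, 0
    for vid in vshard_ids:
        while value_table[i][0] < vid:    # advance the value-table cursor
            i += 1
        loaded[vid] = value_table[i][1]   # IDs match: copy the value
    return loaded

def write_back(loaded, interval, value_table):
    """Write updated values of the shard's own interval (e.g. [1,4])
    back to the value table, again in one sequential pass."""
    lo, hi = interval
    for i, (vid, _) in enumerate(value_table):
        if lo <= vid <= hi:
            value_table[i] = (vid, loaded[vid])
```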
  14. Evaluation of VENUS
      Ø Setup: a commodity PC
      •  Quad-core 3.4 GHz CPU
      •  16 GB RAM and a 4 TB hard disk
      Ø Main competitors: GraphChi and X-Stream
      Ø Applications:
      •  PageRank
      •  WCC: weakly connected components
      •  CD: community detection
      •  ALS: alternating least squares for collaborative filtering
      •  Shortest path, label propagation, etc.
  15. PageRank on Twitter
      Ø Twitter follow-graph: 41M nodes, 1.4B edges

      [Figure: elapsed time (sec.) vs. memory budget (0.5–8 GB) for GraphChi,
       X-Stream, VENUS-I, and VENUS-II]
  16. Cost of Update Propagation: Data Write and Read

      [Figure: (a) data size of writes (GB) and (b) data size of reads (GB)
       vs. memory budget (0.5–8 GB), comparing GraphChi (PSW), X-Stream (ECP),
       VENUS-I (VSP-I), and VENUS-II (VSP-II)]
  17. Applications: WCC, CD, ALS

      [Figure: (a) elapsed time for WCC and CD on Twitter; (b) elapsed time for
       ALS on Netflix and KDD-Cup, comparing GraphChi, X-Stream, VENUS-I, and
       VENUS-II]

      Ø Failed to implement CD on X-Stream due to its edge-centric programming model
  18. Web-Scale Graph
      Ø Clueweb12: a web-scale graph
      •  978 million nodes, 42.5 billion edges
      •  402 GB on disk
      •  2 iterations of PageRank
      Ø Computation time
      •  GraphChi: 4.3 hours
      •  X-Stream: 7.4 hours
      •  VENUS-I: 2 hours
      •  VENUS-II: 1.8 hours

      [Figure: overall elapsed time of PageRank on Clueweb12 for the four systems]
  19. Conclusion
      Ø Presented a disk-based graph computation system, VENUS
      Ø Our design of graph storage and execution reduces data access and I/O
      Ø Evaluations show that VENUS outperforms GraphChi and X-Stream
      Ø VENUS also handles billion-scale problems
  20. Value Buffer
      Ø To reduce I/O operations in loading/updating v-shards (see the sketch below):
      •  Split the value table into multiple pages
      •  Use a value buffer to cache loaded pages
      •  Use LRU for replacement
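A minimal sketch of such a buffer, using an OrderedDict for the LRU bookkeeping; the page size, capacity, and `read_page` callback are assumptions for illustration, not VENUS's actual parameters.

```python
from collections import OrderedDict

class ValueBuffer:
    """LRU cache of fixed-size value-table pages (illustrative sketch)."""

    def __init__(self, read_page, capacity=64, page_size=4096):
        self.read_page = read_page     # callback: page_no -> list of values
        self.capacity = capacity       # max number of cached pages
        self.page_size = page_size     # values per page
        self.pages = OrderedDict()     # page_no -> page, kept in LRU order

    def get(self, vertex_id):
        page_no, offset = divmod(vertex_id, self.page_size)
        if page_no in self.pages:
            self.pages.move_to_end(page_no)       # mark as most recently used
        else:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)    # evict least recently used
            self.pages[page_no] = self.read_page(page_no)
        return self.pages[page_no][offset]
```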