
VENUS: Vertex-Centric Streamlined Graph Computation on a Single PC

lqhl
April 16, 2015

Presentation @ ICDE 2015. Seoul, Korea.

Transcript

  1. VENUS: Vertex-Centric Streamlined Graph Computation on a Single PC
     Jiefeng Cheng¹, Qin Liu², Zhenguo Li¹, Wei Fan¹, John C.S. Lui², Cheng He¹
     ¹Huawei Noah's Ark Lab  ²The Chinese University of Hong Kong
     ICDE'15
  2. Graph is everywhere
     Ø We have large graphs
     •  Web graph
     •  Social graph
     •  User-movie ratings graph
     •  …
     Ø Graph computation
     •  PageRank
     •  Community detection
     •  ALS for collaborative filtering
     •  …
  3. Mining from Big Graphs: two feasible ways
     Ø Distributed systems
     •  Pregel [SIGMOD'10], GraphLab [OSDI'12], GraphX [OSDI'14], Giraph, ...
     •  Expensive clusters, complex setup, writing distributed programs
     Ø Single-machine systems
     •  Disk: GraphChi [OSDI'12], X-Stream [SOSP'13]
     •  SSD: TurboGraph [KDD'13], FlashGraph [FAST'15]
     •  Computation time close to distributed systems
        •  PageRank on the Twitter graph (41M nodes, 1.4B edges)
        •  Spark: 8.1 min with 50 machines (each with 2 CPUs, 7.5 GB RAM) [Stanton KDD'12]
        •  VENUS: 8 min on a single machine with a quad-core CPU and 16 GB RAM
     •  Affordable, easy to program/debug
  4. Existing Systems
     Ø Vertex-centric programming model: popularized by Pregel / GraphLab / GraphChi
     •  Each vertex updates itself based on its neighborhood
     Ø GraphChi
     •  Updated data on each vertex must be propagated to its neighbors through disk
     •  Extensive disk I/O
     Ø X-Stream
     •  Different API: edge-centric programming
     •  Less expressive; common algorithms must be re-implemented
     •  Also uses disk to propagate updates
  5. Our Contributions
     Ø Design and implement a disk-based system, VENUS
     •  A new vertex-centric streamlined processing model
     •  Separates mutable vertex data from immutable edge data
     •  Reads/writes less data than other systems
     Ø Evaluation on large graphs
     •  Outperforms GraphChi and X-Stream
     •  Verifies that our design reduces data access
  6. Vertex-Centric Programming
     Ø Consider GraphChi:

        for each iteration:
          for each vertex v:
            update(v)

        void update(v):
          fetch data from each in-edge
          update data on v
          spread data to each out-edge

     [Figure: vertex v with its in- and out-edges; edge data is duplicated
      between neighboring vertices]
  7. Vertex-Centric Programming
     Ø VENUS:
     •  Only store mutable values on vertices (a code sketch follows)

        void update(v):
          fetch data from each in-neighbor
          update data on v

     Ø Pros
     •  Less data access
     •  Enables ``streamlined'' processing
     Ø Cons
     •  Limited expressiveness
  8. VENUS Architecture
     Ø Disk storage (offline)
     •  Sharding
     •  Separation of edge data and vertex data
     Ø Computing model (online)
     •  Load edge data sequentially
     •  Execute the update function on each vertex
     •  How to load vertex data and propagate updates?
  9. Sharding
     Ø Graph cannot fit in RAM? Split the graph into shards
     Ø Each shard corresponds to an interval of vertices:
     •  G-shard: immutable structure of the graph
        •  In-edges of the nodes in the interval
     •  V-shard: mutable vertex values
        •  Vertex values of all vertices in the shard
     Ø Structure table: all g-shards
     Ø Value table: all vertex data

     [Figure: value table with one data slot per vertex ID, 1–12]
  10. Sharding Example

      Interval   I1 = [1,4]          I2 = [5,8]         I3 = [9,12]
      G-shard    7,9,10 → 1          6,7,8,11 → 5       2,3,4,10,11 → 9
                 6,10 → 2            1,10 → 6           11 → 10
                 1,2,6 → 3           3,10,11 → 7        4,6 → 11
                 1,2,6,7,10 → 4      3,6,11 → 8         2,3,9,10,11 → 12
      V-shard    I1 ∪ {6,7,9,10}     I2 ∪ {1,3,10,11}   I3 ∪ {2,3,4,6}

      [Figure: the 12-vertex example graph behind this sharding table]
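As an illustration of how the table above could be produced, here is a small in-memory sketch (hypothetical names; the real system builds shards on disk): each g-shard collects the in-edges of its interval, and each v-shard is the interval plus every source vertex that appears in that g-shard.

```python
from collections import defaultdict

def build_shards(edges, intervals):
    """edges: iterable of (src, dst) pairs; intervals: list of (lo, hi),
    inclusive. Returns a (g_shard, v_shard) pair per interval. Sketch only."""
    g_shards = [defaultdict(list) for _ in intervals]
    for src, dst in edges:
        for i, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:              # in-edge belongs to dst's interval
                g_shards[i][dst].append(src)
                break
    shards = []
    for (lo, hi), g in zip(intervals, g_shards):
        sources = {s for srcs in g.values() for s in srcs}
        v_shard = set(range(lo, hi + 1)) | sources  # interval ∪ referenced sources
        shards.append((g, v_shard))
    return shards
```

On the example graph with intervals [1,4], [5,8], [9,12], this reproduces the v-shards in the table, e.g. I1 ∪ {6,7,9,10} for the first shard.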
  11. Vertex-Centric Streamlined Processing
      Ø V-shards are much smaller than g-shards
      •  Load each v-shard entirely into memory
      Ø Scan each g-shard sequentially
      •  Execute the update function in parallel

      [Figure: v-shard 1 (vertices 1, 2, 6, 7, 9, 10) held in memory while the
       corresponding g-shard streams from disk]
  12. Execution
      •  Load v-shard 1; stream g-shard 1 (7,9,10 → 1; 6,10 → 2; 1,2,6 → 3;
         1,2,6,7,10 → 4); update v-shard 1
      •  Load v-shard 2; stream g-shard 2 (6,7,8,11 → 5; 1,10 → 6; 3,10,11 → 7;
         3,6,11 → 8); update v-shard 2
      •  Load v-shard 3; stream g-shard 3 (2,3,4,10,11 → 9; 11 → 10; 4,6 → 11;
         2,3,9,10,11 → 12); update v-shard 3
      Ø Parallelize execution and loading (see the sketch below)
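One way to overlap the two stages is plain double buffering: while the update function runs over v-shard i, a background thread prefetches v-shard i+1. The thread layout below is my assumption for illustration, not VENUS's actual scheduler; `load_vshard` and `stream_and_update` stand in for the real disk and compute paths.

```python
import threading

def run_pipeline(num_shards, load_vshard, stream_and_update):
    """Overlap I/O and compute across consecutive shards (sketch only)."""
    current = load_vshard(0)
    for i in range(num_shards):
        prefetched = {}
        loader = None
        if i + 1 < num_shards:
            loader = threading.Thread(
                target=lambda j=i + 1: prefetched.update(vshard=load_vshard(j)))
            loader.start()                # load the next v-shard in background
        stream_and_update(i, current)     # scan g-shard i, run update()
        if loader is not None:
            loader.join()
            current = prefetched["vshard"]
```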
  13. Load and Update V-shards
      Ø Two I/O-efficient algorithms
      •  Algorithm 1: extension of PSW in GraphChi (skipped)
      •  Algorithm 2: Merge-Join (sketched below)
         •  Load: merge-join between the value table and the v-shard
         •  Update: write the values of interval [1,4] back to the value table
      Ø Use a value buffer to cache the value table

      [Figure: value table (IDs 1–12) on disk, merge-joined with the ID list of
       v-shard 1 (1, 2, 3, 4, 6, 7, 9, 10) to produce the loaded v-shard]
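A minimal sketch of the merge-join load and the write-back, assuming the on-disk value table and the v-shard's ID list are both sorted by vertex ID; the names and layout here are illustrative, not VENUS's on-disk format.

```python
def merge_join_load(value_table, vshard_ids):
    """value_table: sorted list of (vertex_id, value) pairs; vshard_ids:
    sorted IDs of the v-shard (assumed to all appear in the table).
    One sequential pass over both inputs."""
    loaded, i = {}, 0
    for vid in vshard_ids:
        while value_table[i][0] < vid:    # advance the value-table cursor
            i += 1
        loaded[vid] = value_table[i][1]   # IDs match: copy the value
    return loaded

def write_back(loaded, interval, value_table):
    """Write updated values of the shard's own interval (e.g. [1,4])
    back to the value table, again in one sequential pass."""
    lo, hi = interval
    for i, (vid, _) in enumerate(value_table):
        if lo <= vid <= hi:
            value_table[i] = (vid, loaded[vid])
```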
  14. Evaluation of VENUS
      Ø Setup: a commodity PC
      •  Quad-core 3.4 GHz CPU
      •  16 GB RAM and a 4 TB hard disk
      Ø Main competitors: GraphChi and X-Stream
      Ø Applications:
      •  PageRank
      •  WCC: weakly connected components
      •  CD: community detection
      •  ALS: alternating least squares for collaborative filtering
      •  Shortest path, label propagation, etc.
  15. PageRank on Twitter
      Ø Twitter follow-graph: 41M nodes, 1.4B edges

      [Figure: elapsed time (sec.) vs. memory budget (0.5–8 GB) for GraphChi,
       X-Stream, VENUS-I, and VENUS-II]
  16. Cost of Update Propagation: Data Write and Read

      [Figure: (a) data size of writes (GB) and (b) data size of reads (GB)
       vs. memory budget (0.5–8 GB), comparing GraphChi (PSW), X-Stream (ECP),
       VENUS-I (VSP-I), and VENUS-II (VSP-II)]
  17. Applications: WCC, CD, ALS

      [Figure: (a) elapsed time for WCC and CD on Twitter; (b) elapsed time for
       ALS on Netflix and KDD-Cup, comparing GraphChi, X-Stream, VENUS-I, and
       VENUS-II]

      Ø Failed to implement CD on X-Stream due to its edge-centric programming model
  18. Web-Scale Graph
      Ø Clueweb12: a web-scale graph
      •  978 million nodes, 42.5 billion edges
      •  402 GB on disk
      •  2 iterations of PageRank
      Ø Computation time
      •  GraphChi: 4.3 hours
      •  X-Stream: 7.4 hours
      •  VENUS-I: 2 hours
      •  VENUS-II: 1.8 hours

      [Figure: overall elapsed time of PageRank on Clueweb12 for the four systems]
  19. Conclusion
      Ø Presented a disk-based graph computation system, VENUS
      Ø Our design of graph storage and execution reduces data access and I/O
      Ø Evaluations show that VENUS outperforms GraphChi and X-Stream
      Ø VENUS also handles billion-scale problems
  20. Value Buffer
      Ø To reduce I/O operations in loading/updating v-shards (see the sketch below):
      •  Split the value table into multiple pages
      •  Use a value buffer to cache loaded pages
      •  Use LRU for replacement
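A minimal sketch of such a buffer, using an OrderedDict for the LRU bookkeeping; the page size, capacity, and `read_page` callback are assumptions for illustration, not VENUS's actual parameters.

```python
from collections import OrderedDict

class ValueBuffer:
    """LRU cache of fixed-size value-table pages (illustrative sketch)."""

    def __init__(self, read_page, capacity=64, page_size=4096):
        self.read_page = read_page     # callback: page_no -> list of values
        self.capacity = capacity       # max number of cached pages
        self.page_size = page_size     # values per page
        self.pages = OrderedDict()     # page_no -> page, kept in LRU order

    def get(self, vertex_id):
        page_no, offset = divmod(vertex_id, self.page_size)
        if page_no in self.pages:
            self.pages.move_to_end(page_no)       # mark as most recently used
        else:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)    # evict least recently used
            self.pages[page_no] = self.read_page(page_no)
        return self.pages[page_no][offset]
```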