
Large-scale Recommender Systems on Just a PC (with GraphChi)


Talk by Aapo Kyrölä, Sr Engineer @Facebook, at Data Science London @ds-ldn meetup

Data Science London

February 05, 2015



Transcript

  1. Large-scale Recommender Systems on Just a PC (with GraphChi)
     Data Science London, Dec 10, 2014. Aapo Kyrölä, Ph.D., Carnegie Mellon University 2014 (now: Facebook). http://www.cs.cmu.edu/~akyrola Twitter: @kyrpov
     Big Data – small machine
  2. Contents
     1. Why “Just a PC” 2. Introduction to GraphChi 3. Recsys on GraphChi – Examples: ALS, item-CF (triangle counting), random walks for link prediction 4. GraphChi-DB (very briefly)
  3. Why on a single machine?
     Can’t we just use the Cloud? Large-Scale Recommender Systems on Just a PC
  4. Why use a cluster? Two reasons:
     1. One computer cannot handle my problem in a reasonable time. 2. I need to solve the problem very fast.
  5. Why use a cluster? Two reasons:
     1. One computer cannot handle my problem in a reasonable time. 2. I need to solve the problem very fast.
     Our work expands the space of feasible (graph) problems on one machine:
     - Our experiments use the same graphs as, or bigger than, previous papers on distributed graph computation (and we can do the Twitter graph on a laptop).
     - Most data is not that “big” anyway.
     Our work raises the bar on the required performance of a “complicated” system.
  6. Benefits of single-machine systems
     Assuming it can handle your big problems…
     1. Programmer productivity – global state – can use “real data” for development
     2. Inexpensive to install and administer; uses less power.
     3. Scalability: 10x machines doing a full job each = 10x throughput
  7. Why graphs for recommender systems?
     • Graph = matrix: edge(u,v) = M[u,v] – Note: always sparse graphs • Intuitive, human-understandable representation – easy to visualize and explain. • Unifies collaborative filtering (typically matrix-based) with recommendation in social networks – random walk algorithms. • Local view → vertex-centric computation
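A minimal sketch of the graph-as-matrix duality described above; the names and ratings are hypothetical, and a nested dict stands in for a proper sparse-matrix structure:

```python
# Each rating is an edge (user, movie, value); together the edges form a sparse matrix M.
ratings = [("alice", "Matrix", 5), ("bob", "Matrix", 4), ("alice", "Heat", 3)]

M = {}  # edge list -> sparse matrix M[u][v], nested dicts standing in for real storage
for user, movie, value in ratings:
    M.setdefault(user, {})[movie] = value

print(M["alice"]["Matrix"])  # 5, i.e. edge(u, v) = M[u, v]
```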
  8. Vertex-Centric Computational Model
     • Graph G = (V, E) – directed edges: e = (source, destination) – each edge and vertex is associated with a value (user-defined type) – vertex and edge values can be modified • (structure modification also supported)
     (Figure: example graph with vertices A and B, and data values attached to every vertex and edge.)
  9. Vertex-centric Programming
     • “Think like a vertex” • Popularized by the Pregel and GraphLab projects
     MyFunc(vertex) { // modify neighborhood }
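A minimal Python sketch of the vertex-centric model above (this is not the actual GraphChi C++ API; the toy engine, graph, and PageRank-style update rule are illustrative assumptions):

```python
# Toy vertex-centric engine: run an update function on every vertex, each iteration.
graph = {1: [2, 3], 2: [3], 3: [1]}        # vertex -> out-neighbors (hypothetical)
value = {v: 1.0 for v in graph}            # user-defined vertex values

def update(v):
    # "Think like a vertex": read the neighborhood, modify this vertex's value.
    in_nbrs = [u for u in graph if v in graph[u]]
    value[v] = 0.15 + 0.85 * sum(value[u] / len(graph[u]) for u in in_nbrs)

for _ in range(10):                        # T iterations
    for v in graph:                        # the engine schedules every vertex
        update(v)

print(value)                               # converges toward PageRank-like scores
```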
  10. What is GraphChi
      (Figures: “Before” and “After”.) Both in OSDI’12!
  11. The Main Challenge of Disk-based Graph Computation: Random Access
      ~100K reads/sec (commodity); ~1M reads/sec (high-end arrays); hard disks: 100s of reads/writes per sec. All << the 5–10M random edge accesses/sec needed to achieve “reasonable performance”.
  12. GraphChi’s Data Storage
      • Vertices are numbered from 1 to n – P intervals, each associated with a shard on disk – sub-graph = interval of vertices
      (Figure: vertex range 1…n split into interval(1)…interval(P), each backed by shard(1)…shard(P) on disk.)
      Expensive graph partitioning not required.
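A sketch of the interval/shard layout above. Per the OSDI’12 paper, shard(p) holds the edges whose destination falls in interval(p), sorted by source vertex; the graph and sizes here are toy assumptions:

```python
# Split vertices 1..n into P equal intervals; bucket edges into shards by destination.
n, P = 12, 3
edges = [(1, 5), (2, 9), (7, 3), (11, 6), (4, 12)]   # hypothetical (src, dst) pairs

size = n // P
def interval(v):                  # interval index 0..P-1 for a 1-based vertex id
    return min((v - 1) // size, P - 1)

shards = {p: [] for p in range(P)}
for src, dst in edges:
    shards[interval(dst)].append((src, dst))          # shard(p): dst in interval(p)

for p in shards:
    shards[p].sort()                                  # sorted by source, as PSW needs
    print(p, shards[p])
```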
  13. Parallel Sliding Windows
      Only P large reads for each interval (sub-graph); P² reads on one full pass.
      Details: Kyrola, Blelloch, Guestrin: “Large-scale graph computation on just a PC” (OSDI 2012)
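A toy count of the read pattern behind the P and P² figures above (real PSW reads a contiguous byte window from each shard; this sketch only tallies the reads):

```python
# Parallel Sliding Windows: to execute interval p, load shard p in full plus a
# sliding window of each other shard -> P sequential reads per interval.
P = 4
reads_per_interval = []
for p in range(P):                  # execution interval
    reads = 0
    for s in range(P):              # one (windowed) read from every shard
        reads += 1                  # s == p: full "memory shard"; s != p: window only
    reads_per_interval.append(reads)

print(reads_per_interval)           # [4, 4, 4, 4]
print(sum(reads_per_interval))      # 16 == P**2 reads for one full pass
```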
  14. Performance
      GraphChi can compute on the full Twitter follow-graph with just a standard laptop (2012) – about as fast as a very large Hadoop cluster! (Size of the graph in Fall 2013: > 20B edges [Gupta et al. 2013].)
  15. Overview of Recommender Systems for GraphChi
      1. Collaborative Filtering toolkit – Example 1: ALS – Example 2: Item-based CF
      2. Link prediction in large networks – random-walk based approaches
  16. GraphChi’s Collaborative Filtering Toolkit
      • Developed by Danny Bickson (CMU / GraphLab Inc) • Includes: – Alternating Least Squares (ALS) – Sparse-ALS – SVD++ – LibFM (factorization machines) – GenSGD – Item-similarity based methods – PMF – CliMF (contributed by Mark Levy) – …
      Note: in the C++ version. See Danny’s blog for more information: http://bickson.blogspot.com/2012/12/collaborative-filtering-with-graphchi.html
  17. Example: Alternating Least Squares Matrix Factorization (ALS)
      Reference: Y. Zhou, D. Wilkinson, R. Schreiber, R. Pan: “Large-Scale Parallel Collaborative Filtering for the Netflix Prize” (2008)
      • Task: predict ratings for items (movies) by users. • Model: latent factor model (see next slide)
  18. ALS: User – Item bipartite graph
      (Figure: users on one side; movies – City of God, Wild Strawberries, The Celebration, La Dolce Vita, Women on the Verge of a Nervous Breakdown – on the other; ratings such as 4, 3, 2, 5 on the edges; a latent factor vector attached to each vertex.)
      A user’s rating of a movie is modeled as a dot-product: <factor(user), factor(movie)>
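The dot-product model above as a two-line worked example (the factor values are illustrative; real ALS models typically use D of 20 or more):

```python
import numpy as np

factor_user  = np.array([0.4, 2.3, -1.8, 2.9])   # hypothetical D=4 latent factors
factor_movie = np.array([1.2, 0.9, 0.2, 1.1])

print(factor_user @ factor_movie)    # predicted rating = <factor(user), factor(movie)>
```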
  19. ALS: GraphChi implementation
      • Update function handles one vertex at a time (user or movie) • For each user: – estimate latent(user): minimize the least-squares error of the dot-product predicted ratings • GraphChi executes the update function for each vertex (in parallel) and loads edges (ratings) from disk – Latent factors in memory: need O(V) memory. – If factors don’t fit in memory, they can be replicated to the edges and thus stored on disk.
      Scales to very large problems!
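A minimal sketch of one such user update (fix the movie factors, solve the regularized least-squares problem for the user's factor); D, lambda, and all data are illustrative assumptions, not the toolkit's actual code:

```python
import numpy as np

D, lam = 4, 0.1
rng = np.random.default_rng(0)
Q = {m: rng.normal(size=D) for m in ["City of God", "La Dolce Vita"]}  # fixed movie factors
ratings = {"City of God": 4.0, "La Dolce Vita": 5.0}                   # this user's edges

# Normal equations for min_p sum_i (r_i - <p, q_i>)^2 + lam * ||p||^2:
#   (sum_i q_i q_i^T + lam * I) p = sum_i r_i q_i
A = lam * np.eye(D)
b = np.zeros(D)
for movie, r in ratings.items():
    q = Q[movie]
    A += np.outer(q, q)
    b += r * q

p_user = np.linalg.solve(A, b)   # new latent(user)
print(p_user)
```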
  20. ALS: Performance
      (Chart: minutes for matrix factorization (Alternating Least Squares) on Netflix (99M edges), D=20: GraphLab v1 (8 cores) vs. GraphChi (Mac Mini), on a 0–12 minute scale.)
      Remark: Netflix is not a big problem, but GraphChi will scale at most linearly with input size (ALS is CPU-bound, so should be sub-linear in #ratings).
  21. Example: Item-Based CF
      • Task: compute a similarity score [e.g. Jaccard] for each movie pair that has at least one viewer in common – Similarity(X, Y) ~ # common viewers • Problem: enumerating all pairs takes too much time.
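For concreteness, the Jaccard score named above, computed from hypothetical viewer sets:

```python
# Jaccard similarity between two movies, from their sets of viewers.
viewers = {
    "City of God":   {"alice", "bob", "carol"},
    "La Dolce Vita": {"bob", "carol", "dave"},
}

x, y = viewers["City of God"], viewers["La Dolce Vita"]
print(len(x & y) / len(x | y))   # common viewers / all viewers of either: 0.5
```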
  22. (Figure: the user–movie bipartite graph again, with movies City of God, Wild Strawberries, The Celebration, La Dolce Vita, Women on the Verge of a Nervous Breakdown.)
      Solution: enumerate all triangles of the graph.
      New problem: how to enumerate triangles if the graph does not fit in RAM?
  23. Triangle Enumeration in GraphChi
      Algorithm:
      • Let the pivots be a subset of the vertices;
      • Load the neighbor lists of the pivots into RAM;
      • Use GraphChi to load all vertices from disk, one by one, and compare their neighbors to the neighboring pivots’ neighbor lists;
      • Repeat with a new set of pivots.
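A minimal Python sketch of the pivot algorithm above; the in-memory toy graph stands in for the on-disk adjacency data:

```python
# Keep only the pivots' neighbor lists in RAM and stream every other vertex past them.
graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}   # hypothetical undirected graph

def triangles_with_pivots(graph, pivots):
    pivot_nbrs = {p: graph[p] for p in pivots}          # held in RAM
    found = set()
    for v, nbrs in graph.items():                       # streamed one vertex at a time
        for p in pivot_nbrs:
            if p in nbrs:                               # v is adjacent to pivot p ...
                for w in nbrs & pivot_nbrs[p]:          # ... and w closes the triangle
                    found.add(tuple(sorted((p, v, w))))
    return found

# Repeat with new pivot sets until every vertex has served as a pivot once.
print(triangles_with_pivots(graph, pivots={1, 2}))      # {(1, 2, 3)}
```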
  24. Triangle Counting Performance
      (Chart: minutes to count triangles on twitter-2010 (1.5B edges): Hadoop (1636 machines) vs. GraphChi (Mac Mini), on a 0–450 minute scale.)
  25. Random Walk Engine
      • Simulating random walks to quickly rank the most important (non-friend) persons for a person: – Example: pick the top 10 nodes visited by a 10,000-step random walk (with restart). • Used by Twitter as the first step in their “Who to Follow” algorithm (Gupta et al., WWW’13)
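A sketch of the ranking recipe above (random walk with restart, rank by visit count); the graph, restart probability, and parameters are illustrative assumptions:

```python
import random
from collections import Counter

graph = {0: [1, 2], 1: [2, 3], 2: [0, 3], 3: [0]}      # hypothetical follow graph

def top_visited(graph, source, steps=10_000, restart=0.15, k=10):
    visits, v = Counter(), source
    for _ in range(steps):
        if random.random() < restart or not graph[v]:
            v = source                     # restart keeps the walk near the source
        else:
            v = random.choice(graph[v])
        visits[v] += 1
    return [u for u, _ in visits.most_common() if u != source][:k]

print(top_visited(graph, source=0))        # candidate "who to follow" ranking
```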
  26. Random walk in an in-memory graph
      • Compute one walk at a time (multiple in parallel, of course)
      DrunkardMob - RecSys ’13
  27. Problem: What if the graph does not fit in memory?
      (Figure: Twitter network visualization, by Akshay Java, 2009.)
      Distributed graph systems: each hop across a partition boundary is costly. Disk-based “single-machine” graph systems: “paging” from disk is costly.
      DrunkardMob - RecSys ’13
  28. Random walks in GraphChi
      • DrunkardMob algorithm (Kyrola, ACM RecSys ’13) – Reverse thinking: simulate millions/billions of short walks in parallel. – Handle one vertex at a time (instead of one walk at a time).
      Note: need to store only the current position of each walk in memory (4 bytes/walk)!
      DrunkardMob - RecSys ’13
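A toy sketch of the vertex-centric walk simulation above: keep one integer per walk (its current vertex) and advance all walks sitting on a vertex while that vertex is loaded. The in-memory dict stands in for GraphChi's disk-resident graph:

```python
import random
from collections import defaultdict

graph = {0: [1, 2], 1: [2], 2: [0]}           # hypothetical graph; on disk in GraphChi
walk_pos = [0] * 1000 + [1] * 500             # per-walk state: current vertex only

for _ in range(100):                          # each outer pass advances every walk one hop
    at_vertex = defaultdict(list)
    for w, v in enumerate(walk_pos):          # bucket walks by their current vertex
        at_vertex[v].append(w)
    for v in graph:                           # visit one vertex (and its edges) at a time
        for w in at_vertex[v]:
            walk_pos[w] = random.choice(graph[v])

print(walk_pos[:10])
```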
  29. Comparison to in-memory walks
      (Charts: (a) seconds vs. number of walks, DrunkardMob vs. in-memory walks (Cassovary); (b) running time, partially cut off.)
      Competitive with in-memory walks. However, if you can fit your graph in memory, there is no need for DrunkardMob.
      DrunkardMob - RecSys ’13
  30. GraphChi (OSDI ’12): batch computation on graphs with billions of edges on just a PC / laptop.
      GraphChi-DB: adds database functionality.
      Updates (online): insert edge/vertex; update edge/vertex value; delete edge/vertex. (No high-level transactions.)
      Associated data: edge type (label), edge properties, vertex properties, vardata columns.
      Queries (graph-style): in/out neighbor queries, two-hop queries, point queries, shortest paths, graph sampling → incremental computation on evolving graphs.
  31. Highlights
      • Fast edge ingest by using a Log-Structured Merge tree (similar to RocksDB, LevelDB) • Fast in- and out-edge queries using sparse and compressed indices – storage model optimized for large graphs. • Columnar data storage for fast analytical computation and schema changes.
      Read more in my thesis / on arXiv.
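A minimal log-structured merge sketch to make the ingest point concrete: writes land in an in-memory buffer and are flushed as sorted runs, so ingest never performs random disk writes. This illustrates the general LSM idea only, not GraphChi-DB's actual structures:

```python
import bisect

MEMTABLE_LIMIT = 4
memtable, runs = {}, []                    # runs: sorted lists of (key, value) pairs

def put(key, value):
    memtable[key] = value                  # ingest is an in-memory write ...
    if len(memtable) >= MEMTABLE_LIMIT:
        runs.append(sorted(memtable.items()))  # ... plus occasional sequential flushes
        memtable.clear()

def get(key):
    if key in memtable:
        return memtable[key]
    for run in reversed(runs):             # newest run wins
        i = bisect.bisect_left(run, (key,))
        if i < len(run) and run[i][0] == key:
            return run[i][1]
    return None

for e in range(10):
    put(("v1", e), "edge-value-%d" % e)    # hypothetical (vertex, edge) keys
print(get(("v1", 3)))                      # edge-value-3
```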
  32. Comparison: Database Size
      Baseline: 4 + 4 bytes / edge.
      (Chart: database file size for the twitter-2010 graph (1.5B edges): MySQL (data + indices), Neo4j, GraphChi-DB, and the baseline.)
  33. Comparison: Ingest
      System               | Time to ingest 1.5B edges
      GraphChi-DB (ONLINE) | 1 hour 45 mins
      Neo4j (batch)        | 45 hours
      MySQL (batch)        | 3 hours 30 minutes (including index creation)
      If running Pagerank simultaneously, GraphChi-DB takes 3 hours 45 minutes.
  34. Comparison: Friends-of-Friends Query
      Latency percentiles over 100K random queries (graph: 1.5B edges). See thesis for the shortest-path comparison.
      50th percentile (ms): GraphChi-DB 22.4 | Neo4j 759.8 | MySQL 5.9
      99th percentile (ms): GraphChi-DB 1264 | GraphChi-DB + Pagerank 1631 | MySQL 4776
      GraphChi-DB is the most scalable DB with large power-law graphs.
  35. Summary
      • A single PC can handle very large datasets – easier to work with, better economics. • GraphChi and the Parallel Sliding Windows algorithm allow processing graphs in big chunks from disk. • GraphChi’s collaborative filtering toolkit offers matrix- and graph-oriented recommendation algorithms – scales to big problems, with high efficiency by storing critical data in memory. • GraphChi-DB adds online database features: a graph database that can do analytical computation.
  36. GraphChi in GitHub
      • http://github.com/graphchi-cpp – includes the collaborative filtering toolkit • http://github.com/graphchi-java • http://github.com/graphchiDB-scala
      Thank you! [email protected] Twitter: @kyrpov
      See also GraphLab Create by graphlab.com!
  37. Random Access Problem
      (Figure: a disk file of edge-values, with A’s in-edges/out-edges and B’s in-edges/out-edges interleaved; processing sequentially, updating an edge value x triggers a random read on one side and a random write on the other.)
      Moral: you can access either in-edges or out-edges sequentially, but not both!
  38. Efficient Scaling
      (Figure: task timelines for a distributed graph system vs. single-computer systems capable of big tasks, with 6 machines vs. 12 machines.)
      Distributed graph system: (significantly) less than 2x throughput with 2x machines. Single-computer systems, each doing a full job: exactly 2x throughput with 2x machines.
  39. GraphChi Program Execution
      For T iterations:
        For p = 1 to P:
          For v in interval(p):
            updateFunction(v)
      which is conceptually equivalent to:
      For T iterations:
        For v = 1 to V:
          updateFunction(v)
      “Asynchronous”: updates are immediately visible (vs. bulk-synchronous).
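A runnable illustration of the “asynchronous” point above: within one iteration, an update sees values that earlier updates in the same iteration already wrote (under bulk-synchronous execution it would read only the previous iteration's values). The graph and update rule are toy assumptions:

```python
graph = {1: [2], 2: [3], 3: []}            # hypothetical chain graph
value = {1: 1, 2: 0, 3: 0}

def update(v):
    in_nbrs = [u for u in graph if v in graph[u]]
    if in_nbrs:
        value[v] = max(value[u] for u in in_nbrs)

# Asynchronous: a single pass in vertex order already propagates 1 -> 2 -> 3,
# because update(3) sees the value that update(2) just wrote.
for v in sorted(graph):
    update(v)

print(value)                               # {1: 1, 2: 1, 3: 1} after one iteration
```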