Slide 1

Slide 1 text

Large-scale Recommender Systems on Just a PC (with GraphChi). Data Science London, Dec 10, 2014. Aapo Kyrölä, Ph.D., Carnegie Mellon University 2014 (now: Facebook). http://www.cs.cmu.edu/~akyrola Twitter: @kyrpov. Big Data - small machine

Slide 2

Slide 2 text

Contents 1. Why "Just a PC" 2. Introduction to GraphChi 3. Recsys on GraphChi - Examples: ALS, item-CF (triangle counting), random walks for link prediction 4. GraphChi-DB (very briefly)

Slide 3

Slide 3 text

Large-Scale Recommender Systems on Just a PC. Why on a single machine? Can't we just use the Cloud?

Slide 4

Slide 4 text

Why use a cluster? Two reasons: 1.  One computer cannot handle my problem in a reasonable time. 2.  I need to solve the problem very fast.

Slide 5

Slide 5 text

Why use a cluster? Two reasons: 1. One computer cannot handle my problem in a reasonable time. 2. I need to solve the problem very fast. Our work expands the space of feasible (graph) problems on one machine: - Our experiments use the same graphs, or bigger, than previous papers on distributed graph computation (+ we can do the Twitter graph on a laptop). - Most data is not that "big" anyway. Our work raises the bar on required performance for a "complicated" system.

Slide 6

Slide 6 text

Benefits of single-machine systems. Assuming one can handle your big problems... 1. Programmer productivity - Global state - Can use "real data" for development 2. Inexpensive to install and administer; uses less power. 3. Scalability: - 10x machines doing a full job each = 10x throughput

Slide 7

Slide 7 text

GRAPH COMPUTATION AND GRAPHCHI

Slide 8

Slide 8 text

Why graphs for recommender systems? • Graph = matrix: edge(u,v) = M[u,v] - Note: always sparse graphs • Intuitive, human-understandable representation - Easy to visualize and explain. • Unifies collaborative filtering (typically matrix based) with recommendation in social networks. - Random walk algorithms. • Local view → vertex-centric computation

Slide 9

Slide 9 text

Vertex-Centric Computational Model • Graph G = (V, E) - directed edges: e = (source, destination) - each edge and vertex is associated with a value (user-defined type) - vertex and edge values can be modified • (structure modification also supported) (Figure: a small graph with data attached to every vertex and edge.)

Slide 10

Slide 10 text

Vertex-centric Programming • "Think like a vertex" • Popularized by the Pregel and GraphLab projects. MyFunc(vertex) { // modify neighborhood }
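A minimal Python sketch of this model on a toy in-memory graph (the `Vertex` class and `update` signature are illustrative assumptions, not GraphChi's actual API); the update function computes a PageRank-style value from the in-neighborhood:

```python
# Illustrative sketch of vertex-centric computation (not GraphChi's API).
class Vertex:
    def __init__(self, vid, value=0.0):
        self.id = vid
        self.value = value
        self.in_edges = []   # list of (neighbor_id, edge_value)
        self.out_edges = []  # list of (neighbor_id, edge_value)

def update(vertex, graph):
    """One PageRank-style update: read in-neighbor values, write own value."""
    incoming = sum(graph[src].value / max(len(graph[src].out_edges), 1)
                   for src, _ in vertex.in_edges)
    vertex.value = 0.15 + 0.85 * incoming

# Tiny 3-vertex cycle: 0 -> 1 -> 2 -> 0
graph = {i: Vertex(i, 1.0) for i in range(3)}
for (s, d) in [(0, 1), (1, 2), (2, 0)]:
    graph[s].out_edges.append((d, None))
    graph[d].in_edges.append((s, None))

for _ in range(10):            # iterations
    for v in graph.values():   # "think like a vertex"
        update(v, graph)
```

Note the update only touches a vertex's own value and its neighborhood, which is what lets GraphChi schedule the calls in parallel and stream edges from disk.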

Slide 11

Slide 11 text

What is GraphChi? Both in OSDI'12!

Slide 12

Slide 12 text

The Main Challenge of Disk-based Graph Computation: Random Access. Disk: 100s of reads/writes per sec; commodity SSD: ~100K reads/sec; high-end arrays: ~1M reads/sec. All << the 5-10M random edge accesses/sec needed to achieve "reasonable performance".

Slide 13

Slide 13 text

GraphChi's Data Storage • Vertices are numbered from 1 to n - P intervals, each associated with a shard on disk. - sub-graph = interval of vertices. Expensive graph partitioning is not required.
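The interval/shard layout can be sketched as follows (the boundary choice and in-memory lists are illustrative assumptions, not GraphChi's file format); each edge is stored in the shard of its destination's interval, sorted by source:

```python
import bisect

# Sketch: vertices 1..n split into P intervals; edge (src, dst) goes to the
# shard of dst's interval. Illustrative layout only.
n, P = 100, 4
boundaries = [n * (p + 1) // P for p in range(P)]  # last vertex of each interval

def interval_of(v):
    """Index of the interval (and shard) that contains vertex v."""
    return bisect.bisect_left(boundaries, v)

shards = [[] for _ in range(P)]
edges = [(1, 2), (50, 99), (30, 60), (7, 100)]
for src, dst in edges:
    shards[interval_of(dst)].append((src, dst))
# Within each shard, edges are kept sorted by source vertex, so a sliding
# window over the shard yields the out-edges of consecutive intervals.
for shard in shards:
    shard.sort()
```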

Slide 14

Slide 14 text

Parallel Sliding Windows. Only P large reads for each interval (sub-graph); P² reads on one full pass. Details: Kyrola, Blelloch, Guestrin: "Large-scale graph computation on just a PC" (OSDI 2012)
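The I/O arithmetic behind that claim, as a rough sketch (the P = 32 figure is an assumed example, not a number from the paper):

```python
# Back-of-the-envelope I/O model for one full pass (assumption: each window
# read into a shard is a single large sequential I/O).
def random_access_ios(num_edges):
    return num_edges          # worst case: one random read per edge

def psw_ios(P):
    return P * P              # P sequential window reads per interval, P intervals

edge_ios = random_access_ios(1_500_000_000)   # twitter-2010 scale
block_ios = psw_ios(32)                       # e.g. P = 32 shards -> 1024 reads
```

The point: the number of large sequential reads depends only on P, not on the edge count, which is why PSW avoids the random-access bottleneck.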

Slide 15

Slide 15 text

Performance: GraphChi can compute on the full Twitter follow-graph with just a standard laptop (2012), ~ as fast as a very large Hadoop cluster! (Size of the graph in Fall 2013: > 20B edges [Gupta et al. 2013].)

Slide 16

Slide 16 text

RECSYS MODEL TRAINING WITH GRAPHCHI

Slide 17

Slide 17 text

Overview of Recommender Systems for GraphChi 1. Collaborative Filtering toolkit - Example 1: ALS - Example 2: Item-based CF 2. Link prediction in large networks - Random-walk based approaches

Slide 18

Slide 18 text

GraphChi's Collaborative Filtering Toolkit • Developed by Danny Bickson (CMU / GraphLab Inc) • Includes: - Alternating Least Squares (ALS) - Sparse-ALS - SVD++ - LibFM (factorization machines) - GenSGD - Item-similarity based methods - PMF - CliMF (contributed by Mark Levy) - ... Note: in the C++ version. See Danny's blog for more information: http://bickson.blogspot.com/2012/12/collaborative-filtering-with-graphchi.html

Slide 19

Slide 19 text

Example: Alternating Least Squares Matrix Factorization (ALS) • Task: predict ratings for items (movies) by users. • Model: - Latent factor model (see next slide). Reference: Y. Zhou, D. Wilkinson, R. Schreiber, R. Pan: "Large-Scale Parallel Collaborative Filtering for the Netflix Prize" (2008)

Slide 20

Slide 20 text

ALS: User - Item bipartite graph. Items (movies): City of God, Wild Strawberries, The Celebration, La Dolce Vita, Women on the Verge of a Nervous Breakdown; edges carry the users' ratings. (Figure: bipartite graph with a latent factor vector at each vertex.) A user's rating of a movie is modeled as a dot-product: r(u, m) ≈ ⟨latent(u), latent(m)⟩.

Slide 21

Slide 21 text

ALS: GraphChi implementation • Update function handles one vertex at a time (user or movie). • For each user: - Estimate latent(user): minimize the least-squares error of the dot-product predicted ratings. • GraphChi executes the update function for each vertex (in parallel), and loads edges (ratings) from disk. - Latent factors in memory: needs O(V) memory. - If the factors don't fit in memory, they can be replicated to edges and thus stored on disk. Scales to very large problems!
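One half of an ALS sweep can be sketched like this (a pure-Python illustration with assumed names such as `als_update_user` and a tiny D = 2; the toolkit's C++ implementation differs): with item factors held fixed, each user's latent vector is the solution of a small regularized least-squares problem built from that user's rating edges.

```python
# Sketch of the per-user ALS update (illustrative, not the toolkit's code):
# minimize sum_i (r_ui - x_u . y_i)^2 + LAM * |x_u|^2 with item factors fixed.
D, LAM = 2, 0.1

def solve(A, b):
    """Solve the small DxD system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def als_update_user(ratings, item_factors):
    """ratings: list of (item_id, rating) on this user-vertex's edges."""
    A = [[LAM * (i == j) for j in range(D)] for i in range(D)]  # normal eqs
    b = [0.0] * D
    for item, r in ratings:
        y = item_factors[item]
        for i in range(D):
            b[i] += r * y[i]
            for j in range(D):
                A[i][j] += y[i] * y[j]
    return solve(A, b)

items = {0: [1.0, 0.0], 1: [0.0, 1.0]}
x = als_update_user([(0, 4.0), (1, 2.0)], items)
```

The movie-side update is symmetric, which is why a single GraphChi update function covers both vertex types.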

Slide 22

Slide 22 text

ALS: Performance. Matrix Factorization (Alternating Least Squares) on Netflix (99M edges), D=20: GraphLab v1 (8 cores) vs. GraphChi (Mac Mini), runtimes in minutes (bar chart). Remark: Netflix is not a big problem, but GraphChi will scale at most linearly with input size (ALS is CPU bound, so should be sub-linear in #ratings).

Slide 23

Slide 23 text

Example: Item-Based CF • Task: compute a similarity score [e.g. Jaccard] for each movie pair that has at least one viewer in common. - Similarity(X, Y) ~ # common viewers • Problem: enumerating all pairs takes too much time.
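The score itself is cheap once the common viewers are known; a Jaccard sketch (the movie names and viewer sets are made-up toy data):

```python
# Item-item similarity via common viewers (Jaccard). Toy data.
viewers = {
    "City of God":       {1, 2, 3, 4},
    "Wild Strawberries": {2, 3, 5},
}

def jaccard(a, b):
    common = len(viewers[a] & viewers[b])   # viewers of both movies
    union = len(viewers[a] | viewers[b])    # viewers of either movie
    return common / union if union else 0.0

sim = jaccard("City of God", "Wild Strawberries")  # 2 common out of 5 total
```

The hard part, as the slide says, is not the formula but enumerating the pairs with at least one common viewer.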

Slide 24

Slide 24 text

(Figure: the user-movie graph from the previous slide, with the common-viewer count shown for a movie pair.) Solution: Enumerate all triangles of the graph. New problem: how to enumerate triangles if the graph does not fit in RAM?

Slide 25

Slide 25 text

Triangle Enumeration in GraphChi. Algorithm: • Let the pivots be a subset of the vertices; • Load the lists of neighbors of the pivots into RAM; • Use GraphChi to load all vertices from disk, one by one, and compare their neighbors to the neighboring pivots' neighbor lists; • Repeat with a new set of pivots.
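The steps above can be sketched in Python (a toy in-memory stand-in: the `adj` dict plays the role of the on-disk vertex stream, and the function names are illustrative):

```python
# Pivot-based triangle enumeration sketch. Only the pivots' adjacency lists
# are held "in RAM"; every other vertex is streamed one at a time.
adj = {  # undirected toy graph as neighbor sets
    1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2},
}

def triangles_with_pivots(adj, pivots):
    pivot_nbrs = {p: adj[p] for p in pivots}    # kept in RAM
    found = set()
    for v, nbrs in adj.items():                 # streamed from "disk"
        for p in pivots:
            if p in nbrs:                       # v is adjacent to pivot p
                for w in nbrs & pivot_nbrs[p]:  # common neighbor closes triangle
                    found.add(tuple(sorted((p, v, w))))
    return found

tris = triangles_with_pivots(adj, pivots={1})
# Repeat with a new pivot set until every vertex has served as a pivot.
```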

Slide 26

Slide 26 text

Triangle Counting Performance. Triangle counting on twitter-2010 (1.5B edges): Hadoop (1636 machines) vs. GraphChi (Mac Mini), runtime in minutes (bar chart).

Slide 27

Slide 27 text

RECOMMENDATIONS IN SOCIAL NETWORKS

Slide 28

Slide 28 text

Random Walk Engine • Simulating random walks to quickly rank the most important (non-friend) persons for a person: - Example: pick the top 10 nodes visited by a 10,000-step random walk (with restart). • Used by Twitter as the first step in their "Who to Follow" algorithm (Gupta et al., WWW'13).
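The ranking idea can be sketched as follows (a toy random walk with restart; the parameter values and names like `top_visited` are assumptions, and Twitter's actual pipeline is considerably more involved):

```python
import random
from collections import Counter

# Personalized ranking by a random walk with restart from a source node.
def top_visited(adj, source, steps=10_000, restart_p=0.15, k=3, seed=1):
    rng = random.Random(seed)
    counts, v = Counter(), source
    for _ in range(steps):
        if rng.random() < restart_p or not adj[v]:
            v = source                      # teleport back to the source
        else:
            v = rng.choice(adj[v])          # follow a random out-edge
        counts[v] += 1
    counts.pop(source, None)                # don't recommend yourself
    return [node for node, _ in counts.most_common(k)]

adj = {0: [1, 2], 1: [2], 2: [0], 3: []}
ranked = top_visited(adj, source=0)         # node 3 is unreachable from 0
```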

Slide 29

Slide 29 text

Random walk in an in-memory graph • Compute one walk at a time (multiple in parallel, of course). DrunkardMob - RecSys '13

Slide 30

Slide 30 text

Problem: What if the graph does not fit in memory? (Figure: Twitter network visualization, by Akshay Java, 2009.) Distributed graph systems: each hop across a partition boundary is costly. Disk-based "single-machine" graph systems: "paging" from disk is costly. DrunkardMob - RecSys '13

Slide 31

Slide 31 text

Random walks in GraphChi • DrunkardMob algorithm (Kyrola, ACM RecSys '13) - Reverse thinking: simulate millions/billions of short walks in parallel. - Handle one vertex at a time (instead of one walk at a time). Note: need to store only the current position of each walk in memory (4 bytes/walk)! DrunkardMob - RecSys '13
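A sketch of that reversal (illustrative only; GraphChi buckets walks by shard rather than using a Python dict): the only per-walk state is its current vertex id, and every walk sitting at a vertex advances together when that vertex is processed.

```python
import random

# DrunkardMob-style sketch: keep only each walk's current position in memory
# and advance all walks at a vertex together when that vertex is loaded.
def drunkardmob(adj, sources, hops, rng):
    position = list(sources)                  # one int per walk: ~4 bytes/walk
    for _ in range(hops):
        at = {}                               # group walks by current vertex
        for w, v in enumerate(position):
            at.setdefault(v, []).append(w)
        for v, walks in at.items():           # "vertex at a time"
            for w in walks:
                position[w] = rng.choice(adj[v]) if adj[v] else v
    return position

adj = {0: [1], 1: [2], 2: [0]}                # deterministic 3-cycle
final = drunkardmob(adj, sources=[0] * 1000, hops=3, rng=random.Random(7))
```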

Slide 32

Slide 32 text

Comparison to in-memory walks (plots: (a) number of walks vs. seconds, DrunkardMob vs. in-memory walks (Cassovary); (b) running time vs. graph size). Competitive with in-memory walks. However, if you can fit your graph in memory, there is no need for DrunkardMob. DrunkardMob - RecSys '13

Slide 33

Slide 33 text

GraphChi-DB

Slide 34

Slide 34 text

GraphChi (OSDI '12): batch computation on graphs with billions of edges on just a PC / laptop. GraphChi-DB adds database functionality: • Updates (online): insert edge/vertex, update edge/vertex value, delete edge/vertex (no high-level transactions) • Associated data: edge type (label), edge properties, vertex properties, vardata columns • Queries (graph-style): in/out neighbor queries, two-hop queries, point queries, shortest paths, graph sampling → incremental computation on evolving graphs

Slide 35

Slide 35 text

Highlights • Fast edge ingest by using a Log-Structured Merge tree (similar to RocksDB, LevelDB) • Fast in- and out-edge queries using sparse and compressed indices - Storage model optimized for large graphs. • Columnar data storage for fast analytical computation and schema changes. Read more in my thesis / arXiv.
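A toy sketch of the log-structured idea behind fast ingest (far simpler than GraphChi-DB's actual storage layer; the class and method names are invented): inserts append to a small in-memory buffer, which is sorted and merged into the on-disk run when full, so queries must consult both levels.

```python
import heapq

# Toy log-structured merge buffer for edge ingest (illustrative only).
class EdgeLSM:
    def __init__(self, buffer_cap=4):
        self.buffer_cap = buffer_cap
        self.buffer = []        # unsorted recent inserts (memtable)
        self.run = []           # sorted "on-disk" edges (src, dst)

    def insert(self, src, dst):
        self.buffer.append((src, dst))      # O(1) append: fast ingest
        if len(self.buffer) >= self.buffer_cap:
            self.flush()

    def flush(self):
        # sort the buffer and merge it into the sorted run sequentially
        self.run = list(heapq.merge(self.run, sorted(self.buffer)))
        self.buffer = []

    def out_edges(self, src):
        # a query must consult both the run and the unflushed buffer
        disk = [d for s, d in self.run if s == src]
        mem = [d for s, d in self.buffer if s == src]
        return sorted(disk + mem)

db = EdgeLSM()
for e in [(2, 3), (1, 2), (1, 5), (3, 1), (1, 9)]:
    db.insert(*e)
```

Real LSM stores keep multiple sorted runs and compact them in the background; the sketch collapses that to a single run to show the write/read trade-off.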

Slide 36

Slide 36 text

Comparison: Database Size. Database file size (twitter-2010 graph, 1.5B edges): MySQL (data + indices), Neo4j, GraphChi-DB, and baseline (bar chart). Baseline: 4 + 4 bytes / edge.

Slide 37

Slide 37 text

Comparison: Ingest. Time to ingest 1.5B edges: GraphChi-DB (online): 1 hour 45 minutes; Neo4j (batch): 45 hours; MySQL (batch): 3 hours 30 minutes (including index creation). If running PageRank simultaneously, GraphChi-DB takes 3 hours 45 minutes.

Slide 38

Slide 38 text

Comparison: Friends-of-Friends Query. Latency percentiles over 100K random queries (graph: 1.5B edges). 50th percentile (ms): GraphChi-DB 22.4, Neo4j 759.8, MySQL 5.9. 99th percentile (ms): GraphChi-DB 1264, GraphChi-DB + PageRank 1631, MySQL 4776. GraphChi-DB is the most scalable DB for large power-law graphs. See thesis for shortest-path comparison.

Slide 39

Slide 39 text

SUMMARY

Slide 40

Slide 40 text

Summary • A single PC can handle very large datasets: - Easier to work with, better economics. • GraphChi and the Parallel Sliding Windows algorithm allow processing graphs in big chunks from disk. • GraphChi's collaborative filtering toolkit covers matrix- and graph-oriented recommendation algorithms: - Scales to big problems; high efficiency by storing critical data in memory. • GraphChi-DB adds online database features: - A graph database that can also do analytical computation.

Slide 41

Slide 41 text

GraphChi in GitHub • http://github.com/graphchi-cpp - Includes the collaborative filtering toolkit • http://github.com/graphchi-java • http://github.com/graphchiDB-scala Thank you! [email protected] Twitter: @kyrpov. See also GraphLab Create by graphlab.com!

Slide 42

Slide 42 text

Random Access Problem (diagram: a file of edge values, with vertex A's and B's in- and out-edge lists; updating an edge's value sequentially for one endpoint means a random read and a random write for the other endpoint). Moral: You can either access in-edges or out-edges sequentially, but not both!

Slide 43

Slide 43 text

Efficient Scaling (diagram: task timelines for 6 vs. 12 machines). A distributed graph system finishes each task faster, but going from 6 to 12 machines yields (significantly) less than 2x throughput. Single-computer systems, each capable of a big task on its own, run independent tasks side by side: 2x machines give exactly 2x throughput.

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

GraphChi Program Execution. Conceptually: for T iterations: for v = 1 to V: updateFunction(v). Actual execution: for T iterations: for p = 1 to P: for v in interval(p): updateFunction(v). "Asynchronous": updates are immediately visible (vs. bulk-synchronous).