
Big Practical Recommendations with Alternating Least Squares

Sean Owen, CEO @Myrrix, at @ds_dln #strataconf, 02/10/12

Data Science London

October 10, 2012

Transcript

  1. WHERE’S BIG LEARNING?
     - Next: Application Layer
     - Analytics
     - Machine Learning
     - Like Apache Mahout
     - Common Big Data app today
     - Clustering, recommenders, …
  2. A RECOMMENDER SHOULD …
     - Answer in Real-time
       - Ingest new data, now
       - Modify recommendations based on newest data
       - No “cold start” for new data
     - Scale Horizontally
       - For queries per second
       - For size of data set
     - Accept Diverse Input
       - Not just people and products
       - Not just explicit ratings
       - Clicks, views, buys
       - Side information
     - Be “Pretty Accurate”
  3. A PRACTICAL ALGORITHM: MATRIX FACTORIZATION BENEFITS
     - Factor user-item matrix to user-feature + feature-item matrix
     - Well understood in ML, as:
       - Principal Component Analysis
       - Latent Semantic Indexing
     - Several algorithms, like:
       - Singular Value Decomposition
       - Alternating Least Squares
     - Models intuition
     - Factorization is batch parallelizable
     - Reconstruction (recs) in …
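The factorization the slide describes can be sketched with NumPy: truncated SVD gives the best rank-k approximation P ≈ X·Yᵀ, with a “skinny” user-feature matrix X and feature-item matrix Yᵀ. The matrix values here are invented purely for illustration:

```python
import numpy as np

# Hypothetical tiny user-item matrix (rows = users, columns = items).
P = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])

# Truncated SVD yields the best rank-k approximation P ≈ X · Y^T.
k = 2
U, s, Vt = np.linalg.svd(P, full_matrices=False)
X = U[:, :k] * s[:k]   # user-feature matrix, shape (4, k)
Yt = Vt[:k, :]         # feature-item matrix, shape (k, 4)
Q = X @ Yt             # rank-k reconstruction of P

# Reconstruction error shrinks as k grows toward full rank.
err = np.linalg.norm(P - Q)
```

Recommendation scores for a user are then just that user’s row of the reconstruction Q.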
  4. A PRACTICAL IMPLEMENTATION: ALTERNATING LEAST SQUARES BENEFITS
     - Simple factorization: P ≈ X·Yᵀ
     - Approximate: X, Y are “skinny” (low-rank)
     - Faster than the SVD
     - Trivially parallel, iterative
     - Dumber than the SVD
     - No singular values, …
  5. ALS ALGORITHM 1
     - Input: (user, item, strength) tuples
     - Anything you can quantify is input
     - Strength is positive
     - Many tuples per user-item
     - R is a sparse user-item matrix
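Assembling the user-item strength matrix R from such tuples can be sketched as follows. The ids and strength values are made up, and a real system would use a sparse data structure rather than a dense array:

```python
import numpy as np

# Hypothetical (user, item, strength) tuples; strengths are positive,
# and the same user-item pair may appear many times.
tuples = [
    (0, 1, 1.0),   # user 0 clicked item 1
    (0, 1, 0.5),   # ... and viewed it again later
    (1, 2, 3.0),   # user 1 bought item 2
    (2, 0, 1.0),
]

n_users, n_items = 3, 3
R = np.zeros((n_users, n_items))
for u, i, s in tuples:
    R[u, i] += s   # repeated tuples for a pair simply accumulate

# R stays mostly zeros: a sparse user-item strength matrix.
```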
  6. ALS ALGORITHM 3
     - P is m x n
     - Choose k << m, n
     - Factor P as Q = X·Yᵀ, Q ≈ P
     - X is m x k; Yᵀ is k x n
     - Find best approximation Q
     - Minimize L2 norm of the difference: ||P − Q||²
     - Minimal squared error
  7. ALS ALGORITHM 4
     - Optimizing X, Y simultaneously is non-convex, hard
     - If X or Y is fixed, it is a system of linear equations: convex, easy
     - Initialize Y with random values
     - Solve for X
     - Fix X, solve for Y
     - Repeat (“Alternating”)
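The alternating loop above can be sketched in a minimal, unweighted form. The matrix values, λ, k, and iteration count are all chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ratings matrix; values are made up for illustration.
P = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])
m, n = P.shape
k, lam = 2, 0.1

# Initialize Y with random values; X is solved on the first pass.
Y = rng.standard_normal((n, k))

for _ in range(20):
    # With Y fixed, solving for X is a convex least-squares problem:
    # X = argmin ||P - X Y^T||^2 + lam ||X||^2
    X = np.linalg.solve(Y.T @ Y + lam * np.eye(k), Y.T @ P.T).T
    # With X fixed, solve for Y the same way ("alternating").
    Y = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ P).T

Q = X @ Y.T   # reconstruction; approaches P as iterations proceed
```

Each user row (and each item row) solves independently, which is what makes the factorization trivially parallel.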
  8. ALS ALGORITHM 6
     - With fixed Y, compute optimal X
     - Each row x_u is independent
     - Define C_u as the diagonal matrix of c_u (user strength weights)
     - x_u = (Yᵀ C_u Y + λI)⁻¹ Yᵀ C_u p_u
     - Compare to the simple least-squares regression solution (Yᵀ Y)⁻¹ Yᵀ p_u:
       - Adds Tikhonov / ridge regression regularization term λI
       - Attaches c_u weights to Yᵀ
     - See paper for how Yᵀ C_u Y is computed efficiently
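The per-row solve maps directly onto NumPy. Here Y, p_u, and c_u are invented for illustration; the comment also shows the algebraic identity behind the efficient computation of Yᵀ C_u Y (only the rows where C_u differs from I contribute to the second term):

```python
import numpy as np

rng = np.random.default_rng(1)

n, k, lam = 5, 3, 2.0
Y = rng.standard_normal((n, k))    # fixed item-feature matrix

# Hypothetical data for one user u: 0/1 preference vector p_u and
# per-item confidence weights c_u derived from observed strengths.
p_u = np.array([1.0, 0.0, 1.0, 0.0, 0.0])
c_u = np.array([41.0, 1.0, 81.0, 1.0, 1.0])

Cu = np.diag(c_u)
# x_u = (Y^T C_u Y + lam I)^-1 Y^T C_u p_u
x_u = np.linalg.solve(Y.T @ Cu @ Y + lam * np.eye(k), Y.T @ Cu @ p_u)

# Compare: the plain least-squares solution drops the weights and lam I.
x_plain = np.linalg.solve(Y.T @ Y, Y.T @ p_u)

# Efficiency identity: Y^T C_u Y = Y^T Y + Y^T (C_u - I) Y, and
# C_u - I is nonzero only where the user actually has data, so the
# second term touches only that user's observed items.
```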
  9. EXAMPLE FACTORIZATION
     - k = 3, λ = 2, α = 40, 10 iterations
     - Input: a small binary (0/1) user-item matrix P
     - Reconstruction Q = X·Yᵀ has entries near 1 where P is 1 (e.g. 0.96, 0.99, 1.00) and smaller values elsewhere (e.g. 0.11, −0.13)
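An end-to-end sketch of this implicit-feedback factorization, using the slide’s hyperparameters (k = 3, λ = 2, α = 40, 10 iterations) on a made-up binary matrix:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up binary interaction matrix: 1 = observed interaction.
P = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
m, n = P.shape
k, lam, alpha, iters = 3, 2.0, 40.0, 10

C = 1.0 + alpha * P            # confidence: high where observed
X = rng.standard_normal((m, k)) * 0.1
Y = rng.standard_normal((n, k)) * 0.1

for _ in range(iters):
    # Fix Y, solve each user row independently (and vice versa).
    for u in range(m):
        Cu = np.diag(C[u])
        X[u] = np.linalg.solve(Y.T @ Cu @ Y + lam * np.eye(k),
                               Y.T @ Cu @ P[u])
    for i in range(n):
        Ci = np.diag(C[:, i])
        Y[i] = np.linalg.solve(X.T @ Ci @ X + lam * np.eye(k),
                               X.T @ Ci @ P[:, i])

Q = X @ Y.T   # entries near 1 where P is 1, smaller elsewhere
```

As on the slide, the reconstruction pushes observed entries toward 1 while unobserved entries stay small, so each user’s row of Q ranks candidate items.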
  10. EXAMPLES: STACKOVERFLOW TAGS, WIKIPEDIA LINKS
     - StackOverflow tags
       - Recommend tags to questions
       - Tag questions automatically, improve tag coverage
       - 3.5M questions x 30K tags
       - 4.3 hours x 5 machines on Amazon EMR
       - $3.03 ≈ $0.08 per 100,000 recs
     - Wikipedia links
       - Recommend new linked articles from existing links
       - Propose missing, related links
       - 2.5M articles x 1.8M articles
       - 28 hours x 2 PCs on …