Big Practical Recommendations with Alternating Least Squares

Big, Practical Recommendations

WHERE’S BIG LEARNING?   Next: Application Layer   Analytics  
Machine Learning   Like Apache Mahout   Common Big Data app today   Clustering, recommenders,

A RECOMMENDER SHOULD …   Answer in Real-time   Ingest
new data, now   Modify recommendations based on newest data   No “cold start” for new data   Scale Horizontally   For queries per second   For size of data set   Accept Diverse Input   Not just people and products   Not just explicit ratings   Clicks, views, buys   Side information   Be “Pretty Accurate”

NEED: 2-TIER ARCHITECTURE   Real-time Serving Layer   Quick results
based on

A PRACTICAL ALGORITHM MATRIX FACTORIZATION BENEFITS   Factor user-item matrix
to user- feature + feature-item matrix   Well understood in ML, as:   Principal Component Analysis   Latent Semantic Indexing   Several algorithms, like:   Singular Value Decomposition   Alternating Least Squares   Models intuition   Factorization is batch parallelizable   Reconstruction (recs) in

A PRACTICAL IMPLEMENTATION ALTERNATING LEAST SQUARES BENEFITS   Simple factorization
P ≈ X YT   Approximate: X, Y are “skinny” (low-rank)   Faster than the SVD   Trivially parallel, iterative   Dumber than the SVD   No singular values,

ALS ALGORITHM 1   Input: (user, item, strength) tuples  
Anything you can quantify is input   Strength is positive   Many tuples per user-item   R is sparse user-item

ALS ALGORITHM 2   Follow “Collaborative Filtering for Implicit Feedback
Datasets”

ALS ALGORITHM 3   P is m x n  
Choose k << m, n   Factor P as Q = X YT, Q ≈ P   X is m x k ; YT is k x n   Find best approximation Q   Minimize L2 norm of diff: || P-Q ||2   Minimal squared error:

ALS ALGORITHM 4   Optimizing X, Y simultaneously is non-convex,
hard   If X or Y are ﬁxed, system of linear equations: convex, easy   Initialize Y with random values   Solve for X   Fix X, solve for Y   Repeat (“Alternating”) YT X

ALS ALGORITHM 5   Deﬁne regularization weights cui = 1
+ α rui   Minimize:

ALS ALGORITHM 6   With fixed Y, compute optimal X
  Each row xu is independent   Define Cu as diagonal matrix of cu (user strength weights)   xu = (YTCu Y + λI)-1 YTCu pu   Compare to simple least-squares regression solution (YTY)-1 YTpu   Adds Tikhonov / ridge regression regularization term λI   Attaches cu weights to YT   See paper for how YTCu Y is computed efficiently;

1 1 1 0 0 0 0 1 0 0
0 1 0 1 1 1 0 1 0 1 0 0 0 1 0 1 1 0 0 0 EXAMPLE FACTORIZATION   k = 3, λ = 2, α = 40, 10 iterations ≈ 0.96 0.99 0.99 0.38 0.93 0.44 0.39 0.98 -0.11 0.39 0.70 0.99 0.42 0.98 0.98 1.00 1.04 0.99 0.44 0.98 0.11 0.51 -0.13 1.00 0.57 0.97 1.00 0.68 0.47 0.91 Q = X•YT

FOLD-IN   Need immediate, if approximate, updates for new data
  New user u needs new row

THIS IS MYRRIX   Soft-launched   Serving Layer available

APPENDIX

EXAMPLES STACKOVERFLOW TAGS WIKIPEDIA LINKS   Recommend tags to questions
  Tag questions automatically, improve tag coverage   3.5M questions x 30K tags   4.3 hours x 5 machines on Amazon EMR   $3.03 ≈ $0.08 per 100,000 recs   Recommend new linked articles from existing links   Propose missing, related links   2.5M articles x 1.8M articles   28 hours x 2 PCs on

Big Practical Recommendations with Alternating ...

Big Practical Recommendations with Alternating Least Squares

Data Science London

More Decks by Data Science London

Other Decks in Technology

Featured

Transcript