Slide 1

Big, Practical Recommendations

Slide 2

WHERE’S BIG LEARNING?
- Next: Application Layer
  - Analytics
  - Machine Learning
- Like Apache Mahout
  - Common Big Data app today
  - Clustering, recommenders, …

Slide 3

A RECOMMENDER SHOULD …
- Answer in Real-time
  - Ingest new data, now
  - Modify recommendations based on newest data
  - No “cold start” for new data
- Scale Horizontally
  - For queries per second
  - For size of data set
- Accept Diverse Input
  - Not just people and products
  - Not just explicit ratings
  - Clicks, views, buys
  - Side information
- Be “Pretty Accurate”

Slide 4

NEED: 2-TIER ARCHITECTURE
- Real-time Serving Layer
  - Quick results based on …

Slide 5

A PRACTICAL ALGORITHM: MATRIX FACTORIZATION BENEFITS
- Factor user-item matrix into user-feature + feature-item matrices
- Well understood in ML, as:
  - Principal Component Analysis
  - Latent Semantic Indexing
- Several algorithms, like:
  - Singular Value Decomposition
  - Alternating Least Squares
- Models intuition
- Factorization is batch parallelizable
- Reconstruction (recs) in …

Slide 6

A PRACTICAL IMPLEMENTATION: ALTERNATING LEAST SQUARES BENEFITS
- Simple factorization P ≈ X Yᵀ
- Approximate: X, Y are “skinny” (low-rank)
- Faster than the SVD
- Trivially parallel, iterative
- Dumber than the SVD
- No singular values, …

Slide 7

ALS ALGORITHM 1
- Input: (user, item, strength) tuples
- Anything you can quantify is input
- Strength is positive
- Many tuples per user-item pair
- R is a sparse user-item matrix
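The input stage above can be sketched in a few lines of Python. This is a hypothetical minimal representation, not Myrrix's actual API: repeated (user, item, strength) tuples are summed into a sparse R held as nested dicts, and the sample events are illustrative.

```python
from collections import defaultdict

def build_r(tuples):
    """Aggregate raw (user, item, strength) tuples into a sparse R,
    stored as {user: {item: total_strength}}."""
    r = defaultdict(lambda: defaultdict(float))
    for user, item, strength in tuples:
        # Many tuples may exist per user-item pair; sum their strengths.
        r[user][item] += strength
    return r

events = [
    ("alice", "item1", 1.0),   # e.g. a click
    ("alice", "item1", 5.0),   # e.g. a purchase
    ("bob",   "item2", 1.0),
]
R = build_r(events)
print(R["alice"]["item1"])  # 6.0
```

Only observed pairs are stored, which is what makes R sparse: unobserved user-item pairs simply have no entry.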

Slide 8

ALS ALGORITHM 2
- Follows “Collaborative Filtering for Implicit Feedback Datasets” (Hu, Koren & Volinsky, 2008)

Slide 9

ALS ALGORITHM 3
- P is m × n
- Choose k ≪ m, n
- Factor P as Q = X Yᵀ, with Q ≈ P
- X is m × k; Yᵀ is k × n
- Find the best approximation Q
- Minimize the L2 norm of the difference: ‖P − Q‖₂
- Minimal squared error: …

Slide 10

ALS ALGORITHM 4
- Optimizing X and Y simultaneously is non-convex: hard
- With X or Y fixed, it is a system of linear equations: convex, easy
- Initialize Y with random values
- Solve for X
- Fix X, solve for Y
- Repeat (“Alternating”)
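The alternating scheme above can be sketched with NumPy. This is a minimal unweighted, ridge-regularized variant on a dense toy matrix, not yet the implicit-feedback version the next slides describe; all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, lam = 5, 6, 3, 0.1   # lam is the ridge term, an assumed value

P = rng.random((m, n))        # dense toy matrix; a real R would be sparse
Y = rng.random((n, k))        # initialize Y with random values
X = np.zeros((m, k))

for _ in range(10):
    # Fix Y, solve the ridge-regularized least squares for X:
    #   X = P Y (YᵀY + λI)⁻¹
    X = P @ Y @ np.linalg.inv(Y.T @ Y + lam * np.eye(k))
    # Fix X, solve for Y symmetrically ("alternating")
    Y = P.T @ X @ np.linalg.inv(X.T @ X + lam * np.eye(k))

err = np.linalg.norm(P - X @ Y.T)
```

Each half-step solves a convex linear-least-squares problem exactly, which is why the loop is simple and trivially parallel across rows.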

Slide 11

ALS ALGORITHM 5
- Define confidence weights c_ui = 1 + α r_ui
- Minimize: Σ_{u,i} c_ui (p_ui − x_uᵀ y_i)² + λ (Σ_u ‖x_u‖² + Σ_i ‖y_i‖²)

Slide 12

ALS ALGORITHM 6
- With Y fixed, compute the optimal X
- Each row x_u is independent
- Define C_u as the diagonal matrix of c_u (user strength weights)
- x_u = (Yᵀ C_u Y + λI)⁻¹ Yᵀ C_u p_u
- Compare to the simple least-squares regression solution (Yᵀ Y)⁻¹ Yᵀ p_u
  - Adds the Tikhonov / ridge regression regularization term λI
  - Attaches the c_u weights to Yᵀ
- See the paper for how Yᵀ C_u Y is computed efficiently; …
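The per-user solve above translates directly to NumPy. A minimal sketch, assuming the slide's notation (Y is n × k, p_u and c_u of length n); the function name is illustrative.

```python
import numpy as np

def user_factor(Y, p_u, c_u, lam):
    """Solve x_u = (Yᵀ C_u Y + λI)⁻¹ Yᵀ C_u p_u for one user row.

    Y:   n x k item factors
    p_u: length-n preference vector
    c_u: length-n strength weights (e.g. c_u = 1 + α r_u)
    lam: regularization λ
    """
    k = Y.shape[1]
    # Never materialize the n x n diagonal C_u: multiplying Yᵀ by c_u
    # column-wise equals Yᵀ C_u. (The cited paper goes further, using
    # Yᵀ C_u Y = YᵀY + Yᵀ(C_u − I)Y so YᵀY is precomputed once and only
    # the user's observed items contribute; here it is computed directly.)
    YtCu = Y.T * c_u
    A = YtCu @ Y + lam * np.eye(k)
    b = YtCu @ p_u
    return np.linalg.solve(A, b)
```

Because each x_u depends only on Y and that user's own data, the rows can be solved independently, which is exactly what makes the factorization batch-parallelizable.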

Slide 13

EXAMPLE FACTORIZATION
- k = 3, λ = 2, α = 40, 10 iterations
- Input P:
    1 1 1 0 0 0
    0 1 0 0 0 1
    0 1 1 1 0 1
    0 1 0 0 0 1
    0 1 1 0 0 0
- Reconstruction Q = X·Yᵀ ≈
    0.96  0.99  0.99  0.38  0.93  0.44
    0.39  0.98 -0.11  0.39  0.70  0.99
    0.42  0.98  0.98  1.00  1.04  0.99
    0.44  0.98  0.11  0.51 -0.13  1.00
    0.57  0.97  1.00  0.68  0.47  0.91
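An example like this can be re-run with a short script. The 5 × 6 reading of the flattened matrix and all variable names are assumptions, and with random initialization the exact numbers will differ, so no exact output is claimed; the qualitative pattern (values near 1 at observed entries) should hold.

```python
import numpy as np

rng = np.random.default_rng(42)

R = np.array([                 # assumed 5 users x 6 items, binary strengths
    [1, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
], dtype=float)
k, lam, alpha = 3, 2.0, 40.0   # hyperparameters from the slide

P = (R > 0).astype(float)      # binary preference matrix
C = 1.0 + alpha * R            # confidence weights c_ui = 1 + alpha * r_ui

X = rng.random((R.shape[0], k))
Y = rng.random((R.shape[1], k))

def solve_rows(F, Pm, Cm):
    """For each row u of Pm: (Fᵀ C_u F + λI)⁻¹ Fᵀ C_u p_u."""
    out = np.empty((Pm.shape[0], k))
    for u in range(Pm.shape[0]):
        FtCu = F.T * Cm[u]
        out[u] = np.linalg.solve(FtCu @ F + lam * np.eye(k), FtCu @ Pm[u])
    return out

for _ in range(10):            # 10 alternating iterations, as on the slide
    X = solve_rows(Y, P, C)
    Y = solve_rows(X, P.T, C.T)

Q = X @ Y.T
print(np.round(Q, 2))
```

Observed entries should reconstruct close to 1, while unobserved entries spread out; the larger of those values are the recommendations.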

Slide 14

FOLD-IN
- Need immediate, if approximate, updates for new data
- New user u needs a new row …
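Fold-in for a new user can be sketched as the same weighted per-user solve applied once against the fixed item factors Y, without re-running the batch factorization. A minimal sketch; the function name and default hyperparameters are illustrative, not from the deck.

```python
import numpy as np

def fold_in_user(Y, r_u, lam=2.0, alpha=40.0):
    """Approximate a new user's factor row x_u from fixed item factors Y.

    Same solve as in training, applied once:
        x_u = (Yᵀ C_u Y + λI)⁻¹ Yᵀ C_u p_u
    Y: n x k item factors; r_u: the new user's length-n strengths.
    """
    k = Y.shape[1]
    p_u = (r_u > 0).astype(float)   # binary preferences
    c_u = 1.0 + alpha * r_u         # confidence weights
    YtCu = Y.T * c_u
    x_u = np.linalg.solve(YtCu @ Y + lam * np.eye(k), YtCu @ p_u)
    # The new user's scores are the reconstructed row q_u = x_u Yᵀ
    return x_u, x_u @ Y.T
```

This gives an immediate, approximate answer for brand-new data; the next batch factorization then folds the user in exactly, which is the point of the two-tier split between serving and computation.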

Slide 15

THIS IS MYRRIX
- Soft-launched
- Serving Layer available …

Slide 16

APPENDIX

Slide 17

EXAMPLES

STACKOVERFLOW TAGS
- Recommend tags to questions
- Tag questions automatically, improve tag coverage
- 3.5M questions × 30K tags
- 4.3 hours × 5 machines on Amazon EMR
- $3.03 ≈ $0.08 per 100,000 recs

WIKIPEDIA LINKS
- Recommend new linked articles from existing links
- Propose missing, related links
- 2.5M articles × 1.8M articles
- 28 hours × 2 PCs on …