…new data, now
- Modify recommendations based on newest data
- No “cold start” for new data

Scale Horizontally
- For queries per second
- For size of data set

Accept Diverse Input
- Not just people and products
- Not just explicit ratings: clicks, views, buys
- Side information

Be “Pretty Accurate”
…to user-feature + feature-item matrix

Well understood in ML, as:
- Principal Component Analysis
- Latent Semantic Indexing

Several algorithms, like:
- Singular Value Decomposition
- Alternating Least Squares

Why this approach:
- Models intuition
- Factorization is batch, parallelizable
- Reconstruction (recs) in …
P ≈ X Yᵀ
- Approximate: X, Y are “skinny” (low-rank)
- Faster than the SVD
- Trivially parallel, iterative
- Dumber than the SVD
- No singular values, …
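As an illustrative sketch of the shapes involved (NumPy, with made-up sizes; not code from the talk), the factorization keeps two skinny matrices whose product reconstructs the full matrix:

```python
import numpy as np

# A shape sketch of P ≈ X @ Y.T with "skinny" (low-rank) factors.
m, n, k = 100, 80, 10          # k << m, n; sizes are illustrative
rng = np.random.default_rng(0)
X = rng.random((m, k))         # user-feature matrix, m x k
Y = rng.random((n, k))         # item-feature matrix; Y.T is k x n
Q = X @ Y.T                    # reconstruction, m x n

# Storage is m*k + n*k values instead of m*n for the full matrix.
saved = m * n - (m * k + n * k)
```

Note the factors together hold 1,800 values versus 8,000 for the full matrix here; the gap widens as m and n grow.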
Choose k << m, n
Factor P as Q = X Yᵀ, Q ≈ P
- X is m x k; Yᵀ is k x n
Find the best approximation Q
- Minimize the L2 norm of the difference: ‖P − Q‖²
- Minimizing the squared error over X and Y jointly: hard
- If X or Y is fixed, it becomes a system of linear equations: convex, easy

Initialize Y with random values
- Solve for X
- Fix X, solve for Y
- Repeat (“Alternating”)
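The alternating steps above can be sketched in NumPy as follows (unweighted and unregularized, on toy data; a minimal sketch, not the talk's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 5, 2                  # toy sizes, k << m, n
P = rng.random((m, n))             # toy "preference" matrix

Y = rng.random((n, k))             # initialize Y with random values
for _ in range(20):
    # Fix Y, solve for X: least squares for Y @ X.T ≈ P.T
    X = np.linalg.lstsq(Y, P.T, rcond=None)[0].T
    # Fix X, solve for Y: least squares for X @ Y.T ≈ P
    Y = np.linalg.lstsq(X, P, rcond=None)[0].T

Q = X @ Y.T                        # low-rank reconstruction, Q ≈ P
err = np.linalg.norm(P - Q)        # residual after alternating
```

Each half-step is an ordinary linear least-squares solve, which is why fixing one factor makes the problem convex and easy.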
Each row xᵤ is independent
Define Cᵤ as the diagonal matrix of cᵤ (user strength weights)

xᵤ = (Yᵀ Cᵤ Y + λI)⁻¹ Yᵀ Cᵤ pᵤ

Compare to the simple least-squares regression solution (Yᵀ Y)⁻¹ Yᵀ pᵤ:
- Adds the Tikhonov / ridge regression regularization term λI
- Attaches the cᵤ weights to Yᵀ
See the paper for how Yᵀ Cᵤ Y is computed efficiently; …
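The per-user update above is a small dense solve. A sketch in NumPy (names Y, p_u, c_u, and the value of λ are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2
Y = rng.random((n, k))                      # item-feature matrix, n x k
p_u = (rng.random(n) > 0.5).astype(float)   # one user's preference row
c_u = 1.0 + 40.0 * rng.random(n)            # per-item confidence weights
lam = 0.1                                   # regularization λ (illustrative)

Cu = np.diag(c_u)                           # diagonal confidence matrix
A = Y.T @ Cu @ Y + lam * np.eye(k)          # YᵀCᵤY + λI, a k x k system
b = Y.T @ Cu @ p_u                          # YᵀCᵤpᵤ
x_u = np.linalg.solve(A, b)                 # xᵤ = (YᵀCᵤY + λI)⁻¹ YᵀCᵤpᵤ
```

Because each xᵤ depends only on Y and that user's row, these solves are independent and embarrassingly parallel across users.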
Tag questions automatically, improve tag coverage
- 3.5M questions x 30K tags
- 4.3 hours x 5 machines on Amazon EMR
- $3.03 ≈ $0.08 per 100,000 recs

Recommend new linked articles from existing links
Propose missing, related links
- 2.5M articles x 1.8M articles
- 28 hours x 2 PCs on …