
Simple Matrix Factorization for Recommendation in Mahout


Presentation by Sean Owen, core Mahout committer, author of Mahout in Action, and founder of Myrrix, at Data Science London, 23/05/12

Data Science London

July 03, 2012

Transcript

  1. Apache Mahout
  •  Scalable machine learning
  •  (Mostly) Hadoop-based
  •  Clustering, classification and recommender engines
     •  Nearest-neighbor: user-based, item-based, slope-one, clustering-based
     •  Latent factor: SVD-based, ALS
  •  More! mahout.apache.org
  2. Matrix = Associations

           Rose  Navy  Olive
  Alice      0    +4     0
  Bob        0     0    +2
  Carol     -1     0    -2
  Dave      +3     0     0

  •  Things are associated, like people to colors
  •  Associations have strengths, like preferences and dislikes
  •  Can quantify associations: Alice loves navy = +4, Carol dislikes olive = -2
  •  We don't know all associations: many implicit zeroes
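The association matrix above can be sketched as a sparse map, where absent entries stand in for the implicit zeroes. This is a minimal Python illustration, not from the talk; the `prefs` and `strength` names are made up:

```python
# Sparse map of the people-to-colors associations from the slide.
# Missing entries are the "implicit zeroes": associations we do not know.
prefs = {
    "Alice": {"Navy": +4},
    "Bob":   {"Olive": +2},
    "Carol": {"Rose": -1, "Olive": -2},
    "Dave":  {"Rose": +3},
}

def strength(person, color):
    """Look up an association strength; unknown pairs default to 0."""
    return prefs.get(person, {}).get(color, 0)
```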
  3. From One Matrix, Two
  •  Like numbers, matrices can be factored
  •  m•n matrix = m•k times k•n: P (m•n) ≈ X (m•k) • Y' (k•n)
  •  Associations can decompose into others
  •  Alice likes navy = Alice loves blues, and blues includes navy
  4. In Terms of Few Features
  •  Can explain associations by appealing to underlying intermediate features (e.g. "blue-ness")
  •  Relatively few features (one "blue-ness", but many shades)
  [Diagram: (Alice) → (Blue) → (Navy)]
  5. Losing Information is Helpful
  •  When k (= number of features) is small, information is lost
  •  Factorization is approximate (Alice appears to like blue-ish periwinkle too)
  [Diagram: (Alice) → (Blue) → (Navy), (Periwinkle)]
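The periwinkle effect can be seen with a toy rank-1 factorization. The numbers below are hypothetical, chosen only to show how a single "blue-ness" feature makes Alice appear to like every blue-ish shade:

```python
# Hypothetical rank-1 (k = 1) factors: one "blue-ness" feature.
# Users' affinity for blue-ness (Alice high, the others none):
x = [2.0, 0.0, 0.0, 0.0]            # Alice, Bob, Carol, Dave
# Colors' amount of blue-ness (Navy very blue, Periwinkle somewhat):
y = [0.0, 2.0, 1.5]                 # Rose, Navy, Periwinkle

# Rank-1 product: every user-color association is explained via blue-ness.
pred = [[xi * yj for yj in y] for xi in x]

# Alice's predicted liking for Navy is strong (2.0 * 2.0 = 4.0), but the
# factorization also predicts she likes Periwinkle (2.0 * 1.5 = 3.0),
# an association that was never observed: information loss at work.
```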
  6. Alternating Least Squares
  •  Collaborative Filtering for Implicit Feedback Datasets: www2.research.att.com/~yifanhu/PUB/cf.pdf
  •  R = matrix of user-item interaction "strengths"
  •  P = R reduced to 0 and 1
  •  Factor as approximate P ≈ X•Y'
  •  Start with random Y
  •  Compute X such that X•Y' best approximates P under the Frobenius / L2 norm (the "least squares")
  •  Repeat for Y (the "alternating")
  •  Iterate, iterate, iterate
  •  Large values in X•Y' are good recommendations
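The alternating loop can be sketched in plain Python. This is a deliberately simplified variant: it does ordinary L2-regularized least squares on P, without the per-entry confidence weighting that the Hu et al. paper adds, and the helper names (`solve`, `least_squares_side`, `als`) plus the `lam` and `iters` values are illustrative choices, not from the talk:

```python
import random

def solve(A, b):
    """Solve the small k-by-k system A x = b by Gaussian elimination."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy of A|b
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivot
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares_side(P, Y, lam):
    """Fix Y; for each row p of P solve (Y'•Y + lam*I) x = Y'•p."""
    k = len(Y[0])
    YtY = [[sum(row[a] * row[b] for row in Y) + (lam if a == b else 0.0)
            for b in range(k)] for a in range(k)]
    return [solve(YtY, [sum(Y[i][a] * p[i] for i in range(len(p)))
                        for a in range(k)]) for p in P]

def als(P, k=3, lam=0.1, iters=10, seed=0):
    """Alternate: solve for X with Y fixed, then for Y with X fixed."""
    rng = random.Random(seed)
    Y = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in P[0]]  # random start
    Pt = [list(col) for col in zip(*P)]            # transpose of P
    for _ in range(iters):
        X = least_squares_side(P, Y, lam)          # the "least squares" step
        Y = least_squares_side(Pt, X, lam)         # the "alternating" step
    return X, Y

# The 5x6 matrix P from the example slides:
P = [[1, 1, 1, 0, 0, 0],
     [0, 1, 0, 0, 0, 1],
     [0, 1, 1, 1, 0, 1],
     [0, 1, 0, 0, 0, 1],
     [0, 1, 1, 0, 0, 0]]
X, Y = als(P)
recon = [[sum(x[f] * y[f] for f in range(3)) for y in Y] for x in X]
```

Each half-step solves an independent k-by-k normal-equations system per row, which is one reason ALS parallelizes well on Hadoop: with one factor fixed, every row of the other factor can be computed separately.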
  7. Example

  R (strengths; . = no interaction):
  1  4  3  .  .  .
  .  3  .  .  .  4
  .  3  2  5  .  2
  .  3  .  .  .  5
  .  2  4  .  .  .

  P (R reduced to 0 and 1):
  1  1  1  0  0  0
  0  1  0  0  0  1
  0  1  1  1  0  1
  0  1  0  0  0  1
  0  1  1  0  0  0
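The relationship between the two matrices on this slide is mechanical: P is just R with every observed strength replaced by 1. A small Python check (the R layout is as read off the slide, with 0 standing in for "no interaction"):

```python
# R: user-item interaction strengths from the example slide.
R = [[1, 4, 3, 0, 0, 0],
     [0, 3, 0, 0, 0, 4],
     [0, 3, 2, 5, 0, 2],
     [0, 3, 0, 0, 0, 5],
     [0, 2, 4, 0, 0, 0]]

# P = R reduced to 0 and 1: any observed interaction becomes a 1.
P = [[1 if v != 0 else 0 for v in row] for row in R]
```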
  8. k = 3, λ = 2, α = 40, after 1 iteration

  P ≈ X • Y'

  P:
  1 1 1 0 0 0
  0 1 0 0 0 1
  0 1 1 1 0 1
  0 1 0 0 0 1
  0 1 1 0 0 0

  X (5•3):
   0.43  0.48  0.48
   0.16  0.10 -0.27
   0.39 -0.13  0.03
   0.05 -0.03 -0.09
  -0.13 -0.47 -0.47

  Y' (3•6):
  2.18 -0.01  0.35  1.83 -0.11 -0.68
  0.79  1.15 -1.80  0.97 -1.90 -2.12
  1.01 -0.25 -1.77  2.33 -8.00  1.06
  9. k = 3, λ = 2, α = 40, after 1 iteration

  P ≈ X•Y'

  X•Y':
  0.94  1.00  1.00  0.18  0.07  0.84
  0.89  0.99  0.60  0.50  0.07  0.99
  0.46  1.01  0.98  1.00 -0.09  1.00
  1.08  0.99  0.55  0.54  0.75  0.98
  0.92  1.01  0.99  0.98 -0.13 -0.25
  10. k = 3, λ = 2, α = 40, after 10 iterations

  P ≈ X•Y'

  X•Y':
  0.96  0.99  0.99  0.38  0.93  0.44
  0.39  0.98 -0.11  0.39  0.70  0.99
  0.42  0.98  0.98  1.00  1.04  0.99
  0.44  0.98  0.11  0.51 -0.13  1.00
  0.57  0.97  1.00  0.68  0.47  0.91
  11. BONUS: Folding in New Data
  •  Model building takes time
  •  Sometimes need immediate, if approximate, updates for new data
  •  For new user u, need a new row Xu such that Xu•Y' = Qu, but we only have Pu
  •  What is Xu? Apply some right inverse: Xu•Y'•(Y')⁻¹ = Qu•(Y')⁻¹, so Xu = Qu•(Y')⁻¹
  •  OK, but what is (Y')⁻¹? Of course (Y'•Y)•(Y'•Y)⁻¹ = I
  •  So Y'•(Y•(Y'•Y)⁻¹) = I, and the right inverse is Y•(Y'•Y)⁻¹
  •  Xu = Qu•Y•(Y'•Y)⁻¹, and so Xu ≈ Pu•Y•(Y'•Y)⁻¹
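The fold-in recipe reduces to a few lines. A minimal sketch in plain Python, assuming Y is stored as n item rows of k features; `fold_in`, `solve`, and the example numbers are illustrative, not from the talk:

```python
def solve(A, b):
    """Solve the small k-by-k system A x = b by Gaussian elimination."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy of A|b
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivot
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fold_in(p_u, Y):
    """New user row Xu ≈ Pu • Y • (Y'•Y)^-1, without refactoring the model.

    Y is the item-factor matrix, a list of n rows of k features.
    Since (Y'•Y)^-1 is symmetric, this equals solving (Y'•Y) x = Y'•p_u.
    """
    k = len(Y[0])
    YtY = [[sum(row[a] * row[b] for row in Y) for b in range(k)]
           for a in range(k)]
    Ytp = [sum(Y[i][a] * p_u[i] for i in range(len(p_u))) for a in range(k)]
    return solve(YtY, Ytp)

# Illustrative check with made-up numbers: if p_u already lies in the row
# space of Y' (p_u = x_true • Y'), fold-in recovers x_true exactly.
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 items, k = 2
x_true = [2.0, 3.0]
p_u = [sum(x_true[f] * Y[i][f] for f in range(2)) for i in range(3)]
x_u = fold_in(p_u, Y)
```

Solving the k-by-k normal equations (Y'•Y)x = Y'•p_u is numerically equivalent to multiplying by the right inverse Y•(Y'•Y)⁻¹, but avoids forming the inverse explicitly.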
  12. In Mahout
  •  org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob
     •  Alternating least squares
     •  Distributed, Hadoop-based
  •  org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender
     •  SVD-based
     •  Non-distributed, not Hadoop
  •  MAHOUT-737: alternate implementation of alternating least squares
  •  And more: DistributedLanczosSolver, SequentialOutOfCoreSvd, …
  13. Myrrix
  •  Complete product
  •  Real-time Serving Layer
  •  Hadoop-based Computation Layer
  •  Tuned, documented
  •  Free / open: Serving Layer, for small data
  •  Commercial: add Computation Layer for big data; hosting
  •  Matrix factorization-based, with attractive properties
  •  http://myrrix.com