
Simple Matrix Factorization for Recommendation in Mahout


Presentation by Sean Owen, core Mahout committer, author of Mahout in Action, and founder of Myrrix, at Data Science London, 23/05/12

Data Science London

July 03, 2012

Transcript

  1. Apache Mahout
  •  Scalable machine learning
  •  (Mostly) Hadoop-based
  •  Clustering, classification and recommender engines
     •  Nearest-neighbor: user-based, item-based, slope-one, clustering-based
     •  Latent factor: SVD-based, ALS
  •  More! mahout.apache.org
  2. Matrix = Associations

           Rose  Navy  Olive
  Alice      0    +4     0
  Bob        0     0    +2
  Carol     -1     0    -2
  Dave      +3     0     0

  •  Things are associated, like people to colors
  •  Associations have strengths, like preferences and dislikes
  •  Can quantify associations: Alice loves navy = +4, Carol dislikes olive = -2
  •  We don't know all associations: many implicit zeroes
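The association matrix above can be sketched as a sparse map, where absent entries stand in for the implicit zeroes. This is a minimal Python illustration, not from the talk; the `prefs` and `strength` names are made up:

```python
# Sparse map of the people-to-colors associations from the slide.
# Missing entries are the "implicit zeroes": associations we do not know.
prefs = {
    "Alice": {"Navy": +4},
    "Bob":   {"Olive": +2},
    "Carol": {"Rose": -1, "Olive": -2},
    "Dave":  {"Rose": +3},
}

def strength(person, color):
    """Look up an association strength; unknown pairs default to 0."""
    return prefs.get(person, {}).get(color, 0)
```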
  3. From One Matrix, Two
  •  Like numbers, matrices can be factored
  •  m•n matrix = m•k times k•n: P (m•n) ≈ X (m•k) • Y' (k•n)
  •  Associations can decompose into others
  •  Alice likes navy = Alice loves blues, and blues includes navy
  4. In Terms of Few Features
  •  Can explain associations by appealing to underlying intermediate features (e.g. "blue-ness")
  •  Relatively few features (one "blue-ness", but many shades)
  [Diagram: (Alice) → (Blue) → (Navy)]
  5. Losing Information is Helpful
  •  When k (= number of features) is small, information is lost
  •  Factorization is approximate (Alice appears to like blue-ish periwinkle too)
  [Diagram: (Alice) → (Blue) → (Navy), (Periwinkle)]
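The periwinkle effect can be seen with a toy rank-1 factorization. The numbers below are hypothetical, chosen only to show how a single "blue-ness" feature makes Alice appear to like every blue-ish shade:

```python
# Hypothetical rank-1 (k = 1) factors: one "blue-ness" feature.
# Users' affinity for blue-ness (Alice high, the others none):
x = [2.0, 0.0, 0.0, 0.0]            # Alice, Bob, Carol, Dave
# Colors' amount of blue-ness (Navy very blue, Periwinkle somewhat):
y = [0.0, 2.0, 1.5]                 # Rose, Navy, Periwinkle

# Rank-1 product: every user-color association is explained via blue-ness.
pred = [[xi * yj for yj in y] for xi in x]

# Alice's predicted liking for Navy is strong (2.0 * 2.0 = 4.0), but the
# factorization also predicts she likes Periwinkle (2.0 * 1.5 = 3.0),
# an association that was never observed: information loss at work.
```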
  6. Alternating Least Squares
  •  Collaborative Filtering for Implicit Feedback Datasets: www2.research.att.com/~yifanhu/PUB/cf.pdf
  •  R = matrix of user-item interaction "strengths"
  •  P = R reduced to 0 and 1
  •  Factor as approximate P ≈ X•Y'
  •  Start with random Y
  •  Compute X such that X•Y' best approximates P under the Frobenius / L2 norm (the "least squares")
  •  Repeat for Y (the "alternating")
  •  Iterate, iterate, iterate
  •  Large values in X•Y' are good recommendations
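The alternating loop can be sketched in plain Python. This is a deliberately simplified variant: it does ordinary L2-regularized least squares on P, without the per-entry confidence weighting that the Hu et al. paper adds, and the helper names (`solve`, `least_squares_side`, `als`) plus the `lam` and `iters` values are illustrative choices, not from the talk:

```python
import random

def solve(A, b):
    """Solve the small k-by-k system A x = b by Gaussian elimination."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy of A|b
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivot
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares_side(P, Y, lam):
    """Fix Y; for each row p of P solve (Y'•Y + lam*I) x = Y'•p."""
    k = len(Y[0])
    YtY = [[sum(row[a] * row[b] for row in Y) + (lam if a == b else 0.0)
            for b in range(k)] for a in range(k)]
    return [solve(YtY, [sum(Y[i][a] * p[i] for i in range(len(p)))
                        for a in range(k)]) for p in P]

def als(P, k=3, lam=0.1, iters=10, seed=0):
    """Alternate: solve for X with Y fixed, then for Y with X fixed."""
    rng = random.Random(seed)
    Y = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in P[0]]  # random start
    Pt = [list(col) for col in zip(*P)]            # transpose of P
    for _ in range(iters):
        X = least_squares_side(P, Y, lam)          # the "least squares" step
        Y = least_squares_side(Pt, X, lam)         # the "alternating" step
    return X, Y

# The 5x6 matrix P from the example slides:
P = [[1, 1, 1, 0, 0, 0],
     [0, 1, 0, 0, 0, 1],
     [0, 1, 1, 1, 0, 1],
     [0, 1, 0, 0, 0, 1],
     [0, 1, 1, 0, 0, 0]]
X, Y = als(P)
recon = [[sum(x[f] * y[f] for f in range(3)) for y in Y] for x in X]
```

Each half-step solves an independent k-by-k normal-equations system per row, which is one reason ALS parallelizes well on Hadoop: with one factor fixed, every row of the other factor can be computed separately.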
  7. Example

  R (strengths; . = no interaction):
  1  4  3  .  .  .
  .  3  .  .  .  4
  .  3  2  5  .  2
  .  3  .  .  .  5
  .  2  4  .  .  .

  P (R reduced to 0 and 1):
  1  1  1  0  0  0
  0  1  0  0  0  1
  0  1  1  1  0  1
  0  1  0  0  0  1
  0  1  1  0  0  0
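The relationship between the two matrices on this slide is mechanical: P is just R with every observed strength replaced by 1. A small Python check (the R layout is as read off the slide, with 0 standing in for "no interaction"):

```python
# R: user-item interaction strengths from the example slide.
R = [[1, 4, 3, 0, 0, 0],
     [0, 3, 0, 0, 0, 4],
     [0, 3, 2, 5, 0, 2],
     [0, 3, 0, 0, 0, 5],
     [0, 2, 4, 0, 0, 0]]

# P = R reduced to 0 and 1: any observed interaction becomes a 1.
P = [[1 if v != 0 else 0 for v in row] for row in R]
```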
  8. k = 3, λ = 2, α = 40, after 1 iteration

  P ≈ X • Y'

  P:
  1 1 1 0 0 0
  0 1 0 0 0 1
  0 1 1 1 0 1
  0 1 0 0 0 1
  0 1 1 0 0 0

  X (5•3):
   0.43  0.48  0.48
   0.16  0.10 -0.27
   0.39 -0.13  0.03
   0.05 -0.03 -0.09
  -0.13 -0.47 -0.47

  Y' (3•6):
  2.18 -0.01  0.35  1.83 -0.11 -0.68
  0.79  1.15 -1.80  0.97 -1.90 -2.12
  1.01 -0.25 -1.77  2.33 -8.00  1.06
  9. k = 3, λ = 2, α = 40, after 1 iteration

  P ≈ X•Y'

  X•Y':
  0.94  1.00  1.00  0.18  0.07  0.84
  0.89  0.99  0.60  0.50  0.07  0.99
  0.46  1.01  0.98  1.00 -0.09  1.00
  1.08  0.99  0.55  0.54  0.75  0.98
  0.92  1.01  0.99  0.98 -0.13 -0.25
  10. k = 3, λ = 2, α = 40, after 10 iterations

  P ≈ X•Y'

  X•Y':
  0.96  0.99  0.99  0.38  0.93  0.44
  0.39  0.98 -0.11  0.39  0.70  0.99
  0.42  0.98  0.98  1.00  1.04  0.99
  0.44  0.98  0.11  0.51 -0.13  1.00
  0.57  0.97  1.00  0.68  0.47  0.91
  11. BONUS: Folding in New Data
  •  Model building takes time
  •  Sometimes need immediate, if approximate, updates for new data
  •  For new user u, need a new row Xu such that Xu•Y' = Qu, but we only have Pu
  •  What is Xu? Apply some right inverse: Xu•Y'•(Y')⁻¹ = Qu•(Y')⁻¹, so Xu = Qu•(Y')⁻¹
  •  OK, but what is (Y')⁻¹? Of course (Y'•Y)•(Y'•Y)⁻¹ = I
  •  So Y'•(Y•(Y'•Y)⁻¹) = I, and the right inverse is Y•(Y'•Y)⁻¹
  •  Xu = Qu•Y•(Y'•Y)⁻¹, and so Xu ≈ Pu•Y•(Y'•Y)⁻¹
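The fold-in recipe reduces to a few lines. A minimal sketch in plain Python, assuming Y is stored as n item rows of k features; `fold_in`, `solve`, and the example numbers are illustrative, not from the talk:

```python
def solve(A, b):
    """Solve the small k-by-k system A x = b by Gaussian elimination."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy of A|b
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivot
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fold_in(p_u, Y):
    """New user row Xu ≈ Pu • Y • (Y'•Y)^-1, without refactoring the model.

    Y is the item-factor matrix, a list of n rows of k features.
    Since (Y'•Y)^-1 is symmetric, this equals solving (Y'•Y) x = Y'•p_u.
    """
    k = len(Y[0])
    YtY = [[sum(row[a] * row[b] for row in Y) for b in range(k)]
           for a in range(k)]
    Ytp = [sum(Y[i][a] * p_u[i] for i in range(len(p_u))) for a in range(k)]
    return solve(YtY, Ytp)

# Illustrative check with made-up numbers: if p_u already lies in the row
# space of Y' (p_u = x_true • Y'), fold-in recovers x_true exactly.
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 items, k = 2
x_true = [2.0, 3.0]
p_u = [sum(x_true[f] * Y[i][f] for f in range(2)) for i in range(3)]
x_u = fold_in(p_u, Y)
```

Solving the k-by-k normal equations (Y'•Y)x = Y'•p_u is numerically equivalent to multiplying by the right inverse Y•(Y'•Y)⁻¹, but avoids forming the inverse explicitly.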
  12. In Mahout
  •  org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob
     •  Alternating least squares
     •  Distributed, Hadoop-based
  •  org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender
     •  SVD-based
     •  Non-distributed, not Hadoop
  •  MAHOUT-737: alternate implementation of alternating least squares
  •  And more: DistributedLanczosSolver, SequentialOutOfCoreSvd, …
  13. Myrrix
  •  Complete product
  •  Real-time Serving Layer
  •  Hadoop-based Computation Layer
  •  Tuned, documented
  •  Free / open: Serving Layer, for small data
  •  Commercial: add Computation Layer for big data; hosting
  •  Matrix factorization-based, with attractive properties
  •  http://myrrix.com