Collaborative Filtering

forLoop
August 22, 2016

Umar Farouq presented on collaborative filtering at the forLoop Machine Learning-themed event.

Transcript

  1. Who am I?
    • Farouq Oyebiyi
    • Machine Learning Dude at Konga
    • Focused on recommendations and personalization
    • Working on product recommendations at Konga
  2. Recommender Systems (RecSys)?
    • Information systems that predict a user’s preference for an item or list of items
    • Use cases:
      ◦ Product recommendations on Amazon
      ◦ Music recommendations on Spotify
      ◦ Book recommendations on Goodreads
      ◦ Job recommendations on Jobberman?
      ◦ Hotel recommendations on Hotels.ng?
      ◦ Friend recommendations on Facebook
  3. Why do we need RecSys?
    • Lots more information than we have time to go through
    • Finding the perfect movie/product is a “needle in a haystack” problem
    [Figure labels: 30M songs, 200M items]
  4. Collaborative Filtering (CF)
    • Predict user preference based on behaviour
    • Behaviour includes:
      ◦ Purchase history - implicit feedback
      ◦ Listening history - implicit feedback
      ◦ Likes - implicit feedback
      ◦ Shares - implicit feedback
      ◦ Review/rating - explicit feedback
  5. Formulating the CF problem
    • Let U be the set of all users
    • Let V be the set of all items
    • R is a |U| by |V| matrix
    Goal: Predict the value of the empty cells.

                 Super Story   Jennifer’s Diaries   Papa Ajasco   Saworoide
      Gibran          ?                1                 ?            ?
      Adichie         ?                ?                 1            ?
      Fajuyi          1                ?                 1            ?
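
The matrix on this slide can be built from a plain log of user-item interactions. The sketch below is only an illustration: the `interactions` list and the `build_matrix` helper are hypothetical names, and `None` stands in for the "?" cells that the recommender is meant to predict.

```python
# Hypothetical interaction log, copied from the filled cells of the slide's example.
interactions = [
    ("Gibran", "Jennifer's Diaries"),
    ("Adichie", "Papa Ajasco"),
    ("Fajuyi", "Super Story"),
    ("Fajuyi", "Papa Ajasco"),
]
users = ["Gibran", "Adichie", "Fajuyi"]
items = ["Super Story", "Jennifer's Diaries", "Papa Ajasco", "Saworoide"]


def build_matrix(interactions, users, items):
    """Return R as a list of rows: 1 = observed interaction, None = empty cell."""
    user_index = {user: row for row, user in enumerate(users)}
    item_index = {item: col for col, item in enumerate(items)}
    R = [[None] * len(items) for _ in users]
    for user, item in interactions:
        R[user_index[user]][item_index[item]] = 1
    return R


R = build_matrix(interactions, users, items)

# The cells a recommender has to fill in.
empty_cells = [
    (user, item)
    for row, user in enumerate(users)
    for col, item in enumerate(items)
    if R[row][col] is None
]
```

With 3 users and 4 items there are 12 cells, 4 of which are observed, so `empty_cells` lists the remaining 8 user-item pairs.
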
  6. Matrix Factorization (MF)
    • Given the matrix R, find two matrices (X, Y) whose product approximates R
    • X and Y share a dimension I, which is specified by the user or determined via cross-validation
    • I is the number of latent factors in each matrix
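
As a rough illustration of the idea, the sketch below factorizes a small ratings matrix with plain stochastic gradient descent over the observed cells. The slides do not say which solver was used, so the solver, the toy matrix, and every hyperparameter here (`n_factors`, `steps`, `lr`, `reg`, `seed`) are assumptions chosen for the example.

```python
import random


def factorize(R, n_factors=2, steps=2000, lr=0.02, reg=0.02, seed=0):
    """Learn user factors X and item factors Y so that X[u] . Y[i] ~ R[u][i]."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    X = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Y = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    # Only observed cells contribute to the loss; None marks an empty cell.
    observed = [
        (u, i, R[u][i])
        for u in range(n_users)
        for i in range(n_items)
        if R[u][i] is not None
    ]
    for _ in range(steps):
        for u, i, rating in observed:
            err = rating - sum(xf * yf for xf, yf in zip(X[u], Y[i]))
            for f in range(n_factors):
                xu, yi = X[u][f], Y[i][f]
                # Gradient step on squared error with L2 regularization.
                X[u][f] += lr * (err * yi - reg * xu)
                Y[i][f] += lr * (err * xu - reg * yi)
    return X, Y


def predict(X, Y, u, i):
    """Predicted value for user u and item i: the dot product of their factors."""
    return sum(xf * yf for xf, yf in zip(X[u], Y[i]))


# Hypothetical 4 x 4 ratings matrix; None marks the cells to predict.
R = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [None, 1, 5, 4],
]
X, Y = factorize(R)
filled = [[predict(X, Y, u, i) for i in range(4)] for u in range(4)]
```

Each row of `X` and `Y` has `n_factors` entries, which is the shared dimension the slide calls I. The entries of `filled` that correspond to `None` cells in `R` are the model's predictions.
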
  7. MF Equation
    • C_ui - confidence level that user u likes item i
    • P_ui - binary value indicating if user u has interacted with item i
    • X_u - latent vector for user u
    • Y_i - latent vector for item i
    • λ - regularization parameter
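
The formula itself is not in the transcript. The symbols listed on this slide match the weighted, regularized least-squares objective commonly used for matrix factorization on implicit feedback, so the slide probably showed something of the following form. Treat it as an assumed reconstruction written with the slide's symbols, where λ denotes the regularization parameter:

```latex
\min_{X,\,Y} \; \sum_{u,\,i} C_{ui} \left( P_{ui} - X_u^{\top} Y_i \right)^2
  + \lambda \left( \sum_{u} \lVert X_u \rVert^2 + \sum_{i} \lVert Y_i \rVert^2 \right)
```

The first term penalizes the gap between the observed preference P_ui and the prediction X_u^T Y_i, weighted by the confidence C_ui. The second term keeps the latent vectors small to limit overfitting.
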
  8. Evaluating MF
    • Root Mean Squared Error (RMSE)
    • Precision/Recall
    • Mean Average Precision at k (MAP@k)
    • Normalized Discounted Cumulative Gain (NDCG)
    Ground truth (y) = [1, 0.5, 1, 0.75, 1, 0.2]
    Predicted rating (ŷ) = [0.8, 0.45, 0.2, 0.5, 1, 0.3]
    RMSE = sqrt(mean((y - ŷ)^2)) ≈ 0.355
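
The RMSE for the two vectors on this slide can be checked with a few lines of code. This is a minimal sketch; the `rmse` helper is a hypothetical name, not something from the talk.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Values taken from the slide.
y = [1, 0.5, 1, 0.75, 1, 0.2]
y_hat = [0.8, 0.45, 0.2, 0.5, 1, 0.3]

score = rmse(y, y_hat)
print(round(score, 4))  # 0.3547
```

The squared errors sum to 0.755, so the mean is about 0.1258 and its square root is about 0.3547. Most of that error comes from the third item, where the prediction of 0.2 misses a true value of 1.
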
  9. Advantages of CF
    • Does not require side information to make recommendations
    • Very good at discovery; serendipity
    • Can capture subtle preferences