
Collaborative Filtering

forLoop
August 22, 2016

Umar Farouq presented on collaborative filtering at the forLoop machine-learning-themed event.


Transcript

  1. Introduction to Collaborative Filtering For Recommender Systems

  2. Who am I?

    • Farouq Oyebiyi
    • Machine Learning Dude at Konga
    • Focused on recommendations and personalization
    • Working on product recommendations at Konga
  3. Recommender Systems (RecSys)?

    • Information systems that predict a user’s preference for an item or a list of items
    • Use cases:
      ◦ Product recommendations on Amazon
      ◦ Music recommendations on Spotify
      ◦ Book recommendations on Goodreads
      ◦ Job recommendations on Jobberman?
      ◦ Hotel recommendations on Hotels.ng?
      ◦ Friend recommendations on Facebook
  4. Why do we need RecSys?

  5. Why do we need RecSys?

    • There is far more information than we have time to go through
    • Finding the perfect movie/product is a “needle in a haystack” problem

    Figure labels: 30M songs, 200M items
  6. How to build a RecSys?

    • Approaches:
      ◦ Content-based filtering
      ◦ Collaborative filtering
      ◦ Hybrid
  7. Collaborative Filtering (CF)

    • Predict user preference based on behaviour
    • Behaviour includes:
      ◦ Purchase history: implicit feedback
      ◦ Listening history: implicit feedback
      ◦ Likes: implicit feedback
      ◦ Shares: implicit feedback
      ◦ Review/rating: explicit feedback
  8. Formulating the CF problem

    • Let U be the set of all users
    • Let V be the set of all items
    • R is a |U| by |V| matrix with one row per user and one column per item

    Goal: Predict the value of the empty cells.

                Super Story   Jennifer’s Diaries   Papa Ajasco   Saworoide
    Gibran           ?                1                 ?            ?
    Adichie          ?                ?                 1            ?
    Fajuyi           1                ?                 1            ?
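
As an illustration of the formulation on this slide, here is a minimal sketch that builds the matrix R from the example table. The function name `build_interaction_matrix` and the choice of NaN to mark the empty ("?") cells are assumptions made for this example, not something the talk specifies.

```python
import numpy as np

# Users and items taken from the example table on the slide.
users = ["Gibran", "Adichie", "Fajuyi"]
items = ["Super Story", "Jennifer's Diaries", "Papa Ajasco", "Saworoide"]

# Observed (user, item) pairs: the cells marked 1 in the table.
interactions = [
    ("Gibran", "Jennifer's Diaries"),
    ("Adichie", "Papa Ajasco"),
    ("Fajuyi", "Super Story"),
    ("Fajuyi", "Papa Ajasco"),
]


def build_interaction_matrix(users, items, interactions):
    """Return a |U| x |V| matrix: 1.0 for observed cells, NaN for unknown ones."""
    user_index = {user: row for row, user in enumerate(users)}
    item_index = {item: col for col, item in enumerate(items)}
    R = np.full((len(users), len(items)), np.nan)
    for user, item in interactions:
        R[user_index[user], item_index[item]] = 1.0
    return R


R = build_interaction_matrix(users, items, interactions)
print(R)
```

The NaN cells are the ones a recommender would try to fill in; a real system would normally use a sparse representation rather than a dense array.
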
  9. Collaborative Filtering Techniques

    • Techniques:
      ◦ Memory-based: nearest neighbour, Pearson’s correlation
      ◦ Model-based: matrix factorization
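
To make the memory-based family concrete, the following is a rough sketch of user-based nearest-neighbour prediction with Pearson's correlation. The ratings matrix, the function names `pearson` and `predict`, the choice of `k`, and the mean-centred weighting are all illustrative assumptions; the talk does not prescribe a particular variant.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation over the items both users have rated (NaN = unrated)."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < 2:
        return 0.0
    da = a[both] - a[both].mean()
    db = b[both] - b[both].mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float((da * db).sum() / denom) if denom > 0 else 0.0


def predict(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated the item."""
    user_mean = np.nanmean(R[user])
    neighbours = []
    for other in range(R.shape[0]):
        if other != user and not np.isnan(R[other, item]):
            sim = pearson(R[user], R[other])
            if sim > 0:  # keep only positively correlated users
                neighbours.append((sim, other))
    neighbours.sort(reverse=True)
    top = neighbours[:k]
    if not top:
        return float(user_mean)  # fall back to the user's own average
    num = sum(sim * (R[o, item] - np.nanmean(R[o])) for sim, o in top)
    den = sum(sim for sim, _ in top)
    return float(user_mean + num / den)


# Hypothetical 1-5 ratings; NaN marks an unrated item.
ratings = np.array([
    [5.0, 4.0, 1.0, np.nan],
    [4.0, 5.0, 2.0, 5.0],
    [1.0, 2.0, 5.0, 1.0],
])
prediction = predict(ratings, user=0, item=3)
print(round(prediction, 3))
```

In this toy data the first user agrees with the second and disagrees with the third, so only the second user's rating influences the prediction.
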
  10. Matrix Factorization (MF)

    • Given a matrix R, find two matrices (X, Y) whose product approximates R
    • X and Y share a dimension I, which is specified by the user or determined via cross-validation
    • I is the number of latent factors in each matrix
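
A small sketch of the idea on this slide: learn two low-dimensional factor matrices whose product matches the known cells of the ratings matrix. Stochastic gradient descent, the hyperparameter values, the toy ratings, and the name `factorize` are assumptions chosen for illustration; `n_factors` plays the role of the latent dimension described above.

```python
import numpy as np


def factorize(R, n_factors=2, n_epochs=500, lr=0.01, reg=0.02, seed=0):
    """Fit X (users x factors) and Y (items x factors) so that X @ Y.T
    approximates the observed (non-NaN) cells of R."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, n_factors))
    Y = rng.normal(scale=0.1, size=(n_items, n_factors))
    observed = [
        (u, i)
        for u in range(n_users)
        for i in range(n_items)
        if not np.isnan(R[u, i])
    ]
    for _ in range(n_epochs):
        for u, i in observed:
            err = R[u, i] - X[u] @ Y[i]
            x_u = X[u].copy()  # use the pre-update user vector for the item step
            X[u] += lr * (err * Y[i] - reg * X[u])
            Y[i] += lr * (err * x_u - reg * Y[i])
    return X, Y


# Hypothetical 1-5 ratings; NaN marks a cell to be predicted.
toy_ratings = np.array([
    [5.0, 4.0, 1.0, np.nan],
    [4.0, 5.0, 2.0, 5.0],
    [1.0, 2.0, 5.0, 1.0],
])
X, Y = factorize(toy_ratings)
R_hat = X @ Y.T  # dense matrix of predictions, including the missing cell
print(R_hat.round(2))
```

Choosing the number of factors by cross-validation would mean repeating this fit for several values of `n_factors` and keeping the one with the lowest error on held-out cells.
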
  11. MF Equation

    • C_ui: confidence level that user u likes item i
    • P_ui: binary value indicating whether user u has interacted with item i
    • X_u: latent vector for user u
    • Y_i: latent vector for item i
    • λ: regularization parameter
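
The equation itself did not survive in the transcript. The symbols listed on this slide match the widely used confidence-weighted objective for implicit feedback, so the following is a plausible reconstruction under that assumption rather than a confirmed copy of the slide. Here λ stands for the regularization parameter.

```latex
\min_{X,\,Y} \;
\sum_{u,\,i} C_{ui} \left( P_{ui} - X_u^{\top} Y_i \right)^2
\; + \; \lambda \left( \sum_{u} \lVert X_u \rVert^2 + \sum_{i} \lVert Y_i \rVert^2 \right)
```

The first term penalizes the gap between the binary preference and the predicted score, weighted by confidence; the second term keeps the latent vectors small to limit overfitting.
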
  12. Evaluating MF

    • Root Mean Squared Error (RMSE)
    • Precision/Recall
    • Mean Average Precision at k (MAP@k)
    • Normalized Discounted Cumulative Gain (NDCG)

    Ground truth (y) = [1, 0.5, 1, 0.75, 1, 0.2]
    Predicted rating (ŷ) = [0.8, 0.45, 0.2, 0.5, 1, 0.3]
    RMSE = sqrt(mean((y − ŷ)²)) ≈ 0.355
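
The RMSE for the example vectors on this slide can be checked with a few lines of code. This is a plain-Python sketch; the function name `rmse` is just a label chosen for the example.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


y = [1, 0.5, 1, 0.75, 1, 0.2]          # ground truth from the slide
y_hat = [0.8, 0.45, 0.2, 0.5, 1, 0.3]  # predicted ratings from the slide
score = rmse(y, y_hat)
print(round(score, 3))  # 0.355
```

Most of the error comes from the third item, where the prediction of 0.2 is far from the true value of 1.
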
  13. Advantages of CF

    • Does not require side information to make recommendations
    • Very good at discovery (serendipity)
    • Can capture subtle preferences
  14. Disadvantages of CF

    • Cold start problem
    • Accuracy is low when there’s limited data
  15. Matrix Factorization Code Sample http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#collaborative-filtering
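
The page linked on this slide documents Spark's ALS-based collaborative filtering. For readers who want to see the mechanics without a Spark installation, here is a rough NumPy sketch of alternating least squares for a confidence-weighted objective, using the C, P, X, Y symbols from the MF Equation slide. The linear confidence `C = 1 + alpha * counts`, the values of `alpha` and `reg`, and the function name `implicit_als` are assumptions for this example; it is not Spark's implementation.

```python
import numpy as np


def implicit_als(counts, n_factors=2, alpha=40.0, reg=0.1, n_iters=10, seed=0):
    """Alternating least squares on interaction counts (0 = no interaction).

    Returns user factors X and item factors Y; X @ Y.T gives preference scores.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = counts.shape
    P = (counts > 0).astype(float)  # binary preference P_ui
    C = 1.0 + alpha * counts        # confidence C_ui (assumed linear form)
    X = rng.normal(scale=0.1, size=(n_users, n_factors))
    Y = rng.normal(scale=0.1, size=(n_items, n_factors))
    ridge = reg * np.eye(n_factors)

    def solve(fixed, conf, pref):
        # Row-wise weighted ridge regression:
        # (F^T diag(c) F + reg * I)^-1 F^T diag(c) p
        out = np.empty((conf.shape[0], n_factors))
        for r in range(conf.shape[0]):
            weighted = fixed * conf[r][:, None]  # diag(c) @ F
            A = fixed.T @ weighted + ridge
            b = weighted.T @ pref[r]
            out[r] = np.linalg.solve(A, b)
        return out

    for _ in range(n_iters):
        X = solve(Y, C, P)      # update users with items fixed
        Y = solve(X, C.T, P.T)  # update items with users fixed
    return X, Y


# Interaction counts mirroring the example table (rows: Gibran, Adichie, Fajuyi).
counts = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
])
X, Y = implicit_als(counts)
scores = X @ Y.T
print(scores.round(2))
```

Each half-step has a closed-form solution because fixing one factor matrix turns the objective into an independent ridge regression per user or per item, which is what makes ALS easy to parallelize.
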

  16. Value of RecSys

  17. Value of RecSys cont’d

    • Engage users
    • Increase conversion
    • Better UX
  18. Resources

    • http://www.slideshare.net/MrChrisJohnson/algorithmic-music-recommendations-at-spotify
    • http://www.slideshare.net/xamat/recommender-systems-machine-learning-summer-school-2014-cmu
    • http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html

  19. I recommend you ask questions. Thank you