Slide 1

Slide 1 text

Introduction to Collaborative Filtering For Recommender Systems

Slide 2

Slide 2 text

Who am I?
● Farouq Oyebiyi
● Machine Learning Dude at Konga
● Focused on recommendations and personalization
● Working on product recommendations at Konga

Slide 3

Slide 3 text

Recommender Systems (RecSys)?
● Information systems that predict a user’s preference for an item or list of items
● Use cases:
  ○ Product recommendations on Amazon
  ○ Music recommendations on Spotify
  ○ Book recommendations on Goodreads
  ○ Job recommendations on Jobberman?
  ○ Hotel recommendations on Hotels.ng?
  ○ Friend recommendations on Facebook

Slide 4

Slide 4 text

Why do we need RecSys?

Slide 5

Slide 5 text

Why do we need RecSys?
● Lots more information than we have time to go through
● Finding the perfect movie/product is a “needle in the haystack” problem
(Figure labels: 30M songs, 200M items)

Slide 6

Slide 6 text

How to build a RecSys?
● Approaches:
  ○ Content-based filtering
  ○ Collaborative filtering
  ○ Hybrid

Slide 7

Slide 7 text

Collaborative Filtering (CF)
● Predict user preference based on behaviour
● Behaviour includes:
  ○ Purchase history - implicit feedback
  ○ Listening history - implicit feedback
  ○ Likes - implicit feedback
  ○ Shares - implicit feedback
  ○ Review/rating - explicit feedback

Slide 8

Slide 8 text

Formulating the CF problem
● Let U be the set of all users
● Let V be the set of all items
● R is a |U| by |V| matrix with one row per user and one column per item

          Super Story   Jennifer’s Diaries   Papa Ajasco   Saworoide
Gibran    ?             1                    ?             ?
Adichie   ?             ?                    1             ?
Fajuyi    1             ?                    1             ?

Goal: Predict the value of the empty cells.
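As a rough illustration of the formulation above, the table can be built from a list of observed (user, item) pairs. The names `interactions` and `build_matrix` are hypothetical and exist only for this sketch; `None` stands in for the "?" cells.

```python
# Users and items taken from the slide's table.
users = ["Gibran", "Adichie", "Fajuyi"]
items = ["Super Story", "Jennifer's Diaries", "Papa Ajasco", "Saworoide"]

# Observed (user, item) pairs; every other cell is unknown.
interactions = [
    ("Gibran", "Jennifer's Diaries"),
    ("Adichie", "Papa Ajasco"),
    ("Fajuyi", "Super Story"),
    ("Fajuyi", "Papa Ajasco"),
]

def build_matrix(users, items, interactions):
    """Return R as a list of rows; None marks an empty cell to be predicted."""
    user_index = {u: n for n, u in enumerate(users)}
    item_index = {v: n for n, v in enumerate(items)}
    R = [[None] * len(items) for _ in users]
    for user, item in interactions:
        R[user_index[user]][item_index[item]] = 1
    return R

R = build_matrix(users, items, interactions)
for user, row in zip(users, R):
    print(user, ["?" if cell is None else cell for cell in row])
```

In this toy matrix 8 of the 12 cells are empty; at real scale R is far sparser than that.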

Slide 9

Slide 9 text

Collaborative Filtering Techniques
● Techniques
  ○ Memory-based: nearest neighbour, Pearson’s correlation
  ○ Model-based: matrix factorization
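A minimal sketch of the memory-based idea, assuming an item-to-item nearest-neighbour approach on the slide 8 table with unknown cells treated as 0. Cosine similarity is used here for brevity; Pearson's correlation could be swapped in for `cosine`. The helper names are illustrative only.

```python
import math

items = ["Super Story", "Jennifer's Diaries", "Papa Ajasco", "Saworoide"]
R = [
    [0, 1, 0, 0],  # Gibran
    [0, 0, 1, 0],  # Adichie
    [1, 0, 1, 0],  # Fajuyi
]

def column(matrix, j):
    """Return item j's interaction vector across all users."""
    return [row[j] for row in matrix]

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score_items(user_row):
    """Score each unseen item by its summed similarity to the user's seen items."""
    scores = {}
    for j, name in enumerate(items):
        if user_row[j]:
            continue  # skip items the user already interacted with
        scores[name] = sum(
            cosine(column(R, j), column(R, k))
            for k, seen in enumerate(user_row)
            if seen
        )
    return scores

adichie_scores = score_items(R[1])
best = max(adichie_scores, key=adichie_scores.get)
print(best, round(adichie_scores[best], 3))  # Super Story 0.707
```

Adichie is pointed to Super Story because Fajuyi interacted with both Papa Ajasco and Super Story.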

Slide 10

Slide 10 text

Matrix Factorization (MF)
● Given the matrix R, find two matrices (X, Y) whose product approximates R
● X and Y share a dimension I, which is specified by the user or determined via cross-validation
● I is the number of latent factors in each matrix
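To make the shapes concrete, here is a small sketch with 3 users, 4 items and 2 latent factors. The factor values are made up for illustration, not learned from data, and `predict` is a hypothetical helper.

```python
# One row per user, 2 latent factors each (illustrative values only).
X = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]
# One row per item, 2 latent factors each (illustrative values only).
Y = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.0, 0.0],
]

def predict(X, Y):
    """Return the users-by-items matrix of dot products between user and item factors."""
    return [
        [sum(xf * yf for xf, yf in zip(x_u, y_i)) for y_i in Y]
        for x_u in X
    ]

R_hat = predict(X, Y)
print(len(R_hat), "x", len(R_hat[0]))  # 3 x 4
print(R_hat[2])                        # [0.5, 0.5, 1.0, 0.0]
```

Every cell of the product is filled, including the ones that were empty in the input, which is where the predictions come from.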

Slide 11

Slide 11 text

MF Equation
C_ui - confidence level that user u likes item i
P_ui - binary value indicating if user u has interacted with item i
X_u - latent vector for user u
Y_i - latent vector for item i
λ - regularization parameter
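The equation itself did not survive in this transcript. The symbols listed match the standard weighted objective for implicit-feedback matrix factorization, so, assuming the slide showed that form, it would read:

```latex
\min_{X,\,Y} \; \sum_{u,i} C_{ui}\,\bigl(P_{ui} - X_u^{\top} Y_i\bigr)^2
  \;+\; \lambda \Bigl( \sum_u \lVert X_u \rVert^2 + \sum_i \lVert Y_i \rVert^2 \Bigr)
```

The first term penalizes the gap between the observed preference and the predicted score, weighted by confidence; the second term keeps the latent vectors small to limit overfitting.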

Slide 12

Slide 12 text

Evaluating MF
● Root Mean Squared Error (RMSE)
● Precision/Recall
● Mean Average Precision at k (MAP@k)
● Normalized Discounted Cumulative Gain (NDCG)

Ground truth (y) = [1, 0.5, 1, 0.75, 1, 0.2]
Predicted rating (ŷ) = [0.8, 0.45, 0.2, 0.5, 1, 0.3]
RMSE = sqrt((1/n) Σ (y_i - ŷ_i)^2) = sqrt(0.755 / 6) ≈ 0.355
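The RMSE for the slide's example vectors can be checked with a few lines of Python. The function name `rmse` is just a label for this sketch.

```python
import math

y_true = [1, 0.5, 1, 0.75, 1, 0.2]
y_pred = [0.8, 0.45, 0.2, 0.5, 1, 0.3]

def rmse(actual, predicted):
    """Square root of the mean squared difference between two equal-length lists."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

print(round(rmse(y_true, y_pred), 4))  # 0.3547
```

Most of the error comes from the third item, where the prediction is 0.2 against a true value of 1.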

Slide 13

Slide 13 text

Advantages of CF
● Does not require side information to make recommendations
● Very good at discovery; serendipity
● Can capture subtle preferences

Slide 14

Slide 14 text

Disadvantages of CF
● Cold start problem
● Accuracy is low when there’s limited data

Slide 15

Slide 15 text

Matrix Factorization Code Sample
http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#collaborative-filtering
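The linked Spark page has the actual sample. For readers without a Spark setup, the following is a small NumPy sketch of alternating least squares on the slide 8 table. It is not Spark's API: `implicit_als` and `weighted_loss` are hypothetical helpers, and the values for `alpha`, `reg` and `factors` are assumptions chosen for illustration.

```python
import numpy as np

def weighted_loss(P, C, X, Y, reg):
    """Confidence-weighted squared error plus L2 regularization."""
    err = P - X @ Y.T
    return float(np.sum(C * err ** 2) + reg * (np.sum(X ** 2) + np.sum(Y ** 2)))

def implicit_als(P, C, factors=2, reg=0.1, iterations=10, seed=0):
    """Alternate closed-form ridge updates for user factors X and item factors Y."""
    rng = np.random.default_rng(seed)
    n_users, n_items = P.shape
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    ridge = reg * np.eye(factors)
    for _ in range(iterations):
        # Hold Y fixed and solve a small linear system for each user vector.
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + ridge, Y.T @ Cu @ P[u])
        # Hold X fixed and solve a small linear system for each item vector.
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + ridge, X.T @ Ci @ P[:, i])
    return X, Y

# Binary preferences from the slide 8 table (unknown cells treated as 0).
P = np.array([
    [0, 1, 0, 0],  # Gibran
    [0, 0, 1, 0],  # Adichie
    [1, 0, 1, 0],  # Fajuyi
], dtype=float)

alpha = 40.0          # assumed confidence weight for observed interactions
C = 1.0 + alpha * P   # observed cells get high confidence, the rest get 1

X, Y = implicit_als(P, C)
scores = X @ Y.T      # predicted preference for every user-item pair
print(np.round(scores, 2))
```

Each update is an exact least-squares solve with the other factor matrix held fixed, so the weighted loss never increases from one iteration to the next.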

Slide 16

Slide 16 text

Value of RecSys

Slide 17

Slide 17 text

Value of RecSys cont’d
● Engage users
● Increase conversion
● Better UX

Slide 18

Slide 18 text

Resources
● http://www.slideshare.net/MrChrisJohnson/algorithmic-music-recommendations-at-spotify
● http://www.slideshare.net/xamat/recommender-systems-machine-learning-summer-school-2014-cmu
● http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html

Slide 19

Slide 19 text

I recommend you ask questions. Thank you