Agenda
• Why are they important?
• Structure of a recommender
  - Item-item recommendations
  - Top N recommendations
• Types of recommenders
  - Collaborative filtering vs. content-based filtering
• Tutorial using the MovieLens dataset
  - Build an item-item recommender
  - Build a top N recommender (time permitting)
• …who bought this item also bought
• Netflix: "Because you watched this show…"
• OkCupid: Finding your best match
• LinkedIn: Jobs recommended for you
• New York Times: Recommended Articles for You
• Medicine: Facilitating clinical decision making
• GitHub: Repos “based on your interest”
… jam samples vs. 24 jam samples
Initial interest:
• 40% of customers stopped at the limited-choice booth
• 60% of customers stopped at the extensive-choice booth
…recommender system?
An application of machine learning: user preferences → recommender system → recommendations
Main approaches: collaborative filtering and content-based filtering
[Figure: user × item matrix for users John, Jim, Anne, Liz, Erica]
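To make collaborative filtering concrete, here is a minimal item-item sketch. It assumes a tiny hypothetical rating table (only the user names come from the slide; the movie titles and scores are invented) and uses plain cosine similarity between item rating vectors, treating unrated cells as 0. Two items count as similar when the same users rated them alike.

```python
import math

# Hypothetical toy ratings (user -> {item: rating}); titles and scores are
# invented for illustration, not taken from MovieLens or any other dataset.
ratings = {
    "John":  {"Alien": 5, "Up": 1, "Heat": 4},
    "Jim":   {"Alien": 4, "Up": 2, "Heat": 5},
    "Anne":  {"Alien": 1, "Up": 5, "Coco": 5},
    "Liz":   {"Up": 4, "Coco": 4, "Heat": 1},
    "Erica": {"Alien": 5, "Heat": 4, "Coco": 1},
}

def item_vectors(ratings):
    """Pivot user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine(a, b):
    """Cosine similarity of two sparse vectors (dicts); missing keys count as 0."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similar_items(target, ratings, n=2):
    """Return the n items most similar to `target`, most similar first."""
    items = item_vectors(ratings)
    scores = [(other, cosine(items[target], vec))
              for other, vec in items.items() if other != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]

print(similar_items("Alien", ratings))
```

Here "Heat" ranks closest to "Alien" because John, Jim and Erica rated both highly. Plain cosine ignores rating bias; adjusted cosine, which subtracts each user's mean rating first, is a common refinement.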
…are we populating these cells with?
Explicit feedback
• Likert-scale rating (1-5)
• Liked or not (boolean)
Implicit feedback
• Browsing behaviour
• Purchased? Read? Watched?
Developing a user feedback score
• Dwell time
• Recent vs. old interactions
• Negative implicit feedback
• What behaviour are you trying to drive?
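One way to fold the implicit signals above into a single feedback score is a weighted sum with time decay. The sketch is hypothetical throughout: the event names, the weights in `EVENT_WEIGHTS`, the 30-day half-life and the 300-second dwell cap are all assumptions you would tune for the behaviour you are trying to drive.

```python
from dataclasses import dataclass

# Hypothetical event weights (assumptions, not values from any real system).
# A negative weight encodes negative implicit feedback such as a skip.
EVENT_WEIGHTS = {"view": 1.0, "purchase": 5.0, "skip": -2.0}

@dataclass
class Interaction:
    event: str            # one of the keys of EVENT_WEIGHTS
    dwell_seconds: float  # time spent on the item
    age_days: float       # how long ago the interaction happened

def feedback_score(interactions, half_life_days=30.0, dwell_cap=300.0):
    """Combine one user's interactions with one item into a single score.

    - the event type sets the base weight (negative for "skip")
    - dwell time boosts positive events, by at most 2x at dwell_cap seconds
    - each interaction decays exponentially with the given half-life
    """
    score = 0.0
    for it in interactions:
        base = EVENT_WEIGHTS[it.event]
        if base > 0:
            base *= 1.0 + min(it.dwell_seconds, dwell_cap) / dwell_cap
        decay = 0.5 ** (it.age_days / half_life_days)
        score += base * decay
    return score

print(feedback_score([Interaction("view", 120, 2), Interaction("purchase", 0, 10)]))
```

With these assumed weights a skipped item scores below zero, so it can be pushed down the ranking rather than merely ignored.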
user    age  gender  country  lang
…       24   M       CA       EN
Jim     63   F       US       EN
Anne    10   F       CA       FR
Liz     38   F       IT       IT
Erica   45   M       UK       EN
[Figure: Y/N grids of user preferences (family?, horror?) and item genres (scary, funny, family, anime, drama, romance)]
• User features: age, gender, spoken language
• Item features: movie genre, year of release, cast
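A content-based recommender scores items by their features rather than by other users' ratings. The sketch below uses the slide's genre columns as item features; the movie names and their genre flags are hypothetical, and building the user profile by averaging the feature vectors of liked items is one simple assumption among several possible choices.

```python
import math

GENRES = ["scary", "funny", "family", "anime", "drama", "romance"]

# Hypothetical item feature vectors, one flag per entry in GENRES
# (1 = the item has that genre). Names and flags are invented.
ITEMS = {
    "Movie A": [1, 0, 0, 0, 1, 0],
    "Movie B": [0, 1, 1, 1, 0, 0],
    "Movie C": [1, 0, 0, 0, 0, 0],
    "Movie D": [0, 1, 1, 0, 0, 1],
}

def cosine(a, b):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def user_profile(liked):
    """Average the feature vectors of the items the user liked."""
    vecs = [ITEMS[name] for name in liked]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def recommend(liked, n=2):
    """Rank unseen items by similarity to the user's feature profile."""
    profile = user_profile(liked)
    candidates = [(name, cosine(profile, vec))
                  for name, vec in ITEMS.items() if name not in liked]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:n]

print(recommend(["Movie A"]))
```

Because only item features are used, a brand-new item can be recommended as soon as its genres are known, which collaborative filtering cannot do without ratings.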
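• Optimists → rate everything 4 or 5
• Pessimists → rate everything 1 or 2
• Need to normalize ratings by accounting for user and item bias
• Mean normalization
  - subtract the item's mean rating from each rating for a given item
  - subtract the user's mean rating from each rating for a given user
Baseline estimate: $b_{ui} = \mu + b_i + b_u$
  - $b_{ui}$: user-item rating bias (baseline estimate of user u's rating of item i)
  - $\mu$: global average rating
  - $b_i$: item bias, how far item i's average rating sits from $\mu$
  - $b_u$: user bias, how far user u's average rating sits from $\mu$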
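The baseline $b_{ui} = \mu + b_i + b_u$ can be estimated with plain averages. This is a minimal, unregularized sketch on hypothetical toy data (Jim as an optimist, Anne as a pessimist): $b_i$ is the item's mean deviation from $\mu$, and $b_u$ is the user's mean residual after removing $\mu$ and $b_i$. Practical implementations usually add regularization so items or users with few ratings are not over-corrected.

```python
# Hypothetical toy ratings: (user, item, rating). Jim rates high, Anne low.
ratings = [
    ("Jim", "Alien", 5), ("Jim", "Up", 4),
    ("Anne", "Alien", 2), ("Anne", "Up", 1),
    ("Liz", "Alien", 4),
]

def baseline(ratings):
    """Return (mu, b_i, b_u) using simple, unregularized averages."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    # item bias: mean deviation of the item's ratings from the global mean
    dev_i = {}
    for _, i, r in ratings:
        dev_i.setdefault(i, []).append(r - mu)
    b_i = {i: sum(d) / len(d) for i, d in dev_i.items()}
    # user bias: mean residual after removing mu and the item bias
    dev_u = {}
    for u, i, r in ratings:
        dev_u.setdefault(u, []).append(r - mu - b_i[i])
    b_u = {u: sum(d) / len(d) for u, d in dev_u.items()}
    return mu, b_i, b_u

def predict(u, i, mu, b_i, b_u):
    """b_ui = mu + b_i + b_u; unknown users or items contribute 0 bias."""
    return mu + b_i.get(i, 0.0) + b_u.get(u, 0.0)

mu, b_i, b_u = baseline(ratings)
print(predict("Liz", "Up", mu, b_i, b_u))
```

Subtracting $b_{ui}$ from each observed rating leaves the residual that a collaborative filtering model still has to explain.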
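• …to get 2 latent factor matrices:
  - User-factor matrix P (m users × k factors)
  - Item-factor matrix Q (n items × k factors)
• Missing ratings are predicted from the product of these two factor matrices: the predicted rating of user u for item i is the inner product of row u of P and row i of Q
$X_{m \times n} \approx P_{m \times k} \, Q_{n \times k}^{T} = \hat{X}$, with $\hat{x}_{ui} = p_u \cdot q_i$
[Figure: user × item matrix X approximated by a user × K matrix times a K × item matrix]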
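A minimal sketch of learning $P$ and $Q$ with stochastic gradient descent, one common choice (alternating least squares is another). It assumes `None` marks a missing rating, fits only the observed cells with an L2 penalty, and then fills every cell of $\hat{X} = P Q^T$. The learning rate, regularization strength, factor count and toy matrix are illustrative assumptions.

```python
import random

def factorize(X, k=2, lr=0.01, reg=0.02, epochs=3000, seed=0):
    """Learn P (m x k) and Q (n x k) so that X[u][i] ~= dot(P[u], Q[i]).

    X is a list of rows; None marks a missing rating. Only observed cells
    enter the squared-error loss, plus an L2 penalty of strength reg.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(m)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    observed = [(u, i, X[u][i]) for u in range(m) for i in range(n)
                if X[u][i] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for u, i, x in observed:
            err = x - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]  # use pre-update values for both
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def reconstruct(P, Q):
    """X_hat = P Q^T: every cell, including missing ones, gets a prediction."""
    return [[sum(a * b for a, b in zip(p, q)) for q in Q] for p in P]

# Hypothetical 3 users x 4 items rating matrix with missing cells.
X = [
    [5, 4, None, 1],
    [4, None, 1, 1],
    [1, 1, None, 5],
]
P, Q = factorize(X)
for row in reconstruct(P, Q):
    print([round(v, 2) for v in row])
```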
Evaluation
…“relevant”?
• Recall@K: the proportion of relevant items that appear in the top K recommendations

                           Reality: liked     Reality: did not like
Predicted: liked           True positive      False positive
Predicted: did not like    False negative     True negative

precision = TP / (TP + FP)
recall = TP / (TP + FN)
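Precision@K and recall@K for a single user can be computed directly from a ranked list. This sketch assumes "relevant" means the set of items the user actually liked (for example, in a held-out test split); a hit in the top K is a true positive, the other top-K items are false positives, and relevant items outside the top K are false negatives.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user.

    recommended: ranked list of item ids, best first
    relevant: set of item ids the user actually liked
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)      # true positives
    precision = hits / len(top_k) if top_k else 0.0          # TP / (TP + FP)
    recall = hits / len(relevant) if relevant else 0.0       # TP / (TP + FN)
    return precision, recall

recs = ["a", "b", "c", "d", "e"]
liked = {"a", "c", "f"}
print(precision_recall_at_k(recs, liked, k=3))
```

Averaging these per-user values over all test users gives the usual system-level figures; raising K typically trades precision for recall.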