Building a Recommender System from Scratch

Jill Cates
November 16, 2018

Slides for my recommender workshop at PyDataDC 2018.

Transcript

  Slide 2: Objective
    1. Build an item-item recommender: "Because you watched Movie X…"
    2. Build a top-N recommender (time permitting): "Your Top Recommendations"
  Slide 3: Agenda
    • An intro to recommenders: what is a recommender, and why are they important?
    • Structure of a recommender: item-item recommendations; top-N recommendations
    • Types of recommenders: collaborative filtering vs. content-based filtering
    • Tutorial using the MovieLens dataset: build an item-item recommender, then a top-N recommender (time permitting)
  Slide 4: Recommender Systems in the Wild
    • Spotify: Discover Weekly
    • Amazon: "Customers who bought this item also bought…"
    • Netflix: "Because you watched this show…"
    • OkCupid: finding your best match
    • LinkedIn: "Jobs recommended for you"
    • New York Times: "Recommended Articles for You"
    • Medicine: facilitating clinical decision making
    • GitHub: repos "based on your interest"
  Slides 6-7: Before e-commerce, things were sold exclusively in brick-and-mortar stores.
    • Brick-and-mortar: limited inventory, mainstream products
    • E-commerce: unlimited inventory, niche products
  Slides 9-10: Recommender Systems in the Wild: The Tasting Booth Experiment
    • Setup: a booth offering 6 jam samples vs. one offering 24 jam samples
    • Initial interest: 40% of customers stopped at the limited-choice booth; 60% stopped at the extensive-choice booth
    • Subsequent purchase: 30% conversion rate for the limited-choice booth vs. 3% for the extensive-choice booth
  Slide 12: What is a recommender system? An application of machine learning: user preferences go in, and the Recommender System produces recommendations.
  Slide 13: A recommender system predicts future behaviour; the user preferences it learns from arrive as explicit or implicit feedback.
  Slide 14: Two families of recommenders: collaborative filtering and content-based filtering, illustrated on a user-item matrix (users: John, Jim, Anne, Liz, Erica).
  Slide 15: Collaborative Filtering: similar people like similar things. Users (John, Jim, Anne, Liz, Erica) and items form the rows and columns of the user-item ("utility") matrix.
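The "similar people like similar things" idea can be sketched as item-item similarity over the columns of the utility matrix. A minimal sketch with a made-up matrix; the ratings, movie indices, and helper names below are illustrative, not taken from the workshop notebook:

```python
import numpy as np

# Toy user-item ("utility") matrix: rows = users, columns = movies.
# 0.0 stands in for "not rated". All values are invented for illustration.
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def most_similar_item(R, item, k=1):
    """Indices of the k items most similar to `item`
    (item-item collaborative filtering: compare matrix columns)."""
    sims = [cosine_sim(R[:, item], R[:, j]) for j in range(R.shape[1])]
    sims[item] = -1.0  # exclude the item itself
    return np.argsort(sims)[::-1][:k]
```

With this toy matrix, `most_similar_item(R, item=0)` picks movie 1, whose column of ratings most closely tracks movie 0's: a "Because you watched Movie X" list is just the top-k of this ranking.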
  Slide 16: User Feedback: what are we populating the cells of the user-item matrix with?
    • Explicit feedback: Likert-scale ratings (1-5); liked or not (boolean)
    • Implicit feedback: browsing behaviour; purchased? read? watched?
    • Developing a user feedback score: consider dwell time, recent vs. old interactions, negative implicit feedback, and what behaviour you are trying to drive
  Slide 17: Content-based Filtering: looks at user and item features.
    • User features: age, gender, country, spoken language
    • Item features: movie genre (scary, funny, family, anime, drama, romance), year of release, cast
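A content-based recommender can be sketched by building a user profile from the features of items the user liked, then scoring unseen items against it. The movie titles, genre vector layout, and helper names below are hypothetical:

```python
import numpy as np

# Hypothetical one-hot genre features per movie (order: scary, funny, family, drama).
item_features = {
    "Movie A": np.array([1.0, 0.0, 0.0, 1.0]),
    "Movie B": np.array([0.0, 1.0, 1.0, 0.0]),
    "Movie C": np.array([1.0, 0.0, 0.0, 0.0]),
    "Movie D": np.array([1.0, 0.0, 0.0, 1.0]),
}

# User profile: average the feature vectors of the movies this user liked.
liked = ["Movie A", "Movie C"]
profile = np.mean([item_features[t] for t in liked], axis=0)

def score(title):
    """Cosine similarity between the user profile and an item's features."""
    f = item_features[title]
    return float(profile @ f) / (np.linalg.norm(profile) * np.linalg.norm(f))

# Recommend the highest-scoring movie the user has not seen.
best = max((t for t in item_features if t not in liked), key=score)
```

Because the liked movies are both scary dramas, the profile leans toward those genres and the funny family film scores zero; note that unlike collaborative filtering, no other user's ratings are consulted.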
  Slide 19: Environment set-up
    • Option 1: run the notebook locally
    • Option 2: run the notebook with Google Colab, a Jupyter notebook environment that runs in the cloud; minimal set-up required, and free GPUs are supported
  Slide 20: MovieLens
    • Created by the GroupLens research group at the University of Minnesota
    • The "Titanic dataset" of recommenders, i.e. the canonical beginner dataset for the field
  Slide 24: Pipeline: Pre-processing → Hyperparameter Tuning → Model Training → Post-processing → Evaluation.
    • Pre-processing: transform the original ratings data, rows of (user_id, movie_id, rating) such as (2, 439, 4.0) and (10, 368, 4.5), into the user-item (utility) matrix, with users as rows and items as columns.
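This pre-processing step is a pivot. A minimal sketch with pandas, using the ten (user_id, movie_id, rating) rows shown on the slide; the column and variable names are mine:

```python
import pandas as pd

# Ratings in the "long" format from the slide: one (user_id, movie_id, rating) per row.
ratings = pd.DataFrame({
    "user_id":  [2, 10, 14, 19, 2, 19, 3, 54, 32, 10],
    "movie_id": [439, 368, 114, 371, 371, 114, 439, 421, 114, 369],
    "rating":   [4.0, 4.5, 5.0, 1.0, 3.0, 4.5, 3.5, 2.0, 3.0, 1.0],
})

# Pivot to the user-item (utility) matrix: rows = users, columns = movies,
# NaN wherever a user has not rated a movie.
utility = ratings.pivot_table(index="user_id", columns="movie_id", values="rating")
```

The ten rows cover 7 users and 6 movies, so `utility` is a 7x6 matrix that is mostly NaN, which is exactly the sparsity the later factorization step has to cope with.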
  Slide 25: Mean Normalization
    • Optimists rate everything 4 or 5; pessimists rate everything 1 or 2
    • Ratings therefore need to be normalized to account for user and item bias
    • Mean normalization: subtract the item's average rating from each rating for a given item, and the user's average rating from each rating for a given user
    • Baseline estimate: b_ui = μ + b_i + b_u, where μ is the global average rating, b_i is the item bias (the item's average rating relative to μ), and b_u is the user bias (the user's average rating relative to μ)
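The baseline b_ui = μ + b_i + b_u can be sketched directly with NumPy. The toy matrix below is invented (row 0 plays the optimist, row 1 the pessimist), and the variable names mirror the formula:

```python
import numpy as np

# Toy ratings matrix; NaN marks a missing rating. Row 0 is an "optimist",
# row 1 a "pessimist". Values are invented for illustration.
R = np.array([
    [5.0, 4.0, np.nan, 5.0],
    [2.0, 1.0, 2.0, np.nan],
    [4.0, np.nan, 3.0, 4.0],
])

mu = np.nanmean(R)                 # global average rating
b_u = np.nanmean(R, axis=1) - mu   # user bias: user's average minus global average
b_i = np.nanmean(R, axis=0) - mu   # item bias: item's average minus global average

def baseline(u, i):
    """Baseline estimate b_ui = mu + b_u + b_i for user u and item i."""
    return mu + b_u[u] + b_i[i]
```

Here `baseline(0, 0)` combines the optimist's positive user bias with the first item's above-average ratings, landing well above the global mean.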
  Slide 27: Matrix Factorization
    • A form of dimensionality reduction
    • Factorize the user-item matrix into two latent factor matrices: a user-factor matrix and an item-factor matrix
    • Missing ratings are predicted from the inner product of the two factor matrices: X_mn ≈ P_mk × Q_nk^T = X̂, where m users and n items are each represented by k latent factors
  Slide 28: Matrix Factorization: algorithms that perform it include:
    • Alternating Least Squares (ALS)
    • Stochastic Gradient Descent (SGD)
    • Singular Value Decomposition (SVD)
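Of the three, SVD is the easiest to sketch with NumPy alone. A minimal illustration on a made-up, fully observed matrix (in practice the utility matrix is sparse, and practical SVD-style recommenders either fill missing entries first or use variants that skip them):

```python
import numpy as np

# Toy user-item matrix, fully observed for simplicity. Values are invented;
# the first two users like the first two movies, the last two like the rest.
X = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

# Factorize and keep only k latent factors (dimensionality reduction).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
P = U[:, :k] * s[:k]   # user-factor matrix P (m x k)
Q = Vt[:k, :].T        # item-factor matrix Q (n x k)

# Predicted matrix X_hat = P @ Q.T: each entry is the inner product of a
# user's factor vector and an item's factor vector.
X_hat = P @ Q.T
```

Even with k = 2, the rank-2 approximation stays close to X, because two latent "taste" directions explain most of the structure in this toy matrix.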
  Slide 29: Evaluation: how do we evaluate recommendations? Traditional ML metrics vs. metrics specific to recommendation systems.
  Slide 30: Evaluation Metrics
    • RMSE = sqrt( Σ_{i=1}^{N} (y_i − ŷ_i)² / N )
    • precision = TP / (TP + FP)
    • recall = TP / (TP + FN)
    • F1 = 2 · precision · recall / (precision + recall)
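These four formulas translate directly to a few lines of plain Python; the function names below are mine:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

RMSE scores the predicted rating values themselves, while precision/recall/F1 score liked-vs-not decisions, which is why the next slide adapts the latter into top-k variants.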
  Slide 31: Evaluating the top of the list
    • Precision@K: of the top-k recommendations, what proportion are actually "relevant"?
    • Recall@K: what proportion of all relevant items were found in the top-k recommendations?
    • Confusion matrix (predicted vs. reality, liked vs. did not like): true positive, false positive, false negative, true negative; precision = TP / (TP + FP), recall = TP / (TP + FN)
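The two definitions can be sketched in a few lines of plain Python; the function names and toy item lists are mine:

```python
def precision_at_k(recommended, relevant, k):
    """Of the top-k recommendations, what proportion are relevant?"""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """What proportion of all relevant items appear in the top-k?"""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant)
```

For example, if the ranked recommendations are A, B, C, D, E and the user actually liked A, C, F, G, then at k = 3 precision is 2/3 (two of the three shown items were relevant) while recall is 2/4 (two of the four relevant items were surfaced).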
  Slide 34: Python Tools
    • import surprise (@NicolasHug)
    • import implicit (@benfred)
    • import lightfm (@lyst)
    • import pyspark.mllib.recommendation