Slide 1

HYBRID RECOMMENDER SYSTEMS IN PYTHON
THE WHYS AND WHEREFORES

Slide 2

I'M MACIEJ
@maciej_kula

Slide 3

I'M A DATA SCIENTIST AT LYST
I mainly build recommenders, but have dabbled in other systems.

Slide 4

I'M GOING TO TALK ABOUT HYBRID RECOMMENDERS
What they are, and why you might want one.

Slide 5

COLLABORATIVE FILTERING IS THE WORKHORSE OF RECOMMENDER SYSTEMS
- use historical data on co-purchasing behaviour
- 'Users who bought X also bought...'

Slide 6

USER-ITEM INTERACTIONS AS A SPARSE MATRIX

$$I = \begin{pmatrix} 1.0 & 0.0 & \cdots & 1.0 \\ 0.0 & 1.0 & \cdots & 0.0 \\ \vdots & \vdots & \ddots & \vdots \\ 1.0 & 1.0 & \cdots & 1.0 \end{pmatrix}$$
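As an illustration (not part of the talk), such a matrix is typically built with scipy.sparse; the user and item id arrays here are hypothetical toy data.

import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical interaction data: user i bought item j.
user_ids = np.array([0, 0, 1, 2, 2])
item_ids = np.array([0, 3, 1, 0, 2])
values = np.ones(len(user_ids), dtype=np.float64)

# Rows are users, columns are items; missing entries are implicit zeros.
interactions = coo_matrix((values, (user_ids, item_ids)), shape=(3, 4))
print(interactions.toarray())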

Slide 7

IN THE SIMPLEST CASE, THAT'S ENOUGH TO MAKE RECOMMENDATIONS
- find similar users by calculating the distance between the rows that represent them
- recommend items similar users have bought, weighted by the degree of similarity
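A minimal sketch of this nearest-neighbour approach (illustrative, not code from the slides), using cosine similarity between the rows of the interaction matrix:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def recommend_for_user(interactions, user_id, top_n=5):
    """Score items by summing similar users' purchases, weighted by similarity."""
    dense = interactions.toarray()
    similarity = cosine_similarity(dense)   # user-user similarity matrix
    scores = similarity[user_id] @ dense    # similarity-weighted sum over users
    scores[dense[user_id] > 0] = -np.inf    # exclude items already bought
    return np.argsort(-scores)[:top_n]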

Slide 8

MOST APPLICATIONS USE SOME FORM OF MATRIX FACTORIZATION
Represent $I$ as a product of two reduced-rank matrices $U$ and $P$.
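One way to obtain such a factorization (an illustrative sketch; the talk does not prescribe a particular algorithm) is a truncated SVD of the interaction matrix:

import numpy as np
from scipy.sparse.linalg import svds

# interactions: the sparse user-item matrix from before, as floats.
# U gets one k-dimensional row per user, P one k-dimensional row per item.
u, s, vt = svds(interactions.astype(np.float64).tocsc(), k=2)
U = u * s                  # fold the singular values into the user factors
P = vt.T
reconstruction = U @ P.T   # low-rank approximation of the interaction matrix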

Slide 9

THIS WORKS REMARKABLY WELL IF YOU HAVE A LOT OF DATA
- domain-agnostic: don't need to know anything about the users and items
- easy to understand and implement
- chief component of the Netflix-prize-winning ensemble
- MF yields nice, low-dimensional item representations, useful if you want to do related products

Slide 10

BUT WHAT IF YOUR DATA IS SPARSE?
- large product inventory
- short-lived products
- lots of new users

Slide 11

CAN'T COMPUTE SIMILARITIES
- most users haven't bought
- most items haven't been bought

Slide 12

PERFORMS NO BETTER THAN RANDOM

Slide 13

CONTENT-BASED MODELS TO THE RESCUE
- collect metadata about items
- construct a classifier for each user

Slide 14

PROBLEMS
- need to have plenty of data for each user
- no information sharing across users
- doesn't provide compact representations for item similarity

Slide 15

DOESN'T CAPTURE SIMILARITY
'Gucci Evening Dress' and 'Givenchy Ball Gown'

Slide 16

SOLUTION: USE A HYBRID MODEL

Slide 17

DISCLAIMER: THIS IS WHERE I TRY TO CONVINCE YOU TO USE MY RECOMMENDER PACKAGE
It's called LightFM.

Slide 18

A VARIANT OF MATRIX FACTORIZATION
Instead of estimating a latent vector per user and item, estimate latent vectors for user and item metadata. User and item IDs can also be included if you have enough data.

Slide 19

The representation for 'Givenchy Ball Gown' is the element-wise sum of representations for 'givenchy', 'ball', and 'gown'. The representation for a female user with ID 100 is the element-wise sum of representations for 'female' and 'ID 100'.

Slide 20

The prediction for a user-item pair is given by the inner product of their representations.
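Schematically (a toy numpy sketch of the idea, not LightFM's internals), with one embedding row per metadata feature:

import numpy as np

rng = np.random.default_rng(0)
k = 30  # latent dimension

# One embedding per metadata feature (toy random values).
feature_embeddings = {f: rng.normal(size=k)
                      for f in ('givenchy', 'ball', 'gown', 'female', 'id_100')}

# Representations are element-wise sums of feature embeddings.
item_repr = sum(feature_embeddings[f] for f in ('givenchy', 'ball', 'gown'))
user_repr = sum(feature_embeddings[f] for f in ('female', 'id_100'))

# The predicted score is the inner product of the two representations.
score = user_repr @ item_repr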

Slide 21

NEURAL NETWORK PERSPECTIVE
Two independent fully-connected layers, one with user, the other with item features as inputs, connected via a dot product.

Slide 22

BENEFITS
- fewer parameters to estimate
- can make predictions for new items and new users
- captures synonymy
- produces nice dense item representations
- reduces to a standard MF model as a special case

Slide 23

EXAMPLE: CROSS VALIDATED
- try to predict which questions users will answer
- a ranking task, measured by AUC

Slide 24

PURE COLLABORATIVE FILTERING
- AUC of 0.43: worse than random
- little data, lots of parameters
- massive overfitting

Slide 25

PURE CONTENT-BASED SOLUTION
- fit a separate logistic regression model for each user
- AUC of 0.66: a lot better
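A sketch of what such a content-based baseline might look like (an assumed setup, not the exact benchmark code): one scikit-learn logistic regression per user, over item tag features.

from sklearn.linear_model import LogisticRegression

def fit_user_models(item_features, answered_by_user):
    """item_features: (n_items, n_tags) array.
    answered_by_user: dict mapping user id -> binary label vector over items
    (1 = answered), assumed to contain both classes for each user."""
    models = {}
    for user_id, labels in answered_by_user.items():
        clf = LogisticRegression()
        clf.fit(item_features, labels)
        models[user_id] = clf
    return models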

Slide 26

HYBRID SOLUTION
- AUC of 0.71: best result
- get tag embeddings as an extra benefit

Slide 27

TAG SIMILARITY
'bayesian': 'mcmc', 'variational-bayes'
'survival': 'cox-model', 'odds-ratio', 'kaplan-meier'
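LightFM exposes the learned feature embeddings as model.item_embeddings (one row per item feature, in the order of the feature matrix columns); nearest tags by cosine similarity give lists like the ones above. A sketch, where the tag_labels list mapping embedding rows to tag names is an assumption:

import numpy as np

def similar_tags(model, tag_labels, tag, top_n=3):
    emb = model.item_embeddings            # (n_item_features, no_components)
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    query = normed[tag_labels.index(tag)]
    scores = normed @ query                # cosine similarities to the query tag
    best = np.argsort(-scores)
    return [tag_labels[i] for i in best if tag_labels[i] != tag][:top_n]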

Slide 28

SIMILAR TO WORD2VEC
Both are essentially matrix factorization algorithms.

Slide 29

IN SUMMARY
If you have lots of new users or new items, you will benefit from a hybrid algorithm.

Slide 30

Even if you don't face cold-start, you might still want to use LightFM.

Slide 31

EASY TO USE

from lightfm import LightFM

model = LightFM(loss='warp',
                learning_rate=0.01,
                learning_schedule='adagrad',
                no_components=30)

model.fit(interactions,
          item_features=item_features,
          user_features=user_features,
          num_threads=4,
          epochs=epochs)
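Scoring is similarly compact: predict takes arrays of user and item ids plus the same feature matrices. This snippet assumes the fitted model above and a variable n_items giving the inventory size.

import numpy as np

# Score every item for user 3; higher scores mean better-ranked items.
scores = model.predict(np.repeat(3, n_items),
                       np.arange(n_items),
                       item_features=item_features,
                       user_features=user_features)
ranked_items = np.argsort(-scores)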

Slide 32

FAST
- written in Cython
- supports multicore training via Hogwild

Slide 33

LEARNING-TO-RANK
Supports learning-to-rank objectives:
- BPR
- WARP

Slide 34

ASIDE: LEARNING-TO-RANK IS A GREAT IDEA
- a Siamese network with triplet loss, in NN parlance
- WARP is especially effective
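Roughly, WARP samples negatives until it finds one that outranks the positive, then weights the update by how hard that violation was to find. A schematic sketch of the sampling idea (not LightFM's Cython implementation; the score callable is a stand-in for the model's dot-product scorer):

import numpy as np

def warp_sample(score, positive_item, n_items, max_trials=100, rng=None):
    """Return a violating negative item and its update weight, or (None, 0.0)."""
    rng = rng or np.random.default_rng()
    pos_score = score(positive_item)
    for trials in range(1, max_trials + 1):
        negative_item = rng.integers(n_items)
        if score(negative_item) > pos_score - 1.0:  # margin violation found
            # Few trials needed => positive is badly ranked => large weight.
            weight = np.log(max((n_items - 1) // trials, 1))
            return negative_item, weight
    return None, 0.0  # no violation found; skip this update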

Slide 35

PER-PARAMETER LEARNING RATES
Adagrad and Adadelta

Slide 36

pip install lightfm
github.com/lyst/lightfm