
Recommendations under sparsity

In this talk, I look at the advantages and disadvantages of collaborative filtering and content-based recommenders when interaction data is sparse, and describe a hybrid approach implemented in the LightFM package.


Maciej Kula

October 06, 2015


  1. Recommendations in the face of sparsity by Maciej Kula

  2. Hi, I’m Maciej Kula. @maciej_kula

  3. We collect the world of fashion into a customisable …

  4. Recommendations: surface items that a user might be interested in.

  5. Collaborative filtering: users who have bought X also bought Y.

  6. Matrix factorisation: represent a user-item interaction matrix as the product of two lower-dimensional matrices to fill in the missing entries. user = (-0.3, 2.1, 0.5), product = (5.2, 0.3, -0.5)
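In a factorised model, the predicted score for a (user, item) pair is simply the dot product of their latent vectors. A minimal sketch in plain Python using the vectors from the slide (the `predict` helper is illustrative, not LightFM's API):

```python
# Matrix factorisation scores a (user, item) pair by the dot
# product of their low-dimensional latent representations.
user = (-0.3, 2.1, 0.5)     # latent user vector from the slide
product = (5.2, 0.3, -0.5)  # latent item vector from the slide

def predict(user_vec, item_vec):
    """Predicted affinity: the dot product of the two embeddings."""
    return sum(u * i for u, i in zip(user_vec, item_vec))

score = predict(user, product)
# -0.3*5.2 + 2.1*0.3 + 0.5*-0.5 = -1.18: a low predicted affinity
```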
  7. Works well in settings where interaction data are plentiful.

  8. MovieLens 100K: 1. 900 users 2. 1,600 movies 3. 100k ratings 4. 7% dense
  9. 0.91 ROC AUC (the probability that a randomly chosen item the user likes will be ranked higher than an item the user dislikes)
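That pairwise definition of ROC AUC can be computed directly on small data: count the fraction of (liked, disliked) pairs the model orders correctly. A sketch with made-up scores (the numbers are purely illustrative):

```python
# ROC AUC as the probability that a randomly chosen liked item
# outranks a randomly chosen disliked item.
liked_scores = [0.9, 0.8]       # model scores for items the user likes
disliked_scores = [0.7, 0.85]   # model scores for items the user dislikes

pairs = [(l, d) for l in liked_scores for d in disliked_scores]
auc = sum(l > d for l, d in pairs) / len(pairs)
# 3 of the 4 pairs are ordered correctly, so auc == 0.75
```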
  10. But performance is poor when there are few interactions.

  11. Cross Validated: 1. 3,200 users 2. 42,200 questions 3. 60,000 answers 4. 0.005% dense
  12. 0.54 ROC AUC: barely better than random.

  13. Oops.

  14. Not enough information to estimate representations for each user and question. Besides, we want to get people to answer questions that have no answers yet!
  15. We can use question metadata. In particular, each question is described by a set of tags.
  16. Use the tags to build logistic regression models for each user: predict the probability that a user will answer a question as a function of its tags.
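A toy version of the per-user model, assuming each question is encoded as a binary tag-indicator vector. The tag vocabulary, data, and plain-SGD trainer here are all illustrative, not the talk's implementation:

```python
import math

# Each question is a binary vector over the tag vocabulary.
tags = ["regression", "mcmc", "survival"]
questions = [([1, 0, 0], 1),  # answered a 'regression' question
             ([0, 1, 0], 0),  # skipped an 'mcmc' question
             ([1, 0, 1], 1),
             ([0, 1, 1], 0)]

# One logistic regression per user: P(answer | question tags),
# fitted with plain stochastic gradient descent on the log loss.
weights = [0.0] * len(tags)
bias = 0.0
for _ in range(200):
    for x, y in questions:
        p = 1 / (1 + math.exp(-(sum(w * xi for w, xi in zip(weights, x)) + bias)))
        err = y - p
        weights = [w + 0.5 * err * xi for w, xi in zip(weights, x)]
        bias += 0.5 * err

def prob_answer(x):
    """Predicted probability that this user answers a question with tags x."""
    return 1 / (1 + math.exp(-(sum(w * xi for w, xi in zip(weights, x)) + bias)))
```

After training, `prob_answer([1, 0, 0])` is high and `prob_answer([0, 1, 0])` is low for this user, mirroring the interactions above.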
  17. 0.66 ROC AUC. Better, but: 1. there is no transfer of information between users 2. we are not capturing tag similarity 3. item representations remain high-dimensional
  18. Instead of finding embeddings for items, let's find embeddings for item features, then add them together to represent items.
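Summing feature embeddings is just vector addition: an item's representation is the elementwise sum of the embeddings of its tags, and scoring against a user remains a dot product. A sketch with made-up 3-dimensional tag embeddings:

```python
# Hypothetical learned tag embeddings (3 dimensions each).
tag_embeddings = {
    "regression": [0.4, -0.1, 0.3],
    "mcmc": [-0.2, 0.5, 0.1],
}

def item_embedding(item_tags):
    """An item is represented by the sum of its feature embeddings."""
    dims = len(next(iter(tag_embeddings.values())))
    total = [0.0] * dims
    for tag in item_tags:
        for d, v in enumerate(tag_embeddings[tag]):
            total[d] += v
    return total

question = item_embedding(["regression", "mcmc"])
# the elementwise sum of the two tag vectors: [0.2, 0.4, 0.4]
```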
  19. Success! 0.71 ROC AUC

  20. We also get tag similarity.

  21. Tag similarity: 'regression' is close to 'least squares' and 'multiple regression'; 'MCMC' to 'BUGS', 'Metropolis-Hastings' and 'Beta-Binomial'; 'survival' to 'epidemiology' and 'Cox model'
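Because tags live in the same embedding space, similarity can be read off as cosine similarity between tag vectors. A sketch (the embeddings are made up; in practice they would come from the trained model):

```python
import math

# Hypothetical tag embeddings; related tags get similar vectors.
embeddings = {
    "regression": [0.9, 0.1, 0.0],
    "least squares": [0.8, 0.2, 0.1],
    "mcmc": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_close = cosine(embeddings["regression"], embeddings["least squares"])
sim_far = cosine(embeddings["regression"], embeddings["mcmc"])
# 'regression' is far closer to 'least squares' than to 'mcmc'
```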
  22. Similar to results from word2vec. That’s not an accident: the objective function is quite similar.
  23. Useful for: • explaining recommendations • tag recommendations

  24. We've open-sourced a Python implementation: pip install lightfm

  25. from lightfm import LightFM
    model = LightFM(no_components=30)
    model.fit(interactions, user_features=user_features, item_features=item_features, epochs=20)
  26. Multiple loss functions: • logistic loss for explicit binary feedback • BPR • WARP • k-th order statistic WARP loss
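Of these, BPR is the simplest to sketch: for a (user, positive item, negative item) triplet it maximises the probability that the positive item outranks the negative one, i.e. it minimises -log(sigmoid(pos_score - neg_score)). A minimal illustration in plain Python (not LightFM's internal implementation):

```python
import math

def bpr_loss(pos_score, neg_score):
    """BPR pairwise loss for one (user, positive, negative) triplet."""
    return -math.log(1 / (1 + math.exp(-(pos_score - neg_score))))

well_ranked = bpr_loss(2.0, -1.0)   # positive scored well above negative
badly_ranked = bpr_loss(-1.0, 2.0)  # negative outranks positive
# the loss is small for the correct ordering and large for the wrong one
```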
  27. Trained with asynchronous stochastic gradient descent. Two learning rate schedules: • adagrad • adadelta
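Adagrad adapts the step size per parameter by dividing the learning rate by the root of the accumulated squared gradients, so frequently-updated parameters take progressively smaller steps. A one-parameter sketch (the constants are illustrative):

```python
import math

lr = 0.1          # base learning rate
eps = 1e-8        # small constant for numerical stability
accumulated = 0.0 # running sum of squared gradients
steps = []
for grad in [1.0, 1.0, 1.0, 1.0]:  # same gradient at every update
    accumulated += grad * grad
    step = lr * grad / (math.sqrt(accumulated) + eps)
    steps.append(step)
# step sizes shrink: 0.1, then 0.1/sqrt(2), 0.1/sqrt(3), 0.1/sqrt(4)
```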
  28. thank you @maciej_kula