
Recommendations under sparsity

In this talk, I look at the advantages and disadvantages of collaborative filtering and content-based recommenders when interaction data is sparse, and describe a hybrid approach implemented in the LightFM package.


Maciej Kula

October 06, 2015


  1. Recommendations in the face of sparsity by Maciej Kula

  2. Hi, I’m Maciej Kula. @maciej_kula

  3. We collect the world of fashion into a customisable …

  4. Recommendations: surface items that a user might be interested in.

  5. Collaborative filtering: users who have bought X also bought Y.

  6. Matrix factorisation: represent a user-item interaction matrix as the product of two lower-dimensional matrices to fill in the missing entries. user = (-0.3, 2.1, 0.5), product = (5.2, 0.3, -0.5)
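In a factorised model, the predicted score for a (user, item) pair is simply the dot product of their latent vectors. A minimal sketch in plain Python using the vectors from the slide (the `predict` helper is illustrative, not LightFM's API):

```python
# Matrix factorisation scores a (user, item) pair by the dot
# product of their low-dimensional latent representations.
user = (-0.3, 2.1, 0.5)     # latent user vector from the slide
product = (5.2, 0.3, -0.5)  # latent item vector from the slide

def predict(user_vec, item_vec):
    """Predicted affinity: the dot product of the two embeddings."""
    return sum(u * i for u, i in zip(user_vec, item_vec))

score = predict(user, product)
# -0.3*5.2 + 2.1*0.3 + 0.5*-0.5 = -1.18: a low predicted affinity
```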
  7. Works well in settings where interaction data are plentiful.

  8. MovieLens 100K: 1. 900 users 2. 1,600 movies 3. 100k ratings 4. 7% dense
  9. 0.91 ROC AUC (the probability that a randomly chosen item the user likes will be ranked higher than an item the user dislikes)
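That pairwise definition of ROC AUC can be computed directly on small data: count the fraction of (liked, disliked) pairs the model orders correctly. A sketch with made-up scores (the numbers are purely illustrative):

```python
# ROC AUC as the probability that a randomly chosen liked item
# outranks a randomly chosen disliked item.
liked_scores = [0.9, 0.8]       # model scores for items the user likes
disliked_scores = [0.7, 0.85]   # model scores for items the user dislikes

pairs = [(l, d) for l in liked_scores for d in disliked_scores]
auc = sum(l > d for l, d in pairs) / len(pairs)
# 3 of the 4 pairs are ordered correctly, so auc == 0.75
```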
  10. But performance is poor when there are few interactions.

  11. Cross Validated: 1. 3,200 users 2. 42,200 questions 3. 60,000 answers 4. 0.005% dense
  12. 0.54 ROC AUC: barely better than random.

  13. Oops.

  14. Not enough information to estimate representations for each user and question. Besides, we want to get people to answer questions that have no answers yet!
  15. We can use question metadata. In particular, each question is described by a set of tags.
  16. Use the tags to build logistic regression models for each user: predict the probability that a user will answer a question as a function of its tags.
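A toy version of the per-user model, assuming each question is encoded as a binary tag-indicator vector. The tag vocabulary, data, and plain-SGD trainer here are all illustrative, not the talk's implementation:

```python
import math

# Each question is a binary vector over the tag vocabulary.
tags = ["regression", "mcmc", "survival"]
questions = [([1, 0, 0], 1),  # answered a 'regression' question
             ([0, 1, 0], 0),  # skipped an 'mcmc' question
             ([1, 0, 1], 1),
             ([0, 1, 1], 0)]

# One logistic regression per user: P(answer | question tags),
# fitted with plain stochastic gradient descent on the log loss.
weights = [0.0] * len(tags)
bias = 0.0
for _ in range(200):
    for x, y in questions:
        p = 1 / (1 + math.exp(-(sum(w * xi for w, xi in zip(weights, x)) + bias)))
        err = y - p
        weights = [w + 0.5 * err * xi for w, xi in zip(weights, x)]
        bias += 0.5 * err

def prob_answer(x):
    """Predicted probability that this user answers a question with tags x."""
    return 1 / (1 + math.exp(-(sum(w * xi for w, xi in zip(weights, x)) + bias)))
```

After training, `prob_answer([1, 0, 0])` is high and `prob_answer([0, 1, 0])` is low for this user, mirroring the interactions above.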
  17. 0.66 ROC AUC. Better, but: 1. there is no transfer of information between users 2. we are not capturing tag similarity 3. item representations remain high-dimensional
  18. Instead of finding embeddings for items, let's find embeddings for item features, then add them together to represent items.
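Summing feature embeddings is just vector addition: an item's representation is the elementwise sum of the embeddings of its tags, and scoring against a user remains a dot product. A sketch with made-up 3-dimensional tag embeddings:

```python
# Hypothetical learned tag embeddings (3 dimensions each).
tag_embeddings = {
    "regression": [0.4, -0.1, 0.3],
    "mcmc": [-0.2, 0.5, 0.1],
}

def item_embedding(item_tags):
    """An item is represented by the sum of its feature embeddings."""
    dims = len(next(iter(tag_embeddings.values())))
    total = [0.0] * dims
    for tag in item_tags:
        for d, v in enumerate(tag_embeddings[tag]):
            total[d] += v
    return total

question = item_embedding(["regression", "mcmc"])
# the elementwise sum of the two tag vectors: [0.2, 0.4, 0.4]
```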
  19. Success! 0.71 ROC AUC

  20. We also get tag similarity.

  21. Tag similarity: 'regression' is close to 'least squares' and 'multiple regression'; 'MCMC' to 'BUGS', 'Metropolis-Hastings' and 'Beta-Binomial'; 'survival' to 'epidemiology' and 'Cox model'
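Because tags live in the same embedding space, similarity can be read off as cosine similarity between tag vectors. A sketch (the embeddings are made up; in practice they would come from the trained model):

```python
import math

# Hypothetical tag embeddings; related tags get similar vectors.
embeddings = {
    "regression": [0.9, 0.1, 0.0],
    "least squares": [0.8, 0.2, 0.1],
    "mcmc": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_close = cosine(embeddings["regression"], embeddings["least squares"])
sim_far = cosine(embeddings["regression"], embeddings["mcmc"])
# 'regression' is far closer to 'least squares' than to 'mcmc'
```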
  22. Similar to results from word2vec. That’s not an accident: the objective function is quite similar.
  23. Useful for: • explaining recommendations • tag recommendations

  24. We've open-sourced a Python implementation: pip install lightfm

  25. from lightfm import LightFM
    model = LightFM(no_components=30)
    model.fit(interactions, user_features=user_features, item_features=item_features, epochs=20)
  26. Multiple loss functions: • logistic loss for explicit binary feedback • BPR • WARP • k-th order statistic WARP loss
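Of these, BPR is the simplest to sketch: for a (user, positive item, negative item) triplet it maximises the probability that the positive item outranks the negative one, i.e. it minimises -log(sigmoid(pos_score - neg_score)). A minimal illustration in plain Python (not LightFM's internal implementation):

```python
import math

def bpr_loss(pos_score, neg_score):
    """BPR pairwise loss for one (user, positive, negative) triplet."""
    return -math.log(1 / (1 + math.exp(-(pos_score - neg_score))))

well_ranked = bpr_loss(2.0, -1.0)   # positive scored well above negative
badly_ranked = bpr_loss(-1.0, 2.0)  # negative outranks positive
# the loss is small for the correct ordering and large for the wrong one
```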
  27. Trained with asynchronous stochastic gradient descent. Two learning rate schedules: • adagrad • adadelta
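Adagrad adapts the step size per parameter by dividing the learning rate by the root of the accumulated squared gradients, so frequently-updated parameters take progressively smaller steps. A one-parameter sketch (the constants are illustrative):

```python
import math

lr = 0.1          # base learning rate
eps = 1e-8        # small constant for numerical stability
accumulated = 0.0 # running sum of squared gradients
steps = []
for grad in [1.0, 1.0, 1.0, 1.0]:  # same gradient at every update
    accumulated += grad * grad
    step = lr * grad / (math.sqrt(accumulated) + eps)
    steps.append(step)
# step sizes shrink: 0.1, then 0.1/sqrt(2), 0.1/sqrt(3), 0.1/sqrt(4)
```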
  28. thank you @maciej_kula