Hybrid Recommender Systems at PyData Amsterdam 2016

Maciej Kula

March 13, 2016
Transcript

  1. HYBRID RECOMMENDER SYSTEMS IN PYTHON THE WHYS AND WHEREFORES

  2. @maciej_kula I'M MACIEJ

  3. I'M A DATA SCIENTIST AT LYST

    I mainly build recommendations, but have dabbled in other systems
  4. I'M GOING TO TALK ABOUT HYBRID RECOMMENDERS What they are,

    and why you might want one.
  5. COLLABORATIVE FILTERING IS THE WORKHORSE OF RECOMMENDER SYSTEMS Use historical

    data on co-purchasing behaviour 'Users who bought X also bought...'
  6. USER-ITEM INTERACTIONS AS A SPARSE MATRIX

    I = ⎛ 1.0 0.0 ⋯ 1.0 ⎞
        ⎜ 0.0 1.0 ⋯ 0.0 ⎟
        ⎜  ⋮   ⋮  ⋱  ⋮  ⎟
        ⎝ 1.0 1.0 ⋯ 1.0 ⎠
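    As an illustration of the interaction matrix above, a minimal numpy sketch with made-up user and item ids (a real system would use scipy.sparse, since most entries are zero):

    ```python
    import numpy as np

    # Hypothetical purchase log: (user_id, item_id) pairs.
    purchases = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]

    n_users, n_items = 3, 3
    I = np.zeros((n_users, n_items))
    for user, item in purchases:
        I[user, item] = 1.0  # 1.0 = interacted, 0.0 = no interaction
    ```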
  7. IN THE SIMPLEST CASE, THAT'S ENOUGH TO MAKE RECOMMENDATIONS find

    similar users by calculating the distance between the rows that represent them; recommend items similar users have bought, weighted by the degree of similarity
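    The neighbourhood approach described here can be sketched in a few lines; the toy interaction matrix and the choice of cosine similarity are illustrative assumptions:

    ```python
    import numpy as np

    # Toy interaction matrix: rows are users, columns are items.
    I = np.array([[1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])

    def recommend(user, I, top_n=1):
        # Cosine similarity between the target user's row and every row.
        norms = np.linalg.norm(I, axis=1)
        sims = (I @ I[user]) / (norms * norms[user])
        sims[user] = 0.0  # exclude the user themselves
        # Score items by similarity-weighted purchases of the other users,
        # then mask out items the user has already bought.
        scores = sims @ I
        scores[I[user] > 0] = -np.inf
        return np.argsort(scores)[::-1][:top_n]
    ```

    For user 0 (bought items 0 and 2) this recommends item 1, the only item their nearest neighbours bought that they have not.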
  8. MOST APPLICATIONS USE SOME FORM OF MATRIX FACTORIZATION

    Represent I as a product of two reduced-rank matrices U and P.
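    The factorization is just two low-rank matrices whose product approximates the interaction matrix; a dimensional sketch (the sizes and random values are made up):

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    n_users, n_items, n_components = 100, 50, 10

    # Reduced-rank user matrix U and item matrix P.
    U = rng.normal(size=(n_users, n_components))
    P = rng.normal(size=(n_items, n_components))

    # The model's reconstruction of the interaction matrix:
    # one predicted score per user-item pair.
    I_hat = U @ P.T
    ```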
  9. THIS WORKS REMARKABLY WELL IF YOU HAVE A LOT OF

    DATA domain-agnostic: don't need to know anything about the users and items; easy to understand and implement; chief component of the Netflix-prize-winning ensemble; MF yields nice, low-dimensional item representations, useful if you want to do related products
  10. BUT WHAT IF YOUR DATA IS SPARSE? large product inventory,

    short-lived products, lots of new users
  11. CAN'T COMPUTE SIMILARITIES most users haven't bought much; most items haven't

    been bought
  12. PERFORMS NO BETTER THAN RANDOM

  13. CONTENT-BASED MODELS TO THE RESCUE collect metadata about items construct

    a classifier for each user
  14. PROBLEMS need to have plenty of data for each user;

    no information sharing across users; doesn't provide compact representations for item similarity
  15. DOESN'T CAPTURE SIMILARITY between 'Gucci Evening Dress' and 'Givenchy Ball Gown'

  16. SOLUTION: USE A HYBRID MODEL

  17. DISCLAIMER: THIS IS WHERE I TRY TO

    CONVINCE YOU TO USE MY RECOMMENDER PACKAGE. It's called LightFM.
  18. A VARIANT OF MATRIX FACTORIZATION Instead of estimating a latent

    vector per user and item, estimate latent vectors for user and item metadata. User and items ids can also be included if you have enough data.
  19. The representation for 'Givenchy Ball Gown' is the element-wise

    sum of representations for 'givenchy', 'ball', and 'gown'. The representation for a female user with id 100 is the element-wise sum of representations for 'female' and 'ID 100'.
  20. The prediction for a user-item pair is given by the

    inner product of their representations.
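    Slides 19-20 condense into a few lines of code; the feature names and random vectors below are purely illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_components = 4

    # Hypothetical latent vectors, one per metadata *feature*
    # rather than one per user or item.
    features = ['givenchy', 'ball', 'gown', 'female', 'ID 100']
    embeddings = {f: rng.normal(size=n_components) for f in features}

    # Item representation: element-wise sum of its feature vectors.
    item_repr = embeddings['givenchy'] + embeddings['ball'] + embeddings['gown']
    # User representation: element-wise sum of 'female' and the id feature.
    user_repr = embeddings['female'] + embeddings['ID 100']

    # Predicted score for this user-item pair: the inner product.
    score = float(user_repr @ item_repr)
    ```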
  21. NEURAL NETWORK PERSPECTIVE Two independent fully-connected layers, one with user, the other with

    item features as inputs, connected via a dot product.
  22. BENEFITS fewer parameters to estimate; can make predictions for new

    items and new users; captures synonymy; produces nice dense item representations; reduces to a standard MF model as a special case
  23. EXAMPLE: CROSS VALIDATED Try to predict which questions users will answer

    A ranking task, measured by AUC
  24. PURE COLLABORATIVE FILTERING AUC of 0.43, worse than random: little

    data, lots of parameters, massive overfitting
  25. PURE CONTENT-BASED SOLUTION fit a separate logistic regression model for

    each user; AUC of 0.66, a lot better
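    A per-user content-based classifier of the kind described can be sketched as plain logistic regression trained by gradient descent; the toy question features and answered/ignored labels below are invented:

    ```python
    import numpy as np

    def fit_user_classifier(X, y, lr=0.1, epochs=200):
        """Logistic regression via gradient descent: one model per user,
        trained on that user's answered (1) / ignored (0) questions."""
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
            w -= lr * X.T @ (p - y) / len(y)    # gradient of the log loss
        return w

    # Hypothetical metadata features for four questions shown to one user.
    X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
    y = np.array([1.0, 1.0, 0.0, 0.0])  # this user answered the first two
    w = fit_user_classifier(X, y)
    ```

    The model learns a positive weight for the feature that appears only in answered questions, so new questions with that feature score higher for this user.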
  26. HYBRID SOLUTION AUC of 0.71, the best result; get tag embeddings

    as an extra benefit
  27. TAG SIMILARITY 'bayesian': 'mcmc', 'variational-bayes' 'survival': 'cox-model', 'odds-ratio', 'kaplan-meier'

  28. SIMILAR TO WORD2VEC Both are essentially matrix factorization algorithms

  29. IN SUMMARY If you have lots of new users or new items,

    you will benefit from a hybrid algorithm
  30. Even if you don't face cold-start, you might still want

    to use LightFM.
  31. EASY TO USE

    from lightfm import LightFM

    model = LightFM(loss='warp',
                    learning_rate=0.01,
                    learning_schedule='adagrad',
                    no_components=30)

    model.fit(interactions,
              item_features=item_features,
              user_features=user_features,
              num_threads=4,
              epochs=epochs)
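    A sketch of how the item_features argument might be prepared from tag metadata; the tags are invented, and in practice you would build a scipy.sparse matrix rather than a dense numpy array:

    ```python
    import numpy as np

    # Invented tag metadata for three items.
    item_tags = [['givenchy', 'gown'], ['gucci', 'dress'], ['gown', 'dress']]

    # Map each distinct tag to a feature column.
    tags = sorted({t for ts in item_tags for t in ts})
    col = {t: i for i, t in enumerate(tags)}

    # Binary (n_items x n_features) indicator matrix.
    item_features = np.zeros((len(item_tags), len(tags)))
    for i, ts in enumerate(item_tags):
        for t in ts:
            item_features[i, col[t]] = 1.0
    ```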
  32. FAST Written in Cython Supports multicore training via Hogwild

  33. LEARNING-TO-RANK Supports learning-to-rank objectives: BPR, WARP

  34. ASIDE: LEARNING-TO-RANK IS A GREAT IDEA In NN parlance, a Siamese network with

    triplet loss. WARP is especially effective.
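    One way to picture WARP's sampling trick (a simplified sketch, not LightFM's implementation): keep drawing random items until one scores within a margin of the positive item, and weight the update by the rank that the number of draws implies:

    ```python
    import numpy as np

    def warp_sample(scores, positive, rng, max_trials=100):
        """WARP-style sampling sketch: draw random items until one violates
        the margin; fewer draws imply a badly-ranked positive and a larger
        loss weight."""
        n_items = len(scores)
        for trials in range(1, max_trials + 1):
            candidate = rng.integers(n_items)
            if candidate != positive and scores[candidate] > scores[positive] - 1.0:
                # Rough rank estimate from the number of draws taken.
                estimated_rank = max((n_items - 1) // trials, 1)
                return candidate, np.log(estimated_rank)
        return None, 0.0  # no violating negative found: skip this update
    ```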
  35. PER-PARAMETER LEARNING RATES Adagrad and Adadelta
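    A minimal sketch of the Adagrad rule (illustrative, not LightFM's code): each parameter's step is scaled by the inverse square root of its own accumulated squared gradients, so frequently-updated parameters take smaller steps:

    ```python
    import numpy as np

    def adagrad_update(param, grad, accum, lr=0.05, eps=1e-8):
        """Adagrad: per-parameter learning rates from accumulated
        squared gradients."""
        accum += grad ** 2
        param -= lr * grad / (np.sqrt(accum) + eps)
        return param, accum

    param = np.zeros(2)
    accum = np.zeros(2)
    # Despite very different gradient magnitudes, the first step has
    # roughly equal size per parameter: each is normalised by its own history.
    param, accum = adagrad_update(param, np.array([10.0, 0.1]), accum)
    ```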

  36. pip install lightfm github.com/lyst/lightfm