Recommender systems in practice

Hands-on data science meetup, 8 December 2017. https://www.meetup.com/Hands-on-Data-Science/events/244078052/

Alexander Backus

December 08, 2017

Transcript

  1. SYSTEM DESIGN
    [Diagram labels: item, user, impression, preference, rating (implicit/explicit), data (collaborative; content / attributes), model, predictions, recommendations]
  2. COLLABORATIVE FILTERING WITH MATRIX FACTORIZATION
    • User-based and item-based
    • Customized for implicit feedback
    • Scalable computation
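  3. BASIC SYSTEM
    Pipeline: user-item interactions → PREPROCESSING → user-item ratings → MODEL → predicted ratings → POSTPROCESSING → item rankings
    Example matrices: 1 0 0 / 1 1 1 / 1 0 1 (user-item interactions, user-item ratings); 1.2 0.8 0.2 / 1.3 1.2 1.1 / 1.2 1.1 0.9 (predicted ratings)
    Preprocessing formulas: a rating is 1 if the underlying value is > 0 and 0 if it is = 0; a second formula has the form 1 + …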
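The preprocessing step on this slide, turning raw interactions into binary ratings, might be sketched as below. The interaction counts are made up for illustration, and the weight formula `1 + alpha * count` with `alpha = 0.5` is an assumption borrowed from common implicit-feedback practice, since the slide's own symbols did not survive in the transcript.

```python
import numpy as np

# Hypothetical interaction counts: rows are users, columns are items.
interactions = np.array([
    [3, 0, 0],
    [1, 2, 5],
    [4, 0, 1],
])

# Binary rating: 1 if the user interacted with the item at all, else 0.
ratings = (interactions > 0).astype(int)

# Assumed sample weight: more interactions mean more confidence in the rating.
alpha = 0.5  # hypothetical scaling constant
weights = 1 + alpha * interactions

print(ratings)
print(weights)
```

With these counts, the binary ratings come out as the 1 0 0 / 1 1 1 / 1 0 1 pattern shown on the slide.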
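  4. MATRIX FACTORIZATION
    Summarize one large matrix into two smaller (lower-rank) matrices. How does it work?
    $$\begin{pmatrix} 3 & 4 & 5 \\ 6 & 8 & 10 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \begin{pmatrix} 3 & 4 & 5 \end{pmatrix}$$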
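The numeric example on this slide can be checked in a few lines of NumPy. Reading the rows as users and the columns as items is an assumption made here for illustration; the variable names are hypothetical.

```python
import numpy as np

user_factors = np.array([[1], [2]])     # 2 users x 1 latent factor
item_factors = np.array([[3, 4, 5]])    # 1 latent factor x 3 items

# Multiplying the two small matrices rebuilds the large one.
ratings = user_factors @ item_factors   # 2 x 3 matrix
print(ratings)

# The large matrix has rank 1, which is why one factor is enough here.
print(np.linalg.matrix_rank(ratings))
```

Here the saving is small (5 stored numbers instead of 6), but the gap widens quickly as the number of users and items grows.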
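  5. MATRIX FACTORIZATION
    ratings (users × items) ≈ predicted ratings: $R \approx \hat{R} = X Y^{\top}$, with user profiles $X$ and item profiles $Y$
    Predicted rating of user u for item i: $\hat{r}_{u,i} = x_u^{\top} y_i$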
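  6. MATRIX FACTORIZATION
    $$\min_{X, Y} \sum_{u,i} w_{u,i} \left( r_{u,i} - x_u^{\top} y_i \right)^2 + \lambda \sum_u \lVert x_u \rVert^2 + \lambda \sum_i \lVert y_i \rVert^2$$
    The first term is the weighted prediction error (sample weight $w_{u,i}$, true rating $r_{u,i}$, predicted rating $x_u^{\top} y_i$); the remaining terms are the regularization penalty.
    ratings (users × items) ≈ user profiles × item profiles
    Alternating Least Squares
    0. Initialize $X$ and $Y$ at random
    1. Solve alternating:
      • Fix $X$, optimize $Y$
      • Fix $Y$, optimize $X$
    2. Repeat until convergence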
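The Alternating Least Squares loop above could look roughly like this in NumPy. This is a minimal sketch, not the speaker's implementation: the names `R` (ratings), `W` (sample weights), `X` (user profiles), `Y` (item profiles) and `reg` are assumptions, the toy data is invented, and dense matrices are used only to keep the example short. A fixed number of iterations stands in for a convergence check.

```python
import numpy as np

def als(R, W, n_factors=2, reg=0.1, n_iters=20, seed=0):
    """Weighted ALS for sum(W * (R - X @ Y.T)**2) + reg * (|X|^2 + |Y|^2)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    # Step 0: initialize both profile matrices at random.
    X = rng.normal(scale=0.1, size=(n_users, n_factors))
    Y = rng.normal(scale=0.1, size=(n_items, n_factors))
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Fix Y, solve a small weighted ridge regression per user.
        for u in range(n_users):
            Wu = np.diag(W[u])
            X[u] = np.linalg.solve(Y.T @ Wu @ Y + ridge, Y.T @ Wu @ R[u])
        # Fix X, solve the same kind of problem per item.
        for i in range(n_items):
            Wi = np.diag(W[:, i])
            Y[i] = np.linalg.solve(X.T @ Wi @ X + ridge, X.T @ Wi @ R[:, i])
    return X, Y

def loss(R, W, X, Y, reg=0.1):
    """Weighted prediction error plus regularization penalty."""
    err = R - X @ Y.T
    return float(np.sum(W * err**2) + reg * (np.sum(X**2) + np.sum(Y**2)))

# Toy data (hypothetical): binary ratings and confidence-style weights.
R = np.array([[1, 0, 0], [1, 1, 1], [1, 0, 1]], dtype=float)
W = 1 + 0.5 * np.array([[3, 0, 0], [1, 2, 5], [4, 0, 1]], dtype=float)

X, Y = als(R, W)
print(np.round(X @ Y.T, 2))   # predicted ratings
print(loss(R, W, X, Y))
```

Each inner solve minimizes the objective exactly for one user or one item while everything else is held fixed, so the loss cannot increase from one sweep to the next.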
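  7. MATRIX FACTORIZATION
    item profiles × item profiles (transposed) = item-item similarities
    • Recommend item-to-item
    • Cosine similarity of item profiles: $\mathrm{sim}(i, j) = \dfrac{y_i^{\top} y_j}{\lVert y_i \rVert_2 \, \lVert y_j \rVert_2}$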
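Item-to-item recommendation from item profiles might be sketched as follows. The profile values and the helper `most_similar` are hypothetical; in practice the profiles would come out of the factorization.

```python
import numpy as np

# Hypothetical item profiles: one row per item, one column per latent factor.
item_profiles = np.array([
    [1.0, 0.0],
    [2.0, 0.0],
    [0.0, 3.0],
    [1.0, 1.0],
])

# Scale each profile to unit length, then take all pairwise dot products.
norms = np.linalg.norm(item_profiles, axis=1, keepdims=True)
unit = item_profiles / norms
similarities = unit @ unit.T   # item x item cosine similarities

def most_similar(item, k=2):
    """Return the k items most similar to `item`, excluding the item itself."""
    order = np.argsort(-similarities[item])
    return [int(j) for j in order if j != item][:k]

print(most_similar(0))
```

Item 1 points in the same direction as item 0 (it is only longer), so it ranks first; cosine similarity ignores the length of a profile.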
  8. THE ESSENCE
    Using data-driven user and item profiles to predict the preference of a specific user for a specific item.
    [Diagram labels: user profiles, item profiles, ratings, predicted ratings]
  9. ADVANTAGES AND DISADVANTAGES
    + Simultaneous latent user and item factors
    + Can handle sparse data
    + Scalable computation
    − Temporal and popularity biases
    − Cold start problem
    − No context-awareness
    [Figure: popularity of sorted items, with HEAD and TAIL regions]
  10. CROSS-VALIDATION
    • Temporal split
    • Quasi-random split
    [Diagrams: users × items matrices divided into train and test parts]
    Note: test users and items need to be present in the train set
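A temporal split that respects the note above could be sketched like this. The event log, the function name `temporal_split` and the 75% train fraction are all assumptions for illustration.

```python
# Hypothetical interaction log: (user, item, timestamp).
events = [
    ("ann", "a", 1), ("ann", "b", 2), ("bob", "a", 3),
    ("bob", "c", 4), ("ann", "c", 5), ("cat", "a", 6),
    ("bob", "b", 7), ("ann", "d", 8),
]

def temporal_split(events, train_fraction=0.75):
    """Train on the earliest events, test on the latest ones."""
    ordered = sorted(events, key=lambda e: e[2])
    cut = int(len(ordered) * train_fraction)
    train, test = ordered[:cut], ordered[cut:]
    # Keep only test events whose user and item were both seen in training.
    users = {u for u, _, _ in train}
    items = {i for _, i, _ in train}
    test = [e for e in test if e[0] in users and e[1] in items]
    return train, test

train, test = temporal_split(events)
print(test)   # [('bob', 'b', 7)]
```

The event for item "d" is dropped from the test set because no profile for "d" can be learned from the training data.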
  11. OFFLINE EVALUATION
    • Ranking metrics: are top items more important? (MRR, MAP, nDCG)
    • Simple: Average Percentile Rank (APR)
    [Figure: TRAIN and TEST sets, with the values 0.75 and 0.25; APR = …]
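One common way to compute an average percentile rank is sketched below. The slide's own APR formula is not in the transcript, so this definition (0 for the top-ranked item, 1 for the bottom, averaged over held-out pairs) is an assumption, as are the function names and the toy scores.

```python
def percentile_rank(scores, item):
    """Position of `item` in the score-sorted list, scaled to [0, 1]; 0 is the top."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) == 1:
        return 0.0
    return ranked.index(item) / (len(ranked) - 1)

def average_percentile_rank(predictions, test_pairs):
    """Mean percentile rank of the held-out items in each user's ranking."""
    ranks = [percentile_rank(predictions[u], i) for u, i in test_pairs]
    return sum(ranks) / len(ranks)

# Hypothetical predicted scores per user, and held-out (user, item) pairs.
predictions = {
    "ann": {"a": 0.9, "b": 0.4, "c": 0.1},
    "bob": {"a": 0.2, "b": 0.8, "c": 0.5},
}
test_pairs = [("ann", "a"), ("bob", "c")]

apr = average_percentile_rank(predictions, test_pairs)
print(apr)   # 0.25
```

Under this definition lower is better, and a model that ranks items at random would score around 0.5.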
  12. CONTEXT-AWARE RECOMMENDERS
    Neural network view of matrix factorization
    [Diagram labels: context, metadata, standard matrix factorization, sparse features, dense features / embeddings, factorization layer (×), biases (+), output layer]
  13. PRACTICAL ADVICE
    • Never instantiate the full user-item matrix!
    • Based on volumes, go for a scalable framework (e.g. Spark MLlib)
    • Based on requirements, go for a flexible framework (e.g. TensorFlow)
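One way to follow the first piece of advice is to score only the users being served and keep just the top few items per user, rather than materializing every user-item prediction. The sketch below assumes already-trained profile matrices (random placeholders here); the function `top_k_for_users` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 1000, 500, 8
# Placeholder profiles; real ones would come from a trained model.
user_profiles = rng.normal(size=(n_users, n_factors))
item_profiles = rng.normal(size=(n_items, n_factors))

def top_k_for_users(user_ids, k=5):
    """Score only the requested users and return their k best items."""
    # Shape is len(user_ids) x n_items, never n_users x n_items.
    scores = user_profiles[user_ids] @ item_profiles.T
    # argpartition finds the k best items per row without a full sort.
    top = np.argpartition(-scores, k, axis=1)[:, :k]
    # Order those k items by descending score.
    rows = np.arange(len(user_ids))[:, None]
    order = np.argsort(-scores[rows, top], axis=1)
    return top[rows, order]

recs = top_k_for_users([0, 1, 2], k=5)
print(recs)
```

Processing users in batches like this keeps memory proportional to the batch size; a framework such as Spark MLlib distributes the same idea across machines.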
  14. SUMMARY
    • Recommender use cases
    • Types of recommender algorithms
    • Matrix factorization
    • Recommender evaluation
    • Various challenges
    >> Time for hands-on!