Trends in Real-world Recommender Systems

Takuya Kitazawa

November 21, 2017

Transcript

  1. Trends in Real-world Recommender Systems Your “fancy” algorithm doesn’t scale

    in production Takuya Kitazawa @takuti
  2. $ whoami Treasure Data, Inc. Data Science Engineer Apache Hivemall

    Committer * All content is based on the speaker's own thoughts and does NOT reflect the views of any of his previous or current affiliations.
  3. takuti.me

  6. Trend Beyond rating Realistic scenario Me Persistent cold-start Online algorithm

    Future New application Production scale
  7. Messages Recommendation ≠ Machine Learning Keep Things Simple, Be Data-Driven

    Get Outside of Your Lab
  8. User Modeling in Folksonomies Persistently Cold-Starting Online Item Recommendation Users

    Web pages + Tag Master’s thesis (2016) Bachelor’s thesis (2014) Internship
  9. Master’s thesis (2016) Users Web pages + Tag Batch Online

    User Modeling in Folksonomies Persistently Cold-Starting Online Item Recommendation Bachelor’s thesis (2014) Internship
  10. Master’s thesis (2016) Users Web pages + Tag Batch Online

    User Modeling in Folksonomies Persistently Cold-Starting Online Item Recommendation Bachelor’s thesis (2014) Trend? Internship
  11. ACM RecSys Conference 2014-2017 https://takuti.me/note/recsys-wordcloud/ 2014 2016 2015 2017

  12. ACM RecSys Conference 2014-2017 https://takuti.me/note/recsys-wordcloud/ 2014 2016 2015 2017 Beyond

    collaborative filtering on rating
  13. “Netflix never implemented that solution itself” https://digit.hbs.org/submission/the-netflix-prize-crowdsourcing-to-improve-dvd-recommendations/ https://www.techdirt.com/blog/innovation/articles/20120409/03412518422/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenge.shtml

  14. https://digit.hbs.org/submission/the-netflix-prize-crowdsourcing-to-improve-dvd-recommendations/ https://www.techdirt.com/blog/innovation/articles/20120409/03412518422/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenge.shtml Change from US DVDs to global streaming Did

    not scale against dynamic growth of users and items Use more blended technique
  15. https://www.slideshare.net/optimaltransformation/a-collection-of-quotes-from-albert-einstein

  16. System requirements Wide-ranging applications and data “Practices” Scalability Batch vs

    streaming Social networks Product review (EC) Group recommendation
  17. Recommendation is Predicting users’ unforeseen behavior from data Users’ history

    Item attributes Context …
  18. Recommendation is Predicting users’ unforeseen behavior from data But,

  19. Recommendation ≠ Machine Learning

  20. Practice: Golf package recommendation at Rakuten

    Course + Price + Options (e.g., caddy, lunch) ML as a tool Interpretable Simple R. Swezey and Y. Chung. Recommending Short-Lived Dynamic Packages for Golf Booking Services. CIKM 2015.
  21. Theory: My new recommender

  22. Factorization Machines S. Rendle. Factorization Machines with libFM. ACM Transactions

    on Intelligent Systems and Technology, 3(3).
  23. Practice: My new “fancy” recommender on real data Poor accuracy

    Many hyper-params Inefficient Worse than Matrix Factorization Don’t squeeze everything into single method
  24. Keep Things Simple, Be Data-Driven

  25. # of data = # of solutions Whew! My new

    algorithm beats well-known methods!
  26. # of data = # of solutions Always recommend “most

    popular” items ML-ish techniques Whew! My new algorithm beats well-known methods! Accuracy High Low
  27. Simplest: Non-personalized recommendation Most Popular Average rating Random
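The three non-personalized baselines named on this slide can be sketched in a few lines of plain Python. This is only an illustration: the toy `ratings` log and the function names are assumptions made for the example, not data or code from the talk.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy log of (user, item, rating) tuples; not data from the talk.
ratings = [
    ("u1", "a", 5.0), ("u2", "a", 3.0), ("u3", "a", 4.0),
    ("u1", "b", 5.0), ("u2", "b", 5.0),
    ("u3", "c", 2.0),
]


def most_popular(ratings, k):
    """Rank items by the number of interactions they received."""
    counts = Counter(item for _, item, _ in ratings)
    return [item for item, _ in counts.most_common(k)]


def top_average_rating(ratings, k):
    """Rank items by mean rating (ties broken by item ID)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, item, rating in ratings:
        sums[item] += rating
        counts[item] += 1
    means = {item: sums[item] / counts[item] for item in sums}
    return sorted(means, key=lambda item: (-means[item], item))[:k]


def random_items(ratings, k, seed=0):
    """Pick k distinct items uniformly at random (seeded for repeatability)."""
    items = sorted({item for _, item, _ in ratings})
    return random.Random(seed).sample(items, k)
```

Every user receives the same list from each function, which is what makes these baselines cheap to compute and a useful sanity check before trying anything more elaborate.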

  28. Do the “minimum” math https://takuti.me/note/the-amazon-way-on-iot/

  29. Q. Which technique should I use?

  30. Q. Which technique should I use? A. It depends on

    your data and application
  31. Get Outside of Your Lab

  32. Persistent cold-start problem at Rakuten Institute of Technology

  33. Golf package recommendation at Rakuten

    Course + Price + Options (caddy, lunch, …) Q. What happens under dynamic trends (e.g., changing prices and/or users’ tastes)?
  34. Persistent cold-start on ad data (Yahoo! Lab; 2013)

  35. Persistent cold-start on real web service (Booking.com; 2015)

  36. Persistent cold-start Online update Rich auxiliary data Incremental Factorization Machines

    Persistently Cold-Starting Online Item Recommendation RecProfile 2016 Master’s thesis Problem Effective approach
  37. Production-level algorithm should be “usable” at Treasure Data

  38. Implement anomaly detection algorithms Test on real system metrics https://takuti.me/note/td-intern-2016/

  39. Time-series data (e.g., syslog): outliers and change-points in time-series data

    STEP 1: Find patterns from past observations. STEP 2: Compute a score at each point in time (“How far from the past pattern?”). Outlier: spiky “local” data point. Change-point: wide-scale “global” change.
    Example data (time, value): … (1508966854, 290) (1508966853, 294) (1508966852, 38) (1508966852, 290) (1508966851, 294) (1508958753, 301) (1508955307, 38) (1508954422, 38) (1508948503, 38) …
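The two steps on this slide (summarize past observations, then score each new point by its distance from that summary) can be illustrated with a rolling z-score. This is a deliberately simple stand-in, not ChangeFinder or Singular Spectrum Transformation; the function name, the `window` parameter, and the sample values are assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev


def anomaly_scores(values, window=5):
    """Score each point by its distance from the recent past.

    STEP 1: summarize the last `window` observations (mean and std).
    STEP 2: score the new point as |value - mean| / std.
    """
    history = deque(maxlen=window)
    scores = []
    for value in values:
        if len(history) < 2:
            # Not enough past observations to form a pattern yet.
            scores.append(0.0)
        else:
            mu = mean(history)
            sigma = pstdev(history)
            # Small constant avoids division by zero on flat history.
            scores.append(abs(value - mu) / (sigma + 1e-9))
        history.append(value)
    return scores


# Hypothetical metric values with one spiky "local" outlier.
series = [290, 294, 290, 294, 292, 38, 291]
scores = anomaly_scores(series)
```

Because the score only looks at a short window, this sketch reacts to spiky outliers; detecting wide-scale change-points generally needs a method that models longer-range structure.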
  40. ChangeFinder ‣ Probabilistic approach ‣ Many hyper-parameters and sensitive result

    Singular Spectrum Transformation ‣ Mathematically tractable, numerical algebraic approach ‣ Minimum # of hyper-params with robust result ‣ Efficient approximation scheme Easy-to-use, Interpretable
  41. Similarities between anomaly detection and recommendation Feature-expressiveness Rich vector representation

    Online-updating Finding similar/dissimilar samples in real-time Usability Simple hyper-params, interpretable result Scalability Production-level efficient back-end system Implicit feedback Binary feedback (buy or not, anomaly or not)
  42. Apply “usable” anomaly detection method for recommendation

  43. RecSys 2016 tutorial by Quora Implicit >>> Explicit https://www.slideshare.net/xamat/recsys-2016-tutorial-lessons-learned-from-building-reallife-recommender-systems

  44. Don’t be algorithm-driven at Silver Egg Technology

  45. Real e-commerce data ‣ 1M+ purchase log ‣ Attributes: customer’s session ID, item ID, timestamp

    Lack of features Started from algorithm
  46. Understanding data Small amount of daily purchase Customers Items 0.0086%

    nonzero Need to take advantage of sparsity in terms of both algorithm and implementation
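As a rough illustration of exploiting sparsity at the implementation level, the sketch below stores only the observed (customer, item) pairs and computes the nonzero ratio from them. The toy purchase log and the function names are assumptions made for the example; the 0.0086% figure on the slide comes from the real data, not from this code.

```python
from collections import defaultdict


def build_sparse(purchases):
    """Store only observed (customer, item) pairs: customer -> set of items."""
    by_customer = defaultdict(set)
    for customer, item in purchases:
        by_customer[customer].add(item)
    return by_customer


def density(by_customer):
    """Fraction of nonzero cells in the implied customer-by-item matrix."""
    items = set()
    for bought in by_customer.values():
        items |= bought
    if not by_customer or not items:
        return 0.0
    nonzero = sum(len(bought) for bought in by_customer.values())
    return nonzero / (len(by_customer) * len(items))


# Hypothetical purchase log of (customer, item) pairs; duplicates collapse.
log = [("c1", "i1"), ("c2", "i2"), ("c3", "i3"), ("c1", "i1")]
matrix = build_sparse(log)
```

Memory here grows with the number of purchases rather than with customers times items, which is the property that matters when almost every cell is empty.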
  47. Understanding data Rapidly increasing # of customers and items High

    dimensionality Customers Items
  48. Understanding data Small % of customers/items contribute many purchases Massive

    “useless” customers and items Customers Items
  49. Understanding data Timestamp represents seasonality

  50. Assumption My algorithm might NOT be effective on this data…

  51. Anyway, let me try as much as I can… Dimensionality

    reduction by hashing Store item candidates with time window ‣ Only use most-recently observed 100 items for recommendation
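A minimal sketch of the two ideas on this slide, under stated assumptions: IDs are mapped to a fixed number of buckets with a hash function (the hashing trick), and recommendation candidates are limited to the most recently observed items. The names `hash_index` and `RecentItems`, the CRC32 hash, and the bucket count are hypothetical choices for illustration; the talk does not specify them.

```python
import zlib
from collections import OrderedDict


def hash_index(key, n_buckets=2 ** 10):
    """Map an arbitrary ID string to a fixed-size index (hashing trick)."""
    return zlib.crc32(key.encode("utf-8")) % n_buckets


class RecentItems:
    """Keep only the most recently observed `capacity` distinct items."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self._items = OrderedDict()

    def observe(self, item_id):
        # Re-inserting moves an already-seen item to the "most recent" end.
        self._items.pop(item_id, None)
        self._items[item_id] = True
        if len(self._items) > self.capacity:
            # Evict the least recently observed item.
            self._items.popitem(last=False)

    def candidates(self):
        """Return candidate items, most recently observed first."""
        return list(reversed(self._items))


window = RecentItems(capacity=3)
for item in ["a", "b", "c", "a", "d"]:
    window.observe(item)
```

Hashing bounds the model size no matter how many new customers and items arrive, at the cost of occasional collisions; the time window bounds how many items need scoring per request.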
  52. Lessons Start from data Understanding data leads to an appropriate algorithm Think

    of a hybrid approach
  53. Messages Recommendation ≠ Machine Learning Keep Things Simple, Be Data-Driven

    Get Outside of Your Lab
  54. Future: Scaling recommender in production Personalization is everywhere in various

    ways as Netflix said “Everything is recommendation” https://www.slideshare.net/justinbasilico/past-present-future-of-recommender-systems-an-industry-perspective
  55. Listen to the podcast episode with Dr. Joseph Konstan ‣ “I hate

    Amazon’s first page” ‣ Recommendation for education ‣ Context-aware recommender ‣ Cross validation is NOT realistic ‣ Serendipity ≠ Just “BAD” - = like & didn’t know ‣ …
  56. First step Online course https://takuti.me/note/coursera-recommender-systems/

  57. First step Pre-programmed (mostly static) algorithms and metrics ‣ Surprise

    (Python) http://surpriselib.com/ ‣ fastFM (Python) http://ibayer.github.io/fastFM/ ‣ Implicit (Python) http://implicit.readthedocs.io/en/latest/ ‣ MyMediaLite (C#) http://www.mymedialite.net/ ‣ LibRec (Java) https://www.librec.net/ ‣ LensKit (Java) http://lenskit.org/ On Apache Hadoop, Hive, Spark: ‣ Apache Mahout http://mahout.apache.org/ ‣ Apache Hivemall https://hivemall.incubator.apache.org/ ‣ Spark MLlib https://spark.apache.org/mllib/
  58. And, FluRS :)

  59. Trends in Real-world Recommender Systems Your “fancy” algorithm doesn’t scale

    in production Takuya Kitazawa @takuti