Trends in Real-world Recommender Systems

Takuya Kitazawa

November 21, 2017

Transcript

  1. $ whoami

    Data Science Engineer at Treasure Data, Inc.; Apache Hivemall Committer
    * All contents are based on the speaker's own thoughts and do NOT reflect the views of any of his previous or current affiliations.
  2. User Modeling in Folksonomies (users, web pages + tags); Persistently Cold-Starting Online Item Recommendation

    Bachelor’s thesis (2014); Master’s thesis (2016); Internship
  3. User Modeling in Folksonomies (users, web pages + tags); Persistently Cold-Starting Online Item Recommendation

    Bachelor’s thesis (2014); Master’s thesis (2016); Internship
    Batch / Online
  4. User Modeling in Folksonomies (users, web pages + tags); Persistently Cold-Starting Online Item Recommendation

    Bachelor’s thesis (2014); Master’s thesis (2016); Internship
    Batch / Online
    Trend?
  5. “Practices”

    System requirements: scalability; batch vs. streaming
    Wide-ranging applications and data: social networks; product review (EC); group recommendation
  6. Practice: Golf package recommendation at Rakuten

    Course + Price + Options (e.g., caddy, lunch)
    ML as a tool: interpretable, simple
    R. Swezey and Y. Chung. Recommending Short-Lived Dynamic Packages for Golf Booking Services. CIKM 2015.
  7. Practice: My new “fancy” recommender on real data

    Poor accuracy; many hyper-params; inefficient; worse than Matrix Factorization
    Don’t squeeze everything into a single method
  8. # of data = # of solutions

    “Whew! My new algorithm beats well-known methods!”
  9. # of data = # of solutions

    Approaches placed on an accuracy scale (High / Low):
    ‣ Always recommend “most popular” items
    ‣ ML-ish techniques
    ‣ “Whew! My new algorithm beats well-known methods!”
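
The “most popular” baseline named on this slide can be sketched in a few lines. This is only an illustration, assuming a hypothetical log of (user, item) pairs; the function name `most_popular` and the toy data are not from the talk.

```python
from collections import Counter

def most_popular(interactions, k=3):
    """Return the k most frequently observed items (a non-personalized baseline)."""
    counts = Counter(item for _user, item in interactions)
    # Sort by descending count, then by item ID so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _count in ranked[:k]]

# Hypothetical toy log of (user, item) pairs.
toy_log = [("u1", "a"), ("u2", "a"), ("u3", "b"),
           ("u1", "b"), ("u2", "c"), ("u3", "a")]
top_items = most_popular(toy_log, k=2)  # every user gets the same list
```

Because the same list is returned for every user, a baseline like this is cheap to compute and makes a useful reference point when evaluating a new algorithm.
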
  10. Golf package recommendation at Rakuten

    Course + Price + Options (caddy, lunch, …)
    Q. What happens under dynamic trends (e.g., changing prices and/or users’ tastes)?
  11. Persistently Cold-Starting Online Item Recommendation (RecProfile 2016; Master’s thesis)

    Problem: persistent cold-start
    Effective approach: online update; rich auxiliary data
    Incremental Factorization Machines
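
To make “online update” with a factorization machine concrete, here is a minimal sketch of a standard second-order factorization machine trained one sample at a time with plain SGD on squared error. It is not the thesis’s exact algorithm; the class name `OnlineFM`, the hyper-parameters, and the feature strings are assumptions made for illustration. Parameters for unseen features are created on first sight, which is one simple way to accept new users and items as they arrive.

```python
import random

class OnlineFM:
    """Second-order factorization machine updated one sample at a time."""

    def __init__(self, k=4, lr=0.05, seed=0):
        self.k, self.lr = k, lr
        self.w0 = 0.0   # global bias
        self.w = {}     # feature -> linear weight
        self.v = {}     # feature -> list of k latent factors
        self._rng = random.Random(seed)

    def _ensure(self, feature):
        # Unseen (cold-start) features get fresh parameters on first sight.
        if feature not in self.w:
            self.w[feature] = 0.0
            self.v[feature] = [self._rng.gauss(0.0, 0.01) for _ in range(self.k)]

    def predict(self, x):
        """x is a sparse sample given as {feature_name: value}."""
        for f in x:
            self._ensure(f)
        linear = sum(self.w[f] * val for f, val in x.items())
        pairwise = 0.0
        for d in range(self.k):
            s = sum(self.v[f][d] * val for f, val in x.items())
            s2 = sum((self.v[f][d] * val) ** 2 for f, val in x.items())
            pairwise += 0.5 * (s * s - s2)  # linear-time pairwise interaction term
        return self.w0 + linear + pairwise

    def update(self, x, y):
        """One SGD step on 0.5 * (prediction - y)^2; returns the error before the step."""
        err = self.predict(x) - y
        sums = [sum(self.v[f][d] * val for f, val in x.items())
                for d in range(self.k)]
        self.w0 -= self.lr * err
        for f, val in x.items():
            self.w[f] -= self.lr * err * val
            for d in range(self.k):
                grad = val * sums[d] - self.v[f][d] * val * val
                self.v[f][d] -= self.lr * err * grad
        return err

# Hypothetical usage: one (user, item) event observed repeatedly.
model = OnlineFM()
sample = {"user:u1": 1.0, "item:i1": 1.0}
for _ in range(200):
    model.update(sample, 1.0)
fitted = model.predict(sample)
unseen = model.predict({"user:new": 1.0, "item:i1": 1.0})  # new user, no retraining
```

Auxiliary data (for example item attributes or context) would enter the same way, as extra keys in the sparse sample.
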
  12. Outlier and change-point in time-series data

    Time-series data, e.g., syslog:

    time        value
    …           …
    1508966854  290
    1508966853  294
    1508966852  38
    1508966852  290
    1508966851  294
    1508958753  301
    1508955307  38
    1508954422  38
    1508948503  38
    …           …

    ‣ Outlier: spiky “local” data point
    ‣ Change-point: wide-scale “global” change
    STEP 1: Find patterns from past observations
    STEP 2: Compute a score at each point in time (“How far from past pattern”)
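
The two steps on this slide can be sketched with a deliberately simple stand-in: summarize the past as the mean of a sliding window (STEP 1), then score each new point by its absolute distance from that mean (STEP 2). This is neither ChangeFinder nor Singular Spectrum Transformation; the function name `deviation_scores` and the window size are assumptions. The values are the ones visible in the slide’s table, ordered oldest first.

```python
from collections import deque
from statistics import mean

def deviation_scores(values, window=3):
    """Score each point by its distance from the mean of the preceding window."""
    past = deque(maxlen=window)
    scores = []
    for x in values:
        if len(past) < window:
            scores.append(0.0)           # not enough history yet
        else:
            mu = mean(list(past))        # STEP 1: summarize the past pattern
            scores.append(abs(x - mu))   # STEP 2: how far from the past pattern
        past.append(x)
    return scores

# Values from the slide's table, oldest first.
series = [38, 38, 38, 301, 294, 290, 38, 294, 290]
scores = deviation_scores(series, window=3)
top_two = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:2]
```

On this series the two highest scores fall on the jump from 38 to 301 and on the isolated 38 that follows the jump. A score this simple flags both kinds of event but does not tell a change-point from an outlier by itself.
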
  13. ChangeFinder and Singular Spectrum Transformation

    ChangeFinder:
    ‣ Probabilistic approach
    ‣ Many hyper-parameters and sensitive result
    Singular Spectrum Transformation:
    ‣ Mathematically tractable, numerical algebraic approach
    ‣ Minimum # of hyper-params with robust result
    ‣ Efficient approximation scheme
    Easy-to-use, interpretable
  14. Similarities between anomaly detection and recommendation

    ‣ Feature-expressiveness: rich vector representation
    ‣ Online-updating: finding similar/dissimilar samples in real-time
    ‣ Usability: simple hyper-params, interpretable result
    ‣ Scalability: production-level efficient back-end system
    ‣ Implicit feedback: binary feedback (buy or not, anomaly or not)
  15. Real e-commerce data

    ‣ 1M+ purchase log
    ‣ Attributes: customer’s session ID, item ID, timestamp
    Started from algorithm
    Lack of features
  16. Understanding data

    Small amount of daily purchase
    Customers × Items: 0.0086% nonzero
    Need to take advantage of sparsity in terms of both algorithm and implementation
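
One way to exploit this sparsity on the implementation side is to store only the observed (customer, item) pairs rather than a dense matrix. The sketch below does that and computes the nonzero ratio; the helper names and the toy purchases are hypothetical, and the talk does not give the matrix dimensions behind the 0.0086% figure.

```python
from collections import defaultdict

def build_sparse(purchases):
    """Keep only observed (customer, item) pairs: customer -> set of items."""
    rows = defaultdict(set)
    for customer, item in purchases:
        rows[customer].add(item)  # repeated purchases collapse into one nonzero cell
    return rows

def density(rows, n_customers, n_items):
    """Fraction of nonzero cells in the customers x items matrix."""
    nonzero = sum(len(items) for items in rows.values())
    return nonzero / (n_customers * n_items)

# Hypothetical toy data: 4 customers, 5 items, 3 distinct purchased pairs.
toy_purchases = [("c1", "i1"), ("c1", "i3"), ("c2", "i3"), ("c1", "i1")]
rows = build_sparse(toy_purchases)
ratio = density(rows, n_customers=4, n_items=5)
```

Memory then grows with the number of observed pairs instead of with customers times items, which matters when almost every cell is empty.
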
  17. Anyway, let me try as much as I can…

    Dimensionality reduction by hashing
    Store item candidates with time window
    ‣ Only use most-recently observed 100 items for recommendation
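
Both ideas on this slide fit in a short sketch. Feature hashing maps arbitrary feature strings into a fixed number of buckets, and a bounded, recency-ordered container keeps only the latest 100 distinct items as recommendation candidates. The names `hash_feature` and `RecentItems`, the bucket count, and the use of MD5 (chosen only because it is deterministic and in the standard library) are assumptions; the talk does not say which hash function or data structure was used.

```python
import hashlib
from collections import OrderedDict

def hash_feature(name, num_buckets=2**18):
    """Map an arbitrary feature string to an index in a fixed-size space."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

class RecentItems:
    """Keep only the most recently observed `capacity` distinct items."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self._items = OrderedDict()

    def observe(self, item_id):
        # Re-observing an item moves it to the most-recent end.
        self._items.pop(item_id, None)
        self._items[item_id] = True
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the oldest item

    def candidates(self):
        """Items ordered from oldest to most recent."""
        return list(self._items)

# Hypothetical stream of 250 item observations.
window = RecentItems(capacity=100)
for i in range(250):
    window.observe(f"item{i}")
cands = window.candidates()
bucket = hash_feature("item:item42")
```

Hashing bounds the model size regardless of how many distinct IDs appear, at the cost of occasional collisions; the window bounds the number of candidates to score per request.
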
  18. Future: Scaling recommender in production

    Personalization is everywhere in various ways; as Netflix said, “Everything is recommendation”
    https://www.slideshare.net/justinbasilico/past-present-future-of-recommender-systems-an-industry-perspective
  19. Listen to the podcast episode with Dr. Joseph Konstan

    ‣ “I hate Amazon’s first page”
    ‣ Recommendation for education
    ‣ Context-aware recommender
    ‣ Cross validation is NOT realistic
    ‣ Serendipity ≠ just “BAD”; serendipity = like & didn’t know
    ‣ …
  20. First step: pre-programmed (mostly static) algorithms and metrics

    ‣ Surprise (Python) http://surpriselib.com/
    ‣ fastFM (Python) http://ibayer.github.io/fastFM/
    ‣ Implicit (Python) http://implicit.readthedocs.io/en/latest/
    ‣ MyMediaLite (C#) http://www.mymedialite.net/
    ‣ LibRec (Java) https://www.librec.net/
    ‣ LensKit (Java) http://lenskit.org/
    On Apache Hadoop, Hive, Spark:
    ‣ Apache Mahout http://mahout.apache.org/
    ‣ Apache Hivemall https://hivemall.incubator.apache.org/
    ‣ Spark MLlib https://spark.apache.org/mllib/