FluRS: A Library for Streaming Recommendation Algorithms

Takuya Kitazawa

August 31, 2017

Transcript

  1. FluRS: A Library for Streaming Recommendation Algorithms. Takuya Kitazawa (@takuti)

  2. $ whoami
    - Treasure Data, Inc. Data Science Engineer
    - Apache Hivemall Committer
  3. “Recommender Systems”

  4. What I talk about (but it’s NOT all about recommender systems!)
    - Past: batch, static
    - Present: streaming, dynamic
    - Future: at scale, in production
  5. Past: The era of user-item matrix

  6. Collaborative Filtering (CF; k-Nearest-Neighbors): finding similar users (or items) from history
  7. User c is more similar to user a than user b is:

        import numpy as np
        import numpy.linalg as ln

        def similarity(x, y):
            # cosine similarity between two rating vectors
            return np.inner(x, y) / (ln.norm(x, ord=2) * ln.norm(y, ord=2))

        user_a = np.array([5, 0, 1, 1, 0, 2])
        user_b = np.array([0, 2, 0, 4, 0, 4])
        user_c = np.array([4, 5, 0, 1, 1, 2])

        print(similarity(user_a, user_b))  # 0.359210604054
        print(similarity(user_a, user_c))  # 0.654953146328
  8. Singular Value Decomposition (SVD): computationally cheaper and more mathematically tractable than CF
  9. Missing values are filled:

        import numpy as np
        import numpy.linalg as ln

        # 5 users * 6 items
        A = np.array([[5, 0, 1, 1, 0, 2],
                      [0, 2, 0, 4, 0, 4],
                      [4, 5, 0, 1, 1, 2],
                      [0, 0, 3, 5, 2, 0],
                      [2, 0, 1, 0, 4, 4]])

        # note: the third return value holds the rows of V^T
        U, s, V = ln.svd(A, full_matrices=False)

        # represent user/item characteristics in a lower dimensional space
        k = 2
        A_approx = np.dot(np.dot(U[:, :k], np.diag(s[:k])), V[:k, :])

        print(A_approx)
        # [[ 3.19741238  1.98064059  0.19763307  0.50430074  1.04148574  2.47123826]
        #  [ 1.20450954  1.18625722  1.50361641  3.5812116   1.61569345  2.37803076]
        #  [ 4.36792826  2.68465163  0.20157659  0.52659617  1.36419993  3.30665072]
        #  [-0.94009727  0.07701659  2.08296828  4.93223799  1.52652414  1.44132726]
        #  [ 2.67985286  1.80342544  0.63125085  1.52750202  1.27145836  2.54266834]]
  10. Matrix Factorization (MF): more feasible in terms of running time & missing value imputation
  11. For the same data:

        import numpy as np

        # 5 users * 6 items
        R = np.array([[5, 0, 1, 1, 0, 2],
                      [0, 2, 0, 4, 0, 4],
                      [4, 5, 0, 1, 1, 2],
                      [0, 0, 3, 5, 2, 0],
                      [2, 0, 1, 0, 4, 4]])
        n_user, n_item = R.shape

        # represent user/item characteristics in a lower dimensional space
        k = 2
  12. Missing values are filled:

        # randomly guess “factors”
        P = np.random.rand(n_user, k)
        Q = np.random.rand(n_item, k)

        for user in range(n_user):
            for item in range(n_item):
                # ignore useless zero elements
                if R[user, item] == 0:
                    continue

                # adjust factors based on estimation error
                p, q = P[user], Q[item]
                err = R[user, item] - np.inner(p, q)
                next_p = p - 0.1 * (-2. * (err * q - 0.01 * p))
                next_q = q - 0.1 * (-2. * (err * p - 0.01 * q))
                # user factors go to P[user], item factors to Q[item]
                P[user], Q[item] = next_p, next_q

        print(np.dot(P, Q.T))
        # output as printed on the original slide; exact values depend on the random initialization
        # [[ 1.44089222  2.10861345  1.35586737  1.30939713  2.25707035  1.10801462]
        #  [ 1.93053648  3.03696723  1.89003927  1.94703341  3.2410682   1.5074436 ]
        #  [ 1.93037675  3.08600948  1.90697021  1.99171424  3.2913027   1.51264917]
        #  [ 1.95476578  3.09267286  1.91985778  1.98747126  3.29976689  1.52826493]
        #  [ 2.27963689  3.58058058  2.22988788  2.29405565  3.82145282  1.77943416]]
  13. History: Netflix Prize (2006-2009) https://digit.hbs.org/submission/the-netflix-prize-crowdsourcing-to-improve-dvd-recommendations/

  14. “Netflix never implemented that solution itself”
    https://www.techdirt.com/blog/innovation/articles/20120409/03412518422/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenge.shtml
    Poor scalability on dynamic user/item data
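  15. Present: Streamed rich user-item data, as one possible approach to improve scalability
    [Figure: events arriving over time, each carrying user attributes (e.g., 21 yrs., man, student), item attributes (e.g., genre: music, price: $1000), and context (e.g., “when”, “where”)]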
  16. Streaming recommender systems: update the recommendation model in real time
    - Recommend top-N items to users
    - User interacts with items
    - Update recommendation model on-the-fly
    ‣ Incremental CF
    ‣ Incremental SVD
    ‣ Incremental MF
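As a rough illustration of the "update on-the-fly" step, here is a minimal sketch of incremental MF in plain NumPy. It is not FluRS code: the class name `IncrementalMF`, its hyperparameters, and the positive-only target value of 1.0 are assumptions made for this example. Each incoming event triggers a single SGD step, and factors for unseen users or items are created lazily.

```python
import numpy as np


class IncrementalMF:
    """Hypothetical incremental matrix factorization (not the FluRS implementation)."""

    def __init__(self, k=2, learn_rate=0.1, l2_reg=0.01, seed=0):
        self.k = k
        self.learn_rate = learn_rate
        self.l2_reg = l2_reg
        self.rng = np.random.default_rng(seed)
        self.P = {}  # user index -> factor vector
        self.Q = {}  # item index -> factor vector

    def _factors(self, table, index):
        # lazily create factors for users/items seen for the first time
        if index not in table:
            table[index] = self.rng.normal(0.0, 0.1, self.k)
        return table[index]

    def score(self, user, item):
        p, q = self._factors(self.P, user), self._factors(self.Q, item)
        return float(np.inner(p, q))

    def update(self, user, item, value=1.0):
        # one SGD step for a single observed event
        p, q = self._factors(self.P, user), self._factors(self.Q, item)
        err = value - np.inner(p, q)
        self.P[user] = p + self.learn_rate * 2.0 * (err * q - self.l2_reg * p)
        self.Q[item] = q + self.learn_rate * 2.0 * (err * p - self.l2_reg * q)

    def recommend(self, user, candidates, top_n=10):
        # rank candidate items by predicted score, highest first
        ranked = sorted(candidates, key=lambda item: self.score(user, item), reverse=True)
        return ranked[:top_n]


model = IncrementalMF()
# stream of (user, item) events, processed one at a time
for user, item in [(0, 1), (1, 1), (0, 0), (0, 1), (1, 1)]:
    model.update(user, item)
print(model.recommend(0, candidates=[0, 1, 2]))
```

Unlike the batch loop over a full rating matrix, nothing here needs the matrix shape in advance, which is what makes the approach fit dynamic user/item data.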
  17. FluRS: Flu-* (Flux, Fluid, Fluent) Recommender Systems
    $ pip install flurs
    https://github.com/takuti/flurs
  18. Unified data representation: available for a wide variety of realistic data
    Algorithm-agnostic: separate recommender-specific implementation from algorithm code
    Streaming evaluation: monitor accuracy of streaming recommendation in an appropriate scheme
  19. User, Item
     - index
     - feature vector
    Event
     - context vector
    Recommender
     - initialize()
     - register(user), register(item)
     - update(event)
     - recommend(user)
    Model
    Evaluator(recommender)
     - fit(event[])
     - evaluate(event[])
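The components listed on this slide can be read as a small set of interfaces. The following is a minimal, hypothetical sketch of that structure in plain Python. Only the method names (`initialize`, `register`, `update`, `recommend`) come from the slide; the class bodies, the `top_n` parameter, and the toy `PopularityRecommender` are assumptions for illustration and are not FluRS source code.

```python
from collections import Counter
from dataclasses import dataclass, field

import numpy as np


@dataclass
class User:
    index: int
    feature: np.ndarray = field(default_factory=lambda: np.zeros(0))


@dataclass
class Item:
    index: int
    feature: np.ndarray = field(default_factory=lambda: np.zeros(0))


@dataclass
class Event:
    user: User
    item: Item
    context: np.ndarray = field(default_factory=lambda: np.zeros(0))


class PopularityRecommender:
    """Toy recommender exposing the method names listed on the slide."""

    def initialize(self):
        self.users, self.items = {}, {}
        self.counts = Counter()

    def register(self, entity):
        # the slide shows register(user) and register(item); dispatch on type here
        table = self.users if isinstance(entity, User) else self.items
        table[entity.index] = entity

    def update(self, event):
        self.counts[event.item.index] += 1

    def recommend(self, user, top_n=3):
        # every user gets the same ranking: most frequently seen items first
        ranked = sorted(self.items, key=lambda i: self.counts[i], reverse=True)
        return ranked[:top_n]


recommender = PopularityRecommender()
recommender.initialize()
alice, song, book = User(0), Item(0), Item(1)
for entity in (alice, song, book):
    recommender.register(entity)
recommender.update(Event(alice, book))
print(recommender.recommend(alice))  # [1, 0]
```

Keeping users, items, and events as plain data objects is what lets the same evaluation code drive any algorithm that implements these four methods.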
  20. Feature-based recommender: Factorization Machine (FM)
    Create prediction model from context-aware feature vectors
    S. Rendle. Factorization Machines with libFM. ACM Transactions on Intelligent Systems and Technology, 3(3), May 2012.
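To make "prediction model from context-aware feature vectors" concrete, here is a minimal sketch of the second-order FM prediction in NumPy, using the standard linear-time reformulation of the pairwise term. The function name `fm_predict`, the random parameters, and the choice to concatenate user, item, and context features into one input vector are assumptions for this example, not FluRS internals.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine prediction for one feature vector x.

    x: (p,) features, w0: global bias, w: (p,) linear weights, V: (p, k) factors.
    """
    linear = w0 + np.inner(w, x)
    # pairwise interactions in O(p * k):
    # 0.5 * sum_f ((sum_i V[i, f] * x[i])^2 - sum_i V[i, f]^2 * x[i]^2)
    vx = V.T @ x
    pairwise = 0.5 * np.sum(vx ** 2 - (V ** 2).T @ (x ** 2))
    return linear + pairwise


# assumed input layout: [user features | item features | context]
user = np.array([1.0, 3.0, 1.0])
item = np.array([0.0, 2.0, 1.0])
context = np.array([0.0, 4.0])
x = np.concatenate([user, item, context])

rng = np.random.default_rng(0)
p, k = x.size, 2
w0, w, V = 0.0, rng.normal(0, 0.1, p), rng.normal(0, 0.1, (p, k))
print(fm_predict(x, w0, w, V))
```

Because every feature gets its own factor vector, a new context dimension only adds one row to `V` rather than changing the model structure.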
  21. Make FM “incremental” somehow

  22. Define users and items with feature vectors:

        import numpy as np
        from flurs.data.entity import User, Item

        user_a = User(0, feature=np.array([0, 0, 1]))
        user_b = User(1, feature=np.array([1, 0, 1]))
        user_c = User(2, feature=np.array([1, 3, 1]))

        item_a = Item(0, feature=np.array([2, 1, 1]))
        item_b = Item(1, feature=np.array([0, 2, 1]))
  23. Context-aware recommendation:

        import numpy as np
        from flurs.data.entity import Event
        from flurs.recommender.fm import FMRecommender

        # initialize a recommendation instance
        recommender = FMRecommender(p=8, k=2)
        recommender.initialize()

        # register users and items
        recommender.register(user_a)
        recommender.register(user_b)
        recommender.register(user_c)
        recommender.register(item_a)
        recommender.register(item_b)

        # feed some events
        recommender.update(Event(user_a, item_b, context=np.array([1, 1])))
        recommender.update(Event(user_b, item_b, context=np.array([0, 2])))
        recommender.update(Event(user_a, item_a, context=np.array([1, 3])))

        # make recommendation to `user_c`
        candidates = np.array([0, 1])
        recommender.recommend(user_c, candidates, context=np.array([0, 4]))
  24. Evaluator(recommender): “test-then-learn” evaluation scheme
     - fit(event[])
     - evaluate(event[])
    J. Vinagre et al. Fast Incremental Matrix Factorization for Recommendation with Positive-Only Feedback. In Proc. of UMAP 2014, pp. 459–470, July 2014.
    T. Kitazawa. Incremental Factorization Machines for Persistently Cold-Starting Online Item Recommendation. arXiv:1607.02858 [cs.LG], July 2016.
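The "test-then-learn" idea can be sketched in a few lines: for every incoming event, the model is first asked for a top-N list and scored on whether the observed item is in it, and only then is it updated with that event. The code below is a simplified illustration under stated assumptions, not the FluRS `Evaluator`: `PopularityModel` and `evaluate_test_then_learn` are hypothetical names, and the cited papers report windowed recall rather than the plain hit rate computed here.

```python
from collections import Counter


class PopularityModel:
    """Hypothetical stand-in model: recommends the most frequently seen items."""

    def __init__(self, items):
        self.counts = Counter({item: 0 for item in items})

    def recommend(self, user, top_n):
        return [item for item, _ in self.counts.most_common(top_n)]

    def update(self, user, item):
        self.counts[item] += 1


def evaluate_test_then_learn(model, events, top_n=1):
    """Yield a hit (1) or miss (0) per event: test on it first, then learn from it."""
    for user, item in events:
        hit = int(item in model.recommend(user, top_n))  # 1. test
        model.update(user, item)                         # 2. learn
        yield hit


events = [("a", "x"), ("b", "y"), ("c", "y"), ("a", "y"), ("b", "x")]
hits = list(evaluate_test_then_learn(PopularityModel(items=["x", "y"]), events))
print(hits, sum(hits) / len(hits))  # [1, 0, 0, 1, 0] 0.4
```

Since each event is used for testing before the model has seen it, no separate held-out set is needed and accuracy can be monitored continuously as the stream goes on.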
  25. Easily develop & evaluate your own recommender: you just need to follow the FluRS way
  26. Future: Scaling recommender in production
    Personalization is everywhere in various ways; as Netflix said, “Everything is recommendation”
    https://www.slideshare.net/justinbasilico/past-present-future-of-recommender-systems-an-industry-perspective
  27. Why Python? It’s for production!
    - Versatile: stream and store data internally
    - Trial-and-error: develop “hybrid” algorithms
    - Portable: integrate w/ existing code
    FluRS will be updated in these aspects :)
  28. OSS for Recommender Systems
    ‣ Surprise (Python) http://surpriselib.com/
    ‣ fastFM (Python) http://ibayer.github.io/fastFM/
    ‣ Implicit (Python) http://implicit.readthedocs.io/en/latest/
    ‣ MyMediaLite (C#) http://www.mymedialite.net/
    ‣ LibRec (Java) https://www.librec.net/
    ‣ LensKit (Java) http://lenskit.org/
    On Apache Hadoop, Hive, Spark:
    ‣ Apache Mahout http://mahout.apache.org/
    ‣ Apache Hivemall https://hivemall.incubator.apache.org/
    ‣ Spark MLlib https://spark.apache.org/mllib/
    GET INSPIRATION AND CUSTOMIZE FOR “YOUR” PRODUCTION