FluRS: A Library for Streaming Recommendation Algorithms

Takuya Kitazawa

August 31, 2017

Transcript

  1. What I talk about (but it’s NOT all about recommender systems!)

    - Past: batch, static
    - Present: streaming, dynamic
    - Future: at scale in production
  2. User c is more similar to user a than user b is:

        import numpy as np
        import numpy.linalg as ln

        def similarity(x, y):
            # cosine similarity between two rating vectors
            return np.inner(x, y) / (ln.norm(x, ord=2) * ln.norm(y, ord=2))

        user_a = np.array([5, 0, 1, 1, 0, 2])
        user_b = np.array([0, 2, 0, 4, 0, 4])
        user_c = np.array([4, 5, 0, 1, 1, 2])

        print(similarity(user_a, user_b))  # 0.359210604054
        print(similarity(user_a, user_c))  # 0.654953146328
  3. Missing values are filled:

        import numpy as np
        import numpy.linalg as ln

        # 5 users * 6 items
        A = np.array([[5, 0, 1, 1, 0, 2],
                      [0, 2, 0, 4, 0, 4],
                      [4, 5, 0, 1, 1, 2],
                      [0, 0, 3, 5, 2, 0],
                      [2, 0, 1, 0, 4, 4]])

        U, s, V = ln.svd(A, full_matrices=False)

        # represent user/item characteristics in a lower dimensional space
        k = 2
        A_approx = np.dot(np.dot(U[:, :k], np.diag(s[:k])), V[:k, :])

        print(A_approx)
        # [[ 3.19741238  1.98064059  0.19763307  0.50430074  1.04148574  2.47123826]
        #  [ 1.20450954  1.18625722  1.50361641  3.5812116   1.61569345  2.37803076]
        #  [ 4.36792826  2.68465163  0.20157659  0.52659617  1.36419993  3.30665072]
        #  [-0.94009727  0.07701659  2.08296828  4.93223799  1.52652414  1.44132726]
        #  [ 2.67985286  1.80342544  0.63125085  1.52750202  1.27145836  2.54266834]]
  4. For the same data:

        import numpy as np

        # 5 users * 6 items
        R = np.array([[5, 0, 1, 1, 0, 2],
                      [0, 2, 0, 4, 0, 4],
                      [4, 5, 0, 1, 1, 2],
                      [0, 0, 3, 5, 2, 0],
                      [2, 0, 1, 0, 4, 4]])

        n_user, n_item = R.shape

        # represent user/item characteristics in a lower dimensional space
        k = 2
  5. Missing values are filled (continuing with R, n_user, n_item and k from the previous slide):

        # randomly guess "factors"
        P = np.random.rand(n_user, k)
        Q = np.random.rand(n_item, k)

        for user in range(n_user):
            for item in range(n_item):
                # ignore useless zero elements
                if R[user, item] == 0:
                    continue

                # adjust factors based on estimation error
                p, q = P[user], Q[item]
                err = R[user, item] - np.inner(p, q)
                next_p = p - 0.1 * (-2. * (err * q - 0.01 * p))
                next_q = q - 0.1 * (-2. * (err * p - 0.01 * q))
                P[user], Q[item] = next_p, next_q  # the item factor is indexed by item, not user

        print(np.dot(P, Q.T))
        # output shown on the original slide; exact values depend on the random initialization
        # [[ 1.44089222  2.10861345  1.35586737  1.30939713  2.25707035  1.10801462]
        #  [ 1.93053648  3.03696723  1.89003927  1.94703341  3.2410682   1.5074436 ]
        #  [ 1.93037675  3.08600948  1.90697021  1.99171424  3.2913027   1.51264917]
        #  [ 1.95476578  3.09267286  1.91985778  1.98747126  3.29976689  1.52826493]
        #  [ 2.27963689  3.58058058  2.22988788  2.29405565  3.82145282  1.77943416]]
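The slide makes a single pass over the ratings, so the factors are only roughly adjusted. As a minimal sketch (not part of the original deck), the same update rule can be repeated for several epochs while tracking the error on the observed entries. The function name `factorize`, the learning rate, the regularization weight and the epoch count are assumptions chosen for illustration.

```python
import numpy as np


def factorize(R, k=2, n_epoch=50, lr=0.05, reg=0.01, seed=0):
    """Fit P (users x k) and Q (items x k) to the non-zero entries of R by SGD."""
    rng = np.random.default_rng(seed)
    n_user, n_item = R.shape
    P = rng.random((n_user, k))
    Q = rng.random((n_item, k))
    users, items = np.nonzero(R)  # observed (user, item) pairs only
    history = []  # RMSE on observed entries after each epoch
    for _ in range(n_epoch):
        for idx in rng.permutation(len(users)):
            u, i = users[idx], items[idx]
            p, q = P[u].copy(), Q[i].copy()
            err = R[u, i] - np.inner(p, q)
            P[u] = p + lr * (err * q - reg * p)
            Q[i] = q + lr * (err * p - reg * q)
        pred = P @ Q.T
        residual = R[users, items] - pred[users, items]
        history.append(np.sqrt(np.mean(residual ** 2)))
    return P, Q, history


# same 5 users * 6 items matrix as on the slides
R = np.array([[5, 0, 1, 1, 0, 2],
              [0, 2, 0, 4, 0, 4],
              [4, 5, 0, 1, 1, 2],
              [0, 0, 3, 5, 2, 0],
              [2, 0, 1, 0, 4, 4]])

P, Q, history = factorize(R)
print(history[0], history[-1])  # error after the first and the last epoch
```

Comparing the first and last entries of `history` gives a quick check that the factors are actually improving before trusting the filled-in values.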
  6. Present: Streamed rich user-item data as one possible approach to improve scalability

    [Figure: events arriving along a time axis, annotated with example attributes: 21 yrs., man, student; genre: music; price: $1000; context, e.g., “when”, “where”]
  7. Streaming recommender systems: Update recommendation model in real-time

    - Recommend top-N items to users
    - User interacts with items
    - Update recommendation model on-the-fly
      ‣ Incremental CF
      ‣ Incremental SVD
      ‣ Incremental MF
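The recommend, interact, update cycle can be sketched with a toy incremental matrix factorization for positive-only feedback. This is an illustration under assumptions, not FluRS's implementation: the class `IncrementalMF`, its hyperparameters and the sample stream are hypothetical, and each observed (user, item) pair is simply pushed toward a target score of 1 with one SGD step.

```python
import numpy as np


class IncrementalMF:
    """Toy incremental matrix factorization for positive-only feedback."""

    def __init__(self, k=2, lr=0.05, reg=0.01, seed=0):
        self.k, self.lr, self.reg = k, lr, reg
        self.rng = np.random.default_rng(seed)
        self.P = {}  # user id -> factor vector
        self.Q = {}  # item id -> factor vector

    def _factor(self, table, key):
        # lazily create a small random vector for unseen users/items
        if key not in table:
            table[key] = self.rng.normal(0.0, 0.1, self.k)
        return table[key]

    def update(self, user, item):
        """One SGD step toward a target score of 1 for the observed pair."""
        p = self._factor(self.P, user)
        q = self._factor(self.Q, item)
        err = 1.0 - np.inner(p, q)
        new_p = p + self.lr * (err * q - self.reg * p)
        new_q = q + self.lr * (err * p - self.reg * q)
        self.P[user], self.Q[item] = new_p, new_q

    def recommend(self, user, n=3):
        """Return up to n known item ids, best predicted score first."""
        p = self._factor(self.P, user)
        ranked = sorted(self.Q, key=lambda i: np.inner(p, self.Q[i]), reverse=True)
        return ranked[:n]


model = IncrementalMF()
stream = [("a", "x"), ("b", "x"), ("a", "y"), ("c", "x"), ("b", "y")]
for user, item in stream:
    top_n = model.recommend(user, n=2)  # recommend with the current model
    model.update(user, item)            # then learn from the observed interaction
```

Because every event triggers only one small update, the cost per event stays constant no matter how long the stream has been running.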
  8. Unified data representation: available for a wide variety of realistic data

    Algorithm-agnostic: separate recommender-specific implementation from algorithm code

    Streaming evaluation: monitor accuracy of streaming recommendation in an appropriate scheme
  9. User / Item
    - index
    - feature vector

    Event
    - context vector

    Recommender
    - initialize()
    - register(user), register(item)
    - update(event)
    - recommend(user)

    Model

    Evaluator(recommender)
    - fit(event[])
    - evaluate(event[])
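To make the roles of these components concrete, here is a minimal sketch that mirrors the names on the slide. It is not FluRS's actual code: the dataclasses, the `PopularityRecommender` class and the sample entities are hypothetical, and the "model" is just a per-item event counter.

```python
from dataclasses import dataclass, field

import numpy as np


def _empty():
    return np.zeros(0)


@dataclass
class User:
    index: int
    feature: np.ndarray = field(default_factory=_empty)


@dataclass
class Item:
    index: int
    feature: np.ndarray = field(default_factory=_empty)


@dataclass
class Event:
    user: User
    item: Item
    context: np.ndarray = field(default_factory=_empty)


class PopularityRecommender:
    """Hypothetical recommender with the four methods named on the slide."""

    def initialize(self):
        self.users = {}
        self.counts = {}  # item index -> number of observed events

    def register(self, entity):
        if isinstance(entity, User):
            self.users[entity.index] = entity
        elif isinstance(entity, Item):
            self.counts.setdefault(entity.index, 0)
        else:
            raise TypeError("expected a User or an Item")

    def update(self, event):
        self.counts[event.item.index] += 1

    def recommend(self, user, n=10):
        # this toy model ignores `user`; a real model would personalize
        return sorted(self.counts, key=self.counts.get, reverse=True)[:n]


recommender = PopularityRecommender()
recommender.initialize()
alice, song, book = User(0), Item(0), Item(1)
for entity in (alice, song, book):
    recommender.register(entity)
recommender.update(Event(alice, book))
ranking = recommender.recommend(alice)  # item indices, most popular first
```

Keeping entities and events separate from the recommender means a different algorithm can be swapped in without changing how the data is represented.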
  10. Feature-based recommender
    - Factorization Machine (FM)

    Create prediction model from context-aware feature vectors

    S. Rendle. Factorization Machines with libFM. ACM Transactions on Intelligent Systems and Technology, 3(3), May 2012.
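As a rough illustration of what a second-order FM computes from such a feature vector, the sketch below scores one concatenated user, item and context vector. The function names, the random parameters and the sample features are assumptions for illustration, and no training is shown. The naive double loop is included only to check the faster linear-time form of the pairwise term.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """Second-order FM score: w0 + <w, x> + sum_{i<j} <V[i], V[j]> x_i x_j."""
    linear = w0 + np.dot(w, x)
    # pairwise term computed in O(p * k) instead of O(p^2 * k)
    interactions = 0.5 * np.sum(np.dot(V.T, x) ** 2 - np.dot((V ** 2).T, x ** 2))
    return linear + interactions


def fm_predict_naive(x, w0, w, V):
    """Same score with an explicit loop over all feature pairs."""
    score = w0 + np.dot(w, x)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            score += np.inner(V[i], V[j]) * x[i] * x[j]
    return score


rng = np.random.default_rng(0)
# hypothetical input: user features + item features + context
x = np.concatenate([np.array([1, 3, 1]),
                    np.array([2, 1, 1]),
                    np.array([0, 4])]).astype(float)
p, k = len(x), 2  # p features, k latent dimensions per feature
w0, w, V = 0.0, rng.normal(size=p), rng.normal(size=(p, k))

score = fm_predict(x, w0, w, V)
```

Because every feature gets its own latent vector, adding a new kind of context only lengthens `x` and adds rows to `V`; the scoring function stays the same.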
  11. Create users and items with feature vectors:

        import numpy as np
        from flurs.data.entity import User, Item

        # each entity takes an index and a feature vector
        user_a = User(0, feature=np.array([0, 0, 1]))
        user_b = User(1, feature=np.array([1, 0, 1]))
        user_c = User(2, feature=np.array([1, 3, 1]))

        item_a = Item(0, feature=np.array([2, 1, 1]))
        item_b = Item(1, feature=np.array([0, 2, 1]))
  12. Context-aware recommendation (user_a, user_b, user_c, item_a and item_b come from the previous slide):

        import numpy as np
        from flurs.data.entity import Event
        from flurs.recommender.fm import FMRecommender

        # initialize a recommendation instance
        # p=8 matches the feature count used here: 3 (user) + 3 (item) + 2 (context)
        recommender = FMRecommender(p=8, k=2)
        recommender.initialize()

        # register users and items
        recommender.register(user_a)
        recommender.register(user_b)
        recommender.register(user_c)
        recommender.register(item_a)
        recommender.register(item_b)

        # feed some events
        recommender.update(Event(user_a, item_b, context=np.array([1, 1])))
        recommender.update(Event(user_b, item_b, context=np.array([0, 2])))
        recommender.update(Event(user_a, item_a, context=np.array([1, 3])))

        # make recommendation to `user_c`
        candidates = np.array([0, 1])
        recommender.recommend(user_c, candidates, context=np.array([0, 4]))
  13. Evaluator(recommender)
    - fit(event[])
    - evaluate(event[])

    “test-then-learn” evaluation scheme

    J. Vinagre et al. Fast Incremental Matrix Factorization for Recommendation with Positive-only Feedback. In Proc. of UMAP 2014, pp. 459–470, July 2014.

    T. Kitazawa. Incremental Factorization Machines for Persistently Cold-Starting Online Item Recommendation. arXiv:1607.02858 [cs.LG], July 2016.
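The “test-then-learn” idea is that each incoming event is first used to test the current model and only afterwards used to update it. The sketch below is an assumption-laden illustration rather than FluRS's Evaluator: `MostPopular`, `evaluate_stream` and the sample events are hypothetical, and the metric is a simple hit in the top-N list.

```python
from collections import Counter


class MostPopular:
    """Stand-in recommender: ranks items by how often they were seen so far."""

    def __init__(self):
        self.counts = Counter()

    def update(self, user, item):
        self.counts[item] += 1

    def recommend(self, user, n):
        return [item for item, _ in self.counts.most_common(n)]


def evaluate_stream(recommender, events, n=1):
    """Return one 0/1 hit per event: test with the current model, then learn."""
    hits = []
    for user, item in events:
        top_n = recommender.recommend(user, n)  # test: is the true item in the top-N?
        hits.append(1 if item in top_n else 0)
        recommender.update(user, item)          # learn: update only after testing
    return hits


events = [("a", "x"), ("b", "x"), ("c", "y"), ("a", "x"), ("b", "y")]
hits = evaluate_stream(MostPopular(), events, n=1)
recall_at_1 = sum(hits) / len(hits)
print(hits, recall_at_1)  # [0, 1, 0, 1, 0] 0.4
```

Averaging the hits over a sliding window instead of the whole stream would show how accuracy changes over time, which is the point of monitoring a streaming recommender.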
  14. Future: Scaling recommender in production

    Personalization is everywhere in various ways; as Netflix said, “Everything is recommendation”

    https://www.slideshare.net/justinbasilico/past-present-future-of-recommender-systems-an-industry-perspective
  15. Why Python? It’s for production!

    - Versatile: Stream and store data internally
    - Trial-and-error: Develop “hybrid” algorithm
    - Portable: Integrate w/ existing code

    FluRS will be updated in these aspects :)
  16. OSS for Recommender Systems

    ‣ Surprise (Python) http://surpriselib.com/
    ‣ fastFM (Python) http://ibayer.github.io/fastFM/
    ‣ Implicit (Python) http://implicit.readthedocs.io/en/latest/
    ‣ MyMediaLite (C#) http://www.mymedialite.net/
    ‣ LibRec (Java) https://www.librec.net/
    ‣ LensKit (Java) http://lenskit.org/

    On Apache Hadoop, Hive, Spark:

    ‣ Apache Mahout http://mahout.apache.org/
    ‣ Apache Hivemall https://hivemall.incubator.apache.org/
    ‣ Spark MLlib https://spark.apache.org/mllib/

    GET INSPIRATION AND CUSTOMIZE FOR “YOUR” PRODUCTION