FluRS: A Library for Streaming Recommendation Algorithms

Takuya Kitazawa @takuti

$ whoami
Treasure Data, Inc.: Data Science Engineer
Apache Hivemall Committer

“Recommender Systems”

What I'll talk about (but it’s NOT all about recommender systems!)
‣ Past: batch, static
‣ Present: streaming, dynamic
‣ Future: at scale, in production

Past: The era of user-item matrix

Collaborative Filtering (CF; k-Nearest-Neighbors)
Finding similar users (items) from their history

import numpy as np
import numpy.linalg as ln

def similarity(x, y):
    # cosine similarity between two users' rating vectors
    return np.inner(x, y) / (ln.norm(x, ord=2) * ln.norm(y, ord=2))

user_a = np.array([5, 0, 1, 1, 0, 2])
user_b = np.array([0, 2, 0, 4, 0, 4])
user_c = np.array([4, 5, 0, 1, 1, 2])

print(similarity(user_a, user_b))  # 0.359210604054
print(similarity(user_a, user_c))  # 0.654953146328 -> user_c is more similar than user_b
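Similarity scores alone do not yet fill in a missing rating. A common next step in user-based k-NN CF is to average the ratings that the k most similar users gave to the item, weighted by similarity. The sketch below does this under our own assumptions: the helper name `predict_user_knn`, k = 2, and treating zeros as "not rated" are illustrative choices, not something from the slides.

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity between two rating vectors."""
    return np.inner(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def predict_user_knn(R, user, item, k=2):
    """Predict R[user, item] as a similarity-weighted average of the ratings
    given to `item` by the k most similar users who rated it.
    Zeros in R are treated as "not rated" (an assumption of this sketch)."""
    # candidate neighbours: other users who actually rated the item
    raters = [u for u in range(R.shape[0]) if u != user and R[u, item] != 0]
    sims = {u: cosine(R[user], R[u]) for u in raters}
    neighbours = sorted(raters, key=lambda u: sims[u], reverse=True)[:k]
    weight_sum = sum(sims[u] for u in neighbours)
    if weight_sum == 0:
        return 0.0  # no usable neighbours
    return sum(sims[u] * R[u, item] for u in neighbours) / weight_sum

R = np.array([[5, 0, 1, 1, 0, 2],   # user_a
              [0, 2, 0, 4, 0, 4],   # user_b
              [4, 5, 0, 1, 1, 2]])  # user_c
# user_a has not rated item 1; user_b rated it 2, user_c rated it 5
print(predict_user_knn(R, user=0, item=1))
```

With only two raters of item 1, both are used, and user_c's higher similarity pulls the prediction toward its rating of 5 (roughly 3.94).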

Singular Value Decomposition (SVD)
Computationally cheaper and more mathematically tractable than CF

import numpy as np
import numpy.linalg as ln

# 5 users * 6 items
A = np.array([[5, 0, 1, 1, 0, 2],
              [0, 2, 0, 4, 0, 4],
              [4, 5, 0, 1, 1, 2],
              [0, 0, 3, 5, 2, 0],
              [2, 0, 1, 0, 4, 4]])

# note: numpy returns the right singular vectors already transposed (V^T)
U, s, V = ln.svd(A, full_matrices=False)

# represent user/item characteristics in a lower dimensional space
k = 2
A_approx = np.dot(np.dot(U[:, :k], np.diag(s[:k])), V[:k, :])
print(A_approx)
# [[ 3.19741238  1.98064059  0.19763307  0.50430074  1.04148574  2.47123826]
#  [ 1.20450954  1.18625722  1.50361641  3.5812116   1.61569345  2.37803076]
#  [ 4.36792826  2.68465163  0.20157659  0.52659617  1.36419993  3.30665072]
#  [-0.94009727  0.07701659  2.08296828  4.93223799  1.52652414  1.44132726]
#  [ 2.67985286  1.80342544  0.63125085  1.52750202  1.27145836  2.54266834]]
# missing values (zeros) are filled

Matrix Factorization (MF)
More feasible in terms of running time & missing value imputation

import numpy as np

# 5 users * 6 items (the same data as above)
R = np.array([[5, 0, 1, 1, 0, 2],
              [0, 2, 0, 4, 0, 4],
              [4, 5, 0, 1, 1, 2],
              [0, 0, 3, 5, 2, 0],
              [2, 0, 1, 0, 4, 4]])
n_user, n_item = R.shape

# represent user/item characteristics in a lower dimensional space
k = 2

# randomly guess "factors"
P = np.random.rand(n_user, k)
Q = np.random.rand(n_item, k)

# one pass of stochastic gradient descent over the observed ratings
for user in range(n_user):
    for item in range(n_item):
        if R[user, item] == 0:
            continue  # ignore useless zero (unobserved) elements
        p, q = P[user], Q[item]
        # adjust factors based on estimation error (learning rate 0.1, L2 weight 0.01)
        err = R[user, item] - np.inner(p, q)
        next_p = p - 0.1 * (-2. * (err * q - 0.01 * p))
        next_q = q - 0.1 * (-2. * (err * p - 0.01 * q))
        P[user], Q[item] = next_p, next_q  # update the item row, not Q[user]

# missing values are filled; the exact numbers depend on the random initialization
print(np.dot(P, Q.T))

History: Netflix Prize (2006-2009)
https://digit.hbs.org/submission/the-netflix-prize-crowdsourcing-to-improve-dvd-recommendations/

“Netflix never implemented that solution itself”
https://www.techdirt.com/blog/innovation/articles/20120409/03412518422/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenge.shtml
Poor scalability on dynamic user/item data

Present: Streaming, rich user-item data as one possible approach to improving scalability
(Figure: events arriving over time, each with rich attributes, e.g., user: 21 yrs., man, student; item: genre: music, price: $1000; context, e.g., “when”, “where”)
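To feed attributes like those in the illustration to a model, each event is usually turned into one numeric vector. The following is a minimal sketch assuming hypothetical category vocabularies and ad-hoc scaling (age / 100, log of the price); it is not FluRS's own encoding.

```python
import numpy as np

# Hypothetical vocabularies for categorical attributes (assumptions for illustration)
GENDERS = ["man", "woman"]
OCCUPATIONS = ["student", "engineer", "other"]
GENRES = ["music", "movie", "book"]
TIMES = ["morning", "afternoon", "night"]  # a simple "when" context

def one_hot(value, vocab):
    """Vector of zeros with a single 1 at the position of `value`."""
    v = np.zeros(len(vocab))
    v[vocab.index(value)] = 1.0
    return v

def encode_event(user, item, context):
    """Concatenate user, item and context attributes into one dense vector."""
    user_vec = np.concatenate([[user["age"] / 100.0],  # crude scaling of age
                               one_hot(user["gender"], GENDERS),
                               one_hot(user["occupation"], OCCUPATIONS)])
    item_vec = np.concatenate([one_hot(item["genre"], GENRES),
                               [np.log1p(item["price"])]])  # compress the price range
    ctx_vec = one_hot(context["when"], TIMES)
    return np.concatenate([user_vec, item_vec, ctx_vec])

x = encode_event({"age": 21, "gender": "man", "occupation": "student"},
                 {"genre": "music", "price": 1000},
                 {"when": "night"})
print(x.shape)  # (13,)
```

One-hot encoding keeps categories from being read as ordered numbers, while the log transform stops a large price from dominating the other features.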

Streaming recommender systems: update the recommendation model in real time
Recommend top-N items to users → user interacts with items → update the recommendation model on the fly
‣ Incremental CF
‣ Incremental SVD
‣ Incremental MF
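The recommend → interact → update loop can be sketched with incremental MF. Each incoming (user, item, rating) event triggers a single SGD step on that user's and that item's factors only, instead of a refit over the whole matrix. This is a minimal illustration using the same form of squared-error-plus-L2 update as the batch MF example; the learning rate, regularization weight, and toy event stream are assumptions, and this is not FluRS code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_user, n_item, k = 5, 6, 2
P = rng.normal(scale=0.1, size=(n_user, k))  # user factors
Q = rng.normal(scale=0.1, size=(n_item, k))  # item factors

def recommend(user, n=3):
    """Top-n item indices by predicted score for `user`."""
    scores = Q @ P[user]
    return np.argsort(-scores)[:n]

def update(user, item, rating, lr=0.05, reg=0.01):
    """One SGD step on a single observed (user, item, rating) event."""
    p, q = P[user].copy(), Q[item].copy()  # use the old values for both updates
    err = rating - p @ q
    P[user] += lr * (err * q - reg * p)
    Q[item] += lr * (err * p - reg * q)

# a stream of (user, item, rating) events arriving one by one
stream = [(0, 0, 5), (2, 1, 5), (1, 3, 4), (0, 5, 2), (3, 3, 5), (4, 4, 4)]
served = []
for user, item, rating in stream:
    served.append(recommend(user))  # recommend top-N items to the user
    update(user, item, rating)      # observe the interaction, update on the fly
```

Because each update touches only two rows, its cost does not grow with the number of users or items, which is what makes the per-event update affordable.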

FluRS: Flu-* (Flux, Fluid, Fluent) Recommender Systems
$ pip install flurs
https://github.com/takuti/flurs

‣ Unified data representation: available for a wide variety of realistic data
‣ Algorithm-agnostic: separates recommender-specific implementation from algorithm code
‣ Streaming evaluation: monitors the accuracy of streaming recommendation in an appropriate scheme

User / Item
- index
- feature vector
Event
- context vector
Recommender (model)
- initialize()
- register(user), register(item)
- update(event)
- recommend(user)
Evaluator(recommender)
- fit(event[])
- evaluate(event[])
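To make the algorithm-agnostic idea concrete, here is a hypothetical, self-contained mirror of that interface: stand-in `User`/`Item`/`Event` classes, a base class exposing `initialize`, `register`, `update`, and `recommend`, and the simplest algorithm that plugs into it (popularity counting). All class names and the `candidates` argument are assumptions for illustration; the real FluRS classes live in the `flurs` package and may differ in detail.

```python
from abc import ABC, abstractmethod
from collections import Counter
from dataclasses import dataclass, field

import numpy as np


@dataclass
class User:  # hypothetical stand-in: index + feature vector
    index: int
    feature: np.ndarray = field(default_factory=lambda: np.zeros(0))


@dataclass
class Item:  # hypothetical stand-in: index + feature vector
    index: int
    feature: np.ndarray = field(default_factory=lambda: np.zeros(0))


@dataclass
class Event:  # hypothetical stand-in: who, what, and the context vector
    user: User
    item: Item
    context: np.ndarray = field(default_factory=lambda: np.zeros(0))


class Recommender(ABC):
    """Hypothetical interface mirroring the method names on the slide."""

    def initialize(self):
        self.users, self.items = {}, {}

    def register(self, entity):
        # keep users and items apart, keyed by index
        target = self.users if isinstance(entity, User) else self.items
        target[entity.index] = entity

    @abstractmethod
    def update(self, event): ...

    @abstractmethod
    def recommend(self, user, candidates): ...


class PopularityRecommender(Recommender):
    """Simplest possible algorithm: rank candidates by interaction count."""

    def initialize(self):
        super().initialize()
        self.counts = Counter()

    def update(self, event):
        self.counts[event.item.index] += 1

    def recommend(self, user, candidates):
        # highest count first; ties keep candidate order (sorted is stable)
        return sorted(candidates, key=lambda i: -self.counts[i])


rec = PopularityRecommender()
rec.initialize()
u, a, b = User(0), Item(0), Item(1)
for entity in (u, a, b):
    rec.register(entity)
rec.update(Event(u, b))
rec.update(Event(u, b))
rec.update(Event(u, a))
print(rec.recommend(u, [0, 1]))  # [1, 0]
```

Swapping in a different algorithm only means overriding `update` and `recommend`; registration and the event plumbing stay the same.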

Feature-based recommender: Factorization Machine (FM)
Creates a prediction model from context-aware feature vectors
S. Rendle. Factorization Machines with libFM. ACM Transactions on Intelligent Systems and Technology, 3(3), May 2012.
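For reference, the degree-2 FM model in Rendle's paper predicts ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, where each feature i has a weight w_i and a k-dimensional latent vector v_i. The pairwise term can be rewritten as ½ Σ_f [(Σ_i v_{i,f} x_i)² − Σ_i v_{i,f}² x_i²], which costs O(kp) instead of O(kp²) for p features. The sketch below (function names are ours) checks numerically that both forms agree.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Direct O(k p^2) evaluation of the FM model."""
    p = len(x)
    y = w0 + w @ x
    for i in range(p):
        for j in range(i + 1, p):
            y += (V[i] @ V[j]) * x[i] * x[j]
    return y

def fm_predict(x, w0, w, V):
    """Same value in O(k p) using the reformulated pairwise term."""
    vx = V.T @ x                  # shape (k,): sum_i v_{i,f} x_i
    v2x2 = (V ** 2).T @ (x ** 2)  # shape (k,): sum_i v_{i,f}^2 x_i^2
    return w0 + w @ x + 0.5 * np.sum(vx ** 2 - v2x2)

rng = np.random.default_rng(42)
p, k = 8, 2  # p features, k latent factors
x = rng.random(p)
w0, w, V = 0.1, rng.normal(size=p), rng.normal(scale=0.1, size=(p, k))
print(np.isclose(fm_predict_naive(x, w0, w, V), fm_predict(x, w0, w, V)))  # True
```

Because interactions are modelled through shared latent vectors, FM can score feature pairs that never co-occurred in training, which matters for sparse, context-rich data.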

Make FM “incremental” somehow
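One straightforward way to make FM incremental is to take a single stochastic gradient step per incoming event and then discard the event. The sketch below does this for squared loss with L2 regularization, using ∂ŷ/∂w0 = 1, ∂ŷ/∂w_i = x_i, and ∂ŷ/∂v_{i,f} = x_i Σ_j v_{j,f} x_j − v_{i,f} x_i². The class `OnlineFM`, its hyperparameters, and the repeated toy example are assumptions. This is plain per-event SGD, not necessarily how FluRS's `FMRecommender` or the incremental FM in Kitazawa's arXiv paper is implemented.

```python
import numpy as np

class OnlineFM:
    """Hypothetical sketch: FM with squared loss, one SGD step per event."""

    def __init__(self, p, k, lr=0.01, reg=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.w0 = 0.0
        self.w = np.zeros(p)
        self.V = rng.normal(scale=0.1, size=(p, k))
        self.lr, self.reg = lr, reg

    def predict(self, x):
        vx = self.V.T @ x
        pairwise = 0.5 * np.sum(vx ** 2 - (self.V ** 2).T @ (x ** 2))
        return self.w0 + self.w @ x + pairwise

    def update(self, x, y):
        """Consume one (x, y) observation; nothing is stored afterwards."""
        vx = self.V.T @ x                                       # uses V before the step
        err = self.predict(x) - y                               # d(0.5 * err^2) / d(y_hat)
        grad_V = np.outer(x, vx) - self.V * (x ** 2)[:, None]   # d(y_hat) / d(V)
        self.w0 -= self.lr * err                                # bias is not regularized
        self.w -= self.lr * (err * x + self.reg * self.w)
        self.V -= self.lr * (err * grad_V + self.reg * self.V)


fm = OnlineFM(p=4, k=2, lr=0.05)
x, y = np.array([1.0, 0.0, 1.0, 0.5]), 3.0
before = abs(fm.predict(x) - y)
for _ in range(200):
    fm.update(x, y)  # in a real stream, each call would see a new event
after = abs(fm.predict(x) - y)
print(after < before)  # True
```

Each update costs O(kp) regardless of how many events came before, so the model keeps up with a stream without retraining from scratch.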

import numpy as np
from flurs.data.entity import User, Item

# users and items, each with an index and a feature vector
user_a = User(0, feature=np.array([0, 0, 1]))
user_b = User(1, feature=np.array([1, 0, 1]))
user_c = User(2, feature=np.array([1, 3, 1]))
item_a = Item(0, feature=np.array([2, 1, 1]))
item_b = Item(1, feature=np.array([0, 2, 1]))

import numpy as np
from flurs.data.entity import Event
from flurs.recommender.fm import FMRecommender

# initialize a recommendation instance
recommender = FMRecommender(p=8, k=2)
recommender.initialize()

# register users and items
recommender.register(user_a)
recommender.register(user_b)
recommender.register(user_c)
recommender.register(item_a)
recommender.register(item_b)

# feed some events, each with a context vector
recommender.update(Event(user_a, item_b, context=np.array([1, 1])))
recommender.update(Event(user_b, item_b, context=np.array([0, 2])))
recommender.update(Event(user_a, item_a, context=np.array([1, 3])))

# context-aware recommendation to `user_c` among the candidate item indices
candidates = np.array([0, 1])
recommender.recommend(user_c, candidates, context=np.array([0, 4]))

Evaluator(recommender): “test-then-learn” evaluation scheme, via fit(event[]) and evaluate(event[])
J. Vinagre et al. Fast Incremental Matrix Factorization for Recommendation with Positive-Only Feedback. In Proc. of UMAP 2014, pp. 459–470, July 2014.
T. Kitazawa. Incremental Factorization Machines for Persistently Cold-Starting Online Item Recommendation. arXiv:1607.02858 [cs.LG], July 2016.

Easily develop and evaluate your own recommender: you just need to follow the FluRS way

Future: Scaling recommenders in production
Personalization is everywhere, in various ways; as Netflix said, “Everything is recommendation”
https://www.slideshare.net/justinbasilico/past-present-future-of-recommender-systems-an-industry-perspective

Why Python? It’s for production!
‣ Versatile: stream and store data internally
‣ Trial-and-error: develop “hybrid” algorithms
‣ Portable: integrate with existing code
FluRS will be updated in these aspects :)

OSS for Recommender Systems
‣ Surprise (Python) http://surpriselib.com/
‣ fastFM (Python) http://ibayer.github.io/fastFM/
‣ Implicit (Python) http://implicit.readthedocs.io/en/latest/
‣ MyMediaLite (C#) http://www.mymedialite.net/
‣ LibRec (Java) https://www.librec.net/
‣ LensKit (Java) http://lenskit.org/
On Apache Hadoop, Hive, Spark:
‣ Apache Mahout http://mahout.apache.org/
‣ Apache Hivemall https://hivemall.incubator.apache.org/
‣ Spark MLlib https://spark.apache.org/mllib/
GET INSPIRATION AND CUSTOMIZE FOR “YOUR” PRODUCTION
