FluRS: A Library for Streaming Recommendation Algorithms

Slide 1

Slide 1 text

FluRS: A Library for Streaming Recommendation Algorithms
Takuya Kitazawa (@takuti)

Slide 2

Slide 2 text

$ whoami
Treasure Data, Inc.
Data Science Engineer
Apache Hivemall Committer

Slide 3

Slide 3 text

“Recommender Systems”

Slide 4

Slide 4 text

What I will talk about*
Past: Batch, Static
Present: Streaming, Dynamic
Future: At scale in production
* But it’s NOT all about recommender systems!

Slide 5

Slide 5 text

Past: The era of the user-item matrix

Slide 6

Slide 6 text

Collaborative Filtering (CF; k-Nearest-Neighbors)
Finding similar users (items) from history

Slide 7

Slide 7 text

import numpy as np
import numpy.linalg as ln

def similarity(x, y):
    # cosine similarity between two rating vectors
    return np.inner(x, y) / (ln.norm(x, ord=2) * ln.norm(y, ord=2))

user_a = np.array([5, 0, 1, 1, 0, 2])
user_b = np.array([0, 2, 0, 4, 0, 4])
user_c = np.array([4, 5, 0, 1, 1, 2])

print(similarity(user_a, user_b))  # 0.359210604054
print(similarity(user_a, user_c))  # 0.654953146328

User c is more similar to user a than user b is
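The similarities above can be turned into a rating estimate. The following is a minimal sketch, not code from the talk: it assumes that 0 means "not rated", and the helper name `predict` is made up for illustration. It estimates a missing rating as the similarity-weighted average of the ratings given by the other users.

```python
import numpy as np

def cosine(x, y):
    return np.inner(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Rows are users a, b, c from the slide; 0 is treated as "not rated" (assumption).
ratings = np.array([[5, 0, 1, 1, 0, 2],
                    [0, 2, 0, 4, 0, 4],
                    [4, 5, 0, 1, 1, 2]], dtype=float)

def predict(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num, den = 0.0, 0.0
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue  # skip the target user and users who did not rate the item
        w = cosine(ratings[user], ratings[other])
        num += w * ratings[other, item]
        den += abs(w)
    return num / den if den > 0 else 0.0

# User a (row 0) has not rated item 1; users b and c rated it 2 and 5.
score = predict(ratings, user=0, item=1)
print(score)
```

Because user c is the closer neighbor, the estimate lands nearer to user c's rating than to user b's.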

Slide 8

Slide 8 text

Singular Value Decomposition (SVD)
Computationally cheaper and more mathematically tractable than CF

Slide 9

Slide 9 text

import numpy as np
import numpy.linalg as ln

# 5 users * 6 items
A = np.array([[5, 0, 1, 1, 0, 2],
              [0, 2, 0, 4, 0, 4],
              [4, 5, 0, 1, 1, 2],
              [0, 0, 3, 5, 2, 0],
              [2, 0, 1, 0, 4, 4]])

U, s, V = ln.svd(A, full_matrices=False)

# represent user/item characteristics in a lower dimensional space
k = 2
A_approx = np.dot(np.dot(U[:, :k], np.diag(s[:k])), V[:k, :])

print(A_approx)
# [[ 3.19741238  1.98064059  0.19763307  0.50430074  1.04148574  2.47123826]
#  [ 1.20450954  1.18625722  1.50361641  3.5812116   1.61569345  2.37803076]
#  [ 4.36792826  2.68465163  0.20157659  0.52659617  1.36419993  3.30665072]
#  [-0.94009727  0.07701659  2.08296828  4.93223799  1.52652414  1.44132726]
#  [ 2.67985286  1.80342544  0.63125085  1.52750202  1.27145836  2.54266834]]

Missing values are filled
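Once the missing values are filled, a recommendation can be read off the reconstructed matrix. This is a small sketch under the assumption that zeros mark unrated items; the helpers `low_rank` and `top_n` are illustrative names, not part of any library.

```python
import numpy as np

A = np.array([[5, 0, 1, 1, 0, 2],
              [0, 2, 0, 4, 0, 4],
              [4, 5, 0, 1, 1, 2],
              [0, 0, 3, 5, 2, 0],
              [2, 0, 1, 0, 4, 4]], dtype=float)

def low_rank(A, k):
    """Rank-k reconstruction of A from its truncated SVD."""
    # np.linalg.svd returns the right singular vectors already transposed.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def top_n(A, A_approx, user, n=2):
    """Indices of the user's zero entries, ranked by reconstructed score."""
    unrated = np.flatnonzero(A[user] == 0)
    ranked = unrated[np.argsort(-A_approx[user, unrated])]
    return ranked[:n]

A_approx = low_rank(A, k=2)
print(top_n(A, A_approx, user=0))
```

Keeping all five singular values reproduces the original matrix exactly; the smaller `k` is what forces the zeros to take non-zero estimates.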

Slide 10

Slide 10 text

Matrix Factorization (MF)
More feasible in terms of running time & missing value imputation

Slide 11

Slide 11 text

import numpy as np

# 5 users * 6 items (for the same data)
R = np.array([[5, 0, 1, 1, 0, 2],
              [0, 2, 0, 4, 0, 4],
              [4, 5, 0, 1, 1, 2],
              [0, 0, 3, 5, 2, 0],
              [2, 0, 1, 0, 4, 4]])
n_user, n_item = R.shape

# represent user/item characteristics in a lower dimensional space
k = 2

Slide 12

Slide 12 text

# Randomly guess “factors”
P = np.random.rand(n_user, k)
Q = np.random.rand(n_item, k)

for user in range(n_user):
    for item in range(n_item):
        # Ignore useless zero elements
        if R[user, item] == 0:
            continue

        # Adjust factors based on estimation error
        p, q = P[user], Q[item]
        err = R[user, item] - np.inner(p, q)
        next_p = p - 0.1 * (-2. * (err * q - 0.01 * p))
        next_q = q - 0.1 * (-2. * (err * p - 0.01 * q))
        P[user], Q[item] = next_p, next_q  # item factor goes to Q[item]

# Missing values are filled
print(np.dot(P, Q.T))
# Output shown on the slide (values depend on the random initialization):
# [[ 1.44089222  2.10861345  1.35586737  1.30939713  2.25707035  1.10801462]
#  [ 1.93053648  3.03696723  1.89003927  1.94703341  3.2410682   1.5074436 ]
#  [ 1.93037675  3.08600948  1.90697021  1.99171424  3.2913027   1.51264917]
#  [ 1.95476578  3.09267286  1.91985778  1.98747126  3.29976689  1.52826493]
#  [ 2.27963689  3.58058058  2.22988788  2.29405565  3.82145282  1.77943416]]
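A single pass over the data barely moves the randomly initialized factors. The sketch below repeats the same update for several epochs and tracks the error on the observed entries; the learning rate, regularization weight, epoch count, and seed are arbitrary choices made for illustration.

```python
import numpy as np

R = np.array([[5, 0, 1, 1, 0, 2],
              [0, 2, 0, 4, 0, 4],
              [4, 5, 0, 1, 1, 2],
              [0, 0, 3, 5, 2, 0],
              [2, 0, 1, 0, 4, 4]], dtype=float)

k, lr, reg = 2, 0.02, 0.01          # illustrative hyperparameters
rng = np.random.default_rng(0)
P = rng.random((R.shape[0], k))     # user factors
Q = rng.random((R.shape[1], k))     # item factors
observed = list(zip(*np.nonzero(R)))  # (user, item) pairs with a rating

def rmse(P, Q):
    """Root mean squared error over the observed ratings only."""
    errs = [R[u, i] - np.inner(P[u], Q[i]) for u, i in observed]
    return float(np.sqrt(np.mean(np.square(errs))))

rmse_before = rmse(P, Q)
for epoch in range(200):
    for u, i in observed:
        p, q = P[u].copy(), Q[i].copy()  # copies, so both updates use old values
        err = R[u, i] - np.inner(p, q)
        P[u] = p + lr * (err * q - reg * p)
        Q[i] = q + lr * (err * p - reg * q)
rmse_after = rmse(P, Q)

print(rmse_before, rmse_after)
```

Every epoch needs the full rating matrix in memory, which is the batch assumption that becomes a problem once users and items keep arriving.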

Slide 13

Slide 13 text

History: Netflix Prize (2006-2009)
https://digit.hbs.org/submission/the-netflix-prize-crowdsourcing-to-improve-dvd-recommendations/

Slide 14

Slide 14 text

“Netflix never implemented that solution itself”
https://www.techdirt.com/blog/innovation/articles/20120409/03412518422/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenge.shtml
Poor scalability on dynamic user/item data

Slide 15

Slide 15 text

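Present: Streamed rich user-item data
as one possible approach to improve scalability
[Figure: events arriving along a time axis, each carrying user attributes (e.g., 21 yrs., man, student), item attributes (e.g., genre: music, price: $1000), and context (e.g., “when”, “where”)]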

Slide 16

Slide 16 text

Streaming recommender systems
Update recommendation model in real-time
Recommend top-N items to users
User interacts with items
Update recommendation model on-the-fly
‣ Incremental CF
‣ Incremental SVD
‣ Incremental MF
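The recommend, interact, update cycle can be sketched with an incremental variant of MF. This is a toy illustration for positive-only feedback (each observed interaction has target value 1), not FluRS code; the class name `IncrementalMF` and all hyperparameters are assumptions.

```python
import numpy as np

class IncrementalMF:
    """Toy incremental MF for positive-only feedback (illustrative only)."""

    def __init__(self, k=2, lr=0.05, reg=0.01, seed=0):
        self.k, self.lr, self.reg = k, lr, reg
        self.rng = np.random.default_rng(seed)
        self.P, self.Q = {}, {}  # factors are created lazily per user/item

    def _factor(self, table, key):
        if key not in table:
            table[key] = self.rng.random(self.k)  # new user/item seen in the stream
        return table[key]

    def update(self, user, item):
        """One SGD step for a single observed (user, item) interaction."""
        p, q = self._factor(self.P, user), self._factor(self.Q, item)
        err = 1.0 - np.inner(p, q)  # an observed interaction has target 1
        self.P[user] = p + self.lr * (err * q - self.reg * p)
        self.Q[item] = q + self.lr * (err * p - self.reg * q)

    def recommend(self, user, n=2):
        """Top-n known items ranked by predicted score."""
        p = self._factor(self.P, user)
        return sorted(self.Q, key=lambda i: -np.inner(p, self.Q[i]))[:n]

model = IncrementalMF()
stream = [("a", "x"), ("b", "x"), ("a", "y"), ("c", "x")] * 50
for user, item in stream:
    model.update(user, item)  # the model changes after every single event
print(model.recommend("a"))
```

Each event costs one small update, and unseen users or items simply get a fresh factor vector, so no rating matrix has to be rebuilt. For brevity the sketch does not filter out items the user has already interacted with.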

Slide 17

Slide 17 text

FluRS
Flu-* (Flux, Fluid, Fluent) Recommender Systems
$ pip install flurs
https://github.com/takuti/flurs

Slide 18

Slide 18 text

Unified data representation: Available for a wide variety of realistic data
Algorithm-agnostic: Separate recommender-specific implementation from algorithm code
Streaming evaluation: Monitor accuracy of streaming recommendation in an appropriate scheme

Slide 19

Slide 19 text

User / Item
- index
- feature vector
Event
- context vector
Recommender
- initialize()
- register(user), register(item)
- update(event)
- recommend(user)
Model
Evaluator(recommender)
- fit(event[])
- evaluate(event[])
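To make the separation between entities, recommender, and algorithm concrete, here is a hypothetical skeleton that mirrors the method names on the slide. It is not the FluRS source: the dataclasses, the abstract base class, and the toy `CountingRecommender` are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class User:
    index: int
    feature: list = field(default_factory=list)

@dataclass
class Item:
    index: int
    feature: list = field(default_factory=list)

@dataclass
class Event:
    user: User
    item: Item
    context: list = field(default_factory=list)

class Recommender(ABC):
    """Hypothetical base class; the method names follow the slide."""

    def initialize(self):
        self.users, self.items = {}, {}

    def register(self, entity):
        table = self.users if isinstance(entity, User) else self.items
        table[entity.index] = entity

    @abstractmethod
    def update(self, event): ...

    @abstractmethod
    def recommend(self, user, candidates): ...

class CountingRecommender(Recommender):
    """Toy algorithm: rank candidates by how often they were interacted with."""

    def initialize(self):
        super().initialize()
        self.counts = {}

    def update(self, event):
        idx = event.item.index
        self.counts[idx] = self.counts.get(idx, 0) + 1

    def recommend(self, user, candidates):
        return sorted(candidates, key=lambda i: -self.counts.get(i, 0))

user_a, user_b, user_c = User(0), User(1), User(2)
item_a, item_b = Item(0), Item(1)

rec = CountingRecommender()
rec.initialize()
for entity in (user_a, user_b, user_c, item_a, item_b):
    rec.register(entity)
for event in (Event(user_a, item_b), Event(user_b, item_b), Event(user_a, item_a)):
    rec.update(event)
print(rec.recommend(user_c, [0, 1]))
```

Swapping in another algorithm only means writing a different `update` and `recommend`; the entity and registration code stays untouched.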

Slide 20

Slide 20 text

Feature-based recommender - Factorization Machine (FM)
Create prediction model from context-aware feature vectors
S. Rendle. Factorization Machines with libFM. ACM Transactions on Intelligent Systems and Technology, 3(3), May 2012.
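For reference, a second-order FM scores a feature vector x with a bias, a linear term, and pairwise interactions weighted by inner products of per-feature latent vectors. The sketch below evaluates that score in two ways: the linear-time reformulation of the pairwise term and a naive double loop as a cross-check. The parameters are random, and concatenating user, item, and context features into one vector is an assumption made for illustration.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM score in O(k * p) using the reformulated pairwise term."""
    linear = w0 + np.inner(w, x)
    vx = V.T @ x                  # shape (k,): sum_i V[i, f] * x[i]
    v2x2 = (V ** 2).T @ (x ** 2)  # shape (k,): sum_i V[i, f]^2 * x[i]^2
    return linear + 0.5 * np.sum(vx ** 2 - v2x2)

def fm_predict_naive(x, w0, w, V):
    """Same score, summing every feature pair explicitly in O(k * p^2)."""
    score = w0 + np.inner(w, x)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            score += np.inner(V[i], V[j]) * x[i] * x[j]
    return score

# Assumed layout: 3 user features + 3 item features + 2 context features.
x = np.array([0, 0, 1, 0, 2, 1, 1, 1], dtype=float)
p, k = len(x), 2

rng = np.random.default_rng(0)
w0, w, V = 0.0, rng.normal(size=p), rng.normal(size=(p, k))

print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Because every feature carries its own latent vector, user attributes, item attributes, and context all contribute to the score through the same mechanism.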

Slide 21

Slide 21 text

Make FM “incremental” somehow

Slide 22

Slide 22 text

import numpy as np
from flurs.data.entity import User, Item

# users and items carry an index and a feature vector
user_a = User(0, feature=np.array([0, 0, 1]))
user_b = User(1, feature=np.array([1, 0, 1]))
user_c = User(2, feature=np.array([1, 3, 1]))

item_a = Item(0, feature=np.array([2, 1, 1]))
item_b = Item(1, feature=np.array([0, 2, 1]))

Slide 23

Slide 23 text

import numpy as np
from flurs.data.entity import Event
from flurs.recommender.fm import FMRecommender

# initialize a recommendation instance
recommender = FMRecommender(p=8, k=2)
recommender.initialize()

# register users and items
recommender.register(user_a)
recommender.register(user_b)
recommender.register(user_c)
recommender.register(item_a)
recommender.register(item_b)

# feed some events
recommender.update(Event(user_a, item_b, context=np.array([1, 1])))
recommender.update(Event(user_b, item_b, context=np.array([0, 2])))
recommender.update(Event(user_a, item_a, context=np.array([1, 3])))

# make recommendation to `user_c`
candidates = np.array([0, 1])
recommender.recommend(user_c, candidates, context=np.array([0, 4]))

Context-aware recommendation

Slide 24

Slide 24 text

Evaluator(recommender)
“test-then-learn” evaluation scheme
- fit(event[])
- evaluate(event[])

J. Vinagre et al. Fast Incremental Matrix Factorization for Recommendation with Positive-only Feedback. In Proc. of UMAP 2014, pp. 459–470, July 2014.
T. Kitazawa. Incremental Factorization Machines for Persistently Cold-Starting Online Item Recommendation. arXiv:1607.02858 [cs.LG], July 2016.
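In a test-then-learn scheme, each incoming event is first used to check the current recommendation and only afterwards to update the model. The sketch below illustrates the idea with a stand-in most-popular recommender; `MostPopular` and `prequential_evaluate` are made-up names and do not reflect the FluRS Evaluator API.

```python
from collections import Counter

class MostPopular:
    """Stand-in recommender: always suggests the most frequent items so far."""

    def __init__(self):
        self.counts = Counter()

    def update(self, user, item):
        self.counts[item] += 1

    def recommend(self, user, n):
        return [item for item, _ in self.counts.most_common(n)]

def prequential_evaluate(recommender, events, n=1):
    """For each event: test the current model first, then learn from the event."""
    hits = []
    for user, item in events:
        top_n = recommender.recommend(user, n)   # 1. test
        hits.append(1 if item in top_n else 0)
        recommender.update(user, item)           # 2. learn
    return hits

events = [("a", "x"), ("b", "x"), ("c", "y"),
          ("a", "x"), ("b", "y"), ("c", "x")]
hits = prequential_evaluate(MostPopular(), events, n=1)
print(hits, sum(hits) / len(hits))
```

The per-event hit list can be averaged over a sliding window to monitor how accuracy evolves along the stream, since every event is scored before the model has seen it.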

Slide 25

Slide 25 text

Easily develop & evaluate your own recommender
You just need to follow the FluRS way

Slide 26

Slide 26 text

Future: Scaling recommenders in production
Personalization is everywhere in various ways; as Netflix said, “Everything is recommendation”
https://www.slideshare.net/justinbasilico/past-present-future-of-recommender-systems-an-industry-perspective

Slide 27

Slide 27 text

Why Python? It’s for production!
Versatile: Stream and store data internally
Trial-and-error: Develop “hybrid” algorithms
Portable: Integrate w/ existing code
FluRS will be updated in these aspects :)

Slide 28

Slide 28 text

OSS for Recommender Systems
‣ Surprise (Python) http://surpriselib.com/
‣ fastFM (Python) http://ibayer.github.io/fastFM/
‣ Implicit (Python) http://implicit.readthedocs.io/en/latest/
‣ MyMediaLite (C#) http://www.mymedialite.net/
‣ LibRec (Java) https://www.librec.net/
‣ LensKit (Java) http://lenskit.org/
On Apache Hadoop, Hive, Spark:
‣ Apache Mahout http://mahout.apache.org/
‣ Apache Hivemall https://hivemall.incubator.apache.org/
‣ Spark MLlib https://spark.apache.org/mllib/
GET INSPIRATION AND CUSTOMIZE FOR “YOUR” PRODUCTION
