Takuya Kitazawa
August 31, 2017

# FluRS: A Library for Streaming Recommendation Algorithms


## Transcript

Committer

4. ### What I talk about

But it’s NOT all about recommender systems!

- Past: batch, static
- Present: streaming, dynamic
- Future: at scale in production

7. ### Cosine similarity between users

    import numpy as np
    import numpy.linalg as ln


    def similarity(x, y):
        # cosine similarity between two rating vectors
        return np.inner(x, y) / (ln.norm(x, ord=2) * ln.norm(y, ord=2))


    # each vector holds one user's ratings for 6 items
    user_a = np.array([5, 0, 1, 1, 0, 2])
    user_b = np.array([0, 2, 0, 4, 0, 4])
    user_c = np.array([4, 5, 0, 1, 1, 2])

    print(similarity(user_a, user_b))  # 0.359210604054
    print(similarity(user_a, user_c))  # 0.654953146328

User c is more similar to user a than user b is.

9. ### Low-rank approximation with SVD

    import numpy as np
    import numpy.linalg as ln

    # 5 users * 6 items
    A = np.array([[5, 0, 1, 1, 0, 2],
                  [0, 2, 0, 4, 0, 4],
                  [4, 5, 0, 1, 1, 2],
                  [0, 0, 3, 5, 2, 0],
                  [2, 0, 1, 0, 4, 4]])

    U, s, V = ln.svd(A, full_matrices=False)

    # represent user/item characteristics in a lower dimensional space
    k = 2
    A_approx = np.dot(np.dot(U[:, :k], np.diag(s[:k])), V[:k, :])

    print(A_approx)
    # [[ 3.19741238  1.98064059  0.19763307  0.50430074  1.04148574  2.47123826]
    #  [ 1.20450954  1.18625722  1.50361641  3.5812116   1.61569345  2.37803076]
    #  [ 4.36792826  2.68465163  0.20157659  0.52659617  1.36419993  3.30665072]
    #  [-0.94009727  0.07701659  2.08296828  4.93223799  1.52652414  1.44132726]
    #  [ 2.67985286  1.80342544  0.63125085  1.52750202  1.27145836  2.54266834]]

Missing values are filled.

10. ### Matrix Factorization (MF)

More feasible in terms of running time and missing value imputation.

11. ### MF setup for the same data

    import numpy as np

    # 5 users * 6 items (the same data as in the SVD example)
    R = np.array([[5, 0, 1, 1, 0, 2],
                  [0, 2, 0, 4, 0, 4],
                  [4, 5, 0, 1, 1, 2],
                  [0, 0, 3, 5, 2, 0],
                  [2, 0, 1, 0, 4, 4]])

    n_user, n_item = R.shape

    # represent user/item characteristics in a lower dimensional space
    k = 2

12. ### MF by stochastic gradient descent

    # continues from the previous slide: uses R, n_user, n_item, k

    # randomly guess "factors"
    P = np.random.rand(n_user, k)
    Q = np.random.rand(n_item, k)

    for user in range(n_user):
        for item in range(n_item):
            # ignore useless zero elements
            if R[user, item] == 0:
                continue

            # adjust factors based on estimation error
            p, q = P[user], Q[item]
            err = R[user, item] - np.inner(p, q)
            next_p = p - 0.1 * (-2. * (err * q - 0.01 * p))
            next_q = q - 0.1 * (-2. * (err * p - 0.01 * q))
            P[user], Q[item] = next_p, next_q

    # missing values are filled
    print(np.dot(P, Q.T))
    # output shown on the slide (values vary with the random initialization):
    # [[ 1.44089222  2.10861345  1.35586737  1.30939713  2.25707035  1.10801462]
    #  [ 1.93053648  3.03696723  1.89003927  1.94703341  3.2410682   1.5074436 ]
    #  [ 1.93037675  3.08600948  1.90697021  1.99171424  3.2913027   1.51264917]
    #  [ 1.95476578  3.09267286  1.91985778  1.98747126  3.29976689  1.52826493]
    #  [ 2.27963689  3.58058058  2.22988788  2.29405565  3.82145282  1.77943416]]

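
The loop on this slide makes a single pass over the observed ratings. As a rough sketch of how repeated passes behave, the following wraps the same update rule in a function and tracks the error on the observed (non-zero) entries. The names `sgd_epoch` and `observed_rmse`, the fixed seed, the smaller learning rate (0.01 instead of the slide's 0.1), and the epoch count are all assumptions made for this illustration, not part of the talk.

```python
import numpy as np

R = np.array([[5, 0, 1, 1, 0, 2],
              [0, 2, 0, 4, 0, 4],
              [4, 5, 0, 1, 1, 2],
              [0, 0, 3, 5, 2, 0],
              [2, 0, 1, 0, 4, 4]], dtype=float)


def observed_rmse(R, P, Q):
    """RMSE over the non-zero (observed) entries of R."""
    mask = R != 0
    diff = (R - P @ Q.T)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def sgd_epoch(R, P, Q, lr=0.01, reg=0.01):
    """One in-place SGD pass over the observed entries of R."""
    for user, item in zip(*np.nonzero(R)):
        # copy so that both updates use the pre-update factors
        p, q = P[user].copy(), Q[item].copy()
        err = R[user, item] - np.inner(p, q)
        P[user] = p + 2. * lr * (err * q - reg * p)
        Q[item] = q + 2. * lr * (err * p - reg * q)


rng = np.random.RandomState(0)  # fixed seed, only for repeatability
k = 2
P = rng.rand(R.shape[0], k)
Q = rng.rand(R.shape[1], k)

rmse_before = observed_rmse(R, P, Q)
for _ in range(200):
    sgd_epoch(R, P, Q)
rmse_after = observed_rmse(R, P, Q)

print(rmse_before, rmse_after)
```

Every pass still touches all observed ratings, which is the cost that becomes a problem once users, items, and ratings keep arriving.
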

14. ### “Netflix never implemented that solution itself”

https://www.techdirt.com/blog/innovation/articles/20120409/03412518422/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenge.shtml

Poor scalability on dynamic user/item data.

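15. ### Present: Streamed rich user-item data as one possible approach to improve scalability

[Figure: user-item events arriving along a time axis, annotated with:]

- User attributes: 21 yrs., man, student
- Item attributes: genre: music, price: $1000
- Context: e.g., “when”, “where”
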
16. ### Streaming recommender systems

Update recommendation model in real time:

- Recommend top-N items to users
- User interacts with items
- Update recommendation model on the fly

‣ Incremental CF
‣ Incremental SVD
‣ Incremental MF

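
The recommend, interact, update cycle can be sketched with an incremental MF model that takes one SGD step per observed event instead of retraining on the full rating matrix. This is a simplified illustration, not FluRS's implementation: the class `IncrementalMF`, its method names, the positive-only target of 1, and the hyperparameters are all hypothetical.

```python
import numpy as np


class IncrementalMF:
    """Toy incremental MF for positive-only events (target value is 1)."""

    def __init__(self, k=2, lr=0.05, reg=0.01, seed=0):
        self.k, self.lr, self.reg = k, lr, reg
        self.rng = np.random.RandomState(seed)
        self.P = {}  # user id -> latent vector
        self.Q = {}  # item id -> latent vector

    def _vec(self, table, key):
        # unseen users/items get a small random vector on first sight
        if key not in table:
            table[key] = self.rng.normal(0., 0.1, self.k)
        return table[key]

    def update(self, user, item):
        """One SGD step for a single (user, item) interaction."""
        p, q = self._vec(self.P, user), self._vec(self.Q, item)
        err = 1. - np.inner(p, q)
        # both right-hand sides use the pre-update p and q
        self.P[user] = p + self.lr * (err * q - self.reg * p)
        self.Q[item] = q + self.lr * (err * p - self.reg * q)

    def recommend(self, user, n=2):
        """Top-n known item ids by predicted score for `user`."""
        p = self._vec(self.P, user)
        scores = {item: float(np.inner(p, q)) for item, q in self.Q.items()}
        return sorted(scores, key=scores.get, reverse=True)[:n]


model = IncrementalMF()
stream = [("a", "x"), ("b", "x"), ("a", "y"), ("c", "z")]
for user, item in stream:
    top = model.recommend(user)  # recommend with the current model
    model.update(user, item)     # observe the interaction, then update
```

The cost of each update depends only on `k`, not on how many users or items have been seen so far, which is what makes this style of model a fit for streams.
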
17. ### FluRS

Flu-* (Flux, Fluid, Fluent) Recommender Systems

    $ pip install flurs

https://github.com/takuti/flurs

18. ### Unified data representation, algorithm-agnostic, streaming evaluation

- Unified data representation: available for a wide variety of realistic data
- Algorithm-agnostic: separate recommender-specific implementation from algorithm code
- Streaming evaluation: monitor accuracy of streaming recommendation in an appropriate scheme

19. ### User, Item, Event, Recommender, Evaluator

- User, Item
  - index
  - feature vector
- Event
  - context vector
- Recommender
  - initialize()
  - register(user), register(item)
  - update(event)
  - recommend(user)
- Model
- Evaluator(recommender)
  - fit(event[])
  - evaluate(event[])

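
To make the division of responsibilities on this slide concrete, here is a hypothetical, self-contained sketch with the same shape: plain `User`, `Item`, and `Event` records, plus a recommender exposing `initialize()`, `register()`, `update()`, and `recommend()`. It borrows the method names from the slide but is not FluRS's code. `PopularityRecommender` and its scoring rule are stand-ins chosen only to keep the example short.

```python
from collections import Counter
from dataclasses import dataclass, field

import numpy as np


@dataclass
class User:
    index: int
    feature: np.ndarray = field(default_factory=lambda: np.zeros(0))


@dataclass
class Item:
    index: int
    feature: np.ndarray = field(default_factory=lambda: np.zeros(0))


@dataclass
class Event:
    user: User
    item: Item
    context: np.ndarray = field(default_factory=lambda: np.zeros(0))


class PopularityRecommender:
    """Stand-in model: ranks candidate items by how often they were seen."""

    def initialize(self):
        self.users, self.items = {}, {}
        self.counts = Counter()

    def register(self, entity):
        # one entry point for both users and items, keyed by index
        table = self.users if isinstance(entity, User) else self.items
        table[entity.index] = entity

    def update(self, event):
        self.counts[event.item.index] += 1

    def recommend(self, user, candidates):
        # `user` is unused here; a personalized model would score per user
        return sorted(candidates, key=lambda i: self.counts[i], reverse=True)


rec = PopularityRecommender()
rec.initialize()
alice, book, film = User(0), Item(0), Item(1)
for entity in (alice, book, film):
    rec.register(entity)
rec.update(Event(alice, film))
ranking = rec.recommend(alice, candidates=[0, 1])
```

Because the calling code only depends on these four methods, swapping the stand-in for a different model does not change the loop that feeds it events.
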
20. ### Feature-based recommender: Factorization Machine (FM)

Create prediction model from context-aware feature vectors.

S. Rendle. Factorization Machines with libFM. ACM Transactions on Intelligent Systems and Technology, 3(3), May 2012.
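
For orientation, a second-order FM scores a feature vector as a bias, plus a linear term, plus pairwise feature interactions weighted by inner products of per-feature factor vectors. The sketch below computes that score with the usual linear-time rewriting of the pairwise term and checks it against the explicit double loop. The function names and the random (unlearned) parameters are illustrative. Building the input by concatenating a user feature, an item feature, and a context vector is one plausible encoding assumed here, not a statement of how FluRS assembles its input.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """Second-order FM score for one feature vector.

    x: (p,) features, w0: bias, w: (p,) linear weights,
    V: (p, k) factor matrix whose i-th row embeds feature i.
    """
    linear = w0 + np.inner(w, x)
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [(sum_i V[i, f] x_i)^2 - sum_i V[i, f]^2 x_i^2]
    vx = V.T @ x
    pairwise = 0.5 * np.sum(vx ** 2 - (V ** 2).T @ (x ** 2))
    return float(linear + pairwise)


def fm_predict_naive(x, w0, w, V):
    """Same score via the explicit double loop, for checking."""
    score = w0 + np.inner(w, x)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            score += np.inner(V[i], V[j]) * x[i] * x[j]
    return float(score)


rng = np.random.RandomState(0)
p, k = 8, 2
w0, w, V = 0.1, rng.rand(p), rng.rand(p, k)

# assumed encoding: user feature (3) + item feature (3) + context (2)
x = np.concatenate([[1, 3, 1], [0, 2, 1], [0, 4]]).astype(float)

fast, slow = fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V)
```

The two functions agree up to floating-point error; the first costs O(pk) per prediction instead of O(p²k).
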

22. ### Users and items with feature vectors

    import numpy as np

    from flurs.data.entity import User, Item

    # User(index, feature=...) / Item(index, feature=...)
    user_a = User(0, feature=np.array([0, 0, 1]))
    user_b = User(1, feature=np.array([1, 0, 1]))
    user_c = User(2, feature=np.array([1, 3, 1]))

    item_a = Item(0, feature=np.array([2, 1, 1]))
    item_b = Item(1, feature=np.array([0, 2, 1]))

23. ### Context-aware recommendation with `FMRecommender`

    import numpy as np

    from flurs.data.entity import Event
    from flurs.recommender.fm import FMRecommender

    # users and items are the ones created on the previous slide

    # initialize a recommendation instance
    recommender = FMRecommender(p=8, k=2)
    recommender.initialize()

    # register users and items
    recommender.register(user_a)
    recommender.register(user_b)
    recommender.register(user_c)

    recommender.register(item_a)
    recommender.register(item_b)

    # feed some events
    recommender.update(Event(user_a, item_b, context=np.array([1, 1])))
    recommender.update(Event(user_b, item_b, context=np.array([0, 2])))
    recommender.update(Event(user_a, item_a, context=np.array([1, 3])))

    # make recommendation to `user_c`
    candidates = np.array([0, 1])
    recommender.recommend(user_c, candidates, context=np.array([0, 4]))

24. ### Evaluator(recommender)

“test-then-learn” evaluation scheme

- fit(event[])
- evaluate(event[])

J. Vinagre et al. Fast Incremental Matrix Factorization for Recommendation with Positive-only Feedback. In Proc. of UMAP 2014, pp. 459–470, July 2014.

T. Kitazawa. Incremental Factorization Machines for Persistently Cold-Starting Online Item Recommendation. arXiv:1607.02858 [cs.LG], July 2016.

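
A test-then-learn loop can be sketched as follows: for each incoming event, first ask the current model for a top-N list and check whether the item the user actually chose is in it, and only then update the model with that event. The function `test_then_learn`, the `MostPopular` stand-in model, and the hit-or-miss measure are assumptions for illustration, not FluRS's `Evaluator`.

```python
from collections import Counter


class MostPopular:
    """Stand-in model exposing update() and recommend()."""

    def __init__(self):
        self.counts = Counter()

    def update(self, user, item):
        self.counts[item] += 1

    def recommend(self, user, n):
        return [item for item, _ in self.counts.most_common(n)]


def test_then_learn(model, events, n=1):
    """Yield 1/0 per event: was the true item in the top-n before learning?"""
    for user, item in events:
        hit = int(item in model.recommend(user, n))  # test first ...
        model.update(user, item)                     # ... then learn
        yield hit


events = [("a", "x"), ("b", "x"), ("c", "y"), ("a", "x"), ("b", "y")]
hits = list(test_then_learn(MostPopular(), events, n=1))
print(hits, sum(hits) / len(hits))
```

Since every event is scored before the model has seen it, the running average of `hits` tracks accuracy over time without a separate held-out set.
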
25. ### Easily develop & evaluate your own recommender

You just need to follow the FluRS way.

26. ### Future: Scaling recommender in production

Personalization is everywhere in various ways; as Netflix said, “Everything is recommendation”.

https://www.slideshare.net/justinbasilico/past-present-future-of-recommender-systems-an-industry-perspective

27. ### Why Python? It’s for production!

- Versatile: stream and store data internally
- Trial-and-error: develop “hybrid” algorithm
- Portable: integrate w/ existing code

FluRS will be updated in these aspects :)

28. ### OSS for Recommender Systems

‣ Surprise (Python) http://surpriselib.com/
‣ fastFM (Python) http://ibayer.github.io/fastFM/
‣ Implicit (Python) http://implicit.readthedocs.io/en/latest/
‣ MyMediaLite (C#) http://www.mymedialite.net/
‣ LibRec (Java) https://www.librec.net/
‣ LensKit (Java) http://lenskit.org/

On Apache Hadoop, Hive, Spark:

‣ Apache Mahout http://mahout.apache.org/
‣ Apache Hivemall https://hivemall.incubator.apache.org/
‣ Spark MLlib https://spark.apache.org/mllib/

GET INSPIRATION AND CUSTOMIZE FOR “YOUR” PRODUCTION