Y. Yamamoto
November 21, 2022

# Recommender Systems Part 3

1. Programming assignments review
2. Problems on Simple Collaborative Filtering
3. Matrix Factorization
4. Challenges for recommender systems

## Transcript

1. ### Matrix Factorization: Beyond Simple Collaborative Filtering

Yusuke Yamamoto, Associate Professor, Faculty of Informatics (yusuke_yamamoto@acm.org)

Data Engineering (Recommender Systems 3), 2022.11.28

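6. ### User-based Collaborative Filtering

Predicts a target user's rating for an item based on the rating tendency of similar users.

$$
\mathrm{predict}(u_t, i) = \bar{r}_{u_t} + \frac{\sum_{u \in S_{u_t}} \mathrm{sim}(u_t, u) \cdot (r_{u,i} - \bar{r}_{u})}{\sum_{u \in S_{u_t}} \mathrm{sim}(u_t, u)}
$$

Here $u_t$ is the target user, $S_{u_t}$ is the set of users similar to $u_t$, $r_{u,i}$ is user $u$'s rating for item $i$, and $\bar{r}_{u}$ is user $u$'s average rating.

| User | Item5 | sim | Average rating |
|---|---|---|---|
| Alice | ? | 1 | 4 |
| User1 (similar user) | 3 | 0.85 | 2.4 |
| User2 (similar user) | 5 | 0.71 | 3.8 |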
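As a worked example of the user-based formula, the following minimal sketch plugs in the values from the slide's table (Alice's average of 4, and the Item5 rating, similarity, and average rating of User1 and User2). The function name `predict_user_based` and the fallback to the user's own mean when no neighbor is available are assumptions made for illustration, not part of the lecture.

```python
def predict_user_based(target_mean, neighbors):
    """Mean-centered, similarity-weighted prediction.

    neighbors: list of (similarity, neighbor_rating_for_item, neighbor_mean).
    """
    numerator = sum(sim * (rating - mean) for sim, rating, mean in neighbors)
    denominator = sum(sim for sim, _, _ in neighbors)
    if denominator == 0:
        # Assumption: with no usable neighbors, fall back to the user's mean.
        return target_mean
    return target_mean + numerator / denominator


# Values from the slide: (sim, Item5 rating, average rating) for User1 and User2.
neighbors = [(0.85, 3, 2.4), (0.71, 5, 3.8)]
alice_item5 = predict_user_based(4.0, neighbors)
print(round(alice_item5, 2))  # 4.87
```

Both neighbors rated Item5 above their own averages, so the prediction lands above Alice's average of 4.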
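7. ### Item-based Collaborative Filtering

Predicts unknown rating scores based on the rating tendency for similar items.

| | Item1 | Item2 | Item3 | Item4 | Item5 |
|---|---|---|---|---|---|
| Alice | 5 | 3 | 4 | 4 | ? |
| User1 | 3 | 1 | 2 | 3 | 3 |
| User2 | 4 | 3 | 4 | 3 | 5 |
| User3 | 3 | 3 | 1 | 5 | 4 |
| User4 | 1 | 5 | 5 | 2 | 1 |

$$
\mathrm{predict}(u_t, i_t) = \frac{\sum_{i \in S_{i_t}} \mathrm{sim}(i_t, i) \cdot r_{u_t,i}}{\sum_{i \in S_{i_t}} \mathrm{sim}(i_t, i)}
$$

Here $u_t$ is the target user, $i_t$ is the target item, and $S_{i_t}$ is the set of items similar to $i_t$.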
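The item-based prediction can be sketched on the slide's rating matrix as follows. The slide does not say which similarity measure or neighborhood size is used, so this sketch assumes cosine similarity over the users who rated both items and the two most similar items as neighbors. `cosine_sim`, `predict_item_based`, and `n_neighbors` are hypothetical names.

```python
import numpy as np

# Rating matrix from the slide; np.nan marks Alice's unknown rating for Item5.
R = np.array([
    [5, 3, 4, 4, np.nan],  # Alice
    [3, 1, 2, 3, 3],       # User1
    [4, 3, 4, 3, 5],       # User2
    [3, 3, 1, 5, 4],       # User3
    [1, 5, 5, 2, 1],       # User4
], dtype=float)


def cosine_sim(a, b):
    """Cosine similarity over the users who rated both items (an assumption)."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return 0.0
    a, b = a[both], b[both]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def predict_item_based(R, user, item, n_neighbors=2):
    """Similarity-weighted average of the user's ratings for similar items."""
    rated = [j for j in range(R.shape[1])
             if j != item and not np.isnan(R[user, j])]
    sims = {j: cosine_sim(R[:, item], R[:, j]) for j in rated}
    top = sorted(rated, key=lambda j: sims[j], reverse=True)[:n_neighbors]
    total = sum(sims[j] for j in top)
    if total == 0:
        return float("nan")  # no similar item to rely on
    return sum(sims[j] * R[user, j] for j in top) / total


alice_item5 = predict_item_based(R, user=0, item=4)
print(round(alice_item5, 2))
```

Under these assumptions, Item1 and Item4 are the nearest neighbors of Item5, so the prediction is a weighted average of Alice's ratings for those two items.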
8. ### Problems on CF approaches (1/3)

Real data is quite sparse! Even on large e-commerce sites, there are few intersections between user vectors (and between item vectors).

Image reference: https://rafalab.github.io/dsbook/recommendation-systems.html
9. ### Problems on CF approaches (2/3)

The curse of dimensionality

- In high-dimensional space, it is difficult to handle similarity
- Item/user vectors usually have quite high dimensionality (because the rating matrix is quite large)
10. ### Problems on CF approaches (3/3)

High computational cost

- The rating matrix is used directly every time the system tries to find similar users/items and make predictions
- CF approaches do not scale for most real-world scenarios

Figure: for User 1, the system computes the similar users from the rating matrix, then computes the suggested items from the rating matrix again. The same two computations are repeated for User 2.
11. ### Recent approaches for recommender systems

Model-based approach

- Based on offline pre-processing
- At run time, only the pre-trained model is used for rating prediction
- Pre-trained models can be updated

Figure: rating matrix → (offline computation) → pre-trained model; user → (online computation) → suggested items for the user.
12. ### Memory-based approach vs. model-based approach

Memory-based approach

- User-based CF
- Item-based CF

Model-based approach

- Matrix factorization
- Association rule mining
- Probabilistic model
- Other ML techniques

14. ### Basic idea 1 (1/2)

Assumes that latent factors exist in users/items.

| User | God father | Terminator | Money game | Titanic | Back to the future | … | X-men |
|---|---|---|---|---|---|---|---|
| Alice | 5 | 1 | 4 | 4 | 3 | … | 2 |

Alice: "I don't like horror… Sci-Fis often move me. I love humane and dramatic movies!"

Latent factors (which cannot be observed):

| User | Horror | Sci-Fi | Humanity | Drama | … |
|---|---|---|---|---|---|
| Alice | 0.3 | 2.1 | 6.1 | 4.7 | … |
15. ### Basic idea 1 (2/2)

Assumes that latent factors exist in users/items.

| User | God father | Terminator | Money game | Titanic | Back to the future | … | X-men |
|---|---|---|---|---|---|---|---|
| Alice | 5 | 1 | 4 | 4 | 3 | … | 2 |

Latent factors (which cannot be observed):

| Movie | Horror | Sci-Fi | Humanity | Drama | … |
|---|---|---|---|---|---|
| God father | 2.3 | 0.4 | 4.9 | 5.7 | … |

Image from Amazon.com
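16. ### Basic idea 2

Assumes that rating scores derive from latent factors of users and items.

| User | God father | Terminator | Money game | Titanic | Back to the future | … | X-men |
|---|---|---|---|---|---|---|---|
| Alice | 5 | 1 | 4 | 4 | 3 | … | 2 |

The rating is derived from the user's latent factors multiplied (×) by the item's latent factors:

| User | Horror | Sci-Fi | Humanity | Drama | … |
|---|---|---|---|---|---|
| Alice | 0.3 | 2.1 | 6.1 | 4.7 | … |

| Movie | Horror | Sci-Fi | Humanity | Drama | … |
|---|---|---|---|---|---|
| God father | 2.3 | 0.4 | 4.9 | 5.7 | … |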
17. ### Summary of matrix factorization

$$
R \approx P \times Q^{T}
$$

- $R$: rating matrix (m users × n items); □ marks unknown scores
- $P$: latent user matrix (m users × k latent factors)
- $Q^{T}$: latent item matrix (k latent factors × n items)

Key points:

- The rating matrix can be decomposed into latent factors of users and items
- The dimension of the latent factors (vectors) is much smaller than the number of users and items (k ≪ m, n)
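The shapes involved in the decomposition can be checked with a short NumPy sketch. The sizes `m`, `n`, `k` and the random entries of `P` and `Q` are hypothetical placeholders chosen only to illustrate the shapes. `Q` is stored with one row per item, so that `Q.T` is the k × n latent item matrix.

```python
import numpy as np

m, n, k = 4, 6, 2                 # hypothetical: 4 users, 6 items, 2 latent factors
rng = np.random.default_rng(0)

P = rng.random((m, k))            # latent user matrix, one row per user
Q = rng.random((n, k))            # latent item vectors, one row per item

R_hat = P @ Q.T                   # (m x k) @ (k x n) -> m x n predicted ratings

u, i = 1, 3
single = P[u] @ Q[i]              # one predicted score is a dot product

print(R_hat.shape)                # (4, 6)
print(np.isclose(R_hat[u, i], single))  # True
```

Each predicted score depends only on one user vector and one item vector.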
18. ### Prediction using matrix factorization

The original (raw) rating matrix $R$ contains unknown scores (□).
19. ### Prediction using matrix factorization

The original (raw) rating matrix $R$, including its unknown scores (□), is approximated by the product of the latent user matrix $P$ and the latent item matrix $Q^{T}$:

$$
R \approx P \times Q^{T}
$$
20. ### Prediction using matrix factorization

$$
R^{*} = P \times Q^{T}
$$

$R^{*}$ is the predicted rating matrix, $P$ is the latent user matrix, and $Q^{T}$ is the latent item matrix. In $R^{*}$, the unknown scores (□) are filled with predicted values (2, 5, and 4 in the slide's example).

If we obtain the latent user and item matrices, we can predict unknown scores by multiplying the two latent matrices.

How do we obtain the latent user and item matrices?
21. ### SVD for recommender systems

SVD: singular value decomposition

- A famous linear algebra technique for matrix decomposition
- It is often used for dimensionality reduction
- SVD delivers essentially the same result as PCA does

$$
X = U \times \Sigma \times V^{T}
$$

- $X$: m × n matrix
- $U$: m × m unitary matrix
- $\Sigma$: m × n rectangular diagonal matrix (the diagonal values are called singular values)
- $V$: n × n unitary matrix
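22. ### Example of SVD

$$
X = U \times \Sigma \times V^{T}
$$

$$
X = \begin{pmatrix}
1 & 3 & 3 & 3 & 0 \\
2 & 4 & 2 & 2 & 4 \\
1 & 3 & 3 & 5 & 1 \\
4 & 5 & 2 & 3 & 3 \\
1 & 1 & 5 & 2 & 1
\end{pmatrix}
$$

$$
U = \begin{pmatrix}
-0.369 & -0.325 & 0.282 & 0.343 & 0.749 \\
-0.459 & 0.448 & -0.339 & -0.584 & 0.364 \\
-0.468 & -0.359 & 0.544 & -0.459 & -0.381 \\
-0.563 & 0.491 & 0.07 & 0.562 & -0.348 \\
-0.341 & -0.569 & -0.71 & 0.118 & -0.202
\end{pmatrix}
$$

$$
\Sigma = \begin{pmatrix}
13.368 & 0 & 0 & 0 & 0 \\
0 & 4.708 & 0 & 0 & 0 \\
0 & 0 & 2.792 & 0 & 0 \\
0 & 0 & 0 & 1.586 & 0 \\
0 & 0 & 0 & 0 & 0.904
\end{pmatrix}
$$

$$
V = \begin{pmatrix}
-0.325 & 0.341 & -0.101 & 0.682 & -0.55 \\
-0.562 & 0.345 & 0.273 & 0.153 & 0.684 \\
-0.468 & -0.642 & -0.577 & 0.125 & 0.141 \\
-0.504 & -0.327 & 0.601 & -0.324 & -0.416 \\
-0.324 & 0.496 & -0.47 & -0.626 & -0.189
\end{pmatrix}
$$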
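The decomposition of the example matrix can be reproduced with NumPy's `numpy.linalg.svd`, as in the sketch below. Two details differ from the slide's notation: NumPy returns the singular values as a 1-D array rather than the matrix Σ, and it returns V already transposed (named `Vt` here). The signs of individual columns may also differ from the slide, since singular vectors are only determined up to sign.

```python
import numpy as np

# Example matrix X from the slide.
X = np.array([
    [1, 3, 3, 3, 0],
    [2, 4, 2, 2, 4],
    [1, 3, 3, 5, 1],
    [4, 5, 2, 3, 3],
    [1, 1, 5, 2, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(X)   # s: singular values in descending order
Sigma = np.diag(s)            # rebuild the diagonal matrix from the 1-D array

X_rebuilt = U @ Sigma @ Vt    # multiplying the three factors recovers X
print(np.round(s, 3))
print(np.allclose(X_rebuilt, X))  # True
```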
23. ### Important features of SVD (1/4)

We can approximate a given matrix by focusing on the largest singular values and the vectors of U and V that correspond to those singular values.

In the example, the largest singular values are 13.368 and 4.708, the first two diagonal entries of Σ.
24. ### Important features of SVD (2/4)

We can approximate a given matrix by focusing on the largest singular values and the vectors of U and V that correspond to those singular values.

Ignore unimportant values! In the example, these are the remaining singular values (2.792, 1.586, and 0.904) and the corresponding columns of U and V.
25. ### Important features of SVD (3/4)

We can approximate a given matrix by focusing on the largest singular values and the vectors of U and V that correspond to those singular values.

Keeping only the two largest singular values and the corresponding first two columns of U and V gives:

$$
\Sigma_2 = \begin{pmatrix}
13.368 & 0 \\
0 & 4.708
\end{pmatrix}
\qquad
U_2 = \begin{pmatrix}
-0.369 & -0.325 \\
-0.459 & 0.448 \\
-0.468 & -0.359 \\
-0.563 & 0.491 \\
-0.341 & -0.569
\end{pmatrix}
\qquad
V_2 = \begin{pmatrix}
-0.325 & 0.341 \\
-0.562 & 0.345 \\
-0.468 & -0.642 \\
-0.504 & -0.327 \\
-0.324 & 0.496
\end{pmatrix}
$$
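26. ### Important features of SVD (4/4)

We can approximate a given matrix by focusing on the largest singular values and the vectors of U and V that correspond to those singular values.

$$
U_2 \times \Sigma_2 \times V_2^{T} = \begin{pmatrix}
1.08 & 2.24 & 3.29 & 2.98 & 0.84 \\
2.72 & 4.17 & 1.52 & 2.41 & 3.04 \\
1.46 & 2.93 & 4.02 & 3.71 & 1.19 \\
3.24 & 5.03 & 2.04 & 3.04 & 3.59 \\
0.57 & 1.64 & 3.86 & 3.18 & 0.15
\end{pmatrix}
\approx
\begin{pmatrix}
1 & 3 & 3 & 3 & 0 \\
2 & 4 & 2 & 2 & 4 \\
1 & 3 & 3 & 5 & 1 \\
4 & 5 & 2 & 3 & 3 \\
1 & 1 & 5 & 2 & 1
\end{pmatrix}
= X
$$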
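To see how the approximation quality depends on the number of singular values kept, the sketch below rebuilds the example matrix from its top k singular values for each k and measures the error in the Frobenius norm. The helper name `rank_k_approximation` is hypothetical. The error of a rank-k approximation built this way equals the square root of the sum of the squared discarded singular values, which is why dropping only small singular values changes the matrix little.

```python
import numpy as np

# Example matrix X from the slides.
X = np.array([
    [1, 3, 3, 3, 0],
    [2, 4, 2, 2, 4],
    [1, 3, 3, 5, 1],
    [4, 5, 2, 3, 3],
    [1, 1, 5, 2, 1],
], dtype=float)


def rank_k_approximation(X, k):
    """Rebuild X from its k largest singular values and matching vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


s = np.linalg.svd(X, compute_uv=False)  # singular values only
errors = []
for k in range(1, 6):
    Xk = rank_k_approximation(X, k)
    errors.append(np.linalg.norm(X - Xk))  # Frobenius norm of the residual
    print(k, round(errors[-1], 3))
```

With k = 2 this corresponds to the product of U2, Σ2, and the transpose of V2 on the slide.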
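27. ### Apply SVD for recommender systems (1/2)

Step 1: Convert the rating table to a matrix. The unknown score (?) is regarded as zero.

| | Item1 | Item2 | Item3 | Item4 | Item5 |
|---|---|---|---|---|---|
| Alice | 1 | 3 | 3 | 3 | ? |
| User1 | 2 | 4 | 2 | 2 | 4 |
| User2 | 1 | 3 | 3 | 5 | 1 |
| User3 | 4 | 5 | 2 | 3 | 3 |
| User4 | 1 | 1 | 5 | 2 | 1 |

$$
X = \begin{pmatrix}
1 & 3 & 3 & 3 & 0 \\
2 & 4 & 2 & 2 & 4 \\
1 & 3 & 3 & 5 & 1 \\
4 & 5 & 2 & 3 & 3 \\
1 & 1 & 5 & 2 & 1
\end{pmatrix}
$$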
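28. ### Apply SVD for recommender systems (2/2)

- Step 2: Run SVD: $X = U \times \Sigma \times V^{T}$
- Step 3: Focus on important features: keep $U_2$, $\Sigma_2$, and $V_2$
- Step 4: Multiply the three matrices: $U_2 \times \Sigma_2 \times V_2^{T}$
- Step 5: Check the values that were zero before running SVD

$$
U_2 \times \Sigma_2 \times V_2^{T} = \begin{pmatrix}
1.08 & 2.24 & 3.29 & 2.98 & 0.84 \\
2.72 & 4.17 & 1.52 & 2.41 & 3.04 \\
1.46 & 2.93 & 4.02 & 3.71 & 1.19 \\
3.24 & 5.03 & 2.04 & 3.04 & 3.59 \\
0.57 & 1.64 & 3.86 & 3.18 & 0.15
\end{pmatrix}
$$

The entry for Alice and Item5, which was zero before running SVD, becomes 0.84.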
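The five steps can be strung together in a few lines of NumPy. This is a minimal sketch of the zero-replacement procedure described on the slides, not a recommended method. The function name `svd_predict` and the use of `np.nan` to mark unknown scores are assumptions for illustration.

```python
import numpy as np

# Rating table from Step 1; np.nan marks Alice's unknown score for Item5.
R = np.array([
    [1, 3, 3, 3, np.nan],  # Alice
    [2, 4, 2, 2, 4],       # User1
    [1, 3, 3, 5, 1],       # User2
    [4, 5, 2, 3, 3],       # User3
    [1, 1, 5, 2, 1],       # User4
], dtype=float)


def svd_predict(R, k):
    """Zero-fill unknown scores, keep k singular values, rebuild the matrix."""
    X = np.nan_to_num(R, nan=0.0)                      # Step 1: unknown -> 0
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # Step 2: run SVD
    U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k]  # Step 3: keep top k
    return U_k @ S_k @ Vt_k                            # Step 4: multiply


R_hat = svd_predict(R, k=2)

# Step 5: read off the entries that were unknown (zero) before running SVD.
for u, i in zip(*np.nonzero(np.isnan(R))):
    print(f"user {u}, item {i}: {R_hat[u, i]:.2f}")
```

The printed entry for user 0 (Alice) and item 4 (Item5) corresponds to the value the slide highlights in Step 5.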
29. ### Problems on SVD

Predicted values are often negative

- SVD does not take the rating score range into account

Zero replacement decreases prediction quality

- SVD analyzes the relations among all the data in the matrix
- The meaning of "zero" is different from that of "unknown"
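30. ### Bad example of SVD-based recommendation

| | Music 1 | Music 2 | Music 3 | Music 4 |
|---|---|---|---|---|
| User1 | 5 | | | |
| User2 | 3 | 4 | | |
| User3 | 2 | | 1 | |
| User4 | | 5 | | 4 |
| User5 | | | | 5 |

Zero replacement:

$$
\begin{pmatrix}
5 & 0 & 0 & 0 \\
3 & 4 & 0 & 0 \\
2 & 0 & 1 & 0 \\
0 & 5 & 0 & 4 \\
0 & 0 & 0 & 5
\end{pmatrix}
$$

Apply SVD to obtain $U_2$, $\Sigma_2$, and $V_2$, then multiply the matrices. Resulting values: 3.53 1.88 0.16 -0.26 3.62 4.04 0.13 2.17 1.44 0.76 0.07 -0.12 2.76 5.41 0.06 4.34 1.67 5.97 0 5.74 -0.26 2.91 -0.06 3.52

Several of the predicted values are negative.

Example from: http://smrmkt.hatenablog.jp/entry/2014/08/23/211555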
31. ### Netflix Prize (2006-2010)

Netflix held an open competition to advance collaborative filtering algorithms and to seek the best algorithm.

Image ref: http://blogs.itmedia.co.jp/saito/2009/09/httpjournalmyco.html
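32. ### Simon Funk's Matrix Factorization (2006)

Instead of using SVD (with zero replacement), Simon Funk's method learns the matrices P and Q by using only the observed values in R.

Target optimization function:

$$
\min_{P, Q} \sum_{(u,i) \in R} \left( r_{u,i} - \mathbf{p}_u^{T} \mathbf{q}_i \right)^2 + \lambda \left( \lVert \mathbf{p}_u \rVert^2 + \lVert \mathbf{q}_i \rVert^2 \right)
$$

u: user u; i: item i; $\mathbf{p}_u$: u's latent vector; $\mathbf{q}_i$: i's latent vector. The sum runs over the observed ratings in R.

Figure: the rating matrix R (m users × n items) is approximated by the product of the latent user matrix P and the latent item matrix Q.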
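One common way to minimize this objective is stochastic gradient descent over the observed ratings, sketched below. The slide only gives the objective, so the optimizer, the learning rate `lr`, the regularization weight `reg` (λ), the number of epochs, and the random initialization are all assumptions. `funk_mf` and `rmse` are hypothetical helper names. The data is the Alice rating table used in the SVD slides, with the unknown score left as `np.nan` instead of being replaced by zero.

```python
import numpy as np

R = np.array([
    [1, 3, 3, 3, np.nan],  # Alice (Item5 is unknown)
    [2, 4, 2, 2, 4],
    [1, 3, 3, 5, 1],
    [4, 5, 2, 3, 3],
    [1, 1, 5, 2, 1],
], dtype=float)


def funk_mf(R, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Learn P (users x k) and Q (items x k) from the observed entries only."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, k))
    Q = rng.normal(scale=0.1, size=(n, k))
    observed = [(u, i) for u in range(m) for i in range(n)
                if not np.isnan(R[u, i])]
    for _ in range(epochs):
        for idx in rng.permutation(len(observed)):  # visit ratings in random order
            u, i = observed[idx]
            err = R[u, i] - P[u] @ Q[i]             # r_ui - p_u . q_i
            p_old = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])  # gradient step on p_u
            Q[i] += lr * (err * p_old - reg * Q[i]) # gradient step on q_i
    return P, Q


def rmse(R, P, Q):
    """Root mean squared error on the observed entries."""
    mask = ~np.isnan(R)
    return float(np.sqrt(np.mean((R[mask] - (P @ Q.T)[mask]) ** 2)))


P, Q = funk_mf(R)
print("training RMSE:", round(rmse(R, P, Q), 3))
print("predicted score for Alice, Item5:", round(float(P[0] @ Q[4]), 2))
```

Because unknown entries never enter the loop, they are not treated as zero ratings, which is the key difference from the zero-replacement SVD procedure.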
33. ### Various approaches have been developed …

- 2003: Scalable models (Amazon's item-based CF)
- 2006-2009: Netflix Prize (the rise of matrix factorization, like Simon Funk's method)
- 2010: Factorization machine (generalized matrix factorization for dealing with various factors)
- 2013: Deep learning (Deep Factorization machine; Content2Vec to get content embeddings)

Ref: https://www.slideshare.net/databricks/deep-learning-for-recommender-systems-with-nick-pentreath

35. ### Remaining challenges

Cold start problem

- How to recommend new items?
- What to recommend to new users?

Serendipity

- Recommending only favorable items cannot expand user interests
- How to recommend unexpected and surprising items?

Explanation of recommendation reasons

- Recommendation algorithms just decide what to recommend
- How can systems persuade users to purchase recommended items?

Reviewer's trust

- How to remove spam reviewers?
- How to find high-quality reviewers?
36. ### Remaining challenges

Cold start problem

- How to recommend new items?
- What to recommend to new users?

Serendipity

- Recommending only favorable items cannot expand user interests
- How to recommend unexpected and surprising items?

Providing explanations

- Recommendation algorithms just decide what to recommend
- How can systems persuade users to purchase recommended items?

Reviewer's trust

- How to remove spam reviewers?
- How to find high-quality reviewers?
37. ### Cold start problem

| | Item1 | Item2 | Item3 | Item4 | Item5 (new item) |
|---|---|---|---|---|---|
| Kate (new user) | | | | | |
| User1 | 3 | 1 | 2 | 3 | |
| User2 | 4 | 3 | 4 | 3 | |
| User3 | 3 | 3 | 1 | 5 | |
| User4 | 1 | 5 | 5 | 2 | |

CF approaches don't work for new items/users.

New items/users offer no clues for predicting unknown scores, because CF cannot find neighbor users/items for them.
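The cold start situation in the table can be made concrete with a small sketch: a neighbor can only be found when two users (or two items) share at least one rating, and Kate and Item5 share none. The helper names `co_rated_count`, `candidate_user_neighbors`, and `candidate_item_neighbors` are hypothetical, and "at least one shared rating" is an assumed minimal condition for computing a similarity.

```python
import numpy as np

# Rating table from the slide: Kate (row 0) and Item5 (column 4) have no ratings.
R = np.array([
    [np.nan, np.nan, np.nan, np.nan, np.nan],  # Kate (new user)
    [3, 1, 2, 3, np.nan],                      # User1
    [4, 3, 4, 3, np.nan],                      # User2
    [3, 3, 1, 5, np.nan],                      # User3
    [1, 5, 5, 2, np.nan],                      # User4
], dtype=float)


def co_rated_count(a, b):
    """Number of positions where both rating vectors have a value."""
    return int(np.sum(~np.isnan(a) & ~np.isnan(b)))


def candidate_user_neighbors(R, user):
    """Users sharing at least one rated item with the given user."""
    return [v for v in range(R.shape[0])
            if v != user and co_rated_count(R[user], R[v]) > 0]


def candidate_item_neighbors(R, item):
    """Items sharing at least one rating user with the given item."""
    return [j for j in range(R.shape[1])
            if j != item and co_rated_count(R[:, item], R[:, j]) > 0]


print(candidate_user_neighbors(R, 0))  # [] -> no neighbors for Kate
print(candidate_item_neighbors(R, 4))  # [] -> no neighbors for Item5
print(candidate_user_neighbors(R, 1))  # [2, 3, 4] -> User1 has neighbors
```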