• In high-dimensional space, it’s difficult to handle similarity
• Usually, item/user vectors have quite high dimensionality (because the rating matrix is quite large)
• A rating matrix is directly used every time systems try to find similar users/items and make predictions
• CF approaches do not scale for most real-world scenarios
[Figure: for each user (User 1, User 2), similar users are computed from the rating matrix, and suggested items are then computed from the rating matrix again]
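The per-request flow in the figure can be sketched roughly as follows. This is only an illustration under assumptions that are not in the slides: the toy matrix `R`, the helper names `similar_users` and `suggest_items`, and the choice of cosine similarity over co-rated items are all hypothetical.

```python
import numpy as np

# Hypothetical toy rating matrix (rows: users, columns: items); 0 means "not rated".
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
], dtype=float)

def similar_users(R, user, top_n=2):
    """Rank the other users by cosine similarity over co-rated items."""
    sims = []
    for other in range(R.shape[0]):
        if other == user:
            continue
        both = (R[user] > 0) & (R[other] > 0)  # items rated by both users
        if not both.any():
            continue                            # no overlap, so no similarity
        a, b = R[user, both], R[other, both]
        sims.append((other, float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))))
    return sorted(sims, key=lambda t: t[1], reverse=True)[:top_n]

def suggest_items(R, user, top_n=2):
    """Score each unrated item by a similarity-weighted average of neighbor ratings."""
    neighbors = similar_users(R, user, top_n)
    scores = {}
    for item in np.where(R[user] == 0)[0]:
        rated = [(v, s) for v, s in neighbors if R[v, item] > 0]
        if rated:
            total = sum(s for _, s in rated)
            scores[int(item)] = sum(s * R[v, item] for v, s in rated) / total
    return scores

# Both steps scan the whole rating matrix again on every request.
print(similar_users(R, user=0))
print(suggest_items(R, user=0))
```

Because both functions read the full matrix for every request, the cost grows with the number of users and items, which is the scaling problem the slide points at.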
on offline pre-processing
- At run-time, only the pre-trained model is used for rating prediction
- Pre-trained models can be updated
[Figure: offline computation turns the rating matrix into a pre-trained model; online computation uses the model to produce suggested items for a user]
Basic idea 1 (1/2)

User | Godfather | Terminator | Money game | Titanic | Back to the future | … | X-men
Alice | 5 | 1 | 4 | 4 | 3 | … | 2

Alice: “I don’t like horror… Sci-Fis often move me. I love humane and dramatic movies!”

User | Horror | Sci-Fi | Humanity | Drama | …
Alice | 0.3 | 2.1 | 6.1 | 4.7 | …
Latent factors (which cannot be observed)

Assumes that latent factors exist in users/items
Assumes that latent factors exist in users/items

User | Godfather | Terminator | Money game | Titanic | Back to the future | … | X-men
Alice | 5 | 1 | 4 | 4 | 3 | … | 2

Movie | Horror | Sci-Fi | Humanity | Drama | …
Godfather | 2.3 | 0.4 | 4.9 | 5.7 | …
Latent factors (which cannot be observed)

Image from Amazon.com
User | Godfather | Terminator | Money game | Titanic | Back to the future | … | X-men
Alice | 5 | 1 | 4 | 4 | 3 | … | 2

Assumes that rating scores derive from latent factors of users and items

User | Horror | Sci-Fi | Humanity | Drama | …
Alice | 0.3 | 2.1 | 6.1 | 4.7 | …
×
Movie | Horror | Sci-Fi | Humanity | Drama | …
Godfather | 2.3 | 0.4 | 4.9 | 5.7 | …
Rating matrix R (m users × n items) ≈ Pᵀ × Q
- Pᵀ: latent user matrix (m users × k latent factors)
- Q: latent item matrix (k latent factors × n items)
- □ = unknown scores in R
• Rating matrix can be decomposed into latent factors of users and items
• The dimension of latent factors (vectors) is much less than the number of users and items (k ≪ m, n)
[Figure: rating matrix (User, Item, Rating) shown as the product of the latent user matrix and the latent item matrix]
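As a small sketch of the decomposition, assuming the convention R ≈ Pᵀ × Q with user vectors stored as the columns of P and item vectors as the columns of Q. All numbers below are hypothetical and not taken from the slides.

```python
import numpy as np

# Hypothetical latent factors (k = 2) for m = 3 users and n = 4 items.
P = np.array([[1.0, 0.2, 0.9],        # shape (k, m): column u is user u's latent vector
              [0.1, 1.5, 0.8]])
Q = np.array([[4.0, 1.0, 3.0, 0.5],   # shape (k, n): column i is item i's latent vector
              [0.5, 3.0, 2.0, 2.5]])

R_hat = P.T @ Q   # (m, k) @ (k, n) gives an (m, n) matrix of predicted ratings

# One predicted rating is the inner product of one user vector and one item vector.
u, i = 0, 2
single = P[:, u] @ Q[:, i]
print(R_hat.shape)
print(single, R_hat[u, i])

# The two factors store k * (m + n) numbers instead of m * n.
print(P.size + Q.size, R_hat.size)
```

With realistic sizes (k ≪ m, n), the two latent matrices are far smaller than the full rating matrix, which is why the factors are the only things that need to be kept after training.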
A famous linear algebra technique for matrix decomposition
- It is often used for dimensionality reduction
- SVD delivers essentially the same result as PCA does (when the data are mean-centered)
X = U × Σ × Vᵀ
- X: m × n matrix
- U: m × m unitary matrix
- Σ: m × n rectangular diagonal matrix (its diagonal values are called singular values)
- V: n × n unitary matrix
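The shapes of the three factors can be checked with NumPy's `np.linalg.svd`. The 4 × 3 matrix below is a hypothetical example chosen only to make m and n differ.

```python
import numpy as np

# Hypothetical 4 x 3 matrix, used only to show the shapes of the factors.
X = np.array([[5.0, 3.0, 1.0],
              [4.0, 2.0, 1.0],
              [1.0, 1.0, 5.0],
              [1.0, 2.0, 4.0]])
m, n = X.shape

# s holds the singular values in descending order; Vt is V transposed.
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Build the m x n rectangular diagonal matrix from the singular values.
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)    # (4, 4) (4, 3) (3, 3)
print(np.allclose(U @ Sigma @ Vt, X))    # the product restores X
print(np.allclose(U.T @ U, np.eye(m)))   # U is orthogonal (unitary for real matrices)
```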
Important features of SVD (3/4)

Σ2 =
…
0 0 2.792 0 0
0 0 0 1.586 0
0 0 0 0 0.90

U2 =
-0.369 -0.325 0.282 0.343 0.749
-0.459 0.448 -0.339 -0.584 0.364
-0.468 -0.359 0.544 -0.459 -0.381
-0.563 0.491 0.07 0.562 -0.348
-0.341 -0.569 -0.71 0.118 -0.202

V2 =
-0.325 0.341 -0.101 0.682 -0.55
-0.562 0.345 0.273 0.153 0.684
-0.468 -0.642 -0.577 0.125 0.141
-0.504 -0.327 0.601 -0.324 -0.416
-0.324 0.496 -0.47 -0.626 -0.189

We can approximate a given matrix by focusing on the largest singular values and the vectors of U and V that correspond to them
U2 × Σ2 × V2ᵀ =
1.08 2.24 3.29 2.98 0.84
2.72 4.17 1.52 2.41 3.04
1.46 2.93 4.02 3.71 1.19
3.24 5.03 2.04 3.04 3.59
0.57 1.64 3.86 3.18 0.15
≈
1 3 3 3 0
2 4 2 2 4
1 3 3 5 1
4 5 2 3 3
1 1 5 2 1
= X
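A minimal sketch of this approximation, using the 5 × 5 matrix X from the slide. The helper name `truncated_svd` is hypothetical; the loop over k is an addition to show how the approximation error shrinks as more singular values are kept.

```python
import numpy as np

# The 5 x 5 example matrix X from the slide.
X = np.array([[1, 3, 3, 3, 0],
              [2, 4, 2, 2, 4],
              [1, 3, 3, 5, 1],
              [4, 5, 2, 3, 3],
              [1, 1, 5, 2, 1]], dtype=float)

def truncated_svd(X, k):
    """Rank-k approximation that keeps only the k largest singular values."""
    U, s, Vt = np.linalg.svd(X)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Keeping the two largest singular values, as on the slide.
X2 = truncated_svd(X, k=2)
print(np.round(X2, 2))

# Frobenius-norm error of the rank-k approximation for each k.
errors = [float(np.linalg.norm(X - truncated_svd(X, k))) for k in range(1, 6)]
for k, err in enumerate(errors, start=1):
    print(k, round(err, 3))
```

With k = 2 the top-left entry comes out close to the 1.08 shown on the slide, and with k = 5 the original matrix is recovered up to floating-point error.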
Step 2: Run SVD (X = U × Σ × Vᵀ)
Step 3: Focus on important features (U2, Σ2, V2)
Step 4: Multiply the three matrices (U2 × Σ2 × V2ᵀ) =
1.08 2.24 3.29 2.98 0.84
2.72 4.17 1.52 2.41 3.04
1.46 2.93 4.02 3.71 1.19
3.24 5.03 2.04 3.04 3.59
0.57 1.64 3.86 3.18 0.15
Step 5: Check the values which were zero before running SVD
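Steps 2 to 5 can be sketched as one function. This is an illustration rather than the deck's own code: the name `predict_unknown_scores` is hypothetical, and it assumes unknown scores are already stored as zeros in the input matrix.

```python
import numpy as np

def predict_unknown_scores(R, k):
    """SVD-based prediction; a 0 in R marks an unknown score (an assumption of this sketch)."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)       # Step 2: run SVD
    U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]   # Step 3: keep the k largest singular values
    R_hat = U_k @ S_k @ Vt_k                               # Step 4: multiply the three matrices
    unknown = np.argwhere(R == 0)                          # Step 5: positions that were zero
    return {(int(u), int(i)): float(R_hat[u, i]) for u, i in unknown}

# The example matrix from the slides; its only zero is in row 0, column 4.
R = np.array([[1, 3, 3, 3, 0],
              [2, 4, 2, 2, 4],
              [1, 3, 3, 5, 1],
              [4, 5, 2, 3, 3],
              [1, 1, 5, 2, 1]], dtype=float)

predictions = predict_unknown_scores(R, k=2)
for (u, i), score in predictions.items():
    print(f"user {u}, item {i}: predicted score {score:.2f}")
```

For the single zero entry, the prediction should land near the 0.84 that appears in the corresponding cell of the matrix above.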
SVD does not take rating score range into account (e.g. a reconstructed score can exceed the maximum rating, such as 5.03 when ratings go up to 5)
Zero replacement decreases prediction quality
- SVD analyzes the relation between all data in the matrix
- The meaning of “zero” is different from that of “unknown”
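zero replacement), Simon Funk’s method learns matrices P and Q by using only observed values in R

Target optimization function:
$$\min_{P,Q} \sum_{(u,i) \in R} \left( r_{u,i} - \mathbf{p}_u^{\top} \mathbf{q}_i \right)^2 + \lambda \left( \lVert \mathbf{p}_u \rVert^2 + \lVert \mathbf{q}_i \rVert^2 \right)$$
u: user u; i: item i; p_u: u’s latent vector; q_i: i’s latent vector; r_{u,i}: the observed rating of item i by user u; λ: regularization weight
[Figure: rating matrix R (m users × n items) approximated by the product of the latent user matrix (P) and the latent item matrix (Q)]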
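One common way to minimize this objective is stochastic gradient descent over the observed ratings. The sketch below assumes that optimizer; the function name `funk_mf`, the hyperparameters, and the toy ratings are all hypothetical and not taken from the slides.

```python
import numpy as np

def funk_mf(ratings, m, n, k=2, lr=0.01, lam=0.05, epochs=500, seed=0):
    """Learn P (k x m) and Q (k x n) by SGD on observed (user, item, rating) triples only."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(k, m))
    Q = rng.normal(scale=0.1, size=(k, n))
    for _ in range(epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            p_u, q_i = P[:, u].copy(), Q[:, i].copy()
            err = r - p_u @ q_i                       # error on one observed rating
            # Gradient steps on the regularized squared error
            # (the constant factor 2 is folded into the learning rate).
            P[:, u] += lr * (err * q_i - lam * p_u)
            Q[:, i] += lr * (err * p_u - lam * q_i)
    return P, Q

# Hypothetical observed ratings; every other (user, item) pair is unknown, not zero.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 1), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
P, Q = funk_mf(ratings, m=4, n=3)

rmse = float(np.sqrt(np.mean([(r - P[:, u] @ Q[:, i]) ** 2 for u, i, r in ratings])))
pred = float(P[:, 0] @ Q[:, 2])   # a score that was never observed
print(f"training RMSE: {rmse:.3f}")
print(f"predicted score for user 0, item 2: {pred:.2f}")
```

Unknown entries never enter the loss, so no zero replacement is needed; the regularization term keeps the latent vectors small, which matters when a user or item has only a few ratings.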
… Scalable models
- Amazon’s item-based CF
2006-2009: Netflix Prize
- The rise of matrix factorization like Simon Funk’s method
2010: Factorization machine
- Generalized matrix factorization for dealing with various factors
2013: Deep Learning
・Deep Factorization machine
・Content2Vec to get content embeddings
items? What to recommend to new users?
Serendipity
- Recommending only favorable items cannot expand user interests
- How to recommend unexpected and surprising items?
Explanation of recommendation reasons
- Recommendation algorithms just decide what to recommend
- How can systems persuade users to purchase recommended items?
Reviewer’s trust
- How to remove spam reviewers?
- How to find high-quality reviewers?
User1: 3 1 2 3
User2: 4 3 4 3
User3: 3 3 1 5
User4: 1 5 5 2
New user: (no ratings)
New item: (no ratings)

CF approaches don’t work for new items/users
- New items/users have no clues to predict unknown scores because CF cannot find neighbor users/items
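The cold-start problem can be made concrete by counting co-rated items, which neighborhood-based CF needs before it can compute any similarity. In this sketch the four user rows come from the table, while the NaN encoding of missing ratings and the helper name `co_rated_counts` are assumptions.

```python
import numpy as np

# Ratings from the table, plus a new user (last row) and a new item (last column).
# NaN marks a missing rating.
nan = np.nan
R = np.array([
    [3, 1, 2, 3, nan],          # User1
    [4, 3, 4, 3, nan],          # User2
    [3, 3, 1, 5, nan],          # User3
    [1, 5, 5, 2, nan],          # User4
    [nan, nan, nan, nan, nan],  # New user
])

def co_rated_counts(R, user):
    """Number of items that `user` and each user (including itself) have both rated."""
    rated = ~np.isnan(R)
    return (rated & rated[user]).sum(axis=1)

print(co_rated_counts(R, user=0))            # existing user: four co-rated items with Users 1-4
print(co_rated_counts(R, user=4))            # new user: no co-rated items with anyone
print(int((~np.isnan(R[:, 4])).sum()))       # new item: number of ratings available
```

With zero co-rated items there is nothing to compute a similarity from, so no neighbors can be found for the new user or the new item.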
Ratings Might Mislead You: The Story of Herding Effects”, Journal of Big Data, Vol.2, No.4, pp.196-204.
[Figure: Average rating scores on Amazon.com. Panels: cumulative distribution of p-values of the Augmented Dickey–Fuller test; mean rating vs. sequence number of rating for Books, Electronics, Movies & TV, and Music; Pearson’s correlation coefficient]
How to find trustworthy reviewers (rating experts)?
- People often give good scores to items
- Some reviewers intentionally give too high/low scores to items (spammers)