Amazon's Item-to-item Recommendation Algorithm

Item-to-item Collaborative Filtering Recommendation Algorithm
Zamboni Luca, Zen Roberto
May 7, 2015

Topics
- Recommendation Algorithms: targets and problems
- Collaborative Filtering Recommendation Algorithms
- Memory-Based and Model-Based
- Item-to-item Recommendation Algorithm
- Experimental results
- Conclusion

Recommendation Algorithms
Recommendation algorithms are used in e-commerce, websites and email advertising. They apply data analysis techniques to help users find the items they would like to purchase, by producing either a predicted likeliness score or a list of Top-N recommended items. At Amazon.com, recommendation algorithms are used to personalize the online store for each customer.
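A predicted score can be turned into a Top-N list by keeping the N highest-scoring items the user has not rated yet. The following is a minimal Python sketch, assuming the predictions are already available in a hypothetical `scores` dictionary that maps each item to its predicted score; `top_n` and the sample data are illustrative names, not part of the original algorithm.

```python
import heapq


def top_n(scores, n, already_rated=frozenset()):
    """Return the n items with the highest predicted score.

    scores: hypothetical mapping item -> predicted likeliness score.
    Items in already_rated are skipped, since recommending them is pointless.
    """
    candidates = ((score, item) for item, score in scores.items() if item not in already_rated)
    # nlargest avoids sorting the whole catalogue when n is small.
    return [item for _, item in heapq.nlargest(n, candidates)]


# Hypothetical predicted scores for one user.
predicted = {"book": 4.6, "dvd": 3.1, "camera": 4.9, "lamp": 2.0}
print(top_n(predicted, 2, already_rated={"camera"}))  # ['book', 'dvd']
```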

Recommendation Algorithms
Challenges:
- Scalability: modern systems must search tens of millions of potential neighbors.
- Quality: users need recommendations they can trust to help them find items they will like.

Collaborative Filtering Recommendation Algorithms
Collaborative filtering (CF) algorithms use a database of user preferences to predict additional topics or products that a user (the active user) might like. They are built on the assumption that a good way to find interesting content is to find other people with similar interests, and then recommend the titles those similar users like.
The preference database is an m × n user-item matrix: rows u1, u2, ..., um are users, columns i1, i2, ..., in are items, and an entry R is the rating a user gave to an item. Most entries are empty, because each user rates only a few items.
CF-based algorithms can be divided into two main categories:
- Memory-based
- Model-based
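Because the user-item matrix is very sparse, one common choice (an assumption here; the slides do not prescribe a storage format) is to keep only the observed ratings as nested dictionaries. The sketch below uses hypothetical toy data and helper names, and also transposes the matrix into item columns, which is the view that item-based methods work with.

```python
from collections import defaultdict

# Hypothetical toy data: user -> {item: rating}; a missing entry means "not rated".
ratings = {
    "u1": {"i1": 5, "i3": 3},
    "u2": {"i2": 4},
    "u3": {"i1": 4, "i2": 2},
}


def item_columns(ratings):
    """Transpose the sparse user-item matrix into its columns: item -> {user: rating}."""
    columns = defaultdict(dict)
    for user, row in ratings.items():
        for item, rating in row.items():
            columns[item][user] = rating
    return dict(columns)


def density(ratings, n_items):
    """Fraction of the m x n matrix cells that actually hold a rating."""
    filled = sum(len(row) for row in ratings.values())
    return filled / (len(ratings) * n_items)


print(item_columns(ratings)["i1"])  # {'u1': 5, 'u3': 4}
```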

Memory-based systems
These systems use the entire user-item database to generate a prediction. The main idea is to compute the similarities between users and/or items and use them as weights to predict the rating for a user-item pair.
Advantages:
- Prediction quality is high.
- The algorithm is relatively simple and easily accommodates database updates.
Disadvantages:
- Slow: they use the entire database every time they make a prediction (even when it is held in memory).
- They are not as fast and scalable as we would like on very large datasets.
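To make the memory-based idea concrete, here is a minimal user-user k-nearest-neighbor sketch. It assumes the nested-dictionary layout user -> {item: rating}, uses plain cosine similarity between users as the weight (one of several possible choices), and the function names and toy data are hypothetical. Every call scans all users, which is exactly the scalability problem described above.

```python
import math


def user_cosine(a, b):
    """Cosine similarity of two sparse user rows (missing ratings treated as 0)."""
    dot = sum(r * b[i] for i, r in a.items() if i in b)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_user_based(ratings, user, item, k=2):
    """Hypothetical memory-based predictor: weighted average over the k most
    similar users who rated `item`. Scans the whole database on every call."""
    neighbours = [
        (user_cosine(ratings[user], row), row[item])
        for other, row in ratings.items()
        if other != user and item in row
    ]
    neighbours = sorted(neighbours, reverse=True)[:k]
    weight = sum(abs(s) for s, _ in neighbours)
    if weight == 0:
        return None  # nobody comparable rated this item
    return sum(s * r for s, r in neighbours) / weight


# Hypothetical ratings: alice has not rated item "c" yet.
ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 4, "b": 3, "c": 4},
    "carol": {"a": 1, "c": 2},
}
print(predict_user_based(ratings, "alice", "c"))
```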

Model-based systems
Their main idea is the same as that of memory-based systems. However, they overcome the memory-based drawbacks by first building a model from the dataset of ratings. The model is built with machine learning algorithms such as Bayesian networks, clustering and rule-based approaches.
Advantages:
- Scalability: models are much smaller than the actual dataset.
- Prediction speed: a prediction only requires querying the model, so it is fast.
Disadvantages:
- Prediction quality depends heavily on how the model is built.
- The model is not flexible with respect to database updates.

Item-to-item Recommendation Algorithm
This algorithm avoids the bottleneck of searching for neighbors among users by exploring the relationships between items first, rather than the relationships between users. Recommendations for a user are computed by finding items that are similar to other items the active user has liked. It consists of two steps:
- Similarity Computation
- Prediction Computation

Item Similarity Computation

Item Similarity Computation
Cosine-based Similarity: two items i and j are thought of as two vectors $\vec{i}$ and $\vec{j}$ in the m-dimensional user space, and their similarity is the cosine of the angle between them:
$$\mathrm{sim}(i,j) = \cos(\vec{i},\vec{j}) = \frac{\vec{i}\cdot\vec{j}}{\|\vec{i}\|_2 \, \|\vec{j}\|_2}$$
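The cosine formula can be sketched directly on sparse item columns. This minimal Python version assumes each item is stored as a hypothetical mapping user -> rating, and treats users who did not rate an item as a 0 coordinate in the m-dimensional user space.

```python
import math


def cosine_sim(col_i, col_j):
    """sim(i, j) = (i . j) / (||i||_2 * ||j||_2) in the m-dimensional user space.

    col_i, col_j: hypothetical sparse columns user -> rating; absent users count as 0.
    """
    dot = sum(r * col_j[u] for u, r in col_i.items() if u in col_j)
    norm_i = math.sqrt(sum(r * r for r in col_i.values()))
    norm_j = math.sqrt(sum(r * r for r in col_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # an item with no ratings is not similar to anything
    return dot / (norm_i * norm_j)


print(round(cosine_sim({"u1": 1, "u2": 1}, {"u1": 1}), 4))  # 0.7071
```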

Item Similarity Computation
Correlation-based Similarity: also called Pearson-r correlation. Only the users who rated both items i and j (the set U) are considered:
$$\mathrm{sim}(i,j) = \mathrm{corr}_{i,j} = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2} \, \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}}$$
where $R_{u,i}$ is the rating of user u on item i and $\bar{R}_i$ is the average rating of item i.
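A minimal Python sketch of the Pearson-r item similarity follows. It assumes the same hypothetical sparse columns (user -> rating) and computes the item averages $\bar{R}_i$, $\bar{R}_j$ over the co-rated users only, which is one common reading of the formula. Requiring at least two co-rated users is an additional assumption, because a correlation over a single point is undefined.

```python
import math


def pearson_sim(col_i, col_j):
    """Pearson-r correlation between items i and j over the users U who rated both."""
    common = col_i.keys() & col_j.keys()
    if len(common) < 2:
        return 0.0  # assumption: too little overlap to correlate
    mean_i = sum(col_i[u] for u in common) / len(common)
    mean_j = sum(col_j[u] for u in common) / len(common)
    num = sum((col_i[u] - mean_i) * (col_j[u] - mean_j) for u in common)
    den = math.sqrt(sum((col_i[u] - mean_i) ** 2 for u in common)) * math.sqrt(
        sum((col_j[u] - mean_j) ** 2 for u in common)
    )
    return num / den if den else 0.0


# User "d" rated only item i, so it is ignored.
item_i = {"a": 1, "b": 2, "c": 3, "d": 5}
item_j = {"a": 2, "b": 4, "c": 6}
print(round(pearson_sim(item_i, item_j), 6))  # 1.0
```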

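Item Similarity Computation
Adjusted Cosine Similarity: the difference in rating scale between users is now taken into account by subtracting each user's average rating $\bar{R}_u$ from every co-rated pair (U is again the set of users who rated both i and j):
$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2} \, \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}}$$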
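Unlike the Pearson version, the adjusted cosine centres each rating on the user's mean, so it needs the user rows rather than the item columns. The sketch below is minimal and assumes a hypothetical layout user -> {item: rating}; $\bar{R}_u$ is taken over all items the user rated. In the toy data one user rates x and y above their own average and z below it, while the other does the opposite, so x and y come out perfectly similar and x and z perfectly dissimilar.

```python
import math


def adjusted_cosine_sim(ratings, i, j):
    """Adjusted cosine similarity between items i and j.

    ratings: hypothetical mapping user -> {item: rating}. Each co-rated pair is
    centred on the user's average rating, computed over all items that user rated.
    """
    num = sq_i = sq_j = 0.0
    for row in ratings.values():
        if i in row and j in row:  # only users in U (rated both items)
            mean_u = sum(row.values()) / len(row)
            di, dj = row[i] - mean_u, row[j] - mean_u
            num += di * dj
            sq_i += di * di
            sq_j += dj * dj
    if sq_i == 0 or sq_j == 0:
        return 0.0
    return num / (math.sqrt(sq_i) * math.sqrt(sq_j))


ratings = {
    "u1": {"x": 4, "y": 4, "z": 1},
    "u2": {"x": 2, "y": 2, "z": 5},
}
print(round(adjusted_cosine_sim(ratings, "x", "z"), 6))  # -1.0
```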

Performance implications
The similarity computation is the performance bottleneck. The similarities can be computed offline and the results stored in a table that requires $O(n^2)$ space for n items. To compute a prediction for a particular item, however, only a small set of similar items is needed: storing only the k most similar items for each item (k ≪ n) reduces the space to $O(kn)$, i.e. $O(n)$ for a fixed k. We call k the model size.
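The offline model-building step can be sketched as follows. This is a minimal version under stated assumptions: items are hypothetical sparse columns (user -> rating), the similarity function is pluggable (cosine by default), and `build_model` is an illustrative name. It still evaluates all pairs once, offline, but keeps only the k best neighbors per item, so the stored table holds about k·n entries instead of n².

```python
import heapq
import math


def cosine_sim(col_i, col_j):
    """Cosine similarity of two sparse item columns (user -> rating)."""
    dot = sum(r * col_j[u] for u, r in col_i.items() if u in col_j)
    norm_i = math.sqrt(sum(r * r for r in col_i.values()))
    norm_j = math.sqrt(sum(r * r for r in col_j.values()))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def build_model(columns, k, sim=cosine_sim):
    """Offline step: item -> list of (neighbor item, similarity), best k only.

    k is the model size; the result uses O(k * n) space instead of O(n^2).
    """
    model = {}
    for i, col_i in columns.items():
        scored = ((sim(col_i, col_j), j) for j, col_j in columns.items() if j != i)
        model[i] = [(j, s) for s, j in heapq.nlargest(k, scored)]
    return model


# Hypothetical columns: "a" and "b" were rated identically, "c" by someone else.
columns = {
    "a": {"u1": 5, "u2": 4},
    "b": {"u1": 5, "u2": 4},
    "c": {"u3": 2},
}
model = build_model(columns, k=1)
print(model["a"][0][0])  # b
```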

Prediction Computation: Weighted Sum
The most important step in a collaborative filtering system is generating the output in the form of a prediction. Once the items most similar to i have been identified, the prediction for the user-item pair (u, i) is computed as the similarity-weighted average of the user's ratings on those items:
$$P_{u,i} = \frac{\sum_{j \in \mathrm{SimilarItems}} s_{i,j} \cdot R_{u,j}}{\sum_{j \in \mathrm{SimilarItems}} |s_{i,j}|}$$
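The weighted sum translates almost line by line into code. This minimal sketch assumes a precomputed hypothetical model laid out as item -> list of (similar item j, $s_{i,j}$), and sums only over the similar items j that the user has actually rated. The denominator uses $|s_{i,j}|$, as in the formula, so negative similarities (possible with Pearson or adjusted cosine) do not shrink it.

```python
def predict_weighted_sum(model, user_ratings, i):
    """P_{u,i} = sum_j s_ij * R_uj / sum_j |s_ij| over stored neighbors j of i rated by u.

    model: hypothetical mapping item -> [(similar item, similarity), ...].
    user_ratings: the active user's ratings, item -> rating.
    Returns None when the user rated none of i's neighbors.
    """
    num = den = 0.0
    for j, s in model.get(i, []):
        if j in user_ratings:
            num += s * user_ratings[j]
            den += abs(s)
    return num / den if den else None


model = {"x": [("a", 0.9), ("b", 0.5), ("c", 0.2)]}
print(round(predict_weighted_sum(model, {"a": 4, "b": 2}, "x"), 3))  # 3.286
```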

Experimental Results
Dataset: 43,000 users and 3,000 movies. Only users who had rated at least 20 movies were considered. Training/test ratio x = 0.8, i.e. 80% of the data is used for training and 20% for testing.
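One plausible way to reproduce this setup is sketched below; the exact splitting protocol is not described on the slide, so a random split of individual ratings is an assumption, and `filter_and_split` and its parameters are hypothetical. Users with fewer than 20 ratings are dropped first, then a fraction x of the remaining (user, item, rating) triples goes to training.

```python
import random


def filter_and_split(ratings, x=0.8, min_ratings=20, seed=0):
    """Hypothetical helper: keep users with >= min_ratings ratings, then randomly
    split their (user, item, rating) triples into a training fraction x and a test rest."""
    triples = [
        (u, i, r)
        for u, row in ratings.items()
        if len(row) >= min_ratings
        for i, r in row.items()
    ]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(triples)
    cut = round(x * len(triples))
    return triples[:cut], triples[cut:]


# Hypothetical data: one user with 25 ratings, one with a single rating.
ratings = {"heavy": {f"m{k}": 3 for k in range(25)}, "light": {"m0": 5}}
train, test = filter_and_split(ratings)
print(len(train), len(test))  # 20 5
```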

Experimental Results
Mean Absolute Error (MAE): for each of the N rating-prediction pairs $\langle p_i, q_i \rangle$, this metric takes the absolute error between them and averages it over all pairs:
$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} |p_i - q_i|}{N}$$
Note: the lower the MAE, the more accurately the recommendation engine predicts user ratings.
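The metric itself is a one-liner; the minimal sketch below only adds input checks. It assumes predictions and true ratings arrive as two parallel sequences, and `mean_absolute_error` is a hypothetical name.

```python
def mean_absolute_error(predictions, actuals):
    """MAE = (1/N) * sum_i |p_i - q_i| over N rating-prediction pairs."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - q) for p, q in zip(predictions, actuals)) / len(predictions)


print(mean_absolute_error([4, 3, 5], [5, 3, 3]))  # 1.0
```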

Experimental Results

Experimental Results

Conclusion
The item-to-item collaborative filtering algorithm used by Amazon provides the same prediction quality as the user-user k-nearest-neighbor algorithm. Because the item neighborhood is fairly static, it can be precomputed offline, which yields very high online performance even on large datasets.

THANK YOU FOR YOUR ATTENTION

