Find interesting items and information easily
• Narrow down the set of choices
Value for providers
• Increase trust and customer loyalty
• Increase sales, click rates, conversion rates, etc.
• Discover new things
• Opportunities for promotion
[Figure: A recommender system combines the user profile and context with (1) community data, (2) item features, and (3) a user preference model, and outputs a recommendation list of scored items for a target user, e.g. item1 0.9, item2 1, item3 0.3, …]
items should be recommended, based on users' behavior logs
Content-based filtering
• Decides which items should be recommended, based on item features and their metadata
Knowledge-based filtering
• Decides which items should be recommended, based on preference information that users state explicitly
= {b1, b2, …, bn}
• Item set I = {i1, i2, …, im}
• User u's profile (user model): p_u
• Relevance between p_u and item i: Rel(p_u, i)
Input: the definitions listed above
Output: ranked list of item set I (∀ i ∈ I), based on Rel(p_u, i) s.t.
Point
• How to model user profiles?
• How to compute relevance?
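The output step above (ranking the item set by relevance) can be sketched as follows. This is a minimal illustration, not the method of any particular system: `rank_items` and the lookup-table relevance function `rel_from_table` are hypothetical names, and the scores reuse the example values item1 0.9, item2 1, item3 0.3.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def rank_items(
    profile: Dict[str, float],
    items: List[Hashable],
    rel: Callable[[Dict[str, float], Hashable], float],
) -> List[Tuple[Hashable, float]]:
    """Return (item, score) pairs sorted by descending Rel(p_u, i)."""
    scored = [(item, rel(profile, item)) for item in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical relevance function: the "profile" is simply a table of
# precomputed scores, standing in for whatever model a real system uses.
def rel_from_table(profile: Dict[str, float], item: Hashable) -> float:
    return profile.get(item, 0.0)


profile_u = {"item1": 0.9, "item2": 1.0, "item3": 0.3}
ranked = rank_items(profile_u, ["item1", "item2", "item3"], rel_from_table)
print(ranked)
```

How the profile is modeled and how relevance is computed are exactly the two open points listed above; the sketch only fixes the ranking step that follows once both are chosen.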
model user preferences and to recommend items
Basic assumptions
• Users give ratings to items appropriately
• Patterns in the rating data help us predict the ratings
Practical points
• Large commercial e-commerce sites use CF
• Well understood
• Applicable in many domains, as long as rating data can be obtained
         Item1  Item2  Item3  Item4  Item5
Alice      5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      3      3      1      5      4
User4      1      5      5      2      1
Idea: Focus on ratings of similar users
Point
1. How to compute user similarity?
2. How do we combine the ratings of the similar users to predict Alice's rating?
3. Which/how many similar users should we consider?
two variables X and Y
[Figure: line chart of rating score (axis 0 to 6) for Item1 to Item4, comparing Alice, User1, and User4]
• sim(Alice, User1) = 0.85
• sim(Alice, User4) = -0.79
(It takes differences in rating behavior into account)
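The two similarity values above can be reproduced with a short Pearson correlation sketch. It assumes that the unlabeled first row of the rating table is Alice's and that means are taken over co-rated items only (Item1 to Item4); the function name `pearson` and the dictionary layout are illustrative choices, not part of the slides.

```python
from math import sqrt

# Ratings from the example table; Alice has not rated Item5.
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}


def pearson(a: dict, b: dict) -> float:
    """Pearson correlation over the items both users have rated."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as no similarity
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user with constant ratings has undefined correlation
    return cov / sqrt(var_a * var_b)


print(round(pearson(ratings["Alice"], ratings["User1"]), 2))  # 0.85
print(round(pearson(ratings["Alice"], ratings["User4"]), 2))  # -0.79
```

Because each user's ratings are centered on that user's own mean, User1 comes out as similar to Alice even though User1 rates everything about two points lower.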
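prediction function
\[
\mathit{predict}(u_a, i) = \bar{r}_{u_a} + \frac{\sum_{u_b \in U_a} \mathit{sim}(u_a, u_b) \cdot (r_{u_b,i} - \bar{r}_{u_b})}{\sum_{u_b \in U_a} \mathit{sim}(u_a, u_b)}
\]
• $u_a$: target user a
• $i$: target item i
• $r_{u,i}$: rating score of user u for item i
• $\bar{r}_{u_a}$: user a's average rating score
• $U_a$: set of users similar to $u_a$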
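A minimal sketch of this prediction function, applied to Alice and Item5 from the example rating table. Several things here are assumptions rather than content of the slides: the neighbor set is taken to be User1 and User2, the similarity 0.71 for User2 is a value computed with Pearson correlation over Item1 to Item4 (only 0.85 for User1 is given above), and each user's average is taken over all items that user has rated.

```python
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
}

# Similarities to Alice for the assumed neighbor set U_a.
# 0.85 is given for User1; 0.71 for User2 is computed, not from the slides.
sims = {"User1": 0.85, "User2": 0.71}


def mean_rating(user: str) -> float:
    """Average over all items the user has rated."""
    values = list(ratings[user].values())
    return sum(values) / len(values)


def predict(target: str, item: str, sims: dict) -> float:
    """Target's mean plus the similarity-weighted mean-centered neighbor ratings."""
    neighbors = [u for u in sims if item in ratings[u]]
    denom = sum(sims[u] for u in neighbors)
    if denom == 0:
        return mean_rating(target)  # assumption: fall back to the user's mean
    num = sum(sims[u] * (ratings[u][item] - mean_rating(u)) for u in neighbors)
    return mean_rating(target) + num / denom


print(round(predict("Alice", "Item5", sims), 2))  # 4.87
```

The denominator follows the formula as written, without absolute values, so the sketch is only well behaved when the selected neighbors have positive similarity.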
threshold for user similarity
• A user whose similarity exceeds the threshold is regarded as a "similar" user
• In the worst case, no similar users are found
Focus on top K similar users (kNN method)
• A user who ranks among the top K by similarity is regarded as a similar user
• K is often set between 50 and 200
• In the worst case, the system uses rating information from users with low similarity
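The two selection strategies above can be compared with a small sketch. The function names `select_by_threshold` and `select_top_k` are hypothetical, and the similarity table mixes the two values given earlier (User1 0.85, User4 -0.79) with values for User2 and User3 that were computed with Pearson correlation over Item1 to Item4 and are not stated in the slides.

```python
from typing import Dict, List

# Similarities to Alice. User2 and User3 values are computed assumptions.
sims: Dict[str, float] = {
    "User1": 0.85,
    "User2": 0.71,
    "User3": 0.0,
    "User4": -0.79,
}


def select_by_threshold(sims: Dict[str, float], threshold: float) -> List[str]:
    """Users whose similarity is strictly above the threshold, best first."""
    chosen = [u for u, s in sims.items() if s > threshold]
    return sorted(chosen, key=lambda u: sims[u], reverse=True)


def select_top_k(sims: Dict[str, float], k: int) -> List[str]:
    """The k most similar users, regardless of how low their similarity is."""
    return sorted(sims, key=lambda u: sims[u], reverse=True)[:k]


print(select_by_threshold(sims, 0.5))  # ['User1', 'User2']
print(select_by_threshold(sims, 0.9))  # []
print(select_top_k(sims, 3))           # ['User1', 'User2', 'User3']
```

The last two calls show both worst cases: a strict threshold can leave no neighbors at all, while top K always returns K users and here pulls in User3, whose similarity is zero.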
similarities are obtained from a rating matrix
• Based on the rating scores of similar users, the system predicts the target user's rating score for a target item
Similarity Calculation
• The Pearson correlation coefficient is often used
Selection of Similar Users
• The top K users with the highest similarity are often selected as similar users