Discovering Latent Factors from Movies Genres for Enhanced Recommendation

from ACM RecSys 2012 short paper session

ChiaChia Lee

October 14, 2012

Transcript

  1. Discovering Latent Factors from Movies Genres for Enhanced Recommendation
    Marcelo Garcia Manzato
    Mathematics and Computing Institute – University of São Paulo, São Carlos, SP, Brazil
    [email protected]
    Monday, October 8, 2012
  2. user-genre model
    • user-item matrix: sparse (many ratings missing) and high-dimensional (many items)
    • genre/category: a movie belongs to several genres
    • user-item matrix -> user-genre matrix
    • the user-genre matrix is less sparse and has lower dimension
  3. w(g, u)
    Table: user-genre matrix with users u1 to u4 as rows and genres g1 to g4 as columns (cells empty on this slide).
  4. notation
    • U: users
    • S: items
    • R: ratings
    • G: genres
    • δu(s): the rating user u gave to item s
    • Su: items rated by user u
    • Gs: genres associated with item s
    Outline: Preferred Genres, Factorization, Enrichment, User Similarity
  5. Figure: graph linking users (Chia, Jerry, Chi) to items (s1 to s9) and items to genres (g1 to g5, g7).
  6. Figure: the same graph as on slide 5 (repeated build).
  7. cloud of genres
    • cloud_g(u, r): a set of (genre, count) pairs
    • Chia rated 5: s1(g1, g2), s2(g1, g3), s3(g4, g5), s5(g1, g2, g3)
    • cloud_g(Chia, 5) = {(g1, 3), (g2, 2), (g3, 2), (g4, 1), (g5, 1)}
    • Chia rated 2: s7(g4), s8(g4, g7)
    • cloud_g(Chia, 2) = {(g4, 2), (g7, 1)}
    • each count is the frequency of occurrence of genre g among all items to which user u has given rating r
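The counting on this slide can be sketched in a few lines of Python. This is only an illustration of the worked example, not code from the paper; the names `ITEM_GENRES`, `CHIA_RATINGS` and `genre_cloud` are hypothetical, and the data is copied from the bullets above.

```python
from collections import Counter

# Toy data copied from the slide: item -> genres (names are hypothetical).
ITEM_GENRES = {
    "s1": ["g1", "g2"],
    "s2": ["g1", "g3"],
    "s3": ["g4", "g5"],
    "s5": ["g1", "g2", "g3"],
    "s7": ["g4"],
    "s8": ["g4", "g7"],
}

# Ratings given by the user Chia: item -> rating.
CHIA_RATINGS = {"s1": 5, "s2": 5, "s3": 5, "s5": 5, "s7": 2, "s8": 2}


def genre_cloud(ratings, item_genres, r):
    """Count how often each genre occurs among the items rated r."""
    counts = Counter()
    for item, rating in ratings.items():
        if rating == r:
            counts.update(item_genres[item])
    return dict(counts)


cloud_5 = genre_cloud(CHIA_RATINGS, ITEM_GENRES, 5)
cloud_2 = genre_cloud(CHIA_RATINGS, ITEM_GENRES, 2)
print(cloud_5)
print(cloud_2)
```

The two dictionaries correspond to cloud_g(Chia, 5) and cloud_g(Chia, 2), with each (genre, count) pair stored as a key and its value.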
  8. tf-idf
    • R(g, u): the set of different ratings assigned to genre g by user u
    • R(g4, Chia) = {5, 2}, |R(g4, Chia)| = 2
    • tf-idf(g4, Chia, 5) = 1 * log(5/(1+2))
    • the tf-idf value reflects how important a genre is to a particular rating within the set of all ratings
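A minimal sketch of the worked tf-idf example follows. The general formula is not shown in the transcript, so the reading here is an assumption: the factor 1 is taken to be the count of g4 among items rated 5, the 2 is |R(g4, Chia)|, and the numerator 5 is assumed to be the number of distinct rating values on a 1 to 5 scale. The logarithm base is also not stated; the natural logarithm is used. All names (`CLOUD`, `ratings_for_genre`, `tf_idf`) are hypothetical.

```python
import math

# Assumption: the 5 in log(5/(1+2)) is the number of distinct rating values.
NUM_RATING_VALUES = 5

# CLOUD[r][g]: count of genre g among the items Chia rated r (from the example).
CLOUD = {
    5: {"g1": 3, "g2": 2, "g3": 2, "g4": 1, "g5": 1},
    2: {"g4": 2, "g7": 1},
}


def ratings_for_genre(cloud, g):
    """R(g, u): the set of distinct ratings the user gave to items of genre g."""
    return {r for r, counts in cloud.items() if g in counts}


def tf_idf(cloud, g, r, num_rating_values=NUM_RATING_VALUES):
    """Genre count for rating r, damped by how many ratings the genre appears under."""
    tf = cloud.get(r, {}).get(g, 0)
    idf = math.log(num_rating_values / (1 + len(ratings_for_genre(cloud, g))))
    return tf * idf


value = tf_idf(CLOUD, "g4", 5)
print(value)
```

Under these assumptions, `tf_idf(CLOUD, "g4", 5)` evaluates the same expression as the slide, 1 * log(5/(1+2)).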
  9. w(g, u)
    • w(g, u) represents how much user u likes genre g
  10. w(g, u)

          g1  g2  g3  g4
      u1   5   2   0   2
      u2   0   1   0   2
      u3   5   2   1   0
      u4   1   0   1   1

  11. w(g, u): the same matrix as on slide 10 (repeated build).
  12. missing value
    • applied before the user-genre matrix factorization
    • replaces the weights w(g, u) that are zero with the user's average rating offset
    • result: fewer missing values
  13. SVD
    • user-genre matrix M
    • the resulting factorization: a topic preference-relevance model
    • Vk: users' interest in each of the k inferred topics
    • Tk: the genres' relevance for each topic
    • singular values in Σ: the influence of a particular topic on user-genre preferences
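The factorization step can be illustrated with a truncated SVD in NumPy. This is a sketch under stated assumptions, not the author's implementation: it uses the small example matrix from the w(g, u) slide as M, picks k = 2 arbitrarily, and skips the zero-replacement step described on the missing-value slide. The helper name `truncated_svd` and the mapping of NumPy's outputs to Vk, Σ and Tk are assumptions made for illustration.

```python
import numpy as np

# User-genre weights from the example matrix (rows u1..u4, columns g1..g4).
M = np.array([
    [5, 2, 0, 2],
    [0, 1, 0, 2],
    [5, 2, 1, 0],
    [1, 0, 1, 1],
], dtype=float)


def truncated_svd(matrix, k):
    """Return (V_k, sigma_k, T_k) such that matrix ~ V_k @ diag(sigma_k) @ T_k.T."""
    left, sigma, right_t = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    return left[:, :k], sigma[:k], right_t[:k, :].T


# V_k: one row per user, one column per topic.
# T_k: one row per genre, one column per topic.
V_k, sigma_k, T_k = truncated_svd(M, k=2)
M_approx = V_k @ np.diag(sigma_k) @ T_k.T
print(np.round(M_approx, 2))
```

Each row of `V_k` can be read as a user's interest in the k topics, each row of `T_k` as a genre's relevance to them, and `sigma_k` as the weight of each topic in the reconstruction.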
  14. Enrichment
    • (1) reduces the effects of the lack of information in new user profiles
    • (2) adjusts the value of w(g, u) to take into account the most relevant topics of interest associated with genre g
    • e.g., an adjusted preference for science fiction will indicate the user's interest in different topics, such as undead people, the end of the world and star wars
  15. update w(g, u)
    • update w(g, u) -> w’(g, u)
    • incorporating the topic preference-relevance model, w(g, u) is redefined as w’(g, u)
    • γ is a weighting parameter
    • the resulting values combine user feedback, topic preference and relevance to the associated genres
  16. User Similarity
    • for two users u and v, their similarity is sim(u, v)
    • computed with the Pearson correlation coefficient
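As an illustration, the Pearson correlation between two user profiles can be computed as below. The slide does not say which vectors are compared, so this sketch assumes each user is represented by a row of genre weights, taken here from the example w(g, u) matrix. The function name `pearson` and the choice to return 0.0 for a constant profile are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant profile carries no correlation signal.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Genre-weight profiles of u1 and u3 from the example matrix (g1..g4).
u1 = [5, 2, 0, 2]
u3 = [5, 2, 1, 0]
sim_u1_u3 = pearson(u1, u3)
print(sim_u1_u3)
```

The result lies in [-1, 1]; values close to 1 mean the two users weight genres in a similar way.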
  17. Recommendation
    • collaborative filtering
    • predict the unknown rating δu(s)
    • uses the set of users most similar to u
    • fg is a normalizing factor
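The prediction formula itself is not in the transcript, so the following is only a generic neighborhood-based sketch and not necessarily the rule used in the paper. It predicts a rating as the similarity-weighted average of the ratings given by the k most similar users, and it uses the sum of their similarities as the normalizer, which is a common choice and may differ from the fg on the slide. The function name, the parameter `k` and the toy data are all hypothetical.

```python
def predict_rating(target, item, ratings, sims, k=2):
    """Similarity-weighted average of the item's ratings by the k nearest users.

    ratings: user -> {item: rating}
    sims:    user -> similarity to the target user
    Returns None when no positively similar user has rated the item.
    """
    candidates = [
        (sims.get(user, 0.0), user_ratings[item])
        for user, user_ratings in ratings.items()
        if user != target and item in user_ratings
    ]
    # Keep the k most similar users with a positive similarity.
    top = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    norm = sum(sim for sim, _ in top)  # assumed normalizing factor
    if norm == 0:
        return None
    return sum(sim * rating for sim, rating in top) / norm


# Hypothetical toy data: three other users rated item "s".
RATINGS = {"v1": {"s": 4}, "v2": {"s": 2}, "v3": {"s": 5}}
SIMS = {"v1": 0.5, "v2": 0.5, "v3": 0.1}

predicted = predict_rating("u", "s", RATINGS, SIMS, k=2)
print(predicted)
```

With k = 2 only v1 and v2 contribute, so the least similar user v3 does not affect the prediction.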
  18. Experimental results
    • (a): 1.535, average of w(g, u) without enrichment
    • (b): 0.9617, with enrichment but no similar users
    • (c): 0.8743, enriched profiles and similar users
  19. Remarks
    • factorized user-genre matrix model: discovers latent factors from genres in order to enrich user profiles
    • enrichment: adjusts the user's preference for a genre by considering the most relevant topics that compose the genre
    • less sparse, because individual weights are associated with general information about all the content
    • factorization: infers latent semantics without metadata