Discovering Latent Factors from Movies Genres for Enhanced Recommendation

Marcelo Garcia Manzato
Mathematics and Computing Institute, University of São Paulo, São Carlos, SP, Brazil
[email protected]
Monday, October 8, 2012

user-genre model
• user-item: sparse (ratings missing), high-dimensional (many items)
• genre/category: a movie belongs to several genres
• user-item matrix -> user-genre matrix
• so it is less sparse and lower-dimensional

w(g, u): user-genre matrix with rows u1 to u4 (users) and columns g1 to g4 (genres)

user-genre model
• Preferred Genres
• Factorization
• Enrichment
• User Similarity

notation
• U: users
• S: items
• R: ratings
• G: genres
• δu(s): the rating user u gave to item s
• Su: items rated by user u
• Gs: genres associated with item s

[Figure: graph linking users (Chia, Jerry) to the items they rated (s1 to s9) and items to their genres (g1 to g5, g7)]

Preferred Genres
• cloud_g(u, r): a set of (genre, count) pairs
• Chia rated 5: s1(g1, g2), s2(g1, g3), s3(g4, g5), s5(g1, g2, g3)
• cloud_g(Chia, 5) = {(g1, 3), (g2, 2), (g3, 2), (g4, 1), (g5, 1)}
• Chia rated 2: s7(g4), s8(g4, g7)
• cloud_g(Chia, 2) = {(g4, 2), (g7, 1)}
• each count is the frequency of occurrence of genre g among all items that user u has rated r
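The counting behind the genre cloud can be sketched in a few lines of Python. This is only an illustration built from the slide's example; the names `item_genres`, `ratings`, and `genre_cloud` are made up for the sketch and do not come from the paper.

```python
from collections import Counter

# Genres of the items in the slide's example.
item_genres = {
    "s1": ["g1", "g2"],
    "s2": ["g1", "g3"],
    "s3": ["g4", "g5"],
    "s5": ["g1", "g2", "g3"],
    "s7": ["g4"],
    "s8": ["g4", "g7"],
}

# (user, item) -> rating, again taken from the slide's example.
ratings = {
    ("Chia", "s1"): 5,
    ("Chia", "s2"): 5,
    ("Chia", "s3"): 5,
    ("Chia", "s5"): 5,
    ("Chia", "s7"): 2,
    ("Chia", "s8"): 2,
}

def genre_cloud(user, rating):
    """Count how often each genre occurs among items `user` rated `rating`."""
    counts = Counter()
    for (u, item), r in ratings.items():
        if u == user and r == rating:
            counts.update(item_genres[item])
    return dict(counts)

cloud_5 = genre_cloud("Chia", 5)
cloud_2 = genre_cloud("Chia", 2)
```

Here `cloud_5` and `cloud_2` hold the same counts as cloud_g(Chia, 5) and cloud_g(Chia, 2) on the slide.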

tf-idf
• R(g, u): the set of different ratings assigned to genre g by user u
• R(g4, Chia) = {5, 2}, |R(g4, Chia)| = 2
• tf-idf(g4, Chia, 5) = 1 * log(5/(1+2))
• the tf-idf value reflects how important a genre is to a particular rating in the set of all ratings
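The worked value on this slide can be reproduced with a small function. The sketch rests on two assumptions that the slide does not state: the 5 in the numerator is read as the number of possible rating levels, and the logarithm is taken as the natural log. The function and parameter names are hypothetical.

```python
import math

def tf_idf(genre_count, num_rating_levels, num_distinct_ratings):
    """tf * idf, following the shape of the slide's example.

    genre_count:          count of genre g in cloud_g(u, r) (the "tf" part)
    num_rating_levels:    assumed meaning of the 5 in the slide's numerator
    num_distinct_ratings: |R(g, u)|, distinct ratings user u gave to genre g
    """
    # Natural log is an assumption; the slide does not give the base.
    return genre_count * math.log(num_rating_levels / (1 + num_distinct_ratings))

# g4 occurs once in cloud_g(Chia, 5), and |R(g4, Chia)| = 2.
value = tf_idf(genre_count=1, num_rating_levels=5, num_distinct_ratings=2)
```

With the natural log this gives log(5/3), roughly 0.51. A different base would rescale every weight by the same constant.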

w(g, u)
• w(g, u) represents how much user u likes genre g

w(g, u)   g1   g2   g3   g4
u1         5    2    0    2
u2         0    1    0    2
u3         5    2    1    0
u4         1    0    1    1

missing value
• handled before the user-genre matrix factorization
• substitutes the weights w(g, u) that are zero with the user's average rating offset
• fewer missing values
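A minimal sketch of this substitution with NumPy follows. The slide does not define how the average rating offset is computed, so the `user_offset` values below are invented placeholders, one per user, standing in for whatever offset the method actually uses.

```python
import numpy as np

# Example user-genre weights (rows u1..u4, columns g1..g4).
W = np.array([
    [5, 2, 0, 2],
    [0, 1, 0, 2],
    [5, 2, 1, 0],
    [1, 0, 1, 1],
], dtype=float)

# Hypothetical per-user offsets; how they are derived is not given on the slide.
user_offset = np.array([0.4, -0.3, 0.2, -0.1])

# Replace each zero weight with the offset of the user in that row.
W_filled = np.where(W == 0, user_offset[:, None], W)
```

Non-zero weights are left untouched; only the zero entries in each row take that user's offset.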

SVD
• user-genre matrix M
• the resulting factorization is a topic preference-relevance model
• Vk: users' interest in each of the k inferred topics
• Tk: the genres' relevance for each topic
• singular values in Σ: represent the influence of a particular topic on user-genre preferences
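A truncated SVD of a small user-genre matrix can be sketched with NumPy as below. The matrix reuses the example weights from the deck, the rank `k = 2` is an arbitrary choice for illustration, and the mapping of the NumPy factors onto the slide's Vk and Tk is an assumption based on M having users as rows and genres as columns.

```python
import numpy as np

# User-genre matrix M (rows u1..u4, columns g1..g4).
M = np.array([
    [5, 2, 0, 2],
    [0, 1, 0, 2],
    [5, 2, 1, 0],
    [1, 0, 1, 1],
], dtype=float)

# Thin SVD: M = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 2  # number of latent topics kept (illustrative choice)
user_topics = U[:, :k]     # users x topics (presumably the slide's Vk)
topic_strength = s[:k]     # influence of each topic (diagonal of Σ)
genre_topics = Vt[:k, :].T # genres x topics (presumably the slide's Tk)

# Rank-k approximation of the user-genre preferences.
M_k = user_topics @ np.diag(topic_strength) @ genre_topics.T
```

Keeping only the largest singular values keeps the topics with the most influence on the user-genre preferences and discards the rest.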

Enrichment
• (1) reducing the effects produced by the lack of information in new user profiles
• (2) adjusting the value of w(g, u) to consider the most relevant topics of interest associated with genre g
• ex: an adjusted preference for science fiction will indicate the user's interest in different topics, such as undead people, the end of the world, and star wars

update w(g, u)
• update w(g, u) -> w'(g, u)
• w(g, u) is redefined to incorporate the topic preference-relevance model
• γ is a weighting parameter
• the resulting values combine user feedback, topic preference, and relevance to the associated genres

User Similarity
• for two users u and v, their similarity is sim(u, v)
• computed with the Pearson correlation coefficient
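As a rough illustration, the Pearson correlation between two users can be computed over their rows of genre weights. The sketch assumes the similarity is taken over the user-genre weights and reuses the example matrix from the deck; `pearson_sim` is a hypothetical helper name.

```python
import numpy as np

# User-genre weights (rows u1..u4, columns g1..g4).
W = np.array([
    [5, 2, 0, 2],
    [0, 1, 0, 2],
    [5, 2, 1, 0],
    [1, 0, 1, 1],
], dtype=float)

def pearson_sim(u, v):
    """Pearson correlation between the genre-weight rows of users u and v."""
    # np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
    # entry is the correlation between the two input vectors.
    return float(np.corrcoef(W[u], W[v])[0, 1])

sim_u1_u3 = pearson_sim(0, 2)  # u1 versus u3
```

The value lies in [-1, 1]. A row with constant weights has zero variance, so its correlation is undefined and would need separate handling.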

Recommendation
• collaborative filtering
• predict the unknown rating δu(s)
• the prediction uses the users most similar to u
• fg is a normalizing factor

Experimental results
• (a): 1.535, average of w(g, u) without enrichment
• (b): 0.9617, with enrichment but no similar users
• (c): 0.8743, enriched profiles and similar users

Remarks
• factorized user-genre matrix model: discovers latent factors from genres in order to enrich user profiles
• enrichment: adjusts the user's preference for a genre by considering the most relevant topics that compose the genre
• less sparse, because individual weights are associated with general information about all the content
• factorization: infers latent semantics without metadata

ChiaChia Lee