User Modeling in Folksonomies

Takuya Kitazawa*, Masahide Sugiyama
School of Computer Science and Engineering, The University of Aizu, Fukushima, Japan
* Current affiliation: Graduate School of Information Science and Technology, The University of Tokyo, Japan

User Modeling in Folksonomies: Relational Clustering and Tag Weighting

User Modeling in Folksonomies
1. What are folksonomies? Problem formulation
2. How to tackle the problems
3. Recommender system-based evaluation

Folksonomies = social networking services, e.g. Flickr, Delicious
Extract users' preferences / characteristics

Conventional: compute vectors one-by-one
user1 [1 0 0 1 0 1 1 0 0 0 0 0]
user2 [0 1 0 1 0 0 1 0 0 1 0 0]
…
userN [1 0 0 0 0 0 0 0 0 1 0 0]
"Create, normalize, and then compute…"
Accurate, but time-consuming
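The one-by-one vector approach can be sketched as follows. The slide does not say what is computed after normalization, so the use of cosine similarity between every pair of users is an assumption made for illustration; the three vectors are the ones shown on the slide.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine_similarity(a, b):
    """Dot product of the two normalized vectors (assumed similarity measure)."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))

# Binary user vectors as shown on the slide
users = {
    "user1": [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    "user2": [0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    "userN": [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
}

# Every pair of users is compared, so the cost grows quadratically with N
names = list(users)
similarities = {}
for i, a in enumerate(names):
    for b in names[i + 1:]:
        similarities[(a, b)] = cosine_similarity(users[a], users[b])

for pair, sim in similarities.items():
    print(pair, round(sim, 3))
```

The nested loop is what makes this style of modeling time-consuming as the number of users grows.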

Related work [Niwa et al. 2006]
S. Niwa et al. Web page recommender system based on folksonomy mining. In Proc. of ITNG2006, pp. 388–393, Apr. 2006.

Roughly obtain preferences/characteristics
user1 1 0 0 1 0 1 1 0 0 0 0 0
user2 0 1 0 1 0 0 1 0 0 1 0 0
…
userN 1 0 0 0 0 0 0 0 0 1 0 0
Use matrices with stochastic model
✦ Low accuracy ↔ serendipity
✦ Short running time for future application

Approach: Tag weights-based user modeling
Relational matrix: users × contents
Tag frequencies for every content
"User model" = group structures and weighted tags

Group structures: Infinite Relational Model (IRM)
✦ Simultaneous relational clustering
✦ Find group structures with strength = η
Clusters are assigned, and then the matrix is sorted by cluster
C. Kemp et al. Learning systems of concepts with an infinite relational model. In Proc. of AAAI2006, pp. 381–388, July 2006.
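To make the strength η concrete, here is a minimal sketch that estimates η for each (user cluster, page cluster) block once cluster assignments are given. It does not perform IRM inference itself; the assignments are taken as input. The estimate is the posterior mean under a symmetric Beta(β, β) prior on each block, and the function name, the toy matrix, and the default β = 1 are assumptions for illustration.

```python
from collections import defaultdict

def block_strengths(relation, user_clusters, page_clusters, beta=1.0):
    """Estimate eta for each (user cluster, page cluster) block.

    relation: binary matrix (list of rows), users x pages
    user_clusters / page_clusters: cluster id for each row / column
    Returns the Beta(beta, beta) posterior mean (ones + beta) / (cells + 2 * beta).
    """
    ones = defaultdict(int)
    cells = defaultdict(int)
    for i, row in enumerate(relation):
        for j, value in enumerate(row):
            block = (user_clusters[i], page_clusters[j])
            ones[block] += value
            cells[block] += 1
    return {b: (ones[b] + beta) / (cells[b] + 2 * beta) for b in cells}

# Toy 4 users x 4 pages relation (hypothetical data)
relation = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
eta = block_strengths(relation, user_clusters=[0, 0, 1, 1], page_clusters=[0, 0, 1, 1])
for block, strength in sorted(eta.items()):
    print(block, round(strength, 3))
```

Blocks that are dense with bookmarks get η close to 1, and sparse blocks get η close to 0.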

Apply IRM (1/2)
Data set: Hatena bookmark (social bookmarking)
1,017 users × 7,000 web pages
IRM-based relational clustering

Apply IRM (2/2)
Data set: Hatena bookmark (social bookmarking), tags

Tag weights: TF-IDF-like weighting technique (1/2)
TF-IDF weighting in information retrieval
Term Frequency (TF): terms that appear many times → characteristic
Inverse Document Frequency (IDF): terms that appear in many different documents → irrelevant (e.g. a, the)
Use a similar idea

Tag weights: TF-IDF-like weighting technique (2/2)
TF-IDF-like tag weighting
Term Frequency (TF): tags that appear many times → characteristic
Inverse Document Frequency (IDF): tags that appear in many different content clusters → irrelevant
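A minimal sketch of TF-IDF-like tag weighting over content clusters. The slides do not give the exact formula, so the standard form is assumed here: TF is a tag's relative frequency inside one content cluster, and IDF is the log of the number of clusters divided by the number of clusters containing the tag. The function name and the toy tags are hypothetical.

```python
import math
from collections import Counter

def tag_weights(cluster_tags):
    """cluster_tags: dict mapping content-cluster id -> list of tags (with repeats).

    Returns dict cluster id -> {tag: TF * IDF}.
    """
    n_clusters = len(cluster_tags)
    # Number of clusters in which each tag appears at least once
    cluster_freq = Counter()
    for tags in cluster_tags.values():
        cluster_freq.update(set(tags))

    weights = {}
    for cluster_id, tags in cluster_tags.items():
        counts = Counter(tags)
        total = sum(counts.values())
        weights[cluster_id] = {
            tag: (count / total) * math.log(n_clusters / cluster_freq[tag])
            for tag, count in counts.items()
        }
    return weights

# Toy data: "news" appears in every cluster, so its weight drops to zero
weights = tag_weights({
    "c1": ["python", "python", "web", "news"],
    "c2": ["news", "politics", "news"],
    "c3": ["news", "python"],
})
for cluster_id, w in weights.items():
    print(cluster_id, {tag: round(v, 3) for tag, v in w.items()})
```

A tag that occurs in every content cluster gets weight 0 no matter how frequent it is, which mirrors how stop words such as "a" and "the" are treated in document retrieval.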

Results: top-20 tags
[Figure: top-20 tag weights for the 1st and 3rd page clusters; labels: topical news, technical topics; axes: rank of tags, weight]

Connecting to user modeling (overview)
Find strong relations → preferences

Tag weights → user models (overall weights)
Overall tag weights for a single user cluster: strength η × tag weights
[Figure: overall weight by rank of tags; page cluster labels: tech, general, general, tech, …]
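A minimal sketch of combining strength η with tag weights into overall weights for one user cluster. The slide shows the product "strength η × tag weights"; summing those products over page clusters is an assumption made here, and all names and numbers are hypothetical.

```python
from collections import defaultdict

def overall_tag_weights(eta_row, page_cluster_weights):
    """Build a user model for one user cluster.

    eta_row: dict page-cluster id -> strength eta between this user cluster
             and that page cluster
    page_cluster_weights: dict page-cluster id -> {tag: weight}
    Returns {tag: sum over page clusters of eta * weight} (assumed aggregation).
    """
    overall = defaultdict(float)
    for page_cluster, eta in eta_row.items():
        for tag, weight in page_cluster_weights.get(page_cluster, {}).items():
            overall[tag] += eta * weight
    return dict(overall)

# Toy example: this user cluster is strongly related to the "tech" page cluster
user_model = overall_tag_weights(
    eta_row={"tech": 0.8, "general": 0.1},
    page_cluster_weights={
        "tech": {"python": 0.5, "linux": 0.25},
        "general": {"news": 0.5, "python": 0.1},
    },
)
for tag, weight in sorted(user_model.items(), key=lambda kv: -kv[1]):
    print(tag, round(weight, 3))
```

Tags from strongly related page clusters dominate the resulting ranking, while tags from weakly related clusters contribute little.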

User-model-based recommendation
"Can a new page be preferred by these users?"
Compare the new page's tags with the user models
By summing up, compute the page's prediction degree "P"
Threshold P by θ for every cluster
P > θ: recommend to every user in the cluster
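A minimal sketch of the thresholding step, assuming that the prediction degree P is the sum of a user cluster's overall weights for the tags attached to the new page. The function names, the toy user models, and the value of θ are hypothetical.

```python
def prediction_degree(page_tags, user_model):
    """P: sum of the user model's weights over the new page's tags (assumed form)."""
    return sum(user_model.get(tag, 0.0) for tag in page_tags)

def recommend(page_tags, user_models, theta):
    """Return the user clusters whose prediction degree exceeds theta."""
    return [
        cluster
        for cluster, model in user_models.items()
        if prediction_degree(page_tags, model) > theta
    ]

# Toy user models: {user cluster: {tag: overall weight}}
user_models = {
    "cluster_a": {"python": 0.41, "linux": 0.2, "news": 0.05},
    "cluster_b": {"news": 0.5, "politics": 0.3},
}
new_page_tags = ["python", "linux"]
targets = recommend(new_page_tags, user_models, theta=0.3)
print(targets)  # the page would be recommended to every user in these clusters
```

Because the decision is made once per user cluster rather than once per user, the cost of scoring a new page depends on the number of clusters, not the number of users.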

Evaluation setting
Used the same 1,017-by-7,000 dataset from Hatena bookmark
Matrix → tuples: 172,365 tuples in total
✦ 5-fold cross validation with F-measure
✦ User modeling by using learning data
✦ Thresholding all test tuples for every user cluster
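A minimal sketch of the evaluation protocol: split the tuples into 5 folds and score predictions with the F-measure (harmonic mean of precision and recall). The shuffling seed, the helper names, and the toy (user, page) tuples are assumptions; the actual experiment code is not shown in the slides.

```python
import random

def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists for k-fold cross validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def f_measure(predicted, actual):
    """Harmonic mean of precision and recall over two collections of tuples."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)

# Toy (user, page) tuples standing in for the bookmark data
tuples = [(u, p) for u in range(5) for p in range(4)]
splits = list(k_fold_splits(tuples, k=5))
for train, test in splits:
    # User models would be built from `train` and thresholded against `test`
    print(len(train), len(test))

score = f_measure(predicted=[(0, 1), (0, 2), (0, 3)], actual=[(0, 2), (0, 3), (1, 0)])
print(round(score, 3))
```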

Accuracy and running time
[Figure: accuracy (higher is better) and running time for the worst, base, and proposed methods; running time includes IRM-based clustering]
Better accuracy than worst and faster running time → achieved sketchy user modeling

Summary
1. What are folksonomies? Problem formulation: user modeling with faster, sketchy data mining
2. How to tackle the problems: combine relational clustering and tag weighting
3. Recommender system-based evaluation: achieved faster, sketchy recommendation

Conclusion
How can I roughly obtain users' group structures and their preferences on web services?
✦ Consider more competitors
✦ Improve accuracy
✦ Take incremental/online approaches

User Modeling in Folksonomies: Relational Clustering and Tag Weighting
Takuya Kitazawa
Email: [email protected]
Implementations and datasets: github.com/takuti/wims-2015

Running time of IRM-based clustering
1,017 users × 7,000 web pages
[Figure: clustering results at iteration 0, iteration 1, and iteration 2; running times shown: 5 sec, 13 sec]
More accurate?

Influence of threshold θ and IRM iteration
Are more iterations more accurate? → Probably NOT
