Slide 1

Slide 1 text

User Modeling in Folksonomies: Relational Clustering and Tag Weighting
Takuya Kitazawa*, Masahide Sugiyama
School of Computer Science and Engineering, The University of Aizu, Fukushima, Japan
* Current affiliation: Graduate School of Information Science and Technology, The University of Tokyo, Japan

Slide 2

Slide 2 text

User Modeling in Folksonomies
1. What are folksonomies? Problem formulation
2. How to tackle the problems
3. Recommender system-based evaluation

Slide 3

Slide 3 text

Folksonomies = social networking services, e.g. Flickr, Delicious
Extract users' preferences / characteristics

Slide 4

Slide 4 text

Conventional: compute vectors one-by-one
user1 [1 0 0 1 0 1 1 0 0 0 0 0]
user2 [0 1 0 1 0 0 1 0 0 1 0 0]
…
userN [1 0 0 0 0 0 0 0 0 1 0 0]
“Create, normalize, and then compute…”
Accurate, but time-consuming
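The one-by-one approach on this slide can be sketched roughly as follows. The slide does not say which measure is computed after normalization, so cosine similarity is an assumption here, and the function names are hypothetical. The point is that every pair of users is compared separately, which is where the running time goes.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def pairwise_similarities(vectors):
    """Compare every pair of users one by one: O(N^2) comparisons for N users."""
    names = sorted(vectors)
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Binary vectors copied from the slide.
users = {
    "user1": [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    "user2": [0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    "userN": [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
}
sims = pairwise_similarities(users)
print(sims[("user1", "user2")])  # 0.5
```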

Slide 5

Slide 5 text

Related work [Niwa et al. 2006]
S. Niwa et al. Web page recommender system based on folksonomy mining. In Proc. of ITNG2006, pages 388–393, Apr. 2006.

Slide 6

Slide 6 text

Roughly obtain preferences/characteristics
user1 1 0 0 1 0 1 1 0 0 0 0 0
user2 0 1 0 1 0 0 1 0 0 1 0 0
…
userN 1 0 0 0 0 0 0 0 0 1 0 0
Use matrices with stochastic model
✦ Low accuracy ⇔ serendipity
✦ Short running time for future application

Slide 7

Slide 7 text

Approach: Tag weights-based user modeling
Relational matrix: users × contents
Tag frequencies for every content
“User model” = group structures and weighted tags

Slide 8

Slide 8 text

Group structures: Infinite Relational Model (IRM)
✦ Simultaneous relational clustering
✦ Find group structures with strength = η
Figure: clusters are assigned, and rows and columns are then sorted by cluster
C. Kemp et al. Learning systems of concepts with an infinite relational model. In Proc. of AAAI2006, pp. 381–388, July 2006.
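To make the strength η concrete, here is a minimal sketch that assumes the cluster assignments are already known and estimates η for each (user cluster, content cluster) block as the fraction of ones in that block. This is a simplification for illustration only: the IRM itself infers the assignments and places a prior on η, and none of that inference is shown. The function name and toy matrix are hypothetical.

```python
def block_strengths(matrix, row_clusters, col_clusters):
    """Empirical density of ones in each (row cluster, column cluster) block."""
    ones = {}
    cells = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            key = (row_clusters[i], col_clusters[j])
            ones[key] = ones.get(key, 0) + value
            cells[key] = cells.get(key, 0) + 1
    return {key: ones[key] / cells[key] for key in cells}

# Toy users-by-contents relation, already sorted by cluster.
relation = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]
user_clusters = [0, 0, 1, 1]      # assumed given, e.g. from IRM-based clustering
content_clusters = [0, 0, 1, 1]   # assumed given

eta = block_strengths(relation, user_clusters, content_clusters)
print(eta[(0, 0)], eta[(1, 1)])  # 1.0 0.75
```

A dense block (η close to 1) means users in that cluster bookmark contents in that cluster often, which is the "strong relation" used later for user modeling.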

Slide 9

Slide 9 text

Apply IRM (1/2)
Data set: Hatena bookmark (social bookmarking)
1,017 users × 7,000 web pages
IRM-based relational clustering

Slide 10

Slide 10 text

Apply IRM (2/2)
Data set: Hatena bookmark (social bookmarking), tags

Slide 11

Slide 11 text

Tag weights: TF-IDF-like weighting technique (1/2)
TF-IDF weighting in information retrieval:
✦ Term Frequency (TF): terms that appear many times → characteristic
✦ Inverse Document Frequency (IDF): terms that appear in many different documents → irrelevant (e.g. a, the)
Use a similar idea

Slide 12

Slide 12 text

Tag weights: TF-IDF-like weighting technique (2/2)
TF-IDF-like tag weighting:
✦ Term Frequency (TF): tags that appear many times → characteristic
✦ Inverse Document Frequency (IDF): tags that appear in many different content clusters → irrelevant
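A minimal sketch of this TF-IDF-like weighting, treating each content cluster as a "document" and each tag as a "term". The slides do not give the exact formula, so the standard form tf × log(N / df) is an assumption, and the function name and toy counts are hypothetical.

```python
import math

def tag_weights(cluster_tag_counts):
    """Map {cluster: {tag: count}} to {cluster: {tag: TF-IDF-like weight}}."""
    n_clusters = len(cluster_tag_counts)

    # Cluster frequency: in how many content clusters does each tag appear?
    df = {}
    for counts in cluster_tag_counts.values():
        for tag, count in counts.items():
            if count > 0:
                df[tag] = df.get(tag, 0) + 1

    weights = {}
    for cluster, counts in cluster_tag_counts.items():
        total = sum(counts.values())
        weights[cluster] = {
            # Relative frequency within the cluster times inverse cluster frequency.
            tag: (count / total) * math.log(n_clusters / df[tag])
            for tag, count in counts.items()
            if count > 0
        }
    return weights

counts = {
    "news": {"politics": 6, "web": 2},
    "tech": {"python": 5, "web": 3},
    "misc": {"cat": 1, "web": 1},
}
weights = tag_weights(counts)
print(weights["news"])
```

In this toy data the tag "web" appears in every cluster, so its weight is zero everywhere, while "politics" is frequent in one cluster only and gets the largest weight there.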

Slide 13

Slide 13 text

Results: top-20 tags
Figure: weight vs. rank of tags for the 1st page cluster and the 3rd page cluster (labels: topical news, technical topics)

Slide 14

Slide 14 text

Connecting to user modeling (overview)
Find strong relation → preferences

Slide 15

Slide 15 text

Tag weights → user models (overall weights)
Overall tag weights for a single user cluster: strength η × tag weights
Figure: overall weight vs. rank of tags, combining page clusters (tech, general, …)
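One way to read "strength η × tag weights" is sketched below: for a single user cluster, each page cluster's tag weights are scaled by the strength η between the two clusters and then summed per tag. The summation across page clusters is an assumption, since the slide only shows the product, and all names and numbers are hypothetical.

```python
def overall_tag_weights(eta_row, page_cluster_weights):
    """Combine per-page-cluster tag weights into one user-cluster model.

    eta_row: {page_cluster: strength eta for this user cluster}
    page_cluster_weights: {page_cluster: {tag: weight}}
    """
    overall = {}
    for page_cluster, eta in eta_row.items():
        for tag, weight in page_cluster_weights.get(page_cluster, {}).items():
            overall[tag] = overall.get(tag, 0.0) + eta * weight
    return overall

def top_tags(overall, k=20):
    """Rank tags by overall weight, highest first."""
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)[:k]

eta_row = {"tech": 0.8, "general": 0.1}
page_cluster_weights = {
    "tech": {"python": 0.5, "web": 0.2},
    "general": {"news": 0.6, "web": 0.4},
}
user_model = overall_tag_weights(eta_row, page_cluster_weights)
print(top_tags(user_model, k=3))
```

Here the user cluster is strongly related to the "tech" page cluster, so tech tags dominate its model even though "news" has a high weight inside the "general" cluster.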

Slide 16

Slide 16 text

User-model-based recommendation
“Can the new page be preferred by these users?”
Inputs: the new page’s tags and the user models
By summing up, compute the page’s prediction degree “P”
Threshold P by θ for every cluster
P > θ: recommend to every user in the cluster
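The recommendation step can be sketched as follows, assuming that "summing up" means adding the user model's overall weights for the tags attached to the new page. That reading, the function names, and the toy data are assumptions for illustration.

```python
def prediction_degree(page_tags, user_model):
    """P: sum of the user model's weights over the new page's tags."""
    return sum(user_model.get(tag, 0.0) for tag in page_tags)

def recommend(page_tags, user_models, cluster_members, theta):
    """Return every user whose cluster has P > theta for this page."""
    recipients = []
    for cluster, model in user_models.items():
        if prediction_degree(page_tags, model) > theta:
            recipients.extend(cluster_members.get(cluster, []))
    return recipients

user_models = {
    "tech_users": {"python": 0.4, "web": 0.2},
    "news_users": {"news": 0.5},
}
cluster_members = {
    "tech_users": ["alice", "bob"],
    "news_users": ["carol"],
}
new_page_tags = ["python", "web", "tutorial"]

print(recommend(new_page_tags, user_models, cluster_members, theta=0.5))
# ['alice', 'bob']
```

Because the decision is made once per user cluster rather than once per user, the cost grows with the number of clusters, which fits the goal of a short running time.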

Slide 17

Slide 17 text

Evaluation setting
Used the same 1,017-by-7,000 dataset from Hatena bookmark
Matrix → tuples: 172,365 tuples in total
✦ 5-fold cross validation with F-measure
✦ User modeling by using learning data
✦ Thresholding all test tuples for every user cluster
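The evaluation protocol can be sketched like this: split the tuples into five folds, build the model on four, and score the held-out fold with the F-measure (harmonic mean of precision and recall). The shuffling, the seed, and the function names are assumptions, and the sketch stops at the split and the metric rather than reproducing the experiment.

```python
import random

def f_measure(predicted, actual):
    """F-measure of a predicted set against the actual set (0.0 if no overlap)."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)

def k_fold(tuples, k=5, seed=0):
    """Yield (train, test) splits; each tuple is in exactly one test fold."""
    shuffled = list(tuples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        yield train, folds[i]

# Toy (user, page) tuples standing in for the bookmark data.
tuples = [(u, p) for u in range(5) for p in range(2)]
splits = list(k_fold(tuples, k=5))

score = f_measure({1, 2, 3}, {2, 3, 4})
```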

Slide 18

Slide 18 text

Accuracy and running time
Figure: accuracy (higher is better) and running time (including IRM-based clustering) for worst, base, and proposed
Better accuracy than worst and faster running time → achieved sketchy user modeling

Slide 19

Slide 19 text

Summary
1. What are folksonomies? Problem formulation: user modeling with faster, sketchy data mining
2. How to tackle the problems: combine relational clustering and tag weighting
3. Recommender system-based evaluation: achieved faster, sketchy recommendation

Slide 20

Slide 20 text

Conclusion
How can I roughly obtain users’ group structures and their preferences on web services?
✦ Consider more competitors
✦ Improve accuracy
✦ Take incremental/online approaches

Slide 21

Slide 21 text

User Modeling in Folksonomies: Relational Clustering and Tag Weighting
Takuya Kitazawa
Email: [email protected]
Implementations and datasets: github.com/takuti/wims-2015

Slide 22

Slide 22 text

Running time of IRM-based clustering
1,017 users × 7,000 web pages
Figure: clustering results at Iteration 0, Iteration 1, and Iteration 2, with running times of 5 sec and 13 sec
More accurate?

Slide 23

Slide 23 text

Influence of threshold θ and IRM iteration
Are more iterations more accurate? → Probably NOT
