CIKM2016 Participation Report @CyberZ Skill Wednesday

I attended CIKM2016 in October 2016 and presented there. This deck is my report on the conference.

czimamori-daichi

December 22, 2016

Transcript

  1. CIKM 2016 Participation Report

    SKILL WEDNESDAY, 2016/12/21
    IMAMORI DAICHI
  2. What is CIKM 2016?

    • CIKM = ACM International Conference on Information and Knowledge Management
    • One of the major international conferences covering a broad range of data science topics such as information retrieval, data mining, and databases
    • This year it newly adopted the theme "Frontier and Applications of Big Data Science"
  3. Major international conferences in the same field

    • WWW (International World Wide Web Conference)
    • WSDM (ACM International Conference on Web Search and Data Mining)
    • SIGIR (ACM SIGIR Conference on Research and Development in Information Retrieval)
  4. 1. Vandalism Detection in Wikidata.

    • Proposes a method for detecting vandalism in Wikipedia edits
    • Substantially improves performance over prior methods (0.665 → 0.991)
    • Machine learning based on 47 features and four categories
  5. 1. Vandalism Detection in Wikidata.
  6. 2. Improving Personalized Trip Recommendation by Avoiding Crowds.

    • A method that recommends personalized trip plans while avoiding crowds
    • Uses collaborative filtering and ant colony optimization
  7. 3. User Response Learning for Directly Optimizing Campaign Performance in Display Advertising.

    • A method that predicts users' click-through rates in ad auctions and optimizes bids in real time
    • Campaign profit improved by 78.2% offline and 25.5% online
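  8. Memory 1: my first trip abroad

    • Got a passport just for this trip
    • First time overseas
    • Traveling alone
    • 13-hour flight
    • A connecting flight within the United States
    • Cannot speak English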
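  9. Memory 2: could not reach the hotel

    • Could not tell which bus went from the local airport to the hotel
    • Far too many kinds of shared buses
    • Asked in broken English, but could not understand the answers
    • Boarded the wrong bus by mistake and got a sour look from the driver
    • Ended up wandering around the bus stop for about an hour and a half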
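  10. Memory 3: could not get food

    • With my limited English, conversations in restaurants were painful
    • As far as possible, I ate only with my professor or the meals provided by the conference
    • The vending machine in the hotel was my friend
    • The American breakfast I managed to order by myself at the airport on the way home was delicious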
  11. BACKGROUND

    Predicting Popularity of Twitter Accounts through the Discovery of Link-Propagating Early Adopters
    We propose a method of estimating the prospective popularity of new users.
    In social media:
    • New useful users frequently appear.
    • We want to detect such new useful users.
    • Popularity-based methods, e.g., HITS and PageRank, do not work well for new users who have not yet established their reputation.
  12. OUR APPROACH

    We first detect early adopters.
    Early adopters = the users who are good at finding good new information sources earlier than others.
    • New users followed by early adopters are probably good information sources even if they have few followers at this point.
    • We can find good information sources by detecting early adopters.
  13. OUR APPROACH

    [Figure labels: Information Source (S); Early Adopters (EA)]
  14. DETECTION OF EARLY ADOPTERS

    How do we detect early adopters?
    Assumption: early adopters are the users whose follow links are imitated by many followers.
    [Figure labels: Early Adopter (EA), Source (S), Copier (C); C imitated EA's link to S]
  15. DETECTION OF EARLY ADOPTERS

    How do we detect early adopters?
    Assumption: early adopters are the users whose follow links are imitated by many followers.
    We can detect early adopters based on the frequency of link imitation.
    [Figure labels: EA, Source (S), Copiers (C1, C2, C3)]
  16. ROADMAP

    1. Detect links created through imitation.
    2. Count the number of link imitations.
    3. Calculate the early adopter score from the number of link imitations.
    [Figure labels: EA, Source (S), Copiers (C1, C2, C3)]
  17. ROADMAP

    1. Detect links created through imitation.
    2. Count the number of link imitations.
    3. Calculate the early adopter score from the number of link imitations.
    [Figure labels: EA, Source (S), Copiers (C1, C2, C3)]
  18. ROADMAP

    1. Detect links created through imitation.
    2. Count the number of link imitations.
    3. Calculate the early adopter score from the number of link imitations.
    [Figure labels: EA, Source (S), Copiers (C1, C2, C3), counted 1, 2, 3]
  19. ROADMAP

    1. Detect links created through imitation.
    2. Count the number of link imitations.
    3. Calculate the early adopter score from the number of link imitations.
    [Figure labels: Source (S), Copiers (C1, C2, C3), counted 1, 2, 3; EA Score]
  20. ROADMAP

    4. Calculate all users' early adopter scores.
    5. Calculate the future popularity score from the followers' early adopter scores.
    [Figure labels: Source (S); scores 0.2, 0.9, 0.1]
  21. ROADMAP

    4. Calculate all users' early adopter scores.
    5. Calculate the future popularity score from the followers' early adopter scores.
    [Figure labels: Source (S); scores 0.2, 0.9, 0.1]
    0.2 + 0.9 + 0.1 = 1.2
  22. ROADMAP

    1. Detect links created through imitation.
    2. Count the number of link imitations.
    3. Calculate the early adopter score from the number of link imitations.
    [Figure labels: EA, Source (S), Copiers (C1, C2, C3)]
  23. ROADMAP

    1. Detect links created through imitation.
    2. Count the number of link imitations.
    3. Calculate the early adopter score from the number of link imitations.
    [Figure labels: EA, Source (S), Copiers (C1, C2, C3)]
  24. DETECTION OF LINK IMITATION

    We cannot immediately tell which follow links are imitations.
  25. DETECTION OF LINK IMITATION

    We cannot immediately tell which follow links are imitations.
    Structures: triangle, non-reciprocity
  26. DETECTION OF LINK IMITATION

    We cannot immediately tell which follow links are imitations.
    Structures: triangle, non-reciprocity
    This triangle is a trace that C imitated EA's link to S.
    [Figure labels: Early Adopter (EA), Source (S), Copier (C)]
  27. DETECTION OF LINK IMITATION

    We cannot immediately tell which follow links are imitations.
    Structures: triangle, non-reciprocity
    [Figure labels: EA, S, C; reciprocal follow links]
  28. DETECTION OF LINK IMITATION

    We cannot immediately tell which follow links are imitations.
    Structures: triangle, non-reciprocity
    Reciprocal follow links: friendship or followback.
    It is important whether this link of the triangle is reciprocal or non-reciprocal.
    [Figure labels: EA, S, C]
  29. DETECTION OF LINK IMITATION

    We cannot immediately tell which follow links are imitations.
    Structures: triangle, non-reciprocity
    We only count triangles where the link is non-reciprocal.
    [Figure labels: EA, S, C; non-reciprocal follow links]
  30. DETECTION OF LINK IMITATION

    We cannot immediately tell which follow links are imitations.
    Structures: triangle, non-reciprocity
    We can detect links created through imitation.
    [Figure labels: EA, S, C; non-reciprocal follow links]
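    The triangle test described on the slides above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the graph representation (a dict from each user to the set of users they follow) and the function names are hypothetical, and the slides do not say which edge of the triangle must be non-reciprocal, so the sketch assumes it is the copied link C → S.

```python
from typing import Dict, Set

# Hypothetical representation: user -> set of users they follow (followees).
Graph = Dict[str, Set[str]]


def follows(graph: Graph, a: str, b: str) -> bool:
    """Return True if user a follows user b."""
    return b in graph.get(a, set())


def is_imitation_trace(graph: Graph, c: str, ea: str, s: str) -> bool:
    """Check whether (C, EA, S) forms a triangle counted as an imitation trace.

    Triangle: C follows EA, EA follows S, and C follows S.
    Assumption (not stated on the slides): the link that must be
    non-reciprocal is C -> S, i.e. S does not follow C back.
    """
    triangle = (
        follows(graph, c, ea)
        and follows(graph, ea, s)
        and follows(graph, c, s)
    )
    non_reciprocal = not follows(graph, s, c)
    return triangle and non_reciprocal


# C follows EA and S, EA follows S, and S follows nobody.
one_way = {"C": {"EA", "S"}, "EA": {"S"}, "S": set()}
# Same triangle, but S follows C back (friendship or followback).
mutual = {"C": {"EA", "S"}, "EA": {"S"}, "S": {"C"}}

print(is_imitation_trace(one_way, "C", "EA", "S"))
print(is_imitation_trace(mutual, "C", "EA", "S"))
```

    A static snapshot of the graph carries no timestamps, which is why the triangle serves only as a trace of imitation rather than proof of it.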
  31. WHEN THERE ARE MULTIPLE CANDIDATES

    However, in the figure on the right, it is difficult to determine whose link C imitated, EA1's or EA2's.
    [Figure labels: EA1, EA2, S, C]
  32. WHEN THERE ARE MULTIPLE CANDIDATES

    However, in the figure on the right, it is difficult to determine whose link C imitated, EA1's or EA2's.
    • Scoring by similarity between users
    • Each candidate is given an equal score.
    [Figure labels: EA1 (score 0.5), EA2 (score 0.5), S, C]
  33. ROADMAP

    1. Detect links created through imitation.
    2. Count the number of link imitations.
    3. Calculate the early adopter score from the number of link imitations.
    [Figure labels: EA, Source (S), Copiers (C1, C2, C3), counted 1, 2, 3]
  34. HOW TO COUNT THE NUMBER OF IMITATIONS

    Our method processes all edges in the graph one by one.
    For each follow link f in the graph:
    [Figure labels: S, C, f]
  35. HOW TO COUNT THE NUMBER OF IMITATIONS

    Our method processes all edges in the graph one by one.
    For each follow link f in the graph:
    Take the intersection of S's followers and C's followees.
    [Figure labels: S, C, f]
  36. HOW TO COUNT THE NUMBER OF IMITATIONS

    Our method processes all edges in the graph one by one.
    For each follow link f in the graph:
    The users in this intersection are candidates for the users whose links to S were imitated by C.
    [Figure labels: S, C, f; candidates EA1, EA2]
  37. HOW TO COUNT THE NUMBER OF IMITATIONS

    Our method processes all edges in the graph one by one.
    For each follow link f in the graph:
    Accumulate the score for each candidate.
    [Figure labels: S, C, f; candidates EA1, EA2: +0.5 each]
  38. HOW TO COUNT THE NUMBER OF IMITATIONS

    Our method processes all edges in the graph one by one.
    For each follow link f in the graph:
    Accumulate the score for each candidate.
    EA1's accumulated score = 0.5
    [Figure labels: S, C, f; candidates EA1, EA2: +0.5 each]
  39. HOW TO COUNT THE NUMBER OF IMITATIONS

    Our method processes all edges in the graph one by one.
    For each follow link f in the graph:
    Accumulate the score for each candidate.
    EA1's accumulated score = 0.5 + 0.25 = 0.75
    [Figure labels: link f (C to S) with candidates EA1, EA2: +0.5 each; link f' (C to S') with candidates EA1, EA2, EA3, EA4: +0.25 each]
  40. HOW TO COUNT THE NUMBER OF IMITATIONS

    Our method processes all edges in the graph one by one.
    For each follow link f in the graph:
    After processing all links in the graph, the score accumulated for each user is the expected number of times that user has been imitated.
    [Figure labels: link f (C to S) with candidates EA1, EA2: +0.5 each; link f' (C to S') with candidates EA1, EA2, EA3, EA4: +0.25 each]
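    The counting procedure above can be sketched as below. It is a minimal illustration under stated assumptions: edges are (follower, followee) pairs, the function name `count_imitations` is made up for this example, each candidate receives an equal share 1/k as on the slides, and the non-reciprocity filter is left out to keep the sketch short.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_imitations(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Accumulate the expected number of times each user was imitated.

    edges: (follower, followee) pairs, i.e. follow links.
    """
    edges = list(edges)
    followees = defaultdict(set)  # user -> users they follow
    followers = defaultdict(set)  # user -> users following them
    for a, b in edges:
        followees[a].add(b)
        followers[b].add(a)

    scores: Dict[str, float] = defaultdict(float)
    for c, s in edges:  # each follow link f = (C -> S)
        # Candidates: users followed by C who themselves follow S.
        candidates = followers[s] & followees[c]
        for ea in candidates:
            # Every candidate gets an equal share of this one imitation.
            scores[ea] += 1.0 / len(candidates)
    return dict(scores)


# Toy graph mirroring the slide example:
# link C -> S has candidates EA1, EA2; link C -> S2 has candidates EA1..EA4.
edges = [
    ("C", "EA1"), ("C", "EA2"), ("C", "EA3"), ("C", "EA4"),
    ("EA1", "S"), ("EA2", "S"), ("C", "S"),
    ("EA1", "S2"), ("EA2", "S2"), ("EA3", "S2"), ("EA4", "S2"), ("C", "S2"),
]
scores = count_imitations(edges)
print(scores["EA1"])  # 0.5 + 0.25
```

    Processing one edge at a time keeps the work per link proportional to the size of the two neighbor sets, which matters on a graph with hundreds of millions of links.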
  41. ROADMAP

    1. Detect links created through imitation.
    2. Count the number of link imitations.
    3. Calculate the early adopter score from the number of link imitations.
    [Figure labels: Source (S), Copiers (C1, C2, C3), counted 1, 2, 3; EA Score]
  42. EARLY ADOPTER SCORE

    $\text{early adopter score} = \dfrac{\text{number of link imitations}}{|\text{followees}| \times |\text{followers}|}$
    [Figure labels: EA, EA's followers, EA's followees]
  43. EARLY ADOPTER SCORE

    $\text{early adopter score} = \dfrac{\text{number of link imitations}}{|\text{followees}| \times |\text{followers}|}$
    The denominator corresponds to the maximum number of times this user could be imitated.
    [Figure labels: EA, EA's followers, EA's followees]
  44. EARLY ADOPTER SCORE

    $\text{early adopter score} = \dfrac{\text{number of link imitations}}{|\text{followees}| \times |\text{followers}|}$
    This fraction corresponds to the imitation ratio of this user.
    [Figure labels: EA, EA's followers, EA's followees]
  45. EARLY ADOPTER SCORE

    Example: EA has 3 followers and 2 followees.
    $\text{early adopter score} = \dfrac{\text{number of link imitations}}{|\text{followees}| \times |\text{followers}|}$, with denominator $2 \times 3 = 6$
  46. EARLY ADOPTER SCORE

    Example: EA has 3 followers and 2 followees.
    $\text{early adopter score} = \dfrac{\text{number of link imitations}}{|\text{followees}| \times |\text{followers}|} = \dfrac{2}{2 \times 3} = \dfrac{2}{6}$
  47. EARLY ADOPTER SCORE

    Example: EA has 3 followers and 2 followees.
    $\text{early adopter score} = \dfrac{\text{number of link imitations}}{|\text{followees}| \times |\text{followers}|} = \dfrac{2}{2 \times 3} = \dfrac{2}{6} = \dfrac{1}{3}$
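    As a small worked check of the formula above, the ratio can be computed as follows. The function name and the handling of a zero denominator are assumptions made for this sketch; the slides do not say how users with no followers or no followees are treated.

```python
def early_adopter_score(imitations: float, n_followees: int, n_followers: int) -> float:
    """Imitation ratio: imitations / (|followees| x |followers|)."""
    max_imitations = n_followees * n_followers
    if max_imitations == 0:
        # Assumption: a user who cannot be imitated at all gets score 0.
        return 0.0
    return imitations / max_imitations


# Slide example: 2 imitations, 2 followees, 3 followers -> 2 / 6 = 1 / 3.
example_score = early_adopter_score(2, n_followees=2, n_followers=3)
print(example_score)
```

    Normalizing by the maximum possible number of imitations keeps users with many followers and followees from scoring high merely because of their size.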
  48. ROADMAP

    4. Calculate all users' early adopter scores.
    5. Calculate the future popularity score from the followers' early adopter scores.
    [Figure labels: Source (S); scores 0.2, 0.9, 0.1]
  49. ROADMAP

    4. Calculate all users' early adopter scores.
    5. Calculate the future popularity score from the followers' early adopter scores.
    [Figure labels: Source (S)]
  50. FUTURE POPULARITY SCORE

    We call the estimated prospective popularity of a new user the future popularity score.
    [Figure labels: S, followed by three EAs]
  51. FUTURE POPULARITY SCORE

    We call the estimated prospective popularity of a new user the future popularity score.
    The future popularity score is computed as the sum of the early adopter scores of the new user's followers.
    [Figure labels: S, followed by three EAs]
  52. FUTURE POPULARITY SCORE

    We call the estimated prospective popularity of a new user the future popularity score.
    The future popularity score is computed as the sum of the early adopter scores of the new user's followers.
    S's future popularity score = 0.3 + 0.2 + 0.4 = 0.9
    [Figure labels: S, followed by three EAs with scores 0.3, 0.2, 0.4]
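    The summation above is simple enough to sketch directly. The data layout (a dict of follower lists and a dict of early adopter scores) and all names here are hypothetical, chosen only to reproduce the slide's numbers.

```python
from typing import Dict, List


def future_popularity_score(
    user: str,
    followers: Dict[str, List[str]],
    ea_scores: Dict[str, float],
) -> float:
    """Sum the early adopter scores of the given user's followers.

    Followers without a known early adopter score contribute 0
    (an assumption of this sketch).
    """
    return sum(ea_scores.get(f, 0.0) for f in followers.get(user, []))


# Slide example: S is followed by three users with scores 0.3, 0.2 and 0.4.
followers = {"S": ["EA_a", "EA_b", "EA_c"]}
ea_scores = {"EA_a": 0.3, "EA_b": 0.2, "EA_c": 0.4}
fps = future_popularity_score("S", followers, ea_scores)
print(round(fps, 2))
```

    Because the score is a plain sum, a new user followed by a single strong early adopter can outrank one followed by several weak ones.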
  53. OUR EXPERIMENT

    • Dataset [Li et al., KDD 2012]
      – A sub-graph of Twitter crawled in 2011
      – About 20,000,000 users
      – About 300,000,000 follow links
    • Target users
      – T_n^w: we select then-new users that
        • are within w weeks of account creation, and
        • have more than n followers
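    The target set T_n^w defined above amounts to a filter over the crawled accounts. The sketch below assumes each account carries a creation timestamp and a follower count as of the crawl; the record layout, the function name, and the snapshot date are all hypothetical and not taken from the dataset.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical record: user -> (account creation time, follower count at crawl time).
Accounts = Dict[str, Tuple[datetime, int]]


def select_targets(accounts: Accounts, snapshot: datetime, w: int, n: int) -> List[str]:
    """Return users created within w weeks before the snapshot with more than n followers."""
    cutoff = snapshot - timedelta(weeks=w)
    return sorted(
        name
        for name, (created, follower_count) in accounts.items()
        if created >= cutoff and follower_count > n
    )


snapshot = datetime(2011, 5, 31)  # assumed crawl date, for illustration only
accounts = {
    "new_popular": (datetime(2011, 5, 20), 15),   # new, more than 10 followers
    "old_popular": (datetime(2011, 3, 1), 100),   # too old
    "new_quiet": (datetime(2011, 5, 25), 10),     # not more than 10 followers
}
targets = select_targets(accounts, snapshot, w=4, n=10)
print(targets)
```

    Tightening either parameter shrinks the set, which matches the data sizes reported later in the deck (for example 6921 users for w = 4, n = 10 versus 165 for w = 2, n = 30).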
  54. OUR EXPERIMENT

    • Evaluation
      – We rank the users in T_n^w by our methods and by the baselines.
      – Ground truth: we rank users by their number of non-reciprocal followers as of 2015.
      – We compute Spearman's ρ between the rankings.
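    For readers unfamiliar with the metric, Spearman's ρ is the Pearson correlation of the two rank vectors. The following self-contained sketch computes it with average ranks for ties; the function names and the sample numbers are illustrative only and are not data from the experiment.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation between two score lists of equal length."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up predicted scores and later follower counts for four users.
predicted = [0.9, 0.4, 0.1, 0.7]
later_followers = [500, 120, 80, 30]
rho = spearman_rho(predicted, later_followers)
print(round(rho, 2))
```

    Since only ranks matter, ρ is unaffected by the scale of the scores, which makes it suitable for comparing methods whose outputs are not on a common scale.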
  55. OUR EXPERIMENT

    • Baseline methods
      – FW: number of followers in May 2011
      – PRnr: PageRank scores on the graph consisting only of non-reciprocal links
      – HITSnr: HITS scores on the graph consisting only of non-reciprocal links
      – AD: Adamic-Adar index
    • Our methods
      – FPS: future popularity score
      – LR: linear regression over FPS and some of the baselines
  56. OUR EXPERIMENT

    Method      T_10^4  T_10^2  T_20^4  T_20^2  T_30^4  T_30^2
    data size   6921    1515    2259    431     979     165
    FW          0.18    0.23    0.15    0.07    0.19    0.00
    HITSnr      0.26    0.31    0.30    0.35    0.38    0.46
    PRnr        0.16    0.09    0.21    0.20    0.30    0.32
    AD          -0.21   -0.13   -0.30   -0.46   -0.27   -0.50
    FPS         0.39    0.41    0.39    0.45    0.40    0.47
    LR          0.43    0.46    0.43    0.50    0.45    0.58
  57. OUR EXPERIMENT

    (Same results table as slide 56; green marks the best baseline in each column.)
    • HITS works best in most cases.
    • AD is the best in some cases.
  58. OUR EXPERIMENT

    (Same results table as slide 56; red marks the best method in each column, blue the best excluding LR.)
    • FPS is the best in most cases among all the methods excluding LR.
    • LR is the best in all cases. This means that FPS captures some aspects that are not captured by the other methods.
  59. CONCLUSION

    • We proposed a method of estimating the prospective popularity of new users.
    • Our method estimates it through the discovery of early adopters.
    • We ran an experiment on a sub-graph of Twitter.
    • Our method outperforms the baselines.