On Analyzing User Topic-Specific Platform Preferences Across Multiple Social Media Sites

Roy Lee
April 16, 2017

Topic modeling has traditionally been studied for single text collections and applied to social media data represented in the form of text documents. With the emergence of many social media platforms, users find themselves using different social media for posting content and for social interaction. While many topics may be shared across social media platforms, users typically prefer certain social media platform(s) over others for certain topics. Such platform preferences may even be found at the individual level. To model social media topics as well as platform preferences of users, we propose a new topic model known as MultiPlatform-LDA (MultiLDA). Instead of just merging all posts from different social media platforms into a single text collection, MultiLDA keeps one text collection for each social media platform while allowing these platforms to share a common set of topics. MultiLDA further learns the user-specific platform preferences for each topic. We evaluate MultiLDA against TwitterLDA, the state-of-the-art method for social media content modeling, on two aspects: (i) the effectiveness in modeling topics across social media platforms, and (ii) the ability to predict platform choices for each post. We conduct experiments on three real-world datasets from Twitter, Instagram and Tumblr sharing a set of common users. Our experimental results show that MultiLDA outperforms TwitterLDA in both the topic modeling and platform choice prediction tasks. We also show empirically that among the three social media platforms, "Daily matters" and "Relationship matters" are dominant topics on Twitter, "Social gathering", "Outing" and "Fashion" are dominant topics on Instagram, and "Music", "Entertainment" and "Fashion" are dominant topics on Tumblr.

Transcript

  1. On Analyzing User Topic-Specific Platform Preference Across Multiple Social Media Sites

    Roy Ka-Wei LEE, Tuan-Anh HOANG & Ee-Peng LIM
    Singapore Management University
  2. Background

    Topical Modeling in Social Media:
    1. Widely studied research area
    2. Tend to confine to textual content
    3. Rarely look beyond single social media platform
  3. Instagram Example

    Topical Interests:
    - Food
    - Breakfast
    - Beer
    - Wine
  4. Research Objectives

    Propose MultiPlatform-LDA, a generative model that holistically learns user's
    1. Topical interests across multiple social media platforms
    2. Platform preferences for his/her social media content
  5. Research Framework

    Model Construction: Design Generative Process; Parameters Inference via Gibbs Sampling
    Data Processing: Data Crawling; Image and Video Tagging for Rich Media Posts; User Linkage
    Evaluation: Likelihood-Perplexity of Learnt Content; User Platform Preference Prediction
    MultiPlatform-LDA Model
    Empirical Study Findings
    2785 users on Twitter, Instagram and Tumblr, their ~5.8 million posts
  6. Model Overview

    Broad Idea: MultiPlatform-LDA model simulates the generation of an observed user's post from her latent topical interests and topic-specific platform preferences
  7. Generative Process

    [Plate diagram: variables w_{u,s,n}, y_{u,s,n}, z_{u,s}; plates K and |N_{u,s}|]
  8. Generative Process

    [Plate diagram: variables w_{u,s,n}, y_{u,s,n}, z_{u,s}, p_{u,s}; plates K and |N_{u,s}|]
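The idea on the Model Overview slide can be illustrated with a much simplified sampling sketch. It assumes that a user has a distribution over topics, a per-topic distribution over platforms, and that each topic has a distribution over words. The function name, argument names, and dictionary layout are hypothetical, and the sketch leaves out the y_{u,s,n} variable from the plate diagram, which the slides do not define. It is not the authors' generative process or their Gibbs sampler.

```python
import random

def sample_post(topic_interest, platform_pref, topic_words, n_words, rng):
    """Sample one post as (topic, platform, words).

    topic_interest: dict topic -> probability (one user's topical interests)
    platform_pref:  dict topic -> dict platform -> probability
                    (the same user's topic-specific platform preferences)
    topic_words:    dict topic -> dict word -> probability
    All three layouts are assumptions made for this illustration.
    """
    # Draw the post's topic from the user's topical interests.
    topics = list(topic_interest)
    z = rng.choices(topics, weights=[topic_interest[t] for t in topics])[0]

    # Draw the platform from the user's platform preference for that topic.
    platforms = list(platform_pref[z])
    p = rng.choices(platforms, weights=[platform_pref[z][s] for s in platforms])[0]

    # Draw each word of the post from the topic's word distribution.
    vocab = list(topic_words[z])
    words = rng.choices(vocab, weights=[topic_words[z][w] for w in vocab], k=n_words)
    return z, p, words
```

Calling `sample_post(..., rng=random.Random(0))` with small hand-written dictionaries gives a reproducible toy post, which is enough to see how a topic-specific platform preference shapes where a post ends up.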
  9. Linked User Dataset

    1) Retrieve a set of ~200K Singapore-based Twitter users
    2) Obtain subset of users who mentioned Instagram/Tumblr accounts in Twitter bios
    3) Obtain subset of users who are active users in at least 2 of the 3 platforms
    Result: 2785 users

    Breakdown of users in each platform pair

                 Twitter   Instagram   Tumblr
    Twitter      2696      2446        272
    Instagram              2537        111
    Tumblr                             362
  10. Linked User Dataset

    Breakdown for types of users' 2015 posts in each platform (Twitter, Instagram, Tumblr)
    Text: 4,923,083; 135,853
    Image: 223,325; 515,530
    Video: 27,015

    Problem: ~23% of image and video posts are not annotated by users
    Use Clarifai [1] to generate word tags for all collected photos and videos

    [1] https://www.clarifai.com/
  11. Performance Evaluation

    Evaluate MultiPlatform-LDA model in 2 aspects:
    1. Effectiveness in modeling topics in content from multiple social media platforms
    2. Accuracy of predicting users' platform choices as they generate posts
    Evaluate against TwitterLDA [2]

    [2] Comparing Twitter and traditional media using topic models (in ECIR'11)
  12. Post Content Modeling

    Evaluate effectiveness in modeling topics using the likelihood of the training set and the perplexity of the test set
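For reference, perplexity is conventionally the exponential of the negative average per-token log-likelihood, so lower values mean the model predicts held-out text better. The sketch below uses that standard definition. The function name and its inputs (one natural-log likelihood and one token count per test post) are assumptions for illustration; the slides do not show how the authors computed the values.

```python
import math

def perplexity(log_likelihoods, token_counts):
    """Perplexity = exp(-(total log-likelihood) / (total number of tokens)).

    log_likelihoods: natural-log likelihood of each test post under the model
    token_counts:    number of word tokens in each test post
    """
    total_tokens = sum(token_counts)
    if total_tokens == 0:
        raise ValueError("need at least one token to compute perplexity")
    return math.exp(-sum(log_likelihoods) / total_tokens)
```

As a sanity check, a model that assigns every token probability 1/4 has perplexity 4, whatever the number or length of the posts.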
  13. Platform Choice Prediction

    Task: Predict user's platform choices given the content of the test posts

    MultiPlatform-LDA
    - Assign the post's topic using the trained model
    - Select most probable platform for the assigned post topic based on the user's topic-specific platform distribution

    TwitterLDA
    - Assign the post's topic using the trained model
    - Select the most popular platform choice for the assigned topic according to the training set
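The two selection rules above differ only in where the platform comes from once a topic has been assigned. A minimal sketch, assuming the topic assignment has already been done by the trained model and that the learnt distributions are available as plain dictionaries (all function names and data layouts here are hypothetical):

```python
from collections import Counter, defaultdict

def predict_multilda(topic, user_platform_pref):
    """Pick the most probable platform for this topic for ONE user.

    user_platform_pref: dict topic -> dict platform -> probability
    (assumed layout for the user's topic-specific platform distribution).
    """
    dist = user_platform_pref[topic]
    return max(dist, key=dist.get)

def fit_topic_popularity(training_posts):
    """Find the most frequent platform per topic across all training posts.

    training_posts: iterable of (topic, platform) pairs.
    """
    counts = defaultdict(Counter)
    for topic, platform in training_posts:
        counts[topic][platform] += 1
    return {t: c.most_common(1)[0][0] for t, c in counts.items()}

def predict_baseline(topic, popular_platform):
    """Baseline rule: ignore the user and return the topic's popular platform."""
    return popular_platform[topic]

# Toy data: music posts are mostly on Tumblr overall,
# but this particular user posts about music on Twitter.
training = [("music", "Tumblr")] * 3 + [("music", "Twitter")]
popular = fit_topic_popularity(training)
user_pref = {"music": {"Twitter": 0.8, "Tumblr": 0.2}}
```

With this toy data, `predict_baseline("music", popular)` returns "Tumblr" while `predict_multilda("music", user_pref)` returns "Twitter", which is the kind of disagreement that arises when a user's own preference departs from the overall trend.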
  14. User Topic Similarity Between Platforms

    Given a pair of platforms, compute the Jensen-Shannon Divergence (JSD) between a user's topic distribution on the two platforms
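The Jensen-Shannon Divergence of two distributions P and Q is the average of KL(P || M) and KL(Q || M), where M is their midpoint. A small sketch follows, using base-2 logarithms so the result lies between 0 (identical distributions) and 1 (no overlap). The function names are illustrative, and the inputs are assumed to be probability vectors over the same topics in the same order.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

A user whose topic distributions on two platforms are identical scores 0, and a user who posts about entirely different topics on each platform scores 1.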
  15. Summary

    • Proposed a novel topic model, MultiPlatform-LDA, which jointly models social media topics as well as users' platform preferences
    • Evaluated our model with real-world datasets and benchmarked it against an existing model
    • Empirically showed that users exhibit different topical preferences across platforms

    For code & data: http://workbench.roylee.sg/
  16. Case Studies

    UserId   #Twt Posts   #Ins Posts   #Tum Posts   TwitterLDA Acc   MultiLDA Acc
    U1659    95           20           -            0.083            0.916
    U2709    224          -            12           1                0.875

    • Individual Preferences Exist: Many of U1659's Twitter posts fall into the Music and Entertainment topic, which is popular on Tumblr, thus TwitterLDA wrongly predicted the posts' platform.
    • Advantage of Popular Topics: Some of U2709's Tumblr posts in the Music and Entertainment topic are wrongly predicted by MultiLDA, as the user has not published posts on this topic on Tumblr in the training data set.