rinkou_02

tom--bo
July 20, 2016

Learning "Information network or Social network?: the structure of the twitter follow graph"

Transcript

  1. Information network or Social network?:
    the structure of the Twitter follow graph
    tom__bo
    Proceedings of the 23rd International Conference on World Wide Web,
    Seth A. Myers, Aneesh Sharma, Pankaj Gupta, Jimmy Lin
    2014, Pages 494-498

  2. Introduction
    • This paper provides
      • a characterization of the topological features of the Twitter follow graph
      • analyzing properties such as ...
        • degree distributions
        • connected components
        • shortest path lengths
        • clustering coefficients
        • degree assortativity
    • Goal
      1. present a set of authoritative descriptive statistics
      2. use these characterizations to offer new insight into the question
         "Is Twitter a social network or an information network?"

  3. Definition: Social network / Information network
    • Social network
      • high degree assortativity
      • small shortest path lengths
      • large connected components
      • high clustering coefficients
      • high degree of reciprocity
    • Information network
      • large vertex degrees
      • a lack of reciprocity
      • large two-hop neighborhoods

  4. Data
    • Twitter follow graph (second half of 2012)
      • 175 million active users
      • 20 billion edges
    • Complete graph and 3 country subgraphs
      • Brazil (BR)
      • Japan (JP)
      • United States (US)
    • Graph
      • Directed graph
      • 42% of edges reciprocated
      • (4 billion undirected edges in the mutual graph)
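
As a rough illustration of the reciprocity figure above, the following minimal sketch computes the fraction of reciprocated edges and the undirected "mutual" edges from a small edge list. The function names and the `(follower, followee)` tuple layout are assumptions made for this example, not something taken from the paper. As a sanity check on the slide's numbers, 42% of 20 billion directed edges corresponds to about 4.2 billion mutual pairs.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse edge (v, u) also exists.

    Assumes `edges` is an iterable of (follower, followee) pairs; self-loops
    are ignored because a user cannot follow themselves.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    reciprocated = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return reciprocated / len(edge_set)


def mutual_edges(edges):
    """Undirected edges of the mutual graph: pairs that follow each other."""
    edge_set = {(u, v) for u, v in edges if u != v}
    return {frozenset((u, v)) for (u, v) in edge_set if (v, u) in edge_set}


if __name__ == "__main__":
    toy = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
    print(reciprocity(toy))        # 2 of the 4 directed edges are reciprocated
    print(len(mutual_edges(toy)))  # one undirected mutual edge: {a, b}
```

Each reciprocated pair contributes two directed edges but only one undirected edge, which is why the mutual graph has about half as many edges as the reciprocated count.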

  5. Contrast
    • contrasted with studies of other social networks
      • Facebook
        • 721 million vertices
        • 68.7 billion undirected edges
      • MSN Messenger
        • 180 million vertices
        • 1.3 billion undirected edges

  6. Graph Characteristics: Degree Distributions
    • Twitter follow graph is directed.
      • inbound degree (in-degree)
        • the number of users who follow a given user
      • outbound degree (out-degree)
        • the number of users whom a given user follows
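
To make the two degree definitions concrete, here is a minimal sketch that counts in-degree and out-degree from a list of follow edges and then tabulates a degree distribution. The `(follower, followee)` edge layout and the function names are assumptions for illustration only.

```python
from collections import Counter


def degree_counts(edges):
    """Return (in_degree, out_degree) Counters for (follower, followee) edges."""
    in_degree = Counter()
    out_degree = Counter()
    for follower, followee in edges:
        out_degree[follower] += 1  # the follower follows one more account
        in_degree[followee] += 1   # the followee gains one more follower
    return in_degree, out_degree


def degree_distribution(degree_by_user, users):
    """Map each degree value to the number of users that have that degree.

    Users absent from `degree_by_user` are counted with degree 0.
    """
    return Counter(degree_by_user[u] for u in users)


if __name__ == "__main__":
    toy = [("a", "c"), ("b", "c"), ("c", "a")]
    in_deg, out_deg = degree_counts(toy)
    users = {"a", "b", "c"}
    print(dict(in_deg), dict(out_deg))
    print(dict(degree_distribution(in_deg, users)))
```

Plotting the resulting counts on log-log axes is the usual way to inspect whether a distribution is heavy-tailed.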

  7. Graph Characteristics: Degree Distributions

  8. Graph Characteristics: Degree Distributions
    • heavy tail (both in-degree and out-degree)
      • some users follow hundreds of thousands of accounts
      • celebrities choose to reciprocate the follows of their fans
    • "non-social" behavior
      • individuals can only maintain around 150 relationships
      • inconsistent with that of a social network
      • such out-degrees represent too many relationships to be social

  9. Graph Characteristics: Connected Components
    • strongly/weakly connected
      • weakly connected
        • connectivity ignores edge direction
      • strongly connected
        • every pair of vertices must be mutually reachable through directed paths
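
The difference between the two notions of connectivity can be sketched on a toy graph. The code below is a deliberately simple illustration (repeated forward/backward reachability), not a linear-time algorithm such as Tarjan's and not the method used in the paper; all function names are made up for this example.

```python
from collections import defaultdict


def _adjacency(edges):
    """Build successor and predecessor maps plus the vertex set."""
    succ, pred, nodes = defaultdict(set), defaultdict(set), set()
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
        nodes.update((u, v))
    return succ, pred, nodes


def _reachable(start, neighbors, allowed):
    """Vertices in `allowed` reachable from `start` (iterative DFS)."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in neighbors(v):
            if w in allowed and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def weakly_connected_components(edges):
    """Components when edge direction is ignored."""
    succ, pred, nodes = _adjacency(edges)
    remaining, components = set(nodes), []
    while remaining:
        start = next(iter(remaining))
        comp = _reachable(start, lambda v: succ[v] | pred[v], remaining)
        components.append(comp)
        remaining -= comp
    return components


def strongly_connected_components(edges):
    """Components in which every pair is mutually reachable by directed paths."""
    succ, pred, nodes = _adjacency(edges)
    remaining, components = set(nodes), []
    while remaining:
        start = next(iter(remaining))
        forward = _reachable(start, lambda v: succ[v], remaining)
        backward = _reachable(start, lambda v: pred[v], remaining)
        comp = forward & backward  # reachable in both directions
        components.append(comp)
        remaining -= comp
    return components


def largest_fraction(components):
    """Share of all vertices that fall in the largest component."""
    total = sum(len(c) for c in components)
    return max(len(c) for c in components) / total if total else 0.0


if __name__ == "__main__":
    toy = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 6)]
    print(largest_fraction(weakly_connected_components(toy)))
    print(largest_fraction(strongly_connected_components(toy)))
```

In the toy graph, vertex 4 is weakly connected to the cycle 1-2-3 but not strongly connected to it, because nothing leads back from 4.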

  10. Graph Characteristics: Connected Components
    • weakly connected component
      • largest => 92.9% of all active users
    • strongly connected component
      • largest => 68.7%
    • less than Facebook and MSN Messenger (99%)
    • Twitter graph is less well connected than would be expected of a social network.

  11. Graph Characteristics: Shortest Path Lengths
    • Shortest path length: the number of traversals along
      edges required to reach one vertex from another
    • Infeasible to identify exact shortest path lengths
    • HyperANF algorithm
      • probabilistic estimation
      • HyperLogLog counter
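
The iteration behind neighborhood-function estimators can be sketched with exact sets on a toy graph. As far as I understand it, HyperANF follows the same "grow each vertex's ball by one hop per round" scheme but replaces the exact sets below with HyperLogLog counters so that it scales to billions of edges. This sketch is therefore an exact, small-scale stand-in and not the algorithm itself; the function names are assumptions.

```python
def neighborhood_function(edges):
    """Return [N(0), N(1), ...], where N(t) is the number of ordered pairs
    (u, v) such that v is reachable from u in at most t directed hops."""
    nodes, succ = set(), {}
    for u, v in edges:
        nodes.update((u, v))
        succ.setdefault(u, set()).add(v)

    balls = {v: {v} for v in nodes}  # radius-0 ball: the vertex itself
    counts = [sum(len(b) for b in balls.values())]
    while True:
        new_balls = {}
        for v in nodes:
            ball = set(balls[v])
            for w in succ.get(v, ()):
                ball |= balls[w]  # everything a successor reached so far
            new_balls[v] = ball
        total = sum(len(b) for b in new_balls.values())
        if total == counts[-1]:  # no ball grew, so the iteration has converged
            return counts
        counts.append(total)
        balls = new_balls


def average_path_length(counts):
    """Mean shortest-path length over reachable pairs of distinct vertices."""
    pairs = counts[-1] - counts[0]
    if pairs == 0:
        return 0.0
    weighted = sum(t * (counts[t] - counts[t - 1]) for t in range(1, len(counts)))
    return weighted / pairs


if __name__ == "__main__":
    path = [("a", "b"), ("b", "c")]
    counts = neighborhood_function(path)
    print(counts, average_path_length(counts))
```

The difference N(t) - N(t-1) is the number of pairs at distance exactly t, which is all that is needed to recover the path-length distribution.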

  12. Graph Characteristics: Shortest Path Lengths

  13. Graph Characteristics: Shortest Path Lengths
    • Twitter average
      • 4.17
      • mutual => 4.05
    • Other
      • Facebook => 4.74
      • MSN Messenger => 6.6 (mutual)
    • Twitter follow graph exhibits properties that are consistent
      with a social network

  14. Graph Characteristics: Clustering Coefficient
    • the fraction of pairs of a user's friends who are themselves friends
    • Twitter's clustering coefficient is lower than Facebook's, but
      still high.
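
A minimal sketch of the local clustering coefficient on an undirected graph (for Twitter, this would be the mutual graph) may help make the definition concrete. The helper names are assumptions for this example, and vertices with fewer than two neighbors are given a coefficient of 0 by convention.

```python
from itertools import combinations


def build_undirected(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj.get(v, set()) - {v}
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: no pairs of neighbors to check
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj.get(a, set()))
    return linked / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local coefficients over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj) if adj else 0.0


if __name__ == "__main__":
    # Triangle a-b-c with an extra vertex d attached to a.
    adj = build_undirected([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
    for v in sorted(adj):
        print(v, local_clustering(adj, v))
```

Averaging the local coefficient over vertices of the same degree gives a clustering-versus-degree curve, which is the kind of view in which a degree-dependent bump would show up.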

  15. Graph Characteristics: Clustering Coefficient
    • idiosyncrasy in the Japan subgraph
      • higher clustering coefficient
      • reciprocity is much higher
      • higher edge-to-vertex ratio
      • clustering coefficient increases at a degree of 200 and peaks at 1000
      • (possible explanation) members of cliques
    • Twitter mutual graph exhibits characteristics that are
      consistent with a social network

  16. Graph Characteristics: Two-Hop Neighborhoods
    • set of vertices that are neighbors of a vertex's neighbors
    • outbound/inbound
      • outbound two-hop neighborhood characterizes
        => "information gathering potential"
      • inbound two-hop neighborhood characterizes
        => "information dissemination potential"
    • unique/non-unique
      • unique: no overlap
      • non-unique: simply the sum of two-hop neighbors, counting overlaps
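
The unique versus non-unique distinction can be illustrated with a short sketch. It takes an adjacency map from each vertex to its neighbors: pass a followee map for the outbound neighborhood or a follower map for the inbound one. The function name is hypothetical, and whether the vertex itself or its direct neighbors should be excluded from the unique set is an assumption left open here (they are not excluded).

```python
def two_hop_sizes(neighbors_of, v):
    """Return (unique, non_unique) two-hop neighborhood sizes for vertex v.

    unique     : number of distinct vertices two hops away
    non_unique : sum of the neighbors' neighbor counts, overlaps included
    """
    unique = set()
    non_unique = 0
    for w in neighbors_of.get(v, ()):
        second_hop = neighbors_of.get(w, set())
        unique |= second_hop
        non_unique += len(second_hop)
    return len(unique), non_unique


if __name__ == "__main__":
    # a follows b and c; b follows d and e; c follows e and f.
    followees = {"a": {"b", "c"}, "b": {"d", "e"}, "c": {"e", "f"}}
    print(two_hop_sizes(followees, "a"))  # e is reached twice but counted once in unique
```

The ratio of unique to non-unique size gives a rough sense of how much the second hop overlaps, which relates to how efficiently information can spread or be gathered.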

  17. Graph Characteristics: Two-Hop Neighborhoods
    Twitter behaves efficiently as an information network.

  18. Graph Characteristics: Degree Assortativity
    • preference for a graph's vertices to attach to others that are
      similar (or dissimilar) in degree
    • between 0.1 and 0.4 in social networks
      • Facebook: 0.226

  19. Graph Characteristics: Degree Assortativity
    • SOD (source out-degree) vs. DOD (destination out-degree)
      • positive correlation 0.272
      • "the more people you follow, the more people that those people are
        likely to follow"
    • SID (source in-degree) vs. DOD
      • positive correlation 0.241
      • "the more popular you are, the people you follow will tend to follow
        more people"

  20. Graph Characteristics: Degree Assortativity
    • SOD vs. DID (destination in-degree)
      • negative correlation -0.118
      • "the more people you follow, the less popular those people are
        likely to be"
    • SID vs. DID
      • negative correlation -0.296
      • "the more popular you are, the less popular the people you follow
        are"
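
For a directed graph, the four assortativity measures pair a degree of the edge's source with a degree of its destination. The sketch below computes them as plain Pearson correlations over all edges, reading SOD/SID as source out-/in-degree and DOD/DID as destination out-/in-degree. Using raw (untransformed) degrees is an assumption of this example; the paper may preprocess degrees differently, and the function names are made up.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if degenerate)."""
    n = len(xs)
    if n == 0:
        return float("nan")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def degree_assortativity(edges):
    """Correlate source and destination degrees across (source, destination) edges."""
    edges = list(edges)
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    sod = [out_deg[u] for u, _ in edges]  # source out-degree
    sid = [in_deg[u] for u, _ in edges]   # source in-degree
    dod = [out_deg[v] for _, v in edges]  # destination out-degree
    did = [in_deg[v] for _, v in edges]   # destination in-degree
    return {
        "SOD-DOD": pearson(sod, dod),
        "SID-DOD": pearson(sid, dod),
        "SOD-DID": pearson(sod, did),
        "SID-DID": pearson(sid, did),
    }


if __name__ == "__main__":
    toy = [("a", "h"), ("b", "h"), ("c", "h"), ("h", "a"), ("a", "b")]
    for name, value in degree_assortativity(toy).items():
        print(name, round(value, 3))
```

In the toy graph, the popular vertex `h` follows a much less popular account, so the SID-DID correlation comes out negative, the same sign as reported on the slide.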

  21. Discussion
    • Twitter behaves more like an information network, but other
      analyses show that it exhibits characteristics consistent with
      social networks.
    • From a user's perspective, Twitter starts as an information network and
      evolves to behave more like a social network
      • A new user chooses popular accounts (preferential attachment)
      • ↓
      • As the user follows more people and becomes more "experienced",
        the user discovers a community with which to engage

  22. Future work and Conclusion
    • Summary
      • This paper presents evidence that Twitter differs from previously
        studied social networks
      • It also exhibits social properties
    • Hypothesis
      • There are two major "modes"
        • information consumption
        • reciprocated social ties
    • Further analysis of this mixture is needed
    • At an intuitive level, this hybrid structure seems plausible
