HypTrails: A Bayesian Approach for Comparing Hypotheses about Human Trails on the Web

Talk about HypTrails at WWW 2016

Philipp Singer

April 20, 2016

Transcript

  1. GESIS - Leibniz Institute for the Social Sciences
     HypTrails: A Bayesian Approach for Comparing Hypotheses about Human Trails on the Web
     Philipp Singer, Denis Helic, Andreas Hotho and Markus Strohmaier
     www.philippsinger.info/hyptrails
  2. Vannevar Bush
     “[The human brain] operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts.”
     Bush, V. (1945). As we may think. The Atlantic Monthly, 176(1):101–108.
     Image courtesy of brucesterling on Flickr.
     15.05.15 | HypTrails - Philipp Singer
  3. Human trails on the Web
     Image courtesy of user Mmxx on Wikipedia.
  4. Human trails on the Web
     What are the mechanisms producing human trails on the Web?
     Image courtesy of user Mmxx on Wikipedia.
  5. Example: Human navigational trails
     • Humans prefer to navigate …
       – H1: over semantically similar websites
       – H2: via self-loops (e.g., refreshing)
       – H3: by using the structural link network
       – H4: by preferring similar categories
       – H5: by utilizing structural properties
       – H6: by information scent
     [West et al. IJCAI 2009], [Singer et al. IJSWIS 2013], [West & Leskovec WWW 2012], [Chi et al. CHI 2001]
  6. Example: Human navigational trails
     What is the relative plausibility of these hypotheses given data?
  7. HypTrails in a nutshell
     • Goal: Express and compare hypotheses about human trails in a coherent research approach
     • Method:
       – First-order Markov chain model
       – Bayesian inference
     • Idea:
       – Incorporate hypotheses as priors
       – Utilize the sensitivity of the marginal likelihood to the prior
     • Outcome: Partial ordering of hypotheses
  8. Markov chain model
     • Stochastic model
     • Transition probabilities between states
     \[
     \begin{pmatrix}
     p_{1,1} & p_{1,2} & \cdots & p_{1,j} \\
     p_{2,1} & p_{2,2} & \cdots & p_{2,j} \\
     \vdots  & \vdots  & \ddots & \vdots  \\
     p_{i,1} & p_{i,2} & \cdots & p_{i,j}
     \end{pmatrix}
     \]
     [Figure: state diagram with states S1, S2, S3 and edge probabilities 1/2, 1/2, 1/3, 2/3, 1]
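A minimal sketch of how such transition probabilities could be estimated from observed trails, assuming each trail is a list of state labels. The function names and the toy trails are illustrative and do not come from the talk.

```python
from collections import defaultdict


def transition_counts(trails):
    """Count first-order transitions n[src][dst] over a list of trails."""
    counts = defaultdict(lambda: defaultdict(int))
    for trail in trails:
        # Pair every state with its successor in the same trail.
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1
    return counts


def transition_probabilities(counts):
    """Row-normalize counts into maximum-likelihood transition probabilities."""
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


# Hypothetical toy trails over three states.
trails = [["S1", "S2", "S3"], ["S1", "S3", "S1"], ["S2", "S3", "S1"]]
counts = transition_counts(trails)
probs = transition_probabilities(counts)

for src in sorted(probs):
    print(src, probs[src])
```

Each row of the estimated matrix sums to 1. States that never appear as a source simply have no row in this sketch.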
  9. Structure of HypTrails
     [Diagram: MC Model; Hypothesis (H1); Belief in parameters]
  10. Which hypothesis is the most plausible one?
  11. Bayesian model comparison: Marginal likelihood
      • Probability of data given hypothesis = Model evidence
  12. Bayesian model comparison: Marginal likelihood
      • Probability of data given hypothesis = Model evidence
      • Parameters are marginalized out
      • Probability of observing data given parameters and hypothesis
  13. Bayesian model comparison: Marginal likelihood
      • Probability of data given hypothesis = Model evidence
      • Parameters are marginalized out
      • Probability of observing data given parameters and hypothesis
      • Probability of parameters before observing data
  14. Bayesian model comparison: Marginal likelihood
      • Probability of data given hypothesis = Model evidence
      • Parameters are marginalized out
      • Probability of observing data given parameters and hypothesis
      • Probability of parameters before observing data
      • Hypothesis
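The annotations on these slides describe the terms of the marginal likelihood, which can be written out as follows. The symbols D (data), θ (Markov chain parameters) and H (hypothesis) are notation chosen for this note, since the formula itself did not survive in the transcript.

```latex
P(D \mid H) = \int P(D \mid \theta, H)\, P(\theta \mid H)\, d\theta
```

Here P(D | H) is the model evidence, P(D | θ, H) is the probability of the data given parameters and hypothesis, P(θ | H) is the prior over the parameters before observing data, and the integral marginalizes the parameters out.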
  15. Structure of HypTrails
      [Diagram: MC Model; Hypothesis (H1); Belief in parameters; Elicitation; Prior (H1); Data (Trails); Marginal likelihood (H1); "Influence" (twice)]
  16. Eliciting priors
      • (Trial) roulette method
  17. Eliciting priors
      • (Trial) roulette method
  18. Eliciting priors
      • (Trial) roulette method
      Prior distribution
  19. Conjugate Dirichlet prior
      • Hyperparameters → pseudo counts
      MC parameters:
      \[
      \begin{pmatrix}
      p_{1,1} & p_{1,2} & \cdots & p_{1,j} \\
      p_{2,1} & p_{2,2} & \cdots & p_{2,j} \\
      \vdots  & \vdots  & \ddots & \vdots  \\
      p_{i,1} & p_{i,2} & \cdots & p_{i,j}
      \end{pmatrix}
      \]
      Dirichlet hyperparameters:
      \[
      \begin{pmatrix}
      \alpha_{1,1} & \alpha_{1,2} & \cdots & \alpha_{1,j} \\
      \alpha_{2,1} & \alpha_{2,2} & \cdots & \alpha_{2,j} \\
      \vdots       & \vdots       & \ddots & \vdots       \\
      \alpha_{i,1} & \alpha_{i,2} & \cdots & \alpha_{i,j}
      \end{pmatrix}
      \]
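Because the Dirichlet prior is conjugate, the marginal likelihood of a first-order Markov chain has a closed form in terms of Gamma functions. The sketch below assumes an independent Dirichlet prior per row and strictly positive hyperparameters. The function name, the toy counts and the two example priors are illustrative and are not taken from the talk.

```python
from math import lgamma


def log_evidence(counts, alphas):
    """Log marginal likelihood of transition counts under row-wise Dirichlet priors.

    counts[i][j]: observed number of transitions from state i to state j
    alphas[i][j]: Dirichlet hyperparameters (pseudo counts), all > 0
    """
    total = 0.0
    for n_row, a_row in zip(counts, alphas):
        # Normalizing constant of the prior for this row.
        total += lgamma(sum(a_row)) - sum(lgamma(a) for a in a_row)
        # Normalizing constant of the posterior (prior plus counts), inverted.
        total += sum(lgamma(n + a) for n, a in zip(n_row, a_row))
        total -= lgamma(sum(n_row) + sum(a_row))
    return total


# Hypothetical transition counts for three states.
counts = [[0, 8, 2], [1, 0, 9], [9, 1, 0]]

# Two hypothetical priors: a flat one and one that roughly matches the data.
uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
informed = [[1, 5, 2], [1, 1, 5], [5, 1, 1]]

print("uniform :", log_evidence(counts, uniform))
print("informed:", log_evidence(counts, informed))
```

Working in log space with `lgamma` avoids overflow for realistic trail counts. In this toy example the prior that matches the data receives the higher evidence.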
  20. Eliciting priors from hypotheses about human trails
      • Adaption of (trial) roulette method
      #Chips = k: Strength of hypothesis
      k = 18
  21. Eliciting priors from hypotheses about human trails
      • Adaption of (trial) roulette method
      #Chips = k: Strength of hypothesis
      k = 18
      → Dirichlet hyperparameters
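A rough sketch of how chips could be turned into Dirichlet hyperparameters for a single state. The row-wise allocation, the largest-remainder rounding and the uniform proto-prior of 1 are assumptions of this sketch, not details recoverable from the slides, and `elicit_row` is a hypothetical helper name.

```python
def elicit_row(beliefs, chips):
    """Turn one row of hypothesis beliefs into Dirichlet pseudo counts.

    beliefs: non-negative belief values for transitions out of one state
    chips:   number of chips (hypothesis strength) to distribute over the row
    """
    n = len(beliefs)
    total = sum(beliefs)
    if total == 0:
        # No belief expressed for this state: spread chips evenly.
        shares = [chips / n] * n
    else:
        shares = [chips * b / total for b in beliefs]

    # Place whole chips first, then hand out the leftovers to the cells
    # with the largest fractional parts (largest-remainder rounding).
    alloc = [int(s) for s in shares]
    leftover = chips - sum(alloc)
    order = sorted(range(n), key=lambda j: shares[j] - alloc[j], reverse=True)
    for j in order[:leftover]:
        alloc[j] += 1

    # Assumed uniform proto-prior: one pseudo count per cell.
    return [1 + a for a in alloc]


# Hypothetical belief row, using k = 18 chips as on the slide.
print(elicit_row([0, 1, 2], 18))
```

More chips make the resulting prior more concentrated around the hypothesis, which is how the strength of belief enters the comparison.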
  22. Example: Structural hypothesis
      [Figure: input hypothesis and output Dirichlet prior; panels with axes α, i and nr. of chips over states 1, 2, 3]
  23. Example: Structural hypothesis
  24. Example: Structural hypothesis
  25. Example: Structural hypothesis
  26. Structure of HypTrails
      [Diagram: MC Model; Data (Trails); Hypothesis (H1), Prior (H1), Marginal likelihood (H1); Hypothesis (H2), Prior (H2), Marginal likelihood (H2); Compare]
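The "Compare" step is commonly expressed as a Bayes factor, the ratio of the two marginal likelihoods. The notation below (D for the trail data, B_{1,2} for the Bayes factor) is an assumption of this note and does not appear on the slide.

```latex
B_{1,2} = \frac{P(D \mid H_1)}{P(D \mid H_2)}, \qquad
\log B_{1,2} = \log P(D \mid H_1) - \log P(D \mid H_2)
```

A value of B_{1,2} above 1, or equivalently a positive log Bayes factor, favors H1 over H2.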
  27. Demonstration of general applicability
      • Synthetic data
      • Human song trails (Last.fm)
      • Human review trails (Yelp)
      • Human navigation trails (Wikigame)
  28. Wikigame
      [Plot: evidence (scale 1e8) against hypothesis weighting factor k (0 to 4) for the hypotheses uniform, self-loop, structural, similarity. Annotations: "Higher plausibility", "Higher belief (more chips)"]
  29. Wikigame
      [Plot: evidence (scale 1e8) against hypothesis weighting factor k (0 to 4) for the hypotheses uniform, self-loop, structural, similarity]
  30. Summary
      • Studying mechanisms producing human trails
      • HypTrails: A coherent approach for expressing and comparing hypotheses about human trails
      • Can be applied to all kinds of human trails
      • Implementations: www.philippsinger.info/hyptrails
  31. GESIS - Leibniz Institute for the Social Sciences
      Thanks for your attention!
      @ph_singer
      www.philippsinger.info
      www.philippsinger.info/hyptrails
  32. References 1/2
      • [West et al. WWW 2015] Robert West, Ashwin Paranjape, and Jure Leskovec: Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia. 24th International World Wide Web Conference (WWW'15), Florence, Italy, 2015.
      • [De Choudhury et al. HT 2010] Munmun De Choudhury, Moran Feldman, Sihem Amer-Yahia, Nadav Golbandi, Ronny Lempel, and Cong Yu: Automatic construction of travel itineraries using social breadcrumbs. 21st ACM Conference on Hypertext and Hypermedia, 2010.
      • [Bestavros CIKM 1995] Azer Bestavros: Using speculation to reduce server load and service time on the WWW. 4th International Conference on Information and Knowledge Management, 1995.
      • [Perkowitz IJCAI 1997] Mike Perkowitz and Oren Etzioni: Adaptive web sites: an AI challenge. 15th International Joint Conference on Artificial Intelligence, 1997.
      • [West et al. IJCAI 2009] Robert West, Joelle Pineau, and Doina Precup: Wikispeedia: An Online Game for Inferring Semantic Distances between Concepts. IJCAI, 2009.
  33. References 2/2
      • [Singer et al. IJSWIS 2013] Philipp Singer, Thomas Niebler, Markus Strohmaier, and Andreas Hotho: Computing Semantic Relatedness from Human Navigational Paths: A Case Study on Wikipedia. International Journal on Semantic Web and Information Systems (IJSWIS), 9(4):41-70, 2013.
      • [West & Leskovec WWW 2012] Robert West and Jure Leskovec: Human Wayfinding in Information Networks. 21st International World Wide Web Conference (WWW'12), pp. 619–628, Lyon, France, 2012.
      • [Chi et al. CHI 2001] Ed H. Chi, et al.: Using information scent to model user information needs and actions and the Web. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 2001.