Human-centric evaluation of similarity spaces of news articles


In this paper we present a practical approach to evaluating similarity spaces of news articles, guided by human perception. This is motivated by applications that modern news audiences expect, most notably recommender systems. Our approach is laid out and contextualised with a brief background in human similarity measurement and perception. This is complemented by a discussion of computational methods for measuring similarity between news articles. We then walk through a prototypical use of the evaluation in a practical setting before pointing to future work enabled by this framework.
paper at http://ceur-ws.org/Vol-2411/paper10.pdf


Ben Fields

July 25, 2019

Transcript

  1. CLARA HIGUERA CABAÑES, MICHEL SCHAMMEL, SHIRLEY KA KEI YU, BEN FIELDS · 25 JULY 2019 · NEWSIR WORKSHOP @SIGIR2019 · HUMAN-CENTRIC EVALUATION OF SIMILARITY SPACES OF NEWS ARTICLES · https://www.flickr.com/photos/woolamaloo_gazette/47571470732/ · SLIDES: bit.ly/newsirbbc
  2. Article similarity can be an effective means to recommend news to readers.
  3. Problem: we need computed content similarity to match (mostly) people’s perception of news article similarity.
  4. HOW DO HUMANS PERCEIVE SIMILARITY?
  5. Or rather: how can we efficiently measure the perception of similarity?
  6. TRIANGLE TESTS (HUMAN PERCEPTION). A proposed methodology:
     1. Gather a collection of anchor articles from your corpus.
     2. For each anchor, select two additional articles for comparison.
     3. Present each of these triplets in turn to a human evaluator, asking the evaluator to decide which of the two articles is more similar to the anchor.
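     The triplet-construction step of this methodology can be sketched in a few lines of Python. This is a minimal illustration, not the talk's actual tooling: the function name, the random sampling of anchors, and the random choice of comparison articles are all assumptions.

     ```python
     import random

     def make_triplets(corpus_ids, n_anchors=3, seed=0):
         """Build (anchor, candidate_a, candidate_b) triangle-test triplets.

         Hypothetical helper: anchors are sampled from the corpus, and each
         anchor is paired with two other randomly chosen articles. A real
         setup might pick candidates by model-predicted similarity instead.
         """
         rng = random.Random(seed)
         anchors = rng.sample(corpus_ids, n_anchors)
         triplets = []
         for anchor in anchors:
             others = [i for i in corpus_ids if i != anchor]
             a, b = rng.sample(others, 2)
             triplets.append((anchor, a, b))
         return triplets
     ```

     Each triplet is then shown to an evaluator, who picks whichever of the two candidates feels more similar to the anchor.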
  7. HOW CAN MACHINES COMPUTE SIMILARITY?
  8. COMPUTER READABLE REPRESENTATION (MACHINE PERCEPTION). Each article is read into a feature vector: (a1, a2, a3, a4, …, an), (b1, b2, b3, b4, …, bn), (c1, c2, c3, c4, …, cn), (d1, d2, d3, d4, …, dn).
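     As a toy illustration of turning an article into such a vector, here is a bag-of-words sketch over a tiny fixed vocabulary. The vocabulary and article text are made up for illustration; the talk does not specify which representation feeds the pipeline.

     ```python
     from collections import Counter

     def bag_of_words(article, vocabulary):
         """Represent an article as a term-count vector (a1, ..., an)
         over a fixed vocabulary (illustrative, not the talk's pipeline)."""
         counts = Counter(article.lower().split())
         return [counts[term] for term in vocabulary]

     vocab = ["brexit", "border", "hospital"]
     vec = bag_of_words("Brexit border talks stall as Brexit deadline nears", vocab)
     # "brexit" appears twice, "border" once, "hospital" not at all
     ```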
  9. LATENT DIRICHLET ALLOCATION (MACHINE PERCEPTION). Matrix of docs × topics:

     Docs                                        1    2    3    4    5    6    ...
     The Irish border Brexit backstop            0.7  0    0    0    0.1  0
     Scotland to get AI health research centre   0    0    0.9  0    0    0.1

     Matrix of words × topics:

     Words       1    2    3    4    5    6    ...
     brexit      0.6  0.3  0    0    0    0
     hospital    0    0    0.8  0.2  0    0
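     The slide's document–topic matrix can be written out as plain Python data to show how an article maps to a topic distribution. The `dominant_topic` helper is a hypothetical convenience, not part of the talk's pipeline.

     ```python
     # Document-topic weights copied from the slide (remaining mass omitted)
     doc_topics = {
         "The Irish border Brexit backstop":          [0.7, 0.0, 0.0, 0.0, 0.1, 0.0],
         "Scotland to get AI health research centre": [0.0, 0.0, 0.9, 0.0, 0.0, 0.1],
     }

     def dominant_topic(title):
         """Return the 0-based index of the article's highest-weight topic."""
         weights = doc_topics[title]
         return max(range(len(weights)), key=weights.__getitem__)
     ```

     So the Brexit article is dominated by topic 1 (index 0), which the topic–word matrix associates with words like "brexit".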
  10. SIMILARITY MEASURES (MACHINE PERCEPTION). Topic vectors are discrete probability distributions, so similarity can be measured with the Kullback-Leibler divergence (relative entropy), i.e. the information gain between distributions. Pairwise example from the slide: for "The Irish border Brexit backstop" (0.7, 0, 0.2, 0, 0.1, 0) and "Scotland to get AI health research centre" (0, 0, 0.9, 0, 0, 0.1), KL = 6.74. Smaller pairwise KL distances mean more similar articles; larger distances mean more different ones.
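     A minimal sketch of the KL divergence between two topic distributions, assuming simple epsilon smoothing to cope with zero entries (the talk does not say how zeros are handled, so the smoothing constant and the exact numbers it yields are assumptions):

     ```python
     import math

     def kl_divergence(p, q, eps=1e-9):
         """Kullback-Leibler divergence D(p || q) between two discrete
         distributions, with epsilon smoothing to avoid log(0).
         Renormalises after smoothing so both inputs sum to 1."""
         p = [x + eps for x in p]
         q = [x + eps for x in q]
         zp, zq = sum(p), sum(q)
         return sum((pi / zp) * math.log((pi / zp) / (qi / zq))
                    for pi, qi in zip(p, q))
     ```

     Note that KL is asymmetric (D(p || q) ≠ D(q || p) in general), so a pipeline must fix a convention for which article is p and which is q, or symmetrise.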
  11. A PROTOTYPICAL CASE
  12. How can we measure alignment between humans and machines?
  13. RUNNING TRIANGLE TESTS (PROTOTYPICAL CASE). For a base article a1, compute the KL distances from a1 to candidate articles a2, a3, a4, a5. Then ask evaluators: which article is more similar to a1, a2 or a5? Sample: 12 journalists.
  14. RUNNING TRIANGLE TESTS (PROTOTYPICAL CASE)
  15. ALIGNMENT CASE STUDY: % of answers aligned with the algorithm per user. Random chance: 0.516. 30-topic model: average agreement 54%. 50-topic model: average agreement 71%. 70-topic model: average agreement 62%.
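     The per-user agreement figures above can be computed with a one-liner once the human and machine choices for each triplet are collected. The data layout (parallel lists of chosen article ids) is an assumption for illustration.

     ```python
     def agreement_rate(human_choices, machine_choices):
         """Fraction of triangle tests where a user's pick matches the
         model's KL-based pick. Inputs are parallel lists of article ids
         (hypothetical layout, one entry per triplet shown)."""
         matches = sum(h == m for h, m in zip(human_choices, machine_choices))
         return matches / len(human_choices)
     ```

     Averaging this rate across users gives the per-model agreement scores reported on the slide, compared against random chance.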
  16. CONCLUSIONS AND FUTURE WORK
     • Content similarity recommenders: use LDA for an automatic topic scoring pipeline.
     • Potential in capturing alignment between human and machine perception.
     • Tests could be scaled to a much larger population to more formally assess a similarity model.
  17. THANKS! LET’S HAVE SOME QUESTIONS! CLARA HIGUERA CABAÑES, MICHEL SCHAMMEL, SHIRLEY KA KEI YU, BEN FIELDS · 25 JULY 2019 · NEWSIR WORKSHOP @SIGIR2019 · HUMAN-CENTRIC EVALUATION OF SIMILARITY SPACES OF NEWS ARTICLES · HTTP://CEUR-WS.ORG/VOL-2411/PAPER9.PDF · SLIDES: bit.ly/newsirbbc