Human-centric evaluation of similarity spaces of news articles

CLARA HIGUERA CABAÑES, MICHEL SCHAMMEL, SHIRLEY KA KEI YU, BEN FIELDS
25 JULY 2019
NEWSIR WORKSHOP @SIGIR2019
https://www.flickr.com/photos/woolamaloo_gazette/47571470732/
SLIDES: bit.ly/newsirbbc

2 Article similarity can be an effective means to recommend news to readers

3 Problem: We need computed content similarity to match (mostly)
people’s perception of news article similarity

4 HOW DO HUMANS PERCEIVE SIMILARITY?

5 Or rather: how can we efficiently measure the perception of similarity?

6 TRIANGLE TESTS HUMAN PERCEPTION
A proposed methodology:
1. Gather a collection of anchor articles from your corpus.
2. For each anchor, select two additional articles for comparison.
3. Present each of these triplets in turn to a human evaluator, asking the evaluator to decide which of the two articles is more similar to the anchor.
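The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the `Triplet` class, the `build_triplets` function, and the article ids are hypothetical, and choosing the two comparison articles at random is an assumption, since the slides do not say how they were selected.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    anchor: str
    option_a: str
    option_b: str


def build_triplets(corpus, n_anchors, seed=0):
    """Pick anchors, then two distinct comparison articles for each anchor."""
    if len(corpus) < 3:
        raise ValueError("need at least three articles to form a triplet")
    rng = random.Random(seed)
    # Step 1: gather a collection of anchor articles from the corpus.
    anchors = rng.sample(corpus, min(n_anchors, len(corpus)))
    triplets = []
    for anchor in anchors:
        # Step 2: select two additional articles (random choice is an assumption).
        others = [article for article in corpus if article != anchor]
        option_a, option_b = rng.sample(others, 2)
        triplets.append(Triplet(anchor, option_a, option_b))
    return triplets


# Hypothetical article ids; step 3 would show each triplet to an evaluator.
corpus = ["a1", "a2", "a3", "a4", "a5"]
triplets = build_triplets(corpus, n_anchors=3)
for t in triplets:
    print(f"Which is more similar to {t.anchor}: {t.option_a} or {t.option_b}?")
```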

7 HOW CAN MACHINES COMPUTE SIMILARITY?

8 COMPUTER READABLE REPRESENTATION MACHINE PERCEPTION
Article read
(a1, a2, a3, a4, …, an)
(b1, b2, b3, b4, …, bn)
(c1, c2, c3, c4, …, cn)
(d1, d2, d3, d4, …, dn)

9 LATENT DIRICHLET ALLOCATION MACHINE PERCEPTION
Matrix of docs (rows: articles, columns: topics)
Docs                                        1    2    3    4    5    6    ...
The Irish border Brexit backstop            0.7  0    0    0    0.1  0
Scotland to get AI health research centre   0    0    0.9  0    0    0.1
...
Matrix of topics (rows: words, columns: topics)
Words                                       1    2    3    4    5    6    ...
brexit                                      0.6  0.3  0    0    0    0
hospital                                    0    0    0.8  0.2  0    0
...
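To make the two LDA matrices concrete, the sketch below stores the values shown on the slide as plain Python dictionaries and reads off the strongest topic for each article and the strongest word for that topic. The helper names `dominant_topic` and `top_word` are hypothetical, and the numbers are only the illustrative slide values, not output from a fitted model.

```python
# Article-by-topic weights copied from the slide (topics 1 to 6).
doc_topics = {
    "The Irish border Brexit backstop": [0.7, 0, 0, 0, 0.1, 0],
    "Scotland to get AI health research centre": [0, 0, 0.9, 0, 0, 0.1],
}

# Word-by-topic weights copied from the slide (topics 1 to 6).
word_topics = {
    "brexit": [0.6, 0.3, 0, 0, 0, 0],
    "hospital": [0, 0, 0.8, 0.2, 0, 0],
}


def dominant_topic(weights):
    """Return the 1-based index of the highest-weighted topic."""
    return max(range(len(weights)), key=lambda i: weights[i]) + 1


def top_word(topic, word_topics):
    """Return the word with the highest weight for a 1-based topic index."""
    return max(word_topics, key=lambda word: word_topics[word][topic - 1])


for title, weights in doc_topics.items():
    topic = dominant_topic(weights)
    print(f"{title}: topic {topic}, top word '{top_word(topic, word_topics)}'")
```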

10 SIMILARITY MEASURES MACHINE PERCEPTION
• Discrete probability distributions
• Kullback-Leibler divergence or relative entropy
• Information gain between distributions
Docs                                        1    2    3    4    5    6    ...
The Irish border Brexit backstop            0.7  0    0.2  0    0.1  0
Scotland to get AI health research centre   0    0    0.9  0    0    0.1
KL = 6.74
[Figure: KL pairwise distances, from Similar to Different]
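A minimal sketch of the Kullback-Leibler divergence between two topic distributions, KL(P || Q) = sum_i p_i * log(p_i / q_i), follows. The additive smoothing with `eps` is an assumption added here because the slide's example rows contain zeros, which would otherwise make the divergence infinite. The slides do not state the smoothing or the log base behind the reported value, so this sketch is not expected to reproduce that number.

```python
import math


def smooth(dist, eps=1e-9):
    """Add a small constant to every entry and renormalise (assumed smoothing)."""
    total = sum(dist) + eps * len(dist)
    return [(x + eps) / total for x in dist]


def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    p, q = smooth(p, eps), smooth(q, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Topic distributions of the two example articles from the slide.
irish_border = [0.7, 0, 0.2, 0, 0.1, 0]
scotland_ai = [0, 0, 0.9, 0, 0, 0.1]

print(kl_divergence(irish_border, scotland_ai))
print(kl_divergence(scotland_ai, irish_border))  # KL is not symmetric
```

Because KL divergence is asymmetric, a pipeline built on it has to fix a direction (anchor to candidate, or the reverse) or use a symmetrised variant; the slides do not say which was used.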

11 A PROTOTYPICAL CASE

12 How can we measure alignment between humans and machines?

13 RUNNING TRIANGLE TESTS PROTOTYPICAL CASE
[Figure: KL distribution of base article a1, with articles a2, a3, a4, a5]
Which article is more similar to a1: a2 or a5?
Sample of 12 journalists
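For a triangle test like this one, the machine's answer can be read directly from the KL values: the candidate with the smaller divergence from the base article counts as more similar. The sketch below assumes that rule; the function name `machine_choice` and the divergence values are hypothetical and are not taken from the slides.

```python
def machine_choice(kl_from_anchor, option_a, option_b):
    """Return the candidate with the smaller KL divergence from the anchor."""
    return min((option_a, option_b), key=lambda article: kl_from_anchor[article])


# Hypothetical KL divergences from base article a1 to each other article.
kl_from_a1 = {"a2": 0.8, "a3": 2.1, "a4": 4.0, "a5": 6.5}

print(machine_choice(kl_from_a1, "a2", "a5"))  # prints "a2"
```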

14 RUNNING TRIANGLE TESTS PROTOTYPICAL CASE

15 ALIGNMENT CASE STUDY
[Figure: % of answers aligned with algorithm per user]
30 topic model: average agreement 54%
50 topic model: average agreement 71%
70 topic model: average agreement 62%
Random chance: 0.516
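The per-user alignment figure and the average agreement can be computed as sketched below, assuming each evaluator answers the same ordered list of triplets and the algorithm's pick for each triplet is already known. All answers here are hypothetical placeholders, not the study data.

```python
from statistics import mean


def agreement_rate(human_answers, machine_answers):
    """Fraction of triplets where the human picked the same article as the algorithm."""
    if not human_answers or len(human_answers) != len(machine_answers):
        raise ValueError("answer lists must be non-empty and of equal length")
    matches = sum(h == m for h, m in zip(human_answers, machine_answers))
    return matches / len(human_answers)


# Hypothetical algorithm picks for four triplets.
machine = ["a2", "a3", "a5", "a2"]

# Hypothetical evaluator answers for the same four triplets.
humans = {
    "evaluator_1": ["a2", "a3", "a4", "a2"],
    "evaluator_2": ["a5", "a3", "a5", "a2"],
    "evaluator_3": ["a2", "a4", "a4", "a4"],
}

per_user = {user: agreement_rate(answers, machine) for user, answers in humans.items()}
average_agreement = mean(per_user.values())

for user, rate in per_user.items():
    print(f"{user}: {rate:.0%} of answers aligned with algorithm")
print(f"Average agreement: {average_agreement:.0%}")
```

Repeating this for each candidate model (for example the 30, 50 and 70 topic models) gives one average agreement per model, which can then be compared against the chance level.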

16 CONCLUSIONS AND FUTURE WORK
• Content similarity recommenders: Use LDA for automatic topic scoring pipeline
• Potential in capturing alignment between human and machine perception
• Tests could be scaled to a much larger population to more formally assess a similarity model

THANKS! LET’S HAVE SOME QUESTIONS!
PAPER: HTTP://CEUR-WS.ORG/VOL-2411/PAPER9.PDF
SLIDES: bit.ly/newsirbbc
