People in the Loop Machine Learning: A Case Study in News Similarity

PEOPLE IN THE LOOP MACHINE LEARNING: A CASE STUDY IN NEWS SIMILARITY
DR BEN FIELDS
21 NOVEMBER 2019
https://www.flickr.com/photos/woolamaloo_gazette/47571470732/
SLIDES: http://bit.ly/newssimbbc

2 STRUCTURE
• Intro
• Machine Learning at the BBC
• Human vs Machine Similarity
• Content-based News Recommenders
• Conclusions

3 MACHINE LEARNING AT THE BBC

4 OVERVIEW ML AT THE BBC
Categories: Audience data, Content data, Audience-facing, Internal-facing
• Audience segmentation
• Starfruit (autotagger)
• Mango (NER)
• Topic Segmentor
• Article Recommendation
• VoD Recommendation
• Kids App and Keyboard
• Content Origin Graph

5 WHY BUILD IN HOUSE? ML AT THE BBC
• Other BBC products use 3rd party solutions
• Domain is weird, our product expires!
• ML aligned with BBC values
  ‣ Inform, educate and entertain
  ‣ Context of public service algorithm
  ‣ Transparency
• Keep editorial control of automated systems
• Multiple language support

6 ML AT THE BBC OWN IT

7 ML AT THE BBC OWN IT

8 ML AT THE BBC ENTITIES, TOPICS, AND THINGS, OH MY!
• Starfruit: Autotagger
• Mango: Named Entity Recogniser
• BBC Things: Linked Data Store and Ontology

11 ML AT THE BBC WORLD SERVICE NEWS RECS

14 HUMAN VS MACHINE SIMILARITY

15 Article similarity can be an effective means of recommending news to readers

16 Problem: We need computed content similarity to match (mostly)
people’s perception of news article similarity

17 HOW DO HUMANS PERCEIVE SIMILARITY?

18 Or rather: how can we efficiently measure the perception of similarity?

19 TRIANGLE TESTS HUMAN PERCEPTION
A proposed methodology:
1. Gather a collection of anchor articles from your corpus.
2. For each anchor, select two additional articles for comparison.
3. Present each of these triplets in turn to a human evaluator, asking the evaluator to decide which of the two articles is more similar to the anchor.
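The three steps of the triangle test can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the function name `build_triplets`, the article IDs, and the choice to pick anchors and comparison articles uniformly at random are all assumptions, since the slides do not say how articles were selected.

```python
import random


def build_triplets(article_ids, n_anchors, seed=0):
    """Return (anchor, candidate_a, candidate_b) triplets for triangle tests.

    Assumes article_ids are unique. Random selection is an assumption made
    for illustration; the slides do not specify a selection strategy.
    """
    if len(article_ids) < 3:
        raise ValueError("need at least three articles to form a triplet")
    rng = random.Random(seed)
    # Step 1: gather a collection of anchor articles from the corpus.
    anchors = rng.sample(article_ids, min(n_anchors, len(article_ids)))
    triplets = []
    for anchor in anchors:
        # Step 2: select two additional articles to compare with the anchor.
        others = [a for a in article_ids if a != anchor]
        candidate_a, candidate_b = rng.sample(others, 2)
        triplets.append((anchor, candidate_a, candidate_b))
    return triplets


# Hypothetical corpus of five article IDs.
corpus = ["a1", "a2", "a3", "a4", "a5"]
triplets = build_triplets(corpus, n_anchors=3)

# Step 3: each triplet becomes one question for a human evaluator.
for anchor, cand_a, cand_b in triplets:
    print(f"Which article is more similar to {anchor}? {cand_a} or {cand_b}")
```

Fixing the seed keeps the set of questions identical across evaluators, which makes their answers directly comparable.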

20 HOW CAN MACHINES COMPUTE SIMILARITY?

21 COMPUTER READABLE REPRESENTATION MACHINE PERCEPTION
Each article is read into a vector of n values:
$(a_1, a_2, a_3, a_4, \dots, a_n)$
$(b_1, b_2, b_3, b_4, \dots, b_n)$
$(c_1, c_2, c_3, c_4, \dots, c_n)$
$(d_1, d_2, d_3, d_4, \dots, d_n)$

22 LATENT DIRICHLET ALLOCATION MACHINE PERCEPTION

Matrix of docs (rows: articles, columns: topics)
Docs                                        1    2    3    4    5    6    ...
The Irish border Brexit backstop            0.7  0    0    0    0.1  0
Scotland to get AI health research centre   0    0    0.9  0    0    0.1
...

Matrix of topics (rows: words, columns: topics)
Topics     1    2    3    4    5    6    ...
brexit     0.6  0.3  0    0    0    0
hospital   0    0    0.8  0.2  0    0
...
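One way to read the two matrices together is to combine an article's topic weights with a word's per-topic weights, which gives a rough score for how strongly the model associates that word with that article. The sketch below uses only the numbers shown on slide 22. It assumes the "matrix of topics" entries are per-topic word weights, and it covers only the six topics visible on the slide, so the scores are illustrative rather than true probabilities. The helper name `word_weight` is hypothetical.

```python
# Rows copied from slide 22; only the first six topics are shown there.
doc_topics = {
    "The Irish border Brexit backstop": [0.7, 0, 0, 0, 0.1, 0],
    "Scotland to get AI health research centre": [0, 0, 0.9, 0, 0, 0.1],
}
word_topics = {
    "brexit": [0.6, 0.3, 0, 0, 0, 0],
    "hospital": [0, 0, 0.8, 0.2, 0, 0],
}


def word_weight(doc, word):
    """Sum over topics of (topic weight in doc) * (word weight in topic)."""
    return sum(d * w for d, w in zip(doc_topics[doc], word_topics[word]))


for doc in doc_topics:
    for word in word_topics:
        print(f"{word!r} in {doc!r}: {word_weight(doc, word):.2f}")
```

With these rows, "brexit" scores highly only for the Brexit article and "hospital" only for the health research article, because each article shares a dominant topic with just one of the two words.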

23 SIMILARITY MEASURES MACHINE PERCEPTION
• Discrete probability distributions
• Kullback-Leibler divergence or relative entropy
• Information gain between distributions

Docs                                        1    2    3    4    5    6    ...
The Irish border Brexit backstop            0.7  0    0.2  0    0.1  0
Scotland to get AI health research centre   0    0    0.9  0    0    0.1

KL = 6.74
[Figure: KL pairwise distances, on a scale from Similar to Different]
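For two discrete distributions P and Q, the Kullback-Leibler divergence is KL(P || Q) = sum over i of P(i) * log(P(i) / Q(i)). A minimal Python sketch follows, applied to the two rows on slide 23. The smoothing constant `eps` is an assumption added here because the rows contain zeros, which would otherwise make the divergence infinite. The slides do not state the smoothing, log base, or number of topics behind the quoted KL = 6.74, so this sketch is not expected to reproduce that value.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in nats for two discrete distributions of equal length.

    eps is a smoothing constant (an assumption, not from the slides) added
    to every entry before renormalising, so zero entries do not produce
    infinite or undefined terms.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    p_smooth = [x + eps for x in p]
    q_smooth = [x + eps for x in q]
    p_total, q_total = sum(p_smooth), sum(q_smooth)
    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p_smooth, q_smooth)
    )


# Topic weights for the two articles on slide 23 (first six topics only).
irish_border = [0.7, 0, 0.2, 0, 0.1, 0]
scotland_ai = [0, 0, 0.9, 0, 0, 0.1]

print(kl_divergence(irish_border, irish_border))  # identical distributions
print(kl_divergence(irish_border, scotland_ai))   # mostly disjoint topics
```

KL divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ. A symmetric alternative such as Jensen-Shannon divergence is a common substitute when a true distance is wanted.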

24 A PROTOTYPICAL CASE: CONTENT-BASED NEWS RECOMMENDERS

25 How can we measure alignment between humans and machines?

26 RUNNING TRIANGLE TESTS PROTOTYPICAL CASE
[Figure: articles a2, a3, a4, a5 placed along the KL distribution of base article a1]
Which article is more similar to a1: a2 or a5?
Sample of 12 journalists

27 RUNNING TRIANGLE TESTS PROTOTYPICAL CASE

28 ALIGNMENT CASE STUDY
% of answers aligned with algorithm per user
• 30 topic model: Average agreement: 54%
• 50 topic model: Average agreement: 71%
• 70 topic model: Average agreement: 62%
• Random chance: 0.516
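The agreement figure for one evaluator can be computed as the share of triplets where the human picked the same article as the model. In the sketch below the model's pick is taken to be the candidate with the smaller KL distance to the anchor. All answers and distances are hypothetical values made up for illustration, and the function names are not from the study.

```python
def machine_choice(dist_to_a, dist_to_b):
    """Pick the candidate whose model distance to the anchor is smaller."""
    return "a" if dist_to_a <= dist_to_b else "b"


def agreement_rate(human_answers, machine_answers):
    """Fraction of triplets where human and machine chose the same article."""
    if not human_answers or len(human_answers) != len(machine_answers):
        raise ValueError("answer lists must be non-empty and of equal length")
    matches = sum(h == m for h, m in zip(human_answers, machine_answers))
    return matches / len(human_answers)


# Hypothetical answers from one evaluator over five triplets.
human = ["a", "b", "a", "a", "b"]
# Hypothetical KL distances per triplet: (anchor to a, anchor to b).
distances = [(0.4, 2.1), (3.0, 0.9), (1.5, 1.2), (0.2, 0.8), (2.2, 2.5)]

machine = [machine_choice(da, db) for da, db in distances]
rate = agreement_rate(human, machine)
print(f"agreement: {rate:.0%}")
```

Averaging this rate over all evaluators gives a per-model figure comparable to the averages on the slide, which can then be read against the random-chance baseline.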

29 CONCLUSIONS AND FUTURE WORK
• Content similarity recommenders: Use LDA for automatic topic scoring pipeline
• Potential in capturing alignment between human and machine perception
• Tests could be scaled to a much larger population to more formally assess a similarity model

THANKS! LET’S HAVE SOME QUESTIONS!
DR BEN FIELDS
PEOPLE IN THE LOOP MACHINE LEARNING: A CASE STUDY IN NEWS SIMILARITY
SLIDES: bit.ly/newssimbbc
HTTP://CEUR-WS.ORG/VOL-2411/PAPER9.PDF
HTTPS://PIRET.GITLAB.IO/FATREC2018/PROGRAM/FATREC2018-FIELDS.PDF
HTTPS://WWW.BBC.CO.UK/THINGS/

