DR BEN FIELDS
21 NOVEMBER 2019
PEOPLE IN THE LOOP
MACHINE LEARNING:
A CASE STUDY IN NEWS SIMILARITY
Photo: https://www.flickr.com/photos/woolamaloo_gazette/47571470732/
SLIDES: http://bit.ly/newssimbbc
Slide 2
Intro
Machine Learning at the BBC
Human vs Machine Similarity
Content-based News Recommenders
Conclusions
STRUCTURE
Slide 3
MACHINE LEARNING AT
THE BBC
Slide 4
OVERVIEW
ML AT THE BBC
Audience data
Content data
Audience-facing
Internal-facing
Audience segmentation
Starfruit (autotagger)
Mango (NER)
Topic Segmentor
Article Recommendation
VoD Recommendation
Kids App and Keyboard
Content Origin Graph
Slide 5
• Other BBC products use third-party solutions
• Our domain is unusual: our product expires!
• ML aligned with BBC values
‣ Inform, educate and entertain
‣ Context of public service algorithm
‣ Transparency
• Keep editorial control of automated systems
• Multiple language support
WHY BUILD IN HOUSE?
ML AT THE BBC
Slide 6
ML AT THE BBC
OWN IT
Slide 8
ML AT THE BBC
ENTITIES, TOPICS, AND THINGS, OH MY!
Starfruit: Autotagger
Mango: Named Entity Recogniser
BBC Things: Linked Data Store and Ontology
Slide 11
ML AT THE BBC
WORLD SERVICE NEWS RECS
Slide 14
HUMAN VS MACHINE
SIMILARITY
Slide 15
Article similarity can be an
effective means of recommending
news to readers
Slide 16
Problem: We need computed
content similarity to match
(mostly) people’s perception of
news article similarity
Slide 17
HOW DO HUMANS
PERCEIVE SIMILARITY?
Slide 18
Or rather: how can we efficiently
measure the perception of
similarity?
Slide 19
A proposed methodology:
1. Gather a collection of anchor articles from your corpus.
2. For each anchor, select two additional articles for comparison.
3. Present each of these triplets in turn to a human evaluator, asking the evaluator to decide which of the two articles is more similar to the anchor.
TRIANGLE TESTS
HUMAN PERCEPTION
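The three steps of the triangle test can be sketched in a few lines of Python. This is only an illustration: the function name `build_triplets`, the article identifiers, and the choice to draw the two comparison articles uniformly at random are all assumptions, since the slide does not say how the comparison articles are selected.

```python
import random
from typing import List, Sequence, Tuple


def build_triplets(
    article_ids: Sequence[str], n_anchors: int, seed: int = 0
) -> List[Tuple[str, str, str]]:
    """Return (anchor, candidate_1, candidate_2) triplets for evaluators.

    Hypothetical helper: comparison articles are sampled at random here,
    which is an assumption, not the method described in the talk.
    """
    if len(article_ids) < 3:
        raise ValueError("need at least three articles to form a triplet")
    if n_anchors > len(article_ids):
        raise ValueError("cannot pick more anchors than there are articles")
    rng = random.Random(seed)
    # Step 1: gather anchor articles from the corpus.
    anchors = rng.sample(list(article_ids), n_anchors)
    triplets = []
    for anchor in anchors:
        # Step 2: pick two further, distinct articles for comparison.
        others = [a for a in article_ids if a != anchor]
        first, second = rng.sample(others, 2)
        # Step 3: each triplet becomes one question for a human evaluator.
        triplets.append((anchor, first, second))
    return triplets


corpus = ["a1", "a2", "a3", "a4", "a5"]  # made-up article ids
for anchor, first, second in build_triplets(corpus, n_anchors=3):
    print(f"Which article is more similar to {anchor}: {first} or {second}?")
```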
LATENT DIRICHLET ALLOCATION
MACHINE PERCEPTION
Matrix of docs (rows: articles, columns: topics)
Docs                                       1    2    3    4    5    6   ...
The Irish border Brexit backstop           0.7  0    0    0    0.1  0
Scotland to get AI health research centre  0    0    0.9  0    0    0.1
...
Matrix of topics (rows: words, columns: topics)
Words     1    2    3    4    5    6   ...
brexit    0.6  0.3  0    0    0    0
hospital  0    0    0.8  0.2  0    0
...
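To make the two matrices concrete, here is a toy LDA fit using a collapsed Gibbs sampler in plain Python. It is a sketch under stated assumptions, not the pipeline used at the BBC: the function `fit_lda`, the hyperparameters `alpha` and `beta`, the iteration count, and the four tiny documents are all invented for illustration. Note that `topic_word` is returned with one row per topic, the transpose of the words-by-topics layout on the slide.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative only).

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab),
    where each row of doc_topic and topic_word sums to 1.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # doc-topic counts
    nkw = [[0] * n_words for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                             # tokens per topic
    z = []                                          # topic of each token
    for d, doc in enumerate(docs):                  # random initialisation
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token, then resample its topic.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta)
                    / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    doc_topic = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(nkw[t][v] + beta) / (nk[t] + n_words * beta) for v in range(n_words)]
        for t in range(n_topics)
    ]
    return doc_topic, topic_word, vocab


# Made-up miniature corpus, only to exercise the function.
docs = [
    "brexit border backstop brexit".split(),
    "hospital health research hospital".split(),
    "brexit backstop border vote".split(),
    "health research centre hospital".split(),
]
doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)
```

In practice a library implementation would be the more sensible choice; the point here is only that each article ends up as a discrete probability distribution over topics, which is what the similarity measures on the next slide compare.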
Slide 23
SIMILARITY MEASURES
MACHINE PERCEPTION
• Discrete probability distributions
• Kullback-Leibler divergence or relative entropy
• Information gain between distributions
Docs                                       1    2    3    4    5    6   ...
The Irish border Brexit backstop           0.7  0    0.2  0    0.1  0
Scotland to get AI health research centre  0    0    0.9  0    0    0.1
KL = 6.74
[Figure: KL pairwise distances, on an axis running from Similar to Different]
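For two discrete distributions P and Q, the Kullback-Leibler divergence is the sum over i of p_i * log(p_i / q_i). A minimal Python sketch follows. The `eps` smoothing is an assumption added here so that zero entries in Q do not make the result infinite; the slides do not say how zeros, the logarithm base, or the truncated topic columns were handled, so this sketch is not expected to reproduce the KL value shown on the slide.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-9) -> float:
    """KL(p || q) in nats for two discrete distributions of equal length.

    Both inputs are smoothed by eps and renormalised (an assumption made
    for this sketch) so the logarithm is always defined.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_total, q_total = sum(p_s), sum(q_s)
    return sum(
        (a / p_total) * math.log((a / p_total) / (b / q_total))
        for a, b in zip(p_s, q_s)
    )


# First six topic weights of the two example articles on the slide.
brexit = [0.7, 0, 0.2, 0, 0.1, 0]
health = [0, 0, 0.9, 0, 0, 0.1]

d = kl_divergence(brexit, health)
print(f"KL(brexit || health) = {d:.2f}")
```

KL divergence is not symmetric, so `kl_divergence(brexit, health)` and `kl_divergence(health, brexit)` generally differ; a pairwise distance table needs a fixed convention for the direction.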
Slide 24
A PROTOTYPICAL CASE:
CONTENT-BASED NEWS
RECOMMENDERS
Slide 25
How can we measure alignment
between humans and machines?
Slide 26
RUNNING TRIANGLE TESTS
PROTOTYPICAL CASE
[Figure: KL distribution of base article a1, with articles a2, a3, a4 and a5 placed along the KL axis]
Which article is more similar to a1: a2 or a5?
Sample of 12 journalists
Slide 28
ALIGNMENT
CASE STUDY
% of answers aligned with algorithm per user
30 topic model: average agreement 54%
50 topic model: average agreement 71%
70 topic model: average agreement 62%
Random chance: 0.516
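One way to compute the per-user agreement figures is sketched below. The record layout, the function name `agreement_per_evaluator`, and the sample responses are hypothetical; the sketch assumes the algorithm's answer is whichever candidate has the smaller KL distance to the anchor.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (evaluator, KL from anchor to candidate "a", KL to candidate "b", human pick)
Response = Tuple[str, float, float, str]


def agreement_per_evaluator(responses: Iterable[Response]) -> Dict[str, float]:
    """Fraction of each evaluator's answers that match the algorithm."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for evaluator, kl_a, kl_b, picked in responses:
        # Assumption: the algorithm prefers the candidate with the lower KL.
        machine_pick = "a" if kl_a < kl_b else "b"
        totals[evaluator] += 1
        hits[evaluator] += int(picked == machine_pick)
    return {e: hits[e] / totals[e] for e in totals}


# Invented responses from two evaluators, for illustration only.
responses = [
    ("j1", 0.4, 2.1, "a"),
    ("j1", 1.8, 0.3, "a"),
    ("j2", 0.4, 2.1, "a"),
    ("j2", 1.8, 0.3, "b"),
]
rates = agreement_per_evaluator(responses)
average = sum(rates.values()) / len(rates)
print(rates, average)
```

Averaging the per-evaluator rates, as done here, weights each evaluator equally regardless of how many triplets they answered; pooling all answers before dividing is an equally plausible reading of "average agreement".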
Slide 29
• Content similarity recommenders: use LDA for an automatic topic scoring pipeline
• Potential for capturing alignment between human and machine perception
• Tests could be scaled to a much larger population to assess a similarity model more formally
CONCLUSIONS AND FUTURE WORK
Slide 30
THANKS!
LET’S HAVE SOME QUESTIONS!
DR BEN FIELDS
PEOPLE IN THE LOOP
MACHINE LEARNING:
A CASE STUDY IN NEWS SIMILARITY
HTTP://CEUR-WS.ORG/VOL-2411/PAPER9.PDF
SLIDES: bit.ly/newssimbbc
HTTPS://PIRET.GITLAB.IO/FATREC2018/PROGRAM/FATREC2018-FIELDS.PDF
HTTPS://WWW.BBC.CO.UK/THINGS/