People in the Loop Machine Learning: A Case Study in News Similarity

Ben Fields
November 21, 2019

An overview of some of the currently active machine learning projects at the BBC, followed by a deep dive into one of them: the content-based recommender systems used for article-to-article recommendations on World Service online news sites. In particular, it looks at how we align our similarity models with human opinion of which news articles are similar to one another.

Transcript

  1. PEOPLE IN THE LOOP MACHINE LEARNING: A CASE STUDY IN NEWS SIMILARITY
     DR BEN FIELDS, 21 NOVEMBER 2019
     https://www.flickr.com/photos/woolamaloo_gazette/47571470732/
     SLIDES: http://bit.ly/newssimbbc
  2. STRUCTURE
     • Intro
     • Machine Learning at the BBC
     • Human vs Machine Similarity
     • Content-based News Recommenders
     • Conclusions
  3. ML AT THE BBC: OVERVIEW
     Diagram labels: Audience data, Content data; Audience-facing, Internal-facing
     Projects shown: Audience segmentation; Starfruit (autotagger); Mango (NER); Topic Segmentor; Article Recommendation; VoD Recommendation; Kids App and Keyboard; Content Origin Graph
  4. ML AT THE BBC: WHY BUILD IN HOUSE?
     • Other BBC products use third-party solutions
     • The domain is weird: our product expires!
     • ML aligned with BBC values
       ‣ Inform, educate and entertain
       ‣ Context of public service algorithm
       ‣ Transparency
     • Keep editorial control of automated systems
     • Multiple language support
  5. ML AT THE BBC: ENTITIES, TOPICS, AND THINGS, OH MY!
     • Starfruit: Autotagger
     • Mango: Named Entity Recogniser
     • BBC Things: Linked Data Store and Ontology
  6. Problem: We need computed content similarity to match (mostly) people’s perception of news article similarity
  7. HUMAN PERCEPTION: TRIANGLE TESTS
     A proposed methodology:
     1. Gather a collection of anchor articles from your corpus.
     2. For each anchor, select two additional articles for comparison.
     3. Present each of these triplets in turn to a human evaluator, asking the evaluator to decide which of the two articles is more similar to the anchor.
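
The three steps of the triangle test can be sketched in a few lines of Python. This is only an illustration: the slides do not say how anchors or comparison articles are chosen, so uniform random sampling is an assumption here, and `build_triplets`, the `article-N` identifiers and the corpus size are hypothetical names and values.

```python
import random

def build_triplets(article_ids, n_anchors, seed=0):
    """Return (anchor, first, second) triplets for a triangle test.

    Assumption: anchors and comparison articles are drawn uniformly at
    random; the talk does not specify the selection strategy.
    """
    rng = random.Random(seed)
    # Step 1: gather a collection of anchor articles from the corpus.
    anchors = rng.sample(article_ids, n_anchors)
    triplets = []
    for anchor in anchors:
        # Step 2: pick two additional, distinct articles for comparison.
        candidates = [a for a in article_ids if a != anchor]
        first, second = rng.sample(candidates, 2)
        triplets.append((anchor, first, second))
    return triplets

# Hypothetical corpus of 20 article identifiers.
corpus = [f"article-{i}" for i in range(20)]
triplets = build_triplets(corpus, n_anchors=5)

# Step 3: each triplet would be shown to a human evaluator in turn.
for anchor, first, second in triplets:
    print(f"Which is more similar to {anchor}: {first} or {second}?")
```

In practice one might prefer to pick comparison articles at controlled model distances from the anchor rather than at random, so that each question is informative about the similarity model.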
  8. MACHINE PERCEPTION: COMPUTER READABLE REPRESENTATION
     Each article is read into a vector:
     $(a_1, a_2, a_3, a_4, \ldots, a_n)$
     $(b_1, b_2, b_3, b_4, \ldots, b_n)$
     $(c_1, c_2, c_3, c_4, \ldots, c_n)$
     $(d_1, d_2, d_3, d_4, \ldots, d_n)$
  9. MACHINE PERCEPTION: LATENT DIRICHLET ALLOCATION

     Matrix of docs (rows: articles, columns: topics)

     Docs                                        1     2     3     4     5     6     ...
     The Irish border Brexit backstop            0.7   0     0     0     0.1   0
     Scotland to get AI health research centre   0     0     0.9   0     0     0.1
     ...

     Matrix of topics (rows: words, columns: topics)

     Topics     1     2     3     4     5     6     ...
     brexit     0.6   0.3   0     0     0     0
     hospital   0     0     0.8   0.2   0     0
     ...
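
To make the two LDA matrices concrete, the sketch below combines them: in LDA, the weight of a word in a document is the sum over topics of (topic weight in the document) times (word weight in the topic). The numbers are the ones shown on the slide, restricted to the six topic columns that are visible, so the results are partial sums rather than full model probabilities. The names `doc_topics`, `word_topics` and `word_weight` are illustrative, not taken from the BBC pipeline.

```python
# Topic weights per article: rows of the slide's "matrix of docs",
# truncated to the six topic columns shown.
doc_topics = {
    "The Irish border Brexit backstop": [0.7, 0.0, 0.0, 0.0, 0.1, 0.0],
    "Scotland to get AI health research centre": [0.0, 0.0, 0.9, 0.0, 0.0, 0.1],
}

# Word weights per topic: rows of the slide's "matrix of topics",
# again truncated to the six topic columns shown.
word_topics = {
    "brexit": [0.6, 0.3, 0.0, 0.0, 0.0, 0.0],
    "hospital": [0.0, 0.0, 0.8, 0.2, 0.0, 0.0],
}

def word_weight(doc, word):
    """Sum over the visible topics of doc-topic weight times word-topic weight."""
    return sum(d * w for d, w in zip(doc_topics[doc], word_topics[word]))

for doc in doc_topics:
    for word in word_topics:
        print(f"{word!r} in {doc!r}: {word_weight(doc, word):.2f}")
```

The Brexit article gives weight to "brexit" and none to "hospital", and the health article does the reverse, which is the intuition behind using the topic rows as a compact representation of each article.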
  10. MACHINE PERCEPTION: SIMILARITY MEASURES
      • Discrete probability distributions
      • Kullback-Leibler divergence or relative entropy
      • Information gain between distributions

      Docs                                        1     2     3     4     5     6     ...
      The Irish border Brexit backstop            0.7   0     0.2   0     0.1   0
      Scotland to get AI health research centre   0     0     0.9   0     0     0.1

      KL = 6.74
      Figure: KL pairwise distances, on a scale from Similar to Different
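
As a minimal sketch of the similarity measure, the Kullback-Leibler divergence between two discrete distributions P and Q is the sum over i of p_i * log(p_i / q_i). The slide's topic rows contain zeros, which make the raw formula undefined, so the code below adds a small constant and renormalises. That smoothing, its value `eps`, the natural logarithm and the `near_duplicate` row are all assumptions made for illustration. The talk does not say how its value of 6.74 was computed, and this sketch does not attempt to reproduce it.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in nats, after additive smoothing and renormalisation.

    Assumption: eps-smoothing is used only to avoid log(0) and division
    by zero; the talk does not describe how zeros were handled.
    """
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_total, q_total = sum(p_s), sum(q_s)
    return sum(
        (a / p_total) * math.log((a / p_total) / (b / q_total))
        for a, b in zip(p_s, q_s)
    )

# Topic rows from the slide (six visible topics).
brexit_backstop = [0.7, 0.0, 0.2, 0.0, 0.1, 0.0]
ai_health_centre = [0.0, 0.0, 0.9, 0.0, 0.0, 0.1]
# Hypothetical article with a topic mix close to the Brexit one.
near_duplicate = [0.6, 0.0, 0.3, 0.0, 0.1, 0.0]

far = kl_divergence(brexit_backstop, ai_health_centre)
near = kl_divergence(brexit_backstop, near_duplicate)
print(f"Brexit vs health centre: {far:.3f}")
print(f"Brexit vs near duplicate: {near:.3f}")
```

Note that KL divergence is not symmetric, so the direction of the comparison (anchor against candidate, or the reverse) has to be fixed before distances are ranked.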
  11. RUNNING TRIANGLE TESTS: PROTOTYPICAL CASE
      Figure: KL distribution of base article a1 (articles a2, a3, a4, a5; axis: KL)
      Which article is more similar to a1: a2 or a5?
      Sample of 12 journalists
  12. ALIGNMENT CASE STUDY
      Figure: % of answers aligned with algorithm, per user
      Random chance: 0.516

      Model            Average agreement
      30 topic model   54%
      50 topic model   71%
      70 topic model   62%
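
The agreement figures can be understood as follows: for each triplet the model picks the candidate with the smaller distance to the anchor, and an evaluator agrees when they pick the same candidate. The sketch below computes per-evaluator and average agreement under that reading. The function names, the tie-breaking rule and all of the data are invented for illustration and are not results from the study.

```python
def model_choice(dist_first, dist_second):
    """Pick the candidate closer to the anchor (ties go to 'first', an assumption)."""
    return "first" if dist_first <= dist_second else "second"

def agreement_per_evaluator(answers, distances):
    """Fraction of triplets on which each evaluator matches the model.

    answers:   {evaluator: ["first" | "second", ...]}, one choice per triplet
    distances: [(dist_first, dist_second), ...], model distances to the anchor
    """
    model = [model_choice(a, b) for a, b in distances]
    return {
        name: sum(h == m for h, m in zip(choices, model)) / len(model)
        for name, choices in answers.items()
    }

def average_agreement(per_evaluator):
    """Mean of the per-evaluator agreement fractions."""
    return sum(per_evaluator.values()) / len(per_evaluator)

# Invented toy data: four triplets, two evaluators.
distances = [(0.4, 2.1), (3.0, 0.9), (1.2, 1.5), (5.0, 0.2)]
answers = {
    "evaluator_a": ["first", "second", "first", "second"],
    "evaluator_b": ["first", "first", "second", "second"],
}

per_evaluator = agreement_per_evaluator(answers, distances)
print(per_evaluator)
print(f"Average agreement: {average_agreement(per_evaluator):.0%}")
```

Running the same calculation for models with different numbers of topics gives one average per model, which is the form of comparison reported on the slide.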
  13. CONCLUSIONS AND FUTURE WORK
      • Content similarity recommenders: Use LDA for automatic topic scoring pipeline
      • Potential in capturing alignment between human and machine perception
      • Tests could be scaled to a much larger population to more formally assess a similarity model
  14. THANKS! LET’S HAVE SOME QUESTIONS!
      DR BEN FIELDS
      PEOPLE IN THE LOOP MACHINE LEARNING: A CASE STUDY IN NEWS SIMILARITY
      HTTP://CEUR-WS.ORG/VOL-2411/PAPER9.PDF
      SLIDES: bit.ly/newssimbbc
      HTTPS://PIRET.GITLAB.IO/FATREC2018/PROGRAM/FATREC2018-FIELDS.PDF
      HTTPS://WWW.BBC.CO.UK/THINGS/