People in the Loop Machine Learning: A Case Study in News Similarity

Ben Fields
November 21, 2019

An overview of some of the currently active machine learning projects at the BBC, followed by a deep dive into one of them: the content-based recommender systems used for article-to-article recommendations on World Service online news sites. In particular, it looks at how we align our similarity models with human judgements of which news articles are similar to one another.

Transcript

  1. DR BEN FIELDS
    21 NOVEMBER 2019
    PEOPLE IN THE LOOP MACHINE LEARNING:
    A CASE STUDY IN NEWS SIMILARITY
    https://www.flickr.com/photos/woolamaloo_gazette/47571470732/
    SLIDES: http://bit.ly/newssimbbc

  2. STRUCTURE
    Intro
    Machine Learning at the BBC
    Human vs Machine Similarity
    Content-based News Recommenders
    Conclusions

  3. MACHINE LEARNING AT THE BBC

  4. OVERVIEW
    ML AT THE BBC
    Audience data, Content data
    Audience-facing, Internal-facing
    Audience segmentation
    Starfruit (autotagger)
    Mango (NER)
    Topic Segmentor
    Article Recommendation
    VoD Recommendation
    Kids App and Keyboard
    Content Origin Graph

  5. WHY BUILD IN HOUSE?
    ML AT THE BBC
    • Other BBC products use third-party solutions
    • Domain is weird, our product expires!
    • ML aligned with BBC values
      ‣ Inform, educate and entertain
      ‣ Context of public service algorithm
      ‣ Transparency
    • Keep editorial control of automated systems
    • Multiple language support

  6. ML AT THE BBC
    OWN IT

  7. ML AT THE BBC
    OWN IT

  8. ML AT THE BBC
    ENTITIES, TOPICS, AND THINGS, OH MY!
    Starfruit: Autotagger
    Mango: Named Entity Recogniser
    BBC Things: Linked Data Store and Ontology

  9. ML AT THE BBC
    ENTITIES, TOPICS, AND THINGS, OH MY!

  10. ML AT THE BBC
    ENTITIES, TOPICS, AND THINGS, OH MY!

  11. ML AT THE BBC
    WORLD SERVICE NEWS RECS

  12. ML AT THE BBC
    WORLD SERVICE NEWS RECS

  13. ML AT THE BBC
    WORLD SERVICE NEWS RECS

  14. HUMAN VS MACHINE SIMILARITY

  15. Article similarity can be an effective means of recommending news to readers

  16. Problem: We need computed content similarity to (mostly) match people’s perception of news article similarity

  17. HOW DO HUMANS PERCEIVE SIMILARITY?

  18. Or rather: how can we efficiently measure the perception of similarity?

  19. TRIANGLE TESTS
    HUMAN PERCEPTION
    A proposed methodology:
    1. Gather a collection of anchor articles from your corpus.
    2. For each anchor, select two additional articles for comparison.
    3. Present each of these triplets in turn to a human evaluator, asking the evaluator to decide which of the two articles is more similar to the anchor.
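
The three steps of the triangle test can be sketched in a few lines of Python. This is only an illustration: the names `Triplet`, `build_triplets` and `record_answer` are hypothetical, and picking anchors and comparison articles uniformly at random is an assumption, since the slide does not say how they were selected.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One triangle test: an anchor plus two candidate articles."""
    anchor: str
    option_a: str
    option_b: str


def build_triplets(article_ids, n_anchors, seed=0):
    """Steps 1 and 2: sample anchors, then two other articles per anchor.

    Uniform random sampling is an assumption made for this sketch.
    """
    if len(article_ids) < 3:
        raise ValueError("need at least three articles to form a triplet")
    rng = random.Random(seed)
    anchors = rng.sample(article_ids, n_anchors)
    triplets = []
    for anchor in anchors:
        others = [a for a in article_ids if a != anchor]
        option_a, option_b = rng.sample(others, 2)
        triplets.append(Triplet(anchor, option_a, option_b))
    return triplets


def record_answer(triplet, choice):
    """Step 3: store which option ("a" or "b") the evaluator picked."""
    if choice not in ("a", "b"):
        raise ValueError("choice must be 'a' or 'b'")
    chosen = triplet.option_a if choice == "a" else triplet.option_b
    return {"anchor": triplet.anchor, "chosen": chosen, "choice": choice}
```

In use, each `Triplet` would be shown to an evaluator and the reply passed to `record_answer`, giving one row per judgement.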

  20. HOW CAN MACHINES COMPUTE SIMILARITY?

  21. COMPUTER READABLE REPRESENTATION
    MACHINE PERCEPTION
    Article read
    (a_1, a_2, a_3, a_4, …, a_n)
    (b_1, b_2, b_3, b_4, …, b_n)
    (c_1, c_2, c_3, c_4, …, c_n)
    (d_1, d_2, d_3, d_4, …, d_n)

  22. LATENT DIRICHLET ALLOCATION
    MACHINE PERCEPTION
    Matrix of docs (rows: articles, columns: topics)
    Docs                                        1    2    3    4    5    6    ...
    The Irish border Brexit backstop            0.7  0    0    0    0.1  0
    Scotland to get AI health research centre   0    0    0.9  0    0    0.1
    ...
    Matrix of topics (rows: words, columns: topics)
    Topics     1    2    3    4    5    6    ...
    brexit     0.6  0.3  0    0    0    0
    hospital   0    0    0.8  0.2  0    0
    ...
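
To make the two matrices concrete, here is a toy collapsed Gibbs sampler for LDA in plain Python. It is a teaching sketch, not the BBC pipeline: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the iteration count are all assumptions, and a production system would normally use an established library instead.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a toy LDA model to tokenised documents (lists of words).

    Returns (theta, phi, vocab): theta[d][k] is the share of topic k in
    document d, and phi[k][v] is the weight of word vocab[v] in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]            # counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]  # counts per topic
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi, vocab
```

Each row of `theta` corresponds to a row of the matrix of docs on the slide. Note that `phi` is stored with one row per topic, so it is the transpose of the slide's matrix of topics, which lists words as rows.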

  23. SIMILARITY MEASURES
    MACHINE PERCEPTION
    • Discrete probability distributions
    • Kullback-Leibler divergence or relative entropy
    • Information gain between distributions
    Docs                                        1    2    3    4    5    6    ...
    The Irish border Brexit backstop            0.7  0    0.2  0    0.1  0
    Scotland to get AI health research centre   0    0    0.9  0    0    0.1
    KL = 6.74
    [Figure: KL pairwise distances, on a scale from Similar to Different]
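
As a rough illustration of the measure named on this slide, the Kullback-Leibler divergence between two topic distributions can be computed as below. The function name `kl_divergence` and the smoothing constant `eps` are assumptions: the slide does not say how zero probabilities were handled, and without some smoothing the divergence is infinite whenever the second distribution has a zero where the first does not. For that reason this sketch is not expected to reproduce the slide's value of 6.74.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions of equal length.

    Both inputs are smoothed by eps and renormalised so that zero
    entries do not produce log(0) or division by zero. The smoothing
    scheme is an assumption made for this sketch.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_sum, q_sum = sum(p_s), sum(q_s)
    return sum(
        (a / p_sum) * math.log((a / p_sum) / (b / q_sum))
        for a, b in zip(p_s, q_s)
    )


# Topic vectors as shown on the slide (first six topics only).
irish_border = [0.7, 0, 0.2, 0, 0.1, 0]
scotland_ai = [0, 0, 0.9, 0, 0, 0.1]

distance = kl_divergence(irish_border, scotland_ai)
```

KL divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ. The slide does not state which direction was used or whether the measure was symmetrised.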

  24. A PROTOTYPICAL CASE: CONTENT-BASED NEWS RECOMMENDERS

  25. How can we measure alignment between humans and machines?

  26. RUNNING TRIANGLE TESTS
    PROTOTYPICAL CASE
    [Figure: KL distribution of base article a_1, showing articles a_2, a_3, a_4 and a_5]
    Which article is more similar to a_1: a_2 or a_5?
    Sample of 12 journalists

  27. RUNNING TRIANGLE TESTS
    PROTOTYPICAL CASE

  28. ALIGNMENT
    CASE STUDY
    % of answers aligned with algorithm per user
    Model            Average agreement
    30 topic model   54%
    50 topic model   71%
    70 topic model   62%
    Random chance: 0.516
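
One plausible way to arrive at figures like these is to score, for each evaluator, the fraction of triangle-test answers that match the model's own choice, then average across evaluators. The sketch below assumes the model picks whichever option has the smaller KL distance to the anchor. The function names and the `"a"`/`"b"` answer encoding are hypothetical, and the slide does not describe the exact calculation used.

```python
from statistics import mean


def model_choice(kl_anchor_to_a, kl_anchor_to_b):
    """The model prefers the option with the smaller distance to the anchor."""
    return "a" if kl_anchor_to_a <= kl_anchor_to_b else "b"


def agreement_rate(human_choices, model_choices):
    """Fraction of one evaluator's answers that match the model's answers."""
    if not human_choices or len(human_choices) != len(model_choices):
        raise ValueError("need two non-empty answer lists of equal length")
    matches = sum(h == m for h, m in zip(human_choices, model_choices))
    return matches / len(human_choices)


def average_agreement(answers_by_evaluator, model_choices):
    """Mean of the per-evaluator agreement rates."""
    return mean(
        agreement_rate(answers, model_choices)
        for answers in answers_by_evaluator.values()
    )
```

Running `average_agreement` once per candidate model (for example one call each for the 30, 50 and 70 topic models) would give one comparable score per model.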

  29. CONCLUSIONS AND FUTURE WORK
    • Content similarity recommenders: Use LDA for automatic topic scoring pipeline
    • Potential in capturing alignment between human and machine perception
    • Tests could be scaled to a much larger population to more formally assess a similarity model

  30. THANKS!
    LET’S HAVE SOME QUESTIONS!
    DR BEN FIELDS
    PEOPLE IN THE LOOP MACHINE LEARNING:
    A CASE STUDY IN NEWS SIMILARITY
    HTTP://CEUR-WS.ORG/VOL-2411/PAPER9.PDF
    SLIDES: bit.ly/newssimbbc
    HTTPS://PIRET.GITLAB.IO/FATREC2018/PROGRAM/FATREC2018-FIELDS.PDF
    HTTPS://WWW.BBC.CO.UK/THINGS/