Human-centric evaluation of similarity spaces of news articles

In this paper we present a practical approach to evaluating similarity spaces of news articles, guided by human perception. This is motivated by applications that modern news audiences expect, most notably recommender systems. Our approach is laid out and contextualised with a brief background in human similarity measurement and perception. This is complemented with a discussion of computational methods for measuring similarity between news articles. We then go through a prototypical use of the evaluation in a practical setting before we point to future work enabled by this framework.
paper at http://ceur-ws.org/Vol-2411/paper10.pdf

Ben Fields

July 25, 2019

Transcript

  1. CLARA HIGUERA CABAÑES, MICHEL SCHAMMEL, SHIRLEY KA KEI YU, BEN FIELDS
    25 JULY 2019 NEWSIR WORKSHOP @SIGIR2019
    HUMAN-CENTRIC
    EVALUATION OF
    SIMILARITY SPACES
    OF NEWS ARTICLES https://www.flickr.com/photos/woolamaloo_gazette/47571470732/
    SLIDES: bit.ly/newsirbbc

  2. 2
    Article similarity can be an
    effective means to recommend
    news to readers

  3. 3
    Problem: We need computed
    content similarity to match
    (mostly) people’s perception of
    news article similarity

  4. 4
    HOW DO HUMANS
    PERCEIVE SIMILARITY?

  5. 5
    Or rather: how can we efficiently
    measure the perception of
    similarity?

  6. TRIANGLE TESTS
    HUMAN PERCEPTION
    A proposed methodology:
    1. Gather a collection of anchor articles from your corpus.
    2. For each anchor, select two additional articles for comparison.
    3. Present each of these triplets in turn to a human evaluator,
    asking the evaluator to decide which of the two articles is more
    similar to the anchor.
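Steps 1 and 2 above can be sketched in code. This is a minimal illustration, not the paper's implementation: the function name, the uniform random sampling of anchors and comparison articles, and the `seed` parameter are all assumptions for the sketch.

```python
import random


def build_triplets(article_ids, n_anchors, seed=0):
    """Sample anchor articles from the corpus, then pair each anchor
    with two distinct comparison articles, yielding the triplets shown
    to human evaluators in a triangle test. Sampling strategy is
    illustrative (uniform at random); a real study might stratify by
    topic or date."""
    rng = random.Random(seed)
    anchors = rng.sample(article_ids, n_anchors)
    triplets = []
    for anchor in anchors:
        # Candidate pool excludes the anchor itself.
        pool = [a for a in article_ids if a != anchor]
        first, second = rng.sample(pool, 2)  # two distinct comparisons
        triplets.append((anchor, first, second))
    return triplets
```

Each returned tuple `(anchor, first, second)` corresponds to one question put to an evaluator: "which of `first` or `second` is more similar to `anchor`?"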

  7. 7
    HOW CAN MACHINES
    COMPUTE SIMILARITY?

  8. COMPUTER READABLE REPRESENTATION
    MACHINE PERCEPTION
    Article read as a vector:
    (a1, a2, a3, a4, …, an)
    (b1, b2, b3, b4, …, bn)  (c1, c2, c3, c4, …, cn)  (d1, d2, d3, d4, …, dn)
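The slide's vectors can be made concrete with a minimal bag-of-words sketch. The function name, the whitespace tokenisation, and the fixed vocabulary are assumptions for illustration; the deck does not specify which representation feeds the topic model.

```python
def to_vector(text, vocab):
    """Turn an article's text into a term-count vector (a1, ..., an)
    over a fixed vocabulary: a minimal bag-of-words sketch with naive
    whitespace tokenisation and no punctuation handling."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    # One component per vocabulary term, zero when the term is absent.
    return [counts.get(term, 0) for term in vocab]
```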

  9. LATENT DIRICHLET ALLOCATION
    MACHINE PERCEPTION

    Matrix of docs (articles × topics):

                                          Topics
    Docs                              1    2    3    4    5    6   ...
    The Irish border Brexit backstop  0.7  0    0    0    0.1  0
    Scotland to get AI health
    research centre                   0    0    0.9  0    0    0.1
    ...

    Matrix of topics (words × topics):

                                          Topics
    Words                             1    2    3    4    5    6   ...
    brexit                            0.6  0.3  0    0    0    0
    hospital                          0    0    0.8  0.2  0    0
    ...
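Reading the doc-topic matrix above is straightforward: each row is a distribution over topics, and the largest entry identifies the article's dominant topic. A sketch using the slide's own example values (the dictionary layout and function name are illustrative, not from the paper):

```python
# Doc-topic rows mirroring the slide's table; only six topics shown,
# so rows need not sum to exactly 1.
doc_topics = {
    "The Irish border Brexit backstop": [0.7, 0, 0, 0, 0.1, 0],
    "Scotland to get AI health research centre": [0, 0, 0.9, 0, 0, 0.1],
}


def dominant_topic(dist):
    """Index of the highest-weight topic, 1-based to match the
    column numbering on the slide."""
    return max(range(len(dist)), key=lambda i: dist[i]) + 1
```

So the Brexit article is dominated by topic 1 (weight 0.7) and the health-centre article by topic 3 (weight 0.9), matching the table.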

  10. SIMILARITY MEASURES
    MACHINE PERCEPTION
    • Discrete probability distributions
    • Kullback-Leibler divergence or relative entropy
    • Information gain between distributions

    Docs                              1    2    3    4    5    6   ...
    The Irish border Brexit backstop  0.7  0    0.2  0    0.1  0
    Scotland to get AI health
    research centre                   0    0    0.9  0    0    0.1

    KL = 6.74
    KL pairwise distances: Similar ←→ Different
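The KL computation between two doc-topic rows can be sketched directly. One caveat: rows with zero entries make raw KL divergence infinite, so this sketch adds a small smoothing constant and renormalises — that smoothing choice is an assumption of the sketch, which is why the result will not reproduce the slide's 6.74 exactly.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """Kullback-Leibler divergence D(P || Q) between two discrete
    topic distributions. Zero entries are smoothed with eps so the
    log term stays finite, then both vectors are renormalised to
    proper probability distributions."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    zp, zq = sum(p), sum(q)
    return sum(
        (pi / zp) * math.log((pi / zp) / (qi / zq))
        for pi, qi in zip(p, q)
    )
```

Note that KL divergence is asymmetric (D(P‖Q) ≠ D(Q‖P) in general) and is zero only when the two distributions are identical, which is what makes low values read as "similar" on the slide's scale.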

  11. 11
    A PROTOTYPICAL CASE

  12. 12
    How can we measure alignment
    between humans and machines?

  13. RUNNING TRIANGLE TESTS
    PROTOTYPICAL CASE
    [Figure: KL distances from base article a1 to candidate articles a2, a3, a4, a5]
    Which article is more similar to a1? a2 or a5?
    Sample of 12 journalists

  14. 14
    RUNNING TRIANGLE TESTS
    PROTOTYPICAL CASE

  15. ALIGNMENT
    CASE STUDY
    % of answers aligned with algorithm per user
    Random chance: 0.516
    30 topic model, average agreement: 54%
    50 topic model, average agreement: 71%
    70 topic model, average agreement: 62%
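The agreement figures above measure how often a human's triangle-test answer matches the model's KL-based answer. A minimal sketch of that computation (function and argument names are illustrative):

```python
def agreement_rate(human_picks, machine_picks):
    """Fraction of triangle tests in which the evaluator's choice of
    the more-similar article matches the model's choice. Both inputs
    are parallel sequences of picked article identifiers, one entry
    per triplet shown."""
    assert len(human_picks) == len(machine_picks)
    hits = sum(h == m for h, m in zip(human_picks, machine_picks))
    return hits / len(human_picks)
```

Averaging this rate per evaluator, and comparing it against the random-chance baseline, gives the per-topic-model agreement percentages reported on the slide.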

  16. CONCLUSIONS AND FUTURE WORK
    • Content similarity recommenders: use LDA in an automatic topic-scoring pipeline
    • Potential in capturing alignment between human and machine perception
    • Tests could be scaled to a much larger population to more formally assess a similarity model

  17. THANKS!
    LET’S HAVE SOME QUESTIONS!
    CLARA HIGUERA CABAÑES, MICHEL SCHAMMEL, SHIRLEY KA KEI YU, BEN FIELDS
    25 JULY 2019 NEWSIR WORKSHOP @SIGIR2019
    HUMAN-CENTRIC
    EVALUATION OF
    SIMILARITY SPACES
    OF NEWS ARTICLES
    HTTP://CEUR-WS.ORG/VOL-2411/PAPER9.PDF
    SLIDES: bit.ly/newsirbbc
