
Sentence Embeddings for Automated Factchecking PyData London 2018

Talk by Full Fact at PyData London

Lev Konstantinovskiy

April 28, 2018


Transcript

  1. Sentence Embeddings for
    Automated Factchecking
    Lev Konstantinovskiy
    NLP engineer


  2. What we do
    1) Write factchecking articles with links to
    primary sources
    2) Corrections
    3) Help data producers and users


  3. Full Fact is the UK's independent factchecking
    charity
    We are…
    - Independent
    - Impartial
    - Effective


  4. We exist to get rid of this problem…
    “70% of laws are made in the EU”
    or
    “7% of laws are made in the EU”


  5. Machine Learning
    solves everything
    It can neatly sort the entire world into
    True and False.


  6. Machine Learning
    solves everything
    It can neatly sort the entire world into
    True and False.
    Not yet!
    Need Human-
    level AI


  7. We exist to get rid of this problem…
    “70% of laws are made in the EU”
    or
    “7% of laws are made in the EU”


  8. It’s complicated
    Re-inserting the shades of grey
    when people are making it out to be
    black and white


  9. Factcheckers should spend time on the
    difficult questions
    Automating Factchecking


  10. 1. Monitor
    2. Spot claims
    3. Check claims
    4. Publish
    Factchecking process


  11. 1. Monitor
    2. Spot claims
    3. Check claims
    4. Publish
    Factchecking process
    Automatable


  12. 1. Monitor
    2. Spot claims
    3. Check claims
    4. Publish
    Factchecking process
    Automatable
    only in some
    cases


  13. Task: “Spot new claims”
    Input: document
    Output: all the claims in it.


  14. Which TV show has the most
    claims?
    Prime Minister’s Questions
    Question Time
    The Andrew Marr Show
    Sunday Politics


  15. Manual annotation of political TV
    Percentage of sentences that are claims,
    averaged over 3-6 episodes:
    Prime Minister's Questions: 32%
    Question Time: 18%
    The Andrew Marr Show: 16%

  16. Manual annotation of political TV
    [Box plot: percentage of sentences that are claims,
    one dot per annotated episode, showing min, 25%,
    median, 75% and max. Link on original slide.]

  17. What is a claim?
    Quantity in the past or present
    “The government has spent £3bn on reorganisation of
    the NHS”
    But excluding Personal Experience
    “I am a qualified social worker, qualified for over 20 years.”
    (abridged definition)


  18. The data is small
    - 1000 positive examples
    - 4500 negative examples
    What to do?


  19. Transfer learning
    Credit: Sebastian Ruder's blog, http://ruder.io/transfer-learning/


  20. Transfer learning in NLP
    - 2013 - 2017: Download pre-trained FastText,
    GloVe, word2vec word vectors, then average.
    - 2018: pre-trained sentence vectors
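    The 2013-2017 recipe can be sketched in a few lines. A minimal sketch, assuming tiny invented 3-dimensional "embeddings" for illustration; in practice you would load real pre-trained GloVe, FastText or word2vec vectors:

    ```python
    # Toy word embeddings, invented for illustration only.
    embeddings = {
        "the":   [0.1, 0.0, 0.2],
        "movie": [0.4, 0.2, 0.0],
        "was":   [0.0, 0.1, 0.1],
        "great": [0.5, 0.9, 0.3],
    }

    def sentence_vector(sentence):
        """Average the word vectors to get one sentence vector."""
        vecs = [embeddings[w] for w in sentence.lower().split()]
        # Element-wise mean across the word vectors.
        return [sum(dim) / len(vecs) for dim in zip(*vecs)]

    vec = sentence_vector("The movie was great")
    ```

    The averaging step is the whole trick: any sentence maps to a fixed-width vector that a downstream classifier can consume.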


  21. InferSent in PyTorch
    Not just a paper. Also code & embeddings!


  22. Data that went into InferSent
    - Input layer is GloVe pre-trained on Common
    Crawl (840B tokens, unsupervised)
    - Trained on a Natural Language Inference task
    of 1M sentence pairs (supervised)
    - InferSent + logistic regression evaluated on
    10 transfer tasks
    - Better than attention, CNN and other encoders


  23. Natural Language Inference Task
    Given the premise “Two dogs are running
    through a field”,
    how likely is the hypothesis “There are
    animals outdoors”?
    E (Entailment): Definitely True
    N (Neutral): Might be True
    C (Contradiction): Definitely False


  24. SNLI dataset of 570k pairs
    Photo caption (photo not shown)
    “Two dogs are running through a field”
    - Definitely True: “There are animals outdoors”
    - Might be true: “Some puppies are running to
    catch a stick”
    - Definitely False: “The pets are sitting on a
    couch”


  25. Bi-directional recurrent network
    Read text forward. Then read backwards.
    “How can we base our understanding of what
    we’ve heard on something that hasn’t been said
    yet? ...Sounds, words, and even whole
    sentences that at first mean nothing are found to
    make sense in the light of future context.”
    Alex Graves and Jurgen Schmidhuber, “Framewise Phoneme
    Classification with Bidirectional LSTM and Other Neural Network
    Architectures”, 2005


  26. Max-pooling
    - Sentence has 4 words.
    - Network produces 4
    vectors
    - How to produce only one
    fixed-width output
    vector?


  27. Max-pooling
    Sentence: “The movie was great”
    Word vectors:
    The:   [3, 1, 5]
    movie: [5, 8, 7]
    was:   [1, 0, 1]
    great: [3, 0, 9]
    MAX POOL → [5, 8, 9]

  28. Max-pooling allows interpretation
    How many times did we pick the max from this
    word’s vector?
    Combine with logistic regression coefficients
    for positive/negative features.
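    The pooling step and the counting interpretation above can be sketched as follows, using the numbers from the max-pooling slide:

    ```python
    # Element-wise max-pooling over the four word vectors for
    # "The movie was great", plus the interpretation trick of
    # counting how often each word supplied the maximum.
    words = ["The", "movie", "was", "great"]
    vectors = [[3, 1, 5], [5, 8, 7], [1, 0, 1], [3, 0, 9]]

    # One fixed-width output vector: the max over words in each dimension.
    pooled = [max(dims) for dims in zip(*vectors)]  # [5, 8, 9]

    # For each dimension, which word won the max?
    counts = {w: 0 for w in words}
    for dims in zip(*vectors):
        counts[words[dims.index(max(dims))]] += 1
    # "movie" wins two dimensions, "great" wins one.
    ```

    Words that win many dimensions contributed most to the sentence vector, which is what makes the pooled representation inspectable.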


  29. Task: “Spot new claims”
    Input: document
    Output: all the claims in it.


  30. Precision and recall
    Model can make two types of mistakes.
    1) “Chocolate is great.” is a claim
    2) “350 million a week for the EU”
    is not a claim


  31. Precision and recall
    Model can make two types of mistakes.
    1) Precision mistake. Label something that is not a
    claim as a claim.
    WRONG: “Chocolate is great.” is a claim
    2) Recall mistake. Label something that is a claim as not
    a claim.
    WRONG: “350 million a week for the EU” is not a claim
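    The two mistake types map directly onto the standard precision and recall formulas. A toy computation, with (gold, predicted) labels invented for illustration:

    ```python
    # A false positive ("Chocolate is great." labelled as a claim)
    # lowers precision; a false negative ("350 million a week for
    # the EU" missed) lowers recall.
    examples = [
        ("Chocolate is great.",           False, True),   # precision mistake
        ("350 million a week for the EU", True,  False),  # recall mistake
        ("The government has spent £3bn", True,  True),   # correct
    ]

    tp = sum(1 for _, gold, pred in examples if gold and pred)
    fp = sum(1 for _, gold, pred in examples if not gold and pred)
    fn = sum(1 for _, gold, pred in examples if gold and not pred)

    precision = tp / (tp + fp)  # fraction of predicted claims that are real
    recall = tp / (tp + fn)     # fraction of real claims that were found
    ```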


  32. [Box plot of results, showing min, 25%, median,
    75% and max. Link on original slide.]

  33. InferSent is better than naive
    word vector averaging


  34. Evaluation on small data
    Gael Varoquaux's "Interpreting Cross Validation"
    Gael's intervals:   Precision   Recall
    InferSent:          88-92       77-87
    Mean GloVe:         86-90       71-81
    Overlap:            88-90       77-81
    Positive class: 1000 samples
    Negative class: 4500 samples
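    The idea behind reporting intervals rather than single scores can be sketched as follows. The per-fold scores below are invented for illustration; the slide's real intervals came from Full Fact's experiments:

    ```python
    # Quote the spread of per-fold cross-validation scores instead of
    # a single mean, and check whether two models' intervals overlap.
    infersent_precision = [0.88, 0.90, 0.92, 0.89, 0.91]  # invented folds
    glove_precision = [0.86, 0.88, 0.90, 0.87, 0.89]      # invented folds

    def interval(scores):
        """Return the (min, max) spread of fold scores."""
        return min(scores), max(scores)

    lo_a, hi_a = interval(infersent_precision)
    lo_b, hi_b = interval(glove_precision)

    # If the intervals overlap, the difference between the two models
    # may not be meaningful on a dataset this small.
    overlap = max(lo_a, lo_b) <= min(hi_a, hi_b)
    ```

    On the slide, the InferSent and Mean GloVe intervals do overlap, which is why the conclusion is hedged rather than declaring a clear winner.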


  35. Code credit to the Full Fact data
    science volunteer community
    Benj Pettit, Data Scientist, Elsevier
    Michal Lopuszynski, Data Scientist, Warsaw University
    Oliver Price, Masters student, Warwick University
    Andreas Geroski, our best annotator
    Let’s talk if you are interested in volunteering


  36. Conclusion
    • Transfer learning is really useful.
    • ML can make democracy better.
    • Not perfect, but better.
    • If you would like to help us, volunteer or
    donate at https://fullfact.org/donate/


  37. Extra slides


  38. Vanishing Gradient
    Credit: Alex Graves’ PhD Thesis


  39. Credit: Alex Graves’ PhD Thesis
    Gradient Preserved by LSTM


  40. Reading: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
    Long Short-Term Memory Unit
    Image Credit: Alex Graves’ PhD Thesis


  41. Text credit: http://text2vec.org/glove.html


  42. Harnessing Cognitive Features for Sarcasm Detection
