Sentence Embeddings for Automated Factchecking PyData London 2018

Talk by Full Fact at PyData London

Lev Konstantinovskiy

April 28, 2018
Transcript

  1. Sentence Embeddings for
    Automated Factchecking
    Lev Konstantinovskiy
    NLP engineer


  2. (image-only slide)

  3. What we do
    1) Write factchecking articles with links to
    primary sources
    2) Corrections
    3) Help data producers and users


  4. Full Fact is the UK's independent factchecking
    charity
    We are…
    - Independent
    - Impartial
    - Effective


  5. We exist to get rid of this problem…
    “70% of laws are made in the EU”
    or
    “7% of laws are made in the EU”


  6. Machine Learning
    solves everything
    It can neatly sort the entire world into
    True and False.


  7. Machine Learning
    solves everything
    It can neatly sort the entire world into
    True and False.
    Not yet!
    Need Human-
    level AI


  8. We exist to get rid of this problem…
    “70% of laws are made in the EU”
    or
    “7% of laws are made in the EU”


  9. It’s complicated
    Re-inserting the shades of grey
    when people are making it out to be
    black and white


  10. Factcheckers should spend time on the
    difficult questions
    Automating Factchecking


  11. 1. Monitor
    2. Spot claims
    3. Check claims
    4. Publish
    Factchecking process


  12. 1. Monitor
    2. Spot claims
    3. Check claims
    4. Publish
    Factchecking process
    Automatable


  13. 1. Monitor
    2. Spot claims
    3. Check claims
    4. Publish
    Factchecking process
    Automatable
    only in some
    cases


  14. Task: “Spot new claims”
    Input: document
    Output: all the claims in it.


  15. Which TV show has the most
    claims?
    Prime Minister’s Questions
    Question Time
    The Andrew Marr Show
    Sunday Politics


  16. Manual annotation of political TV
    Percentage of sentences that are claims,
    averaged over 3-6 episodes:
    - Prime Minister's Questions: 32%
    - Question Time: 18%
    - The Andrew Marr Show: 16%


  17. Manual annotation of political TV
    [Box plot] Percentage of sentences that are claims,
    per show; each dot represents an annotated episode
    (whiskers: min/max; box: 25%-75%; line: median).

  18. What is a claim?
    Quantity in the past or present
    “The government has spent £3bn on reorganisation of
    the NHS”
    But excluding Personal Experience
    “I am a qualified social worker, qualified for over 20 years.”
    (abridged definition)


  19. (image-only slide)

  20. (image-only slide)

  21. The data is small
    - 1000 positive examples
    - 4500 negative examples
    What to do?


  22. Transfer learning
    Credit: Sebastian Ruder's blog, http://ruder.io/transfer-learning/


  23. Transfer learning in NLP
    - 2013 - 2017: Download pre-trained FastText,
    GloVe, word2vec word vectors, then average.
    - 2018: pre-trained sentence vectors

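The 2013-2017 recipe above (download pre-trained word vectors, then average) can be sketched in a few lines. The vector values below are made up for illustration, not real GloVe entries:

```python
import numpy as np

# Toy pre-trained word vectors (illustrative values, not real GloVe).
word_vectors = {
    "the":   np.array([0.1, 0.3, -0.2]),
    "movie": np.array([0.5, -0.1, 0.4]),
    "was":   np.array([0.0, 0.2, 0.1]),
    "great": np.array([0.7, 0.6, -0.3]),
}

def sentence_vector(sentence):
    """Average the vectors of all in-vocabulary tokens."""
    vecs = [word_vectors[w] for w in sentence.lower().split()
            if w in word_vectors]
    return np.mean(vecs, axis=0)

v = sentence_vector("The movie was great")
```

Averaging ignores word order entirely, which is part of why pre-trained sentence encoders like InferSent can do better.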

  24. InferSent in PyTorch
    Not just a paper. Also code & embeddings!


  25. (image-only slide)

  26. Data that went into InferSent
    - Input layer: GloVe pre-trained on Common
    Crawl, 840B tokens (unsupervised)
    - Trained on a Natural Language Inference task
    of 1M sentence pairs (supervised)
    - InferSent + logistic regression evaluated on
    10 transfer tasks
    - Better than attention, CNN and other
    architectures


  27. Natural Language Inference Task
    Given the premise "Two dogs are running
    through a field",
    how likely is the hypothesis "There are
    animals outdoors"?
    E. Definitely true (entailment)
    N. Might be true (neutral)
    C. Definitely false (contradiction)


  28. SNLI dataset of 570k pairs
    Photo caption (photo not shown)
    “Two dogs are running through a field”
    - Definitely True: “There are animals outdoors”
    - Might be true: “Some puppies are running to
    catch a stick”
    - Definitely False: “The pets are sitting on a
    couch”


  29. (image-only slide)

  30. Bi-directional recurrent network
    Read text forward. Then read backwards.
    “How can we base our understanding of what
    we’ve heard on something that hasn’t been said
    yet? ...Sounds, words, and even whole
    sentences that at first mean nothing are found to
    make sense in the light of future context.”
    Alex Graves and Jurgen Schmidhuber, “Framewise Phoneme
    Classification with Bidirectional LSTM and Other Neural Network
    Architectures”, 2005


  31. Max-pooling
    - Sentence has 4 words.
    - Network produces 4
    vectors
    - How to produce only one
    fixed-width output
    vector?


  32. Max-pooling
    Word vectors, one column per word:
    The  movie  was  great      MAX POOL
     3     5     1     3          5
     1     8     0     0          8
     5     7     1     9          9
    Take the element-wise max across the four
    word vectors to get one fixed-width vector.

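The pooling step on the slide above can be reproduced directly in NumPy: each word contributes one column, and the max is taken per dimension:

```python
import numpy as np

# Columns: "The", "movie", "was", "great" (values from the slide).
word_vectors = np.array([
    [3, 5, 1, 3],
    [1, 8, 0, 0],
    [5, 7, 1, 9],
])

# Max-pool across words (axis=1): one fixed-width sentence vector,
# regardless of how many words the sentence has.
sentence_vector = word_vectors.max(axis=1)
print(sentence_vector)  # [5 8 9]
```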

  33. Max-pooling allows interpretation
    How many times was the max picked from this
    word's vector?
    Combine that count with logistic regression
    coefficients to find positive/negative features.

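A minimal sketch of that counting idea, reusing the values from the max-pooling slide (the counting scheme here is an illustrative assumption, not Full Fact's exact implementation): for each dimension, record which word supplied the max.

```python
import numpy as np

# Columns: "The", "movie", "was", "great" (values from the slide).
word_vectors = np.array([
    [3, 5, 1, 3],
    [1, 8, 0, 0],
    [5, 7, 1, 9],
])
words = ["The", "movie", "was", "great"]

# For each dimension, which word won the max-pool?
winners = word_vectors.argmax(axis=1)
counts = {w: int((winners == i).sum()) for i, w in enumerate(words)}
print(counts)  # {'The': 0, 'movie': 2, 'was': 0, 'great': 1}
```

Words that win many dimensions contributed most to the sentence vector, which is what makes the pooled representation inspectable.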

  34. Task: “Spot new claims”
    Input: document
    Output: all the claims in it.


  35. Precision and recall
    Model can make two types of mistakes.
    1) “Chocolate is great.” is a claim
    2) “350 million a week for the EU”
    is not a claim


  36. Precision and recall
    Model can make two types of mistakes.
    1) Precision mistake. Label something that is not a
    claim as a claim.
    WRONG: “Chocolate is great.” is a claim
    2) Recall mistake. Label something that is a claim as not
    a claim.
    WRONG: “350 million a week for the EU” is not a claim

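The two mistake types map directly onto the standard definitions. A minimal sketch with toy labels (1 = claim, 0 = not a claim):

```python
def precision_recall(y_true, y_pred):
    """Precision: of everything labelled a claim, how much really was one.
    Recall: of all real claims, how many we labelled."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # precision mistakes
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # recall mistakes
    return tp / (tp + fp), tp / (tp + fn)

y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]  # one recall mistake, one precision mistake
prec, rec = precision_recall(y_true, y_pred)
```

For factchecking, recall mistakes (missing a real claim like the £350m one) and precision mistakes (flagging "Chocolate is great") have different costs, so both numbers matter.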

  37. [Box plot] Model scores across evaluation
    runs (whiskers: min/max; box: 25%-75%;
    line: median).

  38. InferSent is better than just naive
    word vector averaging


  39. Evaluation on small data
    Gael Varoquaux's "Interpreting Cross Validation"
    Gael's intervals    Precision   Recall
    InferSent           88-92       77-87
    Mean GloVe          86-90       71-81
    Overlap             88-90       77-81
    Positive class: 1000 samples
    Negative class: 4500 samples

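On small data a single cross-validation score is noisy, so the slide reports intervals rather than point estimates. In the spirit of Gael Varoquaux's advice, one can report an empirical interval over repeated shuffled splits. This is a hedged sketch: the scores below are simulated, not Full Fact's numbers, and the 10th/90th percentile bounds are an illustrative choice.

```python
import random

def cv_score_interval(scores, lo_pct=10, hi_pct=90):
    """Empirical interval over repeated cross-validation scores."""
    s = sorted(scores)
    lo = s[int(len(s) * lo_pct / 100)]
    hi = s[min(int(len(s) * hi_pct / 100), len(s) - 1)]
    return lo, hi

# Pretend these came from 20 repeated shuffled CV runs.
random.seed(0)
scores = [0.85 + random.gauss(0, 0.02) for _ in range(20)]
lo, hi = cv_score_interval(scores)
```

When two models' intervals overlap, as Precision does in the table above, the difference between them may not be meaningful on this amount of data.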

  40. Code credit to the Full Fact data
    science volunteer community
    Benj Pettit, Data Scientist, Elsevier
    Michal Lopuszynski, Data Scientist, Warsaw University
    Oliver Price, Masters student, Warwick University
    Andreas Geroski, our best annotator
    Let’s talk if you are interested in volunteering


  41. • Transfer learning is really useful.
    • ML can make democracy better.
    • Not perfect, but better.
    • If you would like to help us, volunteer or
    donate at https://fullfact.org/donate/
    Conclusion


  42. (image-only slide)

  43. Extra slides


  44. Vanishing Gradient
    Credit: Alex Graves’ PhD Thesis


  45. Credit: Alex Graves’ PhD Thesis
    Gradient Preserved by LSTM


  46. Reading: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
    Long Short-Term Memory Unit
    Image Credit: Alex’ Graves PhD Thesis


  47. Text credit: http://text2vec.org/glove.html


  48. Harnessing Cognitive Features for Sarcasm Detection
