Sentence Embeddings for Automated Factchecking PyData London 2018

Talk by Full Fact at PyData London

Lev Konstantinovskiy

April 28, 2018

Transcript

  1. What we do
    1) Write factchecking articles with links to primary sources
    2) Corrections
    3) Help data producers and users
  2. We exist to get rid of this problem… “70% of laws are made in the EU” or “7% of laws are made in the EU”
  3. Machine Learning solves everything. It can neatly sort the entire world into True and False.
  4. Machine Learning solves everything. It can neatly sort the entire world into True and False. Not yet! Need human-level AI.
  5. We exist to get rid of this problem… “70% of laws are made in the EU” or “7% of laws are made in the EU”
  6. Factchecking process
    1. Monitor, 2. Spot claims, 3. Check claims, 4. Publish
    Automatable
  7. Factchecking process
    1. Monitor, 2. Spot claims, 3. Check claims, 4. Publish
    Automatable only in some cases
  8. Which TV show has the most claims? Prime Minister’s Questions, Question Time, The Andrew Marr Show, Sunday Politics
  9. Manual annotation of political TV: percentage of sentences that are claims (average over 3-6 episodes)
    Prime Minister's Questions   32%
    Question Time                18%
    The Andrew Marr Show         16%
  10. [Plot] Manual annotation of political TV: percentage of sentences that are claims. Each dot represents an annotated episode; MIN, 25%, MEDIAN, 75% and MAX are marked.
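
Slides 9 and 10 summarise sentence-level annotation per show. A minimal sketch of that aggregation, assuming each episode is stored as a list of 0/1 sentence labels (1 = claim). The function names and the toy episodes are hypothetical illustrations, not Full Fact's annotation data or code.

```python
from statistics import mean, median

def claim_percentage(labels):
    """Percentage of sentences in one episode labelled as claims (1 = claim, 0 = not)."""
    return 100 * sum(labels) / len(labels)

def summarise(episodes):
    """Per-episode claim percentages plus their mean, min, median and max."""
    pcts = [claim_percentage(ep) for ep in episodes]
    return {
        "per_episode": pcts,
        "mean": mean(pcts),
        "min": min(pcts),
        "median": median(pcts),
        "max": max(pcts),
    }

# Toy labels for three hypothetical episodes (made up, not real annotations).
toy_episodes = [
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(summarise(toy_episodes))
```

The mean corresponds to the per-show averages on slide 9; min, median and max correspond to the per-episode spread on slide 10.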
  11. What is a claim? (abridged definition)
    Quantity in the past or present: “The government has spent £3bn on reorganisation of the NHS”
    But excluding personal experience: “I am a qualified social worker, qualified for over 20 years.”
  12. Transfer learning in NLP
    - 2013-2017: download pre-trained FastText, GloVe or word2vec word vectors, then average them.
    - 2018: pre-trained sentence vectors
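
The "download word vectors, then average" baseline from slide 12 (the "Mean GloVe" model compared later) can be sketched as below. This is a stdlib-only sketch; the tiny 3-dimensional vectors are hypothetical stand-ins for real pre-trained vectors, which have a few hundred dimensions (300 for the GloVe Common Crawl vectors).

```python
def average_word_vectors(sentence, word_vectors):
    """Mean of the pre-trained vectors of the in-vocabulary tokens of a sentence.

    Out-of-vocabulary tokens are skipped; a sentence with no known tokens raises.
    """
    tokens = [t for t in sentence.lower().split() if t in word_vectors]
    if not tokens:
        raise ValueError("no in-vocabulary tokens")
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for t in tokens:
        for i, x in enumerate(word_vectors[t]):
            total[i] += x
    return [x / len(tokens) for x in total]

# Hypothetical toy vectors, not real GloVe/FastText/word2vec values.
toy_vectors = {
    "the": [0.0, 1.0, 0.0],
    "movie": [1.0, 0.0, 2.0],
    "was": [0.0, 1.0, 0.0],
    "great": [3.0, 0.0, 2.0],
}
print(average_word_vectors("The movie was great", toy_vectors))  # [1.0, 0.5, 1.0]
```

Averaging discards word order, which is one motivation for the trained sentence encoders on the following slides.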
  13. Data that went into InferSent
    - Input layer is pre-trained GloVe on Common Crawl, 840B words (unsupervised)
    - Trained on Natural Language Inference tasks, 1M sentence pairs (supervised)
    - InferSent + logistic regression evaluated on 10 transfer tasks
    - Better than attention, CNN and other encoder architectures
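
Slide 13 pairs fixed sentence embeddings with a logistic regression classifier. A minimal stdlib-only sketch of that second stage, assuming the embeddings have already been computed by some encoder; the 2-dimensional "embeddings" and labels below are made up, and in practice a library implementation such as scikit-learn's `LogisticRegression` would be the usual choice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """1 = claim, 0 = not a claim, thresholding the probability at 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Toy 2-d "sentence embeddings" with made-up claim labels.
X = [[2.0, 0.1], [1.5, 0.3], [1.8, -0.2], [-1.0, 0.2], [-1.5, -0.1], [-2.0, 0.4]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(X, y)
print([predict(w, b, x) for x in X])
```

Keeping the encoder frozen and training only this small classifier is what makes transfer learning practical on a few thousand annotated sentences.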
  14. Natural Language Inference task
    Given the premise “Two dogs are running through a field”, how likely is the hypothesis “There are animals outdoors”?
    E (entailment): definitely true?
    N (neutral): might be true?
    C (contradiction): definitely false?
  15. SNLI dataset of 570k pairs
    Photo caption (photo not shown): “Two dogs are running through a field”
    - Definitely true: “There are animals outdoors”
    - Might be true: “Some puppies are running to catch a stick”
    - Definitely false: “The pets are sitting on a couch”
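
To train on NLI pairs, the InferSent paper (Conneau et al., 2017) encodes premise and hypothesis separately into vectors u and v and feeds the combination (u, v, |u − v|, u * v) to a 3-way E/N/C classifier. A minimal sketch of that feature construction; the two input vectors are hypothetical encoder outputs and the classifier itself is omitted.

```python
def pair_features(u, v):
    """Combine premise vector u and hypothesis vector v as (u, v, |u - v|, u * v)."""
    return (
        list(u)
        + list(v)
        + [abs(a - b) for a, b in zip(u, v)]  # element-wise absolute difference
        + [a * b for a, b in zip(u, v)]       # element-wise product
    )

premise = [0.5, -1.0, 2.0]     # hypothetical encoding of the premise
hypothesis = [1.0, -1.0, 0.0]  # hypothetical encoding of the hypothesis
features = pair_features(premise, hypothesis)
print(features)
```

For d-dimensional sentence vectors the classifier input has 4d values; only the encoder that produces u and v is kept for transfer tasks.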
  16. Bi-directional recurrent network
    Read the text forwards, then read it backwards.
    “How can we base our understanding of what we’ve heard on something that hasn’t been said yet? ...Sounds, words, and even whole sentences that at first mean nothing are found to make sense in the light of future context.”
    Alex Graves and Jürgen Schmidhuber, “Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures”, 2005
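
The forward-then-backward idea on slide 16 can be sketched with a deliberately tiny recurrent network. This is an illustration under simplifying assumptions: a plain element-wise tanh RNN with shared scalar weights, whereas a real bidirectional LSTM (as used in InferSent) has gated cells, full weight matrices and separate parameters per direction.

```python
import math

def rnn_pass(inputs, w_in, w_rec):
    """Element-wise tanh RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        h = [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]
        states.append(h)
    return states

def bidirectional(inputs, w_in=1.0, w_rec=0.5):
    """Run the sequence forwards and backwards, then concatenate the states per position."""
    forward = rnn_pass(inputs, w_in, w_rec)
    backward = rnn_pass(inputs[::-1], w_in, w_rec)[::-1]  # re-align with original order
    return [f + b for f, b in zip(forward, backward)]

words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # toy word vectors
states = bidirectional(words)
print(len(states), len(states[0]))  # 4 positions, each 2 + 2 = 4 values
```

The backward half of each state has already "heard" the rest of the sentence, which is the point of the Graves and Schmidhuber quote. Note the output is still one vector per word, which the next slide addresses.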
  17. Max-pooling
    - A sentence has 4 words.
    - The network produces 4 vectors.
    - How do we produce only one fixed-width output vector?
  18. Max-pooling
    Word vectors (one column per word) and the max-pooled result:
               The  movie  was  great   MAX POOL
                 3      1    5      5 →    5
                 8      7    1      0 →    8
                 1      3    0      9 →    9
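
Max-pooling takes the element-wise maximum across all word vectors, so any sentence length maps to one vector of fixed width. A minimal sketch using the numbers from the slide 18 example, read as one 3-dimensional vector per word.

```python
def max_pool(word_vectors):
    """Element-wise max over a sentence's word vectors: one fixed-width vector."""
    return [max(dimension) for dimension in zip(*word_vectors)]

sentence = {
    "The": [3, 8, 1],
    "movie": [1, 7, 3],
    "was": [5, 1, 0],
    "great": [5, 0, 9],
}
print(max_pool(list(sentence.values())))  # [5, 8, 9]
```

A 4-word or a 40-word sentence both yield a 3-value output here, which is what lets a fixed-size classifier sit on top.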
  19. Max-pooling allows interpretation
    How many times did we pick the max from this word’s vector? Combine with the logistic regression coefficients to identify positive and negative features.
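
One way to implement the interpretation on slide 19, sketched under assumptions: record which word supplied each pooled dimension, count those "wins" per word, and weight each won dimension by the classifier coefficient for that dimension. The word vectors reuse the slide 18 numbers; the coefficients are hypothetical, and ties go to the earliest word.

```python
from collections import Counter

def max_pool_with_sources(words, vectors):
    """Max-pool over word vectors and record which word supplied each dimension's max.

    On ties, the earliest word wins (max() returns the first maximal index).
    """
    pooled, sources = [], []
    for dim_values in zip(*vectors):
        best = max(range(len(dim_values)), key=lambda i: dim_values[i])
        pooled.append(dim_values[best])
        sources.append(words[best])
    return pooled, sources

def word_contributions(words, pooled, sources, coefs):
    """Per word: sum of coefficient * pooled value over the dimensions that word supplied."""
    contrib = {w: 0.0 for w in words}
    for value, word, coef in zip(pooled, sources, coefs):
        contrib[word] += coef * value
    return contrib

words = ["The", "movie", "was", "great"]
vectors = [[3, 8, 1], [1, 7, 3], [5, 1, 0], [5, 0, 9]]  # toy vectors, one per word
coefs = [0.5, -0.25, 0.25]  # hypothetical logistic regression weights, one per dimension

pooled, sources = max_pool_with_sources(words, vectors)
print(Counter(sources))  # how many pooled dimensions each word supplied
print(word_contributions(words, pooled, sources, coefs))
```

Words with large positive totals push the prediction towards the positive class, large negative totals towards the negative class.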
  20. Precision and recall
    The model can make two types of mistakes:
    1) “Chocolate is great.” is a claim
    2) “350 million a week for the EU” is not a claim
  21. Precision and recall
    The model can make two types of mistakes:
    1) Precision mistake: labelling something that is not a claim as a claim (a false positive). WRONG: “Chocolate is great.” is a claim
    2) Recall mistake: labelling something that is a claim as not a claim (a false negative). WRONG: “350 million a week for the EU” is not a claim
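
The two mistake types map directly onto the two metrics. A minimal sketch with made-up labels (1 = claim): each false positive lowers precision, each false negative lowers recall.

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Label 1 = claim."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # precision mistakes
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # recall mistakes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels: 4 real claims and 6 non-claims.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # one recall mistake, one precision mistake
print(precision_recall(y_true, y_pred))  # (0.75, 0.75)
```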
  22. Evaluation on small data
    Positive class: 1000 samples; negative class: 4500 samples
    Intervals following Gaël Varoquaux's "Interpreting Cross Validation":
                     Precision   Recall
      InferSent      88-92       77-87
      Mean GloVe     86-90       71-81
      Overlap        88-90       77-81
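
The "Overlap" row on slide 22 is the intersection of the two models' intervals. A minimal sketch that recomputes it from the slide's numbers; a non-empty overlap suggests the small evaluation set cannot conclusively separate InferSent from mean GloVe on that metric.

```python
def overlap(a, b):
    """Intersection of two closed intervals (lo, hi), or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

infersent = {"precision": (88, 92), "recall": (77, 87)}
mean_glove = {"precision": (86, 90), "recall": (71, 81)}
for metric in ("precision", "recall"):
    print(metric, overlap(infersent[metric], mean_glove[metric]))
# precision (88, 90)
# recall (77, 81)
```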
  23. Code credit to the Full Fact data science volunteer community
    Benj Pettit, Data Scientist, Elsevier
    Michal Lopuszynski, Data Scientist, Warsaw University
    Oliver Price, Masters student, Warwick University
    Andreas Geroski, our best annotator
    Let’s talk if you are interested in volunteering.
  24. Conclusion
    • Transfer learning is really useful.
    • ML can make democracy better.
    • Not perfect, but better.
    • If you would like to help us, volunteer or donate at https://fullfact.org/donate/