
Sentence Embeddings for Automated Factchecking PyData London 2018

Talk by Full Fact at PyData London


Lev Konstantinovskiy

April 28, 2018

Transcript

  1. Sentence Embeddings for Automated Factchecking Lev Konstantinovskiy NLP engineer

  2. None
  3. What we do 1) Write factchecking articles with links to

    primary sources 2) Corrections 3) Help data producers and users
  4. Full Fact is the UK's independent factchecking charity We are…

    - Independent - Impartial - Effective
  5. We exist to get rid of this problem… “70% of

    laws are made in the EU” or “7% of laws are made in the EU”
  6. It can neatly sort the entire world into True and

    False. Machine Learning solves everything
  7. Machine Learning solves everything It can neatly sort the entire

    world into True and False. Not yet! Need Human-level AI
  8. We exist to get rid of this problem… “70% of

    laws are made in the EU” or “7% of laws are made in the EU”
  9. It’s complicated Re-inserting the shades of grey when people are

    making it out to be black and white
  10. Factcheckers should spend time on the difficult questions Automating Factchecking

  11. 1. Monitor 2. Spot claims 3. Check claims 4. Publish

    Factchecking process
  12. 1. Monitor 2. Spot claims 3. Check claims 4. Publish

    Factchecking process Automatable
  13. 1. Monitor 2. Spot claims 3. Check claims 4. Publish

    Factchecking process Automatable only in some cases
  14. Task: “Spot new claims” Input: document Output: all the claims

    in it.
  15. Which TV show has the most claims? Prime Minister’s Questions

    Question Time The Andrew Marr Show Sunday Politics
  16. Manual annotation of political TV Percentage of sentences that are

    claims, averaged over 3-6 episodes: Prime Minister's Questions 32%, Question Time 18%, The Andrew Marr Show 16%
  17. Manual annotation of political TV (box plot: percentage of

    sentences that are claims; each dot represents an annotated episode; min, 25th percentile, median, 75th percentile and max shown)
  18. What is a claim? Quantity in the past or present

    “The government has spent £3bn on reorganisation of the NHS” But excluding Personal Experience “I am a qualified social worker, qualified for over 20 years.” (abridged definition)
  19. None
  20. None
  21. The data is small - 1000 positive examples - 4500

    negative examples What to do?
  22. Transfer learning Credit: Sebastian Ruder's blog, http://ruder.io/transfer-learning/

  23. Transfer learning in NLP - 2013 - 2017: Download pre-trained

    word vectors (word2vec, GloVe, FastText), then average them. - 2018: download pre-trained sentence vectors
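
A minimal sketch of that 2013-2017 baseline, assuming a GloVe-style plain-text vector file (the file path and whitespace tokenization are illustrative, not Full Fact's code):

    import numpy as np

    # Load vectors from the plain-text "word v1 v2 ... vN" format used by
    # GloVe, word2vec and FastText text exports.
    def load_vectors(path="glove.840B.300d.txt"):
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *values = line.rstrip().split(" ")
                vectors[word] = np.asarray(values, dtype=np.float32)
        return vectors

    def sentence_vector(sentence, vectors, dim=300):
        # Average the vectors of the in-vocabulary words; zeros if none match.
        known = [vectors[w] for w in sentence.lower().split() if w in vectors]
        if not known:
            return np.zeros(dim, dtype=np.float32)
        return np.mean(known, axis=0)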
  24. InferSent in PyTorch Not just a paper. Also code &

    embeddings!
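
Encoding sentences with the released code looks roughly like this; the class name and methods follow the facebookresearch/InferSent README, but the file paths and hyperparameter values here are assumptions that may have changed since the talk:

    import torch
    from models import InferSent  # models.py ships with the InferSent repo

    params = {'bsize': 64, 'word_emb_dim': 300, 'enc_lstm_dim': 2048,
              'pool_type': 'max', 'dpout_model': 0.0, 'model_version': 1}
    model = InferSent(params)
    model.load_state_dict(torch.load('infersent1.pkl'))
    model.set_w2v_path('glove.840B.300d.txt')

    sentences = ["The government has spent £3bn on reorganisation of the NHS."]
    model.build_vocab(sentences, tokenize=True)
    embeddings = model.encode(sentences, tokenize=True)  # one 4096-d vector each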
  25. None
  26. Data that went into InferSent - Input layer is pre-trained

    GloVe trained on Common Crawl, 840B words (unsupervised) - Trained on a Natural Language Inference task of 1M sentence pairs (supervised) - InferSent + logistic regression evaluated on 10 transfer tasks - Better than attention, CNN and other architectures
  27. Natural Language Inference Task Given premise “Two dogs are running

    through a field” How likely is the hypothesis “There are animals outdoors”? E. Definitely True? N. Might be True? C. Definitely False?
  28. SNLI dataset of 570k pairs Photo caption (photo not shown)

    “Two dogs are running through a field” - Definitely True: “There are animals outdoors” - Might be true: “Some puppies are running to catch a stick” - Definitely False: “The pets are sitting on a couch”
  29. None
  30. Bi-directional recurrent network Read text forward. Then read backwards. “How

    can we base our understanding of what we’ve heard on something that hasn’t been said yet? ...Sounds, words, and even whole sentences that at first mean nothing are found to make sense in the light of future context.” Alex Graves and Jürgen Schmidhuber, “Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures”, 2005
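
A toy sketch of such a bi-directional reader in PyTorch (the hidden size mirrors InferSent's 2048 units per direction; the rest is illustrative):

    import torch
    import torch.nn as nn

    # Each position's output combines a forward reading (what has been said)
    # with a backward reading (what has not been said yet).
    emb_dim, hidden = 300, 2048
    bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)

    sentence = torch.randn(1, 4, emb_dim)   # batch of 1, a 4-word sentence
    outputs, _ = bilstm(sentence)           # shape (1, 4, 2 * hidden)
    # outputs[:, t, :hidden] encodes words 0..t read forwards,
    # outputs[:, t, hidden:] encodes words t..3 read backwards.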
  31. Max-pooling - Sentence has 4 words. - Network produces 4

    vectors - How to produce only one fixed-width output vector?
  32. Max-pooling example with “The movie was great”, one 3-dimensional

    vector per word: The (3, 1, 5), movie (5, 8, 7), was (1, 0, 1), great (3, 0, 9). MAX POOL takes the maximum per dimension: (5, 8, 9)
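
The same toy computation in PyTorch, assuming the slide's twelve numbers group into one 3-dimensional vector per word (that grouping reproduces the pooled (5, 8, 9)):

    import torch

    word_vectors = torch.tensor([[3., 1., 5.],   # The
                                 [5., 8., 7.],   # movie
                                 [1., 0., 1.],   # was
                                 [3., 0., 9.]])  # great

    # Max over the word axis: one fixed-width vector however long the sentence.
    pooled, _ = word_vectors.max(dim=0)
    print(pooled)  # tensor([5., 8., 9.])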
  33. Max-pooling allows interpretation How many times did we pick the

    max from this word’s vector? Combine with logistic regression coefficients for positive/negative features
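
A sketch of that interpretation on the same toy example: count how many dimensions of the pooled vector each word supplied:

    import torch

    word_vectors = torch.tensor([[3., 1., 5.],   # The
                                 [5., 8., 7.],   # movie
                                 [1., 0., 1.],   # was
                                 [3., 0., 9.]])  # great
    words = ["The", "movie", "was", "great"]

    # argmax says which word won each dimension; bincount tallies the wins.
    _, argmax = word_vectors.max(dim=0)            # tensor([1, 1, 3])
    counts = torch.bincount(argmax, minlength=len(words))
    for word, n in zip(words, counts.tolist()):
        print(word, n)  # The 0, movie 2, was 0, great 1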
  34. Task: “Spot new claims” Input: document Output: all the claims

    in it.
  35. Precision and recall Model can make two types of mistakes.

    1) “Chocolate is great.” is a claim 2) “350 million a week for the EU” is not a claim
  36. Precision and recall Model can make two types of mistakes.

    1) Precision mistake. Label something that is not a claim as a claim. WRONG: “Chocolate is great.” is a claim 2) Recall mistake. Label something that is a claim as not a claim. WRONG: “350 million a week for the EU” is not a claim
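
A small scikit-learn illustration of the two mistake types (toy labels, not Full Fact's data; 1 means "claim"):

    from sklearn.metrics import precision_score, recall_score

    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0]  # one recall mistake, one precision mistake

    # Precision: of the sentences we labelled claims, how many really are?
    print(precision_score(y_true, y_pred))  # 0.67
    # Recall: of the real claims, how many did we find?
    print(recall_score(y_true, y_pred))     # 0.67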
  37. (Box plot of results: min, 25th percentile, median, 75th percentile and max shown)

  38. InferSent is better than just naive word vector averaging

  39. Evaluation on small data (positive class: 1,000 samples; negative

    class: 4,500 samples). Intervals per Gaël Varoquaux's "Interpreting
    Cross Validation":

                Precision   Recall
    InferSent   88-92       77-87
    Mean GloVe  86-90       71-81
    Overlap     88-90       77-81

    (The "Overlap" row is the intersection of the two models' intervals.)
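
One plausible way to produce such intervals, sketched on synthetic data standing in for the 1,000/4,500 claim dataset (the classifier and fold count are assumptions):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # ~18% positive class, like the claim data.
    X, y = make_classification(n_samples=5500, weights=[0.82], random_state=0)

    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                             cv=10, scoring="precision")
    # On small data, report the spread across folds rather than one number;
    # if two models' intervals overlap, neither is a clear winner.
    lo, hi = np.percentile(scores, [25, 75])
    print(f"precision: {100 * lo:.0f}-{100 * hi:.0f}")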
  40. Code credit to the Full Fact data science volunteer community

    Benj Pettit (Data Scientist, Elsevier); Michal Lopuszynski (Data Scientist, Warsaw University); Oliver Price (Masters student, Warwick University); Andreas Geroski (our best annotator). Let’s talk if you are interested in volunteering
  41. • Transfer learning is really useful. • ML can make

    democracy better. • Not perfect, but better. • If you would like to help us, volunteer or donate at https://fullfact.org/donate/ Conclusion
  42. None
  43. Extra slides

  44. Vanishing Gradient Credit: Alex Graves’ PhD Thesis

  45. Credit: Alex Graves’ PhD Thesis Gradient Preserved by LSTM

  46. Reading: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ Long Short-Term Memory Unit Image Credit: Alex Graves’

    PhD Thesis
  47. Text credit: http://text2vec.org/glove.html

  48. Harnessing Cognitive Features for Sarcasm Detection