Manual annotation of political TV
Percentage of sentences that are claims (average over 3-6 episodes):
Prime Minister's Questions   32%
Question Time                18%
The Andrew Marr Show         16%
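The averaging above can be made concrete. This is a minimal sketch. It assumes each episode is a list of per-sentence annotations (True = claim) and that every episode counts equally. The slide does not say whether sentences were pooled across episodes instead. The function names and the labels are hypothetical.

```python
def claim_percentage(labels):
    """Percentage of sentences in one episode annotated as claims (True = claim)."""
    if not labels:
        raise ValueError("episode has no annotated sentences")
    return 100.0 * sum(labels) / len(labels)


def average_over_episodes(episodes):
    """Mean of the per-episode percentages, each episode weighted equally."""
    per_episode = [claim_percentage(labels) for labels in episodes]
    return sum(per_episode) / len(per_episode)


# Hypothetical annotations for two short episodes (not real data).
episodes = [
    [True, False, False, False],         # 25% claims
    [True, False, False, False, False],  # 20% claims
]
print(average_over_episodes(episodes))  # prints 22.5
```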
What is a claim?
A quantity in the past or present, e.g. “The government has spent £3bn on reorganisation of the NHS”
but excluding personal experience, e.g. “I am a qualified social worker, qualified for over 20 years.”
(abridged definition)
Data that went into InferSent
- Input layer: pre-trained GloVe vectors from Common Crawl, 840B tokens (unsupervised)
- Encoder trained on a Natural Language Inference task, 1M sentence pairs (supervised)
- InferSent embeddings + logistic regression evaluated on 10 transfer tasks
- Better than attention, CNN and other encoder architectures
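The transfer-learning recipe is simple: a fixed, pre-trained sentence encoder plus a logistic regression trained on the target labels. This is a minimal sketch. It stands in for InferSent with the mean of word vectors, presumably what the "Mean GloVe" baseline is, and random vectors stand in for real GloVe. The vocabulary, sentences and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Hypothetical stand-in for pre-trained GloVe: random vectors for a toy vocabulary.
vocab = ["the", "government", "spent", "3bn", "on", "nhs", "chocolate", "is", "great"]
word_vectors = {w: rng.normal(size=DIM) for w in vocab}


def encode(sentence):
    """Frozen sentence encoder: the mean of the word vectors ('Mean GloVe' style).
    InferSent would replace this with its NLI-trained BiLSTM encoder."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted by batch gradient descent on the cross-entropy loss.
    Only these weights are learned; the encoder stays fixed (transfer learning)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


sentences = ["The government spent 3bn on the NHS", "Chocolate is great"]
labels = np.array([1.0, 0.0])  # 1 = claim, 0 = not a claim
X = np.stack([encode(s) for s in sentences])
w, b = train_logreg(X, labels)
print(predict_proba(X, w, b))
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would replace the hand-written training loop. The point is that only the small classifier on top is trained on claim labels.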
Natural Language Inference task
Given the premise “Two dogs are running through a field”, how likely is the hypothesis “There are animals outdoors”?
E (entailment): Definitely true?
N (neutral): Might be true?
C (contradiction): Definitely false?
SNLI dataset of 570k pairs
Premise: a photo caption (photo not shown), “Two dogs are running through a field”
- Definitely true: “There are animals outdoors”
- Might be true: “Some puppies are running to catch a stick”
- Definitely false: “The pets are sitting on a couch”
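In the InferSent paper, the premise and the hypothesis are encoded separately by the same encoder into vectors u and v. The combination (u, v, |u − v|, u * v) is then classified into the three labels. This minimal sketch shows that combination step. It assumes toy 4-dimensional embeddings and uses an untrained linear softmax layer in place of the trained classifier. The random vectors and weights are purely illustrative.

```python
import numpy as np

LABELS = ["entailment", "neutral", "contradiction"]  # E, N, C


def nli_features(u, v):
    """Combine premise embedding u and hypothesis embedding v:
    [u, v, |u - v|, u * v] (concatenation; * is element-wise)."""
    return np.concatenate([u, v, np.abs(u - v), u * v])


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


rng = np.random.default_rng(0)
dim = 4
u = rng.normal(size=dim)  # hypothetical encoding of the premise
v = rng.normal(size=dim)  # hypothetical encoding of the hypothesis
W = rng.normal(size=(len(LABELS), 4 * dim))  # untrained weights: shapes only
b = np.zeros(len(LABELS))

probs = softmax(W @ nli_features(u, v) + b)
print(dict(zip(LABELS, probs.round(3))))
```

Training on SNLI fits both the classifier and the encoder. Afterwards only the encoder is kept and reused as a fixed sentence embedding for other tasks, such as claim detection.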
Bidirectional recurrent network
Read the text forwards, then read it backwards, so each word is seen in the light of both its past and its future context.
“How can we base our understanding of what we’ve heard on something that hasn’t been said yet? ...Sounds, words, and even whole sentences that at first mean nothing are found to make sense in the light of future context.”
Alex Graves and Jürgen Schmidhuber, “Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures”, 2005
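This is a minimal sketch of the forwards/backwards reading. It uses a plain tanh RNN instead of the LSTM cells InferSent actually uses, with random untrained weights and hypothetical word vectors. At each position, the output concatenates two states. The forward state covers everything up to this word, and the backward state covers everything from this word to the end.

```python
import numpy as np


def rnn_pass(inputs, W_x, W_h, b):
    """Run a simple tanh RNN over a sequence and return the hidden state at every step."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return states


def bidirectional(inputs, fwd_params, bwd_params):
    """Concatenate the forward state and the backward state at each position."""
    forward = rnn_pass(inputs, *fwd_params)
    backward = rnn_pass(inputs[::-1], *bwd_params)[::-1]  # re-align to word order
    return [np.concatenate([f, bk]) for f, bk in zip(forward, backward)]


rng = np.random.default_rng(0)
emb_dim, hidden = 3, 2


def make_params():
    return rng.normal(size=(hidden, emb_dim)), rng.normal(size=(hidden, hidden)), np.zeros(hidden)


fwd_params, bwd_params = make_params(), make_params()
words = [rng.normal(size=emb_dim) for _ in range(5)]  # hypothetical word vectors
states = bidirectional(words, fwd_params, bwd_params)
print(len(states), states[0].shape)  # 5 (4,)
```

Changing the last word leaves the forward half of the first word's state untouched, but it can change the backward half. That backward half is the "future context" the quote refers to.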
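Max-pooling allows interpretation
How many times did we pick the max from this word’s vector, i.e. for how many embedding dimensions does this word supply the maximum?
Combine with the logistic regression coefficients to find positive (claim) and negative (non-claim) features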
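The counting can be sketched as follows. The sketch assumes we already have the encoder's per-word hidden states for one sentence (rows = words, columns = dimensions) and one logistic regression coefficient per dimension. To combine the two, each coefficient times its pooled value is credited to the word that supplied that maximum. This is an assumed scheme, not necessarily the one used here. It has a convenient property: the word scores sum to the classifier's logit minus the intercept. The matrix, words and coefficients are made up.

```python
import numpy as np


def max_pool_attribution(H, coefs):
    """H: (n_words, n_dims) hidden states; coefs: (n_dims,) logistic regression weights.
    Returns the pooled sentence vector, how many dimensions each word won,
    and each word's signed contribution to the logit."""
    winners = H.argmax(axis=0)  # index of the word supplying the max, per dimension
    pooled = H.max(axis=0)      # max-pooled sentence embedding
    counts = np.bincount(winners, minlength=H.shape[0])
    scores = np.zeros(H.shape[0])
    np.add.at(scores, winners, coefs * pooled)  # credit each dimension's term to its winner
    return pooled, counts, scores


words = ["spent", "3bn", "on", "nhs"]  # hypothetical sentence
H = np.array([[0.1, 0.9, 0.2],
              [0.5, 0.1, 0.3],
              [0.4, 0.2, 0.8],
              [0.0, 0.3, 0.1]])
coefs = np.array([1.0, -2.0, 0.5])  # hypothetical claim/non-claim weights

pooled, counts, scores = max_pool_attribution(H, coefs)
print(dict(zip(words, counts)))  # how often each word's vector was picked
print(dict(zip(words, scores)))  # positive = towards "claim", negative = away
```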
Precision and recall
The model can make two types of mistakes:
1) Precision mistake (false positive): labelling something that is not a claim as a claim.
   WRONG: “Chocolate is great.” is a claim
2) Recall mistake (false negative): labelling something that is a claim as not a claim.
   WRONG: “350 million a week for the EU” is not a claim
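Counting the two kinds of mistake gives the two metrics: precision = TP / (TP + FP) and recall = TP / (TP + FN). This is a minimal sketch with made-up labels (1 = claim, 0 = not a claim).

```python
def precision_recall(y_true, y_pred):
    """Labels: 1 = claim, 0 = not a claim."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # precision mistakes
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # recall mistakes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0]  # hypothetical gold labels
y_pred = [1, 1, 0, 0, 1, 0, 0]  # hypothetical model output
p, r = precision_recall(y_true, y_pred)
print(round(p, 3), r)  # 0.667 0.5
```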
Gael Varoquaux's "Interpreting Cross Validation"
Evaluation on small data: positive class 1000 samples, negative class 4500 samples
Gael’s intervals (%):
               Precision   Recall
InferSent      88-92       77-87
Mean GloVe     86-90       71-81
Overlap        88-90       77-81
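Varoquaux's point is that on small test sets the sampling noise in a metric is large. Differences between models therefore have to be read against their error bars. This is a minimal sketch that assumes a Wilson score interval for a binomial proportion. The slide does not say how its intervals were computed, and cross-validation adds fold-to-fold variance that this sketch ignores. The `overlap` function reproduces the table's last row as the intersection of the two models' intervals.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% (z=1.96) Wilson score interval for a proportion successes/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def overlap(a, b):
    """Intersection of two intervals (lo, hi), or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


# Recall on the 1000 positive samples: e.g. 800 correctly found (hypothetical count).
print(wilson_interval(800, 1000))   # roughly (0.774, 0.824)
print(overlap((88, 92), (86, 90)))  # precision row -> (88, 90)
print(overlap((77, 87), (71, 81)))  # recall row -> (77, 81)
```

Where the intervals overlap, this data cannot clearly separate the two models.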
Code credit to the Full Fact data science volunteer community:
- Benj Pettit, Data Scientist, Elsevier
- Michal Lopuszynski, Data Scientist, Warsaw University
- Oliver Price, Masters student, Warwick University
- Andreas Geroski, our best annotator
Let’s talk if you are interested in volunteering.
Conclusion
• Transfer learning is really useful.
• ML can make democracy better.
• Not perfect, but better.
• If you would like to help us, volunteer or donate at https://fullfact.org/donate/