
Sentence Embeddings for Automated Factchecking PyData London 2018

Talk by Full Fact at PyData London


Lev Konstantinovskiy

April 28, 2018

Transcript

  1. Sentence Embeddings for Automated Factchecking Lev Konstantinovskiy NLP engineer

  2. None
  3. What we do 1) Write factchecking articles with links to

    primary sources 2) Corrections 3) Help data producers and users
  4. Full Fact is the UK's independent factchecking charity We are…

    - Independent - Impartial - Effective
  5. We exist to get rid of this problem… “70% of

    laws are made in the EU” or “7% of laws are made in the EU”
  6. It can neatly sort the entire world into True and

    False. Machine Learning solves everything
  7. Machine Learning solves everything It can neatly sort the entire

    world into True and False. Not yet! Need Human-level AI
  8. We exist to get rid of this problem… “70% of

    laws are made in the EU” or “7% of laws are made in the EU”
  9. It’s complicated Re-inserting the shades of grey when people are

    making it out to be black and white
  10. Factcheckers should spend time on the difficult questions Automating Factchecking

  11. 1. Monitor 2. Spot claims 3. Check claims 4. Publish

    Factchecking process
  12. 1. Monitor 2. Spot claims 3. Check claims 4. Publish

    Factchecking process Automatable
  13. 1. Monitor 2. Spot claims 3. Check claims 4. Publish

    Factchecking process Automatable only in some cases
  14. Task: “Spot new claims” Input: document Output: all the claims

    in it.
  15. Which TV show has the most claims? Prime Minister’s Questions

    Question Time The Andrew Marr Show Sunday Politics
  16. Manual annotation of political TV Percentage of sentences that are

    claims, averaged over 3-6 episodes: Prime Minister's Questions 32%, Question Time 18%, The Andrew Marr Show 16%
  17. Manual annotation of political TV (box plot: percentage of

    sentences that are claims; each dot represents an annotated episode; min, 25th percentile, median, 75th percentile and max shown)
  18. What is a claim? Quantity in the past or present

    “The government has spent £3bn on reorganisation of the NHS” But excluding Personal Experience “I am a qualified social worker, qualified for over 20 years.” (abridged definition)
  19. None
  20. None
  21. The data is small - 1000 positive examples - 4500

    negative examples What to do?
  22. Transfer learning Credit: Sebastian Ruder's blog, http://ruder.io/transfer-learning/

  23. Transfer learning in NLP - 2013 - 2017: Download pre-trained

    word vectors (word2vec, GloVe, FastText), then average them. - 2018: download pre-trained sentence vectors
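
A minimal sketch of that 2013-2017 baseline, assuming a GloVe-style plain-text vector file (the file path and whitespace tokenization are illustrative, not Full Fact's code):

    import numpy as np

    # Load vectors from the plain-text "word v1 v2 ... vN" format used by
    # GloVe, word2vec and FastText text exports.
    def load_vectors(path="glove.840B.300d.txt"):
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *values = line.rstrip().split(" ")
                vectors[word] = np.asarray(values, dtype=np.float32)
        return vectors

    def sentence_vector(sentence, vectors, dim=300):
        # Average the vectors of the in-vocabulary words; zeros if none match.
        known = [vectors[w] for w in sentence.lower().split() if w in vectors]
        if not known:
            return np.zeros(dim, dtype=np.float32)
        return np.mean(known, axis=0)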
  24. InferSent in PyTorch Not just a paper. Also code &

    embeddings!
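
Encoding sentences with the released code looks roughly like this; the class name and methods follow the facebookresearch/InferSent README, but the file paths and hyperparameter values here are assumptions that may have changed since the talk:

    import torch
    from models import InferSent  # models.py ships with the InferSent repo

    params = {'bsize': 64, 'word_emb_dim': 300, 'enc_lstm_dim': 2048,
              'pool_type': 'max', 'dpout_model': 0.0, 'model_version': 1}
    model = InferSent(params)
    model.load_state_dict(torch.load('infersent1.pkl'))
    model.set_w2v_path('glove.840B.300d.txt')

    sentences = ["The government has spent £3bn on reorganisation of the NHS."]
    model.build_vocab(sentences, tokenize=True)
    embeddings = model.encode(sentences, tokenize=True)  # one 4096-d vector each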
  25. None
  26. Data that went into InferSent - Input layer is pre-trained

    GloVe trained on Common Crawl, 840B words (unsupervised) - Trained on a Natural Language Inference task of 1M sentence pairs (supervised) - InferSent + logistic regression evaluated on 10 transfer tasks - Better than attention, CNN and other architectures
  27. Natural Language Inference Task Given premise “Two dogs are running

    through a field” How likely is the hypothesis “There are animals outdoors”? E. Definitely True? N. Might be True? C. Definitely False?
  28. SNLI dataset of 570k pairs Photo caption (photo not shown)

    “Two dogs are running through a field” - Definitely True: “There are animals outdoors” - Might be true: “Some puppies are running to catch a stick” - Definitely False: “The pets are sitting on a couch”
  29. None
  30. Bi-directional recurrent network Read text forward. Then read backwards. “How

    can we base our understanding of what we’ve heard on something that hasn’t been said yet? ...Sounds, words, and even whole sentences that at first mean nothing are found to make sense in the light of future context.” Alex Graves and Jürgen Schmidhuber, “Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures”, 2005
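
A toy sketch of such a bi-directional reader in PyTorch (the hidden size mirrors InferSent's 2048 units per direction; the rest is illustrative):

    import torch
    import torch.nn as nn

    # Each position's output combines a forward reading (what has been said)
    # with a backward reading (what has not been said yet).
    emb_dim, hidden = 300, 2048
    bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)

    sentence = torch.randn(1, 4, emb_dim)   # batch of 1, a 4-word sentence
    outputs, _ = bilstm(sentence)           # shape (1, 4, 2 * hidden)
    # outputs[:, t, :hidden] encodes words 0..t read forwards,
    # outputs[:, t, hidden:] encodes words t..3 read backwards.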
  31. Max-pooling - Sentence has 4 words. - Network produces 4

    vectors - How to produce only one fixed-width output vector?
  32. Max-pooling example with “The movie was great”, one 3-dimensional

    vector per word: The (3, 1, 5), movie (5, 8, 7), was (1, 0, 1), great (3, 0, 9). MAX POOL takes the maximum per dimension: (5, 8, 9)
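
The same toy computation in PyTorch, assuming the slide's twelve numbers group into one 3-dimensional vector per word (that grouping reproduces the pooled (5, 8, 9)):

    import torch

    word_vectors = torch.tensor([[3., 1., 5.],   # The
                                 [5., 8., 7.],   # movie
                                 [1., 0., 1.],   # was
                                 [3., 0., 9.]])  # great

    # Max over the word axis: one fixed-width vector however long the sentence.
    pooled, _ = word_vectors.max(dim=0)
    print(pooled)  # tensor([5., 8., 9.])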
  33. Max-pooling allows interpretation How many times did we pick the

    max from this word’s vector? Combine with logistic regression coefficients for positive/negative features
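
A sketch of that interpretation on the same toy example: count how many dimensions of the pooled vector each word supplied:

    import torch

    word_vectors = torch.tensor([[3., 1., 5.],   # The
                                 [5., 8., 7.],   # movie
                                 [1., 0., 1.],   # was
                                 [3., 0., 9.]])  # great
    words = ["The", "movie", "was", "great"]

    # argmax says which word won each dimension; bincount tallies the wins.
    _, argmax = word_vectors.max(dim=0)            # tensor([1, 1, 3])
    counts = torch.bincount(argmax, minlength=len(words))
    for word, n in zip(words, counts.tolist()):
        print(word, n)  # The 0, movie 2, was 0, great 1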
  34. Task: “Spot new claims” Input: document Output: all the claims

    in it.
  35. Precision and recall Model can make two types of mistakes.

    1) “Chocolate is great.” is a claim 2) “350 million a week for the EU” is not a claim
  36. Precision and recall Model can make two types of mistakes.

    1) Precision mistake. Label something that is not a claim as a claim. WRONG: “Chocolate is great.” is a claim 2) Recall mistake. Label something that is a claim as not a claim. WRONG: “350 million a week for the EU” is not a claim
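
A small scikit-learn illustration of the two mistake types (toy labels, not Full Fact's data; 1 means "claim"):

    from sklearn.metrics import precision_score, recall_score

    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0]  # one recall mistake, one precision mistake

    # Precision: of the sentences we labelled claims, how many really are?
    print(precision_score(y_true, y_pred))  # 0.67
    # Recall: of the real claims, how many did we find?
    print(recall_score(y_true, y_pred))     # 0.67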
  37. (Box plot of results: min, 25th percentile, median, 75th percentile and max shown)

  38. InferSent is better than just naive word vector averaging

  39. Evaluation on small data (positive class: 1,000 samples; negative

    class: 4,500 samples). Intervals per Gaël Varoquaux's "Interpreting
    Cross Validation":

                Precision   Recall
    InferSent   88-92       77-87
    Mean GloVe  86-90       71-81
    Overlap     88-90       77-81

    (The "Overlap" row is the intersection of the two models' intervals.)
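
One plausible way to produce such intervals, sketched on synthetic data standing in for the 1,000/4,500 claim dataset (the classifier and fold count are assumptions):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # ~18% positive class, like the claim data.
    X, y = make_classification(n_samples=5500, weights=[0.82], random_state=0)

    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                             cv=10, scoring="precision")
    # On small data, report the spread across folds rather than one number;
    # if two models' intervals overlap, neither is a clear winner.
    lo, hi = np.percentile(scores, [25, 75])
    print(f"precision: {100 * lo:.0f}-{100 * hi:.0f}")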
  40. Code credit to the Full Fact data science volunteer community

    Benj Pettit (Data Scientist, Elsevier); Michal Lopuszynski (Data Scientist, Warsaw University); Oliver Price (Masters student, Warwick University); Andreas Geroski (our best annotator). Let’s talk if you are interested in volunteering
  41. • Transfer learning is really useful. • ML can make

    democracy better. • Not perfect, but better. • If you would like to help us, volunteer or donate at https://fullfact.org/donate/ Conclusion
  42. None
  43. Extra slides

  44. Vanishing Gradient Credit: Alex Graves’ PhD Thesis

  45. Credit: Alex Graves’ PhD Thesis Gradient Preserved by LSTM

  46. Reading: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ Long Short-Term Memory Unit Image Credit: Alex Graves’

    PhD Thesis
  47. Text credit: http://text2vec.org/glove.html

  48. Harnessing Cognitive Features for Sarcasm Detection