
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

wing.nus
June 10, 2021


Open-Domain Question Answering is the task of answering natural language questions with short factual answers. These questions are not accompanied by evidence, and can be from an open set of domains. Models must understand questions, search for and assemble evidence necessary to answer the question, and then generate an answer. Models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available training QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically-generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models, whilst being significantly faster. RePAQ can be configured for size (under 500MB) or speed (over 1K questions per second) whilst retaining high accuracy. Lastly, we demonstrate RePAQ’s strength at selective QA, abstaining from answering when it is likely to be incorrect. This enables RePAQ to “back-off” to a more expensive state-of-the-art model, leading to a combined system which is both more accurate and 2x faster than the state-of-the-art model alone.


Transcript

  1. PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them.
     Patrick Lewis, Yuxiang Wu, Linqing Liu, Pasquale Minervini, Heinrich Küttler,
     Aleksandra Piktus, Pontus Stenetorp, Sebastian Riedel
  2. Facebook AI. "ODQA is emerging as a benchmark method of measuring systems'
     abilities to read, represent, and retrieve knowledge expressed in all of the
     documents on the web." (EfficientQA Organizers, NeurIPS 2020)
     Open-Domain Question Answering (ODQA) examples:
     Q: who has the right of way in international waters? A: Neither Vessel
     Q: when was puerto rico added to the usa? A: 1950
     Q: who's hosting the super bowl in 2019? A: Atlanta, Georgia
     Q: how many seasons are there of grey's anatomy? A: 14
     Q: who plays the voice of maui in moana? A: dwayne johnson
  3. Two families of ODQA model, illustrated with "Q: last time la dodgers won
     the world series? A: 1988":
     • "Retrieve-and-read": at train and test time, a retriever (e.g. DPR,
       TF-IDF) searches Wikipedia and feeds a reader (e.g. BERT, RAG, FiD).
       Accurate and interpretable, but slow and large.
     • "Closed-book QA": a seq2seq model (e.g. T5, BART, GPT-{1,2,3}) trained
       on QA pairs by minimizing -log P(a|q). Faster and smaller, but less
       accurate and a black box.
  4. Overview. Generation: (1) a passage selector, (2) a question generator,
     (3) an answer extractor, and (4) global filtering run over Wikipedia to
     produce PAQ, 65M probably-asked questions. Retrieval: RePAQ (BART) matches
     a test question such as "who was the film chariots of fire about" against
     PAQ and training data, returning the single best-matched QA pair (e.g.
     "Q: who was the main character in chariots of fire A: Eric Liddell"); when
     confidence is low, it backs off to FiD (e.g. for "who was originally cast
     to play indiana jones"). PAQ + RePAQ enables highly accurate, efficient,
     fast, interpretable, and well-calibrated QA: 48% EM on NQ, up to 1000s of
     questions per second, +10% over SOTA at 50% coverage, 2x faster, and a
     winner at EfficientQA 2020.
  5. Why generate PAQ in the first place?
  6. Question Answering Competencies:
     1. "Question Memorization" - recall answers to questions seen at training
        time. Train Q: who's hosting the super bowl in 2019? A: Atlanta, Georgia.
        Test Q: where will the super bowl be in 2019? A: Atlanta, Georgia
     2. "Answer Classification" - answer novel questions at test time with
        answers seen at training time. Train Q: who's hosting the super bowl in
        2019? A: Atlanta, Georgia. Test Q: who hosted the 1996 Olympic games?
        A: Atlanta, Georgia
     3. "QA Generalization" - answer novel test questions with novel answers.
        Train Q: who's hosting the super bowl in 2019? A: Atlanta, Georgia.
        Test Q: who plays the voice of maui in moana? A: dwayne johnson
  7. Question Answering Competencies. Roughly 60% of test questions only need
     "Answer Classification" to answer correctly, and roughly 30% only need
     "Question Memorization". Answer overlap with the training set:
     WebQuestions 58%, TriviaQA 72%, Natural Questions 64%. Question overlap:
     Natural Questions 33%, TriviaQA 34%, WebQuestions 28%.
  8. Question Answering Competencies - examples of test/train question overlap:
     Answer             | Test Question                                      | Train Question
     Jason Marsden      | Who plays Max' voice in a goofy movie              | Who does max voice in a goofy movie
     Alan Shearer       | Who has scored more goals in the premiere league   | Most goals scored by a premier league player
     Francisco Pizarro  | Who led the conquest of the incas in south america | Conquistador who defeated the incan empire in peru
  9. How well do models do on question competencies? [Chart: NaturalQuestions
     Exact Match by competency] RAG (retrieve-and-read): 44% overall, with 71%
     on Question Memorization, 35% on Answer Classification, and 25% on QA
     Generalization. BART (closed-book QA): 27% overall, with 68% on Question
     Memorization, 10% on Answer Classification, and 1% on QA Generalization.
  10. A QA-pair retriever over a QA database (just the training set): given
      "Q: last time la dodgers won the world series?", the retriever returns
      the stored pair "Q: when is the last time the dodgers won a world series
      A: 1988".
  11. A failure case: given "Q: who sings i don't wanna miss a thing", the
      retriever returns the near-miss pair "Q: who wrote i don't wanna miss a
      thing A: Diane Warren".
  12. QA-pair retriever - RePAQ. Adding a reranker fixes this: for "Q: who
      sings i don't wanna miss a thing", the retriever returns candidates
      "Q: who wrote i don't wanna miss a thing A: Diane Warren", "Q: who sang
      i don't wanna miss a thing first A: Aerosmith", and "Q: movie with
      i don't want to miss a thing A: Armageddon"; the reranker then selects
      Aerosmith.
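The retrieve-then-rerank idea above can be sketched in a few lines. This is a toy illustration only: a bag-of-words cosine scorer stands in for RePAQ's learned dense retriever, and a Jaccard scorer with a tiny hand-written synonym table (an assumption for illustration) stands in for its cross-encoder reranker.

```python
import math
from collections import Counter

# Toy QA database (stand-in for PAQ / the training set).
QA_PAIRS = [
    ("who wrote i don't wanna miss a thing", "Diane Warren"),
    ("who sang i don't wanna miss a thing first", "Aerosmith"),
    ("movie with i don't want to miss a thing", "Armageddon"),
]

def embed(text):
    """Bag-of-words vector; a stand-in for a dense question encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=3):
    """Stage 1: the k stored QA pairs whose questions look most similar."""
    q = embed(query)
    return sorted(QA_PAIRS, key=lambda qa: cosine(q, embed(qa[0])), reverse=True)[:k]

# Illustrative synonym table so 'sings'/'sang' match but 'wrote' does not;
# RePAQ learns such distinctions with a cross-encoder instead.
SYNONYMS = {"sings": "sing", "sang": "sing", "sung": "sing"}

def rerank(query, candidates):
    """Stage 2: re-score candidates with a finer-grained (here, toy) scorer."""
    qtoks = {SYNONYMS.get(t, t) for t in query.lower().split()}
    def score(qa):
        ctoks = {SYNONYMS.get(t, t) for t in qa[0].lower().split()}
        return len(qtoks & ctoks) / len(qtoks | ctoks)  # Jaccard similarity
    return max(candidates, key=score)
```

With these toy scorers, the retriever's top hit for "who sings i don't wanna miss a thing" is the "who wrote" pair, and the reranker corrects it to the Aerosmith pair, mirroring the slide's example.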
  13. How well do models do on question competencies? [Chart: NaturalQuestions
      Exact Match by competency] Overall: RAG 46%, BART CBQA 27%, QA-pair
      retriever (training set only) 31%.
  14. Aim: generate QA pairs at scale to:
      • Pre-empt and cache questions we may be asked at test time
      • Convert QA Generalization and Answer Classification questions into
        Question Memorization questions
      • An alternative view: reduce open-domain QA to community QA
  15. Expanding the coverage of the QA-pair KB -> PAQ
  16. PAQ: Probably-Asked Questions. Increase the coverage of QA pairs by
      generating probable QA pairs offline, at scale. Pipeline: a Passage
      Selector (RoBERTa) picks passages from Wikipedia (e.g. "The Tomb of
      Absalom, also called Absalom's Pillar, is an ancient monumental rock-cut
      tomb [...] contains a burial chamber with three burial sites"); an Answer
      Extractor (RoBERTa) proposes answer spans ("three"); a Question Generator
      (BART) writes questions; and a Consistency Filter (a QA model) keeps
      pairs such as "Q: how many burial sites are in the tomb of Absalom
      A: three".
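The four generation stages can be wired together as below. This is a minimal sketch of the control flow only: trivial heuristics (digit spans, a question template, a caller-supplied `qa_model` callable) stand in for the RoBERTa selector/extractor, the BART generator, and the consistency-checking QA model.

```python
def select_passages(corpus):
    """Passage selector: keep passages likely to yield good QA pairs.
    Toy heuristic (the real selector is a learned RoBERTa model):
    keep passages that contain a number."""
    return [p for p in corpus if any(c.isdigit() for c in p)]

def extract_answers(passage):
    """Answer extractor: propose candidate answer spans.
    Toy heuristic: numeric tokens only; the real extractor also
    proposes entities and other spans."""
    return [tok for tok in passage.split() if tok.isdigit()]

def generate_question(passage, answer):
    """Question generator: a toy template standing in for a BART model
    conditioned on (answer, passage)."""
    return f"what number appears in the passage beginning '{passage.split()[0]}'?"

def consistent(question, answer, qa_model):
    """Consistency filter: keep the pair only if a QA model, asked the
    generated question, reproduces the intended answer."""
    return qa_model(question) == answer

def build_paq(corpus, qa_model):
    """Run selection -> extraction -> generation -> filtering over a corpus."""
    pairs = []
    for passage in select_passages(corpus):
        for answer in extract_answers(passage):
            question = generate_question(passage, answer)
            if consistent(question, answer, qa_model):
                pairs.append((question, answer))
    return pairs
```

Run over a two-passage toy corpus with a stub `qa_model`, only the numeric passage survives selection and only the pair the QA model can re-answer survives filtering.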
  17. Pipeline example. Passage ranking selects the passage: "Book of a
      Thousand Days is a 2007 young adult fantasy novel by Shannon Hale. It is
      based on the Brothers Grimm fairy tale Maid Maleen. Dashti, a mucker from
      steppes of the Eight Realms, begins a diary as she looks for a job after
      her mother dies of illness. Eventually, she finds and accepts a position
      as the new maid of Lady Saren, the youngest child of the lord of Titor's
      Garden. Saren has defied her father's declaration that she will marry
      Lord Khasar of Thoughts of Under and revealed that she is engage…"
      Answer extraction proposes: Shannon Hale; Maid Maleen; Lord Khasar; 2007;
      Lady Saren; maid of Lady Saren; steppes of the Eight Realms.
      Question generation produces: who wrote the book of a thousand days; who
      is the book of a thousand days based on; who does lady saren marry in
      book of a thousand days; when was book of a thousand days written; who is
      the maid in book of a thousand days; who does dashti play in book of a
      thousand days; where does book of a thousand days take place; who does
      she marry in book of a thousand days? Filtering then prunes the
      generated pairs.
  18. PAQ - Probably-Asked Questions: 65 million QA pairs (650x the size of
      NaturalQuestions), generated from 1B words of Wikipedia (about 50% of
      Wikipedia).
  19. Question Answering Results

  20. [Chart: Exact Match score on Open-Natural Questions for BART-large CBQA,
      T5-11B+SSM CBQA, DensePhrases, RAG, FiD-large, RePAQ, and RePAQ +
      FiD-large backoff]
  21. Global QA-pair filtering matters:
      • No filter: keep all generated QA pairs
      • Local filter: check generated questions are consistent using an MRC
        model
      • Global filter: ensure generated questions are consistent using an ODQA
        model
      [Chart: Exact Match score improves from No Filter to Local Filter to
      Global Filter]
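The local/global distinction above can be sketched as follows. This is an illustrative reading of the slide: the local (MRC) check answers from the source passage alone, while the global (ODQA) check answers with the whole corpus in scope, which additionally rejects questions that become ambiguous once other passages are visible. The toy "readers" here just pick years out of text.

```python
def local_filter(question, answer, source_passage, mrc_answer):
    """Local filter: an MRC model answers the question from the source
    passage only; keep the pair if it recovers the intended answer."""
    return mrc_answer(question, source_passage) == answer

def global_filter(question, answer, odqa_answer):
    """Global filter: an ODQA model answers with the whole corpus in scope,
    so pairs whose answer changes under wider evidence are rejected."""
    return odqa_answer(question) == answer

# Toy corpus and toy models for illustration.
corpus = {
    "p1": "The Dodgers won the World Series in 1988.",
    "p2": "The Dodgers won the World Series again in 2020.",
}

def mrc_answer(question, passage):
    # Toy reader: return the first year mentioned in the passage.
    return next(t.strip(".") for t in passage.split() if t.strip(".").isdigit())

def odqa_answer(question):
    # Toy open-domain reader: scan all passages, prefer the latest year.
    return max(mrc_answer(question, p) for p in corpus.values())
```

A pair like ("when did the dodgers last win the world series", "1988"), generated from p1, passes the local check but fails the global one, since open-domain evidence points to 2020.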
  22. More globally-filtered questions ➞ better results:
      • More questions per answer span gives better scores
      • Combining QA pairs from different generators is better
      • Empirically, RePAQ always improves with more globally-filtered QA pairs
      [Chart: Exact Match score for 1 Q/A, 4 Q/A, and + diverse models]
  23. A closer look at RePAQ

  24. Selective QA: refuse to answer when confidence is low. [Chart: accuracy
      (%) vs fraction of questions answered (%) for RePAQ and FiD]
  25. Selective QA: refuse to answer when confidence is low. RePAQ + FiD
      backoff: RePAQ retrieves its best match from the 65M probably-asked
      questions (e.g. "Q: who played indiana jones in the original A: Harrison
      Ford"); if confidence is high, that answer is returned, otherwise the
      question is passed to FiD. This gives the best of both speed and
      accuracy.
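The backoff logic is a single confidence gate. A minimal sketch, assuming `repaq` returns an (answer, confidence) pair, `fid` returns an answer, and a threshold tuned on dev data (0.75 here is an arbitrary illustrative value):

```python
def answer_with_backoff(question, repaq, fid, threshold=0.75):
    """Selective QA: answer with the fast QA-pair retriever when it is
    confident, otherwise back off to the slower but more accurate reader.
    Returns (answer, which_model_answered)."""
    answer, confidence = repaq(question)
    if confidence >= threshold:
        return answer, "repaq"
    return fid(question), "fid"
```

With stub models mirroring the slide's examples, a well-cached question is answered cheaply by RePAQ, while a poorly matched one (RePAQ's low-confidence "Harrison Ford" guess for "who was originally cast to play indiana jones") is routed to FiD.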
  26. Inference speed:
      Model                      | Retriever | Reranker | Exact Match | Qs/sec
      FiD-large                  | -         | -        | 51.4        | 0.5
      RePAQ                      | base      | -        | 40.9        | 1400
      RePAQ                      | xlarge    | -        | 41.5        | 800
      RePAQ                      | base      | base     | 45.7        | 55
      RePAQ                      | xlarge    | xxlarge  | 47.6        | 6
      RePAQ + FiD-large backoff  |           |          | 52.3        | 1
  27. CBQA BART-large struggles to memorize PAQ. [Chart: NaturalQuestions
      Exact Match by competency (Question Memorization, Answer Classification,
      QA Generalization) for BART w/ NQ, BART w/ NQ+PAQ + final NQ finetune,
      RePAQ w/ NQ, and RePAQ w/ NQ+PAQ]
  28. Efficient QA

  29. EfficientQA Competition:
      • Develop a QA system that contains all of the knowledge required to
        answer open-domain questions
      • The knowledge could be in documents, databases, the parameters of a
        neural network, or any other form
      • Encourage systems that store and access knowledge using the smallest
        number of bytes, including code, corpora, and model parameters
  30. EfficientQA Competition (concretely): "Build a self-contained QA system
      docker image, submit it to our server, and we'll ask it 1800 hidden,
      newly-annotated questions." Four tracks/prizes:
      1. The smallest system that achieves >25% accuracy
      2. The most accurate system under 500MB
      3. The most accurate system under 6GB
      4. The highest-scoring system (no constraints)
  31. EfficientQA Competition (concretely): system size is measured as the
      size of the docker image before evaluation begins; disk space used
      during evaluation is not counted. Systems run on a 16-core machine with
      100GB RAM and 2 GPUs, and are allowed 6 hours to evaluate (12 seconds
      per question). The real task here is therefore how to compress
      "knowledge" and the mechanisms to access it, not building the lightest,
      fastest, or most efficient system.
  32. Implementing a tiny QA system:
      • Database: 140K QA pairs (as few as possible); build the index on the fly
      • Retriever: BERT-base -> TF-IDF (-220MB)
      • Reranker: BERT-base -> ALBERT-base (-200MB)
      • Debian -> Alpine Linux (-110MB)
      • PyTorch-CPU -> TFLite (-99MB)
      • Python -> C++ (-65MB)
      • Multi-stage builds, compression, optimization (-30MB)
      *All models stored as fp16; system accuracy dropped with Int8
      quantization.
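The tiny system's first stage, TF-IDF matching of an incoming question against the stored QA database with the index built on the fly, can be sketched in pure Python. This is an assumption-laden illustration: a minimal smoothed-idf cosine matcher, not the actual competition code (which pairs this with an ALBERT reranker and ships in fp16 / TFLite).

```python
import math
from collections import Counter

class TinyQA:
    """Toy TF-IDF nearest-question lookup over a small QA database."""

    def __init__(self, qa_pairs):
        self.qa_pairs = qa_pairs
        n = len(qa_pairs)
        df = Counter()                     # document frequency per term
        for question, _ in qa_pairs:
            df.update(set(question.lower().split()))
        # Smoothed idf so terms occurring in every question still score > 0.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        # Index built on the fly at startup, as in the tiny system.
        self.vecs = [self._vec(q) for q, _ in qa_pairs]

    def _vec(self, text):
        tf = Counter(text.lower().split())
        return {t: c * self.idf.get(t, 0.0) for t, c in tf.items()}

    @staticmethod
    def _cos(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def answer(self, question):
        """Return the answer of the stored QA pair whose question best
        matches the input question."""
        qv = self._vec(question)
        best = max(range(len(self.qa_pairs)),
                   key=lambda i: self._cos(qv, self.vecs[i]))
        return self.qa_pairs[best][1]
```

Even this crude matcher returns the right stored answer when a test question paraphrases a cached one, which is exactly the regime PAQ is designed to expand.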
  33. Implementing a tiny QA system:
      • Accuracy: 26.8%
      • Final size: 28MB (reranker 21M, QA database 1.6M, bash 0.7M,
        TFLite 0.4M, tokenizer 0.2M)
      • The same size as: 1 image in RAW, 7 Bibles, 20 floppy disks, or 90
        seconds of YouTube video
      • 16x smaller than the next-smallest entry
  34. Facebook AI 35 28Mb system visualized as an image:

  35. Implementing a 500MB system - scale the approach back up!
      • Still not enough space for GPU drivers or PyTorch
      • Limited by the time limit, not model size
      • Implement inference in NumPy
      • Database: 2.4M QA pairs
      • Retriever: replace TF-IDF with a neural model (+22MB)
      • Reranker: ALBERT-base -> ALBERT-large (+14MB)
      • Final size: 336MB (2nd-smallest model submitted)
      • Accuracy: 33.4%
      • Outperforms models with 100x more parameters
  36. [Chart: comparison of Ours-tiny, Ours-500MB, and Ours-Unconstrained
      against REALM, T5-XL, and T5-base under the 500MB and 6GB size
      constraints]
  37. Related work.
      MRC/extractive QA: Hirschman et al. 1999; Rajpurkar et al. 2016; Joshi
      et al. 2017; Kwiatkowski et al. 2019, inter alia.
      Open-domain QA: Voorhees and Tice 1999; Chen et al. 2017, inter alia.
      Neural memory models: Graves et al. 2014; Weston et al. 2015; Sukhbaatar
      et al. 2015; Graves et al. 2016, inter alia.
      Knowledge-grounded dialogue: Weston et al. 2018; Dinan et al. 2019,
      inter alia.
      Non-parametric memory models: Grave et al. 2017; Khandelwal et al. 2020,
      inter alia.
      REALM: Guu et al. 2020.
      Recent parametric memory literature: Petroni et al. 2019; Roberts et al.
      2020, inter alia.
      Recent ODQA literature: Lee et al. 2019; Karpukhin et al. 2020; Izacard
      and Grave 2020, inter alia.
  38. Collaborators: Aleksandra Piktus, Heinrich Küttler, Pasquale Minervini,
      Yuxiang Wu, Sebastian Riedel, Pontus Stenetorp, Linqing Liu, + UCLNLP
      and FAIR