Applications of Question Generation in NLP

wing.nus
March 28, 2022

Talk at Pie & AI: Singapore @ deeplearning.ai — March 28th, 2022
URL: https://www.eventbrite.com/e/pie-ai-singapore-applications-of-question-generation-in-nlp-tickets-304213690337#

Transcript

  1. Applications of Question Generation in NLP Liangming Pan Email: [email protected]

    Talk at Pie & AI: Singapore @ deeplearning.ai — March 28th, 2022
  2. Question Generation

    Input Context → Generated Question
    • Answer setting: Answer-aware, Answer-agnostic
    • Input: Text, Image, Video, Table, KG, Dialogue
    • Question type: Factoid Q, Clarification Q, Multiple-choice Q, Sequential Q
  3. Question Generation (Answer-aware, Text, Factoid Q)

    Input Context: Current faculty include the anthropologist Marshall Sahlins and the Shakespeare scholar David Bevington.
    Answer: David Bevington
    Generated Question: What Shakespeare scholar is currently on the faculty?
  4. Methodology for Question Generation

    Rule-based Methods
    • Apply linguistic rules to transform a declarative sentence into a question.
    • Fill out pre-defined question templates.
    Neural Methods
    • Sequence-to-Sequence Model
    • Encoder: encode the input passage and answer
    • Decoder: decode the question token by token
    (Image Credit: R. Zhang et al. 2021)
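The rule-based idea on this slide can be illustrated with a toy sketch. It is not the method of any cited system: the wh-word table, the function name, and the restriction to answers in subject position are all assumptions made for illustration.

```python
# Hypothetical mapping from answer type to wh-word (an assumption, not from the talk).
WH_WORDS = {"PERSON": "Who", "DATE": "When", "PLACE": "Where"}


def rule_based_question(sentence: str, answer: str, answer_type: str) -> str:
    """Turn 'ANSWER <predicate>.' into '<Wh-word> <predicate>?'.

    This toy rule only handles declarative sentences whose subject is the answer.
    """
    body = sentence.strip().rstrip(".")
    if not body.startswith(answer):
        raise ValueError("this toy rule only handles subject-position answers")
    predicate = body[len(answer):].strip()
    return f"{WH_WORDS[answer_type]} {predicate}?"


print(rule_based_question("Liz called Taylor.", "Liz", "PERSON"))
```

Real rule-based systems rely on syntactic parses rather than string prefixes; the sketch only shows the "transform a declarative sentence into a question" step in its simplest form.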
  5. Applications of Question Generation: Education

    Generate quiz questions from course materials (Seyler et al., ICTIR 2017)
  6. Applications of Question Generation: Chatbot

    Generate clarification questions in dialogue (Saeidi et al., EMNLP 2018)
    • User question: I am working for an employer in Canada. Do I need to carry on paying UK National Insurance?
    • Clarification question: Have you been working abroad 52 weeks or less?
  7. Applications of Question Generation: Question Answering

    Generate training data for question answering (Duan et al., EMNLP 2017) (Lewis et al., ACL 2019) (Puri et al., EMNLP 2020) (Yue et al., EMNLP 2021)
  8. Applications of Question Generation: Summarization

    Evaluate factual consistency of summarization (Krishna and Iyyer, ACL 2019) (Wang et al., ACL 2020)
  9. Contents

    • Question Generation for Multi-hop QA (NAACL 2021): Tables, Documents → Generated QA pairs → Train QA Model [Pan et al., NAACL 2021]
    • Question Generation for Fact Checking (ACL 2021): Generated QA pairs → Supported / Refuted / NEI → Fact Verification [Pan et al., ACL 2021]
  10. Contents

    • Question Generation for Multi-hop QA (NAACL 2021): Tables, Documents → Generated QA pairs → Train QA Model [Pan et al., NAACL 2021]
  11. 17 Unsupervised Multi-hop Question Answering by Question Generation [NAACL 2021]

    Pan et al: Unsupervised Multi-hop Question Answering by Question Generation Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang
  12. Multi-hop QA

    Multi-hop QA requires integrating and reasoning over different information sources to find the answer.
  13. Training a Multi-hop QA System

    Human-written QA Pairs → Train → QA Model
    • 78K questions • 13K HITs • $2.3 USD/HIT • 2 min per sample • Total Cost: >30,000 USD
    • 112K questions
    Time-consuming and costly to create training data.
  14. Multi-hop QA via Question Generation

    Input Sources (Tables, Documents) → Generator → Generated QA Pairs → Train → QA Model
    Could we use Question Generation (QG) to automatically generate high-quality training data?
  15. Challenges

    Lack of training data
    • We lack human-annotated (source, multi-hop question, answer) triples as supervision.
    • Approach: composing sub-questions into multi-hop questions.
    Quality of the generated data
    • How do we ensure that a generated (source, question, answer) triple is valid?
      o The question performs multi-hop reasoning over the input source.
      o The generated answer is correct.
    • Approach: defining an interpretable process that mimics human reasoning.
  16. Answer Multi-hop Questions

    Multi-hop Question: When did the rock band that sang "All Join Hands" rise to prominence?
    Paragraph A (All Join Hands): "All Join Hands" is a song by the British rock band Slade, released in 1984 as the lead single from the band's twelfth studio album "Rogues Gallery".
    Paragraph B (Slade): Slade are an English glam rock band from Wolverhampton. They rose to prominence during the early 1970s with 17 consecutive top 20 hits and six number ones on the UK Singles Chart.
    • Sub-question 1: Which rock band sang "All Join Hands"? Ans = Slade
    • Sub-question 2: When did Slade rise to prominence? Ans = Early 1970s
  17. Generate Multi-hop Questions

    Same example as slide 16 (Paragraph A: All Join Hands; Paragraph B: Slade).
    • Sub-question 1: Which rock band sang "All Join Hands"? Ans = Slade
    • Sub-question 2: When did Slade rise to prominence? Ans = Early 1970s
    • Multi-hop Question: When did the rock band that sang "All Join Hands" rise to prominence?
  18. Generate Multi-hop Questions

    Same example as slide 16 (Paragraph A: All Join Hands; Paragraph B: Slade).
    • Composing sub-questions into multi-hop questions
    • An interpretable process that mimics human reasoning
  19. General Framework

    Operators
    • Retrieve / generate relevant information from a single input source.
    • Aggregate information from two sources.
    Reasoning Graphs
    • Each corresponds to one type of multi-hop question.
    • Formulated as a computation graph built upon the operators.
  20. Operators

    Selection
    • FindBridge. Inputs: (Table + Text) or (Text + Text). Output: Bridge Entity. Select an entity to serve as the bridge between two texts (or between table and text).
    • FindComEnt. Inputs: Text. Output: Comparative Entities. Select potential comparative entities from the given text.
    Generation
    • QGwithAns. Inputs: Text + Answer. Output: Question. Generate a question with a given answer from the input text.
    • QGwithEnt. Inputs: Text + Entity. Output: Question. Generate a question which contains the given entity from the input text.
    • DescribeEnt. Inputs: Table + Entity. Output: Sentence. Generate a sentence that describes the given entity based on the information in the table.
    • QuesToSent. Inputs: Question. Output: Sentence. Convert a question into a declarative sentence.
    Fusion
    • BridgeBlend. Inputs: Question + Sentence + Bridge. Output: Multi-hop Question. Generate a bridge-type multi-hop question.
    • CompareBlend. Inputs: Question + Question. Output: Multi-hop Question. Generate a comparative-type multi-hop question.
  21. Reasoning graph example (Text + Text)

    Paragraph A (All Join Hands): "All Join Hands" is a song by the British rock band Slade, released in 1984 ⋯
    Paragraph B (Slade): Slade are an English glam rock band from Wolverhampton. They rose to prominence during the early 1970s with ⋯
    • #1 FindBridge: Slade
    • #2 QGwithEnt: When did the rock band Slade rose to prominence? Answer: Early 1970s
    • #3 QGwithAns: What rock band sang “All Join Hands”? Answer: Slade
    • #4 QuesToSent: Slade sang “All Join Hands”.
    • #5 BridgeBlend: When did the rock band that sang "All Join Hands" rise to prominence? Answer: Early 1970s
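Read as a computation graph, the Slade example chains five operators. The sketch below shows only that data flow. The operators are passed in as plain callables, and the toy implementations return canned strings for this one example. They are stand-ins, not the neural models used in the talk. The function name, the dictionary layout, and passing the answer to QuesToSent are assumptions.

```python
from typing import Callable, Dict

# Operator name -> callable returning a string (hypothetical interface).
Ops = Dict[str, Callable[..., str]]


def text_text_bridge_graph(text_a: str, text_b: str, ops: Ops) -> str:
    """Compose a bridge-type multi-hop question from two texts."""
    bridge = ops["FindBridge"](text_a, text_b)        # 1: entity linking the two texts
    q_bridge = ops["QGwithEnt"](text_b, bridge)       # 2: question that mentions the bridge
    q_single = ops["QGwithAns"](text_a, bridge)       # 3: question whose answer is the bridge
    sentence = ops["QuesToSent"](q_single, bridge)    # 4: declarative form of question 3
    return ops["BridgeBlend"](q_bridge, sentence, bridge)  # 5: fused multi-hop question


def toy_blend(question: str, sentence: str, bridge: str) -> str:
    # Replace the bridge mention with a relative clause built from the sentence.
    clause = sentence.replace(bridge, "", 1).strip().rstrip(".")
    return question.replace(bridge, f"that {clause}", 1)


# Canned stand-ins for the Slade example only.
TOY_OPS: Ops = {
    "FindBridge": lambda a, b: "Slade",
    "QGwithEnt": lambda text, ent: "When did the rock band Slade rise to prominence?",
    "QGwithAns": lambda text, ans: 'What rock band sang "All Join Hands"?',
    "QuesToSent": lambda question, ans: 'Slade sang "All Join Hands".',
    "BridgeBlend": toy_blend,
}

print(text_text_bridge_graph("Paragraph A ...", "Paragraph B ...", TOY_OPS))
```

Keeping the operators behind a dictionary makes it easy to swap a canned stub for a real model without touching the graph itself.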
  22. Reasoning graph example (Table + Text)

    Text (Kirsten Wild): Kirsten Carlijn Wild (born 15 October 1982) is a Dutch professional racing cyclist, ⋯ Wild competed in two track cycling events at the 2012 Summer Olympics.
    Table:
      Medal   Championship    Name          Event
      Silver  2010 Pruszkow   Tim Veldt     Men’s omnium
      Bronze  2011 Apeldoorn  Kirsten Wild  Women’s omnium
      Gold    2013 Apeldoorn  Elis Ligtlee  Women’s keirin
    • #1 FindBridge: Kirsten Wild
    • #2 DescribeEnt: Kirsten Wild of Netherlands won the bronze medal in the 2011 Apeldoorn.
    • #3 QGwithEnt: What is the birthdate of Kirsten Wild? Answer: 15 October 1982
    • #4 BridgeBlend: What is the birthdate of the athlete that of Netherlands won the bronze medal in the 2011 Apeldoorn? Answer: 15 October 1982
  23. Operators

    Generation
    • QGwithAns. Inputs: Text + Answer. Output: Question. Generate a question with a given answer from the input text.
    • QGwithEnt. Inputs: Text + Entity. Output: Question. Generate a question which contains the given entity from the input text.
    • DescribeEnt. Inputs: Table + Entity. Output: Sentence. Generate a sentence that describes the given entity based on the information in the table.
    Fusion
    • BridgeBlend. Inputs: Question + Sentence + Bridge. Output: Multi-hop Question. Generate a bridge-type multi-hop question.
  24. 1. QGwithAns and QGwithEnt

    • GitHub Link: https://github.com/patil-suraj/question_generation
    • A Google T5 model jointly trained on three tasks based on the SQuAD dataset: Answer Prediction, Question Generation, Question Answering.
    • Answer Prediction input: extract answer: <hl> 42 is the answer to life, the universe and everything. <hl>
      Output: 42
    • Question Generation input: generate question: <hl> 42 <hl> is the answer to life, the universe and everything.
      Output: What is the answer to life, the universe and everything?
    • Question Answering input: question: What is the answer to life? context: 42 is the answer to life, the universe and everything.
      Output: 42
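The three task prefixes shown on this slide are plain string formats, so they can be built without loading any model. The helper names below are hypothetical, and the exact whitespace is copied from the slide rather than from the linked repository. This is a sketch of the input format only.

```python
HL = "<hl>"  # highlight token as it appears on the slide


def extract_answer_prompt(sentence: str) -> str:
    """Answer prediction: highlight the whole sentence."""
    return f"extract answer: {HL} {sentence} {HL}"


def generate_question_prompt(context: str, answer: str) -> str:
    """Question generation: highlight the first occurrence of the answer span."""
    if answer not in context:
        raise ValueError("answer span must occur in the context")
    highlighted = context.replace(answer, f"{HL} {answer} {HL}", 1)
    return f"generate question: {highlighted}"


def answer_question_prompt(question: str, context: str) -> str:
    """Question answering: question followed by its context."""
    return f"question: {question} context: {context}"


context = "42 is the answer to life, the universe and everything."
print(extract_answer_prompt(context))
print(generate_question_prompt(context, "42"))
print(answer_question_prompt("What is the answer to life?", context))
```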
  25. 2. DescribeEnt (Table-to-Text)

    • A GPT-2 model fine-tuned on the ToTTo dataset.
    Input Table + Target Entity (table title: Netherlands at the European Track Championships):
      Medal   Championship    Name          Event
      Silver  2010 Pruszkow   Tim Veldt     Men’s omnium
      Bronze  2011 Apeldoorn  Kirsten Wild  Women’s omnium
      Gold    2013 Apeldoorn  Elis Ligtlee  Women’s keirin
      Gold    2013 Apeldoorn  Elis Ligtlee  Women’s sprint
    Table Templatization: The table title is Netherlands at the European Track Championships . The Medal is Bronze . The Championship is 2011 Apeldoorn . The Name is Kirsten Wild . The Event is Women's omnium . Start describing Kirsten Wild :
    Pretrained GPT-2 output: Kirsten Wild of Netherlands won the bronze medal in the 2011 Apeldoorn.
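The templatization step flattens one table row into a text prompt for the language model. A minimal sketch that reproduces the slide's template string follows. The function name and the dictionary representation of a row are assumptions.

```python
from typing import Dict


def templatize_row(title: str, row: Dict[str, str], entity: str) -> str:
    """Flatten a table title and one row into the prompt format shown on the slide."""
    parts = [f"The table title is {title} ."]
    # One clause per column, in the column order of the row.
    parts += [f"The {column} is {value} ." for column, value in row.items()]
    parts.append(f"Start describing {entity} :")
    return " ".join(parts)


row = {
    "Medal": "Bronze",
    "Championship": "2011 Apeldoorn",
    "Name": "Kirsten Wild",
    "Event": "Women's omnium",
}
print(templatize_row("Netherlands at the European Track Championships", row, "Kirsten Wild"))
```

The resulting string would then be passed to the fine-tuned GPT-2 model, which continues it with the descriptive sentence.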
  26. Results

    Models for QGwithAns/Ent
      Model / Metrics  BLEU-4  METEOR  ROUGE-L
      NQG++            13.51   18.18   41.60
      S2ga-mp-gsa      15.82   19.67   44.24
      CGC-QG           17.55   21.24   44.53
      Google-T5        21.32   27.09   43.60
      UniLM            23.75   25.61   52.04
    Models for DescribeEnt
      Model / Metrics  BLEU-4  METEOR  ROUGE-L
      Seq2Seq+Attn     28.31   27.61   56.63
      GPT2-TabGen      33.92   32.46   55.61
      GPT2-Medium      35.94   33.74   57.44
  27. 3. BridgeBlend (Combining sub-parts)

    • Bridge: Kirsten Wild
    • Question: What is the birthdate of Kirsten Wild? Answer: 15 October 1982
    • Sentence: Kirsten Wild of Netherlands won the bronze medal in the 2011 Apeldoorn.
    • Template: What is the birthdate of the _____ that of Netherlands won the bronze medal in the 2011 Apeldoorn?
    • BERT fills the blank: What is the birthdate of the athlete that of Netherlands won the bronze medal in the 2011 Apeldoorn? Answer: 15 October 1982
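The template half of BridgeBlend, before BERT fills the blank, can be sketched as a pair of string replacements. This reproduces the slide's intermediate question. The function name and the `[MASK]` placeholder are assumptions, and the masked-language-model step is deliberately left out.

```python
def bridge_blend_template(question: str, sentence: str, bridge: str, mask: str = "[MASK]") -> str:
    """Replace the bridge entity in the question with 'the <mask> that <rest of sentence>'."""
    # Drop the bridge entity and the final period from the describing sentence.
    clause = sentence.replace(bridge, "", 1).strip().rstrip(".").strip()
    return question.replace(bridge, f"the {mask} that {clause}", 1)


masked = bridge_blend_template(
    question="What is the birthdate of Kirsten Wild?",
    sentence="Kirsten Wild of Netherlands won the bronze medal in the 2011 Apeldoorn.",
    bridge="Kirsten Wild",
)
print(masked)
```

A masked language model such as BERT would then predict the word for the placeholder ("athlete" in the slide's example).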
  28. Evaluation Datasets

    • HotpotQA: Text + Text (Yang et al., EMNLP 2018)
    • HybridQA: Table + Text (Chen et al., EMNLP 2020)
  29. Supervised QA Performance (F1 Score)

    HybridQA
      Model       In-Table  In-Text  Overall
      Supervised  58.6      46.4     50
      Zero-Shot   40.6      25       30.5
    HotpotQA
      Model       Bridge  Comparison  Overall
      Supervised  83.5    80.3        82.8
      Zero-Shot   72.2    54.4        68.6
    Supervised training data: ~90K human-labeled data; ~60K human-labeled data
  30. Zero-shot QA Performance (F1 Score)

    HybridQA (zero-shot model trained on 100K generated data)
      Model       In-Table  In-Text  Overall
      Supervised  58.6      46.4     50
      Zero-Shot   40.6      25       30.5
    HotpotQA (zero-shot model trained on 100K generated data)
      Model       Bridge  Comparison  Overall
      Supervised  83.5    80.3        82.8
      Zero-Shot   72.2    54.4        68.6
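To put the two bars side by side, the zero-shot score can be expressed as a share of the supervised score. The snippet below is only arithmetic on the overall F1 numbers from this slide. The dictionary layout and function name are assumptions.

```python
# Overall F1 scores copied from the slide.
OVERALL_F1 = {
    "HotpotQA": {"supervised": 82.8, "zero_shot": 68.6},
    "HybridQA": {"supervised": 50.0, "zero_shot": 30.5},
}


def relative_performance(scores: dict) -> float:
    """Zero-shot F1 as a percentage of supervised F1."""
    return 100.0 * scores["zero_shot"] / scores["supervised"]


for dataset, scores in OVERALL_F1.items():
    print(f"{dataset}: {relative_performance(scores):.1f}% of supervised F1")
```

On these numbers, the model trained only on generated data reaches roughly 83% of the supervised F1 on HotpotQA and 61% on HybridQA.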
  31. Few-shot QA Performance (HotpotQA, HybridQA)

    The F1 score for progressively larger training dataset sizes for fine-tuning.
    • 100K generated data + N human-labeled data
    • N human-labeled data
  32. Ablation Study

    Single-hop questions perform poorly when they are used to train the multi-hop QA model. We need data that performs multi-hop reasoning.
  33. Ablation Study

    A model trained with a single reasoning chain performs well only on the corresponding question type. We need data that performs diverse reasoning.
  34. Examples of Generated Questions

    Table-Text
    • When did the one that won the Eurovision Song Contest in 1966 join Gals and Pals? Answer: 1963
    • How many students attend the teams that played in the Dryden Township Conference? Answer: 1900
    Text-Table
    • What album did the Oak Ridge Boys release in 1989? Answer: American Dreams
    • When was the name that is the name of the bridge that crosses Youngs Bay completed? Answer: Summer
    Text-Text
    • Which Canadian cinematographer is best known for his work on Fargo? Answer: Craig Wrobleski
    • What is illegal in the country that is Bashar Hafez al - Assad ’s father? Answer: Cannabis
    Comparison
    • Who was born first, Terry Southern or Neal Town Stephenson? Answer: Terry Southern
    • Are Beth Ditto and Mary Beth Patterson of the same nationality? Answer: Yes
  35. Limitations

    Naturalness of the generated questions
    • We rely on a template-based method plus BERT to compose questions. This sometimes makes the generated questions look unnatural.
      o Generated: What is illegal in the country that is Bashar’s father?
      o Human: What is illegal in Bashar’s father’s country?
    Hard to generate “less apparent” multi-hop questions
    • For some multi-hop questions, the decomposition into sub-parts is not evident from the question itself.
      o Question: Did Aristotle use a laptop?
      o Evidence 1: Aristotle died in 322 BC.
      o Evidence 2: The first laptop was invented in 1980.
  36. Summary

    Tables, Documents → Generator → Train → QA Model
    • Composing simple questions into complex questions
    • An interpretable process that mimics human reasoning
    Example (Paragraph A: All Join Hands; Paragraph B: Slade):
    • Which rock band sang "All Join Hands"? Ans = Slade
    • When did Slade rise to prominence?
    • Multi-hop Question: When did the rock band that sang "All Join Hands" rise to prominence?
  37. Contents 52 Question Generation for Fact Checking (ACL 2021) Generated

    QA pairs Supported Refuted NEI Fact Verification [Pan et al., ACL 2021]
  38. 53 Zero-shot Fact Verification with Claim Generation [ACL 2021] Pan

    et al: Zero-shot Fact Verification with Claim Generation Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang
  39. Fact Checking

    Example claim: “Immigrants are a drain on the economy”
    • Labels: Supports, Refutes, Not Enough Info
    • Pipeline
      o Document Retrieval
      o Sentence Retrieval
      o Claim Verification
  40. Training a Fact Verification System

    Human-labeled (Evidence, Claim, Label) → Training → Fact Checking Model; Evidence + Claim → Label (SUPPORT / REFUTE / NEI)
    • Claim: Penguin Books is a publishing house founded in 1930.
    • Evidence: Penguin Books was founded in 1935 by Sir Allen Lane as a line of the publishers The Bodley Head, only becoming a separate company the following year.
    • Label: REFUTES
  41. Training a Fact Verification System

    Time-consuming and costly to create training data
    • 50 annotators
    • Write ~185,000 claims
    • Data validation
  42. Claim Generation for Fact Verification

    • Pre-Training: Document → Evidence → Generated (Evidence, Claim, Label) Pairs (Supported, Refuted, Not Enough Info) → Fact Checking Model
    • Fine-Tuning: Human-labeled (Evidence, Claim, Label) → Fact Checking Model
    • Evidence + Claim → Label (SUPPORT / REFUTE / NEI)
  43. Claim Generation for Fact Verification

    Question Generation and Claim Generation are closely connected. From the evidence, generate (Evidence, Claim, Label) pairs:
    • Answerable question + correct answer → Supported Claim
    • Answerable question + wrong answer → Refuted Claim
    • Unanswerable question + answer → NEI Claim
  44. Claim Generation with QG

    Evidence (P): 1992 Los Angeles riots
    The 1992 Los Angeles riots, also known as the Rodney King riots were a series of riots, lootings, arsons, and civil disturbances that occurred in Los Angeles County, California in April and May 1992. By the time the riots ended, 63 people had been killed.
    Extra Contexts (P_ext): ⋯
    Question Generator
    • Q: Where did the Rodney King riots happen? A: Los Angeles County
    • Q: How many people were killed in the Rodney King riots? A: 63
    Answer Replacement
    • Q: Where did the Rodney King riots happen? A: San Francisco County
    QA-to-Claim Model
    • SUPPORTED: The Rodney King riots took place in Los Angeles County.
    • REFUTED: The Rodney King riots took place in San Francisco County.
    • NOT ENOUGH INFO: 63 people were killed in the Rodney King riots.
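The labeling logic of this slide can be sketched as a small function that turns question-answer pairs into labeled claims. The question generator, answer replacement, and QA-to-claim model are passed in as callables. The toy versions below are string rules that cover only the slide's two questions, standing in for the BART models. Treating the "63 people" question as coming from the extra contexts is an assumption based on its NOT ENOUGH INFO label.

```python
from typing import Callable, List, Tuple

QA = Tuple[str, str]  # (question, answer)


def generate_claims(
    evidence_qa: List[QA],
    extra_qa: List[QA],
    replace_answer: Callable[[str, str], str],
    qa_to_claim: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Return (claim, label) pairs for one piece of evidence."""
    claims = []
    for question, answer in evidence_qa:
        claims.append((qa_to_claim(question, answer), "SUPPORTED"))
        wrong = replace_answer(question, answer)  # swap in an incorrect answer
        claims.append((qa_to_claim(question, wrong), "REFUTED"))
    for question, answer in extra_qa:  # not answerable from the evidence alone
        claims.append((qa_to_claim(question, answer), "NOT ENOUGH INFO"))
    return claims


def toy_qa_to_claim(question: str, answer: str) -> str:
    """Hand-written rules for the two question shapes on the slide (not a real model)."""
    if question.startswith("Where did ") and question.endswith(" happen?"):
        subject = question[len("Where did "):-len(" happen?")]
        return f"{subject[0].upper()}{subject[1:]} took place in {answer}."
    if question.startswith("How many people were killed in "):
        event = question[len("How many people were killed in "):-1]
        return f"{answer} people were killed in {event}."
    raise ValueError("toy rule does not cover this question")


claims = generate_claims(
    evidence_qa=[("Where did the Rodney King riots happen?", "Los Angeles County")],
    extra_qa=[("How many people were killed in the Rodney King riots?", "63")],
    replace_answer=lambda question, answer: "San Francisco County",
    qa_to_claim=toy_qa_to_claim,
)
for claim, label in claims:
    print(label, "|", claim)
```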
  45. Key Components

    Question Generator
    • BART model fine-tuned on the SQuAD dataset.
    • Input: generate question: <hl> 42 <hl> is the answer to life, the universe and everything.
      Output: What is the answer to life, the universe and everything?
    QA-to-Claim
    • BART model fine-tuned on the QA2D dataset (Demszky et al., 2018)
    • Input: Who called Taylor? [SEP] Liz
      Output: Liz called Taylor.
  46. Answer Replacement

    Evidence: Homeland is an American spy thriller television series developed by Howard Gordon and Alex Gansa based on the Israeli series Prisoners of War, which was created by Gideon Raff.
    Local Replacement
    • Replace the answer with an entity of the same type within the evidence.
    • Ques: Who developed Homeland? Ans: Howard Gordon → Gideon Raff
    • Claim: Gideon Raff developed the Homeland.
    Global Replacement
    • Replace the answer with an entity that is close in semantics.
    • We use sense2vec (Trask et al., 2015) to find the closest phrases.
    • Ques: Who developed Homeland? Ans: Howard Gordon → Claire Danes
    • Claim: Claire Danes developed the Homeland.
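Local replacement amounts to picking another entity of the same type from the evidence. The sketch below assumes the entities have already been extracted and typed, for instance by an NER tagger. The type labels and the function name are illustrative, not taken from the talk. Global replacement with sense2vec is not shown.

```python
from typing import List, Tuple

Entity = Tuple[str, str]  # (surface text, entity type)


def local_replacements(answer: str, entities: List[Entity]) -> List[str]:
    """Return other entities in the evidence that share a type with the answer."""
    answer_types = {etype for text, etype in entities if text == answer}
    return [text for text, etype in entities if etype in answer_types and text != answer]


# Hypothetical pre-extracted entities for the Homeland evidence sentence.
evidence_entities = [
    ("Homeland", "WORK"),
    ("Howard Gordon", "PERSON"),
    ("Alex Gansa", "PERSON"),
    ("Prisoners of War", "WORK"),
    ("Gideon Raff", "PERSON"),
]
print(local_replacements("Howard Gordon", evidence_entities))
```

Note that a candidate such as Alex Gansa would produce a claim that is actually true for this evidence, so a real pipeline needs some check that the replaced answer is indeed wrong. How the talk's system handles this is not stated on the slide.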
  47. Zero-shot Fact Verification (F1 Score)

      Model       FEVER-S/R  FEVER-S/R/N  FEVER-Symmetric
      Supervised  95.1       87.8         85.5
      QACG        78.1       62.6         77.1
  48. Zero-shot Fact Verification (F1 Score)

      Model       FEVER-S/R  FEVER-S/R/N  FEVER-Symmetric
      Supervised  95.1       87.8         85.5
      QACG        78.1       62.6         77.1
      LM for FC   70.2       49.8         67.8
      Perplexity  55.6       35.3         52.7
  49. Discussions

    • Our model requires a good question generator.
    • To generate deep claims, you need to generate deep questions.
  50. Summary

    How can question generation benefit downstream NLP applications?
    • Benefit Multi-hop QA by composing simple questions into complex questions.
    • Benefit Fact Verification by connecting claim generation with question generation.
  51. Know more about QG

    Survey for Neural Question Generation
    • https://arxiv.org/pdf/1905.08949.pdf
    Question Generation Paper List
    • https://github.com/teacherpeterpan/Question-Generation-Paper-List
  52. References

    [1] Knowledge Questions from Knowledge Graphs. Dominic Seyler, Mohamed Yahya, Klaus Berberich. ICTIR 2017.
    [2] Interpretation of Natural Language Rules in Conversational Machine Reading. Marzieh Saeidi, Max Bartolo, Patrick Lewis, Sameer Singh, Tim Rocktäschel, Mike Sheldon, Guillaume Bouchard, Sebastian Riedel. EMNLP 2018.
    [3] Question Generation for Question Answering. Nan Duan, Duyu Tang, Peng Chen, Ming Zhou. EMNLP 2017.
    [4] Unsupervised Question Answering by Cloze Translation. Patrick Lewis, Ludovic Denoyer, Sebastian Riedel. ACL 2019.
    [5] Training Question Answering Models From Synthetic Data. Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro. EMNLP 2020.
    [6] Generating Question-Answer Hierarchies. Kalpesh Krishna and Mohit Iyyer. ACL 2019.
    [7] Recent Advances in Neural Question Generation. Liangming Pan, Wenqiang Lei, Tat-Seng Chua, Min-Yen Kan. arXiv 2019.
    [8] SQuAD: 100,000+ Questions for Machine Comprehension of Text. Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang. EMNLP 2016.
  53. References

    [9] Semantic Graphs for Generating Deep Questions. Liangming Pan, Yuxi Xie, Yansong Feng, Tat-Seng Chua, Min-Yen Kan. ACL 2020.
    [10] HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning. EMNLP 2018.
    [11] Exploring Question-Specific Rewards for Generating Deep Questions. Yuxi Xie, Liangming Pan, Dongzhe Wang, Min-Yen Kan, Yansong Feng. COLING 2020.
    [12] HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data. Wenhu Chen, Hanwen Zha, Zhiyu Chen, Wenhan Xiong, Hong Wang, William Wang. EMNLP 2020.
    [13] Unsupervised Multi-hop Question Answering by Question Generation. Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang. arXiv 2020.