
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering

Emory NLP

July 08, 2021

Transcript

  1. Annual Conference of the Association for Computational Linguistics
     Changmao Li and Jinho D. Choi, Emory University
     Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering
  2. • Corpus
       – Friends Dialogue Transcript
       – Character Mining Project
     • Tasks
       – FriendsQA
       – Friends Reading Comprehension
       – Friends Emotion Detection
       – Friends Personality Detection
  3. Corpus
     • 10 seasons of the Friends show
     • Dialogue Example: from the Friends transcript
  4. Related Question Answering Tasks
     – General-domain datasets
       • SQuAD 1.0 • SQuAD 2.0 • MS MARCO • TriviaQA • NewsQA • NarrativeQA
     – Multi-turn question answering datasets
       • SQA • QuAC • CoQA • CQA
     – Dialogue-based question answering datasets
       • DREAM
  5. Approach
     • Problems with the original transformer-based language modeling approach for dialogue:
       – Pretrained on formal writing rather than dialogue-based corpora
       – Simply concatenates all dialogue utterances into one flat context as input (sketched below)
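For concreteness, here is a minimal sketch of that naive concatenation baseline. The helper name, the (speaker, text) pair format, and the [SEP] usage are illustrative assumptions, not the paper's actual preprocessing.

```python
# Minimal sketch of the naive baseline criticized above: every utterance is
# concatenated into one flat context string, so the transformer sees no
# explicit utterance or speaker structure. Names and formats here are
# illustrative assumptions, not the paper's code.

def flatten_dialogue(utterances):
    """utterances: list of (speaker, text) pairs from one scene."""
    return " ".join(f"{speaker}: {text}" for speaker, text in utterances)

dialogue = [
    ("Joey", "Hey Ross, how was the museum today?"),
    ("Ross", "Great, we got a new dinosaur exhibit."),
]
question = "What did the museum get?"

# A vanilla BERT-style QA model would receive the question plus this flat context.
model_input = f"{question} [SEP] {flatten_dialogue(dialogue)}"
print(model_input)
```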
  6. Approach
     • Pre-training with transferred BERT/RoBERTa weights
       – Token-level masked language model
       – Utterance-level masked language model
       – Utterance order prediction
     • Fine-tuning
       – Joint learning of two tasks: answer span prediction and utterance ID prediction (see the loss sketch below)
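As a rough illustration of the joint fine-tuning objective, the sketch below combines a standard span-extraction loss with the utterance-ID loss that the ablation table later calls uid_loss. The tensor names, shapes, and the uid_weight hyperparameter are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn as nn

# Sketch of joint fine-tuning: span extraction plus utterance-ID (uid)
# prediction, trained together. Names and weighting are assumptions.
cross_entropy = nn.CrossEntropyLoss()

def joint_qa_loss(start_logits, end_logits, uid_logits,
                  start_gold, end_gold, uid_gold, uid_weight=1.0):
    # Standard extractive-QA loss over answer start/end positions.
    span_loss = (cross_entropy(start_logits, start_gold) +
                 cross_entropy(end_logits, end_gold)) / 2
    # Auxiliary loss: which utterance contains the answer span.
    uid_loss = cross_entropy(uid_logits, uid_gold)
    return span_loss + uid_weight * uid_loss

# Toy example: batch of 2 questions, 50 context tokens, 12 utterances.
loss = joint_qa_loss(torch.randn(2, 50), torch.randn(2, 50), torch.randn(2, 12),
                     torch.tensor([3, 7]), torch.tensor([5, 9]), torch.tensor([1, 4]))
```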
  7. Experiments
     • FriendsQA task – Evaluation Metrics (sketched below):
       • EM: Exact Match – checks whether the prediction and the gold answer are exactly the same
       • SM: Span-based Match – each answer is treated as a bag of words; compute the macro-average F1 score
       • UM: Utterance Match – checks whether the prediction resides within the same utterance as the gold answer span
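The sketch below shows one way these three metrics could be computed per question; the tokenization/normalization details of the official FriendsQA scorer may differ, so treat this as an approximation of the definitions on the slide.

```python
from collections import Counter

def exact_match(pred, gold):
    """EM: 1 if the predicted and gold answer strings are exactly the same."""
    return float(pred.strip().lower() == gold.strip().lower())

def span_match_f1(pred, gold):
    """SM: treat each answer as a bag of words and compute token-level F1;
    the reported SM score is this F1 macro-averaged over all questions."""
    pred_tokens, gold_tokens = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_tokens), overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def utterance_match(pred_utterance_id, gold_utterance_id):
    """UM: 1 if the predicted span lies in the same utterance as the gold span."""
    return float(pred_utterance_id == gold_utterance_id)
```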
  8. Analysis • Ablation Studies

     Method                                   EM          SM          UM
     BERTpre            with uid_loss         45.7(±0.8)  61.1(±0.8)  71.5(±0.5)
     BERTpre            without uid_loss      45.6(±0.9)  61.2(±0.7)  71.3(±0.6)
     BERTpre+ulm        with uid_loss         46.2(±1.1)  62.4(±1.2)  72.5(±0.8)
     BERTpre+ulm        without uid_loss      45.7(±0.9)  61.8(±0.9)  71.8(±0.5)
     BERTpre+ulm+uop    with uid_loss         46.8(±1.3)  63.1(±1.1)  73.3(±0.7)
     BERTpre+ulm+uop    without uid_loss      45.6(±0.9)  61.7(±0.7)  71.7(±0.6)
     RoBERTapre         with uid_loss         52.8(±0.9)  68.7(±0.8)  81.9(±0.5)
     RoBERTapre         without uid_loss      52.6(±0.7)  68.6(±0.6)  81.7(±0.7)
     RoBERTapre+ulm     with uid_loss         53.2(±0.6)  69.2(±0.7)  82.4(±0.5)
     RoBERTapre+ulm     without uid_loss      52.9(±0.8)  68.7(±1.1)  81.7(±0.6)
     RoBERTapre+ulm+uop with uid_loss         53.5(±0.7)  69.6(±0.8)  82.7(±0.5)
     RoBERTapre+ulm+uop without uid_loss      52.5(±0.8)  68.8(±0.5)  81.9(±0.7)

  10. Analysis • FriendsQA Task: Remaining Challenges
      – Inference in the dialogue?
        • The model still mainly performs pattern matching.
        • In some cases, the utterance ID prediction forces the model to learn the right utterance for an answer span.
      – Dealing with speakers and mentions?
        • Adding the speakers to the vocabulary does not improve the results.
  11. Conclusion
      • A novel transformer approach that interprets hierarchical contexts in multiparty dialogue.
      • Evaluated on the FriendsQA task, where it outperforms BERT and RoBERTa.
      • Although the model does not help the other character mining tasks, it still offers a promising direction for future studies.
  12. List of Contributions
      • New pre-training tasks are introduced to improve the quality of both token-level and utterance-level embeddings generated by the transformers, so that they better handle dialogue contexts.
      • A new multi-task learning approach is proposed to fine-tune the language model for span-based QA, taking full advantage of the hierarchical embeddings created during pre-training.
      • The approach outperforms the previous state-of-the-art models using BERT and RoBERTa on the span-based QA task that uses dialogues as evidence documents.
  13. Future Work
      • Figure out how to represent speakers and mentions in the dialogue.
      • Figure out how to perform inference in the dialogue.
      • Design a new, more advanced dialogue language model that fits all tasks.
  14. References
      Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A Human Generated Machine Reading Comprehension Dataset. In Proceedings of the Workshop on Cognitive Computation: Integrating Neural and Symbolic Approaches, co-located with NIPS 2016, Barcelona, Spain, December 2016.
      Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
      Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018. Know What You Don't Know: Unanswerable Questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
      Siva Reddy, Danqi Chen, and Christopher D. Manning. 2019. CoQA: A Conversational Question Answering Challenge. Transactions of the Association for Computational Linguistics, 7:249–266.
      Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, and Claire Cardie. 2019. DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension. Transactions of the Association for Computational Linguistics, 7:217–231.
      Alon Talmor and Jonathan Berant. 2018. The Web as a Knowledge-Base for Answering Complex Questions. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers).
      Trieu H. Trinh and Quoc V. Le. 2018. A Simple Method for Commonsense Reasoning. arXiv:1806.02847.
      Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. 2017. NewsQA: A Machine Comprehension Dataset. In Proceedings of the 2nd Workshop on Representation Learning for NLP.
      Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), pages 6000–6010. Curran Associates, Inc.
      Zhengzhe Yang and Jinho D. Choi. 2019. FriendsQA: Open-Domain Question Answering on TV Show Transcripts. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 188–197, Stockholm, Sweden. Association for Computational Linguistics.
      Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Advances in Neural Information Processing Systems 32, pages 5754–5764. Curran Associates, Inc.
  15. References
      Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. 2018. QuAC: Question Answering in Context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
      Alexis Conneau and Guillaume Lample. 2019. Cross-lingual Language Model Pretraining. In Advances in Neural Information Processing Systems 32, pages 7057–7067. Curran Associates, Inc.
      Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL'19), pages 4171–4186.
      Aaron Gokaslan and Vanya Cohen. 2019. OpenWebText Corpus.
      Mohit Iyyer, Wen-tau Yih, and Ming-Wei Chang. 2017. Search-based Neural Structured Learning for Sequential Question Answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1821–1831, Vancouver, Canada. Association for Computational Linguistics.
      Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. 2017. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
      Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward Grefenstette. 2018. The NarrativeQA Reading Comprehension Challenge. Transactions of the Association for Computational Linguistics, 6:317–328.
      Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.
      Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692.
      Sebastian Nagel. 2016. News Dataset Available.