AGENDA
▪ Beyond Accuracy: Behavioral Testing of NLP Models with CheckList [Ribeiro+]
▪ Best Paper (Honorable mention)
▪ Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics [Mathur+]
▪ Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks [Gururangan+]
2019
▪ Translation
▪ Dialogue and Interactive Systems
▪ Generation
▪ Question Answering
▪ Sentiment Analysis, Argument Mining
▪ Word-level Semantics
▪ Applications
▪ Resources and Evaluation
▪ Multidisciplinary, AC COI
▪ Sentence-level Semantics
▪ Tagging, Chunking, Syntax, Parsing
▪ Social Media
▪ Summarization
▪ Document Analysis
▪ Multilinguality
▪ Textual Inference, Other Areas of Semantics
▪ Discourse and Pragmatics
▪ Phonology, Morphology, Word Segmentation
▪ Vision, Robotics, Multimodal Grounding, Speech
▪ Linguistic Theories, Cognitive, Psycholinguistics

2020
▪ Machine Learning for NLP
▪ Dialogue and Interactive Systems
▪ Machine Translation
▪ Information Extraction
▪ NLP Application
▪ Generation
▪ Question Answering
▪ Resources and Evaluation
▪ Summarization
▪ Computational Social Science and Social Media
▪ Semantics: Sentence Level
▪ Interpretability and Analysis of Models for NLP
▪ Semantics: Lexical
▪ Information Retrieval and Text Mining
▪ Language Grounding to Vision, Robotics and Beyond
▪ Theme
▪ Cognitive Modeling and Psycholinguistics
▪ Speech and Multimodality
▪ Syntax: Tagging, Chunking and Parsing
▪ Multidisciplinary and Area Chair COI
▪ Discourse and Pragmatics
▪ Phonology, Morphology and Word Segmentation
▪ Ethics and NLP
▪ Sentiment Analysis, Stylistic Analysis, and Argument Mining
▪ Semantics: Textual Inference and Other Areas of Semantics
▪ Theory and Formalism in NLP (Linguistic and Mathematical)

Legend: New; 200+ submissions
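Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics [Mathur+]
DA (Direct Assessment): annotators rate the outputs of the MT systems built for WMT 2019 on a 100-point scale; the scores are then standardized per annotator and averaged
▪ Reported result: BLEU correlates highly with DA for translation tasks in every language
▪ BLEU is still used as the de facto standard evaluation metric
[Figure: example with the source text "I have a pen." and the Japanese translations "ペンを持つ。" and "ペンを持っています。", labeled source / MT / human annotation; the DA (Direct Assessment) scale from 0 to 100 is marked at 50, and BLEU is 28]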
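The DA procedure described here (raw 0 to 100 ratings, standardized per annotator, then averaged) can be sketched as follows. This is a minimal illustration, not the WMT implementation: the function names `standardized_da` and `pearson`, the `(annotator, system, score)` tuple layout, and the use of the population standard deviation are all assumptions made for the example. The `pearson` helper shows how system-level DA scores could then be correlated with a metric such as BLEU.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def standardized_da(ratings):
    """Average per-annotator z-scores for each system.

    ratings: iterable of (annotator, system, raw_score) with raw_score in 0..100.
    The tuple layout is a hypothetical format chosen for this sketch.
    """
    ratings = list(ratings)

    # Collect each annotator's raw scores to estimate their personal mean and spread.
    by_annotator = defaultdict(list)
    for annotator, _system, score in ratings:
        by_annotator[annotator].append(score)
    stats = {a: (mean(s), pstdev(s)) for a, s in by_annotator.items()}

    # Convert every rating to a z-score relative to its annotator, grouped by system.
    by_system = defaultdict(list)
    for annotator, system, score in ratings:
        mu, sigma = stats[annotator]
        # Assumption: an annotator with zero variance contributes a neutral 0.0.
        z = (score - mu) / sigma if sigma > 0 else 0.0
        by_system[system].append(z)

    return {system: mean(zs) for system, zs in by_system.items()}


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical ratings: annotator a1 scores generously, a2 scores harshly.
    demo = [
        ("a1", "sysA", 80), ("a1", "sysB", 60),
        ("a2", "sysA", 50), ("a2", "sysB", 30),
    ]
    print(standardized_da(demo))
```

Standardizing per annotator removes differences in how strictly individuals use the scale, so a lenient and a harsh annotator who agree on the ranking contribute the same z-scores.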