and Yejin Choi. Unsupervised Commonsense Question Answering with Self-Talk. EMNLP 2020.
(2) Lianhui (Karen) Qin, Vered Shwartz, Peter West, Chandra Bhagavatula, Jena Hwang, Ronan Le Bras, Antoine Bosselut, and Yejin Choi. Back to the Future: Unsupervised Backprop-based Decoding for Counterfactual and Abductive Commonsense Reasoning. EMNLP 2020.
(3) Rachel Rudinger, Vered Shwartz, Jena D. Hwang, Chandra Bhagavatula, Maxwell Forbes, Ronan Le Bras, Noah A. Smith, and Yejin Choi. Thinking Like a Skeptic: Defeasible Inference in Natural Language. Findings of EMNLP 2020.
(4) Faeze Brahman, Vered Shwartz, Rachel Rudinger, and Yejin Choi. Learning to Rationalize for Nonmonotonic Reasoning with Distant Supervision. AAAI 2021.
(5) Vered Shwartz and Yejin Choi. Do Neural Language Models Overcome Reporting Bias? COLING 2020.
(6) Ari Holtzman, Peter West, Vered Shwartz, Yejin Choi, and Luke Zettlemoyer. Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right. arXiv 2021.
(7) Maxwell Forbes, Jena D. Hwang, Vered Shwartz, Maarten Sap, and Yejin Choi. Social Chemistry 101: Learning to Reason about Social and Moral Norms. EMNLP 2020.
(8) Maarten Sap, Vered Shwartz, Antoine Bosselut, Dan Roth, and Yejin Choi. Introductory Tutorial on Commonsense Reasoning. ACL 2020.
(9) Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. Hypothesis Only Baselines in Natural Language Inference. *SEM 2018.
(10) Aishwarya Agrawal, Dhruv Batra, and Devi Parikh. Analyzing the Behavior of Visual Question Answering Models. EMNLP 2016.
(11) Allyson Ettinger. What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models. TACL 2020.
(12) Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. NAACL 2019.
(13) Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, and Yejin Choi. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. ACL 2019.
(14) Ben Zhou, Daniel Khashabi, Qiang Ning, and Dan Roth. "Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding. EMNLP 2019.
(15) Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Wen-tau Yih, and Yejin Choi. Abductive Commonsense Reasoning. ICLR 2020.
(16) Christian Szegedy et al. Going Deeper with Convolutions. CVPR 2015.
(17) Fernando Martínez-Plumed, Ricardo B.C. Prudêncio, Adolfo Martínez-Usó, and José Hernández-Orallo. Item Response Theory in AI: Analysing Machine Learning Classifiers at the Instance Level. Artificial Intelligence 2019.
(18) Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, and Verena Rieser. Why We Need New Evaluation Metrics for NLG. EMNLP 2017.