Devlin+19, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL2019
[3] Shen+18, Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms, ACL2018
©The Asahi Shimbun Company 2020