Grammatical Error Correction for Writing Assistance
Masato Mita (RIKEN / Tokyo Metropolitan University)
2022-02-07, invited talk at NTT DOCOMO, INC.

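About me
• Masato Mita
− I am a researcher in natural language processing (NLP)
− I am particularly interested in educational applications of NLP
• Career
− 2016.3 Completed the master's program at NAIST (Matsumoto Lab)
− 2016.4-2018.1 Microsoft Japan Co., Ltd.
− 2018.2-present RIKEN Center for Advanced Intelligence Project (AIP)
− 2021.9 Received a Ph.D. from Tohoku University (Inui Lab)
− 2021.10-present Concurrently Specially Appointed Assistant Professor, Tokyo Metropolitan University (Komachi Lab)
• Recent activity
− Organized an Advent calendar on grammatical error correction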

Research topics (selected)
• Grammatical error correction
− Masato Mita, Hitomi Yanaka. Do Grammatical Error Correction Models Realize Grammatical Generalization? ACL 2021 (Findings).
− Masato Mita, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, Kentaro Inui. A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction. EMNLP 2020 (Findings).
− Masahiro Kaneko, Masato Mita, Shun Kiyono, Jun Suzuki, Kentaro Inui. Can Encoder-decoder Models Benefit from Pre-trained Language Representation in Grammatical Error Correction? ACL 2020.
− Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui. An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction. EMNLP 2019.
− Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inui. Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models - Is Single-Corpus Evaluation Enough? NAACL 2019.
• Feedback comment generation
− Ryo Nagata, Masato Hagiwara, Kazuaki Hanawa, Masato Mita, Artem Chernodub, Olena Nahorna. Shared Task on Feedback Comment Generation for Language Learners. INLG 2021.
• Automated scoring
− Hiroaki Funayama, Shota Sasaki, Yuichiro Matsubayashi, Tomoya Mizumoto, Jun Suzuki, Masato Mita, Kentaro Inui. Preventing Critical Scoring Errors in Short Answer Scoring with Confidence Estimation. ACL-SRW 2020.

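NLP × Education (selected)
• Reading assistance
− Text simplification: rewrite text written in difficult expressions into simpler ones
− Vocabulary prediction: estimate which words a learner has not yet learned and which words they should learn
• Writing assistance
− Automated scoring (Automated Essay Scoring): automatically assign a score to a written essay or answer
  − Scoring of long-form written answers (Essay Scoring)
  − Scoring of short answers (Short Answer Scoring)
− Grammatical error correction (Grammatical Error Correction): automatically correct the grammatical errors in a text
  − The most active area among education-oriented NLP research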

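About this talk
• Goal: introduce the representative datasets, evaluation methods, and approaches in grammatical error correction (GEC) together with recent research results, to give a bird's-eye view of the overall research trends and the latest developments
• Target audience:
− Those who are new to GEC
− Those who want a rough picture of the field as a whole and of its latest developments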

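Outline
• 1. Overview of grammatical error correction
− Learn the research trends and the current state of the art
− Learn the background knowledge: representative datasets, evaluation methods, and approaches
• 2. The frontier of grammatical error correction
− Learn the latest developments
− Learn the challenges the field currently faces and its future directions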

1. Overview of grammatical error correction

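Grammatical Error Correction (GEC)
• The task of automatically correcting the various grammatical errors contained in a text
− Input: The machine is design to help people.
− Output of the GEC model: The machine is designed to help people.
• What counts as a grammatical error is not consistent across fields and viewpoints (as is true of NLP in general), so the scope of errors covered follows the annotation definitions of each corpus
• For example, current GEC targets not only grammatical errors in the narrow sense but also errors in a broader sense that relate to fluency, such as word choice and word reordering
• This point is also discussed in the article 「GECのタスク説明はなぜ難しいか」 ("Why is the GEC task hard to explain?")
• GEC has already been deployed in many products
→ Grammarly, Ginger, and others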

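History
From Grundkiewicz et al. (2020)
• Early period (up to 2010)
− Targeted closed-class grammatical errors such as articles and prepositions
− Rule-based and language-model-based approaches were mainstream
− Each researcher evaluated with their own evaluation script
• Transition period (2011-2015)
− Shared tasks, in which systems compete on a common benchmark (automatic metric and evaluation data), were held four years in a row (HOO 2011-12, CoNLL 2013-14)
− From CoNLL-2014 onward, all error types are targeted
− Classifier-based methods and statistical machine translation (SMT) based methods were the two leading approaches
• Recent years (2016 onward)
− Methods based on deep neural networks (DNNs) have come to the fore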

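Mainstream approach
DNN-based sequence-to-sequence (Seq2Seq) models:
→ GEC is treated as machine translation (MT) from grammatically incorrect sentences to correct ones
− Input: The machine is design to help people.
− Output: The machine is designed to help people.
Advantages:
− A model can be trained as long as parallel data (erroneous sentence, corrected sentence) is available
− Simple, and requires no language-dependent tools
− Can correct all types of errors
− Can draw on state-of-the-art results from MT research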

Many approaches have been proposed
Approach: References
RNN: Yuan and Briscoe (2016); Xie et al. (2016); Sakaguchi et al. (2017); Schmaltz et al. (2017); Ji et al. (2017); Grundkiewicz and Junczys-Dowmunt (2018); Junczys-Dowmunt et al. (2018); Lo et al. (2018); Nadejde and Tetreault (2019)
CNN: Chollampatt and Ng (2018a,b); Hotate et al. (2019); Ge et al. (2019); Chollampatt et al. (2019)
Transformer: Zhao et al. (2019); Hotate et al. (2020); Zhao and Wang (2020); Lichtarge et al. (2020); Kaneko et al. (2020); Mita et al. (2020); Katsumata and Komachi (2020); Liu et al. (2021); Yuan and Bryant (2021); Rothe et al. (2021); Sun et al. (2021)
GAN: Raheja and Alikaniotis (2020); Parnow et al. (2021)
Sequence labeling: Awasthi et al. (2019); Malmi et al. (2019); Omelianchuk et al. (2020); Stahlberg and Kumar (2020); Parnow et al. (2021)
Unsupervised / semi-supervised: Bryant (2018); Stahlberg et al. (2019); Grundkiewicz and Junczys-Dowmunt (2019); Náplava and Straka (2019); Alikaniotis and Raheja (2019); Flachs et al. (2021); Yasunaga et al. (2021)

System performance over time
[Figure from Wang et al. (2020). Systems labeled in the plot: the top system of CoNLL-2014 [Junczys-Dowmunt and Grundkiewicz, 2014]; the first NMT approach [Yuan and Briscoe, 2016]; a CNN-based system [Chollampatt+2018]; a Transformer-based system [Zhao+2019]]

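Use of pseudo data [Kiyono et al., 2019; Zhao et al., 2019]
• Compared with MT, the data available for GEC is limited (i.e., GEC is a low-resource task)
→ Generate pseudo errors and use them as training data
• Pipeline (figure adapted from the authors' slides for Kiyono et al. (2019)):
− Seed corpus (e.g. Wikipedia): a set of grammatically correct sentences, e.g. "He goes to school."
− A pseudo-data generation method turns these into pseudo-error sentences, e.g. "He go at school."
− The model is trained on the pseudo error-correction pairs together with the genuine error-correction pairs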
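As a rough illustration of the rule-based/probabilistic family of pseudo-data generation, the sketch below corrupts a clean sentence by swapping prepositions and breaking verb agreement. The confusion sets, the function names (`inject_errors`, `make_pseudo_pair`), and the probability `p` are hypothetical choices made for this example; this is not the procedure of Kiyono et al. (2019) or Zhao et al. (2019).

```python
import random

# Hypothetical confusion sets, for illustration only (not taken from any paper).
PREPOSITIONS = ["to", "at", "in", "on", "for"]
VERB_FORMS = {"goes": "go", "has": "have", "is": "are"}


def inject_errors(tokens, rng, p=0.5):
    """Return a noised copy of `tokens` using simple token-level rules."""
    noised = []
    for tok in tokens:
        if tok in VERB_FORMS and rng.random() < p:
            # Break subject-verb agreement.
            noised.append(VERB_FORMS[tok])
        elif tok in PREPOSITIONS and rng.random() < p:
            # Replace the preposition with a different one.
            choices = [w for w in PREPOSITIONS if w != tok]
            noised.append(rng.choice(choices))
        else:
            noised.append(tok)
    return noised


def make_pseudo_pair(sentence, seed=0, p=0.5):
    """Build one (pseudo-error source, clean target) training pair."""
    rng = random.Random(seed)
    tokens = sentence.split()
    return " ".join(inject_errors(tokens, rng, p)), sentence


if __name__ == "__main__":
    source, target = make_pseudo_pair("He goes to school .", p=1.0)
    print(source, "->", target)
```

With `p=1.0` every eligible token is corrupted, which makes the effect easy to see; a lower `p` leaves part of the sentence untouched, which is closer to how errors are distributed in learner text.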

Many pseudo-data generation methods have been proposed
Generation method: References
Rule-based / probabilistic: Foster and Andersen (2009); Felice and Yuan (2014); Awasthi et al. (2019); Choe et al. (2019); Grundkiewicz and Junczys-Dowmunt (2019); Kiyono et al. (2019); Qiu et al. (2019); Xu et al. (2019); Zhao et al. (2019); Takahashi et al. (2020); White and Rozovskaya (2020); Yin et al. (2020); Flachs et al. (2021); Koyama et al. (2021)
SMT back-translation: Rei et al. (2017)
NMT back-translation: Kasewa et al. (2018); Xie et al. (2018); Htut and Tetreault (2019); Kiyono et al. (2019); Koyama et al. (2021)
NMT round-trip translation: Lichtarge et al. (2019)
Adversarial generation: Wang and Zheng (2020); Yin et al. (2020)

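Current state of the art
CoNLL-10 benchmark [Bryant and Ng, 2015]
System                                         Precision  Recall  F0.5
Our nearly SOTA system [Kiyono et al., 2019]   89.38      53.36   78.75
Human experts [Ge et al., 2018]                -          -       72.58
• About 53% of all errors can be corrected, at a precision of about 89%
→ In practice, these numbers are considerably better than they appear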
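The F0.5 value in this table can be re-derived from the reported precision and recall. The sketch below uses the standard F-beta formula; the function name `f_beta` is just a label chosen for this example.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Precision and recall reported for the system of Kiyono et al. (2019).
    print(round(f_beta(89.38, 53.36), 2))  # 78.75
```

Because beta is 0.5, the score sits much closer to the precision (89.38) than to the recall (53.36), which is why a system that misses almost half of the errors can still score near 79.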

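Rethinking the goal: the emergence of fluency edits
• Conventional goal of GEC: the minimal edits needed to make a text grammatically correct (minimal edit)
→ A grammatically correct text is not necessarily natural to a native speaker [Sakaguchi et al., 2016; Napoles et al., 2017]
• Proposal of Sakaguchi et al. (2016):
− The goal of GEC should shift fundamentally from "producing grammatically correct text" to "producing text with native-speaker fluency" (fluency edit)
→ Napoles et al. (2017) released JFLEG, an evaluation dataset designed for fluency edits, which has since become a standard GEC benchmark
• Example (excerpted from the author's blog):
− Original: From this scope, social media has shorten our distance.
− Minimal edit: From this scope, social media has shortened our distance.
− Fluency edit: From this perspective, social media has shortened the distance between us.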
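To make the difference between the two kinds of edit concrete, the sketch below extracts token-level edits for the example sentences with Python's standard `difflib`. This is only a whitespace-token diff written for illustration (the helper name `token_edits` is hypothetical); it is not how JFLEG was annotated.

```python
from difflib import SequenceMatcher

ORIGINAL = "From this scope, social media has shorten our distance."
MINIMAL = "From this scope, social media has shortened our distance."
FLUENCY = "From this perspective, social media has shortened the distance between us."


def token_edits(source, target):
    """Return (source span, target span) pairs for every non-matching region."""
    src, tgt = source.split(), target.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((" ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


if __name__ == "__main__":
    print("minimal:", token_edits(ORIGINAL, MINIMAL))
    print("fluency:", token_edits(ORIGINAL, FLUENCY))
```

The minimal edit changes a single token, while the fluency edit rewrites a longer span; this is the gap that a benchmark built around minimal edits does not reward.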

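Datasets annotated with learner proficiency
• The performance of a GEC system is strongly affected by variation in errors that stems from the writer's proficiency, native language, and similar factors [Mita et al., 2019]
→ The BEA-2019 Shared Task [Bryant et al., 2019] provided W&I+LOCNESS [Bryant et al., 2019; Granger, 1998], a dataset of essays written by learners at three CEFR-based proficiency levels (A, B, C) and by native speakers (N)
[Figures from Mita et al. (2021) and Kiyono et al. (2020)]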

Strengths and weaknesses of current systems
• Errors involving function words and morphology are handled well
− MORPH (Morphology): quick → quickly
− VERB:INFL (Verb Inflection): getted → got
− NOUN:INFL (Noun Inflection): informations → information
− VERB:SVA (Subject-Verb Agreement): (He) have → (He) has
From Bryant et al. (2019)

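Strengths and weaknesses of current systems (continued)
• Systems struggle with errors involving content words (word choice)
− ADJ (Adjective): big → wide
− ADV (Adverb): speedily → quickly
− NOUN (Noun): person → people
− VERB (Verb): ambulate → walk
→ These require a deeper understanding of the text, such as context beyond the sentence and the writer's intent
From Bryant et al. (2019)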

Research on languages other than English is also growing
Resources for multilingual GEC research are gradually being put in place.
Language: Corpus
Multilingual: GitHub Typo Corpus [Hagiwara and Mita, 2020]
Arabic: QALB [Zaghouani et al., 2014], ALC [Alfaifi and Atwell, 2014]
Chinese: TOCFL [Lee et al., 2018]
Czech: AKCES-GEC [Náplava and Straka, 2019]
German: Falko-MERLIN [Boyd, 2018]
Japanese: TEC-JL [Koyama et al., 2020]
Russian: RULEC-GEC [Rozovskaya and Roth, 2019], Ru-Lang8 [Trinh and Rozovskaya, 2021]
Spanish: COWS-L2H [Davidson et al., 2020]
Ukrainian: UA-GEC [Syvokon and Nahorna, 2021]
Romanian: RONACC [Cotet et al., 2020]
Hindi: HiWikiEd [Sonawane et al., 2020]
• GEC for languages other than English is covered in detail in the articles 「中国語GEC」 ("Chinese GEC") and 「英語・日本語・中国語以外の言語のGEC」 ("GEC for languages other than English, Japanese, and Chinese")

Representative datasets
Corpus                                                  Sentences  References  Proficiency
NUCLE [Dahlmeier et al., 2013]                          57K        1           Advanced
CLC-FCE [Yannakoudakis et al., 2011]                    32.8K      1           Intermediate to advanced
Lang-8 [Mizumoto et al., 2012; Tajiri et al., 2012]     1.04M      1           Varied
W&I+LOCNESS [Bryant et al., 2019; Granger, 1998]        80.9K      5*          Varied
CoNLL-2013 [Ng et al., 2013]                            1.3K       1           Advanced
CoNLL-2014 [Ng et al., 2014]                            1.3K       2           Advanced
JFLEG [Napoles et al., 2017]                            1.4K       4*          Varied
* Evaluation set only
• The BEA-2019 Shared Task distributed the first four corpora (NUCLE, CLC-FCE, Lang-8, W&I+LOCNESS) together as its official dataset, so this group is sometimes called the "BEA-2019 dataset"
• The "standard" experimental setup in recent work:
→ Follow the data splits of the BEA-2019 Shared Task
− Training set: "BEA-train" (NUCLE, CLC-FCE, Lang-8, W&I train)
− Development set: "BEA-dev" (W&I+LOCNESS dev) and/or CoNLL-2013 and/or JFLEG (dev)
− Evaluation set: "BEA-test" (W&I+LOCNESS test) and/or CoNLL-2014 and/or JFLEG (test)
• Running experiments with this setup should keep anyone from complaining (hopefully...)

Common evaluation methods
Reference-based evaluation: a scorer takes the triple of source sentence, system output, and reference correction and returns a score
− Source: People get certain disease because of genetic changes .
− System output: People get certain diseases because of genetic changes .
− Reference correction: People get certain diseases because of genetic mutations .
Representative reference-based metrics:
• M2 Scorer [Dahlmeier and Ng, 2012]
• GLEU [Napoles et al., 2015; Napoles et al., 2016]
• ERRANT [Bryant et al., 2017]

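M2 (MaxMatch) Scorer
• Official scorer of the CoNLL-2013/2014 Shared Tasks
1. When aligning the source sentence and the system output with Levenshtein distance, dynamically choose the alignment whose edits match the edits in the reference as much as possible
2. Count true positives (TP), false positives (FP), and false negatives (FN) to compute precision (#TP/(#TP+#FP)), recall (#TP/(#TP+#FN)), and the F-score
− Example: source "I has eat meal ."; system "We have eaten meal ."; reference "I have eaten meals ." gives precision = 1/(1+1) = 0.5 and recall = 1/(1+1) = 0.5
− Since CoNLL-2014, F0.5 (which emphasizes precision) has been the standard choice
• Pros:
− Intuitively agrees with human alignments
• Cons:
− Partial matches are ignored. System: is eat → has eaten vs. reference: is eat → has eaten
− The number of FPs is unfairly reduced. Source: He looked at the cat . vs. system: He looks at a cat .
  − M2: looked at the → looks at a = 1 FP
  − Human: looked → looks, the → a = 2 FP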
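The counting step can be sketched as follows for the "I has eat meal ." example. Edits are written by hand as (start, end, correction) tuples over source tokens, so the alignment step of the M2 Scorer is not reproduced here. The function name `edit_prf` and the convention of returning 1.0 when a denominator is empty are assumptions of this sketch.

```python
def edit_prf(system_edits, reference_edits, beta=0.5):
    """Precision, recall and F-beta over exact-match edits."""
    sys_set, ref_set = set(system_edits), set(reference_edits)
    tp = len(sys_set & ref_set)   # edits proposed and also in the reference
    fp = len(sys_set - ref_set)   # edits proposed but not in the reference
    fn = len(ref_set - sys_set)   # reference edits the system missed
    # Assumption: an empty denominator is scored as 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    f = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Source tokens: I(0) has(1) eat(2) meal(3) .(4)
    system = [(0, 1, "We"), (1, 3, "have eaten")]        # "We have eaten meal ."
    reference = [(1, 3, "have eaten"), (3, 4, "meals")]  # "I have eaten meals ."
    print(edit_prf(system, reference))  # (0.5, 0.5, 0.5)
```

Only "has eat → have eaten" is shared, so TP = 1, FP = 1 (I → We), and FN = 1 (meal → meals), which reproduces the 0.5 / 0.5 figures from the slide.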

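GLEU (Generalized BLEU)
• An evaluation metric that adapts BLEU, which is used for MT evaluation, to GEC
• The originally proposed GLEU [Napoles et al., 2015] included a weight term; a simplified version that needs no tuning, GLEU+, was proposed later [Napoles et al., 2016]
• Computed from the number of n-grams shared by the system output (H) and the reference (R), minus the number of n-grams of the output that appear in the input (S) but not in the reference

$$\mathrm{GLEU}^{+} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{4} \frac{1}{n} \log p_n' \right)$$

$$p_n' = \frac{N(H, R) - \left[ N(H, S) - N(H, S, R) \right]}{N(H)}$$

Note: N(A, B, C, ...) is the number of overlapping n-grams among the given sets, and BP is the brevity penalty, as in BLEU.
• GLEU+ is what is generally used today
− This includes unintentional use, because GLEU+ is the default setting in the authors' code
− As a result, papers that (presumably) use GLEU+ often do not cite Napoles et al. (2016) (I used to be guilty of this myself...)
• Pros:
− Parallel data alone is enough (unlike M2, no references annotated with edit information are needed)
− Correlates much more strongly with human judgments than M2
• Cons:
− Low interpretability
− Low discriminative power (e.g. 68−78 GLEU ≈ 40−75 F0.5)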
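The following is a minimal single-reference, sentence-level sketch of the GLEU+ computation described on this slide. It is not the official scorer. The n-gram orders 1 to 4, the 1/n weighting (taken from this slide's formula as read here), the clipping of negative numerators to zero, and returning 0.0 when any order has zero precision are all assumptions of this sketch, as is the name `gleu_sketch`.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap(*counters):
    """N(A, B, ...): clipped n-gram overlap across all given multisets."""
    common = counters[0]
    for c in counters[1:]:
        common = common & c  # Counter intersection keeps the minimum counts
    return sum(common.values())


def gleu_sketch(source, hypothesis, reference, max_n=4):
    s, h, r = source.split(), hypothesis.split(), reference.split()
    log_sum = 0.0
    for n in range(1, max_n + 1):
        S, H, R = (ngram_counts(t, n) for t in (s, h, r))
        total = sum(H.values())
        if total == 0:
            return 0.0
        # n-grams the output kept from the source that the reference changed.
        penalty = overlap(H, S) - overlap(H, S, R)
        p_n = max(0, overlap(H, R) - penalty) / total
        if p_n == 0:
            return 0.0  # assumption: no smoothing
        log_sum += math.log(p_n) / n
    # Brevity penalty as in BLEU.
    bp = 1.0 if len(h) >= len(r) else math.exp(1 - len(r) / len(h))
    return bp * math.exp(log_sum)


if __name__ == "__main__":
    src = "People get certain disease because of genetic changes ."
    hyp = "People get certain diseases because of genetic changes ."
    ref = "People get certain diseases because of genetic mutations ."
    print(gleu_sketch(src, hyp, ref))
```

An output identical to the reference scores 1.0, while an output that simply copies the source is penalized for every source n-gram the reference changed, which is the behavior that distinguishes GLEU from plain BLEU.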

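ERRANT
• An improved version of the M2 Scorer
• Official scorer of the BEA-2019 Shared Task
• Main differences from M2:
− More accurate automatic alignment between the source sentence and the system output, using a Levenshtein alignment enhanced with merging rules and linguistic information (POS, lemma, etc.)
− System edits can be extracted and classified by error type automatically from parallel data
• Multilingual extensions of ERRANT have been appearing recently (which shows how large its impact on the field has been)
• How to use ERRANT is explained in detail in the GEC Advent calendar article 「文法誤り訂正の評価ツール ERRANT の使い方」 ("How to use ERRANT, an evaluation tool for grammatical error correction")
• Pros:
− Performance can be evaluated per error type, which makes error analysis easy
− Parallel data alone is enough
− Resolves the M2 problem of the FP count being unfairly reduced
• Cons:
− Depends on other resources (spaCy, etc.)
− Because it is an extension of M2, its correlation with human judgments is relatively low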
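Per-error-type evaluation can be sketched as below. The edits and their type labels are written by hand here, whereas ERRANT derives both automatically. The tuple layout, the function name `per_type_counts`, and the choice to attribute TP/FP to the system edit's type and FN to the reference edit's type are assumptions of this sketch, not the ERRANT API.

```python
from collections import defaultdict


def per_type_counts(system_edits, reference_edits):
    """Count TP/FP/FN per error type.

    Each edit is a (start, end, correction, error_type) tuple; two edits
    match when span and correction are identical.
    """
    ref_keys = {e[:3] for e in reference_edits}
    sys_keys = {e[:3] for e in system_edits}
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for edit in system_edits:
        outcome = "tp" if edit[:3] in ref_keys else "fp"
        counts[edit[3]][outcome] += 1
    for edit in reference_edits:
        if edit[:3] not in sys_keys:
            counts[edit[3]]["fn"] += 1
    return dict(counts)


if __name__ == "__main__":
    # Source tokens: He(0) have(1) many(2) informations(3) .(4)
    system = [(1, 2, "has", "VERB:SVA")]
    reference = [(1, 2, "has", "VERB:SVA"), (3, 4, "information", "NOUN:INFL")]
    for error_type, c in per_type_counts(system, reference).items():
        print(error_type, c)
```

A breakdown like this shows at a glance that the hypothetical system fixes the agreement error but misses the noun inflection error, which is the kind of analysis a single overall F0.5 hides.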

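Limits of reference-based evaluation
• There are many valid reference corrections (gold data) for a given sentence
− Reference-based evaluation cannot handle "correct answers" that are not included in the gold data
− Creating gold data is costly, so it is also hard to scale
From Grundkiewicz et al. (2020)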

Tutorials and surveys
• Grundkiewicz, Roman, Christopher Bryant, and Mariano Felice. 2020. A Crash Course in Automatic Grammatical Error Correction. In Proceedings of the 28th International Conference on Computational Linguistics (COLING): Tutorial Abstracts, pages 33‒38. • Wang, Yu, Yuelin Wang, Jie Liu, and Zhuo Liu. 2020. A comprehensive survey of grammar error correction.
Approaches
• Yuan, Zheng and Ted Briscoe. 2016. Grammatical error correction using neural machine translation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 380‒386. • Xie, Ziang, Anand Avati, Naveen Arivazhagan, Dan Jurafsky, and Andrew Y. Ng. 2016. Neural language correction with character-based attention. • Sakaguchi, Keisuke, Matt Post, and Benjamin Van Durme. 2017. Grammatical error correction with neural reinforcement learning. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 366‒372. • Schmaltz, Allen, Yoon Kim, Alexander Rush, and Stuart Shieber. 2017. Adapting sequence models for sentence correction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2807‒2813.

• Ji, Jianshu, Qinlong Wang, Kristina Toutanova, Yongen Gong, Steven Truong, and Jianfeng Gao. 2017. A nested attention neural hybrid model for grammatical error correction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 753‒762. • Grundkiewicz, Roman and Marcin Junczys-Dowmunt. 2018. Near human-level performance in grammatical error correction with hybrid machine translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 284‒290. • Junczys-Dowmunt, Marcin, Roman Grundkiewicz, Shubha Guha, and Kenneth Heafield. 2018. Approaching neural grammatical error correction as a low-resource machine translation task. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 595‒606. • Lo, Yu-Chun, Jhih-Jie Chen, Chingyu Yang, and Jason Chang. 2018. Cool English: a grammatical error correction system based on large learner corpora. In Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations, pages 82‒85. • Nadejde, Maria and Joel Tetreault. 2019. Personalizing grammatical error correction: Adaptation to proficiency level and L1. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 27‒33.

• Chollampatt, Shamil and Hwee Tou Ng. 2018a. A multilayer convolutional encoder-decoder neural network for grammatical error correction. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), pages 5755‒5762. • Chollampatt, Shamil and Hwee Tou Ng. 2018b. Neural quality estimation of grammatical error correction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2528‒2539. • Hotate, Kengo, Masahiro Kaneko, Satoru Katsumata, and Mamoru Komachi. 2019. Controlling grammatical error correction using word edit rate. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 149‒154. • Ge, Tao, Xingxing Zhang, Furu Wei, and Ming Zhou. 2019. Automatic grammatical error correction for sequence-to-sequence text generation: An empirical study. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6059‒6064. • Chollampatt, Shamil, Weiqi Wang, and Hwee Tou Ng. 2019. Cross-sentence grammatical error correction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 435‒445. • Zhao, Wei, Liang Wang, Kewei Shen, Ruoyu Jia, and Jingming Liu. 2019. Improving grammatical error correction via pre-training a copy-augmented architecture with unlabeled data. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 156‒165.

• Hotate, Kengo, Masahiro Kaneko, and Mamoru Komachi. 2020. Generating diverse corrections with local beam search for grammatical error correction. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2132‒2137. • Zhao, Zewei and Houfeng Wang. 2020. Maskgec: Improving neural grammatical error correction via dynamic masking. Proceedings of the AAAI Conference on Artificial Intelligence, 34(01):1226‒1233. • Lichtarge, Jared, Chris Alberti, and Shankar Kumar. 2020. Data weighted training strategies for grammatical error correction. Transactions of the Association for Computational Linguistics, 8:634‒646. • Kaneko, Masahiro, Masato Mita, Shun Kiyono, Jun Suzuki, and Kentaro Inui. 2020. Encoder-decoder models can benefit from pre-trained masked language models in grammatical error correction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4248‒4254. • Mita, Masato, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, and Kentaro Inui. 2020. A self-refinement strategy for noise reduction in grammatical error correction. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 267‒280. • Katsumata, Satoru and Mamoru Komachi. 2020. Stronger baselines for grammatical error correction using a pretrained encoder-decoder model. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pages 827‒832.

• Liu, Zhenghao, Xiaoyuan Yi, Maosong Sun, Liner Yang, and Tat-Seng Chua. 2021. Neural quality estimation with multiple hypotheses for grammatical error correction. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5441‒5452. • Yuan, Zheng and Christopher Bryant. 2021. Document-level grammatical error correction. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications, pages 75‒84. • Rothe, Sascha, Jonathan Mallinson, Eric Malmi, Sebastian Krause, and Aliaksei Severyn. 2021. A simple recipe for multilingual grammatical error correction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 702‒707. • Sun, Xin, Tao Ge, Furu Wei, and Houfeng Wang. 2021. Instantaneous grammatical error correction with shallow aggressive decoding. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5937‒5947. • Raheja, Vipul and Dimitris Alikaniotis. 2020. Adversarial Grammatical Error Correction. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3075‒3087. • Parnow, Kevin, Zuchao Li, and Hai Zhao. 2021. Grammatical error correction as GAN-like sequence labeling. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3284‒3290.

• Awasthi, Abhijeet, Sunita Sarawagi, Rasna Goyal, Sabyasachi Ghosh, and Vihari Piratla. 2019. Parallel iterative edit models for local sequence transduction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4260‒4270. • Malmi, Eric, Sebastian Krause, Sascha Rothe, Daniil Mirylenka, and Aliaksei Severyn. 2019. Encode, tag, realize: High-precision text editing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5054‒5065. • Omelianchuk, Kostiantyn, Vitaliy Atrasevych, Artem Chernodub, and Oleksandr Skurzhanskyi. 2020. GECToR ‒ grammatical error correction: Tag, not rewrite. In Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 163‒170. • Stahlberg, Felix and Shankar Kumar. 2020. Seq2Edits: Sequence transduction using span-level edit operations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5147‒5159.

• Bryant, Christopher and Ted Briscoe. 2018. Language Model Based Grammatical Error Correction without Annotated Training Data. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 247‒253. • Stahlberg, Felix, Christopher Bryant, and Bill Byrne. 2019. Neural Grammatical Error Correction with Finite State Transducers. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4033‒4039. • Grundkiewicz, Roman and Marcin Junczys-Dowmunt. 2019. Minimally-augmented grammatical error correction. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 357‒363. • Náplava, Jakub and Milan Straka. 2019. Grammatical error correction in low-resource scenarios. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 346‒356. • Alikaniotis, Dimitris and Vipul Raheja. 2019. The unreasonable effectiveness of transformer language models in grammatical error correction. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 127‒133. • Flachs, Simon, Felix Stahlberg, and Shankar Kumar. 2021. Data strategies for low-resource grammatical error correction. In Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications, pages 117‒122.

• Yasunaga, Michihiro, Jure Leskovec, and Percy Liang. 2021. LM-Critic: Language models for unsupervised grammatical error correction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7752‒7763. • Junczys-Dowmunt, Marcin and Roman Grundkiewicz. 2014. The AMU system in the CoNLL-2014 shared task: Grammatical error correction by data-intensive and feature-rich statistical machine translation. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 25‒33.
Pseudo-data generation methods
• Foster, Jennifer and Oistein Andersen. 2009. GenERRate: Generating errors for use in grammatical error detection. In Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications, pages 82‒90. • Felice, Mariano and Zheng Yuan. 2014. Generating artificial errors for grammatical error correction. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 116‒126. • Choe, Yo Joong, Jiyeon Ham, Kyubyong Park, and Yeoil Yoon. 2019. A neural grammatical error correction system built on better pre-training and sequential transfer learning. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 213‒227.

• Kiyono, Shun, Jun Suzuki, Masato Mita, Tomoya Mizumoto, and Kentaro Inui. 2019. An empirical study of incorporating pseudo data into grammatical error correction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1236‒1242. • Qiu, Mengyang, Xuejiao Chen, Maggie Liu, Krishna Parvathala, Apurva Patil, and Jungyeul Park. 2019. Improving precision of grammatical error correction with a cheat sheet. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 240‒245. • Xu, Shuyao, Jiehao Zhang, Jin Chen, and Long Qin. 2019. Erroneous data generation for grammatical error correction. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 149‒158. • Takahashi, Yujin, Satoru Katsumata, and Mamoru Komachi. 2020. Grammatical error correction using pseudo learner corpus considering learnerʼs error tendency. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 27‒32. • White, Max and Alla Rozovskaya. 2020. A comparative study of synthetic data generation methods for grammatical error correction. In Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 198‒208.

• Yin, Fan, Quanyu Long, Tao Meng, and Kai-Wei Chang. 2020. On the robustness of language encoders against grammatical errors. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3386‒3403. • Koyama, Shota, Hiroya Takamura, and Naoaki Okazaki. 2021. Various Errors Improve Neural Grammatical Error Correction. In Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation. • Rei, Marek, Mariano Felice, Zheng Yuan, and Ted Briscoe. 2017. Artificial error generation with machine translation and syntactic patterns. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 287‒292. • Kasewa, Sudhanshu, Pontus Stenetorp, and Sebastian Riedel. 2018. Wronging a right: Generating better errors to improve grammatical error detection. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4977‒4983. • Xie, Ziang, Guillaume Genthial, Stanley Xie, Andrew Ng, and Dan Jurafsky. 2018. Noising and denoising natural language: Diverse backtranslation for grammar correction. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 619‒628. • Htut, Phu Mon and Joel Tetreault. 2019. The unbearable weight of generating artificial errors for grammatical error correction. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 478‒483.

• Koyama, Aomi, Kengo Hotate, Masahiro Kaneko, and Mamoru Komachi. 2021. Comparison of grammatical error correction using back-translation models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 126‒135. • Lichtarge, Jared, Chris Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, and Simon Tong. 2019. Corpora generation for grammatical error correction. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3291‒3301. • Wang, Lihao and Xiaoqing Zheng. 2020. Improving grammatical error correction models with purpose-built adversarial examples. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2858‒2869.
Datasets
• Bryant, Christopher and Hwee Tou Ng. 2015. How far are we from fully automatic high quality grammatical error correction? In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 697‒707. • Ge, Tao, Furu Wei, and Ming Zhou. 2018. Reaching human-level performance in automatic grammatical error correction: An empirical study.

• Sakaguchi, Keisuke, Courtney Napoles, Matt Post, and Joel Tetreault. 2016. Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality. Transactions of the Association for Computational Linguistics, 4:169‒182. • Napoles, Courtney, Keisuke Sakaguchi, and Joel Tetreault. 2017. JFLEG: a fluency corpus and benchmark for grammatical error correction. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 229‒234. • Mita, Masato, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, and Kentaro Inui. 2019. Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models - Is Single-Corpus Evaluation Enough? In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 1309‒1314. • Bryant, Christopher, Mariano Felice, Øistein E. Andersen, and Ted Briscoe. 2019. The BEA-2019 shared task on grammatical error correction. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 52‒75. • Granger, Sylviane. 1998. The computer learner corpus: A versatile new source of data for SLA research. In Sylviane Granger, editor, Learner English on Computer, pages 3‒18. • Mita, Masato, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, and Kentaro Inui. 2021. 文法誤り訂正モデルの横断評価 (Cross-corpora evaluation of grammatical error correction models). 自然言語処理 (Journal of Natural Language Processing), 28(1):160‒182.

• Kiyono, Shun, Jun Suzuki, Tomoya Mizumoto, and Kentaro Inui. 2020. Massive Exploration of Pseudo Data for Grammatical Error Correction. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28:2134‒2145. • Hagiwara, Masato and Masato Mita. 2020. GitHub typo corpus: A large-scale multilingual dataset of misspellings and grammatical errors. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 6761‒6768. • Zaghouani, Wajdi, Behrang Mohit, Nizar Habash, Ossama Obeid, Nadi Tomeh, Alla Rozovskaya, Noura Farra, Sarah Alkuhlani, and Kemal Oflazer. 2014. Large scale Arabic error annotation: Guidelines and framework. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). • Alfaifi, Abdullah and Eric Atwell. 2014. An evaluation of the Arabic error tagset v2. In Proceedings of the AACL, 26‒28. • Lee, Lung-Hao, Yuen-Hsien Tseng, and Li-Ping Chang. 2018. Building a TOCFL learner corpus for Chinese grammatical error diagnosis. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). • Boyd, Adriane. 2018. Using Wikipedia edits in low resource grammatical error correction. In Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pages 79‒84.

• Koyama, Aomi, Tomoshige Kiyuna, Kenji Kobayashi, Mio Arai, and Mamoru Komachi. 2020. Construction of an evaluation corpus for grammatical error correction for learners of Japanese as a second language. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 204‒211. • Rozovskaya, Alla and Dan Roth. 2019. Grammar error correction in morphologically rich languages: The case of Russian. Transactions of the Association for Computational Linguistics, 7:1‒17. • Trinh, Viet Anh and Alla Rozovskaya. 2021. New dataset and strong baselines for the grammatical error correction of Russian. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4103‒4111. • Davidson, Sam, Aaron Yamada, Paloma Fernandez Mira, Agustina Carando, Claudia H. Sanchez Gutierrez, and Kenji Sagae. 2020. Developing NLP tools with a new corpus of learner Spanish. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 7238‒7243. • Syvokon, Oleksiy and Olena Nahorna. 2021. UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language. • Cotet, Teodor-Mihai, Stefan Ruseti, and Mihai Dascalu. 2020. Neural Grammatical Error Correction for Romanian. In IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), pages 625‒631. • Sonawane, Ankur, Sujeet Kumar Vishwakarma, Bhavana Srivastava, and Anil Kumar Singh. 2020. Generating Inflectional Errors for Grammatical Error Correction in Hindi. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 165‒171.

• Dahlmeier, Daniel, Hwee Tou Ng, and Siew Mei Wu. 2013. Building a large annotated corpus of learner English: The NUS Corpus of Learner English. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, pages 22‒31. • Yannakoudakis, Helen, Ted Briscoe, and Ben Medlock. 2011. A new dataset and method for automatically grading ESOL texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 180‒189. • Mizumoto, Tomoya, Yuta Hayashibe, Mamoru Komachi, Masaaki Nagata, and Yuji Matsumoto. 2012. The effect of learner corpus size in grammatical error correction of ESL writings. In Proceedings of COLING 2012: Posters, pages 863‒872. • Tajiri, Toshikazu, Mamoru Komachi, and Yuji Matsumoto. 2012. Tense and aspect error correction for ESL learners using global context. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 198‒202. • Ng, Hwee Tou, Siew Mei Wu, Yuanbin Wu, Christian Hadiwinoto, and Joel Tetreault. 2013. The CoNLL-2013 shared task on grammatical error correction. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task, pages 1‒12. • Ng, Hwee Tou, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. 2014. The CoNLL-2014 shared task on grammatical error correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 1‒14.

Evaluation methods
• Dahlmeier, Daniel and Hwee Tou Ng. 2012. Better evaluation for grammatical error correction. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 568‒572. • Napoles, Courtney, Keisuke Sakaguchi, Matt Post, and Joel Tetreault. 2015. Ground truth for grammatical error correction metrics. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 588‒593. • Napoles, Courtney, Keisuke Sakaguchi, Matt Post, and Joel R. Tetreault. 2016. GLEU without tuning. CoRR, abs/1605.02592. • Bryant, Christopher, Mariano Felice, and Ted Briscoe. 2017. Automatic annotation and evaluation of error types for grammatical error correction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 793‒805.

• Asano, Hiroki, Tomoya Mizumoto, and Kentaro Inui. 2017. Reference-based Metrics can be Replaced with Reference-less Metrics in Evaluating Grammatical Error Correction Systems. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 343‒348. • Choshen, Leshem and Omri Abend. 2018b. Reference-less measure of faithfulness for grammatical error correction. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 124‒129. • Yoshimura, Ryoma, Masahiro Kaneko, Tomoyuki Kajiwara, and Mamoru Komachi. 2020. SOME: Reference-less sub-metrics optimized for manual evaluations of grammatical error correction. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6516‒6522. • Islam, Md Asadul and Enrico Magnani. 2021. Is this the end of the gold standard? A straightforward reference-less grammatical error correction metric. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3009‒3015.
Other
• Yasunaga, Michihiro and Percy Liang. 2021. Break-It-Fix-It: Unsupervised Learning for Program Repair. In International Conference on Machine Learning (ICML).