Grammatical error correction using hybrid systems and type filtering

Mariano Felice, Zheng Yuan, Øistein E. Andersen, Helen Yannakoudakis, and Ekaterina Kochmar
Proceedings of the 18th Conference on Computational Natural Language Learning: Shared Task, pages 15–24, 2014
Presented by 小川耀一朗, Natural Language Processing Laboratory

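Overview
• Addresses the task of grammatical error correction for English
• Proposes a hybrid system that combines a rule-based error correction system with one based on statistical machine translation (SMT)
• Ranked 1st on the original test set and 2nd on the test set with revised annotations in the CoNLL-2014 shared task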

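Background
Current grammatical error correction methods do not achieve high performance on many error types [1].
CoNLL-2014 shared task: Grammatical Error Correction [1]
• Correct all grammatical errors in short texts written by non-native speakers of English
• Participating teams receive common training data annotated with grammatical errors
• Systems are evaluated on unseen test data with a shared evaluation metric
• 14 teams participated
This talk presents the method submitted to the shared task.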

Approach
Error correction models: a rule-based system and an SMT system
[Figure: overview of the proposed system]

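Rule-based error correction system
Corrects errors using rules derived automatically from the Cambridge Learner Corpus (CLC).
CLC: a 16-million-word corpus of learner English, written by learners with 86 different native languages, with all learner errors retained [2]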
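
The rule-based step can be illustrated with a minimal sketch. The rules below are hypothetical examples written for this illustration, not rules extracted from the CLC, and the longest-match scan is only one plausible way to apply such rules.

```python
# Hypothetical correction rules: erroneous n-gram -> replacement n-gram.
# These are illustrative only and are NOT taken from the CLC.
RULES = {
    ("informations",): ("information",),
    ("discuss", "about"): ("discuss",),
    ("more", "better"): ("better",),
}


def apply_rules(tokens, rules=RULES, max_len=2):
    """Scan left to right and replace the longest matching n-gram."""
    out = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if len(key) == n and key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            # No rule matched at this position: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


corrected = apply_rules("We discuss about the informations".split())
print(" ".join(corrected))
```

Because the rules are a plain lookup table, this kind of system is precise on the patterns it knows and silent on everything else, which is why it pairs naturally with a broader-coverage SMT component.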

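SMT system
Trained on a parallel corpus with erroneous sentences on the source side and their corrected versions on the target side, so that the system translates incorrect English into correct English.
Training data:
・CoNLL-2014 shared task development set (1,382 sentences)
・NUCLE v3.1 (57,152 sentences)
・FCE (16,068 sentences)
・IELTS (64,628 sentences)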
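
As a rough sketch of the data framing only (not of SMT training itself), the following shows how erroneous/corrected sentence pairs could be split into line-aligned source and target sides. The function name and the example pairs are assumptions made for illustration.

```python
def build_parallel_corpus(pairs):
    """Split (erroneous, corrected) pairs into line-aligned source/target lists."""
    source, target = [], []
    for erroneous, corrected in pairs:
        erroneous, corrected = erroneous.strip(), corrected.strip()
        if not erroneous or not corrected:
            continue  # skip pairs where one side is empty
        source.append(erroneous)
        target.append(corrected)
    return source, target


# Hypothetical learner sentences with their corrections.
pairs = [
    ("He go to school every day .", "He goes to school every day ."),
    ("I am agree with you .", "I agree with you ."),
    ("", "An orphaned correction ."),
]
src, tgt = build_parallel_corpus(pairs)
print(len(src), len(tgt))
```

Keeping both sides line-aligned is the only structural requirement here: line i of the source must correspond to line i of the target.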

Language model ranking
Ranks the generated candidate sentences and selects the most probable candidate as the final correction.
Uses Microsoft Web N-gram Services.
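
The ranking step can be sketched as follows. This is an assumption-laden toy: a tiny add-one-smoothed bigram model trained on three made-up sentences stands in for the web-scale model queried through Microsoft Web N-gram Services, whose API is not reproduced here.

```python
import math
from collections import Counter

# Toy corpus standing in for a web-scale language model (assumption).
CORPUS = [
    "you can also use this method",
    "we can also see the result",
    "you can use it",
]


def train_bigram(sentences):
    """Collect unigram and bigram counts with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        vocab.update(toks)
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one-smoothed bigram log-probability of a sentence."""
    unigrams, bigrams, vocab_size = model
    toks = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(toks, toks[1:])
    )


def rank(candidates, model):
    """Return candidates sorted from most to least probable."""
    return sorted(candidates, key=lambda c: log_prob(c, model), reverse=True)


model = train_bigram(CORPUS)
candidates = ["you also can use this method", "you can also use this method"]
best = rank(candidates, model)[0]
print(best)
```

This sketch compares raw log-probabilities, which favors shorter candidates when lengths differ; a real system would likely need some form of length normalization.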

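Type filtering
Discards corrections that the system makes unnecessarily.
Based on common patterns observed in the training data, the type of each correction is estimated from differences in word forms and part-of-speech tags between the input sentence and the corrected sentence, and unnecessary corrections are identified.
Only types whose correction precision is 0 are removed.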
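
A minimal sketch of the filtering decision, assuming each proposed edit already carries an estimated error type. The type-estimation heuristics themselves (word forms and part-of-speech tags) are not reproduced, and the held-out judgments below are invented for illustration.

```python
from collections import defaultdict


def zero_precision_types(dev_edits):
    """Return the error types for which no proposed edit was correct.

    dev_edits: iterable of (error_type, is_correct) pairs observed on
    held-out data.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for etype, is_correct in dev_edits:
        total[etype] += 1
        correct[etype] += int(is_correct)
    return {t for t in total if correct[t] == 0}


def filter_edits(edits, blocked_types):
    """Drop edits whose estimated type is in the blocked set."""
    return [e for e in edits if e["type"] not in blocked_types]


# Hypothetical held-out judgments: (estimated type, was the edit correct?)
dev_edits = [
    ("ArtOrDet", True), ("ArtOrDet", False),
    ("Reordering", False), ("Reordering", False),
    ("Wa", False),
]
blocked = zero_precision_types(dev_edits)

proposed = [
    {"type": "ArtOrDet", "orig": "a apple", "corr": "an apple"},
    {"type": "Reordering", "orig": "also can", "corr": "can also"},
]
kept = filter_edits(proposed, blocked)
print(sorted(blocked), [e["type"] for e in kept])
```

Removing only types with precision 0 means no correct edit is ever discarded on the held-out data, so precision can rise while recall stays the same.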

Experiments: rule-based and SMT
Evaluates the individual performance of the rule-based system and the SMT system.
Test data: CoNLL-2014 shared task development set (1,882 sentences)
CE: correct edits, ME: missed edits, UE: unnecessary edits, P: precision, R: recall
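
The abbreviations above relate as P = CE / (CE + UE) and R = CE / (CE + ME). The sketch below computes both, plus an F-beta score; beta = 0.5 is the weighting used as the official CoNLL-2014 metric, while the counts in the example are made up and are not results from the paper.

```python
def precision(ce, ue):
    """P = correct edits / all edits the system proposed."""
    return ce / (ce + ue) if ce + ue else 0.0


def recall(ce, me):
    """R = correct edits / all edits in the gold standard."""
    return ce / (ce + me) if ce + me else 0.0


def f_beta(p, r, beta=0.5):
    """F-beta score; beta = 0.5 weights precision twice as much as recall."""
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Hypothetical counts for illustration only.
ce, me, ue = 30, 70, 20
p, r = precision(ce, ue), recall(ce, me)
print(p, r, f_beta(p, r))
```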

Experiments: system combinations

Experiments: filtering
Types whose correction precision is 0 are excluded:
・Reordering: word reordering, e.g. you also can → you can also
・Srun: comma splices, e.g. The issue is highly [debatable, a → debatable. A] genetic risk could come from either side of the family. [1]
・Wa: acronyms, e.g. After [WOWII → World War II], the population of China decreased rapidly. [1]

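CoNLL-2014 shared task results
Test data: 50 essays (1,312 sentences) written by 25 students at the National University of Singapore
Original: the test set provided by the organizers
Revised: the test set with annotations revised by participating teams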

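Summary
• Proposed a hybrid of rule-based and SMT error correction systems
• Evaluated different combinations of pipelining between systems, candidate generation, and language model ranking
• Filtering unnecessary corrections by estimated error type improved precision without lowering recall
• Ranked 1st on the original test set and 2nd on the test set with revised annotations in the CoNLL-2014 shared task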

References
[1] Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. 2014. The CoNLL-2014 Shared Task on Grammatical Error Correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task (CoNLL-2014 Shared Task), Baltimore, Maryland, USA, June. Association for Computational Linguistics. To appear.
[2] Diane Nicholls. 2003. The Cambridge Learner Corpus: Error coding and analysis for lexicography and ELT. In Dawn Archer, Paul Rayson, Andrew Wilson, and Tony McEnery, editors, Proceedings of the Corpus Linguistics 2003 conference, pages 572–581, Lancaster, UK. University Centre for Computer Corpus Research on Language, Lancaster University.

Scores of participating teams [1]
