Grammatical error correction using neural machine translation

Zheng Yuan and Ted Briscoe
Proceedings of NAACL-HLT 2016, pages 380–386
Paper introduction (2017/07/20), Natural Language Processing Laboratory, Yoichiro Ogawa (小川耀一朗)

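Overview
- Proposes a method that applies neural machine translation (NMT) to the grammatical error correction task
- Proposes a two-step approach for handling the rare word problem in NMT
- Accuracy improves over previous statistical machine translation (SMT) systems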

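Grammatical error correction task
- The task of correcting grammatical errors in English text written by non-native speakers
- Statistical machine translation (SMT) performs well on this task and has become the dominant approach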

NMT approach
- Neural machine translation (NMT) can also correct errors that do not appear in the training data [1]
- This helps offset the lack of large-scale annotated learner corpora

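The rare word problem in NMT
- NMT has to limit the vocabulary size of the training data
  Rare words are replaced with the UNK symbol and cannot be translated
- Non-native text contains spelling mistakes as well as rare words
  The spelling mistakes cannot be corrected

Original sentence: I am goign to make a plan
System hypothesis: I am UNK to make a plan
Gold standard: I am going to make a plan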
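The effect of the vocabulary limit can be reproduced with a few lines of Python. This is only a minimal sketch: the toy corpus, the cutoff of 7 word types, and the helper names `build_vocab` and `apply_vocab` are made up for illustration and are not taken from the paper.

```python
from collections import Counter

UNK = "UNK"

def build_vocab(sentences, max_size):
    """Keep the max_size most frequent tokens; all others are out of vocabulary."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    return {tok for tok, _ in counts.most_common(max_size)}

def apply_vocab(sentence, vocab):
    """Replace every out-of-vocabulary token with the UNK symbol."""
    return " ".join(tok if tok in vocab else UNK for tok in sentence.split())

# Toy corpus (hypothetical): the misspelling "goign" occurs only once.
corpus = [
    "I am going to make a plan",
    "I am going to make a cake",
    "I am goign to make a plan",
]
vocab = build_vocab(corpus, max_size=7)
masked = apply_vocab("I am goign to make a plan", vocab)
print(masked)
```

Because "goign" is too rare to make the cutoff, the model never sees it as a word of its own, which is why it cannot learn to correct it.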

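Approach to the rare word problem
A two-step approach is proposed.
(1) Word-align the output sentence with the input sentence and recover the source word behind each UNK in the output

Original sentence: I am goign to make a plan
System hypothesis: I am UNK to make a plan
→ The source word behind UNK is goign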
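Step (1) might look like the following in Python. It is a sketch under a strong assumption: the alignment between hypothesis and source positions is already available as a dictionary, and for this toy pair it is simply the identity mapping. The function name `unk_sources` is hypothetical.

```python
def unk_sources(source_tokens, hypothesis_tokens, alignment, unk="UNK"):
    """Map each UNK position in the hypothesis to its aligned source word.

    alignment: dict from hypothesis index to source index (assumed given).
    """
    return {
        hyp_idx: source_tokens[alignment[hyp_idx]]
        for hyp_idx, tok in enumerate(hypothesis_tokens)
        if tok == unk and hyp_idx in alignment
    }

source = "I am goign to make a plan".split()
hypothesis = "I am UNK to make a plan".split()
# Assumption: a monotone one-to-one alignment, valid for this toy pair only.
alignment = {i: i for i in range(len(hypothesis))}
recovered = unk_sources(source, hypothesis, alignment)
print(recovered)
```

In practice the alignment would come from an aligner rather than from position equality, since corrections can insert, delete, or reorder words.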

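Approach to the rare word problem
A two-step approach is proposed.
(2) Build a word-level translation model in advance and use it in post-processing to replace the source word behind each UNK
Word-level translation model: goign → going

Original sentence: I am goign to make a plan
System hypothesis: I am going to make a plan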
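Step (2) can be sketched as a dictionary lookup during post-processing. The word table and the UNK-to-source mapping below are toy inputs, and `replace_unks` is a hypothetical helper. Copying the source word unchanged when the table has no entry is an assumption of this sketch, not something the slides state.

```python
def replace_unks(hypothesis_tokens, unk_to_source, word_table, unk="UNK"):
    """Replace each UNK with the word-level translation of its source word."""
    output = []
    for idx, tok in enumerate(hypothesis_tokens):
        if tok == unk and idx in unk_to_source:
            src = unk_to_source[idx]
            # Assumption: fall back to copying the source word if it is not in the table.
            output.append(word_table.get(src, src))
        else:
            output.append(tok)
    return output

hypothesis = "I am UNK to make a plan".split()
unk_to_source = {2: "goign"}      # result of the alignment step (toy value)
word_table = {"goign": "going"}   # word-level translation model (toy value)
corrected = " ".join(replace_unks(hypothesis, unk_to_source, word_table))
print(corrected)
```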

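Approach to the rare word problem
Building the word-level translation model:
- GIZA++: a word alignment tool
- GIZA++ is trained on the learner corpus to produce word alignment data
- METEOR is used alongside it so that stems, synonyms, and paraphrases are also taken into account

I am goign to make a plan → I am going to make a plan

I → I
am → am
goign → going
to → to
a → a
plan → plan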
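One way to turn aligned word pairs into a word-level table is to keep the most frequent target for each source word. The sketch below assumes the pairs have already been extracted from the aligner's output. It does not call GIZA++ or METEOR, the pair list is invented, and picking the most frequent target is an assumption rather than the paper's stated procedure.

```python
from collections import Counter, defaultdict

def build_word_table(aligned_pairs):
    """For each source word, keep the target word it is aligned to most often."""
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    return {src: tgts.most_common(1)[0][0] for src, tgts in counts.items()}

# Hypothetical aligned pairs, as if read from word alignment data.
pairs = [
    ("I", "I"), ("am", "am"), ("goign", "going"),
    ("to", "to"), ("a", "a"), ("plan", "plan"),
    ("goign", "going"), ("goign", "gone"),
]
word_table = build_word_table(pairs)
print(word_table["goign"])
```

A probabilistic table that keeps several candidates per source word would be a natural extension, but a single best candidate is enough to illustrate the lookup.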

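Experiments
- Target vocabulary size limited to 30K
- Source vocabulary sizes of 30K, 50K, and 80K
- Learner data: Cambridge Learner Corpus (CLC)
  Source side: about 250K word types
  Target side: about 140K word types
- Evaluation metrics
  I-measure: compares system performance against baseline performance
  M2 score: as used in CoNLL-2014
  GLEU: correlates well with human judgments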
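The CoNLL-2014 shared task reported the M2 score as F0.5, which weights precision twice as much as recall. The sketch below computes only that final F-score from edit counts. The counts are invented, `f_beta` is a hypothetical helper, and the zero-count conventions are an assumption. The real M2 scorer additionally searches for the system edit set that best matches the gold annotations, which is not shown here.

```python
def f_beta(true_positives, proposed, gold, beta=0.5):
    """F-beta over edits: proposed = system edits, gold = gold-standard edits."""
    # Assumption: precision/recall default to 1.0 when there is nothing to score.
    precision = true_positives / proposed if proposed else 1.0
    recall = true_positives / gold if gold else 1.0
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom

# Invented counts: 4 proposed edits, 5 gold edits, 2 of the proposals correct.
score = f_beta(true_positives=2, proposed=4, gold=5)
print(round(score, 4))
```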

Experimental results
Original sentence: There are kidnaps everywhere and not all of the family can afford the ransom ...
SMT hypothesis: There are kidnaps everywhere and not all of the families can afford the ransom ...
NMT hypothesis: There are kidnappings everywhere and not all of the families can afford the ransom ...
Gold standard: There are kidnappings everywhere and not all of the families can afford the ransom ...

- The correction (kidnaps → kidnappings) is not in the SMT phrase table, so the SMT system misses it
- Both words occur in the training data, so the NMT system makes the correction

Summary
- Showed that NMT can be applied successfully to grammatical error correction once its rare word problem is addressed
- Achieved scores that surpass the state-of-the-art SMT scores

References
[1] Thang Luong, Ilya Sutskever, Quoc Le, Oriol Vinyals, and Wojciech Zaremba. 2015. Addressing the Rare Word Problem in Neural Machine Translation. In Proceedings of the ACL-IJCNLP, pages 11–19.
