Systematically Adapting Machine Translation for Grammatical Error Correction

Courtney Napoles and Chris Callison-Burch
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 345–356, 2017
Paper introduction (2018/03/27), Natural Language Processing Laboratory, Nagaoka University of Technology, Yoichiro Ogawa (小川耀一朗)

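Overview
• Proposes a method for correcting grammatical errors in text written by English learners
• Applies statistical machine translation (SMT) to grammatical error correction
• Achieves performance comparable to the state-of-the-art model with less training data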

Approaches to Error Correction
• Rule-based systems
• Classifiers targeting specific error types
• Statistical machine translation
• Neural machine translation: state of the art (Yuan and Briscoe, 2016)

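Proposed Method: SMEC
• Combines SMT with processing tailored to grammatical error correction
  ◦ Adds spelling correction rules
  ◦ Adds features that score edit operations
  ◦ Tunes with an evaluation metric suited to grammatical error correction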

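Proposed Method: SMEC
  ◦ Spelling correction rules *1
  ◦ Conversion between singular and plural noun forms *2 (singular ⇆ plural)
  ◦ Conversion among verb forms: base, third-person singular, past, past participle, and progressive *2 (wake, wakes, woke, woken, waking)
*1: Uses PyEnchant
*2: Uses RASP's morphological generator, morphg (Minnen et al., 2001)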
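The rule types listed above can be illustrated with a minimal sketch. It assumes a toy vocabulary and a hand-written paradigm table; both are hypothetical stand-ins for the PyEnchant dictionary and the morphg generator, and the standard-library `difflib.get_close_matches` replaces the real spell checker. The function names `spelling_rules` and `morph_rules` are illustrative and do not come from the paper.

```python
from difflib import get_close_matches

# Toy vocabulary and paradigm table: hypothetical stand-ins for the
# PyEnchant dictionary and the morphg generator used in the paper.
VOCAB = ["receive", "believe", "grammar", "correct"]
PARADIGMS = [
    ["wake", "wakes", "woke", "woken", "waking"],  # verb forms
    ["error", "errors"],                            # singular / plural
]


def spelling_rules(word, cutoff=0.8):
    """Return (misspelling, suggestion) pairs for an out-of-vocabulary word."""
    if word in VOCAB:
        return []
    suggestions = get_close_matches(word, VOCAB, n=3, cutoff=cutoff)
    return [(word, s) for s in suggestions]


def morph_rules(word):
    """Return (word, variant) pairs for every other form in the word's paradigm."""
    rules = []
    for forms in PARADIGMS:
        if word in forms:
            rules.extend((word, other) for other in forms if other != word)
    return rules


if __name__ == "__main__":
    print(spelling_rules("recieve"))
    print(morph_rules("woke"))
```

Each pair can be read as a candidate rewrite rule (source side, target side) that would be added alongside the rules extracted from parallel data.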

Proposed Method: SMEC
  ◦ Uses scores of edit operations as features
  ◦ SMT optimization
    ⇒ GLEU rather than BLEU
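As a rough illustration of edit-operation features, the sketch below counts token-level insertions, deletions, and substitutions between the source and target side of a rule. The slides do not specify the exact feature definition, so this is an assumption: it uses the alignment produced by the standard-library `difflib.SequenceMatcher`, which approximates but is not identical to a Levenshtein alignment, and the function name `edit_features` is hypothetical.

```python
from difflib import SequenceMatcher


def edit_features(source, target):
    """Count token-level edit operations between two whitespace-tokenized strings.

    Assumption: the counts come from difflib's alignment, not from the
    feature extractor used in the paper.
    """
    src, tgt = source.split(), target.split()
    counts = {"insert": 0, "delete": 0, "substitute": 0}
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_src, n_tgt = i2 - i1, j2 - j1
        if tag == "replace":
            # Pair up tokens as substitutions; any surplus on one side
            # counts as deletions or insertions.
            paired = min(n_src, n_tgt)
            counts["substitute"] += paired
            counts["delete"] += n_src - paired
            counts["insert"] += n_tgt - paired
        elif tag == "delete":
            counts["delete"] += n_src
        elif tag == "insert":
            counts["insert"] += n_tgt
    return counts


if __name__ == "__main__":
    print(edit_features("he go to school", "he goes to the school"))
```

Counts like these could be attached to each rule as additional features, so that tuning can learn how strongly to favor or penalize each kind of edit.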

Experimental Setup
• SMT: hierarchical phrase-based translation model with Thrax (Weese et al., 2011)
• Training data: Lang-8 corpus (1000k pairs)
• Development data: JFLEG tuning set (751 pairs)
• Test data: JFLEG test set (747 pairs)
• Language model: English Gigaword 5-gram LM

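Correction Results
• Sp. Baseline: spelling correction model
• MT baseline: optimized with BLEU, without the special features
• YB16: state-of-the-art NMT model (CLC corpus: 2000k pairs)
⇒ SMEC performs about as well as the state of the art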

Comparison of Components
• SMEC –GLEU: SMT optimized with BLEU
• SMEC –feats: without the special features
• SMEC –sp: without the spelling correction rules
⇒ Spelling correction has a large effect

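Summary
• Applies statistical machine translation (SMT) to grammatical error correction
  ◦ Adds spelling correction rules
  ◦ Adds features that score edit operations
  ◦ Optimizes SMT with GLEU
• Matches the state-of-the-art model's performance with half the training data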
