Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation

Roman Grundkiewicz and Marcin Junczys-Dowmunt. Proceedings of NAACL-HLT 2018, pages 284–290.
Literature review (2018-09-11), Natural Language Processing Laboratory, Nagaoka University of Technology. Presenter: 小川耀一朗 (Youichiro Ogawa)

Abstract
- This paper proposes combining statistical machine translation (SMT) and neural machine translation (NMT) for grammatical error correction (GEC).
- The hybrid system achieves new state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks.
- The system comes closer to human-level performance than any other GEC system.

Data and preprocessing
- Training data: NUCLE, Lang-8
- Development and test data: CoNLL-2013 and CoNLL-2014, JFLEG
- Words were split into subword units with Byte Pair Encoding (BPE) [1], using a vocabulary of 50k units, to handle out-of-vocabulary (OOV) words.
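BPE as described in [1] learns subword units by repeatedly merging the most frequent pair of adjacent symbols. The toy Python sketch below illustrates that merge-learning loop and how the learned merges segment an unseen word. The function names (`learn_bpe`, `segment`) and the tiny word-frequency table are illustrative assumptions; the 50k-unit setting on the slide would correspond to a far larger number of merges learned on the real training data.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[left, right] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of `pair` with the merged symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a word-frequency table."""
    # Words start as space-separated characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair counted
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word by replaying the merges in learned order."""
    symbols = list(word) + ["</w>"]
    for left, right in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == left and symbols[i + 1] == right:
                symbols[i:i + 2] = [left + right]
            else:
                i += 1
    return symbols


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3)
print(merges)                      # → [('e', 's'), ('es', 't'), ('est', '</w>')]
print(segment("lowest", merges))   # → ['l', 'o', 'w', 'est</w>']
```

The unseen word "lowest" is still covered: its rare whole form falls back to a frequent suffix unit plus single characters, which is the property that lets BPE sidestep OOV words.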

SMT systems
- Base system: the phrase-based SMT system proposed in [2]
- Base system + character-level dense features (Char. ops)
- Differences from [2]:
  - using the original tokenization
  - applying subword units
  - extended edit-based features
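The slides do not spell out what the Char. ops features contain. One plausible reading, assumed here and not confirmed by the slides, is that they count character-level insertions, deletions and substitutions between the source and target side of a phrase pair, mirroring the word-level edit-operation features of [2]. The Python sketch below computes such counts from a Levenshtein alignment; `char_edit_ops` and the feature names are hypothetical.

```python
from collections import Counter


def char_edit_ops(source: str, target: str) -> Counter:
    """Count character-level edit operations that turn `source` into `target`.

    Hypothetical feature extractor: the feature names are illustrative only.
    """
    n, m = len(source), len(target)
    # dist[i][j] = Levenshtein distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete source[i-1]
                dist[i][j - 1] + 1,         # insert target[j-1]
                dist[i - 1][j - 1] + cost,  # keep or substitute
            )

    # Walk back through the table to recover one optimal alignment.
    ops = Counter()
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if source[i - 1] == target[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                if cost:
                    ops["char_substitutions"] += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops["char_deletions"] += 1
            i -= 1
        else:
            ops["char_insertions"] += 1
            j -= 1
    ops["char_edit_distance"] = dist[n][m]
    return ops


ops = char_edit_ops("recieve", "receive")
print(ops["char_substitutions"], ops["char_edit_distance"])  # → 2 2
```

Dense counts like these stay informative for misspellings such as "recieve", where a word-level view sees only one opaque substitution of a whole token.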

NMT systems
- NMT: an attentional encoder-decoder model with a bidirectional single-layer GRU encoder and a single-layer GRU decoder
- + RNN-LM [3]: candidates generated by the NMT model are re-ranked after adding a recurrent neural network language model (RNN-LM) score
- NMT×4 [4]: an ensemble of four independently trained models
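The slides do not say how the four NMT×4 models are combined. A common choice, assumed in the Python sketch below, is to average the members' next-token log-probabilities at every decoding step and let the search (here simply a greedy pick) choose from the averaged scores. The toy distributions and the names `ensemble_step` and `greedy_pick` are hypothetical.

```python
import math
from typing import Dict, List


def ensemble_step(model_log_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average next-token log-probabilities over the ensemble members.

    All members are assumed to share the same (subword) vocabulary.
    """
    k = len(model_log_probs)
    vocab = model_log_probs[0].keys()
    return {tok: sum(lp[tok] for lp in model_log_probs) / k for tok in vocab}


def greedy_pick(scores: Dict[str, float]) -> str:
    """Return the highest-scoring token."""
    return max(scores, key=scores.get)


# Four hypothetical models scoring two candidate next tokens.
members = [
    {"go": math.log(0.6), "goes": math.log(0.4)},
    {"go": math.log(0.6), "goes": math.log(0.4)},
    {"go": math.log(0.1), "goes": math.log(0.9)},
    {"go": math.log(0.55), "goes": math.log(0.45)},
]
combined = ensemble_step(members)
print(greedy_pick(combined))  # → goes
```

Three of the four members individually prefer "go", yet the averaged log-probabilities favour "goes" because the confident fourth member assigns "go" a very low probability. Averaged log-probabilities are not a normalized distribution, which is harmless when they are only used to rank candidates.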

Hybrid SMT-NMT systems
- SMT-NMT pipeline: input → SMT → NMT → output; the SMT output is passed to the NMT system as its input.
- Rescoring with NMT [5]: input → SMT → 1000-best list → list rescored with added NMT and RNN-LM scores → output.
- Final system with spelling correction: the NMT-rescored SMT system followed by NMT, with spell checking using a character-level SMT system [6].
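The rescoring step can be pictured as a linear model over per-hypothesis scores: each entry of the SMT n-best list carries its SMT score plus the scores added by the NMT model and the RNN-LM, and the list is re-sorted by their weighted sum. The Python sketch below assumes that form. The feature names, weights and example hypotheses are hypothetical; in practice the weights would be tuned on development data (the slide cites the batch tuning strategies of [5] for this step) rather than set by hand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    features: dict[str, float]  # log-domain scores, e.g. {"smt": ..., "nmt": ..., "rnnlm": ...}


def rescore(nbest: list[Hypothesis], weights: dict[str, float]) -> list[Hypothesis]:
    """Sort hypotheses by the weighted sum of their feature scores, best first."""

    def total(h: Hypothesis) -> float:
        return sum(w * h.features[name] for name, w in weights.items())

    return sorted(nbest, key=total, reverse=True)


nbest = [
    Hypothesis("He go to school .", {"smt": -2.0, "nmt": -5.0, "rnnlm": -6.0}),
    Hypothesis("He goes to school .", {"smt": -2.5, "nmt": -3.0, "rnnlm": -4.0}),
]
print(rescore(nbest, {"smt": 1.0})[0].text)                             # → He go to school .
print(rescore(nbest, {"smt": 1.0, "nmt": 1.0, "rnnlm": 0.5})[0].text)  # → He goes to school .
```

With the SMT score alone the uncorrected sentence wins; once the NMT and RNN-LM scores are added, the corrected hypothesis moves to the top. This is the effect the rescoring stage relies on.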

Results

Comparison with human annotations

Conclusion
- This paper proposes combining SMT and NMT for GEC.
- The hybrid system achieves new state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks.
- The system comes closer to human-level performance than any other GEC system.

References
[1] Rico Sennrich, Barry Haddow and Alexandra Birch. 2016. Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1715–1725.
[2] Marcin Junczys-Dowmunt and Roman Grundkiewicz. 2016. Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1546–1556.
[3] Jianshu Ji, Qinlong Wang, Kristina Toutanova, Yongen Gong, Steven Truong and Jianfeng Gao. 2017. A Nested Attention Neural Hybrid Model for Grammatical Error Correction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 753–762.
[4] Rico Sennrich, Barry Haddow and Alexandra Birch. 2016. Edinburgh Neural Machine Translation Systems for WMT 16. In Proceedings of the First Conference on Machine Translation, Volume 2: Shared Task Papers, pages 371–376.
[5] Colin Cherry and George Foster. 2012. Batch Tuning Strategies for Statistical Machine Translation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 427–436.
[6] Shamil Chollampatt and Hwee Tou Ng. 2017. Connecting the Dots: Towards Human-Level Grammatical Error Correction. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 327–333.

Output examples

Others
