GEC
- The hybrid system achieves new state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks
- The system is closer to reaching human-level performance than any other GEC system
Development and test data: CoNLL 2013 and 2014, JFLEG
- Tokens were split into 50k subword units using Byte Pair Encoding (BPE) [1] to deal with out-of-vocabulary (OOV) words.
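The BPE segmentation mentioned above can be illustrated with a minimal sketch of the merge-learning procedure described in [1]. The function names (`learn_bpe`, `apply_bpe`), the toy word frequencies, and the `</w>` end-of-word marker are illustrative assumptions; the actual system used a 50k-unit vocabulary and its own tooling, which is not reproduced here.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with the merged symbol."""
    # Match the pair only at symbol boundaries (whitespace-delimited).
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges merge operations from a word-frequency dict."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first one on ties)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


def apply_bpe(word, merges):
    """Segment a single word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Toy data (hypothetical): an unseen word is split into known subword units.
toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
toy_merges = learn_bpe(toy_freqs, 3)
print(apply_bpe("lowest", toy_merges))
```

With only three merges learned from the toy data, the unseen word "lowest" is still representable: the frequent suffix becomes one unit and the rest falls back to characters, which is how subword units sidestep OOV words.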
+ character-level dense features (Char. ops)
- Different things:
  - using the original tokenization
  - applying subword units
  - extending edit-based features
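The poster does not spell out how the character-level features are computed. As one plausible reading, the sketch below counts character-level insertions, deletions, and substitutions between a source string and a hypothesis using a Levenshtein alignment. The function name `char_edit_ops` and the returned keys are assumptions for illustration, not the authors' feature definition.

```python
def char_edit_ops(source, target):
    """Count character-level edit operations turning source into target.

    Returns a dict with the total Levenshtein distance and one minimal-cost
    breakdown into insertions, deletions, and substitutions.
    """
    # Each cell holds (total, insertions, deletions, substitutions).
    prev = [(j, j, 0, 0) for j in range(len(target) + 1)]
    for i, s in enumerate(source, 1):
        cur = [(i, 0, i, 0)]  # deleting the first i source characters
        for j, t in enumerate(target, 1):
            if s == t:
                best = prev[j - 1]  # characters match: no operation needed
            else:
                d, n, u = prev[j], cur[j - 1], prev[j - 1]
                best = min(
                    (d[0] + 1, d[1], d[2] + 1, d[3]),  # delete s
                    (n[0] + 1, n[1] + 1, n[2], n[3]),  # insert t
                    (u[0] + 1, u[1], u[2], u[3] + 1),  # substitute s -> t
                )
            cur.append(best)
        prev = cur
    total, ins, dele, sub = prev[-1]
    return {"total": total, "insertions": ins, "deletions": dele, "substitutions": sub}


print(char_edit_ops("cat", "cats"))
```

When several minimal alignments exist, the breakdown reports only one of them; the total distance is the same in every case.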
a bidirectional single-layer encoder and decoder using GRUs
- + RNN-LM [3]:
  - Candidates generated by NMT are re-ranked by adding the RNN-LM score
- NMT×4 [4]:
  - An ensemble of four independently trained models
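To make the NMT×4 ensemble concrete, the sketch below combines the next-token distributions of several models by averaging their probabilities. The poster does not state how the four models are combined, so probability averaging is an assumption here (averaging log-probabilities is another common choice), and `ensemble_log_probs` with its toy inputs is hypothetical.

```python
import math


def ensemble_log_probs(model_log_probs):
    """Average next-token probabilities across models.

    model_log_probs: list of dicts mapping token -> log-probability,
    one dict per model, all defined over the same vocabulary.
    Returns a dict mapping token -> log of the averaged probability.
    """
    n = len(model_log_probs)
    vocab = model_log_probs[0].keys()
    return {
        tok: math.log(sum(math.exp(m[tok]) for m in model_log_probs) / n)
        for tok in vocab
    }


# Toy distributions (hypothetical) from two models over a two-token vocabulary.
model_a = {"goes": math.log(0.6), "go": math.log(0.4)}
model_b = {"goes": math.log(0.2), "go": math.log(0.8)}
combined = ensemble_log_probs([model_a, model_b])
print(max(combined, key=combined.get))
```

During decoding, the averaged distribution would replace a single model's distribution at every time step, so the four models vote on each output token.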
output pipeline
- Rescoring with NMT [5]
  [Diagram labels: SMT, input, output, 1000 n-best list, rescored list, +NMT, +RNNLM]
- Final system with spelling correction
  [Diagram labels: NMT-rescored SMT, NMT, input, output]
  - Spell checking using a character-level SMT [6]
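The rescoring step can be sketched as follows: each hypothesis in the SMT n-best list receives additional feature scores (here from an NMT model and an RNN language model), and the list is re-sorted by the weighted sum. The function `rescore_nbest`, the toy scorers, and the uniform weights are all hypothetical. In a real system the scorers would be trained models, and the feature weights would be tuned on development data.

```python
def rescore_nbest(nbest, extra_scorers, weights):
    """Re-rank an n-best list after adding weighted extra feature scores.

    nbest: list of (hypothesis, base_score) pairs from the first-pass system.
    extra_scorers: dict mapping feature name -> function(hypothesis) -> score.
    weights: dict with a weight for "base" and for each extra feature name.
    Returns the list of (hypothesis, combined_score), best first.
    """
    rescored = []
    for hyp, base_score in nbest:
        total = weights["base"] * base_score
        for name, scorer in extra_scorers.items():
            total += weights[name] * scorer(hyp)
        rescored.append((hyp, total))
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored


# Toy 2-best list with hypothetical log-probability-like scores.
toy_nbest = [("He go to school .", -1.0), ("He goes to school .", -1.2)]
toy_scorers = {
    "nmt": lambda hyp: -0.5 if "goes" in hyp else -2.0,    # stand-in for an NMT score
    "rnnlm": lambda hyp: -1.0 if "goes" in hyp else -3.0,  # stand-in for an RNN LM score
}
toy_weights = {"base": 1.0, "nmt": 1.0, "rnnlm": 1.0}

best_hyp, best_score = rescore_nbest(toy_nbest, toy_scorers, toy_weights)[0]
print(best_hyp)
```

In this toy case the first-pass system prefers the ungrammatical hypothesis, and the added scores move the corrected one to the top of the rescored list.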
[1] [...] Subword Units. Rico Sennrich, Barry Haddow and Alexandra Birch. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1715–1725, 2016.
[2] Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction. Marcin Junczys-Dowmunt and Roman Grundkiewicz. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1546–1556, 2016.
[3] A Nested Attention Neural Hybrid Model for Grammatical Error Correction. Jianshu Ji, Qinlong Wang, Kristina Toutanova, Yongen Gong, Steven Truong and Jianfeng Gao. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 753–762, 2017.
[4] Edinburgh Neural Machine Translation Systems for WMT 16. Rico Sennrich, Barry Haddow and Alexandra Birch. Proceedings of the First Conference on Machine Translation, Volume 2: Shared Task Papers, pages 371–376, 2016.
[5] Batch Tuning Strategies for Statistical Machine Translation. Colin Cherry and George Foster. 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 427–436, 2012.
[6] Connecting the Dots: Towards Human-Level Grammatical Error Correction. Shamil Chollampatt and Hwee Tou Ng. Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 327–333, 2017.