
Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation

youichiro
September 10, 2018


Nagaoka University of Technology
Natural Language Processing Laboratory
Literature review (2018-09-11)


Transcript

  1. Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation

    Roman Grundkiewicz and Marcin Junczys-Dowmunt. Proceedings of NAACL-HLT 2018, pages 284–290.
    Literature review (2018-09-11), Nagaoka University of Technology, Natural Language Processing Laboratory, 小川 耀一朗
  2. Abstract

    - This paper proposes combining statistical machine translation (SMT) and neural machine translation (NMT) for grammatical error correction (GEC).
    - The hybrid system achieves new state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks.
    - The system is closer to reaching human-level performance than any other GEC system.
  3. Data and preprocessing

    - Training data: NUCLE, Lang-8
    - Development and test data: CoNLL 2013 and 2014, JFLEG
    - Tokens were split into 50k subword units using Byte Pair Encoding (BPE) [1] to deal with out-of-vocabulary (OOV) words.
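The BPE step on this slide can be illustrated with a minimal sketch of the merge-learning procedure described in [1]. The toy word frequencies, the `</w>` end-of-word marker, and the function names are assumptions for illustration; the presented system used 50k subword units learned on its real training data, not this toy vocabulary.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merges from a {word: frequency} dict."""
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = next(iter(merge_pair(pair, {symbols: 1})))
    return list(symbols)


# Hypothetical toy corpus; "lowest" never occurs in it.
toy_merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 10)
print(apply_bpe("lowest", toy_merges))
```

An unseen word such as "lowest" is still representable as a sequence of known subword units, which is why subword segmentation helps with OOV words.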
  4. SMT systems

    - Based on the phrase-based SMT system proposed in [2]
    - + character-level dense features (Char. ops)
    - Differences from [2]:
        - using the original tokenization
        - applying subword units
        - extending edit-based features
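The slide does not define the character-level features, so the following is only a sketch under an assumption: that a "Char. ops" style feature counts the character insertions, deletions, and substitutions needed to turn a source phrase into its correction. The function name `char_edit_ops` and the example strings are hypothetical.

```python
def char_edit_ops(source, target):
    """Count character-level edits along one minimal Levenshtein alignment."""
    n, m = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a source character
                dist[i][j - 1] + 1,         # insert a target character
                dist[i - 1][j - 1] + cost,  # keep or substitute
            )

    # Walk back from the end, preferring the diagonal (keep/substitute) step.
    ops = {"ins": 0, "del": 0, "sub": 0}
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and source[i - 1] != target[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + int(mismatch):
            if mismatch:
                ops["sub"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops["del"] += 1
            i -= 1
        else:
            ops["ins"] += 1
            j -= 1
    return ops


print(char_edit_ops("informations", "information"))
print(char_edit_ops("recieve", "receive"))
```

Counts like these are dense (real-valued) features that a phrase-based decoder could weight alongside its translation and language model scores.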
  5. NMT systems

    - NMT:
        - Attentional encoder-decoder model with a bidirectional single-layer encoder and decoder using GRUs
    - + RNN-LM [3]:
        - Candidates generated by NMT are re-ranked by adding an RNN-LM score
    - NMT×4 [4]:
        - An ensemble of four independently trained models
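The two extensions on this slide, ensembling and language-model re-ranking, can be sketched with toy numbers. Both functions are illustrative assumptions: the slides do not say how the four models are combined (averaging next-token probabilities is one common choice), and the language-model weight `lm_weight` and all scores below are made up.

```python
import math


def ensemble_log_probs(model_distributions):
    """Average next-token probabilities over models; return log-probabilities."""
    n = len(model_distributions)
    vocab = model_distributions[0].keys()
    return {
        tok: math.log(sum(dist[tok] for dist in model_distributions) / n)
        for tok in vocab
    }


def rerank(candidates, lm_weight):
    """Sort (text, nmt_log_prob, lm_log_prob) candidates by the combined score."""
    return sorted(
        candidates,
        key=lambda c: c[1] + lm_weight * c[2],
        reverse=True,
    )


# Hypothetical candidates: the NMT model slightly prefers the ungrammatical one,
# while the language model strongly prefers the corrected one.
candidates = [
    ("He go to school .", -1.0, -9.0),
    ("He goes to school .", -1.2, -6.0),
]
print(rerank(candidates, lm_weight=0.0)[0][0])
print(rerank(candidates, lm_weight=0.2)[0][0])
```

With a weight of zero the ranking is the NMT model's own; a positive weight lets the language model overturn close decisions.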
  6. Hybrid SMT-NMT systems

    - SMT-NMT pipelines
        - Pipeline: input → SMT → NMT → output
    - Rescoring with NMT [5]
        - input → SMT → 1000 n-best list → rescoring (+NMT, +RNNLM) → rescored list → output
    - Final system with spelling correction
        - Pipeline: input → NMT-rescored SMT → NMT → output
        - Spell checking using a character-level SMT [6]
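The rescoring and pipeline ideas on this slide can be sketched as follows. This is a toy illustration, not the presented system: the feature names, weights, and scores are invented, and the stand-in stages are plain string functions. In a real setup the feature weights would be tuned on development data rather than set by hand.

```python
def rescore_nbest(nbest, weights):
    """Re-sort an n-best list by a weighted sum of per-hypothesis feature scores."""
    def total(hyp):
        return sum(weights[name] * value for name, value in hyp["features"].items())
    return sorted(nbest, key=total, reverse=True)


def run_pipeline(sentence, stages):
    """Feed the output of each correction stage into the next one."""
    for stage in stages:
        sentence = stage(sentence)
    return sentence


# Hypothetical 2-best list from an SMT system, with added NMT and RNN-LM scores.
nbest = [
    {"text": "She like apples .",
     "features": {"smt": -2.0, "nmt": -5.0, "rnnlm": -8.0}},
    {"text": "She likes apples .",
     "features": {"smt": -2.5, "nmt": -3.0, "rnnlm": -6.0}},
]
smt_only = {"smt": 1.0, "nmt": 0.0, "rnnlm": 0.0}
combined = {"smt": 1.0, "nmt": 0.5, "rnnlm": 0.3}
print(rescore_nbest(nbest, smt_only)[0]["text"])
print(rescore_nbest(nbest, combined)[0]["text"])

# Stand-in stages; real stages would call the SMT and NMT systems.
stages = [
    lambda s: s.replace("informations", "information"),
    lambda s: s.replace("a information", "information"),
]
print(run_pipeline("I need a informations .", stages))
```

Rescoring lets the neural models choose among hypotheses the SMT system already produced, whereas the pipeline lets the second system edit the first system's output freely.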
  7. Conclusion

    - This paper proposes combining SMT and NMT for GEC.
    - The hybrid system achieves new state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks.
    - The system is closer to reaching human-level performance than any other GEC system.
  8. References

    [1] Rico Sennrich, Barry Haddow and Alexandra Birch. Neural Machine Translation of Rare Words with Subword Units. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1715–1725, 2016.
    [2] Marcin Junczys-Dowmunt and Roman Grundkiewicz. Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1546–1556, 2016.
    [3] Jianshu Ji, Qinlong Wang, Kristina Toutanova, Yongen Gong, Steven Truong and Jianfeng Gao. A Nested Attention Neural Hybrid Model for Grammatical Error Correction. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 753–762, 2017.
    [4] Rico Sennrich, Barry Haddow and Alexandra Birch. Edinburgh Neural Machine Translation Systems for WMT 16. Proceedings of the First Conference on Machine Translation, Volume 2: Shared Task Papers, pages 371–376, 2016.
    [5] Colin Cherry and George Foster. Batch Tuning Strategies for Statistical Machine Translation. 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 427–436, 2012.
    [6] Shamil Chollampatt and Hwee Tou Ng. Connecting the Dots: Towards Human-Level Grammatical Error Correction. Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 327–333, 2017.