
A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction


Nagaoka University of Technology
Natural Language Processing Laboratory
Paper reading (2018-11-13)


youichiro

November 12, 2018

Transcript

  1. A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction

    Shamil Chollampatt and Hwee Tou Ng. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018. (2018-11-13)
  2. • A convolutional encoder-decoder model for grammatical error correction (GEC) • Compared against RNN-based encoder-decoder models • Initialized with pre-trained word embeddings
  3. A Multilayer Convolutional Encoder-Decoder NN
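The architecture slide itself is a figure. As a rough sketch only (not the authors' released code), one encoder layer of such a multilayer convolutional model can be written in PyTorch as a 1-D convolution followed by a gated linear unit (GLU) and a residual connection; the class and variable names below are hypothetical, and the sizes follow the details on slide 19 (7 layers, window width 3, 1024-dimensional layer outputs, dropout 0.2).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvGLUEncoderLayer(nn.Module):
    """One encoder layer: 1-D convolution + GLU + residual connection.

    A minimal sketch of the kind of layer stacked 7 times in the paper
    (1024-dimensional outputs, convolution window width 3); not the authors' code.
    """
    def __init__(self, hidden_dim=1024, kernel_width=3, dropout=0.2):
        super().__init__()
        # The convolution outputs 2 * hidden_dim channels so the GLU can
        # split them into a linear part and a gate.
        self.conv = nn.Conv1d(hidden_dim, 2 * hidden_dim,
                              kernel_size=kernel_width,
                              padding=kernel_width // 2)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (batch, seq_len, hidden_dim)
        residual = x
        h = self.dropout(x).transpose(1, 2)    # (batch, hidden, seq_len)
        h = self.conv(h)                       # (batch, 2*hidden, seq_len)
        h = F.glu(h, dim=1).transpose(1, 2)    # GLU halves the channels
        return (h + residual) * (0.5 ** 0.5)   # scaled residual, as in ConvS2S

# Stacking 7 such layers gives the encoder depth used in the paper.
encoder = nn.Sequential(*[ConvGLUEncoderLayer() for _ in range(7)])
```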

  4. Pre-trained word embeddings • Initialized with fastText word embeddings ✓ Pre-trained on English Wikipedia ✓ Used to initialize the encoder and decoder embeddings
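Slide 19 lists the embedding pre-training settings (fastText skip-gram, 500 dimensions, window size 5, character n-grams of length 3 to 6, trained on Wikipedia). A minimal sketch of that setup with the `fasttext` Python package, assuming a tokenized plain-text dump at a hypothetical path `wikipedia.txt`:

```python
import fasttext

# Pre-train word embeddings with fastText (skip-gram), using the
# hyperparameters reported on the "Model and Training Details" slide.
# "wikipedia.txt" is a hypothetical path to a tokenized plain-text dump.
model = fasttext.train_unsupervised(
    "wikipedia.txt",
    model="skipgram",
    dim=500,   # embedding dimensions
    ws=5,      # context window size
    minn=3,    # smallest character n-gram
    maxn=6,    # largest character n-gram
)

# Look up a vector to initialize the corresponding row of the embedding
# matrix; subword n-grams let fastText embed rare or unseen words as well.
vec = model.get_word_vector("grammatical")
model.save_model("wiki_fasttext_500d.bin")
```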
  5. Rescoring • Edit operation (EO) features: counts of substitutions, deletions, and insertions between source and hypothesis • Language model (LM) feature: a 5-gram LM score • Used to rescore the decoder's N-best candidates
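The paper rescores the decoder's N-best outputs with edit operation (EO) counts and a 5-gram language model (LM) score combined with the network score. A hypothetical sketch of such feature-weighted rescoring is below; the feature functions and weights are placeholders, not the authors' implementation (in the paper the weights are tuned on the development set).

```python
def rescore_nbest(source, nbest, lm_score, edit_op_counts,
                  weights=(1.0, 0.1, 0.1)):
    """Re-rank N-best hypotheses with a weighted linear combination of
    (network score, edit-operation feature, language-model score).

    nbest:          list of (hypothesis, network_log_prob) pairs
    lm_score:       function hypothesis -> 5-gram LM log probability
    edit_op_counts: function (source, hypothesis) -> number of edits
    weights:        placeholder feature weights; tuned on dev data in the paper
    """
    w_net, w_eo, w_lm = weights
    rescored = []
    for hyp, net_score in nbest:
        total = (w_net * net_score
                 + w_eo * edit_op_counts(source, hyp)
                 + w_lm * lm_score(hyp))
        rescored.append((total, hyp))
    return max(rescored, key=lambda t: t[0])[1]

# Toy usage with placeholder scorers (hypothetical values):
best = rescore_nbest(
    "He go to school .",
    [("He goes to school .", -1.2), ("He go to school .", -1.0)],
    lm_score=lambda h: -0.5 if "goes" in h else -2.0,
    edit_op_counts=lambda s, h: sum(a != b for a, b in zip(s.split(), h.split())),
)
print(best)  # "He goes to school ."
```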
  6. Dataset • Training: Lang-8 + NUCLE (1.3M sentence pairs) • Development: NUCLE (5.4K sentence pairs) • Pre-training word embeddings: Wikipedia (1.78B words) • Training the language model: Common Crawl corpus (94B words)
  7. Result

  8. Result: adding pre-trained embeddings → performance improves

  9. Result: ensembling → further improvement

  10. Result: +Rescore → further improvement

  11. Result: +SpellCheck → further improvement

  12. Result: the final system achieves state-of-the-art (SoTA) performance

  13. RNN vs CNN

  14. RNN vs CNN

  15. RNN vs CNN • RNN → higher precision • CNN → corrects more errors → higher recall
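The precision/recall trade-off above is usually summarized with the F0.5 metric used in the CoNLL-2014 evaluation, which weights precision twice as much as recall. A small illustration of the formula:

```python
def f_beta(precision, recall, beta=0.5):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 0.5 favors precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# A high-precision, lower-recall system still scores well under F0.5:
print(f_beta(0.60, 0.30))  # 0.5
```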
  16. Embedding Initialization

  17. Conclusion • Applied a convolutional encoder-decoder model to GEC • The CNN model outperforms RNN-based models and achieves SoTA • Pre-trained word embeddings contribute to the improvement • Rescoring with a language model and edit operation features further improves performance
  18.

  19. Model and Training Details
    • Source and target embeddings: 500 dimensions
    • Source and target vocabularies: 30K (BPE)
    • Pre-trained word embeddings
      • Using fastText
      • On the Wikipedia corpus
      • Using a skip-gram model with a window size of 5
      • Character n-gram sequences of size between 3 and 6
    • Encoder-decoder
      • 7 convolutional layers
      • With a convolution window width of 3
      • Output of each encoder and decoder layer: 1024 dimensions
    • Dropout: 0.2
    • Batch size: 32
    • Learning rate: 0.25 with a learning rate annealing factor of 0.1
    • Momentum value: 0.99
    • Beam width: 12
    • Training a single model takes around 18 hours
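The optimization settings above (learning rate 0.25 with an annealing factor of 0.1, momentum 0.99, batch size 32) can be expressed in PyTorch roughly as in the sketch below. This is only an illustration of the schedule, not the released training setup (the official code linked on the last slide builds on Fairseq); the plateau-based annealing criterion is an assumption.

```python
import torch
from torch import nn, optim

# Placeholder model standing in for the full convolutional encoder-decoder.
model = nn.Linear(500, 1024)

# Hyperparameters from the slide: learning rate 0.25, momentum 0.99.
optimizer = optim.SGD(model.parameters(), lr=0.25, momentum=0.99)

# Anneal the learning rate by a factor of 0.1 when the loss stops improving
# (one plausible reading of "learning rate annealing factor of 0.1").
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)

for epoch in range(5):
    # Dummy batch (batch size 32) just to exercise the update step.
    x = torch.randn(32, 500)
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # pass the development-set loss in real training
```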
  20. Other Results

  21. Analysis

  22. https://github.com/nusnlp/mlconvgec2018