
A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction

youichiro
Nagaoka University of Technology, Natural Language Processing Laboratory
Paper introduction (2018-11-13)

November 12, 2018


Transcript

  1. A Multilayer Convolutional
    Encoder-Decoder Neural Network
    for Grammatical Error Correction
    Shamil Chollampatt and Hwee Tou Ng
    Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

    2018-11-13


  2. • Applies a convolutional encoder-decoder model to grammatical error correction (GEC)
    • Outperforms RNN-based models
    • Uses pre-trained word embeddings
    A Multilayer Convolutional Encoder-Decoder
    Neural Network for Grammatical Error Correction

  3. A Multilayer Convolutional Encoder-Decoder NN
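
    The architecture figure on this slide is not recoverable in this transcript, but the details slide later gives the layer structure: seven convolutional layers with a window width of 3 and 1024-dimensional outputs. Below is a minimal PyTorch sketch of one such encoder layer with a gated linear unit (GLU) and a residual connection; the class name and the use of PyTorch are illustrative assumptions, not the authors' code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvGLULayer(nn.Module):
        """One encoder layer: width-3 convolution, GLU activation, residual connection."""
        def __init__(self, dim=1024, width=3):
            super().__init__()
            # The convolution emits 2*dim channels; the GLU gate halves them back to dim.
            self.conv = nn.Conv1d(dim, 2 * dim, width, padding=width // 2)

        def forward(self, x):                      # x: (batch, seq_len, dim)
            h = self.conv(x.transpose(1, 2))       # (batch, 2*dim, seq_len)
            h = F.glu(h, dim=1).transpose(1, 2)    # (batch, seq_len, dim)
            return h + x                           # residual connection

    # Seven stacked layers, matching the encoder depth reported in the paper.
    encoder = nn.Sequential(*(ConvGLULayer() for _ in range(7)))
    out = encoder(torch.randn(2, 10, 1024))        # -> (2, 10, 1024)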

  4. • Word embeddings are pre-trained with fastText
    ✓ Captures character n-gram (subword) information
    ✓ Suited to morphological and spelling variation in English
    Pre-trained word embeddings
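
    The later details slide gives the exact pre-training settings (skip-gram, window size 5, character n-grams of size 3 to 6, 500 dimensions). A minimal sketch using gensim's FastText implementation on a toy stand-in corpus; the paper pre-trains on Wikipedia:

    from gensim.models import FastText

    # Toy stand-in corpus; the paper pre-trains on Wikipedia (1.78B words).
    sentences = [["grammatical", "error", "correction"],
                 ["neural", "machine", "translation"]]

    model = FastText(
        sentences,
        vector_size=500,   # embedding dimensions (paper: 500)
        sg=1,              # skip-gram model
        window=5,          # window size of 5
        min_n=3, max_n=6,  # character n-grams of size 3 to 6
        min_count=1,
    )
    vec = model.wv["correction"]  # 500-dim vector composed from character n-grams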


  5. Candidates from beam search are rescored with two additional feature sets:
    • Edit Operation (EO)
    • Token-level substitution, deletion, and insertion counts between source and hypothesis
    • Language model (LM)
    • 5-gram LM trained on the Common Crawl corpus
    Rescore
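
    A minimal sketch of this kind of feature-based rescoring. The edit-operation counting via difflib and the feature weights below are illustrative assumptions; in the paper the weights are tuned on the development set.

    from difflib import SequenceMatcher

    def edit_op_counts(source, hypothesis):
        """Token-level substitution, deletion, and insertion counts."""
        subs = dels = ins = 0
        for op, i1, i2, j1, j2 in SequenceMatcher(None, source, hypothesis).get_opcodes():
            if op == "replace":
                subs += max(i2 - i1, j2 - j1)
            elif op == "delete":
                dels += i2 - i1
            elif op == "insert":
                ins += j2 - j1
        return subs, dels, ins

    def rescore(source, candidates, lm_score, weights=(1.0, -0.1, -0.1, -0.1, 0.5)):
        """Pick the best beam candidate under model, EO, and LM features.

        candidates: list of (hypothesis_tokens, model_log_prob) pairs
        lm_score:   function mapping a token list to a log probability
        weights:    untuned, illustrative feature weights
        """
        w_model, w_sub, w_del, w_ins, w_lm = weights
        def score(cand):
            hyp, model_lp = cand
            subs, dels, ins = edit_op_counts(source, hyp)
            return (w_model * model_lp + w_sub * subs + w_del * dels
                    + w_ins * ins + w_lm * lm_score(hyp))
        return max(candidates, key=score)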

  6. • Training
    • Lang-8 + NUCLE (1.3M sentence pairs)
    • Development
    • NUCLE (5.4K sentence pairs)
    • Pre-training word embeddings
    • Wikipedia (1.78B words)
    • Training language model
    • Common Crawl corpus (94B words)
    Dataset

  7. Result

  8. Result
    Pre-trained embeddings


  9. Result
    Ensemble


  10. Result
    +Rescore


  11. Result
    +SpellCheck


  12. Result
    → Achieves SoTA

  13. RNN vs CNN

  14. RNN vs CNN

  15. RNN vs CNN
    RNN considers the entire sentence as context when making a correction
    → higher Precision
    CNN captures the local context around an error with its convolution windows
    → detects more errors
    → higher Recall
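
    Context for the precision/recall trade-off above: GEC systems are conventionally scored with F0.5, which weights precision more heavily than recall, so the two behaviors are not rewarded equally. A small worked sketch of the formula F_beta = (1 + beta^2) * P * R / (beta^2 * P + R):

    def f_beta(precision, recall, beta=0.5):
        """F_beta score; GEC evaluation uses beta = 0.5 (precision-weighted)."""
        if precision == 0 and recall == 0:
            return 0.0
        b2 = beta ** 2
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    # Swapping precision and recall changes the score at beta = 0.5:
    print(f_beta(0.6, 0.3))  # 0.5
    print(f_beta(0.3, 0.6))  # 0.333...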

  16. Embedding Initialization

  17. • Applied a convolutional encoder-decoder model to GEC
    • The CNN model outperformed RNN-based models
    • Achieved state-of-the-art (SoTA) results
    • Pre-trained word embeddings contributed to the improvement
    • Rescoring with language model and edit operation features also contributed
    Conclusion


  19. Model and Training Details
    • Source and target embeddings: 500 dimensions
    • Source and target vocabularies: 30K (BPE)
    • Pre-trained word embeddings
    • Using fastText
    • On the Wikipedia corpus
    • Using a skip-gram model with a window size of 5
    • Character N-gram sequences of size between 3 and 6
    • Encoder-decoder
    • 7 convolutional layers
    • With a convolution window width of 3
    • Output of each encoder and decoder layer: 1024 dimensions
    • Dropout: 0.2
    • Batch size: 32
    • Learning rate: 0.25 with learning rate annealing factor of 0.1
    • Momentum value: 0.99
    • Beam width: 12
    • Training a single model takes around 18 hours
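
    The same numbers gathered into a single plain config dict for quick reference; the key names are descriptive choices made here, not any toolkit's actual option names.

    # Hyperparameters as reported on the slide above.
    HPARAMS = {
        "embedding_dim": 500,       # source and target embeddings
        "vocab_size": 30_000,       # BPE vocabulary, source and target
        "encoder_layers": 7,        # convolutional layers
        "decoder_layers": 7,
        "conv_window_width": 3,
        "layer_output_dim": 1024,   # output of each encoder/decoder layer
        "dropout": 0.2,
        "batch_size": 32,
        "learning_rate": 0.25,
        "lr_annealing_factor": 0.1,
        "momentum": 0.99,
        "beam_width": 12,
    }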

  20. Other Result

  21. Analysis

  22. https://github.com/nusnlp/mlconvgec2018