A Multilayer Convolutional
Encoder-Decoder Neural Network
for Grammatical Error Correction
Shamil Chollampatt and Hwee Tou Ng
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018
2018-11-13
Slide 2
A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
• GEC with a convolutional encoder-decoder network
• Outperforms RNN-based encoder-decoder models
• Pre-trained word embeddings
Slide 3
A Multilayer Convolutional Encoder-Decoder NN
Slide 4
Pre-trained word embeddings
• fastText word embeddings (see the training sketch below)
✓ Pre-trained on English Wikipedia
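Using the hyperparameters given later in the deck (500 dimensions, skip-gram, window size 5, character n-grams of size 3 to 6), the embedding pre-training could be sketched as follows with the fastText Python bindings; the corpus and output paths are placeholders, not the authors' files.

```python
# Hedged sketch: pre-training skip-gram embeddings with fastText.
# Hyperparameters follow the "Model and Training Details" slide; the input
# path "wiki_en.txt" is a placeholder for the preprocessed Wikipedia text.
import fasttext

embeddings = fasttext.train_unsupervised(
    "wiki_en.txt",      # English Wikipedia text (placeholder path)
    model="skipgram",   # skip-gram objective
    dim=500,            # 500-dimensional vectors, matching the model's embeddings
    ws=5,               # context window size of 5
    minn=3, maxn=6,     # character n-grams of size 3 to 6
)
embeddings.save_model("wiki_en_500d.bin")
```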
Slide 5
Rescore
• Rescoring of the N-best candidates from beam search (see the sketch below)
• Edit Operation (EO)
• Counts of token-level substitutions, deletions, and insertions between the source sentence and the hypothesis
• Language model (LM)
• 5-gram LM
• Log probability of the hypothesis
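As a rough illustration of this step, rescoring can be written as a weighted sum of the decoder score, the EO counts, and the LM score. The feature weights, the KenLM usage, and the exact feature set below are assumptions for the sketch, not the paper's tuned configuration.

```python
# Hedged sketch of N-best rescoring with edit-operation (EO) and language-model
# (LM) features. Weights and the KenLM model path are illustrative placeholders.
import difflib
import kenlm  # Python bindings for KenLM; assumes a 5-gram model file exists

def eo_counts(src_tokens, hyp_tokens):
    """Count token-level substitutions, deletions, and insertions."""
    subs = dels = ins = 0
    matcher = difflib.SequenceMatcher(a=src_tokens, b=hyp_tokens)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        m, n = i2 - i1, j2 - j1
        if tag == "replace":
            subs += min(m, n)
            dels += max(m - n, 0)
            ins += max(n - m, 0)
        elif tag == "delete":
            dels += m
        elif tag == "insert":
            ins += n
    return subs, dels, ins

def rescore(source, nbest, lm, weights):
    """Pick the candidate with the best weighted sum of feature scores.

    nbest: list of (hypothesis string, decoder log-probability) pairs.
    """
    src_tokens = source.split()
    def score(cand):
        hyp, decoder_score = cand
        hyp_tokens = hyp.split()
        subs, dels, ins = eo_counts(src_tokens, hyp_tokens)
        feats = [decoder_score, subs, dels, ins, lm.score(hyp), len(hyp_tokens)]
        return sum(w * f for w, f in zip(weights, feats))
    return max(nbest, key=score)

# Example usage (weights are arbitrary placeholders):
# lm = kenlm.Model("cc_5gram.bin")
# best = rescore(src, candidates, lm, weights=[1.0, -0.1, -0.1, -0.1, 0.2, 0.05])
```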
Slide 6
Dataset
• Training: Lang-8 + NUCLE (1.3M sentence pairs)
• Development: NUCLE (5.4K sentence pairs)
• Pre-training word embeddings: Wikipedia (1.78B words)
• Training language model: Common Crawl corpus (94B words)
Conclusion
• GEC with a multilayer convolutional encoder-decoder network
• The CNN-based model outperforms RNN-based models
• Achieves state-of-the-art performance
• Pre-trained word embeddings improve performance
• Rescoring with language model and edit operation features
Slide 19
Model and Training Details
• Source and target embeddings: 500 dimensions
• Source and target vocabularies: 30K (BPE)
• Pre-trained word embeddings
• Using fastText
• On the Wikipedia corpus
• Using a skip-gram model with a window size of 5
• Character n-grams of size 3 to 6
• Encoder-decoder
• 7 convolutional layers
• With a convolution window width of 3
• Output of each encoder and decoder layer: 1024 dimensions
• Dropout: 0.2
• Batch size: 32
• Learning rate: 0.25 with learning rate annealing factor of 0.1
• Momentum value: 0.99
• Beam width: 12
• Training a single model takes around 18 hours
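To make the encoder-decoder numbers above concrete, here is a minimal PyTorch sketch of a convolutional encoder with these dimensions (500-dim embeddings, 30K vocabulary, 7 layers, convolution window width 3, 1024-dim layer outputs, dropout 0.2). It is an illustrative simplification under stated assumptions, not the authors' implementation, which follows the fully convolutional sequence-to-sequence model of Gehring et al. (2017); positional embeddings, attention, and the decoder are omitted.

```python
# Hedged sketch of a multilayer convolutional encoder with the dimensions above.
# Class and variable names are illustrative; attention and the decoder are omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoder(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=500, hid_dim=1024,
                 n_layers=7, kernel_width=3, dropout=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.in_proj = nn.Linear(emb_dim, hid_dim)
        # Each convolution outputs 2 * hid_dim channels so a GLU can gate them
        # back down to hid_dim; padding keeps the sequence length unchanged.
        self.convs = nn.ModuleList(
            nn.Conv1d(hid_dim, 2 * hid_dim, kernel_width, padding=kernel_width // 2)
            for _ in range(n_layers)
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, src_tokens):                   # src_tokens: (batch, seq_len)
        x = self.dropout(self.embed(src_tokens))     # (batch, seq_len, emb_dim)
        x = self.in_proj(x).transpose(1, 2)          # (batch, hid_dim, seq_len)
        for conv in self.convs:
            residual = x
            x = F.glu(conv(self.dropout(x)), dim=1)  # gated linear unit
            x = x + residual                         # residual connection per layer
        return x.transpose(1, 2)                     # (batch, seq_len, hid_dim)

# Example: encode a batch of two sequences of length 10 (random token ids).
# encoder = ConvEncoder()
# states = encoder(torch.randint(0, 30000, (2, 10)))   # shape (2, 10, 1024)
```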