A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
Shamil Chollampatt and Hwee Tou Ng
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018
Dataset
• Training: Lang-8 + NUCLE (1.3M sentence pairs)
• Development: NUCLE (5.4K sentence pairs)
• Pre-training word embeddings: Wikipedia (1.78B words)
• Training language model: Common Crawl corpus (94B words)
Model and Training Details
• Source and target embeddings: 500 dimensions
• Source and target vocabularies: 30K (BPE)
• Pre-trained word embeddings (see the fastText sketch after this list)
  • Trained with fastText on the Wikipedia corpus
  • Skip-gram model with a window size of 5
  • Character n-grams of size 3 to 6
• Encoder-decoder (see the PyTorch sketch after this list)
  • 7 convolutional layers, each with a convolution window width of 3
  • Output of each encoder and decoder layer: 1024 dimensions
• Dropout: 0.2
• Batch size: 32
• Learning rate: 0.25, with a learning rate annealing factor of 0.1
• Momentum: 0.99
• Beam width: 12
• Training a single model takes around 18 hours
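A minimal sketch of how the embedding pre-training could be reproduced with the fastText Python bindings. Only the skip-gram objective, 500 dimensions, window size 5, and character n-grams of size 3 to 6 come from the slide; the corpus path `wiki.txt` and all other hyperparameters (epochs, learning rate, etc., left at library defaults) are assumptions.

```python
# Sketch: pre-training 500-dim skip-gram word embeddings with fastText.
# "wiki.txt" is a placeholder for the tokenized Wikipedia corpus.
import fasttext

model = fasttext.train_unsupervised(
    "wiki.txt",          # placeholder path to the tokenized corpus
    model="skipgram",    # skip-gram objective, as on the slide
    dim=500,             # embedding dimension
    ws=5,                # context window size
    minn=3, maxn=6,      # character n-grams of size 3 to 6
)
model.save_model("wiki-embeddings.bin")
print(model.get_word_vector("grammar")[:5])  # inspect one vector
```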
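The multilayer convolutional encoder-decoder follows the convolutional seq2seq design of Gehring et al. (2017). Below is a simplified PyTorch sketch of one encoder layer under that assumption, using a width-3 convolution, a gated linear unit, dropout 0.2, and a scaled residual connection; the projections between the 500-dim embeddings and the 1024-dim layers, the attention mechanism, and the decoder's causal padding are omitted.

```python
# Sketch of one encoder layer in the convolutional encoder-decoder,
# assuming the ConvS2S-style design (GLU + scaled residual connections).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvGLULayer(nn.Module):
    """One encoder layer: width-3 1-D convolution + GLU + residual."""
    def __init__(self, dim=1024, kernel_width=3, dropout=0.2):
        super().__init__()
        # The convolution emits 2*dim channels; the GLU halves them to dim.
        self.conv = nn.Conv1d(dim, 2 * dim, kernel_width,
                              padding=kernel_width // 2)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):                 # x: (batch, dim, seq_len)
        residual = x
        y = self.conv(self.dropout(x))
        y = F.glu(y, dim=1)               # gated linear unit over channels
        return (y + residual) * (0.5 ** 0.5)  # scaled residual (ConvS2S)

# 7 convolutional layers, as on the slide.
encoder = nn.Sequential(*[ConvGLULayer() for _ in range(7)])
out = encoder(torch.randn(32, 1024, 20))  # batch of 32, sequence length 20
```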