Slide 11
Experiments setup (including appendix)
● The models are state-of-the-art NMT systems: our own implementation of NMT
with attention over the source sequence (Bahdanau et al., 2014).
○ bidirectional GRUs
○ AdaDelta
○ train all systems for 500,000 iterations, validating every 5,000 steps;
the best single model from validation is used.
○ use ℓ2 regularization (α = 1e-5)
○ dropout with rate 0.5 on the output layers
○ beam size is 10.
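The checkpoint-selection schedule above (validate every 5,000 steps, keep the best single model) can be sketched as follows. This is an illustrative sketch, not the authors' code; the validation scores are made-up toy values.

```python
def select_best_checkpoint(validation_scores, valid_every=5_000):
    """Pick the single best checkpoint from periodic validation runs.

    validation_scores: one higher-is-better score (e.g. BLEU) per
    validation run, in training order. Returns (best_step, best_score).
    """
    best_step, best_score = None, float("-inf")
    for i, score in enumerate(validation_scores, start=1):
        step = i * valid_every  # training iteration of this validation run
        if score > best_score:
            best_step, best_score = step, score
    return best_step, best_score

# Toy validation BLEU trace (hypothetical values, not from the paper):
print(select_best_checkpoint([14.2, 17.8, 19.1, 18.7, 18.9]))  # → (15000, 19.1)
```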
● corpora
○ English-German
■ 4.4M segments from the Europarl and CommonCrawl corpora
○ English-French
■ 4.9M segments from the Europarl and CommonCrawl corpora
○ English-Portuguese
■ 28.5M segments from the Europarl, JRC-Acquis and OpenSubtitles corpora
● subword
○ they created a shared subword representation for each language pair
by extracting a vocabulary of 80,000 symbols from the concatenated source and target data.