Slide 17
Experiments
Data
● WMT17, de-en (out-of-domain)
train: 5.9M, dev: 2,299 (WMT16), test: 3,004
● IWSLT (TED), de-en (in-domain)
train: 153k, dev: 6,969, test: 6,750
Architecture
● subword-based encoder-decoder with attention
○ bidirectional encoder and single-layer decoder
● subword embedding size: 500
● sample k: 5, softmax temperature: 0.5
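To illustrate what the sampling settings above control, here is a minimal sketch of temperature-scaled softmax sampling in plain Python. It is not the authors' implementation: the function names, the toy logits, and the choice to draw k indices from a single distribution are assumptions for illustration. In an actual decoder this step would presumably be repeated per output position to obtain k sampled translations.

```python
import math
import random


def softmax_with_temperature(logits, temperature=0.5):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_k(logits, k=5, temperature=0.5, rng=None):
    """Draw k indices (with replacement) from the tempered distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=k)


# Hypothetical scores over a 4-item vocabulary (not from the slides).
logits = [2.0, 1.0, 0.5, -1.0]
sharp = softmax_with_temperature(logits, temperature=0.5)
plain = softmax_with_temperature(logits, temperature=1.0)
samples = sample_k(logits, k=5, temperature=0.5, rng=random.Random(0))
```

A temperature below 1 sharpens the distribution, so with 0.5 the highest-scoring item receives more probability mass than under the unscaled softmax, and the k = 5 samples concentrate on likely outputs.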