2,299 (WMT16) . test: 3,004 • IWST (TED), de-en (in-domain) train: 153k, dev:6,969, test:6750 Architecture • subword-based encoder-decoder with attention ◦ bidirectional encoder and single layer decoder • subword embedding size: 500 • sample k: 5, softmax temperature: 0.5