Augmented Human Communication Laboratory
Experiments
• Training: WMT 2014 Fr-En / De-En
• Monolingual data: News Crawl (2007-2013), with 749M (Fr), 1,606M (De), and 2,109M (En) tokens
• Tuning: 2,000 sentences
• Preprocessing: Moses-based normalization, tokenization, and truecasing
• Test: in-house (Fr-En), newstest2014/2016 (De-En)
• SMT: Moses, KenLM (5-gram), Z-MERT, FastAlign
• NMT: fairseq-based Transformer-Big
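The truecasing step in the preprocessing list can be illustrated with a simplified sketch. This is not the Moses `truecase.perl` implementation. It is a hypothetical minimal version that learns each word's most frequent casing from non-sentence-initial positions and then applies that casing to the sentence-initial token, which captures the core idea. The helper names `train_truecaser` and `truecase` and the toy corpus are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    # Hypothetical helper (not the Moses API): count the surface forms of each
    # lowercased word, skipping the sentence-initial token because its capital
    # letter is forced by position rather than by the word itself.
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token in sentence.split()[1:]:
            counts[token.lower()][token] += 1
    # Keep the most frequent casing per word (ties go to the form seen first).
    return {word: forms.most_common(1)[0][0] for word, forms in counts.items()}


def truecase(sentence, model):
    # Hypothetical helper: restore the learned casing of the first token only;
    # words never seen in a non-initial position are left unchanged.
    tokens = sentence.split()
    if tokens:
        tokens[0] = model.get(tokens[0].lower(), tokens[0])
    return " ".join(tokens)


# Toy tokenized corpus (illustrative only, not the experimental data).
corpus = [
    "Yesterday the house was sold .",
    "The house is big .",
    "We visited the house in France .",
]
model = train_truecaser(corpus)
print(truecase("The house is big .", model))  # the house is big .
print(truecase("France is big .", model))     # France is big .
```

Lowercasing "The" while keeping "France" capitalized is the point of truecasing. Unlike plain lowercasing, it reduces sparsity without discarding the casing that carries meaning.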