An Effective Approach to Unsupervised Machine Translation
Authors: Mikel Artetxe, Gorka Labaka, Eneko Agirre (University of the Basque Country)
ACL 2019 - https://www.aclweb.org/anthology/P19-1019/
Also in arXiv:1902.01313 - https://arxiv.org/abs/1902.01313
Presented by Katsuhito Sudoh (AHC Lab., NAIST)
Study group on Unsupervised Machine Translation (2019/12/04)
• Similarity over n-gram embeddings (phrase2vec)
• Mapping embeddings into a cross-lingual space (VecMap)
• Initialize phrase translation probabilities using the 100 nearest neighbors:
  $\phi(\bar{f} \mid \bar{e}) = \frac{\exp(\cos(\bar{e}, \bar{f}) / \tau)}{\sum_{\bar{f}'} \exp(\cos(\bar{e}, \bar{f}') / \tau)}$
• Lexical probabilities $\phi_{\mathrm{lex}}(e_i \mid f_j)$ are estimated at the word level
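A minimal sketch of how such a nearest-neighbor softmax over cosine similarities could be computed with NumPy; this is not the authors' implementation, and the names `src_vecs`, `tgt_vecs`, `tau` (the softmax temperature) and `k` are illustrative assumptions:

```python
import numpy as np

def init_phrase_probs(src_vecs, tgt_vecs, tau=0.1, k=100):
    """For each source phrase embedding, set p(tgt | src) to a softmax of
    cosine similarities over its k nearest target phrases (zero elsewhere).
    tau is a temperature hyper-parameter (the value here is arbitrary)."""
    # L2-normalize so that dot products equal cosine similarities.
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    cos = src @ tgt.T                      # (n_src, n_tgt) cosine similarity matrix
    probs = np.zeros_like(cos)
    for i, row in enumerate(cos):
        nn = np.argsort(-row)[:k]          # indices of the k nearest target phrases
        scores = np.exp(row[nn] / tau)     # temperature-scaled exponentials
        probs[i, nn] = scores / scores.sum()
    return probs
```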
Unsupervised tuning: can we optimize the log-linear weights?
• MERT [Och 2003] is not applicable in unsupervised settings (no parallel development set)
• Previous work [Artetxe+ 2018b] used a synthetic corpus
• Lample et al. (2018b) do not perform any tuning(!)
• This work: a more principled approach
  • A novel unsupervised optimization objective: cyclic consistency loss + language model loss (see the sketch below)
  • Inspired by CycleGAN [Zhu+ 2017] and dual learning [He+ 2016]
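A hedged sketch of such a combined objective, assuming the cycle term is measured with corpus BLEU over round-trip translations and the language-model term penalizes intermediate translations that an LM scores as less fluent than real monolingual text; `translate_e2f`, `translate_f2e`, `lm_score_e`, `lm_score_f` are assumed helpers, not names from the paper or its code:

```python
import sacrebleu  # pip install sacrebleu

def unsupervised_objective(mono_e, mono_f,
                           translate_e2f, translate_f2e,
                           lm_score_e, lm_score_f):
    """Loss over two small monolingual corpora (lists of sentences); lower is better."""
    # Cyclic consistency: a round trip E -> F -> E should reproduce the input.
    inter_f = translate_e2f(mono_e)
    inter_e = translate_f2e(mono_f)
    cycle_loss = ((100.0 - sacrebleu.corpus_bleu(translate_f2e(inter_f), [mono_e]).score)
                  + (100.0 - sacrebleu.corpus_bleu(translate_e2f(inter_e), [mono_f]).score))

    # Language-model term: penalize intermediate translations whose per-word
    # LM score falls below that of genuine monolingual text in that language.
    lm_loss = (max(0.0, lm_score_f(mono_f) - lm_score_f(inter_f))
               + max(0.0, lm_score_e(mono_e) - lm_score_e(inter_e)))

    return cycle_loss + lm_loss
```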
search & greedy update using n-best (as MERT) • Naively !" -best: # $→& ' and # &→$ # $→& ' • Alternating optimization instead, for efficiency • Optimize one model with the fixed parameters on the other 11 BLEU () * () * MERT-like parameter update only works with relatively small number of parameters (~20).
Quoted from the paper: "[…] iteration, we do not perform any tuning and use default Moses weights instead, which we found to be more robust during development. Note, however, that using unsupervised tuning during the previous steps was still strongly beneficial."
Summary: unsupervised machine translation is improved by:
• Subword information (only applicable to languages sharing the same script)
• Principled unsupervised tuning
• Joint bidirectional refinement
• SMT-initialized NMT with iterative training
• Performed best on the WMT De-En newstest sets, even better than the supervised SMT-based winner of WMT 2014
• Code available: https://github.com/artetxem/monoses