
Neural Machine Translation: Latest Developments

This is the opening session of the Machine Translation Summit tutorial on post-editing, run by the Welocalize NLP Engineering team. This session provides an introduction to core neural machine translation concepts, including the key architectures, the domain customisation process, and related research on neural network interpretability and the inclusion of monolingual data.

nslatysheva

August 20, 2019

Transcript

  1. Agenda • The machine translation problem • Most successful architectures for NMT • Latest research: interpretability, monolingual data • Customising NMT
  2. The machine translation problem • Need to build a model that: • given an input string in a source language • outputs a semantically corresponding string in a target language • Machine translation is an example of a sequence-to-sequence problem
  3. The machine translation problem • Need to build a model that: • given an input string in a source language • outputs a semantically corresponding string in a target language • Machine translation is an example of a sequence-to-sequence problem
  4. Deep learning in machine translation • Deep learning approaches give state-of-the-art performance on sequence modelling problems 1. Able to leverage huge quantities of data 2. Compute power is getting increasingly cheap and accessible
  5. Deep learning in machine translation • Two dominant architectures: 1. Recurrent neural networks (RNNs) 2. Transformers • Both are encoder-decoder models
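
To make the encoder-decoder idea concrete, here is a minimal sketch assuming PyTorch, toy vocabulary sizes, and GRU layers (a simplified RNN setup without attention, not the exact architecture of any system mentioned here): the encoder reads the source sentence into hidden states, and the decoder produces a distribution over the target vocabulary at every output position.

```python
# A minimal encoder-decoder (seq2seq) sketch. Assumed: PyTorch; toy vocab sizes.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encoder compresses the source sentence into a hidden state.
        _, h = self.encoder(self.src_emb(src_ids))
        # Decoder generates target-side representations conditioned on the source.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_ids), h)
        # Project each decoder state to logits over the target vocabulary.
        return self.out(dec_states)

model = Seq2Seq(src_vocab=32000, tgt_vocab=32000)
logits = model(torch.randint(0, 32000, (1, 7)), torch.randint(0, 32000, (1, 9)))
print(logits.shape)  # (1, 9, 32000): one target-vocab distribution per output position
```
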
  6. Transformers in NMT • Transformers were introduced in 2017 • Can often give a boost of a few points in performance compared to RNNs • The core computations are purely “attention-based” • Better use of GPUs
  7. State-of-the-art (SOTA) systems for NMT • Progress in MT is tracked in a very organized way • WMT yearly conferences: specific MT tasks, baselines, direct comparisons • Details about the exact methodology of ready-made MT solutions (e.g. Microsoft Hub, Google AutoML) are unclear, but they are likely highly similar • Microsoft – LSTM RNNs • Google – created Transformers, so they are probably being used in its MT • nlpprogress.com
  8. Error types for NMT systems • Lakew et al. (2018) show no significant differences in error types between RNN- and Transformer-based systems – 80% lexical errors, 15% morphological errors, 5% reordering errors
  9. NMT: Latest Developments • The machine translation problem • Most successful architectures for NMT • Latest research: interpretability, monolingual data • Customising NMT
  10. NMT Interpretability • Interpretability = intuitively understanding why a model made the predictions it made 1. Attention – which source words was the model looking at when generating the target word?
  11. NMT Interpretability • Interpretability = intuitively understanding why a model made the predictions it made 1. Attention – which source words was the model looking at when generating the target word? (Img: Bahdanau et al., 2014)
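
As an illustration of the attention idea, here is a minimal sketch using NumPy and made-up vectors (not a real trained model): for each target step, attention weights are a softmax over the source positions, and inspecting them shows which source words the decoder "looked at".

```python
# A minimal sketch of inspecting attention weights. Assumed: NumPy; random toy
# encoder/decoder states stand in for states from a trained NMT model.
import numpy as np

src_tokens = ["the", "agreement", "was", "signed"]
tgt_tokens = ["l'", "accord", "a", "été", "signé"]

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(len(src_tokens), 8))  # one encoder state per source word
dec_states = rng.normal(size=(len(tgt_tokens), 8))  # one decoder state per target word

# Dot-product attention: score every (target step, source word) pair, softmax over source.
scores = dec_states @ enc_states.T
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

for tgt_word, row in zip(tgt_tokens, weights):
    best = src_tokens[row.argmax()]
    print(f"{tgt_word:>6s} mostly attends to {best!r}  ({row.round(2)})")
```
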
  12. NMT Interpretability 2. Tracing back inspiration – which training data made the model think this translation was good? • Look at the decoder state for a strange output and find similar states in the training data • Look at the final encoder state that eventually led to the strange output and find similar states in the training data
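
A rough sketch of this state-matching idea, assuming NumPy and that the state vectors have already been extracted by running the trained model over the training corpus (the data here is random and purely illustrative):

```python
# A minimal sketch of "tracing back inspiration" via nearest-neighbour search
# over cached model states. Assumed: NumPy; hypothetical cached training states.
import numpy as np

def most_similar(query_state, training_states, k=3):
    """Indices and scores of the k training examples whose states are most
    cosine-similar to the state that produced a strange output."""
    q = query_state / np.linalg.norm(query_state)
    m = training_states / np.linalg.norm(training_states, axis=1, keepdims=True)
    sims = m @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]

# Hypothetical data: 10,000 cached encoder (or decoder) states from training sentences.
training_states = np.random.default_rng(1).normal(size=(10_000, 512))
strange_state = training_states[42] + 0.1  # pretend this state led to an odd translation

idx, sims = most_similar(strange_state, training_states)
print(idx, sims.round(3))  # inspect these training sentences for the likely "inspiration"
```
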
  13. NMT Interpretability 3. NMT as a linguist – which linguistic phenomena are these models capturing? • Try to use specific internal states to predict linguistic properties, e.g. whether a word is a pronoun
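
This is typically done with a simple "probing" classifier. A minimal sketch, assuming scikit-learn and using random stand-ins for the extracted encoder states and the pronoun labels: if a simple linear probe predicts the property well above chance, the states are (linearly) encoding that linguistic information.

```python
# A minimal probing-classifier sketch. Assumed: scikit-learn; random stand-ins
# for per-word encoder states and linguistic labels from an annotated corpus.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
hidden_states = rng.normal(size=(5_000, 512))   # one encoder state per word
is_pronoun = rng.integers(0, 2, size=5_000)     # 1 if the word is a pronoun

X_tr, X_te, y_tr, y_te = train_test_split(hidden_states, is_pronoun, test_size=0.2)

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))  # ~0.5 here, since the data is random
```
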
  14. Including monolingual data • Large parallel corpora are not available for that many language pairs, and even less so for domain-specific data • Monolingual data is much, much easier to acquire than parallel texts • E.g. “export all of Swedish Wikipedia” • SMT: monolingual data is easily incorporated via the language model portion of the model • Language model (LM): a predictive model that predicts the next word(s) given some previous word(s) (Img: https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia)
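
For intuition about what a language model is, here is a toy count-based sketch (a bigram model over a tiny made-up corpus, not a neural LM): it estimates the probability of the next word given the previous one.

```python
# A toy bigram language model. Assumed: a tiny made-up corpus for illustration only.
from collections import Counter, defaultdict

corpus = "the menu lists the daily soup and the daily bread".split()

# Count which word follows which: P(next | previous) from bigram counts.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.most_common()}

print(predict_next("the"))  # {'daily': 0.67, 'menu': 0.33}: likely continuations of "the"
```
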
  15. Including monolingual data • Main ways to include monolingual data in NMT: 1. Backtranslation 2. Language models • Task: NMT for English -> Hungarian restaurant text. Very difficult to acquire data for this.
  16. Including monolingual data 1. Backtranslation approach: 1. Gather monolingual Hungarian text 2. Backtranslate it to English using a separate model 3. Add this to your training data as if it were “real” data • As long as the target-language text is fluent, the source-language text can be a bit wobbly and it is still often quite beneficial
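
A minimal sketch of this recipe; `translate_hu_to_en` is a hypothetical stand-in for a separately trained Hungarian->English model, and the demo sentence is only illustrative.

```python
# A minimal backtranslation sketch. Assumed: `translate_hu_to_en` is a hypothetical
# wrapper around a separate HU->EN model; real pipelines work on large corpora.
def build_synthetic_parallel_data(monolingual_hu_sentences, translate_hu_to_en):
    synthetic_pairs = []
    for hu in monolingual_hu_sentences:
        # Backtranslate the fluent, in-domain Hungarian target side into English.
        en = translate_hu_to_en(hu)
        # Store as a (source, target) pair as if it were real parallel data.
        synthetic_pairs.append((en, hu))
    return synthetic_pairs

# Toy demo with a dummy backtranslator; the synthetic pairs are then mixed with the
# real parallel data when training the English->Hungarian model.
demo = build_synthetic_parallel_data(
    ["Az étterem hétfőn zárva tart."],
    translate_hu_to_en=lambda s: "The restaurant is closed on Mondays.",
)
print(demo)
```
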
  17. Customising Neural MT • The machine translation problem • Most successful architectures for NMT • Latest research: interpretability, monolingual data • Customising NMT
  18. Goals of customisation • General-domain MT can give disappointing results on domain-specific language • E.g. legal documents, technical manuals • Customisation: take a decent (general-domain) NMT engine and adapt it to a specific domain (fine-tuning) • Trying to learn new vocabulary, style, and usage patterns
  19. Customisation process • The NMT model is trained until convergence on as much general-domain data as possible
  20. Customisation process • The NMT model is trained until convergence on as much general-domain data as possible • Then, you train for some additional number of epochs on the in-domain data and monitor performance on both datasets • This can easily slide into forgetting general-domain language • MT providers automate this process for you: good general-domain baseline models trained on tons of data; heuristics for stopping training
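
A minimal sketch of such a fine-tuning loop, assuming PyTorch; `model`, the data loader, the dev sets, and `evaluate` are hypothetical stand-ins for a real NMT setup, and the stopping heuristic shown is just one plausible choice, not any specific provider's method.

```python
# A fine-tuning (domain customisation) sketch. Assumed: PyTorch; `model` returns its
# training loss when called on a batch; `evaluate` returns e.g. a BLEU score.
import torch

def fine_tune(model, in_domain_loader, general_dev, in_domain_dev, evaluate,
              epochs=5, max_general_drop=2.0):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # small LR for fine-tuning
    baseline_general = evaluate(model, general_dev)            # quality before adaptation

    for epoch in range(epochs):
        model.train()
        for src, tgt in in_domain_loader:
            optimizer.zero_grad()
            loss = model(src, tgt)   # assumed to return the training loss for the batch
            loss.backward()
            optimizer.step()

        # Monitor both domains: stop if general-domain quality degrades too much,
        # i.e. the model starts "forgetting" general-domain language.
        general_score = evaluate(model, general_dev)
        in_domain_score = evaluate(model, in_domain_dev)
        print(epoch, general_score, in_domain_score)
        if baseline_general - general_score > max_general_drop:
            break
    return model
```

The small learning rate and the check against the general-domain dev set are common guards against forgetting general-domain language during adaptation.
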
  21. Key Takeaways • Machine translation is a sequence-to-sequence modelling problem • Neural network based approaches are SOTA – see RNNs, Transformers, CNNs • Interesting research topics in NMT include neural network interpretability and making use of monolingual data • Domain adaptation is important