
Neural Machine Translation: Latest Developments

This is the opening session of the Machine Translation Summit tutorial on post-editing, run by the Welocalize NLP Engineering team. This session provides an introduction to core neural machine translation concepts, including the key architectures, the domain customisation process, and related research on neural network interpretability and the inclusion of monolingual data.

nslatysheva

August 20, 2019

Transcript

  1. Agenda • The machine translation problem • Most successful architectures for NMT • Latest research: interpretability, monolingual data • Customising NMT
  2. The machine translation problem • Need to build a model that: • given an input string in a source language • outputs a semantically corresponding string in a target language • Machine translation is an example of a sequence-to-sequence problem
  3. The machine translation problem • Need to build a model that: • given an input string in a source language • outputs a semantically corresponding string in a target language • Machine translation is an example of a sequence-to-sequence problem
  4. Deep learning in machine translation • Deep learning approaches give state-of-the-art performance on sequence modelling problems 1. Able to leverage huge quantities of data 2. Compute power is getting increasingly cheap and accessible
  5. Deep learning in machine translation • Two dominant architectures: 1. Recurrent neural networks (RNNs) 2. Transformers • Both are encoder-decoder models
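
To make the encoder-decoder idea concrete, here is a minimal sketch assuming PyTorch, toy vocabulary sizes, and GRU layers (a simplified RNN setup without attention, not the exact architecture of any system mentioned here): the encoder reads the source sentence into hidden states, and the decoder produces a distribution over the target vocabulary at every output position.

```python
# A minimal encoder-decoder (seq2seq) sketch. Assumed: PyTorch; toy vocab sizes.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encoder compresses the source sentence into a hidden state.
        _, h = self.encoder(self.src_emb(src_ids))
        # Decoder generates target-side representations conditioned on the source.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_ids), h)
        # Project each decoder state to logits over the target vocabulary.
        return self.out(dec_states)

model = Seq2Seq(src_vocab=32000, tgt_vocab=32000)
logits = model(torch.randint(0, 32000, (1, 7)), torch.randint(0, 32000, (1, 9)))
print(logits.shape)  # (1, 9, 32000): one target-vocab distribution per output position
```
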
  6. Transformers in NMT • Transformers were introduced in 2017 • Can often give a boost of a few points in performance compared to RNNs • The core computations are purely “attention-based” • Better use of GPUs
  7. State-of-the-art (SOTA) systems for NMT • Progress in MT is tracked in a very organized way • WMT yearly conferences: specific MT tasks, baselines, direct comparisons • Details about the exact methodology of ready-made MT solutions (e.g. Microsoft Hub, Google AutoML) are unclear, but they are likely highly similar • Microsoft – LSTM RNNs • Google – created Transformers, so they are probably being used in its MT • nlpprogress.com
  8. Error types for NMT systems • Lakew et al. (2018) show no significant differences in error types between RNN- and Transformer-based systems – 80% lexical errors, 15% morphological errors, 5% reordering errors
  9. NMT: Latest Developments • The machine translation problem • Most successful architectures for NMT • Latest research: interpretability, monolingual data • Customising NMT
  10. NMT Interpretability • Interpretability = intuitively understanding why a model made the predictions it made 1. Attention – which source words was the model looking at when generating the target word?
  11. NMT Interpretability • Interpretability = intuitively understanding why a model made the predictions it made 1. Attention – which source words was the model looking at when generating the target word? (Img: Bahdanau et al., 2014)
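
As an illustration of the attention idea, here is a minimal sketch using NumPy and made-up vectors (not a real trained model): for each target step, attention weights are a softmax over the source positions, and inspecting them shows which source words the decoder "looked at".

```python
# A minimal sketch of inspecting attention weights. Assumed: NumPy; random toy
# encoder/decoder states stand in for states from a trained NMT model.
import numpy as np

src_tokens = ["the", "agreement", "was", "signed"]
tgt_tokens = ["l'", "accord", "a", "été", "signé"]

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(len(src_tokens), 8))  # one encoder state per source word
dec_states = rng.normal(size=(len(tgt_tokens), 8))  # one decoder state per target word

# Dot-product attention: score every (target step, source word) pair, softmax over source.
scores = dec_states @ enc_states.T
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

for tgt_word, row in zip(tgt_tokens, weights):
    best = src_tokens[row.argmax()]
    print(f"{tgt_word:>6s} mostly attends to {best!r}  ({row.round(2)})")
```
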
  12. NMT Interpretability 2. Tracing back inspiration – which training data made the model think this translation was good? • Look at the decoder state for a strange output and find similar states in the training data • Look at the final encoder state that eventually led to the strange output and find similar states in the training data
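
A rough sketch of this state-matching idea, assuming NumPy and that the state vectors have already been extracted by running the trained model over the training corpus (the data here is random and purely illustrative):

```python
# A minimal sketch of "tracing back inspiration" via nearest-neighbour search
# over cached model states. Assumed: NumPy; hypothetical cached training states.
import numpy as np

def most_similar(query_state, training_states, k=3):
    """Indices and scores of the k training examples whose states are most
    cosine-similar to the state that produced a strange output."""
    q = query_state / np.linalg.norm(query_state)
    m = training_states / np.linalg.norm(training_states, axis=1, keepdims=True)
    sims = m @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]

# Hypothetical data: 10,000 cached encoder (or decoder) states from training sentences.
training_states = np.random.default_rng(1).normal(size=(10_000, 512))
strange_state = training_states[42] + 0.1  # pretend this state led to an odd translation

idx, sims = most_similar(strange_state, training_states)
print(idx, sims.round(3))  # inspect these training sentences for the likely "inspiration"
```
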
  13. NMT Interpretability 3. NMT as a linguist – which linguistic phenomena are these models capturing? • Try to use specific internal states to predict linguistic properties, e.g. whether a word is a pronoun
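
This is typically done with a simple "probing" classifier. A minimal sketch, assuming scikit-learn and using random stand-ins for the extracted encoder states and the pronoun labels: if a simple linear probe predicts the property well above chance, the states are (linearly) encoding that linguistic information.

```python
# A minimal probing-classifier sketch. Assumed: scikit-learn; random stand-ins
# for per-word encoder states and linguistic labels from an annotated corpus.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
hidden_states = rng.normal(size=(5_000, 512))   # one encoder state per word
is_pronoun = rng.integers(0, 2, size=5_000)     # 1 if the word is a pronoun

X_tr, X_te, y_tr, y_te = train_test_split(hidden_states, is_pronoun, test_size=0.2)

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))  # ~0.5 here, since the data is random
```
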
  14. Including monolingual data • Large parallel corpora are not available for that many language pairs, and even less so for domain-specific data • Monolingual data is much, much easier to acquire than parallel texts • E.g. “export all of Swedish Wikipedia” • SMT: monolingual data is easily incorporated via the language model portion of the model • Language model (LM): a predictive model that predicts the next word(s) given some previous word(s) (Img: https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia)
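
For intuition about what a language model is, here is a toy count-based sketch (a bigram model over a tiny made-up corpus, not a neural LM): it estimates the probability of the next word given the previous one.

```python
# A toy bigram language model. Assumed: a tiny made-up corpus for illustration only.
from collections import Counter, defaultdict

corpus = "the menu lists the daily soup and the daily bread".split()

# Count which word follows which: P(next | previous) from bigram counts.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.most_common()}

print(predict_next("the"))  # {'daily': 0.67, 'menu': 0.33}: likely continuations of "the"
```
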
  15. Including monolingual data • Main ways to include monolingual data in NMT: 1. Backtranslation 2. Language models • Task: NMT for English -> Hungarian restaurant text. Very difficult to acquire data for this.
  16. Including monolingual data 1. Backtranslation approach: 1. Gather monolingual Hungarian text 2. Backtranslate it to English using a separate model 3. Add this to your training data as if it were “real” data • As long as the target-language text is fluent, the source-language text can be a bit wobbly and it is still often quite beneficial
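
A minimal sketch of this recipe; `translate_hu_to_en` is a hypothetical stand-in for a separately trained Hungarian->English model, and the demo sentence is only illustrative.

```python
# A minimal backtranslation sketch. Assumed: `translate_hu_to_en` is a hypothetical
# wrapper around a separate HU->EN model; real pipelines work on large corpora.
def build_synthetic_parallel_data(monolingual_hu_sentences, translate_hu_to_en):
    synthetic_pairs = []
    for hu in monolingual_hu_sentences:
        # Backtranslate the fluent, in-domain Hungarian target side into English.
        en = translate_hu_to_en(hu)
        # Store as a (source, target) pair as if it were real parallel data.
        synthetic_pairs.append((en, hu))
    return synthetic_pairs

# Toy demo with a dummy backtranslator; the synthetic pairs are then mixed with the
# real parallel data when training the English->Hungarian model.
demo = build_synthetic_parallel_data(
    ["Az étterem hétfőn zárva tart."],
    translate_hu_to_en=lambda s: "The restaurant is closed on Mondays.",
)
print(demo)
```
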
  17. Customising Neural MT • The machine translation problem • Most successful architectures for NMT • Latest research: interpretability, monolingual data • Customising NMT
  18. Goals of customisation • General-domain MT can give disappointing results on domain-specific language • E.g. legal documents, technical manuals • Customisation: take a decent (general-domain) NMT engine and adapt it to a specific domain (fine-tuning) • Trying to learn new vocabulary, style, and usage patterns
  19. Customisation process • The NMT model is trained until convergence on as much general-domain data as possible
  20. Customisation process • The NMT model is trained until convergence on as much general-domain data as possible • Then, you train for some additional number of epochs on the in-domain data and monitor performance on both datasets • This can easily slide into forgetting general-domain language • MT providers automate this process for you: good general-domain baseline models trained on tons of data; heuristics for stopping training
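
A minimal sketch of such a fine-tuning loop, assuming PyTorch; `model`, the data loader, the dev sets, and `evaluate` are hypothetical stand-ins for a real NMT setup, and the stopping heuristic shown is just one plausible choice, not any specific provider's method.

```python
# A fine-tuning (domain customisation) sketch. Assumed: PyTorch; `model` returns its
# training loss when called on a batch; `evaluate` returns e.g. a BLEU score.
import torch

def fine_tune(model, in_domain_loader, general_dev, in_domain_dev, evaluate,
              epochs=5, max_general_drop=2.0):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # small LR for fine-tuning
    baseline_general = evaluate(model, general_dev)            # quality before adaptation

    for epoch in range(epochs):
        model.train()
        for src, tgt in in_domain_loader:
            optimizer.zero_grad()
            loss = model(src, tgt)   # assumed to return the training loss for the batch
            loss.backward()
            optimizer.step()

        # Monitor both domains: stop if general-domain quality degrades too much,
        # i.e. the model starts "forgetting" general-domain language.
        general_score = evaluate(model, general_dev)
        in_domain_score = evaluate(model, in_domain_dev)
        print(epoch, general_score, in_domain_score)
        if baseline_general - general_score > max_general_drop:
            break
    return model
```

The small learning rate and the check against the general-domain dev set are common guards against forgetting general-domain language during adaptation.
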
  21. Key Takeaways • Machine translation is a sequence-to-sequence modelling problem • Neural network based approaches are SOTA – see RNNs, Transformers, CNNs • Interesting research topics in NMT include neural network interpretability and making use of monolingual data • Domain adaptation is important