
Neural Machine Translations in Booking.com

Talk given by Talaat Khalil at PyData Amsterdam, 2018/05/08

Transcript

  1. Presentation Outline
     • MT use cases at Booking.com
     • Neural sequence to sequence models
     • OpenNMT
     • Transformer model
     • tensor2tensor
     • Evaluations
     • Challenges & Recommendations
  2. Why MT?
     • Why translation?
       ◦ ⅔ of daily bookings are not made in English
     • Why MT?
       ◦ 1M+ properties, and growing
       ◦ Very frequent property description updates
       ◦ New user-generated content every second (customer service emails, reviews, etc.)
     • Presentation focus: property description translations
  3. Neural seq2seq models
     • Problem:
       ◦ The source sentence is encoded in one vector
       ◦ More serious when you have long sequences
       ◦ Models still struggle to remember early tokens
     • How to approach this?
       ◦ Attention mechanisms
  4. Attention [1]
     Given a query q and <key, value> pairs, the attention weight for each pair is computed from the compatibility between the query q and its key k. The context vector c of a query q is then calculated as a weighted average of all values, using the attention weights (see the formula sketch after this slide).
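     A compact way to write the mechanism described on this slide, with the score function left abstract (the additive scoring of Bahdanau et al. [1] and dot-product scoring both fit this form):

         \alpha_i = \mathrm{softmax}_i\big(\mathrm{score}(q, k_i)\big)
                  = \frac{\exp\big(\mathrm{score}(q, k_i)\big)}{\sum_j \exp\big(\mathrm{score}(q, k_j)\big)},
         \qquad
         c = \sum_i \alpha_i \, v_i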
  5. Neural seq2seq models (Attention)
     [Diagram: encoder-decoder example translating "Wifi is free" into "Wifi ist kostenlos"; at one decoding step the attention weights over the source tokens are 0.6, 0.3, 0.1, which are combined into a context vector and an attention vector. Labels: Attention Weights, Context Vector, Attention Vector. A NumPy sketch of this weighted average follows.]
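     A minimal NumPy sketch of the weighted average pictured on this slide; only the attention weights (0.6, 0.3, 0.1) come from the slide, the encoder states are made-up toy vectors:

         import numpy as np

         # Toy encoder states ("values") for the source tokens "Wifi", "is", "free".
         # The 4-dimensional numbers are invented purely for illustration.
         encoder_states = np.array([
             [0.2, 0.1, 0.5, 0.3],   # "Wifi"
             [0.4, 0.0, 0.1, 0.2],   # "is"
             [0.1, 0.6, 0.3, 0.0],   # "free"
         ])

         # Attention weights from the slide for one decoding step.
         attention_weights = np.array([0.6, 0.3, 0.1])

         # Context vector = weighted average of the encoder states.
         context_vector = attention_weights @ encoder_states
         print(context_vector)  # [0.25 0.12 0.36 0.24]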
  6. Our OpenNMT lua configuration
     • Architecture (variant of Bahdanau et al. [1]):
       ◦ Input dim: 1000
       ◦ RNN dim: 1000
       ◦ # of hidden layers: Encoder: 4, Decoder: 4
       ◦ Attention mechanism: Global
       ◦ RNN type: LSTM (bidirectional encoder)
       ◦ Residual connections: Yes
     • Data preprocessing:
       ◦ Input text unit: Lowercased BPE
       ◦ Tokenization: Aggressive, with case features
       ◦ Max. sentence length: 50 units
       ◦ Vocab size: 30,000-50,000 (joint or separate)
     • Optimization (standard pipeline):
       ◦ Optimizer: SGD
       ◦ Learning rate decay: 0.7
       ◦ Decay strategy: validation perplexity increase or epoch > 20 (see the sketch after this slide)
       ◦ Stopping criteria: based on validation perplexity
       ◦ Dropout rate: 0.3
       ◦ Max batch size: 120
     • Others:
       ◦ Inference: beam size 5
       ◦ GPU: Nvidia P100
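     A minimal Python sketch of the decay rule listed above (shrink the learning rate by a factor of 0.7 once validation perplexity stops improving or the epoch passes 20); the function and argument names are illustrative, not actual OpenNMT option names:

         # Decay rule from the slide: multiply the learning rate by 0.7 whenever
         # validation perplexity increases or the epoch exceeds 20.
         def decayed_learning_rate(lr, epoch, val_ppl, prev_val_ppl,
                                   decay_factor=0.7, start_decay_epoch=20):
             if (prev_val_ppl is not None and val_ppl > prev_val_ppl) or epoch > start_decay_epoch:
                 return lr * decay_factor
             return lr

         # Validation perplexity got worse at this epoch, so the rate decays: 1.0 -> 0.7
         print(decayed_learning_rate(lr=1.0, epoch=12, val_ppl=5.3, prev_val_ppl=5.1))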
  7. Our tensor2tensor configuration
     • Preprocessing using the native subword encoding
     • Big Transformer (6 encoder and 6 decoder attention blocks)
     • Embedding size = 1024
     • Mostly with Nvidia P100 GPUs
     • Some hyperparameters were tuned based on the validation loss
     (An example training invocation follows this slide.)
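     A hedged sketch of launching a comparable run with the public t2t-trainer CLI; the flag names follow the tensor2tensor documentation of that era and may vary between versions, and the problem name is the public WMT EN->DE example used as a stand-in for Booking.com's internal data:

         import subprocess

         # Hypothetical invocation of the public tensor2tensor trainer.
         cmd = [
             "t2t-trainer",
             "--data_dir=./t2t_data",
             "--output_dir=./t2t_train/transformer_big",
             "--problem=translate_ende_wmt32k",  # registered public t2t problem, stand-in only
             "--model=transformer",
             "--hparams_set=transformer_big",    # 6 encoder + 6 decoder blocks, hidden/embedding size 1024
             "--train_steps=250000",
         ]
         subprocess.run(cmd, check=True)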
  8. Evaluations in Booking.com
     • Automatic evaluation:
       ◦ BLEU score (see the sketch after this slide)
     • Human evaluation:
       ◦ Adequacy (1-4)
       ◦ Fluency (1-4)
       ◦ Publication score:
         ▪ The translation is publishable if it does not mislead the user or stop her/him from booking.
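     For reference, corpus-level BLEU can be computed with an off-the-shelf library; sacrebleu is one common choice, not necessarily the implementation used in the talk, and the sentences below are toy examples:

         import sacrebleu

         hypotheses = ["Wifi ist kostenlos", "Das Zimmer hat einen LCD-TV"]
         references = [["WLAN ist kostenlos", "Das Zimmer verfügt über einen LCD-TV"]]

         # One reference stream, aligned with the hypotheses.
         bleu = sacrebleu.corpus_bleu(hypotheses, references)
         print(bleu.score)  # corpus-level BLEU on a 0-100 scale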
  9. Evaluations (EN->DE)
     System                           BLEU    Adequacy   Publication
     OpenNMT, lua                     43.75   3.74       98%
     tensor2tensor, big transformer   45.68   3.94       98%
     • > 10M training examples, shared sub-words
  10. Evaluations (EN->TR)
      System                           BLEU    Adequacy   Publication
      OpenNMT, lua                     45.47   3.63       90%
      tensor2tensor, big transformer   45.77   3.72       95%
      • > 5M training examples, separate sub-words
  11. Translation challenges
      • Common mistakes:
        ◦ Named entity translation/transliteration
        ◦ Rare word translations
        ◦ Omission and addition of information
        ◦ Wrong translations in general
  12. Translation challenges
      • Common mistakes:
        ◦ Context ignorance:
          ▪ "It offers free WiFi and air-conditioned rooms with an LCD TV and private bathroom."
          ▪ In some languages (fr, ru, ar, etc.), "It" could be masculine, feminine or neuter
        ◦ Wrong sentence segmentation (error propagation from another component)
  13. Conclusions
      • Move smoothly to adopt tensor2tensor:
        ◦ Better accuracy
        ◦ Faster to converge
        ◦ TPU integration
        ◦ Options for training using multiple clusters
        ◦ Modular implementation
        ◦ Translation using multiple GPUs
        ◦ Native integration with TensorBoard
        ◦ OpenNMT lua is going into maintenance mode (more focus on the PyTorch and TensorFlow implementations)
  14. Conclusions
      • Others:
        ◦ Fine-tuning general-purpose models
        ◦ Encoding context
        ◦ Improving segment tokenization is critical
        ◦ Accelerate training times with TPUs
  15. References
      [1] Bahdanau et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015.
      [2] Vaswani et al. Attention Is All You Need. 31st Conference on Neural Information Processing Systems (NIPS 2017).
      [3] Sutskever et al. Sequence to Sequence Learning with Neural Networks. Conference on Neural Information Processing Systems (NIPS 2014).
      [4] Tensor2tensor: https://github.com/tensorflow/tensor2tensor
      [5] OpenNMT: http://opennmt.net/OpenNMT/
      [6] The Annotated Transformer: http://nlp.seas.harvard.edu/2018/04/03/attention.html