
Neural Machine Translations in Booking.com

Talk given by Talaat Khalil at PyData Amsterdam, 2018/05/08

Transcript

  1. Presentation Outline
     • MT use cases at Booking.com
     • Neural sequence to sequence models
     • OpenNMT
     • Transformer model
     • tensor2tensor
     • Evaluations
     • Challenges & Recommendations
  2. Why MT?
     • Why translation?
       ◦ ⅔ of daily bookings are not made in English
     • Why MT?
       ◦ 1M+ properties, and growing
       ◦ Very frequent property description updates
       ◦ New user-generated content every second (customer service emails, reviews, etc.)
     • Presentation focus: property description translations
  3. Neural seq2seq models
     • Problem:
       ◦ The source sentence is encoded in one vector
       ◦ More serious when you have long sequences
       ◦ Models still struggle to remember early tokens
     • How to approach this?
       ◦ Attention mechanisms
  4. Attention [1]
     Given a query q and <key, value> pairs, the attention weight for each pair is computed from the compatibility between the query q and its key k. The context vector c of a query q is then calculated as a weighted average of all values, using the attention weights (see the formula sketch after this slide).
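     A compact way to write the mechanism described on this slide, with the score function left abstract (the additive scoring of Bahdanau et al. [1] and dot-product scoring both fit this form):

         \alpha_i = \mathrm{softmax}_i\big(\mathrm{score}(q, k_i)\big)
                  = \frac{\exp\big(\mathrm{score}(q, k_i)\big)}{\sum_j \exp\big(\mathrm{score}(q, k_j)\big)},
         \qquad
         c = \sum_i \alpha_i \, v_i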
  5. Neural seq2seq models (Attention)
     [Diagram: encoder-decoder example translating "Wifi is free" into "Wifi ist kostenlos"; at one decoding step the attention weights over the source tokens are 0.6, 0.3, 0.1, which are combined into a context vector and an attention vector. Labels: Attention Weights, Context Vector, Attention Vector. A NumPy sketch of this weighted average follows.]
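     A minimal NumPy sketch of the weighted average pictured on this slide; only the attention weights (0.6, 0.3, 0.1) come from the slide, the encoder states are made-up toy vectors:

         import numpy as np

         # Toy encoder states ("values") for the source tokens "Wifi", "is", "free".
         # The 4-dimensional numbers are invented purely for illustration.
         encoder_states = np.array([
             [0.2, 0.1, 0.5, 0.3],   # "Wifi"
             [0.4, 0.0, 0.1, 0.2],   # "is"
             [0.1, 0.6, 0.3, 0.0],   # "free"
         ])

         # Attention weights from the slide for one decoding step.
         attention_weights = np.array([0.6, 0.3, 0.1])

         # Context vector = weighted average of the encoder states.
         context_vector = attention_weights @ encoder_states
         print(context_vector)  # [0.25 0.12 0.36 0.24]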
  6. Our OpenNMT lua configuration
     • Architecture (variant of Bahdanau et al. [1]):
       ◦ Input dim: 1000
       ◦ RNN dim: 1000
       ◦ # of hidden layers: Encoder: 4, Decoder: 4
       ◦ Attention mechanism: Global
       ◦ RNN type: LSTM (bidirectional encoder)
       ◦ Residual connections: Yes
     • Data preprocessing:
       ◦ Input text unit: Lowercased BPE
       ◦ Tokenization: Aggressive, with case features
       ◦ Max. sentence length: 50 units
       ◦ Vocab size: 30,000-50,000 (joint or separate)
     • Optimization (standard pipeline):
       ◦ Optimizer: SGD
       ◦ Learning rate decay: 0.7
       ◦ Decay strategy: validation perplexity increase or epoch > 20 (see the sketch after this slide)
       ◦ Stopping criteria: based on validation perplexity
       ◦ Dropout rate: 0.3
       ◦ Max batch size: 120
     • Others:
       ◦ Inference: beam size 5
       ◦ GPU: Nvidia P100
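     A minimal Python sketch of the decay rule listed above (shrink the learning rate by a factor of 0.7 once validation perplexity stops improving or the epoch passes 20); the function and argument names are illustrative, not actual OpenNMT option names:

         # Decay rule from the slide: multiply the learning rate by 0.7 whenever
         # validation perplexity increases or the epoch exceeds 20.
         def decayed_learning_rate(lr, epoch, val_ppl, prev_val_ppl,
                                   decay_factor=0.7, start_decay_epoch=20):
             if (prev_val_ppl is not None and val_ppl > prev_val_ppl) or epoch > start_decay_epoch:
                 return lr * decay_factor
             return lr

         # Validation perplexity got worse at this epoch, so the rate decays: 1.0 -> 0.7
         print(decayed_learning_rate(lr=1.0, epoch=12, val_ppl=5.3, prev_val_ppl=5.1))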
  7. Our tensor2tensor configuration
     • Preprocessing using the native subword encoding
     • Big Transformer (6 encoder and 6 decoder attention blocks)
     • Embedding size = 1024
     • Mostly with Nvidia P100 GPUs
     • Some hyperparameters were tuned based on the validation loss
     (An example training invocation follows this slide.)
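     A hedged sketch of launching a comparable run with the public t2t-trainer CLI; the flag names follow the tensor2tensor documentation of that era and may vary between versions, and the problem name is the public WMT EN->DE example used as a stand-in for Booking.com's internal data:

         import subprocess

         # Hypothetical invocation of the public tensor2tensor trainer.
         cmd = [
             "t2t-trainer",
             "--data_dir=./t2t_data",
             "--output_dir=./t2t_train/transformer_big",
             "--problem=translate_ende_wmt32k",  # registered public t2t problem, stand-in only
             "--model=transformer",
             "--hparams_set=transformer_big",    # 6 encoder + 6 decoder blocks, hidden/embedding size 1024
             "--train_steps=250000",
         ]
         subprocess.run(cmd, check=True)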
  8. Evaluations in Booking.com
     • Automatic evaluation:
       ◦ BLEU score (see the sketch after this slide)
     • Human evaluation:
       ◦ Adequacy (1-4)
       ◦ Fluency (1-4)
       ◦ Publication score:
         ▪ The translation is publishable if it does not mislead the user or stop her/him from booking.
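     For reference, corpus-level BLEU can be computed with an off-the-shelf library; sacrebleu is one common choice, not necessarily the implementation used in the talk, and the sentences below are toy examples:

         import sacrebleu

         hypotheses = ["Wifi ist kostenlos", "Das Zimmer hat einen LCD-TV"]
         references = [["WLAN ist kostenlos", "Das Zimmer verfügt über einen LCD-TV"]]

         # One reference stream, aligned with the hypotheses.
         bleu = sacrebleu.corpus_bleu(hypotheses, references)
         print(bleu.score)  # corpus-level BLEU on a 0-100 scale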
  9. Evaluations (EN->DE)
     System                           BLEU    Adequacy   Publication
     OpenNMT, lua                     43.75   3.74       98%
     tensor2tensor, big transformer   45.68   3.94       98%
     • > 10M training examples, shared sub-words
  10. Evaluations (EN->TR)
      System                           BLEU    Adequacy   Publication
      OpenNMT, lua                     45.47   3.63       90%
      tensor2tensor, big transformer   45.77   3.72       95%
      • > 5M training examples, separate sub-words
  11. Translation challenges
      • Common mistakes:
        ◦ Named entity translation/transliteration
        ◦ Rare word translations
        ◦ Omission and addition of information
        ◦ Wrong translations in general
  12. Translation challenges
      • Common mistakes:
        ◦ Context ignorance:
          ▪ "It offers free WiFi and air-conditioned rooms with an LCD TV and private bathroom."
          ▪ In some languages (fr, ru, ar, etc.), "It" could be masculine, feminine or neuter
        ◦ Wrong sentence segmentation (error propagation from another component)
  13. Conclusions
      • Move smoothly to adopt tensor2tensor:
        ◦ Better accuracy
        ◦ Faster to converge
        ◦ TPU integration
        ◦ Options for training using multiple clusters
        ◦ Modular implementation
        ◦ Translation using multiple GPUs
        ◦ Native integration with TensorBoard
        ◦ OpenNMT lua is going into maintenance mode (more focus on the PyTorch and TensorFlow implementations)
  14. Conclusions
      • Others:
        ◦ Fine-tuning general-purpose models
        ◦ Encoding context
        ◦ Improving segment tokenization is critical
        ◦ Accelerate training times with TPUs
  15. References
      [1] Bahdanau et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015.
      [2] Vaswani et al. Attention Is All You Need. 31st Conference on Neural Information Processing Systems (NIPS 2017).
      [3] Sutskever et al. Sequence to Sequence Learning with Neural Networks. Conference on Neural Information Processing Systems (NIPS 2014).
      [4] Tensor2tensor: https://github.com/tensorflow/tensor2tensor
      [5] OpenNMT: http://opennmt.net/OpenNMT/
      [6] The Annotated Transformer: http://nlp.seas.harvard.edu/2018/04/03/attention.html