EuroPython 2017: How Booking.com serves Deep Learning Predictions at Large Scale by Sahil Dua

With so many machine learning frameworks and libraries available, writing a model is no longer the bottleneck, but putting models into production is still a challenge.

In this talk, you will learn how we deploy Python deep learning models in production at Booking.com.

Topics will include:

- Deep Learning model training in Docker containers
- Automated retraining of models
- Deployment of models using Kubernetes
- Serving model predictions in a containerized environment
- Optimising serving predictions for latency and throughput

Sahil Dua

July 13, 2017

Transcript

  1. @sahildua2305 I am ... ➔ Backend Developer developing Deep Learning Infrastructure ➔ Machine Learning Enthusiast ➔ Open Source Contributor (Git, Pandas, Kinto, go-github, etc.) ➔ Tech Speaker. I am not ... ➔ A Data Scientist ➔ A Machine Learning Expert
  2. Agenda ➔ Applications of Deep Learning ➔ Life-cycle of a model ➔ Our Deep Learning Production Pipeline
  3. Image Tagging: Sea view: 6.38; Balcony/Terrace: 4.82; Photo of the whole room: 4.21; Bed: 3.47; Decorative details: 3.15; Seating area: 2.70
  4. Image Tagging: Using the image tag information in the right context (Swimming pool, Breakfast Buffet, etc.)
  5. Recommendation Engine: User X booked hotel Y; User Z ... ? Objective: Find probability of booking a hotel. User Features: country, language, etc. Contextual Features: day of week, season, etc. Item Features: price, location of the hotel, etc.
  6. Deploying a Model ➔ Python app running in container ➔ Model weights from Hadoop storage ➔ Loads model in memory ➔ Get a nice URL to get predictions
  7. Deploying a Model [diagram: Client, Input Features, Load Balancer, six App instances, Prediction]
  8. Optimizing for Latency ➔ Do not predict if you can precompute ➔ Reduce Request Overhead ➔ Predict for one instance ➔ Quantization (float 32 => fixed 8) ➔ TensorFlow specific: freeze network & optimize for inference
  9. Optimizing for Throughput ➔ Do not predict if you can precompute ➔ Batch requests ➔ Parallelize requests
  10. Summary ➔ Training models in containers ➔ Serving models from containers using Kubernetes ➔ Optimizing serving for latency/throughput
  11. http://workingatbooking.com We are hiring! Roles • Software Developer • Data Scientist • ... Work with • MapReduce • Spark • Recommender Systems • NLP • ...
  12. Want to get in touch? LinkedIn, GitHub, Twitter: @sahildua2305; Website: www.sahildua.com
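
Slide 6 describes a Python app that loads model weights into memory at startup and exposes a URL for predictions. The following is a minimal sketch of that shape, not the setup used at Booking.com. Everything in it is an assumption made for illustration: the `LinearModel` toy model, the inline `WEIGHTS_JSON` string standing in for weights fetched from Hadoop storage, the `/predict` path, and the JSON request format.

```python
import json

# Hypothetical stand-in for weights fetched from Hadoop storage at startup.
WEIGHTS_JSON = '{"weights": [0.4, -0.2, 0.1], "bias": 0.05}'

JSON_HEADERS = [("Content-Type", "application/json")]


class LinearModel:
    """Toy model that stays in memory for the lifetime of the process."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, features):
        if len(features) != len(self.weights):
            raise ValueError("expected %d features" % len(self.weights))
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def load_model(raw_json):
    """Parse serialized parameters and build the in-memory model."""
    params = json.loads(raw_json)
    return LinearModel(params["weights"], params["bias"])


MODEL = load_model(WEIGHTS_JSON)  # loaded once, not once per request


def app(environ, start_response):
    """WSGI app: POST /predict with {"features": [...]} returns a prediction."""
    if (environ.get("PATH_INFO") != "/predict"
            or environ.get("REQUEST_METHOD") != "POST"):
        start_response("404 Not Found", JSON_HEADERS)
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        payload = json.loads(environ["wsgi.input"].read(length))
        prediction = MODEL.predict(payload["features"])
    except (ValueError, KeyError, TypeError) as exc:
        start_response("400 Bad Request", JSON_HEADERS)
        return [json.dumps({"error": str(exc)}).encode()]
    start_response("200 OK", JSON_HEADERS)
    return [json.dumps({"prediction": prediction}).encode()]
```

To try it locally, the app can be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. In a containerized setup, several copies of such a process would sit behind a load balancer, as on slide 7.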
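
Slide 8 lists quantization (float 32 => fixed 8) as a latency optimization. The slide does not say which scheme is used, so the sketch below assumes simple linear min/max quantization to unsigned 8-bit codes, written in plain Python to show the idea. The function names `quantize` and `dequantize` are hypothetical, and a real framework's quantization may work differently.

```python
def quantize(values, num_bits=8):
    """Map floats onto integer codes in [0, 2**num_bits - 1] linearly."""
    lo, hi = min(values), max(values)
    levels = (1 << num_bits) - 1
    # Guard against a zero range when all values are equal.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)
# Each restored weight differs from the original by at most about scale / 2.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The trade-off is smaller weights and cheaper arithmetic in exchange for a bounded rounding error per weight, here at most about half of one quantization step.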
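
Slide 9 names two throughput techniques: batching requests and parallelizing them. A minimal sketch of both together, assuming a hypothetical `predict_batch` function that scores a whole batch in one call (here it just sums each feature vector) and a thread pool from the standard library:

```python
from concurrent.futures import ThreadPoolExecutor


def predict_batch(batch):
    """Hypothetical model call: scores every instance in the batch at once."""
    return [sum(features) for features in batch]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def serve(requests, batch_size=4, workers=2):
    """Group requests into batches and score the batches in parallel."""
    batches = list(chunked(requests, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order the batches were submitted.
        results = list(pool.map(predict_batch, batches))
    return [prediction for batch in results for prediction in batch]


requests = [[i, i] for i in range(10)]
predictions = serve(requests)
```

Batching amortizes per-call overhead across many instances, which helps throughput but can add waiting time for an individual request. That is consistent with slide 8 recommending single-instance prediction when latency is the goal.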