
OSSummit 2017 - Using Docker containers to serve Deep Learning predictions - Sahil Dua

Sahil Dua
October 24, 2017


Each day, over 1.5 million room nights are reserved on Booking.com. That gives us access to a huge amount of data, which we can use to provide a better experience to our customers.

We understand that while there are a lot of machine learning frameworks and libraries available, putting models into production at large scale is still a challenge. I’d like to talk about how we took on the challenge of deploying deep learning models in production: how we chose our tools and developed our internal deep learning infrastructure. I’ll cover how we do model training in Docker containers, distributed TensorFlow training in a cluster of containers, automated re-training of models, and finally the deployment of models using Kubernetes. I’ll also talk about how we optimize our model prediction infrastructure for latency or throughput, depending on the use case.
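
As a taste of the distributed-training setup mentioned above, here is a minimal sketch of 2017-era (TF 1.x) between-graph replication, where each container runs one task; the hostnames, ports, and the one-ps/two-worker layout are illustrative assumptions, not the actual Booking.com cluster:

    import tensorflow as tf

    # Illustrative cluster layout: one parameter server and two workers,
    # each running in its own container.
    cluster = tf.train.ClusterSpec({
        "ps":     ["ps0.example.com:2222"],
        "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    })

    # Each container starts the server for its own task.
    server = tf.train.Server(cluster, job_name="worker", task_index=0)

    # Variables are placed on the parameter server; ops run on this worker.
    with tf.device(tf.train.replica_device_setter(cluster=cluster)):
        x = tf.placeholder(tf.float32, [None, 10])
        w = tf.Variable(tf.zeros([10, 1]))
        y = tf.matmul(x, w)

    with tf.Session(server.target) as sess:
        sess.run(tf.global_variables_initializer())
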


Transcript

1. whoami
   ➔ Backend Developer - Deep Learning Infrastructure
   ➔ Machine Learning Enthusiast
   ➔ Open Source Contributor (Git, Pandas, Kinto, go-github, etc.)
   ➔ Tech Speaker
2. Agenda
   ➔ Deep Learning at Booking.com
   ➔ Life-cycle of a model
   ➔ Training Models
   ➔ Serving Predictions
3. Image Tagging
   ➔ Sea view: 6.38
   ➔ Balcony/Terrace: 4.82
   ➔ Photo of the whole room: 4.21
   ➔ Bed: 3.47
   ➔ Decorative details: 3.15
   ➔ Seating area: 2.70
4. Image Tagging
   ➔ Using the image tag information in the right context (Swimming pool, Breakfast Buffet, etc.)
5. Machine Learning workload
   ➔ Computationally intensive workload
   ➔ Often not highly parallelizable algorithms
   ➔ 10 to 100 GBs of data
6. Why k8s – GPUs?
   ➔ GPU support in alpha since Kubernetes 1.3
   ➔ 20x-50x speed-up
   ➔ Requested via container resource limits:

       resources:
         limits:
           alpha.kubernetes.io/nvidia-gpu: 1
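
For completeness, the same alpha GPU limit can be requested programmatically via the official kubernetes Python client; the image name below is a hypothetical placeholder:

    from kubernetes import client

    # Container spec asking the scheduler for one NVIDIA GPU through the
    # 2017-era alpha resource key shown on the slide.
    gpu_container = client.V1Container(
        name="trainer",
        image="registry.example.com/ml-base/tensorflow-gpu:latest",  # hypothetical
        resources=client.V1ResourceRequirements(
            limits={"alpha.kubernetes.io/nvidia-gpu": "1"},
        ),
    )
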
7. Training with k8s
   ➔ Base images with ML frameworks
     ◆ TensorFlow, Torch, Vowpal Wabbit, etc.
   ➔ Training code is installed at start time
   ➔ Data access - Hadoop (or PersistentVolumes)
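
A hedged sketch of that pattern with the kubernetes Python client: a generic base image whose model-specific training code is installed when the container starts. All names, the registry, and the repository URL are hypothetical:

    from kubernetes import client, config

    config.load_kube_config()

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="train-image-tagger"),
        spec=client.V1PodSpec(
            restart_policy="Never",  # a one-off training run
            containers=[client.V1Container(
                name="trainer",
                image="registry.example.com/ml-base/tensorflow:latest",
                # The base image stays generic: the model-specific training
                # code is fetched and installed at start time.
                command=["sh", "-c",
                         "pip install git+https://git.example.com/train.git"
                         " && python -m train"],
            )],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="ml-training", body=pod)
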
8. Serving Predictions
   ➔ Stateless app with common code
   ➔ Containerized
   ➔ No model in the image
   ➔ REST API for predictions
9. Serving Predictions
   ➔ Get trained model from Hadoop
   ➔ Load model in memory
   ➔ Warm it up
   ➔ Expose HTTP API
   ➔ Respond to the probes
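
A minimal sketch of this serving lifecycle using Flask and the TF 1.x SavedModel loader; the model path, tag set, and tensor names are illustrative assumptions, not the actual Booking.com service:

    from flask import Flask, jsonify, request
    import tensorflow as tf

    app = Flask(__name__)

    # Get the trained model (e.g. fetched from Hadoop at startup) and load it
    # into memory once, so every request reuses the same session.
    session = tf.Session()
    tf.saved_model.loader.load(session, ["serve"], "/models/image-tagger")

    # Warm-up: one dummy session.run(...) here makes the first real request fast.

    @app.route("/predict", methods=["POST"])
    def predict():
        features = request.get_json()["features"]
        # session.run(output_tensor, feed_dict={input_tensor: features})
        # with tensor names specific to the exported model.
        return jsonify({"tags": []})

    # Respond to the liveness/readiness probes.
    @app.route("/healthz")
    def healthz():
        return "ok"

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)
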
10. Deploying a new model
    ➔ Create new Deployment
    ➔ Create new HTTP route
    ➔ Wait for liveness/readiness probes
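
A sketch of those steps with the kubernetes Python client; names, the image, and the probe endpoint are illustrative assumptions, and the HTTP-route step is platform-specific (e.g. a Service or Ingress), so it is only noted in a comment:

    from kubernetes import client, config

    config.load_kube_config()

    container = client.V1Container(
        name="predictor",
        image="registry.example.com/prediction-app:latest",  # hypothetical
        # The same stateless image serves any model; the model to load is
        # passed in, e.g. via an environment variable.
        env=[client.V1EnvVar(name="MODEL_PATH",
                             value="hdfs:///models/image-tagger/v2")],
        liveness_probe=client.V1Probe(
            http_get=client.V1HTTPGetAction(path="/healthz", port=8080)),
        readiness_probe=client.V1Probe(
            http_get=client.V1HTTPGetAction(path="/healthz", port=8080)),
    )

    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="image-tagger-v2"),
        spec=client.V1DeploymentSpec(
            replicas=2,
            selector=client.V1LabelSelector(
                match_labels={"model": "image-tagger-v2"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(
                    labels={"model": "image-tagger-v2"}),
                spec=client.V1PodSpec(containers=[container]),
            ),
        ),
    )
    client.AppsV1Api().create_namespaced_deployment(namespace="serving",
                                                    body=deployment)
    # Create the HTTP route and switch traffic only after readiness passes.
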
11. Optimizing for Latency
    ➔ Do not predict if you can precompute
    ➔ Reduce request overhead
    ➔ Predict for one instance
    ➔ Quantization (float32 => fixed-point 8-bit)
    ➔ TensorFlow-specific: freeze network & optimize for inference
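
A sketch of the TensorFlow-specific step (TF 1.x era): freezing variables into constants so the graph can then be optimized for inference. The checkpoint paths and the output node name are illustrative assumptions:

    import tensorflow as tf

    with tf.Session() as sess:
        # Restore the trained graph and its weights from a checkpoint.
        saver = tf.train.import_meta_graph("model.ckpt.meta")
        saver.restore(sess, "model.ckpt")
        # Bake the variable values into the graph as constants.
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph.as_graph_def(), ["predictions"])  # assumed output node

    with tf.gfile.GFile("frozen_model.pb", "wb") as f:
        f.write(frozen.SerializeToString())

    # The frozen graph can then go through TensorFlow's optimize_for_inference
    # tool, and weights can be quantized from float32 down to 8 bits.
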
12. Optimizing for Throughput
    ➔ Do not predict if you can precompute
    ➔ Batch requests
    ➔ Parallelize requests
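
A simple micro-batching sketch of the batching idea: hold each request briefly, then run one batched prediction instead of N single ones. The `model.predict` call, the queue protocol, and the batch/timeout sizes are assumptions:

    import queue
    import threading

    requests_q = queue.Queue()  # each item is (input, reply_callback)

    def batch_worker(model, max_batch=32, wait_s=0.01):
        while True:
            batch = [requests_q.get()]        # block for the first request
            while len(batch) < max_batch:     # then drain for a short while
                try:
                    batch.append(requests_q.get(timeout=wait_s))
                except queue.Empty:
                    break
            inputs = [x for x, _ in batch]
            outputs = model.predict(inputs)   # one call amortizes the overhead
            for (_, reply), out in zip(batch, outputs):
                reply(out)

    # threading.Thread(target=batch_worker, args=(model,), daemon=True).start()
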
13. Summary
    ➔ Training models in pods
    ➔ Serving models
    ➔ Optimizing serving for latency/throughput
14. Next steps
    ➔ Tooling to control hundreds of deployments
    ➔ Autoscale the prediction service
    ➔ Hyperparameter tuning for training
15. Want to get in touch?
    ➔ LinkedIn: @sahildua2305
    ➔ GitHub: @sahildua2305
    ➔ Twitter: @sahildua2305
    ➔ Website: www.sahildua.com