Kubernetes for Machine Learning - Global AI Bootcamp Singapore 2018

Nilesh Gule
December 15, 2018

Slide deck of the presentation given at the Global AI Bootcamp in Singapore: https://globalaibootcampsg.azurewebsites.net
The talk covered how to industrialise the deployment of a Machine Learning model built using TensorFlow and Python. A Kubernetes app was deployed using Kubeflow on an Azure Kubernetes Service (AKS) cluster with GPU support, and distributed training was demonstrated using TFJob.

Transcript

  1. $whoami { "name": "Nilesh Gule", "website": "https://www.HandsOnArchitect.com", "github": "https://github.com/NileshGule", "twitter": "@nileshgule", "linkedin": "https://www.linkedin.com/in/nileshgule", "email": "[email protected]", "likes": "Technical Evangelism, Cricket" }
  2. Agenda • Basic understanding of ML • TensorFlow • Docker, Kubernetes • Azure (RG, AKS, Files) • ML Workflow • Containerize ML Model • AKS Deployment • Deploy on GPU nodes • Distributed training Assumption #globalaibootcamp
  3. Challenges with ML • Sequential & slow training • Hard to set up distributed training • High cost • Reproducibility • Scalability
  4. Kubernetes to the rescue • Scale deployment to multi-node cluster • GPU support • Better utilization of resources • Distributed training
  5. Simplify ML deployment with Kubeflow • Make ML workflows on Kubernetes simple, portable, scalable • Best of breed open source system for ML • Runs on diverse infrastructures
  6. TFJob & Kubernetes CRD • Kubernetes Custom Resource (CRD) • Contains • Chief – orchestrates training & checkpointing • PS – parameter servers • Worker – train the model • Evaluator – compute evaluation metrics • ReplicaSpec • Replicas • Template • restartPolicy
  7. References • DMTK - http://www.dmtk.io/ • Image classification with MNIST dataset • AKS Deployment Tutorial • Kubeflow • Ksonnet • TensorFlow • Azure AKS • Enable GPU for Kubernetes • TFJob • Kubernetes Custom Resource • ML pipelines with Docker • OpenAI keynote