Kubeflow: portable and scalable machine learning using JupyterHub and Kubernetes [PyData Delhi 2018]

Slide 1

Slide 1 text

Kubeflow: scalable and portable ML. Akash Tandon, Data Engineering @ SocialCops. GitHub: @analyticalmonk, Twitter: @AkashTandon

Slide 2

Slide 2 text

Agenda - The need for DevOps in ML and data science (DataOps) - Containers and Kubernetes for ML - Opportunities and challenges - Kubeflow: composable, portable and scalable ML - Components - Low bar, high ceiling - Issues and roadmap - Summary and demo

Slide 3

Slide 3 text

Current ML workflow: what you think

Slide 4

Slide 4 text

Current ML workflow: the reality. Source: Sculley et al., "Hidden Technical Debt in Machine Learning Systems", NIPS 2015, https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

Slide 5

Slide 5 text

DataOps - DevOps in Data Science and ML. DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics. DataOps Manifesto: http://dataopsmanifesto.org

Slide 6

Slide 6 text

DataOps - DevOps in Data Science and ML

Slide 7

Slide 7 text

We need tools that are great at DevOps

Slide 8

Slide 8 text

Enter containers and Kubernetes

Slide 9

Slide 9 text

Containers ● Containers let you package an application's code, configuration and dependencies into easy-to-use building blocks. ● These building blocks deliver environmental consistency, operational efficiency, developer productivity and version control. ● Put simply, your code runs the same way in any environment that has a container runtime!

Slide 10

Slide 10 text

But managing many containers by hand can be a pain. That's where Kubernetes (K8s) steps in.

Slide 11

Slide 11 text

Kubernetes ● Kubernetes is an orchestration system for containers. ● It manages compute, networking and storage for containerized workloads. ● Simply put, it makes your life easier when working with containers.

Slide 12

Slide 12 text

Sample K8s manifest
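The manifest on this slide was an image that did not survive extraction. As a stand-in, here is a minimal sketch that builds a Kubernetes Deployment manifest as a Python dict and serializes it to JSON, which `kubectl apply -f` accepts just like YAML. The name `hello-web`, the `nginx:1.15` image, the port and the replica count are illustrative assumptions, not the manifest from the original slide.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 2) -> dict:
    """Return a minimal apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Kubernetes keeps this many identical pods running.
            "replicas": replicas,
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": 80}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical app name and image, for illustration only.
    manifest = deployment_manifest("hello-web", "nginx:1.15")
    # Save the output as deployment.json, then: kubectl apply -f deployment.json
    print(json.dumps(manifest, indent=2))
```

Hand-written YAML is the more common way to author manifests; generating them from code is simply one way to template many similar resources.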

Slide 13

Slide 13 text

But there’s a catch.

Slide 14

Slide 14 text

Steep DevOps learning curve ● Containers ● Kubernetes primitives ● Persistent storage ● APIs ● Cloud platforms ● and it goes on...

Slide 15

Slide 15 text

DevOps practitioners don't know enough data science. Data scientists don't know enough DevOps. And they shouldn't have to!

Slide 16

Slide 16 text

How do we get the DevOps goodness without driving data teams crazy?!

Slide 17

Slide 17 text

Enter Kubeflow

Slide 18

Slide 18 text

Kubeflow ● ML toolkit for Kubernetes ● Open source and community-driven ● Support for multiple ML frameworks ● End-to-end workflows that can be shared, scaled and deployed. Source: https://github.com/kubeflow/kubeflow/issues/187

Slide 19

Slide 19 text

Low bar, high ceiling ● Low bar: let data science practitioners get up and running on a Kubernetes cluster even without DevOps know-how. ● High ceiling: let sysadmins and DevOps practitioners modify defaults and extend the framework as needed.

Slide 20

Slide 20 text

Components ● JupyterHub (collaboration and interactivity) ● K8s-native TensorFlow controller (model building) ● K8s-native TensorFlow Serving deployment (model deployment) ● Ambassador (reverse proxy) ● Current and upcoming components for model tuning, model building and much more... ● Out-of-the-box setup that puts all of this together!
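The K8s-native TensorFlow controller is driven by a Kubernetes custom resource called TFJob. The sketch below builds a distributed TFJob (parameter servers plus workers) as a Python dict. It assumes the `kubeflow.org/v1alpha2` schema that was current around 2018, in which the training container must be named `tensorflow`; field names changed in later Kubeflow releases, so check the docs for your version. The helper name, job name and image are hypothetical.

```python
import json


def tfjob_manifest(name: str, image: str, workers: int = 2, ps: int = 1) -> dict:
    """Build a distributed TFJob manifest (assumed kubeflow.org/v1alpha2 schema)."""

    def replica_spec(replicas: int) -> dict:
        return {
            "replicas": replicas,
            # Restart failed pods instead of failing the whole job immediately.
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    # The TFJob controller expects the container to be named "tensorflow".
                    "containers": [{"name": "tensorflow", "image": image}]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1alpha2",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                "PS": replica_spec(ps),
                "Worker": replica_spec(workers),
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical job name and training image.
    job = tfjob_manifest("mnist-dist", "example.com/mnist-train:latest")
    print(json.dumps(job, indent=2))
```

Once submitted, the controller creates one pod per replica and injects a `TF_CONFIG` environment variable into each, so TensorFlow's distributed runtime can find its peers without the data scientist wiring up hostnames by hand.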

Slide 21

Slide 21 text

JupyterHub

Slide 22

Slide 22 text

TensorFlow - Open-source library for numerical computing and ML - Developed by Google, open-sourced in 2015 - Huge community and ecosystem - Supports many types of ML models - TensorFlow Serving (model deployment), TensorBoard (training visualization), etc. - Supports distributed training and deployment of models

Slide 23

Slide 23 text

Why Kubeflow? Based on current functionality, you should consider using Kubeflow if: ● You want to train/serve TensorFlow models in different environments (e.g. local, on-prem and cloud) ● You want to use Jupyter notebooks to manage TensorFlow training jobs ● You want to launch training jobs that use resources, such as additional CPUs or GPUs, that aren't available on your personal computer ● You want to combine TensorFlow with other processes ○ For example, you may want to use tensorflow/agents to run simulations that generate data for training reinforcement learning models. Refer to https://www.kubeflow.org/docs/started/getting-started/ for more info.
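Running training on hardware your laptop lacks comes down, in Kubernetes, to container resource requests and limits. The sketch below returns a copy of a plain container spec dict with CPU, memory and GPU settings added. It assumes the cluster runs the NVIDIA device plugin, which exposes GPUs as the extended resource `nvidia.com/gpu`; the helper name `with_resources`, the default values and the image are illustrative.

```python
import copy


def with_resources(container: dict, cpu: str = "4", memory: str = "8Gi", gpus: int = 1) -> dict:
    """Return a copy of a container spec with resource requests/limits set.

    Assumes GPUs are exposed as "nvidia.com/gpu" (NVIDIA device plugin).
    Kubernetes requires GPUs to appear under limits; a GPU request, if
    given, must equal the limit, so it is only set under limits here.
    """
    result = copy.deepcopy(container)  # leave the caller's dict untouched
    limits = {"cpu": cpu, "memory": memory}
    if gpus > 0:
        # Whole GPUs only: they are not overcommitted or shared between containers.
        limits["nvidia.com/gpu"] = gpus
    result["resources"] = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": limits,
    }
    return result


if __name__ == "__main__":
    # Hypothetical training image.
    base = {"name": "tensorflow", "image": "example.com/train:latest"}
    print(with_resources(base, gpus=2))
```

The scheduler then places the pod only on a node with two free GPUs, which is exactly the "resources that aren't on your personal computer" case from the slide.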

Slide 24

Slide 24 text

Demo - Kubeflow tutorial using a sequence-to-sequence model - Based on Hamel Husain's wonderful post "How to create data products that are magical using sequence-to-sequence models" - GitHub repo: https://github.com/kubeflow/examples/tree/master/github_issue_summarization - Let's get started!

Slide 25

Slide 25 text

Demo

Slide 26

Slide 26 text

Road ahead - Lower the entry (bar)rier - Multi-tenancy on Kubernetes - Support for more ML libraries/packages: PyTorch, Caffe2, MXNet - v1.0 to be launched by December 2018

Slide 27

Slide 27 text

Find out more - Official website: https://www.kubeflow.org/ - GitHub: https://github.com/kubeflow/kubeflow - Katacoda tutorials: https://www.katacoda.com/kubeflow/

Slide 28

Slide 28 text

Reach out! Twitter: @AkashTandon, GitHub: @analyticalmonk