Slide 1

Slide 1 text

Kubeflow Kale
Scaling Jupyter Notebooks to Complex Pipelines
Valerio Maggio (@leriomaggio)
Stefano Fioravanzo

Slide 2

Slide 2 text

ML + Dev + Ops

Slide 3

Slide 3 text

Reference for the Introductory (MLOps) part

Slide 4

Slide 4 text

ML is hard!

Slide 5

Slide 5 text

“toy” ML Pipeline
• Data Loading
• Preprocessing
• Model learning
• API Interface (Model Serving)
Adapted from “What about tests in Machine Learning projects?” Sarah Diot-Girard - EuroSciPy 2019

Slide 6

Slide 6 text

Data Analysis, Data Transformation, Data Validation, Data Splitting, Model Building, Model Validation, Train at scale, Roll-out, Serving, Monitoring
Data Loading, Logging, Preprocessing, Model Learning, Model Serving
Adapted from “Managing Machine Learning in Production with Kubeflow and MLOps” David Aronchick - KubeCon EU 2019

Slide 7

Slide 7 text

Adapted from “Managing Machine Learning in Production with Kubeflow and MLOps” David Aronchick - KubeCon EU 2019

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Cowboys and Rangers can be friends Adapted from “Managing Machine Learning in Production with Kubeflow and MLOps” David Aronchick - KubeCon EU 2019

Slide 11

Slide 11 text

Adapted from “Managing Machine Learning in Production with Kubeflow and MLOps” David Aronchick - KubeCon EU 2019

Slide 12

Slide 12 text

ML/DL Specialised Infrastructure
1. Specialised Chipset
2. Specialised Software
3. (Specialised?) Computing Platform

Slide 13

Slide 13 text

State of the Art Technologies
Support Machine Learning workloads in the Cloud
Cloud Agnostic
Polyaxon, Kubeflow, Kubernetes

Slide 14

Slide 14 text

Kubernetes: Container Orchestration Engine
Developed internally at Google and open sourced in 2014
Diagram labels:
• Kubernetes Master: API Server, Scheduler, Controllers, Cluster State
• Kubernetes Node1, Node2, Node3
• Remote Configuration, Cluster Settings, Declarative API

Slide 15

Slide 15 text

Extending Core Components
Automate application-specific services and tasks with custom components
Supported Languages:
• Go
• Python
• JavaScript
• Java
• Rust
• …
Diagram labels: Remote Configuration, Build custom component, Kubernetes Master, API Server, Scheduler, Controllers, Cluster State, CustomComp

Slide 16

Slide 16 text

Polyaxon Platform for managing the whole lifecycle of large scale deep learning and machine learning applications. https://github.com/polyaxon/polyaxon

Slide 17

Slide 17 text

Kubeflow: Kubernetes Components for Machine Learning
Developed internally at Google and open sourced in 2017
https://github.com/kubeflow/kubeflow
Diagram labels:
• Kubeflow: Distributed Training, Hyperparameter Tuning, Model Serving, Jupyter Notebook
• Kubernetes: Infrastructure management via declarative API
• Infrastructure

Slide 18

Slide 18 text

Distributed DL Training
Example of distributed deep learning training on Kubeflow
Similar process for other DL frameworks: PyTorch, MXNet, Caffe, …
Diagram labels: model.py, User provided model, Cluster Settings, Deploy, Kubernetes Master, API Server, DL Training Controller, TF Distributed Training, Worker1, Worker2, Parameter Server

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Introducing Kubeflow Pipelines: announced by Google Cloud.
A platform for building and deploying portable and scalable end-to-end ML workflows, based on containers.
The Kubeflow Pipelines platform consists of:
• User interface for managing and tracking experiments, jobs, and runs
• Engine for scheduling multi-step ML workflows
• SDK for defining and manipulating pipelines and components
• Jupyter Notebooks for interacting with the system using the SDK
• Integration with the other tools in the Kubeflow Toolkit (e.g. tf-operator for distributed training)
https://github.com/kubeflow/pipelines/

Slide 21

Slide 21 text

@dsl.pipeline(
    name='ML Workflow',
)
def xgb_train_pipeline(
    output,
    project,
    region='us-central1',
    train_data='gs://ml-pipeline-playground/sfpd/train.csv',
    …,
):
    with dsl.ExitHandler(exit_op=delete_cluster_op):
        create_cluster_op = CreateClusterOp('create-cluster', project, region, output)
        analyze_op = AnalyzeOp('analyze', project, region, create_cluster_op.output, schema,
                               train_data, '%s/{{workflow.name}}/analysis' % output)
        transform_op = TransformOp('transform', project, region, create_cluster_op.output,
                                   train_data, eval_data, target, analyze_op.output,
                                   '%s/{{workflow.name}}/transform' % output)
        train_op = TrainerOp('train', project, region, create_cluster_op.output,
                             transform_op.outputs['train'], transform_op.outputs['eval'],
                             target, analyze_op.output, workers, rounds,
                             '%s/{{workflow.name}}/model' % output)
        predict_op = PredictOp('predict', project, region, create_cluster_op.output,
                               transform_op.outputs['eval'], train_op.output, target,
                               analyze_op.output, '%s/{{workflow.name}}/predict' % output)
        cm_op = ConfusionMatrixOp('confusion-matrix', predict_op.output,
                                  '%s/{{workflow.name}}/confusionmatrix' % output)
        roc_op = RocOp('roc', predict_op.output, true_label,
                       '%s/{{workflow.name}}/roc' % output)

Slide 22

Slide 22 text

Kubeflow Python SDK
Create Lightweight Components as Python Standalone functions
Standalone Python function: transpose
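A minimal sketch of what "standalone" means for such a function: everything it needs, including its imports, lives inside the function body, so its source can be shipped to a container on its own. Only the name `transpose` comes from the slide; the body and the JSON-string interface are illustrative assumptions. In the v1 Kubeflow Pipelines SDK a helper such as `kfp.components.func_to_container_op` is what would wrap a function like this into a component; that step is left out so the sketch runs without the SDK or a cluster.

```python
def transpose(matrix_json: str) -> str:
    """Transpose a matrix passed as a JSON string of nested lists.

    Illustrative body only: the slide names the function but not its code.
    """
    # Imports are placed inside the body so the function is self-contained.
    import json

    matrix = json.loads(matrix_json)
    transposed = [list(row) for row in zip(*matrix)]
    return json.dumps(transposed)


if __name__ == "__main__":
    print(transpose("[[1, 2, 3], [4, 5, 6]]"))
```

Passing data as strings keeps the inputs and outputs easy to serialise between containers, which is why the sketch avoids in-memory objects at the function boundary.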

Slide 23

Slide 23 text

Authoring Pipelines (SDK)
Pipeline steps: create-matrices, transpose, matmul
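To make the idea of a pipeline as a graph of steps concrete, here is a hedged sketch in plain Python that runs the three steps named on the slide in dependency order. It does not use the Kubeflow Pipelines SDK: the standard-library `graphlib.TopologicalSorter` stands in for the scheduling engine, and the step bodies and the dependency edges are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def create_matrices():
    # Hypothetical input data for the example.
    return {"a": [[1, 2], [3, 4]], "b": [[5, 6], [7, 8]]}


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(x, y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


# Step name -> set of steps it depends on (assumed edges).
DEPENDENCIES = {
    "create-matrices": set(),
    "transpose": {"create-matrices"},
    "matmul": {"create-matrices", "transpose"},
}


def run_pipeline():
    """Run every step once all of its dependencies have produced a result."""
    results = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for step in order:
        if step == "create-matrices":
            results[step] = create_matrices()
        elif step == "transpose":
            results[step] = transpose(results["create-matrices"]["a"])
        elif step == "matmul":
            results[step] = matmul(results["transpose"], results["create-matrices"]["b"])
    return order, results["matmul"]


if __name__ == "__main__":
    print(run_pipeline())
```

In a real pipeline each step would run in its own container and the intermediate results would travel through storage rather than a shared dictionary.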

Slide 24

Slide 24 text

KALE (/ˈkeɪliː/) Kubeflow Automated PipeLines Engine https://kubeflow-kale.github.io

Slide 25

Slide 25 text

KALE
Abstracting from Kubeflow Pipelines SDK
From (local) Jupyter Notebooks to (remote) Kubeflow Pipelines

Slide 26

Slide 26 text

Jupyter Notebook to KFP
Local Jupyter Notebook → Kale → On cloud Kubeflow Pipeline
Pipeline steps: create-matrices, transpose, matmul

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Kale recognised list of tags
JupyterLab extension: https://github.com/kubeflow-kale/jupyterlab-kubeflow-kale
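The list of tags itself is not reproduced in this transcript, so the following is only a sketch of how cell tags could drive the grouping of notebook cells into pipeline steps. The tag names `block:<name>`, `prev:<name>` and `skip` are assumed for illustration, as is the rule that an untagged cell joins the preceding step; the function `steps_from_notebook` is hypothetical and not part of Kale. The notebook is read as the plain JSON structure used by the Jupyter notebook format (`cells`, `cell_type`, `metadata.tags`, `source`).

```python
def steps_from_notebook(notebook: dict) -> dict:
    """Group code cells into named steps based on their tags.

    Returns {step_name: {"source": [cell sources], "prev": {step names}}}.
    """
    steps = {}
    current = None
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        tags = cell.get("metadata", {}).get("tags", [])
        if "skip" in tags:
            continue  # assumed tag: leave this cell out of the pipeline
        names = [t.split(":", 1)[1] for t in tags if t.startswith("block:")]
        if names:
            current = names[0]
            steps.setdefault(current, {"source": [], "prev": set()})
        if current is None:
            continue  # untagged cell before any named step
        # "source" may be a list of lines or a single string; join handles both.
        steps[current]["source"].append("".join(cell.get("source", [])))
        steps[current]["prev"].update(
            t.split(":", 1)[1] for t in tags if t.startswith("prev:")
        )
    return steps


if __name__ == "__main__":
    example = {
        "cells": [
            {"cell_type": "code", "metadata": {"tags": ["block:create-matrices"]},
             "source": ["a = [[1, 2], [3, 4]]\n"]},
            {"cell_type": "code",
             "metadata": {"tags": ["block:transpose", "prev:create-matrices"]},
             "source": ["a_t = [list(r) for r in zip(*a)]\n"]},
        ]
    }
    print(steps_from_notebook(example))
```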

Slide 29

Slide 29 text

Kale main components
• nbparser: Derive pipeline structure (networkx)
• static_analyzer: Identify dependencies (Pyflakes)
• marshal: Inject data objects (Odo, dill)
• codegen: Generate & Deploy Pipeline (Jinja2)
support for HyperParam Tuning

Slide 30

Slide 30 text

Kale: static_analysis and marshal
Pipeline steps: create-matrices, transpose, matmul
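The dependency-detection idea can be sketched as follows: for each step, find the names it uses but does not define, and match them to the most recent earlier step that assigned them. Those names are the data objects that would have to be passed between steps. This sketch uses the standard-library `ast` module as a stand-in, not the Pyflakes-based analysis Kale is described as using, and the helper names `names_in` and `data_to_pass` are hypothetical. It is deliberately coarse: a name that a step both reads and reassigns (for example `x = x + 1`) is not reported as an input.

```python
import ast
import builtins


def names_in(source: str):
    """Return (names defined in the source, names used but not defined in it)."""
    stored, loaded = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (stored if isinstance(node.ctx, ast.Store) else loaded).add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                stored.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            stored.add(node.name)
        elif isinstance(node, ast.arg):
            stored.add(node.arg)
    free = loaded - stored - set(dir(builtins))
    return stored, free


def data_to_pass(steps):
    """steps: ordered list of (name, source).

    Returns {(producer_step, consumer_step): set of variable names}.
    """
    edges = {}
    producers = {}  # variable name -> most recent step that assigned it
    for name, source in steps:
        stored, free = names_in(source)
        for var in free:
            if var in producers:
                edges.setdefault((producers[var], name), set()).add(var)
        for var in stored:
            producers[var] = name
    return edges


if __name__ == "__main__":
    example = [
        ("create-matrices", "a = [[1, 2], [3, 4]]\nb = [[5, 6], [7, 8]]"),
        ("transpose", "a_t = [list(r) for r in zip(*a)]"),
        ("matmul", "c = [[sum(p * q for p, q in zip(row, col)) "
                   "for col in zip(*b)] for row in a_t]"),
    ]
    print(data_to_pass(example))
```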

Slide 31

Slide 31 text

Marshalling PatternDispatcher TypeDispatcher
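The slide gives only the two dispatcher names, so the following is a sketch of the general technique rather than Kale's code: a type-based dispatcher that picks a serialisation function according to the type of the object being saved, with a generic fallback. The class body, the `register` decorator, and the use of `json` and `pickle` (standing in for whatever backends the marshal module uses) are all assumptions. Nothing is assumed here about how the PatternDispatcher works.

```python
import json
import pickle
import tempfile
from pathlib import Path


class TypeDispatcher:
    """Choose a save function from the type of the object (hypothetical API)."""

    def __init__(self, default):
        self._default = default
        self._handlers = []  # (type, function) pairs, checked in registration order

    def register(self, *types):
        def decorator(func):
            for t in types:
                self._handlers.append((t, func))
            return func
        return decorator

    def __call__(self, obj, path):
        for t, func in self._handlers:
            if isinstance(obj, t):
                return func(obj, path)
        return self._default(obj, path)


def save_pickle(obj, path):
    """Generic fallback: serialise any picklable object."""
    target = Path(str(path) + ".pkl")
    target.write_bytes(pickle.dumps(obj))
    return target


save = TypeDispatcher(default=save_pickle)


@save.register(dict, list)
def save_json(obj, path):
    """Specialised handler for plain containers."""
    target = Path(str(path) + ".json")
    target.write_text(json.dumps(obj))
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(save([[1, 2], [3, 4]], Path(tmp) / "matrix").name)
        print(save({1, 2, 3}, Path(tmp) / "labels").name)
```

Registering handlers per type keeps framework-specific formats (for instance a model's native save method) out of the generic code path.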

Slide 32

Slide 32 text

codegen
Kale Code generation via templates
Sample template used to generate a standalone function
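The sample template shown on the slide is not included in this transcript. As a rough illustration of template-based code generation, the sketch below fills a function skeleton with the source of one pipeline step. It uses the standard-library `string.Template` as a stand-in for Jinja2, and the template text, the `render_step` helper and its parameters are all hypothetical.

```python
import textwrap
from string import Template

# Skeleton of a generated standalone function (illustrative, not Kale's template).
STEP_TEMPLATE = Template('''\
def $func_name($args):
    # Code copied from the notebook cells of step "$step_name"
$body
    return $outputs
''')


def render_step(step_name, source, inputs, outputs):
    """Render the Python source of a standalone function for one step."""
    return STEP_TEMPLATE.substitute(
        func_name=step_name.replace("-", "_"),
        step_name=step_name,
        args=", ".join(inputs),
        body=textwrap.indent(source.strip(), "    "),
        outputs=", ".join(outputs) if outputs else "None",
    )


if __name__ == "__main__":
    code = render_step(
        "transpose",
        "a_t = [list(r) for r in zip(*a)]",
        inputs=["a"],
        outputs=["a_t"],
    )
    print(code)
```

The generated text is ordinary Python source, so it can be written to a file, inspected, or handed to an SDK helper that expects a standalone function.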

Slide 33

Slide 33 text

Examples
https://github.com/kubeflow-kale/kale/tree/master/examples
https://github.com/kubeflow-kale/examples

Slide 34

Slide 34 text

Thank you very much for your kind attention
@leriomaggio
https://kubeflow-kale.github.io