Kubeflow Kale: from Jupyter notebooks to Complex pipelines

`Kubeflow Kale` lets you deploy Jupyter Notebooks that run on your laptop to Kubeflow Pipelines, without requiring any of the Kubeflow SDK boilerplate.

You can define pipelines by annotating the notebook’s code cells and clicking a deployment button in the Jupyter UI.

Kale converts the Notebook into a valid Kubeflow Pipelines deployment, resolving data dependencies and managing the pipeline’s lifecycle.
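To make the annotation idea concrete, here is a minimal sketch of how tagged cells could be turned into an ordered list of pipeline steps. It is not Kale's implementation: the tag spellings (`block:<name>`, `prev:<name>`), the notebook contents and the function names are assumptions made for illustration, and the ordering uses the standard library's `graphlib` rather than anything Kale ships.

```python
from graphlib import TopologicalSorter

# A notebook reduced to what matters here: each cell carries source code and
# a list of tags in its metadata. The tag spellings are assumptions.
notebook = {
    "cells": [
        {"source": "a, b = create()",
         "metadata": {"tags": ["block:create-matrices"]}},
        {"source": "bt = transpose(b)",
         "metadata": {"tags": ["block:transpose", "prev:create-matrices"]}},
        {"source": "c = matmul(a, bt)",
         "metadata": {"tags": ["block:matmul", "prev:create-matrices",
                               "prev:transpose"]}},
    ]
}


def pipeline_graph(nb):
    """Map each step name to the set of step names it depends on."""
    graph = {}
    for cell in nb["cells"]:
        tags = cell.get("metadata", {}).get("tags", [])
        names = [t.split(":", 1)[1] for t in tags if t.startswith("block:")]
        if not names:
            continue  # untagged cells are ignored in this sketch
        deps = {t.split(":", 1)[1] for t in tags if t.startswith("prev:")}
        graph.setdefault(names[0], set()).update(deps)
    return graph


def execution_order(nb):
    """Return step names so that every step comes after its dependencies."""
    return list(TopologicalSorter(pipeline_graph(nb)).static_order())


print(execution_order(notebook))
```

The graph is kept as a plain dictionary of sets so that it can be handed directly to `TopologicalSorter`; a cycle in the tags would raise `graphlib.CycleError` instead of producing an invalid pipeline.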

Valerio Maggio

September 05, 2019
Transcript

  1. Kubeflow Kale: Scaling Jupyter Notebooks to Complex Pipelines. Valerio Maggio (@leriomaggio, valeriomaggio@gmail.com), Stefano Fioravanzo
  2. ML + Dev + Ops

  3. Reference for the Introductory (MLOps) part

  4. ML is hard!

  5. “Toy” ML Pipeline: Data Loading → Preprocessing → Model Learning → API Interface (Model Serving). Adapted from “What about tests in Machine Learning projects?” Sarah Diot-Girard - EuroSciPy 2019
  6. Data Analysis, Data Transformation, Data Validation, Data Splitting, Model Building, Model Validation, Train at scale, Roll-out, Serving, Monitoring, Data Loading, Logging, Preprocessing, Model Learning, Model Serving. Adapted from “Managing Machine Learning in Production with Kubeflow and MLOps” David Aronchick - KubeCon EU 2019
  7. Adapted from “Managing Machine Learning in Production with Kubeflow and MLOps” David Aronchick - KubeCon EU 2019
  8. None
  9. None
  10. Cowboys and Rangers can be friends. Adapted from “Managing Machine Learning in Production with Kubeflow and MLOps” David Aronchick - KubeCon EU 2019
  11. Adapted from “Managing Machine Learning in Production with Kubeflow and MLOps” David Aronchick - KubeCon EU 2019
  12. ML/DL Specialised Infrastructure: (1) Specialised Chipset, (2) Specialised Software, (3) (Specialised?) Computing Platform
  13. State of the Art Technologies: support Machine Learning workloads in the Cloud; Cloud Agnostic. Polyaxon, Kubeflow, Kubernetes
  14. Kubernetes: Container Orchestration Engine. Developed internally at Google and open sourced in 2014. Diagram labels: Kubernetes Master (API Server, Scheduler, Controllers, Cluster State); Kubernetes Nodes (Node1, Node2, Node3); Remote Configuration; Cluster Settings; Declarative API
  15. Extending Core Components: automate application-specific services and tasks with custom components. Kubernetes Supported Languages: GoLang, Python, Javascript, Java, Rust, … Diagram labels: Remote Configuration; Build custom component; Kubernetes Master (API Server, Scheduler, Controllers, Cluster State, CustomComp)
  16. Polyaxon: platform for managing the whole lifecycle of large scale deep learning and machine learning applications. https://github.com/polyaxon/polyaxon
  17. Kubeflow: Kubernetes Components for Machine Learning. Developed internally at Google and open sourced in 2016. https://github.com/kubeflow/kubeflow Components: Distributed Training, Hyperparameter Tuning, Model Serving, Jupyter Notebook. Infrastructure management via declarative API (Kubernetes)
  18. Distributed DL Training: example of distributed deep learning training on Kubeflow. Similar process for other DL frameworks: PyTorch, MXNet, Caffe, … Diagram labels: TF Distributed Training; user-provided model (model.py); DL Training Controller; Cluster Settings; Deploy; Kubernetes Master API Server; Worker1; Worker2; Parameter Server
  19. None
  20. Introducing Kubeflow Pipelines, announced by Google Cloud: a platform for building and deploying portable and scalable end-to-end ML workflows, based on containers.
    The Kubeflow Pipelines platform consists of:
    • User interface for managing and tracking experiments, jobs, and runs
    • Engine for scheduling multi-step ML workflows
    • SDK for defining and manipulating pipelines and components
    • Jupyter Notebooks for interacting with the system using the SDK
    • Integration with the other tools in the Kubeflow Toolkit (e.g. tf-operator for distributed training)
    https://github.com/kubeflow/pipelines/
  21. @dsl.pipeline(
          name='ML Workflow',
      )
      def xgb_train_pipeline(
          output,
          project,
          region='us-central1',
          train_data='gs://ml-pipeline-playground/sfpd/train.csv',
          …,
      ):
          with dsl.ExitHandler(exit_op=delete_cluster_op):
              create_cluster_op = CreateClusterOp('create-cluster', project, region, output)
              analyze_op = AnalyzeOp('analyze', project, region, create_cluster_op.output, schema,
                                     train_data, '%s/{{workflow.name}}/analysis' % output)
              transform_op = TransformOp('transform', project, region, create_cluster_op.output,
                                         train_data, eval_data, target, analyze_op.output,
                                         '%s/{{workflow.name}}/transform' % output)
              train_op = TrainerOp('train', project, region, create_cluster_op.output,
                                   transform_op.outputs['train'], transform_op.outputs['eval'],
                                   target, analyze_op.output, workers, rounds,
                                   '%s/{{workflow.name}}/model' % output)
              predict_op = PredictOp('predict', project, region, create_cluster_op.output,
                                     transform_op.outputs['eval'], train_op.output, target,
                                     analyze_op.output, '%s/{{workflow.name}}/predict' % output)
              cm_op = ConfusionMatrixOp('confusion-matrix', predict_op.output,
                                        '%s/{{workflow.name}}/confusionmatrix' % output)
              roc_op = RocOp('roc', predict_op.output, true_label,
                             '%s/{{workflow.name}}/roc' % output)
  22. Kubeflow Python SDK: create Lightweight Components as standalone Python functions. Diagram labels: Standalone Python function; transpose; SDK
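As a rough illustration of what "standalone" means for such a function, the sketch below writes a `transpose` step that depends on nothing outside its own body. The function signature and the JSON-string interface are assumptions chosen for this example, not the code shown on the slide, and the SDK call that would wrap the function into a component is deliberately left out so that the snippet runs without Kubeflow installed.

```python
def transpose(matrix_json: str) -> str:
    """Transpose a matrix given as a JSON string and return it as JSON."""
    # The import lives inside the function so that the body stays
    # self-contained when it is shipped to run somewhere else.
    import json

    rows = json.loads(matrix_json)
    return json.dumps([list(column) for column in zip(*rows)])


# With the Kubeflow Pipelines SDK available, a function like this would be
# handed to the SDK to become a pipeline component (for example through
# kfp.components.func_to_container_op in the v1 SDK). Here it is simply
# called locally.
print(transpose("[[1, 2, 3], [4, 5, 6]]"))  # prints [[1, 4], [2, 5], [3, 6]]
```

Passing data as simple serialisable values keeps the function independent of any objects living in the notebook's memory, which is the property a containerised step needs.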
  23. Authoring Pipelines with the SDK: create-matrices, transpose, matmul

  24. KALE (/ˈkeɪliː/) Kubeflow Automated PipeLines Engine https://kubeflow-kale.github.io

  25. KALE: abstracting from the Kubeflow Pipelines SDK. From (local) Jupyter Notebooks to (remote) Kubeflow Pipelines
  26. Jupyter Notebook to KFP: Kale turns a local Jupyter Notebook into an on-cloud Kubeflow Pipeline (steps: create-matrices, transpose, matmul)
  27. None
  28. Kale's list of recognised tags. JupyterLab extension: https://github.com/kubeflow-kale/jupyterlab-kubeflow-kale

  29. Kale main components:
    • nbparser: derive pipeline structure (networkx)
    • static_analyzer: identify dependencies (Pyflakes)
    • marshal: inject data objects (Odo, dill)
    • codegen: generate & deploy pipeline (Jinja2)
    Support for HyperParam Tuning
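The "identify dependencies" stage can be approximated with a few lines of static analysis. The sketch below is an assumption-laden stand-in: the slide names Pyflakes, whereas this example swaps in the standard library's `ast` module, and the function names and step sources are invented for illustration. It reports which variable names a step reads without assigning, which are the candidates that an earlier step would have to provide.

```python
import ast
import builtins


def names_defined_and_used(source):
    """Return (assigned, loaded) variable names found in a code snippet."""
    assigned, loaded = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return assigned, loaded


def missing_inputs(source):
    """Names a step reads but never assigns, excluding Python builtins."""
    assigned, loaded = names_defined_and_used(source)
    return {name for name in loaded - assigned if not hasattr(builtins, name)}


# Hypothetical step sources, one string per pipeline step.
steps = {
    "create-matrices": "a = [[1, 2], [3, 4]]\nb = [[5, 6], [7, 8]]",
    "transpose": "bt = [list(col) for col in zip(*b)]",
}

for name, source in steps.items():
    print(name, "needs", sorted(missing_inputs(source)))
```

This deliberately ignores imports, function definitions and the order of statements inside a step, all of which a real analyzer would have to handle.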
  30. static_analysis and marshal: Kale applies them to the pipeline steps create-matrices, transpose and matmul
  31. Marshalling: PatternDispatcher, TypeDispatcher
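The slide only names the two dispatchers, so the following is a guess at the general idea rather than Kale's code: one dispatcher picks a save function from the type of the object, the other picks a load function from the file name. The class interfaces, the file suffixes and the choice of `json` and `pickle` (instead of dill) are all assumptions made for this sketch.

```python
import json
import pickle
import re
import tempfile
from pathlib import Path


class TypeDispatcher:
    """Choose a save function from the object's type (hypothetical API)."""

    def __init__(self, default):
        self._default = default
        self._handlers = []  # list of (type or tuple of types, function)

    def register(self, type_):
        def decorator(func):
            self._handlers.append((type_, func))
            return func
        return decorator

    def __call__(self, obj, path):
        for type_, func in self._handlers:
            if isinstance(obj, type_):
                return func(obj, path)
        return self._default(obj, path)


class PatternDispatcher:
    """Choose a load function from the file name (hypothetical API)."""

    def __init__(self, default):
        self._default = default
        self._handlers = []  # list of (regular expression, function)

    def register(self, pattern):
        def decorator(func):
            self._handlers.append((pattern, func))
            return func
        return decorator

    def __call__(self, path):
        for pattern, func in self._handlers:
            if re.match(pattern, str(path)):
                return func(path)
        return self._default(path)


def _save_pickle(obj, path):
    target = Path(str(path) + ".pkl")
    target.write_bytes(pickle.dumps(obj))
    return target


def _load_pickle(path):
    return pickle.loads(Path(path).read_bytes())


save = TypeDispatcher(default=_save_pickle)
load = PatternDispatcher(default=_load_pickle)


@save.register((dict, list))
def _save_json(obj, path):
    target = Path(str(path) + ".json")
    target.write_text(json.dumps(obj))
    return target


@load.register(r".*\.json$")
def _load_json(path):
    return json.loads(Path(path).read_text())


with tempfile.TemporaryDirectory() as tmp:
    saved = save([[1, 2], [3, 4]], Path(tmp) / "matrix")
    print(saved.name, load(saved))
```

Falling back to a general-purpose serialiser keeps every object storable, while registered handlers let specific types use a more suitable format.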

  32. codegen: code generation via templates. Sample template used to generate a standalone function
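Template-based generation of a standalone function can be sketched as follows. The slide's actual template is not in the transcript, so the template text, the `generate_step` helper and its parameters are hypothetical; `string.Template` from the standard library stands in for Jinja2 to keep the example dependency-free.

```python
import textwrap
from string import Template

# Hypothetical template: wraps a step's source in a function that returns
# the variables listed as its outputs.
STEP_TEMPLATE = Template(
    "def $func_name():\n"
    "$body\n"
    "    return {$outputs}\n"
)


def generate_step(step_name, source, outputs):
    """Render the source of one pipeline step as a standalone function."""
    return STEP_TEMPLATE.substitute(
        func_name=step_name.replace("-", "_"),
        body=textwrap.indent(source.strip(), "    "),
        outputs=", ".join(f"{name!r}: {name}" for name in outputs),
    )


code = generate_step(
    "create-matrices",
    "a = [[1, 2], [3, 4]]\nb = [[5, 6]]",
    ["a", "b"],
)
print(code)

# The generated text is ordinary Python and can be executed to check it.
namespace = {}
exec(code, namespace)
print(namespace["create_matrices"]())
```

Generating source text rather than building functions in memory means the result can be written to a file and run in a separate container.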
  33. Examples https://github.com/kubeflow-kale/kale/tree/master/examples https://github.com/kubeflow-kale/examples

  34. Thank you very much for your kind attention. @leriomaggio, valeriomaggio@gmail.com, https://kubeflow-kale.github.io