Continuous Delivery for Machine Learning Systems - DevOpsDays Warsaw

Adarsh Shah
November 19, 2020

A machine learning workflow includes data management, experiment management (model development and training), model deployment, serving, and retraining. Training a model can take hours, sometimes days, and typically involves a large dataset. Training and serving a model also require specialized resources such as high-density cores and GPUs.

In this talk, we will use anecdotes to look at what Continuous Delivery for Machine Learning looks like and how cloud-native technologies can be used to perform the various steps in a machine learning workflow. We will also discuss how it differs from deploying other software, which aspects need to be considered, and which tools are available to enable Continuous Delivery for machine learning.
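
As a rough illustration of the train, evaluate, and retrain loop in the workflow described above, the sketch below shows one way an automated pipeline could decide between deploying a model and sending it back for retraining, based on accuracy measured on held-out data. Everything in it is an assumption made for illustration: the toy threshold "model", the function names `train_threshold_model` and `release_decision`, and the 0.9 accuracy bar are hypothetical and are not taken from the talk.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (feature value, label 0 or 1)
Model = Callable[[float], int]


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples for which the model predicts the right label."""
    correct = sum(1 for x, y in examples if model(x) == y)
    return correct / len(examples)


def train_threshold_model(examples: List[Example]) -> Model:
    """Toy stand-in for training: boundary midway between the class means."""
    zeros = [x for x, y in examples if y == 0]
    ones = [x for x, y in examples if y == 1]
    boundary = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x >= boundary else 0


def release_decision(model: Model, holdout: List[Example], min_accuracy: float) -> str:
    """Hypothetical pipeline gate: deploy only if the accuracy bar is met."""
    return "deploy" if accuracy(model, holdout) >= min_accuracy else "retrain"


train_set = [(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
holdout = [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1)]

model = train_threshold_model(train_set)
decision = release_decision(model, holdout, min_accuracy=0.9)
print(decision)
```

In a real pipeline the gate would typically run as an automated stage after training, so that a model below the bar never reaches the deployment step.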

Transcript

  1. Continuous Delivery for Machine Learning Systems

    Deploying ML Systems to Production safely and quickly in a sustainable way
    Adarsh Shah: Engineering Leader, Coach, Hands-on Architect, Independent Consultant
    @shahadarsh https://shahadarsh.com Deck: http://bit.ly/ml-dod-pl
  2. Hidden Technical Debt in ML Systems

    From the paper Hidden Technical Debt in Machine Learning Systems
  3. Traditional Software Development vs. Machine Learning

    [Diagram. Traditional Software Development: Data, Program -> Results. Machine Learning, Training: Data, Desired Results -> Model (Program). Machine Learning, Prediction: Live Data, Model -> Results.]
  4. Machine Learning workflow

    [Diagram. Phases: Data Management, Experimentation, Production Deployment. Steps: Data Acquisition, Data Preparation, Model Development, Training, Prediction, Accuracy Evaluation, Validation, Monitoring / Alerting. Transitions: Accuracy not reached, Accuracy reached, Retrain, Data Drift, Fix.]
  5. Challenges Unique to ML

  6. #1: Data Management

    Data Location; Large Datasets; Security; Compliance; Data Quality; Tracking Dataset
  7. #2: Experimentation

    Code Quality; Research & Experimentation; Tracking experiments; Training Time & Troubleshooting; Infrastructure Requirements; Model Accuracy Evaluation
  8. #3: Production Deployment

    Offline/Online Prediction; Monitoring & Alerting
  9. What is Continuous Delivery?

    Continuous Delivery is the ability to get changes of all types—including new features, configuration changes, bug fixes and experiments—into production, or into the hands of users, safely and quickly in a sustainable way.
    - Jez Humble & Dave Farley (Continuous Delivery Book Authors)
  10. Continuous Delivery

  11. Continuous Integration

    Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day.
    - Martin Fowler
  12. Principles of Continuous Delivery

    ๏ Build quality in
    ๏ Work in small batches
    ๏ Computers perform repetitive tasks, people solve problems
    ๏ Relentlessly pursue continuous improvement (Kaizen)
    ๏ Everyone is responsible
  13. Data pipeline

    [Diagram. For each of Data Source A, Data Source B, and Data Source C: Data Acquisition, Data Validation, Data Preparation. Other labels: Training Dataset, Versioned, Training Process, Testing, Bias & Fairness, Security & Compliance.]
  14. Continuous Integration (Training Code)

    [Diagram. Labels: Training Code; Static Analysis, Unit Tests, Linting etc.; Build Artifact; Artifact Repository; Dev Environment; Validation Tests; Merge to Main Branch.]
  15. Training

    [Diagram. Labels: Data Pipeline; Continuous Integration (Training Code); Configuration; Training Dataset; Data Scientist; Trigger; Training Environment; Automated Provisioning/De-provisioning; Log Aggregation; Monitoring/Alerting; Accuracy Evaluation; Testing (Bias & Fairness); Model.]
  16. Continuous Integration (Application Code)

    [Diagram. Labels: Application Code; Static Analysis, Unit Tests, Linting, Security Scan etc.; Build Artifact; Artifact Repository; Training; Model; Ephemeral Environment; Integration Tests; Tag as Tested.]
  17. Bringing it all together

    [Diagram. Phases: Data Management, Experimentation, Production Deployment. Labels: Data Pipeline; Training Dataset; Continuous Integration (Training Code); Data Scientist; Configuration; Training; Model; Continuous Integration (Application Code); Application Developer; Deployment; Production Environment; Smoke Tests; Monitoring/Alerting.]
  18. Machine Learning Roles

    ML Researcher; ML Engineer; Data Engineer; MLOps Engineer
  19. Team Structure Considerations

    Cross Functional Team; Separate Data Science Team; ML Platform Engineering Team
  20. References

    • continuousdelivery.com
    • Dr. Deming’s 14 Points for Management
    • Challenges Deploying Machine Learning Models to Production
    • State of DevOps Report
    • martinfowler.com
    • Large image datasets: A pyrrhic win for computer vision?
  21. Book Recommendations

  22. Adarsh Shah

    Engineering Leader, Coach, Hands-on Architect, Independent Consultant
    @shahadarsh https://shahadarsh.com
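
The data pipeline on slide 13 runs validation on each source before the data reaches a versioned training dataset. The sketch below is one possible way to express that idea, not the approach used in the talk: the `validate` and `dataset_version` helpers, the required field names, and the use of a content hash as the version identifier are all assumptions chosen for illustration.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Record = Dict[str, object]


def validate(records: List[Record], required: Tuple[str, ...]) -> Tuple[List[Record], List[Record]]:
    """Split records into (valid, rejected) by checking required non-null fields."""
    valid: List[Record] = []
    rejected: List[Record] = []
    for record in records:
        ok = all(record.get(field) is not None for field in required)
        (valid if ok else rejected).append(record)
    return valid, rejected


def dataset_version(records: List[Record]) -> str:
    """Content hash: identical data always maps to the same version id."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Hypothetical sources standing in for "Data Source A" and "Data Source B".
source_a = [{"id": 1, "label": "cat"}, {"id": 2, "label": None}]
source_b = [{"id": 3, "label": "dog"}]

training_dataset: List[Record] = []
rejected_total = 0
for source in (source_a, source_b):
    valid, rejected = validate(source, required=("id", "label"))
    training_dataset.extend(valid)
    rejected_total += len(rejected)

version = dataset_version(training_dataset)
print(len(training_dataset), rejected_total, version)
```

Recording the version id alongside each trained model is one way to trace a model back to the exact data it was trained on.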