Continuous Delivery for Machine Learning Systems - ADDO

Adarsh Shah
November 12, 2020

A machine learning workflow includes data management, experiment management (model training and development), model deployment, serving, and retraining. Training a model takes hours and sometimes days, and typically involves a large dataset. Training and serving a model also require special resources such as high-density cores and GPUs.

In this talk, we will use anecdotes to look at what Continuous Delivery for Machine Learning looks like and how to use cloud-native technologies to perform the various steps in a Machine Learning workflow. We will also talk about how it differs from deploying other software and which aspects to consider, and we will look at the different tools available to enable Continuous Delivery for Machine Learning.

Transcript

  1. Continuous Delivery for Machine Learning Systems: Deploying ML Systems to Production safely and quickly in a sustainable way
    Adarsh Shah, Engineering Leader, Coach, Hands-on Architect, Independent Consultant
    @shahadarsh, https://shahadarsh.com
  2. Hidden Technical Debt in ML Systems
    From the paper "Hidden Technical Debt in Machine Learning Systems"
  3. Traditional Software Development vs. Machine Learning (diagram)
    Traditional Software Development: Data + Program → Results
    Machine Learning, Training: Data + Desired Results → Model (Program)
    Machine Learning, Prediction: Live Data + Model → Results
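
As a toy illustration of this contrast (a sketch, not material from the talk), the Python below writes a hypothetical Celsius-to-Fahrenheit rule by hand as the traditional "program", then lets a one-variable least-squares fit learn the same rule from training data plus desired results, and finally applies the learned model to live data. All function names are illustrative.

```python
def handwritten_program(celsius: float) -> float:
    """Traditional software: a person writes the rule that maps data to results."""
    return celsius * 9 / 5 + 32


def train(data: list[float], desired_results: list[float]) -> tuple[float, float]:
    """Machine learning: learn y = w * x + b from examples by ordinary least squares."""
    n = len(data)
    mean_x = sum(data) / n
    mean_y = sum(desired_results) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(data, desired_results))
    variance = sum((x - mean_x) ** 2 for x in data)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


def predict(model: tuple[float, float], live_data: list[float]) -> list[float]:
    """Apply the learned model (the 'program' produced by training) to new inputs."""
    w, b = model
    return [w * x + b for x in live_data]


training_data = [0.0, 10.0, 20.0, 30.0, 40.0]
desired_results = [handwritten_program(c) for c in training_data]
model = train(training_data, desired_results)
print([round(y, 2) for y in predict(model, [25.0, -5.0])])  # [77.0, 23.0]
```

The learned parameters come out as w = 1.8 and b = 32, i.e. the model recovers the hand-written rule purely from examples.
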
  4. Machine Learning Workflow (diagram)
    Data Management: Data Acquisition, Data Preparation
    Experimentation: Model Development, Training, Accuracy Evaluation (accuracy not reached → fix; accuracy reached → Production Deployment)
    Production Deployment: Validation, Prediction, Monitoring / Alerting (data drift → retrain)
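
A minimal sketch of the feedback loops in this workflow, assuming a hypothetical target accuracy of 0.9 and reading the diagram's "Retrain" arrow as a return to the Experimentation stage. The stage names come from the slide; the function and the threshold are illustrative.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    DATA_MANAGEMENT = "Data Management"
    EXPERIMENTATION = "Experimentation"
    PRODUCTION_DEPLOYMENT = "Production Deployment"


def next_stage(stage: Stage, accuracy: Optional[float] = None,
               target_accuracy: float = 0.9, data_drift: bool = False) -> Stage:
    """Return the workflow stage that follows `stage` (target_accuracy is a hypothetical threshold)."""
    if stage is Stage.DATA_MANAGEMENT:
        # Acquired and prepared data moves on to model development and training.
        return Stage.EXPERIMENTATION
    if stage is Stage.EXPERIMENTATION:
        # Accuracy not reached: fix the model and train again.
        if accuracy is None or accuracy < target_accuracy:
            return Stage.EXPERIMENTATION
        # Accuracy reached: promote the model to production.
        return Stage.PRODUCTION_DEPLOYMENT
    if stage is Stage.PRODUCTION_DEPLOYMENT:
        # Monitoring flags data drift: retrain (assumed to re-enter Experimentation).
        return Stage.EXPERIMENTATION if data_drift else Stage.PRODUCTION_DEPLOYMENT
    raise ValueError(f"unknown stage: {stage!r}")


stage = Stage.DATA_MANAGEMENT
for accuracy, drift in [(None, False), (0.82, False), (0.93, False), (None, True)]:
    stage = next_stage(stage, accuracy=accuracy, data_drift=drift)
    print(stage.value)
# Experimentation, Experimentation, Production Deployment, Experimentation
```
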
  5. Challenges Unique to ML
  6. #1: Data Management
    Data Location, Large Datasets, Security, Compliance, Data Quality, Tracking Dataset
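
One way to make "Tracking Dataset" concrete is content hashing. The sketch below assumes a dataset stored as files under a single directory and fingerprints every file, so a trained model can record exactly which data it came from. Dedicated data-versioning tools handle this at scale; the helper names here are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """SHA-256 of one file, read in 1 MiB chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_fingerprint(root: Path) -> str:
    """Hash a manifest of (relative path, file hash) lines for every file under `root`.

    Adding, removing, renaming or editing any file changes the result, so the
    fingerprint can be stored next to a trained model to record its exact dataset.
    """
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    manifest = hashlib.sha256()
    for path in files:
        line = f"{path.relative_to(root).as_posix()}\t{file_sha256(path)}\n"
        manifest.update(line.encode("utf-8"))
    return manifest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train.csv").write_text("x,y\n1,2\n")
    before = dataset_fingerprint(root)
    (root / "train.csv").write_text("x,y\n1,2\n3,4\n")
    print(before != dataset_fingerprint(root))  # True: the dataset changed
```
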
  7. #2: Experimentation
    Code Quality, Research & Experimentation, Tracking Experiments, Training Time & Troubleshooting, Infrastructure Requirements, Model Accuracy Evaluation
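
Tracking experiments can start as simply as an append-only log. The sketch below, with hypothetical function names and a JSON Lines file as the assumed storage format, records parameters, metrics and dataset version for each run and picks the best run by a chosen metric. Dedicated experiment-tracking tools offer the same idea with shared storage and a UI.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def log_experiment(log_file: Path, params: dict, metrics: dict, dataset_version: str) -> str:
    """Append one experiment run as a JSON line and return its generated run id."""
    run_id = uuid.uuid4().hex
    record = {
        "run_id": run_id,
        "timestamp": time.time(),
        "dataset_version": dataset_version,  # e.g. a dataset fingerprint
        "params": params,
        "metrics": metrics,
    }
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return run_id


def best_run(log_file: Path, metric: str) -> dict:
    """Load every logged run and return the one with the highest `metric`."""
    with log_file.open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "experiments.jsonl"
    log_experiment(log, {"learning_rate": 0.1}, {"accuracy": 0.81}, "dataset-v1")
    log_experiment(log, {"learning_rate": 0.01}, {"accuracy": 0.87}, "dataset-v1")
    print(best_run(log, "accuracy")["params"])  # {'learning_rate': 0.01}
```
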
  8. #3: Production Deployment
    Offline/Online Prediction, Monitoring & Alerting
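
For monitoring and alerting, one common drift signal (swapped in here, not named in the talk) is the Population Stability Index, which compares how a feature is distributed in training data versus live data: PSI is the sum over bins of (actual share - expected share) * ln(actual share / expected share). The sketch assumes fixed bin cut points and the frequently quoted rule of thumb that PSI above 0.25 indicates significant drift; both are assumptions to tune per feature.

```python
import bisect
import math


def bin_shares(values, cut_points):
    """Share of values per bin; values beyond the outer cut points fall into the end bins."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect.bisect_right(cut_points, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(expected, actual, cut_points, eps=1e-4):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    psi = 0.0
    for e, a in zip(bin_shares(expected, cut_points), bin_shares(actual, cut_points)):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        psi += (a - e) * math.log(a / e)
    return psi


def drift_alert(training_values, live_values, cut_points, threshold=0.25):
    """True when the live distribution of a feature has drifted from training."""
    return population_stability_index(training_values, live_values, cut_points) > threshold


training = [i / 100 for i in range(100)]  # hypothetical feature values seen in training
cuts = [0.25, 0.5, 0.75]
print(drift_alert(training, training, cuts))     # False: identical distributions
print(drift_alert(training, [0.9] * 100, cuts))  # True: all live values in the top bin
```
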
  9. What is Continuous Delivery?
    Continuous Delivery is the
    ability to get changes of all types—including new features, configuration changes, bug fixes and experiments—into production, or into the hands of users, safely and quickly in a sustainable way. - Jez Humble & Dave Farley 
 (Continuous Delivery Book Authors)
  10. Continuous Delivery
  11. Continuous Integration
    "Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day." - Martin Fowler
  12. Principles of Continuous Delivery
    • Build quality in
    • Work in small batches
    • Computers perform repetitive tasks, people solve problems
    • Relentlessly pursue continuous improvement (Kaizen)
    • Everyone is responsible
  13. Data Pipeline (diagram)
    Data Source A → Data Acquisition A → Data Validation A → Data Preparation A
    Data Source B → Data Acquisition B → Data Validation B → Data Preparation B
    Data Source C → Data Acquisition C → Data Validation C → Data Preparation C
    Also shown: Versioned Training Dataset, Training Process, Testing, Bias & Fairness, Security & Compliance
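
A minimal sketch of the per-source part of this pipeline: each source's records go through validation and preparation independently before being combined into one training dataset, and rejected records are counted per source. The schema, field names and preparation step are hypothetical; bias and fairness checks, security and compliance, and dataset versioning are not shown.

```python
from dataclasses import dataclass, field

# Hypothetical validation rules: field name -> (expected type, minimum, maximum).
SCHEMA = {"age": (int, 0, 120), "income": (float, 0.0, 10_000_000.0)}


@dataclass
class ValidationResult:
    source: str
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)


def validate(source: str, records: list, schema: dict = SCHEMA) -> ValidationResult:
    """Data Validation: keep records whose fields exist, have the right type and are in range."""
    result = ValidationResult(source)
    for record in records:
        ok = all(
            isinstance(record.get(name), kind) and low <= record[name] <= high
            for name, (kind, low, high) in schema.items()
        )
        (result.accepted if ok else result.rejected).append(record)
    return result


def prepare(record: dict) -> dict:
    """Data Preparation: an example transformation (income in thousands)."""
    return {**record, "income": record["income"] / 1000}


def build_training_dataset(sources: dict):
    """Validate and prepare each source separately, then combine into one dataset."""
    dataset, rejected_counts = [], {}
    for name, records in sources.items():
        result = validate(name, records)
        rejected_counts[name] = len(result.rejected)
        dataset.extend(prepare(r) for r in result.accepted)
    return dataset, rejected_counts


sources = {
    "A": [{"age": 34, "income": 52000.0}, {"age": -1, "income": 1000.0}],
    "B": [{"age": 51, "income": 87000.0}],
    "C": [{"age": 29}],  # missing "income"
}
dataset, rejected = build_training_dataset(sources)
print(len(dataset), rejected)  # 2 {'A': 1, 'B': 0, 'C': 1}
```
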
  14. Continuous Integration (Training Code) (diagram)
    Training Code, Static Analysis, Linting etc., Unit Tests, Build Artifact, Artifact Repository, Dev Environment, Validation Tests, Merge to Main Branch
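
The Unit Tests step can cover training code too. As a sketch, assuming a hypothetical linear model trained by gradient descent, the tests below check that a few hundred training steps reduce the loss on a tiny dataset and that the gradient vanishes at the known optimum; a CI runner such as pytest would discover the `test_` functions automatically.

```python
def train_step(w: float, b: float, xs, ys, lr: float):
    """One gradient-descent step for y ≈ w * x + b under mean squared error."""
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2 / n * sum(errors)
    return w - lr * grad_w, b - lr * grad_b


def mse(w: float, b: float, xs, ys) -> float:
    """Mean squared error of the linear model on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Unit tests that a CI job could run on every commit to the training code.
def test_training_reduces_loss():
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # y = 2x + 1
    w, b = 0.0, 0.0
    initial = mse(w, b, xs, ys)
    for _ in range(200):
        w, b = train_step(w, b, xs, ys, lr=0.05)
    assert mse(w, b, xs, ys) < initial * 0.01


def test_zero_gradient_at_optimum():
    xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]
    assert train_step(2.0, 1.0, xs, ys, lr=0.1) == (2.0, 1.0)


if __name__ == "__main__":
    test_training_reduces_loss()
    test_zero_gradient_at_optimum()
    print("all tests passed")
```
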
  15. Training (diagram)
    Data Pipeline, Training Dataset, Continuous Integration (Training Code), Configuration, Data Scientist, Trigger, Training Environment, Automated Provisioning/De-provisioning, Log Aggregation, Monitoring/Alerting, Accuracy Evaluation, Testing (Bias & Fairness), Model
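
Accuracy Evaluation and Testing (Bias & Fairness) can act as an automated gate before a model is published. The sketch below uses one simple fairness check chosen for illustration, the gap between the best and worst per-group accuracy; the metric, the group labels and the thresholds (0.9 accuracy, 0.05 gap) are assumptions, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    accuracy: float
    accuracy_gap: float  # best group accuracy minus worst group accuracy
    passed: bool


def accuracy(predictions, labels) -> float:
    """Fraction of predictions that match their labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def evaluation_gate(predictions, labels, groups, min_accuracy=0.9, max_gap=0.05) -> GateResult:
    """Pass only if overall accuracy is high enough and similar across groups."""
    overall = accuracy(predictions, labels)
    per_group = []
    for group in set(groups):
        idx = [i for i, g in enumerate(groups) if g == group]
        per_group.append(accuracy([predictions[i] for i in idx], [labels[i] for i in idx]))
    gap = max(per_group) - min(per_group)
    return GateResult(overall, gap, overall >= min_accuracy and gap <= max_gap)


labels = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
predictions = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]  # one mistake, in group "b"
groups = ["a"] * 5 + ["b"] * 5
result = evaluation_gate(predictions, labels, groups)
print(result.accuracy, result.passed)  # 0.9 False: accuracy gap between groups is about 0.2
```

Overall accuracy alone would pass this model; the per-group check is what blocks it.
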
  16. Continuous Integration (Application Code) (diagram)
    Application Code, Static Analysis, Linting, Security Scan etc., Unit Tests, Build Artifact, Artifact Repository, Ephemeral Environment, Integration Tests, Tag as Tested, Model (from Training)
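
A sketch of the application-code side, with all names hypothetical: integration tests run against a specific application and model pair inside an environment that is always torn down afterwards, and only a passing pair gets a "tested" tag. The `ephemeral_environment` context manager merely stands in for real infrastructure provisioning.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field

EVENTS: list[str] = []  # records provisioning activity so teardown can be checked


@dataclass
class Artifact:
    name: str
    version: str
    tags: set = field(default_factory=set)


@contextmanager
def ephemeral_environment(name: str):
    """Stand-in for provisioning short-lived infrastructure; always torn down afterwards."""
    EVENTS.append(f"provision {name}")
    try:
        yield {"name": name}
    finally:
        EVENTS.append(f"teardown {name}")


def ci_application_code(app: Artifact, model: Artifact, integration_tests) -> bool:
    """Run integration tests for an app/model pair; tag the app only if all pass."""
    with ephemeral_environment(f"ci-{app.name}-{app.version}") as env:
        passed = all(test(env, app, model) for test in integration_tests)
    if passed:
        app.tags.add(f"tested-with:{model.name}@{model.version}")
    return passed


app = Artifact("prediction-service", "1.4.0")
model = Artifact("churn-model", "7")
tests = [
    lambda env, a, m: env["name"].startswith("ci-"),  # hypothetical environment check
    lambda env, a, m: m.version != "",               # hypothetical model check
]
print(ci_application_code(app, model, tests), app.tags)
print(EVENTS)
```

Tagging the application with the exact model version it was tested against keeps the deployable unit explicit: the pair, not either artifact alone.
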
  17. Bringing it all together (diagram)
    Data Management: Data Pipeline, Training Dataset
    Experimentation: Data Scientist, Configuration, Continuous Integration (Training Code), Training, Model
    Production Deployment: Application Developer, Continuous Integration (Application Code), Deployment, Production Environment, Smoke Tests, Monitoring / Alerting
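
Smoke Tests after deployment can decide whether a new release keeps serving traffic. The sketch below, with hypothetical model functions standing in for a deployed service, checks that every sample input gets a finite prediction and otherwise rolls back to the current model; a real rollback would switch traffic at the infrastructure level.

```python
import math
from typing import Callable, Sequence

Predictor = Callable[[Sequence[float]], float]


def smoke_test(predict: Predictor, sample_inputs: Sequence[Sequence[float]]) -> bool:
    """Cheap post-deployment check: every sample gets a finite numeric prediction."""
    try:
        return all(math.isfinite(predict(x)) for x in sample_inputs)
    except Exception:  # a crash counts as a failed smoke test
        return False


def deploy(new: Predictor, current: Predictor, sample_inputs) -> Predictor:
    """Return the predictor that should serve traffic: the new one, or roll back."""
    return new if smoke_test(new, sample_inputs) else current


def current_model(x):
    return 0.5 * sum(x)


def broken_model(x):
    return float("nan")


samples = [[1.0, 2.0], [0.0, 0.0]]
serving = deploy(broken_model, current_model, samples)
print(serving.__name__)  # current_model: the broken release was rolled back
```
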
  18. Machine Learning Roles
    ML Researcher, ML Engineer, Data Engineer, MLOps Engineer
  19. Team Structure Considerations
    Cross Functional Team, Separate Data Science Team, ML Platform Engineering Team
  20. Platforms
  21. References
    • continuousdelivery.com
    • Dr. Deming’s 14 Points for Management
    • Challenges Deploying Machine Learning Models to Production
    • State of DevOps Report
    • martinfowler.com
    • Large image datasets: A pyrrhic win for computer vision?
  22. Book Recommendations
  23. Adarsh Shah, Engineering Leader, Coach, Hands-on Architect, Independent Consultant
    @shahadarsh, https://shahadarsh.com