Slide 1

Slide 1 text

Deploying Machine Learning Systems to Production
Challenges & Solutions using MLOps
Adarsh Shah
Engineering Leader, Coach, Hands-on Architect
Independent Consultant
@shahadarsh
shahadarsh.com

Slide 2

Slide 2 text

Traditional Software Development vs Machine Learning

Slide 3

Slide 3 text

Machine Learning Workflow
• Stages: Data Acquisition, Data Preparation, Model Development, Model Training, Model Serving, Accuracy Evaluation
• Feedback loops: Code Changes, Retraining
• Phases: Data Management, Experimentation, Production Deployment

Slide 4

Slide 4 text

Fun Fact about Model Training
Have you ever helped train an ML model?
Everybody has. They just don’t realize it.

Slide 5

Slide 5 text

Hidden Technical Debt in ML Systems
From the paper "Hidden Technical Debt in Machine Learning Systems"

Slide 6

Slide 6 text

Challenges Unique to ML

Slide 7

Slide 7 text

#1: Data Management
• Data Location
• Large Datasets
• Security
• Compliance

Slide 8

Slide 8 text

#2: Constant Research and Experimentation
• Code Quality
• Experimentation Notebooks
• Tracking Experiments

Slide 9

Slide 9 text

#3: Training Process
• Retraining
• Training Time
• Reproducibility

Slide 10

Slide 10 text

#4: Infrastructure Requirements
• Edge devices
• GPU & high density cores
• Costs
• Elasticity

Slide 11

Slide 11 text

#5: Testing
• Model Accuracy
• Data Validation
• Model Bias & Fairness
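
One way to make the "Model Bias & Fairness" item testable is to compare accuracy across groups in the evaluation data. The sketch below is only an illustration and not a method from the talk: the function names, the grouping column, and the idea of using the largest accuracy gap as the metric are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy_by_group(
    predictions: Sequence[int],
    labels: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Return the fraction of correct predictions for each group."""
    if not (len(predictions) == len(labels) == len(groups)):
        raise ValueError("predictions, labels and groups must have equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(predictions, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    return {group: correct[group] / total[group] for group in total}


def max_accuracy_gap(
    predictions: Sequence[int],
    labels: Sequence[int],
    groups: Sequence[Hashable],
) -> float:
    """Difference between the best and worst per-group accuracy (hypothetical metric)."""
    scores = accuracy_by_group(predictions, labels, groups)
    return max(scores.values()) - min(scores.values())


# Toy evaluation set: group "a" gets one of two predictions right, group "b" both.
gap = max_accuracy_gap([1, 1, 0, 0], [1, 0, 0, 0], ["a", "a", "b", "b"])
```

A test in a pipeline could then fail the build when `gap` exceeds a limit the team has agreed on; what counts as an acceptable gap is a policy decision, not something this sketch decides.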

Slide 12

Slide 12 text

#6: Dependency Hell
• Dependency Hell
• ARM architecture

Slide 13

Slide 13 text

MLOps

Slide 14

Slide 14 text

MLOps
MLOps = Machine Learning + DevOps
People + Process + Technology

Slide 15

Slide 15 text

Roles
• ML Researcher
• ML Engineer
• Data Engineer
• MLOps Engineer

Slide 16

Slide 16 text

Team Structure Considerations
• Cross-functional Team
• Separate Data Science Team
• ML Platform Engineering Team

Slide 17

Slide 17 text

Data Pipeline
• Data Source A: Data Acquisition A, Data Preparation A, Data Validation A
• Data Source B: Data Acquisition B, Data Preparation B, Data Validation B
• Data Source N: Data Acquisition N, Data Preparation N, Data Validation N
• Output: Training Dataset, used as input to the Training Process
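
The per-source stages on this slide can be sketched as plain functions that feed one combined training dataset. This is a minimal sketch under stated assumptions: the record shape, the rule for dropping incomplete records, the required-field check, and every function name here are hypothetical and do not come from the talk.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]


def prepare(records: List[Record]) -> List[Record]:
    """Data Preparation: drop records that contain missing values (assumed rule)."""
    return [r for r in records if all(v is not None for v in r.values())]


def validate(records: List[Record], required: List[str]) -> List[Record]:
    """Data Validation: raise if any record lacks a required field."""
    for record in records:
        missing = [key for key in required if key not in record]
        if missing:
            raise ValueError(f"record is missing required fields: {missing}")
    return records


def build_training_dataset(
    sources: Dict[str, Callable[[], List[Record]]],
    required: List[str],
) -> List[Record]:
    """Run acquisition, preparation and validation per source, then combine."""
    dataset: List[Record] = []
    for acquire in sources.values():
        raw = acquire()  # Data Acquisition for this source
        dataset.extend(validate(prepare(raw), required))
    return dataset


# Two in-memory stand-ins for Data Source A and Data Source B.
sources = {
    "A": lambda: [{"x": 1.0, "y": 0.0}, {"x": None, "y": 1.0}],
    "B": lambda: [{"x": 2.0, "y": 1.0}],
}
training_dataset = build_training_dataset(sources, required=["x", "y"])
```

Keeping each stage as a separate function mirrors the slide's layout, so a new source only needs its own acquisition callable to join the pipeline.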

Slide 18

Slide 18 text

Training Pipeline
• Inputs: Training Code, Training Data (from the Data Pipeline), Pre-trained Weights
• Continuous Integration
• Validation
• Artifact Repository (Push image)
• Schedule Training (UI or command)
• Training Environment: Cloud, On-Prem or Edge location
• Infra provisioning automation
• GPU support
• Monitoring/Logging/Alerting
• Bias & Fairness Testing
• Build & Version Model
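
For the "Build & Version Model" step, one possible approach is to derive the version identifier from everything that went into the training run, so the same inputs always map to the same version. The sketch below assumes a code revision string, a data fingerprint, and a hyperparameter dictionary are available; those inputs and the function name are hypothetical, and the talk does not prescribe a versioning scheme.

```python
import hashlib
import json
from typing import Any, Dict


def model_version(
    code_revision: str,
    data_fingerprint: str,
    hyperparams: Dict[str, Any],
) -> str:
    """Return a short, deterministic identifier for a training run's inputs."""
    payload = json.dumps(
        {
            "code": code_revision,
            "data": data_fingerprint,
            "hyperparams": hyperparams,
        },
        sort_keys=True,  # key order must not change the resulting hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Placeholder inputs for illustration only.
version = model_version("rev-1", "data-2024-01", {"lr": 0.01, "epochs": 5})
```

Because the identifier depends only on the inputs, rerunning the pipeline with unchanged code, data and hyperparameters yields the same version, which helps with the reproducibility concern raised earlier in the deck.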

Slide 19

Slide 19 text

Deployment Pipeline
• Model Training
• Artifact Repository (Pull Model Image)
• GitOps (Push to Master)
• Deploy Model
• Model Serving Environment: Cloud, On-Prem or Edge location
• Infra provisioning automation
• GPU support
• Monitoring/Logging/Alerting
• Evaluate Model Accuracy (Periodic)
• Retrain Model (if accuracy below acceptable %)
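
The "Retrain Model (if accuracy below acceptable %)" rule can be sketched as a small check that a periodic evaluation job might run. This is a minimal sketch, not the talk's implementation: the function names and the default threshold of 0.9 are assumptions, and a real setup would pull predictions and ground-truth labels from its own monitoring data.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


def needs_retraining(
    predictions: Sequence[int],
    labels: Sequence[int],
    acceptable_accuracy: float = 0.9,  # assumed threshold, set per project
) -> bool:
    """Return True when measured accuracy falls below the acceptable level."""
    return accuracy(predictions, labels) < acceptable_accuracy


# Three of four predictions are correct, so accuracy is 0.75.
should_retrain = needs_retraining([1, 0, 1, 1], [1, 0, 0, 1])
```

In the pipeline on this slide, a `True` result would be the trigger that sends the model back to the training pipeline.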

Slide 20

Slide 20 text

Platforms Available

Slide 21

Slide 21 text

Platforms

Slide 22

Slide 22 text

Kubeflow

Slide 23

Slide 23 text

To sum it up
• Machine Learning Workflow
• Traditional Software Development vs Machine Learning
• Unique ML Challenges
  • Data Management
  • Constant Research and Experimentation
  • Training Process
  • Infrastructure Requirements
  • Testing
  • Dependency Hell
• MLOps
• Roles & Team Structure Considerations
• Data, Training & Deployment Pipeline
• Platforms Available

Slide 24

Slide 24 text

Questions
Adarsh Shah
Engineering Leader, Coach, Hands-on Architect
Independent Consultant
@shahadarsh
shahadarsh.com