Slide 1

Slide 1 text

@useautomation | DevOpsDays Austin
A Tale of Two Ops: How MLOps can learn from DevOps
Andre Elizondo
Solutions Architect @ WhyLabs

Slide 2

Slide 2 text

Who am I?
● Seattle, WA
● Recovering Sysadmin, SRE, Evangelist
  ○ Chef, Adobe, Datadog, Big Fish Games, Lacework, etc.
● >10 yrs part of the DevOps community

Slide 3

Slide 3 text


Slide 4

Slide 4 text


Slide 5

Slide 5 text


Slide 6

Slide 6 text

https://mattturck.com/landscape/mad2023.pdf

Slide 7

Slide 7 text

https://mattturck.com/landscape/mad2023.pdf

Slide 8

Slide 8 text

Machine Learning is the new OpenStack

Slide 9

Slide 9 text


Slide 10

Slide 10 text

We’re here

Slide 11

Slide 11 text

We’re here
We’ll get here soon

Slide 12

Slide 12 text

Machine Learning is the new shadow IT

Slide 13

Slide 13 text

We’re at risk of another pig over the fence

Slide 14

Slide 14 text

We’re at risk of another pig over the fence

Slide 15

Slide 15 text

What is MLOps?
● Applying DevOps practices & culture in Machine Learning
● A process that involves multiple teams/silos
https://ml-ops.org/content/mlops-principles

Slide 16

Slide 16 text

What is MLOps?
● Applying DevOps practices & culture in Machine Learning
● A process that involves multiple teams/silos
● Not AIOps
https://ml-ops.org/content/mlops-principles

Slide 17

Slide 17 text

DevOps handles…
● Compute
● Networking
● Storage
● Release
● Service Reliability
● Security

Slide 18

Slide 18 text

DevOps handles…
● Compute
● Networking
● Storage
● Release
● Service Reliability
● Security
● ML Reliability

Slide 19

Slide 19 text

How is it similar to DevOps?
● Shared concepts, but different
  ○ CI/CD
  ○ Observability
  ○ Automation
  ○ Containers
● Huge silos between teams
  ○ Data Engineering
  ○ Data Scientists
  ○ ML Engineers
  ○ Product Managers
  ○ DevOps/SRE/Operations
https://ml-ops.org/content/mlops-principles

Slide 20

Slide 20 text

CI/CD in MLOps
● Deploying your model
  ○ Testing is different
  ○ Scaling is different(ish)
  ○ Packaging is more or less the same
  ○ Continuous delivery is possible but harder
● ML Data Pipelines
  ○ Training
  ○ Feature
  ○ Inference
https://ml-ops.org/content/mlops-principles
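One way to picture "testing is different": a model rarely has a single correct output to assert on, so a CI step often checks a quality metric against a floor and against the model already in production. The sketch below is only an illustration under assumptions. The function names, the thresholds, and the toy model are all hypothetical and do not come from the talk.

```python
from typing import Callable, Sequence, Tuple

# (features, expected label) pairs from a held-out evaluation set.
Example = Tuple[Sequence[float], int]


def accuracy(predict: Callable[[Sequence[float]], int],
             examples: Sequence[Example]) -> float:
    """Fraction of held-out examples the model labels correctly."""
    if not examples:
        raise ValueError("need at least one evaluation example")
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)


def passes_quality_gate(candidate_acc: float,
                        production_acc: float,
                        min_accuracy: float = 0.90,
                        max_regression: float = 0.01) -> bool:
    """Promote only if the candidate clears an absolute floor and does not
    regress by more than max_regression against the production model.
    Both thresholds are assumed values chosen for illustration."""
    clears_floor = candidate_acc >= min_accuracy
    no_regression = candidate_acc >= production_acc - max_regression
    return clears_floor and no_regression


def toy_model(features: Sequence[float]) -> int:
    """Hypothetical stand-in for a trained model: threshold on one feature."""
    return 1 if features[0] > 0.5 else 0


if __name__ == "__main__":
    holdout = [([0.9], 1), ([0.8], 1), ([0.2], 0), ([0.1], 0), ([0.6], 0)]
    candidate_acc = accuracy(toy_model, holdout)
    promote = passes_quality_gate(candidate_acc, production_acc=0.78,
                                  min_accuracy=0.75)
    print(f"accuracy={candidate_acc:.2f} promote={promote}")
```

In a pipeline, a check like this would run after training and before packaging, so a model that regresses never reaches the deploy step.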

Slide 22

Slide 22 text

Observability in MLOps
● Performance is important
  ○ Some similar metrics, some different ones
  ○ Threshold baselines are different
● Availability is important
  ○ Service availability isn’t enough
● External dependencies need to be monitored upstream
● Sometimes batch, sometimes real time
https://www.oreilly.com/library/view/reliable-machine-learning/9781098106218/
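To make "threshold baselines are different" concrete: instead of a fixed CPU or latency threshold, an ML monitor often compares today's input data against a baseline captured at training time. The sketch below uses the Population Stability Index (PSI) as one example of such a comparison. PSI is not named in the talk; it is swapped in here for illustration, and the function names and bucket edges are assumptions.

```python
import math
from typing import List, Sequence


def bucket_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values in each bucket; edges are the sorted inner cut points."""
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(1 for e in edges if v >= e)  # number of edges at or below v
        counts[idx] += 1
    return [c / len(values) for c in counts]


def population_stability_index(baseline: Sequence[float],
                               current: Sequence[float],
                               edges: Sequence[float],
                               eps: float = 1e-6) -> float:
    """PSI = sum over buckets of (current - baseline) * ln(current / baseline).
    Empty buckets are clamped to eps so the logarithm stays defined."""
    base = bucket_fractions(baseline, edges)
    cur = bucket_fractions(current, edges)
    psi = 0.0
    for b, c in zip(base, cur):
        b, c = max(b, eps), max(c, eps)
        psi += (c - b) * math.log(c / b)
    return psi


if __name__ == "__main__":
    edges = [0.25, 0.5, 0.75]
    training = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    todays_batch = [0.6, 0.7, 0.8, 0.9, 0.8, 0.9, 0.7, 0.95]
    psi = population_stability_index(training, todays_batch, edges)
    # A PSI above roughly 0.2 is a common rule of thumb for "investigate",
    # not a universal constant.
    print(f"psi={psi:.2f} drifted={psi > 0.2}")
```

The same check works for batch jobs (run it per batch) and for real-time serving (run it over a sliding window of recent requests).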

Slide 23

Slide 23 text

Automation in MLOps
● Response workflows
  ○ Retraining
  ○ Roll-back
● Infrastructure as code
  ○ Terraform
● Monitoring as code
https://ml-ops.org/content/mlops-principles
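A rough sketch of how "monitoring as code" and "response workflows" can fit together: monitors are declared as data that lives in version control, and a small function maps breached monitors to a response. The metric names, thresholds, and the rule that roll-back outranks retraining are all assumptions made for this example, not something the talk prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Sequence


class Action(Enum):
    NONE = "none"
    RETRAIN = "retrain"
    ROLL_BACK = "roll_back"


@dataclass(frozen=True)
class Monitor:
    """A monitor declared in code: which metric, its limit, and the response."""
    metric: str
    max_value: float
    action: Action


# Hypothetical monitor definitions; names and limits are illustrative only.
MONITORS = (
    Monitor(metric="error_rate", max_value=0.05, action=Action.ROLL_BACK),
    Monitor(metric="feature_drift_psi", max_value=0.2, action=Action.RETRAIN),
)


def decide(metrics: Dict[str, float],
           monitors: Sequence[Monitor] = MONITORS) -> Action:
    """Return the most urgent response among breached monitors.
    A metric that was not reported is treated as healthy in this sketch."""
    breached = {m.action for m in monitors
                if metrics.get(m.metric, 0.0) > m.max_value}
    if Action.ROLL_BACK in breached:
        return Action.ROLL_BACK
    if Action.RETRAIN in breached:
        return Action.RETRAIN
    return Action.NONE


if __name__ == "__main__":
    print(decide({"error_rate": 0.01, "feature_drift_psi": 0.35}))
```

Because the monitors are plain data, a change to a threshold goes through the same review and history as any other code change.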

Slide 24

Slide 24 text

Containers in MLOps
● Dependency isolation
● Yes, it’s still Kubernetes.
  ○ With all of its usual complaints.
● Sometimes controlled directly, most times through a platform
  ○ Kubeflow
  ○ SageMaker
  ○ AzureML
  ○ Vertex
● Scaled for model serving, training, and pipelines

Slide 25

Slide 25 text

What is unique about MLOps?
● Machine learning systems tend to be:
  ○ Fragile to changes in data
  ○ Harder to test
  ○ Harder to scale
● Complex to measure
  ○ What is good vs what is bad?
  ○ You may not know if something is good or bad for a while
● Models get worse over time, not better
● Data Scientists <3 Jupyter notebooks
https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
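The point that "you may not know if something is good or bad for a while" can be illustrated with delayed ground truth: predictions are logged immediately, the true outcomes arrive later, and accuracy can only be computed for the periods where both exist. The sketch below assumes a simple in-memory layout (dictionaries keyed by a prediction id); the function name and the data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple


def accuracy_by_period(predictions: Dict[str, Tuple[Hashable, int]],
                       labels: Dict[str, int]) -> Dict[Hashable, float]:
    """predictions maps id -> (period, predicted label); labels maps
    id -> actual label and may lag behind. Periods with no labels yet
    are left out rather than reported as zero."""
    hits: Dict[Hashable, int] = defaultdict(int)
    totals: Dict[Hashable, int] = defaultdict(int)
    for pred_id, (period, predicted) in predictions.items():
        if pred_id not in labels:
            continue  # ground truth has not arrived for this prediction
        totals[period] += 1
        hits[period] += int(predicted == labels[pred_id])
    return {period: hits[period] / totals[period] for period in sorted(totals)}


if __name__ == "__main__":
    logged = {
        "a": ("week1", 1), "b": ("week1", 0),
        "c": ("week2", 1), "d": ("week2", 1),
        "e": ("week3", 0),
    }
    arrived = {"a": 1, "b": 0, "c": 1, "d": 0}  # nothing yet for "e"
    print(accuracy_by_period(logged, arrived))
```

Tracking this per period is one way to see a model getting worse over time, once the labels catch up.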

Slide 26

Slide 26 text

Why should you be excited about MLOps?
● There’s a TON of new innovation happening
● There is a desperate need for operating experience
● MLOps is where DevOps was ~8-10 years ago
● Open source development is happening fast
● ML is here to stay

Slide 27

Slide 27 text

We have a LOT of knowledge to share

Slide 28

Slide 28 text

What should you do?
● Find out what models you’re running (or planning to run) in production
● Get involved, share knowledge and experiences
● Start experimenting with open source models & examples
● Talk about this with your team and think about how you can avoid surprises

Slide 29

Slide 29 text

Thank you