A Tale of Two Ops: How MLOps can learn from DevOps

Andre Elizondo

May 05, 2023

Transcript

  1. @useautomation | DevOpsDays Austin

    A Tale of Two Ops: How MLOps can learn from DevOps
    Andre Elizondo, Solutions Architect @ WhyLabs
  2. Who am I?

    • Seattle, WA
    • Recovering Sysadmin, SRE, Evangelist
      ◦ Chef, Adobe, Datadog, Big Fish Games, Lacework, etc.
    • >10 yrs part of the DevOps community
  3. What is MLOps?

    • Applying DevOps practices & culture in Machine Learning
    • A process that involves multiple teams/silos
    https://ml-ops.org/content/mlops-principles
  4. What is MLOps?

    • Applying DevOps practices & culture in Machine Learning
    • A process that involves multiple teams/silos
    • Not AIOps
    https://ml-ops.org/content/mlops-principles
  5. DevOps handles…

    • Compute
    • Networking
    • Storage
    • Release
    • Service Reliability
    • Security
  6. DevOps handles…

    • Compute
    • Networking
    • Storage
    • Release
    • Service Reliability
    • Security
    • ML Reliability
  7. How is it similar to DevOps?

    • Shared concepts, but different
      ◦ CI/CD
      ◦ Observability
      ◦ Automation
      ◦ Containers
    • Huge silos between teams
      ◦ Data Engineering
      ◦ Data Scientists
      ◦ ML Engineers
      ◦ Product Managers
      ◦ DevOps/SRE/Operations
    https://ml-ops.org/content/mlops-principles
  8. CI/CD in MLOps

    • Deploying your model
      ◦ Testing is different
      ◦ Scaling is different(ish)
      ◦ Packaging is more or less the same
      ◦ Continuous delivery is possible but harder
    • ML Data Pipelines
      ◦ Training
      ◦ Feature
      ◦ Inference
    https://ml-ops.org/content/mlops-principles
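
One way to picture "testing is different" is a quality gate that runs in CI alongside ordinary unit tests. The sketch below is an illustration only, not something from the talk: the names `quality_gate`, `min_accuracy`, and `max_regression` are hypothetical, and the thresholds are placeholder assumptions. It scores a candidate model on held-out data and fails the build if the candidate is below an absolute floor or noticeably worse than the model currently in production.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class GateResult:
    passed: bool
    accuracy: float
    reason: str


def accuracy(predict: Callable[[Any], Any],
             features: Sequence[Any],
             labels: Sequence[Any]) -> float:
    """Fraction of held-out examples the model labels correctly."""
    if not labels or len(features) != len(labels):
        raise ValueError("features and labels must be non-empty and equal length")
    correct = sum(1 for x, y in zip(features, labels) if predict(x) == y)
    return correct / len(labels)


def quality_gate(candidate: Callable[[Any], Any],
                 baseline: Callable[[Any], Any],
                 features: Sequence[Any],
                 labels: Sequence[Any],
                 min_accuracy: float = 0.8,       # assumed floor, tune per model
                 max_regression: float = 0.02,    # assumed tolerated drop
                 ) -> GateResult:
    """Compare a candidate model against the production baseline."""
    cand_acc = accuracy(candidate, features, labels)
    base_acc = accuracy(baseline, features, labels)
    if cand_acc < min_accuracy:
        return GateResult(False, cand_acc, "below absolute minimum")
    if base_acc - cand_acc > max_regression:
        return GateResult(False, cand_acc, "regressed against baseline")
    return GateResult(True, cand_acc, "ok")
```

Unlike a unit test, the pass/fail line here is statistical: the same code can pass today and fail after the evaluation data changes, which is part of why continuous delivery is harder for models.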
  10. Observability in MLOps

    • Performance is important
      ◦ Some similar metrics, some different ones
      ◦ Threshold baselines are different
    • Availability is important
      ◦ Service availability isn’t enough
    • External dependencies need to be monitored upstream
    • Sometimes batch, sometimes real time
    https://www.oreilly.com/library/view/reliable-machine-learning/9781098106218/
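
To illustrate why threshold baselines differ, here is a minimal sketch that derives an alert band from a reference window instead of a fixed number. It is an assumption-laden example rather than the speaker's method: `baseline_band`, `out_of_band`, and the choice of mean plus or minus `k` standard deviations are all hypothetical, and the monitored value could be any model metric, such as the mean predicted score per batch.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def baseline_band(reference: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Build an alert band of mean +/- k standard deviations from reference data."""
    if len(reference) < 2:
        raise ValueError("need at least two reference values")
    mu = mean(reference)
    sigma = stdev(reference)
    return mu - k * sigma, mu + k * sigma


def out_of_band(values: Sequence[float], band: Tuple[float, float]) -> List[float]:
    """Return the observed values that fall outside the band."""
    low, high = band
    return [v for v in values if v < low or v > high]


# Usage: the reference window might be last week's per-batch mean prediction.
reference_window = [0.48, 0.50, 0.52, 0.49, 0.51]
band = baseline_band(reference_window)
alerts = out_of_band([0.50, 0.70, 0.46], band)  # only 0.70 falls outside
```

The band is learned from the model's own recent behaviour, so it has to be recomputed whenever the model or its input data changes, unlike a static CPU or latency threshold.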
  11. Automation in MLOps

    • Response workflows
      ◦ Retraining
      ◦ Roll-back
    • Infrastructure as code
      ◦ Terraform
    • Monitoring as code
    https://ml-ops.org/content/mlops-principles
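
As a rough sketch of how "monitoring as code" and response workflows might fit together, the example below declares monitors as plain data that can live in version control and maps triggered monitors to a response. Everything in it is hypothetical: the `Monitor` fields, the metric names `accuracy_drop` and `drift_score`, the thresholds, and the rule that a roll-back outranks a retrain are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Sequence


class Action(Enum):
    NONE = "none"
    RETRAIN = "retrain"
    ROLL_BACK = "roll_back"


@dataclass(frozen=True)
class Monitor:
    name: str
    metric: str        # key expected in the metrics mapping
    threshold: float   # alert when the metric exceeds this value
    action: Action     # response to request when the monitor fires


# Monitors declared as code, so they can be reviewed and versioned like any change.
MONITORS = (
    Monitor("accuracy drop", "accuracy_drop", 0.05, Action.ROLL_BACK),
    Monitor("input drift", "drift_score", 0.20, Action.RETRAIN),
)


def decide(metrics: Mapping[str, float],
           monitors: Sequence[Monitor] = MONITORS) -> Action:
    """Pick one response; a roll-back takes priority over a retrain."""
    triggered = {
        m.action for m in monitors
        if metrics.get(m.metric, 0.0) > m.threshold
    }
    if Action.ROLL_BACK in triggered:
        return Action.ROLL_BACK
    if Action.RETRAIN in triggered:
        return Action.RETRAIN
    return Action.NONE
```

In practice the returned action would kick off a pipeline (a retraining job or a redeploy of the previous model version); that wiring is left out here because it depends on the platform in use.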
  12. Containers in MLOps

    • Dependency isolation
    • Yes, it’s still Kubernetes.
      ◦ With all of its usual complaints.
    • Sometimes controlled directly, most times through a platform
      ◦ Kubeflow
      ◦ SageMaker
      ◦ Azure ML
      ◦ Vertex AI
    • Scaled for model serving, training, and pipelines
  13. What is unique about MLOps?

    • Machine learning systems tend to be:
      ◦ Fragile to changes in data
      ◦ Harder to test
      ◦ Harder to scale
    • Complex to measure
      ◦ What is good vs what is bad?
      ◦ You may not know if something is good or bad for a while
    • Models get worse over time, not better
    • Data Scientists <3 Jupyter notebooks
    https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
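
Because models are fragile to changes in data and ground truth can arrive late, one common stopgap is to compare the distribution of live inputs against the training data. The sketch below uses the Population Stability Index (PSI) as one possible drift score; the talk does not name a specific metric, so the choice of PSI, the function name `psi`, the equal-width binning, and the `eps` floor are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float],
        current: Sequence[float],
        bins: int = 10,
        eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a current sample."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if it is flat.
    width = (hi - lo) / bins or 1.0

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref = fractions(reference)
    cur = fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

Identical samples score 0.0, and the score grows as the live data moves away from the reference. A frequently quoted rule of thumb treats values above roughly 0.2 as a significant shift, though a sensible cut-off depends on the feature and should be checked against real incidents.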
  14. Why should you be excited about MLOps?

    • There’s a TON of new innovation happening
    • There is a desperate need for operating experience
    • MLOps is where DevOps was ~8-10 years ago
    • Open source development is happening fast
    • ML is here to stay
  15. What should you do?

    • Find out what models you’re running (or planning to run) in production
    • Get involved, share knowledge and experiences
    • Start experimenting with open source models & examples
    • Talk about this with your team and think about how you can avoid surprises