Slide 1

Slide 1 text

DevOps for Data Science? Automate the boring stuff and leverage the OSS ecosystem PyCon Africa – August 6th, 2020 Tania Allard, PhD @ixek

Slide 2

Slide 2 text

@ixek @trallard trallard.dev

Slide 3

Slide 3 text

About Me
● I ❤ Python
● I am also a GDE for TensorFlow
● I love mechanical keyboards
● My dog usually barks while I am giving online talks

Slide 4

Slide 4 text

These slides https://bit.ly/mlops-pyconafrica

Slide 5

Slide 5 text

Table of Contents
1 Background: ML and Data Science in 2020
2 What is even MLOps? And why you’d need it…
3 MLOps 101: Getting started with MLOps

Slide 6

Slide 6 text

01 Background: ML and Data Science in 2020

Slide 7

Slide 7 text

Where have we been? The Gartner hype cycle

Slide 8

Slide 8 text

Every team
Data scientist: “It’s never been easier to run ML experiments”
ML engineer / SRE: “Machine learning in production is hard, y’all!”

Slide 9

Slide 9 text

From the DS perspective
● Tools like scikit-learn and Keras make it easy to create models in a few lines
● Techniques like transfer learning make our lives easier
● More compute! All the GPUs!

Slide 10

Slide 10 text

The new unicorn
Must have: analytical skills, software engineering, programming, data engineering, data visualization
Also must have: containerization, end-to-end ML pipeline, CI/CD/versioning, deep learning / NLP / etc., privacy and security

Slide 11

Slide 11 text

02 MLOps: What is it?

Slide 12

Slide 12 text

Where is my unicorn? A mythical data scientist who can code, write unit tests AND resist the lure of a deep neural network when logistic regression will do.

Slide 13

Slide 13 text

The origin of DevOps
Software developers: need to move and iterate fast
Operations team: stability and availability of services is the priority

Slide 14

Slide 14 text

“DevOps is the union of people, process, and products to enable continuous delivery of value into production” - Donovan Brown

Slide 15

Slide 15 text

DevOps principles
Automate: automate everything you can (data processing, model training)
Feedback: get feedback on new ideas fast (test immediately)
No manual handoffs: provide early testing opportunities

Slide 16

Slide 16 text

Continuous integration – software engineering
● Code changes
● Project source code in version control
● Automated build
● Quick testing
● Based on test results – no waiting time*
Cycle: Automate, Feedback, Iterate

Slide 17

Slide 17 text

So what about ML?
Technical considerations
● Reliance on metrics (e.g. accuracy, specificity)
● Data visualization
● Required domain knowledge
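
To make the metrics mentioned on this slide concrete, here is a minimal sketch of how accuracy and specificity fall out of the confusion counts of a binary classifier. The helper names (`ConfusionCounts`, `confusion_counts`) and the labels are made up for illustration and are not from the talk; in practice a library such as scikit-learn would usually do this for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (hypothetical helper, not a library type)."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_counts(y_true, y_pred, positive=1):
    """Tally true/false positives and negatives for the given positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c):
    """Share of all predictions that were correct."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def specificity(c):
    """Share of actual negatives that were predicted negative."""
    negatives = c.tn + c.fp
    return c.tn / negatives if negatives else 0.0


# Illustrative labels only.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(accuracy(counts), specificity(counts))
```

Unlike a unit test, neither number is simply pass or fail, which is why ML pipelines end up relying on agreed thresholds and domain knowledge.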

Slide 18

Slide 18 text

More than ML code / model

Slide 19

Slide 19 text

The origin of MLOps
Data scientist:
• Need to move and iterate fast
• Use the frameworks I love
• Scalable
• Minimal wait: test, stage, production
SRE/ML Engineers:
• Reuse of tooling and platforms
• Uptime
• Monitoring
• Reliability and stability

Slide 20

Slide 20 text

Continuous integration – software engineering
● Code & data changes
● Project source code in version control. Data lineage.
● Automated training / data processing
● Sought metrics
● Improve model based on outputs/outcomes
Cycle: Automate, Feedback, Iterate
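
One way to picture "data lineage" next to code in version control is to record a fingerprint of the data alongside the code version and parameters of each run. The sketch below is only an assumption about how that could look: the function names, the record layout, and the `abc123` commit id are hypothetical, and dedicated data-versioning tools do considerably more.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Content hash of the training data (SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


def lineage_record(code_version: str, data: bytes, params: dict) -> dict:
    """Bundle everything that defines a training run into one record."""
    return {
        "code_version": code_version,
        "data_sha256": fingerprint(data),
        "params": params,
    }


def needs_retraining(previous: dict, current: dict) -> bool:
    """Retrain when the code, the data, or the parameters changed."""
    return previous != current


# Hypothetical commit id and toy CSV contents.
old_run = lineage_record("abc123", b"x,y\n1,2\n", {"epochs": 10})
new_run = lineage_record("abc123", b"x,y\n1,2\n3,4\n", {"epochs": 10})

print(json.dumps(new_run, indent=2, sort_keys=True))
print(needs_retraining(old_run, new_run))
```

Here the code version is identical in both runs, yet the record differs because the data changed, which is the case plain code-only CI would miss.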

Slide 21

Slide 21 text

Getting started 101 MLOps 03

Slide 22

Slide 22 text

RECYCLE YOUR ECOSYSTEM
1 Collaboration: version control (Git, Mercurial); OSS dev platform / CI/CD (GitHub, GitLab, Travis)
2 Automation: leverage your deployment infrastructure (CI/CD, Make)
3 Mix-match: use the OSS libraries you love and leverage cloud computing*

Slide 23

Slide 23 text

MLOps step by step
[Diagram labels: ENV #1, ENV #2, Data Scientist, SRE/ML Engineers, CI/CD Pipeline (Process, Train, Stage, Serve), Data, Distributed Cloud]
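
The four pipeline stages named on this slide (Process, Train, Stage, Serve) can be sketched as plain functions run in order, stopping at the first failure. This is a toy illustration under stated assumptions: the stage bodies, the `run_pipeline` helper, and the "model" (a mean) are invented here; a real pipeline would be driven by the CI/CD system rather than a loop.

```python
def process(context):
    """Drop missing values from the raw data."""
    rows = [value for value in context["raw"] if value is not None]
    return {**context, "rows": rows}


def train(context):
    """'Train' a toy model: predict the mean of the rows."""
    rows = context["rows"]
    if not rows:
        raise ValueError("no rows to train on")
    return {**context, "model": {"mean": sum(rows) / len(rows)}}


def stage(context):
    """Smoke-test the model before it is served."""
    if "mean" not in context["model"]:
        raise ValueError("model is missing its parameters")
    return {**context, "staged": True}


def serve(context):
    """Stand-in for deployment: mark the model as live."""
    return {**context, "served": True}


STAGES = [("process", process), ("train", train), ("stage", stage), ("serve", serve)]


def run_pipeline(stages, context):
    """Run stages in order; return (context, completed stage names, error or None)."""
    completed = []
    for name, step in stages:
        try:
            context = step(context)
        except Exception as error:
            return context, completed, f"{name} failed: {error}"
        completed.append(name)
    return context, completed, None


result, completed, error = run_pipeline(STAGES, {"raw": [2.0, None, 4.0, 6.0]})
print(completed, error, result["model"])
```

Because a failing stage stops everything after it, a bad training run never reaches Stage or Serve.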

Slide 24

Slide 24 text

MLOps step by step
First, I check in my code.
[Diagram: CI/CD Pipeline (Process, Train, Stage, Serve)]

Slide 25

Slide 25 text

Version control

Slide 26

Slide 26 text

MLOps step by step
That kicks off a CI/CD pipeline.
[Diagram: CI/CD Pipeline (Process, Train, Stage, Serve)]

Slide 27

Slide 27 text

Kicking off CI/CD
Push changes
GitHub Actions

Slide 28

Slide 28 text

MLOps step by step
And now do a training run on the processed data.
[Diagram: CI/CD Pipeline (Process, Train, Stage, Serve)]

Slide 29

Slide 29 text

Not only tests: you can also leverage CI to do the training or data processing
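
As a rough sketch of what "training in CI" might look like, the job below fits a tiny model, writes its metrics to a file that later steps can pick up, and returns an exit code the CI system can act on. Everything here is an assumption for illustration: the `train_job` name, the `metrics.json` file name, the `max_mse` threshold, and the pure-Python least-squares fit standing in for a real training run.

```python
import json
import tempfile
from pathlib import Path


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between predictions and targets."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_job(xs, ys, out_dir, max_mse=0.5):
    """Train, write metrics.json into out_dir, return 0 on success or 1 on failure."""
    slope, intercept = fit_line(xs, ys)
    mse = mean_squared_error(xs, ys, slope, intercept)
    metrics = {"slope": slope, "intercept": intercept, "mse": mse}
    Path(out_dir, "metrics.json").write_text(json.dumps(metrics, indent=2))
    return 0 if mse <= max_mse else 1


# Toy data; a CI step would typically end with `raise SystemExit(train_job(...))`.
with tempfile.TemporaryDirectory() as out_dir:
    exit_code = train_job([0, 1, 2, 3], [1, 3, 5, 7], out_dir)
    saved = json.loads(Path(out_dir, "metrics.json").read_text())

print(exit_code, saved)
```

A non-zero exit code is what makes the CI run show up as failed, the same mechanism a failing test suite uses.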

Slide 30

Slide 30 text

MLOps step by step
Actually, I need to update the parameters.
[Diagram: CI/CD Pipeline (Process, Train, Stage, Serve)]

Slide 31

Slide 31 text

Parameters update? No problem: check them in to version control
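
If parameters live in a small text file next to the code, a parameter change becomes an ordinary commit that can be reviewed and can trigger the pipeline. The sketch below assumes a JSON file and invents the parameter names, the `DEFAULTS` values, and both helper functions; the talk does not prescribe a format.

```python
import json

# Hypothetical defaults; the committed file only needs to list overrides.
DEFAULTS = {"learning_rate": 0.01, "epochs": 10, "batch_size": 32}


def load_params(text, defaults=DEFAULTS):
    """Merge committed overrides onto the defaults, rejecting unknown keys."""
    overrides = json.loads(text)
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**defaults, **overrides}


def changed_params(old, new):
    """Map each changed parameter to its (old, new) pair."""
    return {key: (old[key], new[key]) for key in new if old.get(key) != new[key]}


# Stand-in for the contents of a params file checked in to the repository.
committed = '{"learning_rate": 0.001, "epochs": 20}'
params = load_params(committed)
print(params)
print(changed_params(DEFAULTS, params))
```

Rejecting unknown keys catches typos such as `learnin_rate` at load time instead of silently training with the default.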

Slide 32

Slide 32 text

Updated reporting: embed reports and metrics in your pull request
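
A report that is meant to land in a pull request can be as simple as a Markdown table comparing the candidate model to a baseline. This sketch only builds the text; the `metrics_table` helper and the numbers are made up, and how the text is posted to the pull request is left to the CI tooling.

```python
def metrics_table(baseline, candidate):
    """Render a Markdown table comparing candidate metrics to a baseline."""
    lines = [
        "| metric | baseline | candidate | change |",
        "| --- | --- | --- | --- |",
    ]
    for name in sorted(candidate):
        new = candidate[name]
        old = baseline.get(name)
        if old is None:
            # Metric did not exist in the baseline run.
            lines.append(f"| {name} | n/a | {new:.3f} | n/a |")
        else:
            lines.append(f"| {name} | {old:.3f} | {new:.3f} | {new - old:+.3f} |")
    return "\n".join(lines)


# Illustrative numbers only.
baseline = {"accuracy": 0.81, "specificity": 0.77}
candidate = {"accuracy": 0.84, "specificity": 0.75, "recall": 0.9}
report = metrics_table(baseline, candidate)
print(report)
```

Showing the signed change per metric lets a reviewer spot a regression (here, specificity) even when the headline metric improves.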

Slide 34

Slide 34 text

MLOps step by step
Model is optimized and working! Let’s roll out to production.
[Diagram: CI/CD Pipeline (Process, Train, Stage, Serve)]
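
Before rolling out, a pipeline can check that the new model really is better than the one in production. The sketch below is one possible gate, not the approach from the talk: the `should_promote` function, the choice of accuracy as the primary metric, and the optional per-metric floors are all assumptions.

```python
def should_promote(candidate, production, primary="accuracy", floors=None):
    """Return (decision, reasons); promote only if nothing speaks against it."""
    reasons = []
    if candidate[primary] <= production[primary]:
        reasons.append(f"{primary} did not improve")
    for name, floor in (floors or {}).items():
        # A missing metric counts as failing its floor.
        if candidate.get(name, 0.0) < floor:
            reasons.append(f"{name} below floor {floor}")
    return (not reasons), reasons


# Illustrative metrics for the live model and the new candidate.
production = {"accuracy": 0.81, "specificity": 0.77}
candidate = {"accuracy": 0.84, "specificity": 0.75}

print(should_promote(candidate, production))
print(should_promote(candidate, production, floors={"specificity": 0.76}))
```

Returning the reasons along with the decision gives the data scientist and the SRE/ML engineers the same explanation when a rollout is blocked.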

Slide 35

Slide 35 text

MLOps step by step
Trigger the CI/CD pipeline one last time.
[Diagram: CI/CD Pipeline (Process, Train, Stage, Serve)]

Slide 36

Slide 36 text

MLOps step by step
And roll out to the world!
[Diagram: CI/CD Pipeline (Process, Train, Stage, Serve)]

Slide 37

Slide 37 text

But there is more

Slide 40

Slide 40 text

In brief: MLOps allows you to be more efficient with the tools you use and love

Slide 41

Slide 41 text

RECYCLE YOUR ECOSYSTEM
1 Collaboration: version control (Git, Mercurial); OSS dev platform / CI/CD (GitHub, GitLab, Travis)
2 Automation: leverage your deployment infrastructure (CI/CD, Make)
3 Mix-match: use the OSS libraries you love and leverage cloud computing*

Slide 42

Slide 42 text

These slides https://bit.ly/mlops-pyconafrica

Slide 43

Slide 43 text

Thanks! @ixek @trallard trallard.dev