DevOps for data science: automate the boring stuff and leverage the OSS ecosystem

Tania Allard

August 06, 2020

Transcript

  1. DevOps for Data Science? Automate the boring stuff and leverage the OSS ecosystem

    PyCon Africa, August 6th, 2020. Tania Allard, PhD @ixek
  2. @ixek @trallard trallard.dev

  3. About Me

    I love Python. I am also a GDE (Google Developer Expert) for TensorFlow. I love mechanical keyboards. My dog usually barks while I am giving online talks.
  4. These slides https://bit.ly/mlops-pyconafrica

  5. Table of Contents

    1. Background: ML and Data Science in 2020
    2. What even is MLOps? And why you’d need it…
    3. MLOps 101: Getting started with MLOps @ixek
  6. Section 01: Background. ML and Data Science in 2020

  7. Where have we been? The Gartner hype cycle @ixek

  8. Data scientist: It’s never been easier to run ML experiments.

    ML engineer / SRE: Machine learning in production is hard, y’all!
    Every team @ixek
  9. From the DS perspective

    • Tools like scikit-learn and Keras make it easy to create models in a few lines
    • Techniques like transfer learning make our lives easier
    • More compute! All the GPUs!
  10. The new unicorn

    Must have: analytical skills, software engineering, programming, data engineering, data visualization.
    Also must have: containerization, end-to-end ML pipelines, CI/CD/versioning, deep learning/NLP/etc., privacy and security. @ixek
  11. Section 02: MLOps. What is it?

  12. Where is my unicorn?

    A mythical data scientist who can code, write unit tests AND resist the lure of a deep neural network when logistic regression will do.
  13. The origin of DevOps

    Software developers: need to move and iterate fast.
    Operations team: stability and availability of services are the priority. @ixek
  14. “DevOps is the union of people, process, and products to enable continuous delivery of value into production.”

    (Donovan Brown) @ixek
  15. DevOps principles

    Automate: automate everything you can (data processing, model training).
    Feedback: get feedback on new ideas fast (test immediately).
    No manual handoffs.
    Provide early testing opportunities. @ixek
  16. Continuous integration in software engineering

    Code changes → project source code in version control → automated build → quick testing → feedback based on test results (no waiting time*) → iterate. Automate, feedback, iterate. @ixek
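To make the "quick testing" step concrete, here is a minimal sketch of the kind of unit test a CI service could run on every push. The preprocessing function `clean_ages` and its rules are hypothetical, invented only for illustration; a test runner such as pytest would discover and run `test_*` functions automatically, and the `__main__` block lets the file run on its own too.

```python
def clean_ages(raw):
    """Hypothetical preprocessing step: parse ages, dropping blanks and out-of-range values."""
    cleaned = []
    for value in raw:
        value = value.strip()
        if not value:
            continue  # skip missing entries
        age = int(value)
        if 0 <= age <= 120:  # keep only plausible ages
            cleaned.append(age)
    return cleaned


def test_clean_ages_drops_blanks_and_outliers():
    # A fast, deterministic check: the kind of test CI can run in seconds
    assert clean_ages([" 42", "", "130", "7 "]) == [42, 7]


if __name__ == "__main__":
    test_clean_ages_drops_blanks_and_outliers()
    print("ok")
```

Because the test is fast and needs no external services, a failing commit is flagged within the same feedback loop described on the slide.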
  17. So what about ML?

    Technical considerations:
    • Reliance on metrics (e.g. accuracy, specificity)
    • Data visualization
    • Required domain knowledge @ixek
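Unlike a pass/fail unit test, ML quality is judged through metrics. As a reminder of what two of the metrics named above measure, here is a small sketch computing accuracy and specificity for binary labels (1 = positive, 0 = negative); the function names are illustrative, not from any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives/negatives for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    # Fraction of all predictions that are correct: (TP + TN) / total
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def specificity(y_true, y_pred):
    # Fraction of actual negatives correctly predicted negative: TN / (TN + FP)
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


y_true = [1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0]
print(accuracy(y_true, y_pred))     # 5 of 6 correct
print(specificity(y_true, y_pred))  # 0.75 (3 of 4 negatives caught)
```

A single number rarely tells the whole story, which is why the slide pairs metrics with visualization and domain knowledge.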
  18. More than ML code / model @ixek

  19. The origin of MLOps

    Data scientist:
    • Need to move and iterate fast
    • Use the frameworks I love
    • Scalable
    • Minimal wait: test, stage, production
    SRE/ML engineers:
    • Reuse of tooling and platforms
    • Uptime
    • Monitoring
    • Reliability and stability @ixek
  20. Continuous integration: the software engineering loop, adapted for ML

    Code & data changes → project source code in version control, plus data lineage → automated training / data processing → sought metrics → improve the model based on outputs/outcomes → iterate. Automate, feedback, iterate. @ixek
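One lightweight way to get the data lineage mentioned above is to fingerprint the exact data files a model was trained on and store that fingerprint next to the code version, so a change in data is as visible as a change in code. This is a minimal sketch assuming the data lives in local files; the `lineage_record` function and its fields are hypothetical.

```python
import hashlib
import json
import os
import tempfile


def file_sha256(path, chunk_size=65536):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lineage_record(data_paths, code_version):
    """Hypothetical record linking a training run to its exact inputs."""
    return {
        "code_version": code_version,  # e.g. the Git commit SHA
        "data": {os.path.basename(p): file_sha256(p) for p in sorted(data_paths)},
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.csv")
        with open(path, "w") as f:
            f.write("x,y\n1,2\n")
        print(json.dumps(lineage_record([path], code_version="abc123"), indent=2))
```

If two runs share a `code_version` but differ in a data hash, the data changed, which is exactly the case plain code versioning misses.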
  21. Section 03: MLOps 101. Getting started

  22. Recycle your ecosystem

    1. Collaboration: version control (Git, Mercurial), OSS dev platforms / CI/CD (GitHub, GitLab, Travis)
    2. Automation: leverage your deployment infrastructure (CI/CD, Make)
    3. Mix and match: use the OSS libraries you love and leverage cloud computing* @ixek
  23. MLOps step by step

    [Diagram: a CI/CD pipeline (Process → Train → Stage → Serve) running on data in a distributed cloud; environments ENV #1 and ENV #2; roles: data scientist, SRE/ML engineers] @ixek
  24. MLOps step by step: First, I check in my code.

    [Pipeline diagram as on slide 23] @ixek
  25. Version control @ixek

  26. MLOps step by step: That kicks off a CI/CD pipeline.

    [Pipeline diagram as on slide 23] @ixek
  27. Kicking off CI/CD: push changes, GitHub Actions @ixek
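Once a push kicks off GitHub Actions, scripts running inside the workflow can tell which commit and branch triggered them through default environment variables that GitHub sets (`GITHUB_ACTIONS`, `GITHUB_SHA`, `GITHUB_REF`, `GITHUB_RUN_ID`). A minimal sketch of reading them so a training run can be tagged with its commit; the `run_context` helper is hypothetical and falls back to placeholder values when run locally.

```python
import os


def run_context(environ=None):
    """Describe where this script runs, using GitHub Actions' default variables."""
    env = os.environ if environ is None else environ
    return {
        "ci": env.get("GITHUB_ACTIONS") == "true",  # "true" only inside Actions
        "commit": env.get("GITHUB_SHA", "local"),    # commit that triggered the run
        "ref": env.get("GITHUB_REF", "local"),       # e.g. refs/heads/main
        "run_id": env.get("GITHUB_RUN_ID", "local"),
    }


if __name__ == "__main__":
    print(run_context())
```

Passing `environ` explicitly keeps the helper easy to test without touching the real environment.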

  28. MLOps step by step: And now do a training run on the processed data.

    [Pipeline diagram as on slide 23] @ixek
  29. Not only tests

    You can also leverage CI/CD to do the training or data processing. @ixek
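A CI job only needs a command to run, so the pipeline that runs tests can also run training. A minimal sketch of a training entry point a workflow step could call (e.g. `python train.py metrics.json`); the script name, the toy one-feature least-squares model, and the `metrics.json` output are all assumptions for illustration. Writing metrics to a file lets later pipeline steps report on them.

```python
import json
import sys


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def main(metrics_path="metrics.json"):
    # Toy data standing in for the output of the data-processing step
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 5.0, 7.0, 9.0]
    slope, intercept = fit_line(xs, ys)
    metrics = {
        "slope": slope,
        "intercept": intercept,
        "mse": mean_squared_error(xs, ys, slope, intercept),
    }
    with open(metrics_path, "w") as handle:
        json.dump(metrics, handle, indent=2)
    return metrics


if __name__ == "__main__":
    main(*sys.argv[1:])
```

In a real project the toy model would be replaced by your framework of choice; the point is that training becomes one scripted, repeatable command.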
  30. MLOps step by step: Actually, I need to update the parameters.

    [Pipeline diagram as on slide 23]
  31. Parameter update? No problem: check it in to version control. @ixek
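Keeping hyperparameters in a small file tracked by Git means a parameter change is a commit like any other, and pushing it re-triggers the pipeline. A minimal sketch, assuming a hypothetical `params.json` at the repository root; the defaults and key names are illustrative, and unknown keys are rejected so a typo fails the CI run instead of being silently ignored.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical defaults; the checked-in params.json overrides any of them
DEFAULT_PARAMS = {"learning_rate": 0.01, "epochs": 10, "seed": 42}


def load_params(path="params.json"):
    """Merge version-controlled parameters over defaults, rejecting unknown keys."""
    params = dict(DEFAULT_PARAMS)
    file = Path(path)
    if file.exists():
        overrides = json.loads(file.read_text())
        unknown = set(overrides) - set(DEFAULT_PARAMS)
        if unknown:
            raise KeyError(f"Unknown parameters: {sorted(unknown)}")
        params.update(overrides)
    return params


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "params.json"
        path.write_text(json.dumps({"epochs": 20}))
        print(load_params(path))  # {'learning_rate': 0.01, 'epochs': 20, 'seed': 42}
```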

  32. Updated reporting

    Embed reports and metrics in your pull request. @ixek
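Reports in a pull request are usually just Markdown. A minimal sketch that turns two metrics dictionaries (the main branch's and the PR's) into a Markdown table that a CI step could post as a PR comment; the function name and metric keys are hypothetical, and actually posting the comment (through the GitHub API or a dedicated action) is not shown.

```python
def metrics_table(baseline, candidate):
    """Render baseline vs. candidate metrics as a Markdown table with deltas."""
    lines = ["| Metric | main | this PR | Δ |", "|---|---|---|---|"]
    for name in sorted(candidate):
        new = candidate[name]
        old = baseline.get(name)
        if old is None:
            # Metric is new in this PR, so there is nothing to compare against
            lines.append(f"| {name} | n/a | {new:.3f} | n/a |")
        else:
            lines.append(f"| {name} | {old:.3f} | {new:.3f} | {new - old:+.3f} |")
    return "\n".join(lines)


print(metrics_table({"accuracy": 0.81, "f1": 0.74},
                    {"accuracy": 0.84, "f1": 0.73, "auc": 0.90}))
```

Showing the delta next to each value lets reviewers judge a change from the PR page without rerunning anything.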
  34. MLOps step by step: Model is optimized and working! Let’s roll out to production.

    [Pipeline diagram as on slide 23] @ixek
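Before the final pipeline run promotes a model, an automated gate can check "optimized and working" instead of taking it on trust. A hypothetical sketch: promote only if the candidate does not fall below the production model on any tracked metric by more than a tolerance. The metric names and the 0.005 tolerance are illustrative assumptions, and higher is assumed to be better for every metric.

```python
def should_promote(production, candidate, tolerance=0.005):
    """Return (decision, reasons); block promotion if any metric drops by more than tolerance.

    Assumes higher is better for every metric in `production`.
    """
    reasons = []
    for name, prod_value in production.items():
        if name not in candidate:
            reasons.append(f"{name}: missing from candidate")
        elif candidate[name] < prod_value - tolerance:
            reasons.append(f"{name}: {candidate[name]:.3f} < {prod_value:.3f}")
    return not reasons, reasons


ok, why = should_promote({"accuracy": 0.84, "specificity": 0.80},
                         {"accuracy": 0.86, "specificity": 0.79})
print(ok, why)  # False ['specificity: 0.790 < 0.800']
```

Returning the reasons alongside the decision means the pipeline can fail with a readable message rather than a bare exit code.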
  35. MLOps step by step: Trigger the CI/CD pipeline one last time.

    [Pipeline diagram as on slide 23] @ixek
  36. MLOps step by step: And roll out to the world!

    [Pipeline diagram as on slide 23] @ixek
  37. But there is more @ixek

  40. In brief

    MLOps allows you to be more efficient with the tools you use and love. @ixek
  41. Recycle your ecosystem

    1. Collaboration: version control (Git, Mercurial), OSS dev platforms / CI/CD (GitHub, GitLab, Travis)
    2. Automation: leverage your deployment infrastructure (CI/CD, Make)
    3. Mix and match: use the OSS libraries you love and leverage cloud computing* @ixek
  42. These slides https://bit.ly/mlops-pyconafrica

  43. Thanks! @ixek @trallard trallard.dev