DevOps for data science: automate the boring stuff and leverage the OSS ecosystem

Tania Allard

August 06, 2020

Transcript

  1. DevOps for Data Science? Automate the boring stuff and leverage

    the OSS ecosystem PyCon Africa – August 6th, 2020 Tania Allard, PhD @ixek
  2. @ixek @trallard trallard.dev

  3. About me: I love Python. I am also a GDE for

    TensorFlow. I love mechanical keyboards. My dog usually barks while I am giving online talks.
  4. These slides https://bit.ly/mlops-pyconafrica

  5. Table of Contents 1 Background: ML and Data Science in 2020

    2 What is even MLOps? And why you’d need it… 3 MLOps 101: Getting started with MLOps @ixek
  6. 01 Background: ML and Data Science in 2020

  7. Where have we been? The Gartner hype cycle @ixek

  8. Every team. Data Scientist: It’s never been easier to run ML experiments.

    ML engineer / SRE: Machine learning in production is hard y’all! @ixek
  9. • Tools like scikit-learn and Keras make it easy to

    create models in a few lines • Techniques like transfer learning make our lives easier • More Compute! All the GPUs! From the DS perspective
  10. The new unicorn Must have Analytical skills Software eng. Programming

    Data engineering Data visualization Also must have Containerization End-to-end ML pipeline CI /CD /Versioning Deep learning / NLP / etc. Privacy and security @ixek
  11. 02 MLOps: What is it?

  12. Where is my unicorn? A mythical data scientist who can

    code, write unit tests AND resist the lure of a deep neural network when logistic regression will do.
  13. The origin of DevOps. Software developers: need to move and

    iterate fast. Operations team: stability and availability of services is the priority. @ixek
  14. “DevOps is the union of people, process, and products to

    enable continuous delivery of value into production” - Donovan Brown @ixek
  15. Automate Automate everything you can (data processing, model training) Feedback

    Get feedback on new ideas fast (test immediately) No manual handoffs Provide early testing opportunities DevOps principles @ixek
  16. Continuous integration – software engineering Based on test results –

    no waiting time* Quick testing Automated build Project source code in version control Code changes Automate Feedback iterate @ixek
  17. Technical considerations • Reliance on metrics (e.g. accuracy, specificity) •

    Data visualization • Required domain knowledge So what about ML? @ixek
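The two metrics named on this slide can both be derived from confusion-matrix counts. The following is a minimal sketch, assuming binary labels where 1 is the positive class; the function names are illustrative and do not come from the talk.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label (inputs must be non-empty)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True negative rate, tn / (tn + fp); needs at least one true negative label."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)
```

In practice a library such as scikit-learn would usually supply these; the plain-Python version is only meant to make the definitions explicit.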
  18. More than ML code / model @ixek

  19. The origin of MLOps. Data scientist: • Need to move

    and iterate fast • Use the frameworks I love • Scalable • Minimal wait: test, stage, production SRE/ML Engineers: • Reuse of tooling and platforms • Uptime • Monitoring • Reliability and stability @ixek
  20. Continuous integration – software engineering Improve model based on outputs/outcomes

    Sought metrics Automated training / data processing Project source code in version control. Data lineage. Code & data changes Automate Feedback iterate @ixek
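One simple way to track data changes alongside code changes is to record a content fingerprint of each dataset with every training run. This is a minimal sketch using SHA-256 from the standard library; the function names are hypothetical, and the talk does not prescribe a specific data lineage tool.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def fingerprint_chunks(chunks: Iterable[bytes]) -> str:
    """Return the SHA-256 hex digest of the concatenated byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks so large datasets fit in memory."""
    with path.open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def fingerprint_file(path: str) -> str:
    """Fingerprint a dataset file; store the result next to the model's metrics."""
    return fingerprint_chunks(read_chunks(Path(path)))
```

If the fingerprint stored with a model differs from the fingerprint of the current dataset, the data has changed since that model was trained.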
  21. 03 MLOps 101: Getting started

  22. RECYCLE YOUR ECOSYSTEM 1 Collaboration Version control (Git, Mercurial) OSS

    dev platform / CI/CD (GitHub, GitLab, Travis) 2 Automation Leverage your deployment infrastructure (CI/CD, Make) 3 Mix-match Use the OSS libraries you love and leverage cloud computing* @ixek
  23. MLOps step by step

    [Diagram labels: Data Scientist, SRE/ML Engineers, ENV #1, ENV #2, CI/CD Pipeline, Process, Train, Stage, Serve, Data, Distributed Cloud] @ixek
  24. MLOps step by step: First, I check in my code.

    [Diagram labels: Data Scientist, SRE/ML Engineers, ENV #1, ENV #2, CI/CD Pipeline, Process, Train, Stage, Serve, Data, Distributed Cloud] @ixek
  25. Version control @ixek

  26. MLOps step by step: That kicks off a CI/CD pipeline.

    [Diagram labels: Data Scientist, SRE/ML Engineers, ENV #1, ENV #2, CI/CD Pipeline, Process, Train, Stage, Serve, Data, Distributed Cloud] @ixek
  27. Kicking off CI/CD: push changes, GitHub Actions @ixek

  28. MLOps step by step: And now do a training run on the processed data.

    [Diagram labels: Data Scientist, SRE/ML Engineers, ENV #1, ENV #2, CI/CD Pipeline, Process, Train, Stage, Serve, Data, Distributed Cloud] @ixek
  29. Not only tests: you can leverage CI/CD to do the training or

    data processing @ixek
  30. MLOps step by step: Actually, I need to update the parameters.

    [Diagram labels: Data Scientist, SRE/ML Engineers, ENV #1, ENV #2, CI/CD Pipeline, Process, Train, Stage, Serve, Data, Distributed Cloud]
  31. Parameters update? No problem: check them in to version control @ixek
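Keeping parameters in a small file that lives in the repository means a parameter change is just another commit that can trigger the pipeline. The sketch below assumes a JSON file named `params.json`; the file name and the parameter names are hypothetical and not taken from the talk.

```python
import json
from pathlib import Path

# Hypothetical defaults; the real parameter names depend on your model.
DEFAULT_PARAMS = {"learning_rate": 0.1, "epochs": 10}


def parse_params(text: str) -> dict:
    """Merge JSON overrides into the defaults, rejecting unknown keys."""
    overrides = json.loads(text)
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        # Failing early catches typos before an expensive training run starts.
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    return {**DEFAULT_PARAMS, **overrides}


def load_params(path: str = "params.json") -> dict:
    """Read the version-controlled parameter file (assumed name: params.json)."""
    return parse_params(Path(path).read_text(encoding="utf-8"))
```

A training script would call `load_params()` at start-up, so the commit that changes `params.json` fully describes the run it produces.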

  32. Updated reporting: embed reports and metrics in your pull request

    @ixek
  33. Updated reporting: embed reports and metrics in your pull request

    @ixek
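Pull request comments on platforms such as GitHub render Markdown, so a training job can format its metrics as a Markdown table for the CI system to post. This is a minimal sketch of the formatting step only; the function name is hypothetical, and how the text gets posted depends on your CI tooling.

```python
from typing import Mapping


def render_metrics_table(metrics: Mapping[str, float]) -> str:
    """Render metric names and values as a Markdown table, sorted by name."""
    lines = ["| Metric | Value |", "| --- | --- |"]
    for name in sorted(metrics):
        lines.append(f"| {name} | {metrics[name]:.4f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values only; a real job would pass the metrics from its training run.
    print(render_metrics_table({"accuracy": 0.5, "specificity": 0.75}))
```

Sorting by name keeps the table stable between runs, which makes successive reports easier to compare.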
  34. MLOps step by step: Model is optimized and working! Let’s roll out to production.

    [Diagram labels: Data Scientist, SRE/ML Engineers, ENV #1, ENV #2, CI/CD Pipeline, Process, Train, Stage, Serve, Data, Distributed Cloud] @ixek
  35. MLOps step by step: Trigger the CI/CD pipeline one last time.

    [Diagram labels: Data Scientist, SRE/ML Engineers, ENV #1, ENV #2, CI/CD Pipeline, Process, Train, Stage, Serve, Data, Distributed Cloud] @ixek
  36. MLOps step by step: And roll out to the world!

    [Diagram labels: Data Scientist, SRE/ML Engineers, ENV #1, ENV #2, CI/CD Pipeline, Process, Train, Stage, Serve, Data, Distributed Cloud] @ixek
  37. But there is more @ixek

  38. But there is more @ixek

  39. But there is more @ixek

  40. In brief: MLOps allows you to be more efficient with

    the tools you use and love @ixek
  41. RECYCLE YOUR ECOSYSTEM 1 Collaboration Version control (Git, Mercurial) OSS

    dev platform / CI/CD (GitHub, GitLab, Travis) 2 Automation Leverage your deployment infrastructure (CI/CD, Make) 3 Mix-match Use the OSS libraries you love and leverage cloud computing* @ixek
  42. These slides https://bit.ly/mlops-pyconafrica

  43. Thanks! @ixek @trallard trallard.dev