DevOps for data science: automate the boring stuff and leverage the OSS ecosystem

Tania Allard

August 06, 2020

Transcript

  1. DevOps for Data Science?
    Automate the boring stuff and leverage the OSS ecosystem
    PyCon Africa – August 6th, 2020
    Tania Allard, PhD @ixek

  2. @ixek
    @trallard
    trallard.dev

  3. About Me
    I love Python
    I am also a GDE for Tensorflow
    I love mechanical keyboards
    My dog usually barks while I
    am giving online talks

  4. These slides
    https://bit.ly/mlops-pyconafrica

  5. Table of Contents
    01 Background: ML and Data Science in 2020
    02 What is even MLOps? And why you’d need it…
    03 MLOps 101: Getting started with MLOps

  6. background
    ML and Data Science in 2020
    01

  7. Where have we been?
    The Gartner hype cycle

  8. Data Scientist: It’s never been easier to run ML experiments
    ML engineer / SRE: Machine learning in production is hard y’all!
    Every team

  9. From the DS perspective
    ● Tools like scikit-learn and Keras make it easy to create models in a few lines
    ● Techniques like transfer learning make our lives easier
    ● More compute! All the GPUs!

  10. The new unicorn
    Must have: analytical skills, software engineering, programming, data engineering, data visualization
    Also must have: containerization, end-to-end ML pipeline, CI/CD/versioning, deep learning/NLP/etc., privacy and security

  11. MLOps
    What is it?
    02

  12. Where is my unicorn?
    A mythical data scientist who can code, write unit tests AND resist the lure of a deep neural network when logistic regression will do.

  13. The origin of DevOps
    Software developers: need to move and iterate fast
    Operations team: stability and availability of services are the priority

  14. DevOps is the union of people, process, and products to enable continuous delivery of value into production
    - Donovan Brown

  15. DevOps principles
    ● Automate: automate everything you can (data processing, model training)
    ● Feedback: get feedback on new ideas fast (test immediately)
    ● No manual handoffs: provide early testing opportunities

  16. Continuous integration – software engineering
    Code changes
    → Project source code in version control
    → Automated build
    → Quick testing
    → Based on test results: no waiting time*
    Automate, feedback, iterate
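
To make the "quick testing" step concrete, here is a minimal sketch of the kind of check a CI service could run on every pushed change. The cleaning function `drop_incomplete_rows` and the file name `test_cleaning.py` are hypothetical; the tests follow pytest's convention of plain `test_*` functions with bare `assert` statements, so running `pytest test_cleaning.py` locally or in CI would collect them.

```python
# test_cleaning.py (hypothetical file name)
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def drop_incomplete_rows(rows: List[Row], required: List[str]) -> List[Row]:
    """Hypothetical cleaning step: keep only rows where every required field is present."""
    return [row for row in rows if all(row.get(key) is not None for key in required)]


def test_drops_rows_with_missing_values():
    rows = [{"age": 31.0, "income": 52.0}, {"age": None, "income": 40.0}]
    assert drop_incomplete_rows(rows, ["age", "income"]) == [{"age": 31.0, "income": 52.0}]


def test_keeps_complete_rows_unchanged():
    rows = [{"age": 25.0, "income": 30.0}]
    assert drop_incomplete_rows(rows, ["age", "income"]) == rows


def test_missing_key_counts_as_incomplete():
    # A key that is absent entirely is treated the same as an explicit None.
    assert drop_incomplete_rows([{"age": 40.0}], ["age", "income"]) == []
```

Because the tests are fast and need no data download, they fit the "no waiting time" goal: a failing assertion shows up on the commit within seconds.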

  17. So what about ML?
    Technical considerations
    ● Reliance on metrics (e.g. accuracy, specificity)
    ● Data visualization
    ● Required domain knowledge
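
Because ML checks lean on metrics rather than pass/fail unit tests, the pipeline has to compute them. A minimal pure-Python sketch for binary labels, using the standard definitions accuracy = (TP + TN) / total and specificity = TN / (TN + FP); sensitivity (recall) is included for comparison. In a real project you would more likely use a library such as scikit-learn; the function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, specificity and sensitivity for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    y_true = [1, 0, 0, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
    print({name: round(value, 3) for name, value in binary_metrics(y_true, y_pred).items()})
    # -> {'accuracy': 0.75, 'specificity': 0.8, 'sensitivity': 0.667}
```

The example shows why a single number is not enough: the same predictions score 0.75 on accuracy but only 0.667 on sensitivity, which is where domain knowledge decides what "good" means.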

  18. More than ML code / model

  19. The origin of MLOps
    Data scientist:
    • Need to move and iterate fast
    • Use the frameworks I love
    • Scalable
    • Minimal wait: test, stage, production
    SRE/ML Engineers:
    • Reuse of tooling and platforms
    • Uptime
    • Monitoring
    • Reliability and stability

  20. Continuous integration – software engineering
    Code & data changes
    → Project source code in version control; data lineage
    → Automated training / data processing
    → Sought metrics
    → Improve model based on outputs/outcomes
    Automate, feedback, iterate
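
"Data lineage" means being able to say exactly which data a model was trained on. One lightweight way to approximate it, sketched below under the assumption that your datasets are plain files, is to record a SHA-256 content hash of every input file next to the code version. `code_version` is whatever identifier you pass in (typically the Git commit SHA from `git rev-parse HEAD`); the function names and the `lineage.json` file name are hypothetical. Dedicated tools such as DVC handle this more thoroughly.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Iterable, Union

PathLike = Union[str, Path]


def file_sha256(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lineage_record(data_files: Iterable[PathLike], code_version: str) -> Dict[str, object]:
    """Tie a training run to the exact code version and data contents it used."""
    return {
        "code_version": code_version,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "data": {path: file_sha256(path) for path in sorted(map(str, data_files))},
    }


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python lineage.py <commit-sha> data/train.csv data/test.csv > lineage.json
    print(json.dumps(lineage_record(sys.argv[2:], code_version=sys.argv[1]), indent=2))
```

If any byte of a data file changes, its hash changes, so committing `lineage.json` alongside the code makes "code & data changes" equally visible in version control.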

  21. Getting started
    101 MLOps
    03

  22. RECYCLE YOUR ECOSYSTEM
    1 Collaboration: version control (Git, Mercurial); OSS dev platform / CI/CD (GitHub, GitLab, Travis)
    2 Automation: leverage your deployment infrastructure (CI/CD, Make)
    3 Mix and match: use the OSS libraries you love and leverage cloud computing*
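
The automation point mentions Make. The core rule Make applies is simple: rebuild a target only if it is missing or older than any of its dependencies. A minimal sketch of that rule in Python, assuming file modification times are trustworthy; `is_stale`, `run_step` and the file names in the usage note are hypothetical, and in most projects writing an actual Makefile is the simpler choice.

```python
from pathlib import Path
from typing import Callable, Iterable, Union

PathLike = Union[str, Path]


def is_stale(target: PathLike, dependencies: Iterable[PathLike]) -> bool:
    """Make's rule: a target is stale if it is missing or older than any dependency."""
    target = Path(target)
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    # A missing dependency raises FileNotFoundError, much like Make refusing to build.
    return any(Path(dep).stat().st_mtime > target_mtime for dep in dependencies)


def run_step(target: PathLike, dependencies: Iterable[PathLike], action: Callable[[], None]) -> bool:
    """Run `action` only when `target` is stale; return True if it ran."""
    deps = list(dependencies)
    if is_stale(target, deps):
        action()
        return True
    return False
```

A processing step might then read `run_step("data/clean.csv", ["data/raw.csv", "process.py"], process_data)`, where `process_data` is a hypothetical function: editing either the raw data or the script re-runs the step, while an unchanged pipeline skips it.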

  23. MLOps step by step
    [Diagram: a Data Scientist (ENV #1) and SRE/ML Engineers (ENV #2) share one CI/CD pipeline (Process → Train → Stage → Serve) running on data and distributed cloud compute]
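
The pipeline in the diagram can be thought of as a chain of stages, each taking the previous stage's output, with the whole run stopping at the first failure. A minimal sketch, assuming each stage is a plain Python function over a dictionary; the four toy stages (including a "model" that is just a mean) are hypothetical stand-ins for real processing, training, staging and serving steps.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Stage = Tuple[str, Callable[[Context], Context]]


def run_pipeline(stages: List[Stage], context: Context) -> Context:
    """Run stages in order, failing fast and reporting which stage broke."""
    completed: List[str] = []
    for name, step in stages:
        try:
            # Shallow copy: stages never mutate the caller's top-level dict.
            context = step(dict(context))
        except Exception as exc:
            raise RuntimeError(f"stage {name!r} failed after {completed}") from exc
        completed.append(name)
    context["completed_stages"] = completed
    return context


# Hypothetical toy stages.
def process(ctx: Context) -> Context:
    ctx["clean"] = [x for x in ctx["raw"] if x is not None]
    return ctx


def train(ctx: Context) -> Context:
    ctx["model"] = {"mean": sum(ctx["clean"]) / len(ctx["clean"])}
    return ctx


def stage(ctx: Context) -> Context:
    ctx["approved"] = ctx["model"]["mean"] > 0  # stand-in for a real validation check
    return ctx


def serve(ctx: Context) -> Context:
    if not ctx["approved"]:
        raise ValueError("model was not approved in staging")
    ctx["serving"] = True
    return ctx


PIPELINE: List[Stage] = [("process", process), ("train", train), ("stage", stage), ("serve", serve)]
```

Calling `run_pipeline(PIPELINE, {"raw": [1.0, None, 3.0]})` runs all four stages, while `{"raw": [-1.0]}` stops with a `RuntimeError` naming the `serve` stage, so the person who pushed the change sees exactly where it broke.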

  24. MLOps step by step
    [Diagram: the MLOps CI/CD pipeline (Process → Train → Stage → Serve)]
    First, I check in my code.

  25. Version control

  26. MLOps step by step
    [Diagram: the MLOps CI/CD pipeline (Process → Train → Stage → Serve)]
    That kicks off a CI/CD pipeline.

  27. Kicking off CI/CD
    Push changes
    GitHub Actions
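
On GitHub Actions every job gets default environment variables such as `GITHUB_ACTIONS` (set to `true`), `GITHUB_EVENT_NAME` and `GITHUB_REF`, so a single script can decide how much of the pipeline a given push should trigger. A minimal sketch; the policy itself (which events train, which ones stage) and the step names are hypothetical, and the workflow file would simply call this script after checking out the code.

```python
import os
from typing import List, Mapping


def plan_ci_steps(env: Mapping[str, str]) -> List[str]:
    """Hypothetical policy: cheap checks always; training only when running on GitHub Actions."""
    steps = ["lint", "test"]
    if env.get("GITHUB_ACTIONS") != "true":
        return steps  # local run: keep the feedback loop fast
    event = env.get("GITHUB_EVENT_NAME", "")
    if event == "pull_request":
        steps += ["train", "report"]  # report metrics back on the pull request
    elif event == "push" and env.get("GITHUB_REF") == "refs/heads/main":
        steps += ["train", "stage"]  # merged to main: train and stage for release
    return steps


if __name__ == "__main__":
    print(" ".join(plan_ci_steps(os.environ)))
```

Taking the environment as a plain mapping keeps the policy testable without a CI runner: you can pass a dictionary that mimics a pull-request event and check which steps would run.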

  28. MLOps step by step
    [Diagram: the MLOps CI/CD pipeline (Process → Train → Stage → Serve)]
    And now do a training run on the processed data.

  29. Not only tests
    CI can also be leveraged to do the training or data processing
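
Since a CI job is just a machine running your commands, the same job that runs tests can run a small training step and save its metrics as an artifact. A minimal sketch, assuming a toy one-feature dataset and logistic regression fitted by plain batch gradient descent; the dataset, the `params` keys and the `metrics.json` file name are hypothetical, and heavier training would usually be handed off to dedicated or cloud compute.

```python
import json
import math
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical toy data: one feature, binary label.
XS: List[float] = [-2.0, -1.0, 1.0, 2.0]
YS: List[int] = [0, 0, 1, 1]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs: List[float], ys: List[int], learning_rate: float, epochs: int) -> Tuple[float, float]:
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


def run_training(params: Dict[str, float], metrics_path: Path) -> Dict[str, float]:
    w, b = train_logistic(XS, YS, params["learning_rate"], int(params["epochs"]))
    # Evaluated on the training data only to keep the sketch short.
    predictions = [int(sigmoid(w * x + b) >= 0.5) for x in XS]
    accuracy = sum(p == y for p, y in zip(predictions, YS)) / len(YS)
    metrics = {"accuracy": accuracy, "weight": w, "bias": b}
    metrics_path.write_text(json.dumps(metrics, indent=2))  # CI can upload this file as an artifact
    return metrics


if __name__ == "__main__":
    print(run_training({"learning_rate": 0.5, "epochs": 200}, Path("metrics.json")))
```

Writing metrics to a file rather than only printing them is the design choice that matters here: later pipeline steps (reporting, promotion) can read the file instead of re-running training.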

  30. MLOps step by step
    [Diagram: the MLOps CI/CD pipeline (Process → Train → Stage → Serve)]
    Actually, I need to update the parameters.

  31. Parameters update?
    No problem: check them into version control
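
Keeping hyperparameters in a small file under version control means a parameter change is an ordinary, reviewable commit that triggers the pipeline like any code change. A minimal sketch of loading such a file with defaults and basic validation; the file name `params.json`, the parameter names and `load_params` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union

# Defaults live in code; the versioned params.json only lists what you change.
DEFAULT_PARAMS: Dict[str, Any] = {"learning_rate": 0.5, "epochs": 200}


def load_params(path: Union[str, Path]) -> Dict[str, Any]:
    """Merge a versioned JSON params file over the defaults and validate the result."""
    overrides = json.loads(Path(path).read_text())
    unknown = sorted(set(overrides) - set(DEFAULT_PARAMS))
    if unknown:
        # Catch typos such as "learnig_rate" instead of silently ignoring them.
        raise ValueError(f"unknown parameters (typo?): {unknown}")
    params = {**DEFAULT_PARAMS, **overrides}
    if params["learning_rate"] <= 0:
        raise ValueError("learning_rate must be positive")
    if int(params["epochs"]) < 1:
        raise ValueError("epochs must be at least 1")
    return params
```

Rejecting unknown keys is deliberate: a misspelt parameter otherwise falls back to the default and the "updated" run silently trains with the old value.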

  32. Updated reporting
    Embed reports and metrics in your pull request
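
One way to embed metrics in a pull request is to have CI render them as a Markdown table and post it as a PR comment (for example with the GitHub CLI's `gh pr comment`, or a tool such as CML). A minimal sketch of the rendering part only; the function name `metrics_report`, the metric names and the idea of comparing against the `main` branch's last metrics are assumptions.

```python
from typing import Mapping


def metrics_report(current: Mapping[str, float], baseline: Mapping[str, float]) -> str:
    """Render a Markdown table comparing this branch's metrics with the baseline (e.g. main)."""
    lines = ["| Metric | main | this PR | Change |", "|---|---|---|---|"]
    for name in sorted(current):
        new = current[name]
        old = baseline.get(name)
        if old is None:
            lines.append(f"| {name} | n/a | {new:.3f} | new |")
        else:
            lines.append(f"| {name} | {old:.3f} | {new:.3f} | {new - old:+.3f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(metrics_report({"accuracy": 0.91, "specificity": 0.88}, {"accuracy": 0.89}))
    # prints:
    # | Metric | main | this PR | Change |
    # |---|---|---|---|
    # | accuracy | 0.890 | 0.910 | +0.020 |
    # | specificity | n/a | 0.880 | new |
```

Showing the signed change next to both values lets reviewers judge a model change the same way they judge a code diff.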

  34. MLOps step by step
    [Diagram: the MLOps CI/CD pipeline (Process → Train → Stage → Serve)]
    Model is optimized and working! Let’s roll out to production.
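
"Model is optimized and working" is easier to trust when it is an explicit check in the pipeline rather than a judgement call. A minimal sketch of a promotion gate, assuming metrics are plain dictionaries; `should_promote`, the choice of primary metric, the minimum gain and the guardrail floors are hypothetical policy knobs you would set for your own project.

```python
from typing import Mapping, Optional


def should_promote(
    candidate: Mapping[str, float],
    production: Optional[Mapping[str, float]],
    metric: str = "accuracy",
    min_gain: float = 0.0,
    floors: Optional[Mapping[str, float]] = None,
) -> bool:
    """Promote only if every guardrail floor is met and the primary metric improves enough."""
    for name, floor in (floors or {}).items():
        if candidate.get(name, float("-inf")) < floor:
            return False  # a guardrail metric is missing or too low
    if production is None:
        return True  # nothing deployed yet: the guardrails are the only bar
    return candidate[metric] >= production[metric] + min_gain
```

For example, a candidate with accuracy 0.91 beats a production model at 0.89 with `min_gain=0.01`, but the same candidate is still blocked if its specificity of 0.85 falls below a `floors={"specificity": 0.9}` guardrail.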

  35. MLOps step by step
    [Diagram: the MLOps CI/CD pipeline (Process → Train → Stage → Serve)]
    Trigger the CI/CD pipeline one last time.

  36. MLOps step by step
    [Diagram: the MLOps CI/CD pipeline (Process → Train → Stage → Serve)]
    And roll out to the world!
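
What "roll out to the world" looks like depends heavily on your platform, but at its core serving means wrapping the trained model in an HTTP endpoint. A minimal sketch using only the standard library's WSGI support (`wsgiref`); the hard-coded weight and bias stand in for a model artifact produced by the pipeline and are hypothetical, and a production setup would typically sit behind a proper WSGI server or a managed serving platform.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical model parameters; in practice, load the artifact the pipeline staged.
MODEL: Dict[str, float] = {"weight": 2.0, "bias": 0.0}

StartResponse = Callable[[str, List[Tuple[str, str]]], object]


def predict(x: float) -> int:
    """Logistic-regression decision rule: class 1 when w*x + b >= 0 (i.e. probability >= 0.5)."""
    return int(MODEL["weight"] * x + MODEL["bias"] >= 0)


def respond(start_response: StartResponse, status: str, payload: Dict[str, Any]) -> Iterable[bytes]:
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"), ("Content-Length", str(len(body)))])
    return [body]


def app(environ: Dict[str, Any], start_response: StartResponse) -> Iterable[bytes]:
    """WSGI app: POST {"x": <number>} and get back {"prediction": 0 or 1}."""
    if environ.get("REQUEST_METHOD") != "POST":
        return respond(start_response, "405 Method Not Allowed", {"error": "use POST"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        x = float(payload["x"])
    except (KeyError, TypeError, ValueError):  # JSONDecodeError is a ValueError
        return respond(start_response, "400 Bad Request", {"error": 'expected JSON like {"x": 1.5}'})
    return respond(start_response, "200 OK", {"prediction": predict(x)})


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Once the script is running, `curl -X POST -d '{"x": 1.5}' http://127.0.0.1:8000/` should return `{"prediction": 1}`. Keeping `app` a plain WSGI callable means the same code can later be handed to any WSGI-compatible server without changes.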

  37. But there is more

  40. In brief
    MLOps allows you to be more efficient with the
    tools you use and love

  41. RECYCLE YOUR ECOSYSTEM
    1 Collaboration: version control (Git, Mercurial); OSS dev platform / CI/CD (GitHub, GitLab, Travis)
    2 Automation: leverage your deployment infrastructure (CI/CD, Make)
    3 Mix and match: use the OSS libraries you love and leverage cloud computing*

  42. These slides
    https://bit.ly/mlops-pyconafrica

  43. Thanks!
    @ixek
    @trallard
    trallard.dev
