Continuous Delivery for Machine Learning Systems - DevOpsDays Warsaw

Adarsh Shah
November 19, 2020

A Machine Learning workflow includes data management, experiment management (model training and development), model deployment, serving, and retraining. Training a model takes hours, sometimes days, and typically involves a large dataset. Training and serving a model also require specialized resources such as high-density cores and GPUs.

In this talk, we will look at what Continuous Delivery for Machine Learning looks like, using anecdotes, and at how to use cloud-native technologies to perform the various steps in a Machine Learning workflow. We will also talk about how it differs from deploying other software and which aspects to consider, and look at the different tools available to enable Continuous Delivery for machine learning.

Transcript

  1. Continuous Delivery for
    Machine Learning Systems
    Deploying ML Systems to
    Production safely and quickly
    in a sustainable way
    Adarsh Shah
    Engineering Leader, Coach, Hands-on Architect
    Independent Consultant
    @shahadarsh
    https://shahadarsh.com
    Deck: http://bit.ly/ml-dod-pl

  2. Hidden Technical Debt in ML Systems
    From the paper Hidden Technical Debt in Machine Learning Systems

  3. [Diagram] Traditional Software Development vs. Machine Learning
    Traditional Software Development: Program + Data -> Results
    Machine Learning (Training): Training Data + Desired Results -> Model (Program)
    Machine Learning (Prediction): Model (Program) + Live Data -> Results
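
The contrast on this slide can be sketched in a few lines. This is only an illustration under assumed names: the spam example, the functions `rule_based_is_spam`, `train` and `predict`, and the sample data are all hypothetical and do not come from the talk. In the traditional case a person writes the rule; in the ML case the rule is derived from training data plus desired results, then applied to live data.

```python
from statistics import mean


def rule_based_is_spam(num_links: int) -> bool:
    """Traditional software: a person writes the rule and picks the cutoff."""
    return num_links > 3


def train(examples: list[tuple[int, bool]]) -> float:
    """Training: derive the cutoff from training data plus desired results.

    The "model" here is one number, halfway between the two class means.
    """
    spam = [links for links, is_spam in examples if is_spam]
    ham = [links for links, is_spam in examples if not is_spam]
    return (mean(spam) + mean(ham)) / 2


def predict(model: float, num_links: int) -> bool:
    """Prediction: apply the trained model to live data."""
    return num_links > model


# Hypothetical labelled examples: (number of links, desired result).
training_data = [(0, False), (1, False), (2, False), (6, True), (8, True), (10, True)]
model = train(training_data)
```

Changing `training_data` changes `model` without any code change, which is why the data needs the same care in versioning and testing as the code.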

  4. [Diagram] Machine Learning workflow across three phases
    Data Management: Data Acquisition, Data Preparation
    Experimentation: Model Development, Training, Accuracy Evaluation
    Production Deployment: Validation, Prediction, Monitoring / Alerting
    Arrow labels: Accuracy not reached, Accuracy reached, Retrain, Data Drift, Fix
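
The "Data Drift" and "Retrain" labels in this workflow suggest a monitoring check that compares live data against the training data. A minimal sketch of one such check follows. The metric (shift of the live mean, measured in training standard deviations), the function names, and the `max_shift` threshold are assumptions chosen for illustration; the talk does not prescribe a specific drift test.

```python
from statistics import mean, stdev


def drift_score(training_values: list[float], live_values: list[float]) -> float:
    """Shift of the live mean from the training mean, in training standard deviations."""
    return abs(mean(live_values) - mean(training_values)) / stdev(training_values)


def needs_retraining(
    training_values: list[float],
    live_values: list[float],
    max_shift: float = 1.0,  # hypothetical alerting threshold
) -> bool:
    """Flag the model for retraining when a feature has drifted too far."""
    return drift_score(training_values, live_values) > max_shift


# Hypothetical values of one numeric feature.
training_feature = [10, 12, 11, 13, 14]
live_feature_stable = [11, 12, 13]
live_feature_shifted = [18, 19, 20]
```

In a real pipeline the result of `needs_retraining` would feed the alerting system or trigger the training pipeline rather than be inspected by hand.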

  5. Challenges Unique to ML

  6. #1: Data Management
    Data Location
    Large Datasets
    Security
    Compliance
    Data Quality
    Tracking Dataset

  7. #2: Experimentation
    Code Quality
    Research & Experimentation
    Tracking experiments
    Training Time & Troubleshooting
    Infrastructure Requirements
    Model Accuracy Evaluation

  8. #3: Production Deployment
    Offline/Online Prediction
    Monitoring & Alerting

  9. What is Continuous Delivery?
    Continuous Delivery is the ability to get changes of all
    types—including new features, configuration changes,
    bug fixes and experiments—into production, or into the
    hands of users, safely and quickly in a sustainable
    way.
    - Jez Humble & Dave Farley (Continuous Delivery Book Authors)

  10. Continuous Delivery

  11. Continuous Integration
    Continuous Integration is a software development
    practice where members of a team integrate their work
    frequently, usually each person integrates at least daily -
    leading to multiple integrations per day.
    - Martin Fowler

  12. Principles of Continuous Delivery
    ๏ Build quality in
    ๏ Work in small batches
    ๏ Computers perform repetitive tasks, people solve problems
    ๏ Relentlessly pursue continuous improvement (Kaizen)
    ๏ Everyone is responsible

  13. [Diagram] Data pipeline
    Data Source A -> Data Acquisition A -> Data Validation A -> Data Preparation A
    Data Source B -> Data Acquisition B -> Data Validation B -> Data Preparation B
    Data Source C -> Data Acquisition C -> Data Validation C -> Data Preparation C
    Versioned Training Dataset -> Training Process
    Testing: Bias & Fairness, Security & Compliance
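
Two steps of this pipeline, data validation and a versioned training dataset, can be sketched with the standard library alone. The schema in `REQUIRED_FIELDS`, the function names, and the choice of a truncated SHA-256 content hash as the version are assumptions made for illustration; the slide does not name a tool or a versioning scheme.

```python
import hashlib
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"age": int, "income": float}


def validate(rows: list[dict]) -> list[dict]:
    """Keep only rows that have every required field with the expected type."""
    return [
        row
        for row in rows
        if all(isinstance(row.get(name), kind) for name, kind in REQUIRED_FIELDS.items())
    ]


def dataset_version(rows: list[dict]) -> str:
    """Content hash of the dataset: identical data always yields the same version."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


# Hypothetical raw records from one data source; the second row is malformed.
raw_rows = [
    {"age": 30, "income": 1000.0},
    {"age": "unknown", "income": 5.0},
    {"age": 41, "income": 2500.5},
]
clean_rows = validate(raw_rows)
version = dataset_version(clean_rows)
```

Recording `version` alongside each trained model makes it possible to tell which dataset produced which model. A real pipeline would also report rejected rows instead of dropping them silently.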

  14. [Diagram] Continuous Integration (Training Code)
    Input: Training Code
    Stages shown: Static Analysis (Linting etc.), Unit Tests, Build Artifact, Artifact Repository, Dev Environment, Validation Tests, Merge to Main Branch
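
As an illustration of the "Unit Tests" stage for training code, here is a small sketch of a preprocessing helper with two plain-assert tests. The helper `min_max_scale` and the test names are hypothetical; the talk does not show any training code. The tests are written as plain functions so that a runner such as pytest could collect them, though nothing here depends on one.

```python
def min_max_scale(values: list[float]) -> list[float]:
    """Scale values linearly into [0, 1]; a constant column maps to all zeros."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


def test_scales_into_unit_range() -> None:
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_constant_column_does_not_divide_by_zero() -> None:
    assert min_max_scale([7, 7, 7]) == [0.0, 0.0, 0.0]
```

Tests like these are fast and need no dataset or GPU, so they can run on every commit before the slower validation tests in the dev environment.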

  15. [Diagram] Training
    Inputs: Data Pipeline (Training Dataset), Continuous Integration (Training Code), Configuration
    Trigger (Data Scientist)
    Training Environment (Automated Provisioning/De-provisioning): Training, Accuracy Evaluation, Testing (Bias & Fairness)
    Monitoring/Alerting, Log Aggregation
    Output: Model
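
The "Accuracy Evaluation" and "Testing (Bias & Fairness)" steps can be combined into an automated gate that decides whether a trained model moves forward. The sketch below is one possible form of such a gate, not the method from the talk. The function names, the thresholds `min_accuracy` and `max_group_gap`, and the use of the per-group accuracy gap as a fairness signal are all assumptions.

```python
def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def evaluate_candidate(
    predictions: list[int],
    labels: list[int],
    groups: list[str],
    min_accuracy: float = 0.8,   # hypothetical quality bar
    max_group_gap: float = 0.1,  # hypothetical fairness bar
) -> dict:
    """Promote a model only if it is accurate overall and similar across groups."""
    overall = accuracy(predictions, labels)
    per_group = {}
    for group in set(groups):
        indexes = [i for i, g in enumerate(groups) if g == group]
        per_group[group] = accuracy(
            [predictions[i] for i in indexes], [labels[i] for i in indexes]
        )
    gap = max(per_group.values()) - min(per_group.values())
    return {
        "accuracy": overall,
        "group_gap": gap,
        "promote": overall >= min_accuracy and gap <= max_group_gap,
    }


# Hypothetical evaluation set: group "b" gets one wrong prediction.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
skewed = evaluate_candidate([1, 0, 1, 0, 1, 0, 1, 1], labels, groups)
perfect = evaluate_candidate(labels, labels, groups)
```

Here `skewed` passes the overall accuracy bar but is held back because accuracy differs too much between the two groups, which shows why the fairness test is a separate check from accuracy.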

  16. [Diagram] Continuous Integration (Application Code)
    Inputs: Application Code, Model (from Training)
    Stages shown: Static Analysis (Linting, Security Scan etc.), Unit Tests, Build Artifact, Artifact Repository, Ephemeral Environment, Integration Tests, Tag as Tested

  17. [Diagram] Bringing it all together
    Data Management: Data Pipeline, Training Dataset
    Experimentation: Data Scientist, Continuous Integration (Training Code), Configuration, Training, Model
    Production Deployment: Application Developer, Continuous Integration (Application Code), Deployment, Production Environment, Smoke Tests, Monitoring/Alerting
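
The "Smoke Tests" step after deployment can be as small as sending one known input to the deployed model and checking that the answer is well-formed. The sketch below assumes the model is reachable through a callable `predict` that returns a probability-like float. That interface, the function name `smoke_test`, and the sample input are hypothetical, since the talk does not specify a serving API.

```python
from typing import Callable


def smoke_test(predict: Callable[[dict], float], sample_input: dict) -> list[str]:
    """Return a list of failures; an empty list means the deployment responds sanely."""
    try:
        score = predict(sample_input)
    except Exception as exc:  # any error at all means the deployment is unhealthy
        return [f"prediction raised {exc!r}"]
    if not isinstance(score, float):
        return [f"expected a float score, got {type(score).__name__}"]
    if not 0.0 <= score <= 1.0:
        return [f"score {score} is outside [0, 1]"]
    return []


# Hypothetical known-good input used only to check that the service is alive.
SAMPLE_INPUT = {"age": 30, "income": 1000.0}
```

A smoke test does not check that predictions are accurate, only that the deployed model answers at all; accuracy in production is the job of monitoring and alerting.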

  18. Machine Learning Roles
    ML Researcher
    ML Engineer
    Data Engineer
    MLOps Engineer

  19. Team Structure Considerations
    Cross Functional Team
    Separate Data Science Team
    ML Platform Engineering Team

  20. References
    • continuousdelivery.com
    • Dr. Deming’s 14 Points for Management
    • Challenges Deploying Machine Learning Models to Production
    • State of DevOps Report
    • martinfowler.com
    • Large image datasets: A pyrrhic win for computer vision?

  21. Book Recommendations

  22. Adarsh Shah
    Engineering Leader, Coach, Hands-on Architect
    Independent Consultant
    @shahadarsh
    https://shahadarsh.com