
Track Machine Learning Applications by MLflow Tracking

suci
September 05, 2020

Productization of machine learning (ML) solutions can be challenging. Therefore, the concept of operationalizing machine learning (MLOps) has emerged in the past few years for effective model lifecycle management. One of the core aspects of MLOps is "monitoring".

ML models are built by experimenting with a wide range of datasets. However, since real-world data continue to change, it is necessary to monitor and manage the usage, consumption, and results of models.

MLflow is an open-source framework designed to manage the end-to-end ML lifecycle with different components. This talk introduces the basic concepts of MLflow and then focuses on MLflow Tracking. You will learn how to track experiments to record and compare parameters and results with MLflow Tracking.


Transcript

  1. Track Machine Learning Applications
    by MLflow Tracking
    Shuhsi Lin @ PyConTW 2020

  2. About Me
    Shuhsi Lin
    Lurking in PyHug, Taipei.py and various Meetups
    Working in a manufacturing company, with data and people
    Focus on
    • Agile/Engineering culture
    • IoT applications
    • Streaming process
    • Data visualization
    sucitw gmail.com
    https://medium.com/@suci/
    2

  3. Agenda - What we will focus on
    ML Life Cycle
    MLOps
    Logging & ML monitoring
    Logging in the ML Life Cycle
    MLflow Tracking: Track ML
    ● Basic logging
    ● Model logging
    ● Auto-logging
    3

  4. What we will not focus on
    1. Details of ML Platform/ MLOps
    ○ Deployment/Operation/Administration
    2. MLflow Projects/Models/Model Registry
    3. Infrastructure Details
    4. Comparison of Different Tools
    5. Machine Learning Algorithms or Frameworks
    4

  5. Logging
    Everything
    5

  6. Why Logging is Important
    Stakeholder
    ● Auditing for business
    ● Product improvement from log statistics
    End-User
    ● Self-troubleshooting
    Developer
    ● Profiling for performance
    ● Debugging
    Sysadmin
    ● Stability monitoring
    ● Troubleshooting
    Security
    ● Auditing for security
    (Business / Analytics / Problem solving)
    6

  7. What is Logging
    in Machine Learning
    7

  8. ML Life Cycle
    8

  9. ML isn't just code
    9

  10. MLOps: Continuous delivery and automation pipelines in machine learning
    (originally adapted from Hidden Technical Debt in Machine Learning Systems)
    Elements for ML systems
    10

  11. ML Life Cycle
    [Diagram: phases of the ML life cycle across DEV and PRD, with feedback loops for "retrain and re-tuning" and "new model development / update features"]
    Exploratory Analysis (Dev/Analysis)
    ● Data collection
    ● ETL
    ● Selection
    Development
    ● Feature engineering
    ● Training
    ● Evaluation
    ● Validation
    ● Versioned model
    Deployment
    ● Deployment pipeline (batch + realtime)
    Delivery
    ● Audit
    ● Score/Serve
    User Interface
    ● Dashboard
    ● Recommendation
    ● Interdiction
    ● ...
    Management (Operation)
    ● Model registry
    ● Monitor
    ● Alert
    ● Debug
    ● Feedback
    ● Resource manage
    ● ...
    Inspired from "Enabling Scalable Data Science Pipelines with MLflow and Model Registry at Thermo Fisher Scientific"
    11

  12. The Maturity of the MLOps Process
    ● Level 0: Manual process
    ● Level 1: ML pipeline automation
    ● Level 2: CI/CD pipeline automation
    MLOps: DevOps principles applied to ML systems
    ● MLOps is an ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operation (Ops).
    ● Practicing MLOps means that you advocate for automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment and infrastructure management.
    MLOps: Continuous delivery and automation pipelines in machine learning
    12

  13. MLOps level 0: Manual process
    MLOps: Continuous delivery and automation pipelines in machine learning
    13

  14. MLOps level 1: ML pipeline automation
    DEV / PRD
    ML OPS
    MLOps: Continuous delivery and automation pipelines in machine learning
    14

  15. MLOps level 2: CI/CD pipeline automation
    CI / CD / CD
    MLOps: Continuous delivery and automation pipelines in machine learning
    15

  16. Experiment Tracking
    Model development & post-deployment:
    ○ Prove the value of an experiment
    ■ Need a baseline to show and compare against
    ○ Collaborate
    ■ Need to refer to and access models and artifacts from other members
    ○ Reproduce work
    ■ Need the same parameters and model as a previous run
    16

  17. What we should log/track in ML
    Log day-to-day work in the ML life cycle:
    ○ Hyperparameters
    ○ Training/modeling performance
    ○ Model
    ● Type
    ● Building environment
    ● Modeling version
    ○ and so on
    Parameters
    ● Convolutional filter
    ● Kernel_size
    ● Max pooling
    ● Dropout
    ● Dense
    ● Batch_size
    ● Epochs
    ● ...
    Evaluation metrics
    ● Mean Absolute Error (MAE)
    ● Mean Squared Error (MSE)
    ● Root Mean Squared Error (RMSE)
    ● R-squared (r2)
    ● ...
    17
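
The items on the slide above map directly onto MLflow's logging calls. A minimal sketch of logging hyperparameters and evaluation metrics (the Ridge model and diabetes dataset are stand-ins for illustration, not the talk's demo code):

import mlflow
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Placeholder model/data; only the logging calls matter here.
X_train, X_val, y_train, y_val = train_test_split(
    *load_diabetes(return_X_y=True), random_state=0)

with mlflow.start_run():
    params = {"alpha": 0.5}                              # hyperparameters
    model = Ridge(**params).fit(X_train, y_train)
    mlflow.log_params(params)

    pred = model.predict(X_val)                          # evaluation metrics
    mse = mean_squared_error(y_val, pred)
    mlflow.log_metric("MAE", mean_absolute_error(y_val, pred))
    mlflow.log_metric("MSE", mse)
    mlflow.log_metric("RMSE", mse ** 0.5)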

  18. ML is complex
    and needs to be tracked
    18

  19. MLflow
    ● Developed since 2018 by Databricks (main contributor)
    ● An open platform for the machine learning lifecycle
    ● Python library; runs locally and on the cloud
    ● Built-in UI for experiment visualization
    ● Logging integrations for major frameworks: scikit-learn, PyTorch, TensorFlow, ...
    https://github.com/mlflow
    19

  20. Collaboration with MLflow
    20

  21. Components
    Main focus of this talk!
    21

  22. Similar Tools
    ● Neptune (commercial)
    ● TensorBoard (+ mlflow.tensorflow)
    ● TorchServe (+ mlflow.pytorch)
    ● Kubeflow (Meta)
    ● Data Science Workbench
    ● ...
    22

  23. TorchServe
    https://aws.amazon.com/tw/blogs/machine-learning/deploying-pytorch-models-for-inference-at-scale-using-torchserve/
    Facebook + AWS
    23

  24. Kubeflow
    https://www.kubeflow.org/docs/started/kubeflow-overview/
    Tracking and managing metadata of
    machine learning workflows in Kubeflow
    24

  25. Getting Started with MLflow Tracking
    25

  26. 26

  27. Tracking APIs (REST, Python, Java, R)
    Experiment and metric tracking
    Experiment/Production pipeline
    Artifacts
    27
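
Besides the Python API used in the rest of this deck, the tracking server also exposes the REST API mentioned above. A rough sketch via the requests library, assuming the demo server at localhost:5000 and the default experiment (id "0"); treat the exact payload fields as an assumption to verify against the MLflow REST API reference:

import time
import requests

BASE = "http://localhost:5000/api/2.0/mlflow"

# Create a run in the default experiment, then log a parameter and a metric.
resp = requests.post(BASE + "/runs/create",
                     json={"experiment_id": "0",
                           "start_time": int(time.time() * 1000)})
run_id = resp.json()["run"]["info"]["run_id"]

requests.post(BASE + "/runs/log-parameter",
              json={"run_id": run_id, "key": "lr", "value": "0.01"})
requests.post(BASE + "/runs/log-metric",
              json={"run_id": run_id, "key": "rmse", "value": 0.42,
                    "timestamp": int(time.time() * 1000), "step": 0})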

  28. Terminology
    Experiments group many Runs.
    Run entity
    ● Code version
    ● Start and end time
    ● Source
    ● (Hyper)parameters
    ● Metrics
    ● Tags/Notes
    Backend stores
    ● File store
    ● Database
    Artifacts (output files)
    a. Images
    b. Pickled models
    c. Data files...
    Artifact/file storage
    ● Amazon S3
    ● Azure Blob Storage
    ● Google Cloud Storage
    ● FTP server
    ● SFTP server
    ● NFS
    ● HDFS
    Project
    28
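
The terminology above maps onto the low-level client API. A minimal sketch (the experiment name and values are made up; with no MLFLOW_TRACKING_URI set, everything lands in a local ./mlruns file store, i.e. the default backend store):

import os
from mlflow.tracking import MlflowClient

client = MlflowClient()
exp_id = client.create_experiment("terminology-demo")     # an Experiment
run = client.create_run(exp_id)                            # a Run inside it

client.log_param(run.info.run_id, "lr", 0.01)              # (hyper)parameter
client.log_metric(run.info.run_id, "rmse", 0.42)           # metric
client.set_tag(run.info.run_id, "note", "demo run")        # tag/note

os.makedirs("outputs", exist_ok=True)                      # an artifact file
with open("outputs/info.txt", "w") as f:
    f.write("artifacts go to the artifact store")
client.log_artifact(run.info.run_id, "outputs/info.txt")

client.set_terminated(run.info.run_id)                     # records end time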

  29. [Workflow diagram] Setup & initialize the MLflow experiment; start a run; log (hyper)parameters, metrics, artifacts and the model across data preparation, training & inference, evaluation and post-analysis; feedback loop to compare runs and tune parameters/models.
    29

  30. MLflow Tracking for ML Development
    Tracking API
    ● start_run()
    ● log_param()
    ● log_metric()
    ● log_artifact()
    ● log_model()
    ● end_run()
    Output
    ● Parameters
    ● Metrics
    ● Output files (artifacts)
    ● Model
    ● Code version
    ● ...
    30

  31. Demo Examples: Tracking
    https://github.com/sucitw/mlflow_tracking
    31

  32. Example Architecture
    A Docker container runs the tracking server (flask:5000) with a backend store and an artifacts store on a shared file system; clients log metrics through the Tracking API.
    docker run -d -p 5000:5000 \
      -v /tmp/artifactStore:/tmp/mlflow/artifactStore \
      --name mlflow-tracking-server \
      suci/mlflow-tracking
    Inside the container, the server is started with:
    mlflow server \
      --backend-store-uri $FILE_STORE \
      --default-artifact-root $ARTIFACT_STORE \
      --host $SERVER_HOST \
      --port $SERVER_PORT
    https://github.com/sucitw/mlflow_tracking
    32

  33. Tracking - Setup & Initialize
    # Setup & initialize the MLflow experiment
    import mlflow

    experiment_name = "PyconTW 2020 Demo "
    tracking_server = "http://localhost:5000"
    mlflow.set_tracking_uri(tracking_server)
    mlflow.set_experiment(experiment_name)

    # System env settings used when starting the tracking server
    # --backend-store-uri      $FILE_STORE
    # --default-artifact-root  $ARTIFACT_STORE
    # --host                   $SERVER_HOST
    # --port                   $SERVER_PORT
    33

  34. Basic Logging
    34

  35. Tracking - Run
    import os
    from random import random, randint

    import mlflow
    from mlflow import log_param, log_metric, log_artifacts

    with mlflow.start_run() as run:
        # Log a parameter (key-value pair)
        log_param("param1", randint(0, 100))

        # Log a metric; metrics can be updated throughout the run
        log_metric("metricsA", random())
        log_metric("metricsA", random() + 1)
        log_metric("metricsA", random() + 2)
        log_metric("metricsB", random() + 2)

        # Log an artifact (output file)
        os.makedirs("outputs", exist_ok=True)
        with open("outputs/test.txt", "w") as f:
            f.write("hello world! Run id: {}".format(run.info.run_id))
        log_artifacts("outputs")
    Steps: Train & Inference -> log (hyper)parameters -> log metrics -> log artifacts
    35

  36. Tracking UI
    36

  37. run
    experiment
    37

  38. Compare Two Runs
    Run A Run B
    38

  39. Compare Two Runs
    39
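
Beyond the UI comparison shown above, runs can also be compared programmatically with the search API. A hedged sketch against the demo experiment from slide 33 (assumes the tracking server is running and the experiment name matches, including the trailing space):

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
exp = mlflow.get_experiment_by_name("PyconTW 2020 Demo ")

# One row per run, as a pandas DataFrame, best metricsA first.
runs = mlflow.search_runs(experiment_ids=[exp.experiment_id],
                          order_by=["metrics.metricsA DESC"])
print(runs[["run_id", "params.param1", "metrics.metricsA"]].head())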

  40. Model Logging
    40

  41. Model
    Log -> Load -> Deploy
    ● mlflow.<flavor>.log_model(model, ...)
    ● mlflow.<flavor>.load_model(modelpath)
    ● mlflow.<flavor>.deploy()
    e.g. mlflow.sklearn.log_model(lr, "model")
    Built-In Model Flavors
    ● Python Function (python_function)
    ● R Function (crate)
    ● H2O (h2o)
    ● Keras (keras)
    ● MLeap (mleap)
    ● PyTorch (pytorch)
    ● Scikit-learn (sklearn)
    ● Spark MLlib (spark)
    ● TensorFlow (tensorflow)
    ● ONNX (onnx)
    ● MXNet Gluon (gluon)
    ● XGBoost (xgboost)
    ● LightGBM (lightgbm)
    ● spaCy (spacy)
    ● fastai (fastai)
    41
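
A minimal sketch of the log -> load half of that cycle with the sklearn flavor (the LinearRegression model and toy data are placeholders; deployment itself is out of scope here):

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

# Fit a toy model, log it, then load it back through a runs:/ URI.
lr = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(lr, "model")       # stored as a run artifact

loaded = mlflow.sklearn.load_model("runs:/{}/model".format(run.info.run_id))
print(loaded.predict([[3.0]]))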

  42. https://github.com/sucitw/mlflow_tracking/blob/master/pytw2020_demo_SK_model.py
    42

  43. https://github.com/sucitw/mlflow_tracking/blob/master/pytw2020_demo_SK_model.py
    43

  44. Automatic Logging
    44

  45. Automated MLflow Tracking
    https://www.mlflow.org/docs/latest/tracking.html#automatic-logging
    45

  46. Auto-Logging - TF
    # Manual logging
    import mlflow

    mlflow.log_param("layers", layers)
    model = train_model()
    mlflow.log_metric("mse", model.mse())
    mlflow.log_artifact("plot", plot(model))
    mlflow.tensorflow.log_model(model)

    # With autologging (captures TensorBoard metrics)
    import mlflow

    mlflow.tensorflow.autolog()
    model = train_model()
    https://cs.stanford.edu/people/matei/papers/2020/deem_mlflow.pdf
    46
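
The same idea applies to other flavors; for example, a hedged sketch with scikit-learn autologging (assumes an MLflow release recent enough to provide mlflow.sklearn.autolog()):

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

# autolog() patches scikit-learn so fit() records parameters, training
# metrics and the fitted model without explicit log_* calls.
mlflow.sklearn.autolog()

X, y = load_diabetes(return_X_y=True)
with mlflow.start_run():
    Ridge(alpha=0.5).fit(X, y)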

  47. Recap
    47

  48. ML is COMPLEX and needs to be tracked
    Elements for ML systems
    MLOps: Continuous delivery and automation pipelines in machine learning
    (originally adapted from Hidden Technical Debt in Machine Learning Systems)
    48

  49. What can help: managing the machine learning lifecycle with (experiment) tracking
    Easy to use in the same way
    ● Remotely/cloud or locally
    ● For individuals, teams, or large orgs
    Tracking metrics
    ● Simplified tracking for ML models means faster time to insights and value
    ● Integrated with popular ML libraries & languages
    Model management
    ● The launch of the model registry enhances governance and the core proposition of model management.
    49

  50. More
    ● MLflow Tracking: experiment tracking
    ● MLflow Projects
    ● MLflow Models: model deployment/serving (e.g. Azure Machine Learning, RedisAI)
    ● MLflow Model Registry: model governance
    ● MLflow Deployment
    50

  51. ● MLflow official Doc
    ● MLflow tracking
    ● Learning MLflow
    ○ 2020_Workshop | Managing the Complete Machine Learning Lifecycle with MLflow (DataBricks)
    ● MLOps: Continuous delivery and automation pipelines in machine learning
    ● 2020 Spark Summit: Enabling Scalable Data Science Pipelines with MLflow and Model
    Registry at Thermo Fisher Scientific
    ● 2020 DEEM workshop: Developments in MLflow: A System to Accelerate the Machine
    Learning Lifecycle
    ● Example codes of this talk (Github)
    Reference
    51

  52. CREDITS: This presentation template was created by Slidesgo,
    including icons by Flaticon, and infographics & images by Freepik
    Thanks!
    52
