Track Machine Learning Applications by MLflow Tracking

suci
September 05, 2020


Productization of machine learning (ML) solutions can be challenging. Therefore, the concept of machine learning operations (MLOps) has emerged in the past few years for effective model lifecycle management. One of the core aspects of MLOps is monitoring.

ML models are built by experimenting with a wide range of datasets. However, because real-world data keeps changing, it is necessary to monitor and manage the usage, consumption, and results of models.

MLflow is an open-source framework designed to manage the end-to-end ML lifecycle through several components. This talk first introduces the basic concepts of MLflow and then focuses on MLflow Tracking. You will learn how to use MLflow Tracking to record and compare the parameters and results of your experiments.


Transcript

  1. Track Machine Learning Applications by MLflow Tracking. Shuhsi Lin @ PyConTW 2020
  2. About Me: Shuhsi Lin. Lurking in PyHug, Taipei.py and various meetups. Working in a manufacturing company, with data and people. Focus on: • Agile/engineering culture • IoT applications • Streaming process • Data visualization. Contact: sucitw gmail.com, https://medium.com/@suci/ 2
  3. Agenda (what we will focus on): • Logging & ML monitoring • ML life cycle and MLOps • MLflow Tracking: track ML with ◦ Basic logging ◦ Model logging ◦ Auto-logging 3
  4. What we will not focus on 1. Details of ML

    Platform/ MLOps ◦ Deployment/Operation/Administration 2. MLflow Projects/Models/Model Registry 3. Infrastructure Details 4. Comparison of Different Tools 5. Machine Learning Algorithms or Frameworks 4
  5. Logging Everything 5

  6. Why Logging is important Stakeholder • Auditing for business •

    Product improvement from log statistics End-User • Self-Troubleshooting Developer • Profiling for performance • Debugging Sysadmin • Stability monitoring • Troubleshooting Security • Auditing for security Business Analytics Problem Solving 6
  7. What is Logging in Machine Learning 7

  8. ML Life Cycle 8

  10. ML isn't just code 9

  10. MLOps: Continuous delivery and automation pipelines in machine learning (originally

    adapted from Hidden Technical Debt in Machine Learning Systems) Elements for ML systems 10
  11. ML Life Cycle (stages: Exploratory Analysis, Development, Deployment, Delivery, Management). Dev/Analysis: • Data collection • ETL • Feature engineering • Selection • Training • Evaluation • Validation • Versioned model. Deployment pipeline (batch + realtime): • Audit • Score/Serve. Operation: • Model registry • Monitor • Alert • Debug • Feedback • Resource management • ... User interface: • Dashboard • Recommendation • Interdiction • ... (DEV/PRD). Feedback loops: retrain and re-tuning; new model development/update features. Inspired from "Enabling Scalable Data Science Pipelines with MLflow and Model Registry at Thermo Fisher Scientific" 11
  12. The maturity of the MLOps process (from "MLOps: Continuous delivery and automation pipelines in machine learning"): level 0 = manual process, level 1 = ML pipeline automation, level 2 = CI/CD pipeline automation. • MLOps is an ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operation (Ops). • Practicing MLOps means that you advocate for automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment and infrastructure management. MLOps applies DevOps principles to ML systems 12
  13. MLOps level 0: Manual process MLOps: Continuous delivery and automation

    pipelines in machine learning 13
  14. DEV PRD MLOps level 1: ML pipeline automation ML OPS

    MLOps: Continuous delivery and automation pipelines in machine learning 14
  15. MLOps level 2: CI/CD pipeline automation MLOPS MLOps: Continuous delivery

    and automation pipelines in machine learning CI CD CD 15
  16. Experiment Tracking: model development & post-deployment ◦ Prove the value of an experiment ▪ Need a baseline to show and compare ◦ Collaborate ▪ Need to refer to and access models and artifacts from other members ◦ Reproduce work ▪ Need the same parameters and model as a previous run 16
  17. What we should log/track in ML: log day-to-day work in the ML life cycle ◦ Hyperparameters ◦ Training/modeling performance ◦ Model • Type • Building environment • Model version ◦ and so on. Example parameters: • Convolutional filter • Kernel_size • Max pooling • Dropout • Dense • Batch_size • Epochs • ... Evaluation metrics: • Mean Absolute Error (MAE) • Mean Squared Error (MSE) • Root Mean Squared Error (RMSE) • R-squared (R2) • ... (a small sketch of these items follows below) 17
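As a concrete illustration of the items above (not from the slides), here is a minimal sketch that collects a few hyperparameters in a dict and computes the listed evaluation metrics with scikit-learn; the parameter names and toy values are hypothetical.

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    # Hypothetical hyperparameters we would want to track for a CNN-style model
    params = {"kernel_size": 3, "dropout": 0.5, "batch_size": 32, "epochs": 10}

    # Toy predictions, just to show how the evaluation metrics are computed
    y_true = np.array([3.0, -0.5, 2.0, 7.0])
    y_pred = np.array([2.5, 0.0, 2.0, 8.0])

    metrics = {
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mean_squared_error(y_true, y_pred),
        "rmse": np.sqrt(mean_squared_error(y_true, y_pred)),
        "r2": r2_score(y_true, y_pred),
    }
    print(params, metrics)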
  18. ML is complex and needs to be tracked 18

  19. MLflow: • Open source since 2018, from Databricks (main contributor) • An open platform for the machine learning lifecycle • Python library; runs locally and in the cloud • Built-in UI for experiment visualization • Logging integrations for major frameworks: scikit-learn, PyTorch, TensorFlow, ... https://github.com/mlflow 19
  20. Collaboration with MLflow 20

  21. Components Main focus of this sharing! 21

  22. Similar tools: • Neptune (commercial) • TensorBoard (+ mlflow.tensorflow) • TorchServe (+ mlflow.pytorch) • Kubeflow (Meta) • Data Science Workbench • ... 22
  23. TorchServe https://aws.amazon.com/tw/blogs/machine-learning/deploying-pytorch-models-for-inference-at-scale-using-torchserve/ Facebook + AWS 23

  24. Kubeflow https://www.kubeflow.org/docs/started/kubeflow-overview/ Tracking and managing metadata of machine learning workflows

    in Kubeflow 24
  25. Getting Started with Tracking 25

  26. 26

  27. Tracking APIs (REST, Python, Java, R) Experiment and metric tracking

    Experiment/Production pipeline Artifacts 27
  28. Terminology: an Experiment groups Runs; each Run records an entity with • Code version • Start and end time • Source • (Hyper)parameters • Metrics • Tags/Notes. Backend stores: • File store • Database. Artifacts: • Output files (images, pickled models, data files, ...). Artifact/file storage: • Amazon S3 • Azure Blob Storage • Google Cloud Storage • FTP server • SFTP server • NFS • HDFS. Project (a sketch of these concepts in code follows below) 28
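A minimal sketch (not from the slides) of how these terms map onto the Python API, assuming a local file store (the default ./mlruns directory); the experiment name and tag values are illustrative only.

    import mlflow

    # Point the client at a backend store; a file: URI means a local file store.
    mlflow.set_tracking_uri("file:./mlruns")

    # An experiment groups runs; each run's artifacts go to the artifact location.
    mlflow.set_experiment("terminology-demo")

    for attempt in range(3):
        with mlflow.start_run() as run:           # one Run per attempt
            mlflow.set_tag("source", "demo")      # tags/notes
            mlflow.log_param("attempt", attempt)  # (hyper)parameters
            mlflow.log_metric("score", 0.9)       # metrics
            print(run.info.run_id)                # each run gets a unique id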
  29. Typical tracking workflow: data preparation → set up & initialize the MLflow experiment → start a Run → train & inference (log (hyper)parameters, log metrics, log artifacts, log the model) → evaluation → post-analysis → compare runs / tuning (parameters/models) → feedback loop 29
  30. Tracking API: MLflow Tracking for ML development • start_run() • log_param() • log_metric() • log_artifact() • log_model() • end_run(). Output: • Parameters • Metrics • Output files ◦ Artifacts • Code version • Model • ... (a minimal sketch of these calls follows below) 30
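A minimal sketch (not from the slides) of the calls listed above used in the explicit start_run()/end_run() style rather than a with block; the parameter and metric names are placeholders.

    import mlflow

    mlflow.start_run()                      # start_run()
    mlflow.log_param("lr", 0.01)            # log_param()
    mlflow.log_metric("accuracy", 0.93)     # log_metric()
    with open("notes.txt", "w") as f:
        f.write("example artifact")
    mlflow.log_artifact("notes.txt")        # log_artifact()
    mlflow.end_run()                        # end_run()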
  31. DEMO Examples Tracking https://github.com/sucitw/mlflow_tracking 31

  32. Example architecture: a tracking server (flask:5000) running in a Docker container, with an artifacts store and a backend store on a shared file system; clients log metrics through the Tracking API (https://github.com/sucitw/mlflow_tracking).
    docker run -d -p 5000:5000 \
      -v /tmp/artifactStore:/tmp/mlflow/artifactStore \
      --name mlflow-tracking-server \
      suci/mlflow-tracking
    mlflow server \
      --backend-store-uri $FILE_STORE \
      --default-artifact-root $ARTIFACT_STORE \
      --host $SERVER_HOST \
      --port $SERVER_PORT
  33. Experiment tracking: setup & initialize
    # Setup & initialize the MLflow experiment
    experiment_name = "PyconTW 2020 Demo"
    tracking_server = "http://localhost:5000"
    mlflow.set_tracking_uri(tracking_server)
    mlflow.set_experiment(experiment_name)
    # System env settings (server side): backend-store-uri $FILE_STORE,
    # default-artifact-root $ARTIFACT_STORE, host $SERVER_HOST, port $SERVER_PORT
  34. Basic Logging 34

  35. Run tracking: train & inference; log (hyper)parameters, metrics, and artifacts
    import os
    import mlflow
    from random import random, randint
    from mlflow import log_param, log_metric, log_artifacts

    with mlflow.start_run() as run:
        # Log a parameter (key-value pair)
        log_param("param1", randint(0, 100))
        # Log a metric; metrics can be updated throughout the run
        log_metric("metricsA", random())
        log_metric("metricsA", random() + 1)
        log_metric("metricsA", random() + 2)
        log_metric("metricsB", random() + 2)
        # Log an artifact (output file)
        os.makedirs("outputs", exist_ok=True)
        with open("outputs/test.txt", "w") as f:
            f.write("hello world! Run id: {}".format(run.info.run_id))
        log_artifacts("outputs")
  36. Tracking UI 36

  37. run experiment 37

  38. Compare Two Runs Run A Run B 38

  39. Compare Two Runs 39

  40. Model Logging 40

  41. Model logging: log → load → deploy
    mlflow.<model-type>.log_model(model, ...)
    mlflow.<model-type>.load_model(modelpath)
    mlflow.<model-type>.deploy()
    Built-in model flavors (<model-type>): • Python Function (python_function) • R Function (crate) • H2O (h2o) • Keras (keras) • MLeap (mleap) • PyTorch (pytorch) • Scikit-learn (sklearn) • Spark MLlib (spark) • TensorFlow (tensorflow) • ONNX (onnx) • MXNet Gluon (gluon) • XGBoost (xgboost) • LightGBM (lightgbm) • spaCy (spacy) • fastai (fastai). Example: mlflow.sklearn.log_model(lr, "model") (a runnable sketch follows below) 41
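A minimal sketch (not from the slides) of the log → load round trip with the scikit-learn flavor, assuming scikit-learn is installed; the model and data are toy placeholders, and deployment is left out since it depends on the target (for example, mlflow models serve or a cloud service).

    import mlflow
    import mlflow.sklearn
    from sklearn.linear_model import LinearRegression

    X, y = [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]

    with mlflow.start_run() as run:
        lr = LinearRegression().fit(X, y)
        mlflow.sklearn.log_model(lr, "model")   # log the model under the run's artifacts

    # Later: load the model back by its runs:/ URI and use it for inference
    model_uri = "runs:/{}/model".format(run.info.run_id)
    loaded = mlflow.sklearn.load_model(model_uri)
    print(loaded.predict([[4.0]]))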
  42. https://github.com/sucitw/mlflow_tracking/blob/master/pytw2020_demo_SK_model.py 42

  43. https://github.com/sucitw/mlflow_tracking/blob/master/pytw2020_demo_SK_model.py 43

  44. Automatic Logging 44

  45. Automated MLflow Tracking https://www.mlflow.org/docs/latest/tracking.html#automatic-logging 45

  46. Auto-logging (TF): captures TensorBoard metrics (https://cs.stanford.edu/people/matei/papers/2020/deem_mlflow.pdf)
    # Manually logging
    import mlflow
    mlflow.log_param("layers", layers)
    model = train_model()
    mlflow.log_metric("mse", model.mse())
    mlflow.log_artifact("plot", plot(model))
    mlflow.tensorflow.log_model(model)
    # With autologging
    import mlflow
    mlflow.tensorflow.autolog()
    model = train_model()
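For comparison, a minimal runnable sketch (not from the slides) of auto-logging with the scikit-learn flavor, assuming an MLflow version that ships mlflow.sklearn.autolog() (1.11 or later); parameters, training metrics, and the fitted model are then captured without explicit log_* calls.

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    mlflow.sklearn.autolog()   # enable auto-logging (requires MLflow >= 1.11)

    X, y = load_iris(return_X_y=True)
    with mlflow.start_run():
        # params, training metrics, and the model are logged automatically
        LogisticRegression(max_iter=200).fit(X, y)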
  47. Recap 47

  48. MLOps: Continuous delivery and automation pipelines in machine learning (originally

    adapted from Hidden Technical Debt in Machine Learning Systems) ML is COMPLEX and needs to be tracked. Elements for ML systems 48
  49. What can help: managing the machine learning lifecycle with (experiment) tracking. Easy to use in the same way • Remotely/cloud or locally • For individuals, teams, or large orgs. Tracking metrics • Simplified tracking for ML models means faster time to insights and value • Integrated with popular ML libraries & languages. Model management • The launch of the model registry enhances governance and the core proposition of model management. 49
  50. More: • MLflow Tracking (experiment tracking) • MLflow Projects • MLflow Models • MLflow Model Registry (model governance) • MLflow deployment/serving (e.g., Azure Machine Learning, RedisAI) 50
  51. References: • MLflow official docs • MLflow Tracking • Learning MLflow ◦ 2020 Workshop | Managing the Complete Machine Learning Lifecycle with MLflow (Databricks) • MLOps: Continuous delivery and automation pipelines in machine learning • 2020 Spark Summit: Enabling Scalable Data Science Pipelines with MLflow and Model Registry at Thermo Fisher Scientific • 2020 DEEM workshop: Developments in MLflow: A System to Accelerate the Machine Learning Lifecycle • Example code for this talk (GitHub) 51
  52. CREDITS: This presentation template was created by Slidesgo, including icons

    by Flaticon, and infographics & images by Freepik Thanks! 52