MLflow and the Machine Learning Lifecycle

Giulia
November 20, 2018

Forty-minute presentation given at XebiCon'18

https://youtu.be/ikHI-ys-cYM

Transcript

  1. Giulia BIANCHI, Data Scientist, @XebiaFr, @Giuliabianchl
     Fares OUESLATI, ML Engineer, @XebiaFr, @fares_oueslati
     Loïc DIVAD, Software Engineer, @XebiaFr, @LoicMDivad
  2. • Dataset from scikit-learn
     • Predict diabetes progression given
       ◦ Age
       ◦ Sex
       ◦ Body Mass Index (BMI)
       ◦ Blood Pressure (BP)
       ◦ Other blood measurements

     Reference: LARS paper
     @Xebiconfr #Xebicon18 @Giuliabianchl
  3. MLflow tracking example:

         import mlflow
         import mlflow.sklearn

         # Model, eval_metrics and the data variables are placeholders.
         with mlflow.start_run():
             model = Model(param_1, param_2)
             model.fit(train_data, label)
             prediction = model.predict(test_data)
             (rmse, mae, r2) = eval_metrics(test_label, prediction)

             # Record hyperparameters, scores and the fitted model with the run
             mlflow.log_param("param_1", param_1)
             mlflow.log_param("param_2", param_2)
             mlflow.log_metric("rmse", rmse)
             mlflow.log_metric("r2", r2)
             mlflow.log_metric("mae", mae)
             mlflow.sklearn.log_model(model, "model")
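The tracking example calls an `eval_metrics` helper that the slide does not define. The following is a minimal sketch of what such a helper might compute, assuming the three metrics are root mean squared error, mean absolute error and R². It uses plain Python to stay self-contained; the talk may well have relied on scikit-learn's metric functions instead, and the sample values are made up.

```python
import math


def eval_metrics(actual, predicted):
    """Return (rmse, mae, r2) for two equally sized sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Made-up values, only to exercise the helper.
rmse, mae, r2 = eval_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
print(rmse, mae, r2)
```

Each returned value is a plain float, so it can be passed directly to `mlflow.log_metric`.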
  4. # MLproject file

         name: My Project
         conda_env: my_env.yaml
         entry_points:
           main:
             parameters:
               data_file: path
               regularization: {type: float, default: 0.1}
             command: "python train.py -r {regularization} {data_file}"
           validate:
             parameters:
               data_file: path
             command: "python validate.py {data_file}"

     @Xebiconfr #Xebicon18 @fares_oueslati
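The entry points in the MLproject file pair a parameter list with a command template. As a rough illustration of how such a template can be filled in, here is a simplified sketch; it is not MLflow's own implementation, and `ENTRY_POINT`, `build_command` and the `data.csv` file name are hypothetical names introduced for this example.

```python
# Hypothetical, simplified view of the "main" entry point from the MLproject file.
ENTRY_POINT = {
    "parameters": {
        "data_file": {"type": "path"},
        "regularization": {"type": "float", "default": 0.1},
    },
    "command": "python train.py -r {regularization} {data_file}",
}


def build_command(entry_point, user_params):
    """Fill the command template with user values, falling back to defaults."""
    values = {}
    for name, spec in entry_point["parameters"].items():
        if name in user_params:
            values[name] = user_params[name]
        elif "default" in spec:
            values[name] = spec["default"]
        else:
            raise ValueError("missing required parameter: " + name)
    return entry_point["command"].format(**values)


command = build_command(ENTRY_POINT, {"data_file": "data.csv"})
print(command)  # python train.py -r 0.1 data.csv
```

On the command line, the values in `user_params` correspond to what is passed with `-P name=value`.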
  5. # Remote Run
         $ mlflow run git@github.com:mlflow/mlflow-example.git -P alpha=0.5

     # Local Run
         $ mlflow run . -P alpha=0.5
  6. [Diagram: ML Frameworks → Model Format (flavor 1, flavor 2) → Inference Code, Batch & Stream Scoring, Cloud Serving Tools]
  7. # MLmodel file

         artifact_path: model
         flavors:
           # Usable by any tool that can run Python (Docker, Spark, etc.)
           python_function:
             data: model.pkl
             loader_module: mlflow.sklearn
             python_version: 2.7.15
           # Usable by tools that understand the Sklearn model format
           sklearn:
             pickled_model: model.pkl
             sklearn_version: 0.19.2
         run_id: 32357c10ae854113b0503e880e7433c1
         utc_time_created: '2018-10-29 11:31:01.434417'
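Because one MLmodel file can list several flavors, a consuming tool can pick whichever flavor it understands. The sketch below illustrates that idea only; the MLmodel content is copied into a Python dict to avoid a YAML dependency, and `pick_flavor` is a hypothetical helper, not part of MLflow.

```python
# The flavors section of the MLmodel file, written out as a dict for illustration.
MLMODEL = {
    "artifact_path": "model",
    "flavors": {
        "python_function": {
            "data": "model.pkl",
            "loader_module": "mlflow.sklearn",
            "python_version": "2.7.15",
        },
        "sklearn": {
            "pickled_model": "model.pkl",
            "sklearn_version": "0.19.2",
        },
    },
}


def pick_flavor(mlmodel, supported):
    """Return (name, config) for the first supported flavor the model provides."""
    for name in supported:
        if name in mlmodel["flavors"]:
            return name, mlmodel["flavors"][name]
    raise ValueError("no supported flavor in model")


# A scikit-learn-aware tool prefers the native flavor.
native = pick_flavor(MLMODEL, ["sklearn", "python_function"])
# A tool that knows nothing about scikit-learn falls back to the generic one.
generic = pick_flavor(MLMODEL, ["keras", "python_function"])
print(native[0], generic[0])  # sklearn python_function
```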
  8. MLflow built-in flavors: pyfunc, rfunc, h2o, keras, sklearn, spark, mleap, pytorch, tensorflow
  9. pyfunc, rfunc: generic and self-contained flavors that describe how to run the model as a lambda function
  11. MLflow API example: sklearn flavor

      mlflow flavor API:
      • save_model
      • log_model
      • load_model
  12. The MLflow project structure

          $ tree
          .
          ├── R
          ├── azureml
          ├── entities
          ├── java
          ├── models
          ├── projects
          ├── protos
          ├── sagemaker
          ├── server
          ├── store
          ├── tracking
          └── utils

      @Xebiconfr #Xebicon18 @LoicMDivad
  13. The MLflow project structure: the three main modules (models, projects, tracking) are materialized by Python packages
  14. The MLflow project structure: managed solutions from cloud providers (azureml, sagemaker) have their own package
  15. The MLflow project structure: programming languages other than Python (R, java) have their own subproject
  17. [Diagram: Experiment, Run, RunData, Param, Metric, MLflowObject]

          artifact_path: model
          flavors:
            python_function:
              data: model.pkl
              loader_module: mlflow.sklearn
            sklearn:
              pickled_model: model.pkl
              sklearn_version: 0.19.1
          run_id: cf5db2cc7c0d4074bbccd970d912e1c8
          utc_time_created: '2018-07-28 15:49:49.055985'
  19.     $ tree
          .
          ├── azureml
          ├── entities
          ├── ...
          ├── java
          └── store
              ├── abstract_store.py
              ├── artifact_repo.py
              ├── azure_blob_artifact_repo.py
              ├── dbfs_artifact_repo.py
              ├── file_store.py
              ├── gcs_artifact_repo.py
              ├── local_artifact_repo.py
              ├── rest_store.py
              ├── s3_artifact_repo.py
              └── sftp_artifact_repo.py
  20. • Experiment metadata (runs, parameters, metrics, ...): AbstractStore, implemented by FileStore and RestStore
      • Large artifacts (datasets, models, images, ...): ArtifactRepository, implemented for S3, GCS, DBFS, ...
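  21.     # The repository class is chosen from the URI scheme and imported lazily.
          if artifact_uri.startswith("s3:/"):
              from mlflow.store.s3_artifact_repo import S3ArtifactRepository
          elif artifact_uri.startswith("gs:/"):
              from mlflow.store.gcs_artifact_repo import GCSArtifactRepository
          elif artifact_uri.startswith("wasbs:/"):
              from mlflow.store.azure_blob_artifact_repo import AzureBlobArtifactRepository
          elif artifact_uri.startswith("sftp:/"):
              from mlflow.store.sftp_artifact_repo import SFTPArtifactRepository
          elif artifact_uri.startswith("dbfs:/"):
              from mlflow.store.dbfs_artifact_repo import DbfsArtifactRepository
          else:
              from mlflow.store.local_artifact_repo import LocalArtifactRepository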
  22. The repository modules import the cloud provider SDKs:

          from google.cloud import storage as gcs_storage
          # or
          from azure.storage.blob import BlockBlobService

          # requires
          # GOOGLE_APPLICATION_CREDENTIALS
          # AWS_SECRET_ACCESS_KEY
          # etc.
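The chain of scheme checks above amounts to a lookup from URI scheme to repository class, with the local repository as the fallback. A minimal sketch of that dispatch, using only the standard library and returning class names as strings rather than importing MLflow; `REPOSITORY_BY_SCHEME`, `repository_name` and the example URIs are hypothetical.

```python
from urllib.parse import urlparse

# Scheme -> repository class name, mirroring the branches shown on the slide.
REPOSITORY_BY_SCHEME = {
    "s3": "S3ArtifactRepository",
    "gs": "GCSArtifactRepository",
    "wasbs": "AzureBlobArtifactRepository",
    "sftp": "SFTPArtifactRepository",
    "dbfs": "DbfsArtifactRepository",
}


def repository_name(artifact_uri):
    """Pick a repository class name from the URI scheme, defaulting to local."""
    scheme = urlparse(artifact_uri).scheme
    return REPOSITORY_BY_SCHEME.get(scheme, "LocalArtifactRepository")


print(repository_name("s3://my-bucket/mlruns/0"))   # S3ArtifactRepository
print(repository_name("/tmp/mlruns/0/artifacts"))   # LocalArtifactRepository
```

A table like this keeps the fallback explicit, while the if/elif form on the slide makes it easy to import each provider SDK only when its scheme is actually used.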
  24. Server tracking store:

          .
          └── EXPERIMENT-0
              ├── RUN-4a59d6
              │   ├── artifacts
              │   ├── meta.yaml
              │   ├── metrics
              │   └── params
              └── RUN-99d663
                  ├── artifacts
                  ├── meta.yaml
                  ├── metrics
                  └── params
  25. Server tracking store:

          $ mlflow server

      Launches a Flask server with four workers by default. It receives byte streams from the client API and serializes the result in an artifact repo.
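The tracking store layout shown above is a plain directory tree: one directory per experiment, one per run, with `artifacts`, `metrics`, `params` and a `meta.yaml` inside. The sketch below writes and reads such a tree in a temporary directory. The slides only show the directory names, so the one-file-per-key convention, the `meta.yaml` content and the helper names are assumptions made for illustration, not a description of MLflow's file store.

```python
import os
import tempfile


def write_run(root, experiment_id, run_id, params, metrics):
    """Write one run as a directory tree, one small file per param and metric."""
    run_dir = os.path.join(root, str(experiment_id), run_id)
    for sub in ("artifacts", "metrics", "params"):
        os.makedirs(os.path.join(run_dir, sub))
    # Assumed, minimal metadata; the real file content is not shown in the talk.
    with open(os.path.join(run_dir, "meta.yaml"), "w") as f:
        f.write("run_uuid: {}\nexperiment_id: {}\n".format(run_id, experiment_id))
    for key, value in params.items():
        with open(os.path.join(run_dir, "params", key), "w") as f:
            f.write(str(value))
    for key, value in metrics.items():
        with open(os.path.join(run_dir, "metrics", key), "w") as f:
            f.write(str(value))
    return run_dir


def read_params(run_dir):
    """Read params back by listing the params directory."""
    params_dir = os.path.join(run_dir, "params")
    result = {}
    for key in sorted(os.listdir(params_dir)):
        with open(os.path.join(params_dir, key)) as f:
            result[key] = f.read()
    return result


root = tempfile.mkdtemp()
run_dir = write_run(root, 0, "4a59d6", {"alpha": 0.5}, {"rmse": 0.79})
print(sorted(os.listdir(run_dir)))  # ['artifacts', 'meta.yaml', 'metrics', 'params']
print(read_params(run_dir))         # {'alpha': '0.5'}
```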
  26. Java client:

      1. Gives access to MLflow to all JVM users
      2. A CRUD interface to MLflow, available as a Maven artifact
      3. Comes with an MLeap flavor to save a Spark model in SparkML format or MLeap format

          MlflowClient client = new MlflowClient();
          long expId = client.createExperiment(expName);
          // ...
          RunInfo runCreated = client.createRun(expId, sourceFile);
          String runId = runCreated.getRunUuid();

          // Log parameters
          client.logParam(runId, "min_samples_leaf", "2");
          client.logParam(runId, "max_depth", "3");

          // Log metrics
          client.logMetric(runId, "auc", 2.12F);
          client.logMetric(runId, "accuracy_score", 3.12F);
          client.logMetric(runId, "zero_one_loss", 4.12F);

          // Finished run
          client.setTerminated(runId, RunStatus.FINISHED);
  27. MLflow v0.8.0 is still in alpha:
      ➢ Load data from diverse formats (e.g. CSV vs Parquet)
      ➢ Database backend tracking store
      ➢ Integration with common hyperparameter tuning libraries
      ➢ Built-in Spark MLlib & PyTorch integration
      ➢ Support for an HDFS artifact repository
      ➢ ...
  28. ➢ MLflow lets you keep track of results and make them reproducible
        ◦ So you iterate faster and run through the machine learning life cycle
      ➢ The goal of MLflow is to make it easier to switch between tools
      ➢ MLflow is an open-source, open-interface solution
      ➢ The machine learning platform is a tool to unify data science and engineering