Efficient Model Exploring and Continuous Delivery With Polyaxon + Kubeflow - KubeCon + CloudNativeCon EU 2021 Virtual

I gave a talk at KubeCon + CloudNativeCon EU 2021 Virtual. Details are here: https://kccnceu2021.sched.com/event/iE5c.

Shotaro Kohama

July 13, 2021

Transcript

  1. What is Mercari?

    • Mercari is a customer-to-customer marketplace where individuals can give new life to items that have fallen out of use by selling them to other customers.
    • Our unique challenge is pricing: since items are in various conditions, we need to help customers find the right price to sell their items, which is where we leverage ML.
    • Today, I'm going to talk about the operations side of our ML pipeline, which enables features like our Price Guidance System.
  2. Machine Learning at Mercari US: Price Guidance System

    • The Price Suggestion feature recommends a viable price range for an item during the listing process.
    • The Smart Pricing feature continuously updates the listing price until it hits a user-specified floor price or the item gets sold.
    • An ML model takes an item's title, description, category, brand, and condition and suggests the listing and floor prices in real time.

    Mercari engineering | Price Guidance System leveraging Artificial Intelligence Techniques
    https://medium.com/mercari-engineering/price-guidance-system-74358bd96081
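The Smart Pricing behaviour described on this slide can be illustrated with a minimal sketch. The slides do not say how the price is updated between the listing price and the floor, so the fixed percentage drop below, the function name `smart_price_schedule`, and the `drop_rate` parameter are all assumptions made for illustration, not Mercari's method.

```python
from typing import Iterator


def smart_price_schedule(list_price: float, floor_price: float,
                         drop_rate: float = 0.05) -> Iterator[float]:
    """Yield successive listing prices, never going below the floor.

    The fixed percentage drop is a hypothetical rule chosen for illustration.
    """
    if not 0 < drop_rate < 1:
        raise ValueError("drop_rate must be between 0 and 1")
    if floor_price > list_price:
        raise ValueError("floor_price must not exceed list_price")
    price = list_price
    yield price
    while price > floor_price:
        # Lower the price by one step, but clamp it at the seller's floor.
        next_price = max(floor_price, round(price * (1 - drop_rate), 2))
        if next_price >= price:
            # Rounding can stall on tiny prices; jump straight to the floor.
            next_price = floor_price
        price = next_price
        yield price


prices = list(smart_price_schedule(100.0, 90.0, drop_rate=0.05))
```

A caller would stop consuming the generator as soon as the item sells; the schedule itself ends once the floor price is reached.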
  3. Agenda

    0. Machine Learning Development Lifecycle
    1. Model Exploration with Polyaxon
    2. Continuous Training with Kubeflow Pipelines
    3. What we built to accelerate ML project iterations
  4. Agenda

    0. Machine Learning Development Lifecycle
    1. Model Exploration with Polyaxon
    2. Continuous Training with Kubeflow Pipelines
    3. What we built to accelerate ML project iterations
  5. ML Development Lifecycle

    ML projects are highly iterative. Accelerating that iteration is the key to a project's success. We can accelerate iterations by automating manual processes with open-source MLOps and DevOps tools.

    Organizing machine learning projects: project management guidelines.
    https://www.jeremyjordan.me/ml-projects-guide/
  6. ML Project Lifecycle at Mercari US

    Model Exploration with Polyaxon: Polyaxon is an MLOps tool that supports scalable and reproducible model exploration.
    Continuous Training with Kubeflow Pipelines: Kubeflow Pipelines is an open-source tool for managing ML training pipelines. On top of it, we set up a scheduled job that builds a Docker image to serve a newly trained model.
    Continuous Delivery with Spinnaker: Spinnaker is an open-source tool for continuous delivery. It can trigger a deploy pipeline when an image is pushed to an image registry.

    Organizing machine learning projects: project management guidelines.
    https://www.jeremyjordan.me/ml-projects-guide/
  7. Agenda

    0. Machine Learning Development Lifecycle
    1. Model Exploration with Polyaxon
    2. Continuous Training with Kubeflow Pipelines
    3. What we built to accelerate ML project iterations
  8. What is Polyaxon? Scalable and Reproducible Experiments

    • Polyaxon is an MLOps tool that supports scalable and reproducible model exploration.
    • Polyaxon provides a YAML specification for running hyperparameter tuning jobs on Kubernetes. The tuning jobs run in parallel and scale out through the cluster autoscaler.
    • The YAML specification lets other developers reproduce an experiment easily.
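To make the idea of a parallel hyperparameter tuning job concrete, here is a minimal sketch of what a grid-search specification amounts to: every combination in the search space becomes one independent trial, and independent trials can run concurrently. This does not use Polyaxon or its API; `expand_grid`, the placeholder `train` objective, and the search space values are assumptions for illustration, and local threads stand in for pods on a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def expand_grid(search_space: dict) -> list:
    """Return one parameter dict per combination in the search space."""
    names = sorted(search_space)
    return [dict(zip(names, values))
            for values in product(*(search_space[n] for n in names))]


def train(params: dict) -> float:
    """Placeholder objective standing in for a real training run."""
    return (params["learning_rate"] - 0.01) ** 2 + params["num_layers"] * 0.001


# Hypothetical search space: 3 learning rates x 2 depths = 6 trials.
search_space = {"learning_rate": [0.001, 0.01, 0.1], "num_layers": [2, 4]}
trials = expand_grid(search_space)

# Trials are independent, so they can run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    losses = list(pool.map(train, trials))

# Keep the parameters of the trial with the lowest loss.
best_loss, best_params = min(zip(losses, trials), key=lambda pair: pair[0])
```

Because the whole search is described by one declarative dictionary, anyone with the same specification and code can rerun the same set of trials, which is the reproducibility argument the slide makes for a YAML spec.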
  9. How to run a job on Polyaxon

    1. Define a Polyaxonfile to run a parameter tuning job.
    2. Write the code to train an ML model.
    3. Upload the Polyaxonfile and the code with the Polyaxon CLI.
    4. The experiments run on Kubernetes.

    My Favorite Point: Polyaxon builds the Docker image that runs the training code. A developer doesn't have to wait for CI to build a Docker image after every code change, which keeps the development flow uninterrupted.
  10. Polyaxon at Mercari US

    How long we've been using Polyaxon
    • About two years, since February 2019.
    How many projects/experiments we've run
    • 175 projects
    • About 87,000 experiments
    What infrastructure we've been using
    • Google Kubernetes Engine
    • Google Cloud Storage for logs, data, and artifacts
    • Regular and preemptible node pools, each with CPU and GPU variants
    • Google Filestore as an NFS Persistent Volume
  11. What is Kubeflow Pipelines? Kubeflow Pipelines (KFP) is a Machine Learning Workflow Engine

    • Kubeflow Pipelines is an open-source tool for managing end-to-end machine learning pipelines.
    • Kubeflow Pipelines has an integrated metadata store. The inputs and outputs of each stage are stored in it automatically.
    • Kubeflow Pipelines lets a developer easily implement reusable components with its Python SDK.
  12. Kubeflow Pipelines at Mercari US: KFP connects Polyaxon and Spinnaker for Continuous Model Deployment

    • The KFP web UI lets a developer set up a scheduled job.
    • A pipeline submits a training job to Polyaxon and builds a Docker image to serve the newly trained model.
    • Spinnaker can trigger a deployment pipeline automatically when a new Docker image is pushed to a Docker registry.

    Mercari engineering | Continuous delivery and automation pipelines in machine learning with Polyaxon and Kubeflow Pipelines
    https://medium.com/mercari-engineering/continuous-delivery-and-automation-pipelines-in-machine-learning-with-polyaxon-and-kubeflow-d6a3668715de
  13. What we built to accelerate iterations

    Monorepo for Kubeflow Pipelines: We built a monorepo to manage pipeline versions in a git workflow and to share best practices.
    Manifests to manage projects on Polyaxon and KFP: We defined a manifest that provisions resources on Kubeflow Pipelines and Polyaxon, in the style of infrastructure as code.
    A KFP component to submit a Polyaxon job: We developed a Kubeflow Pipelines component that submits a job from KFP to Polyaxon.
  14. Monorepo for Kubeflow Pipelines: KFP + Continuous Integration (CI)

    • The monorepo contains KFP components and a Python package for defining lightweight KFP components.
    • CI detects modified pipelines, then compiles and uploads them with the version branch_name + commit_hash.
    • When a branch is merged into the main branch, CI uploads the updated pipelines to the production cluster.

    $ tree mercari-us-kubeflow-pipelines
    mercari-us-kubeflow-pipelines
    ├── components    # directory for KFP components
    ├── docs          # directory for documents
    ├── package
    │   └── merkfp    # python package for lightweight KFP components
    ├── pipelines     # directory for each project pipelines
    │   └── mercari-us-ml-price-suggestion
    │       └── train_model.py
    ├── projects      # directory for "project" manifests
    │   └── mercari-us-ml-price-suggestion.yml
    └── scripts       # directory for scripts on continuous integration
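The two CI steps on this slide, detecting modified pipelines and naming their versions, could look roughly like the sketch below. It assumes the `pipelines/<project>/...` layout from the tree on this slide. The helper names, the "-" separator, the replacement of "/" in branch names, and the 7-character short hash are assumptions: the slide only says the version is built from the branch name and commit hash. The changed-file list is made up for the example.

```python
from pathlib import PurePosixPath


def modified_pipelines(changed_files):
    """Return the project directories under pipelines/ touched by a change set."""
    projects = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # Expect pipelines/<project>/<file>; ignore everything else.
        if len(parts) >= 3 and parts[0] == "pipelines":
            projects.add(parts[1])
    return sorted(projects)


def pipeline_version(branch_name, commit_hash, short=7):
    """Build a version label from the branch name and commit hash.

    The separator, the "/" replacement, and the hash length are assumptions.
    """
    safe_branch = branch_name.replace("/", "-")
    return f"{safe_branch}-{commit_hash[:short]}"


# Hypothetical change set, e.g. the output of `git diff --name-only`.
changed = [
    "pipelines/mercari-us-ml-price-suggestion/train_model.py",
    "docs/README.md",
]
projects = modified_pipelines(changed)
version = pipeline_version("feature/new-model", "0123456789abcdef")
```

Deriving the version from git metadata means every uploaded pipeline can be traced back to the exact commit that produced it.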
  15. "Project" Manifest for KFP and Polyaxon: Continuous Integration (CI) creates resources like Infrastructure as Code

    • CI creates KFP experiments and Polyaxon projects for both the development and production environments to keep them consistent.
    • CI generates GitHub code owners from "owners". This allows each team to approve pull requests that modify its project-related code.

    mercari-ml-price-suggestion-us.yml:

    ---
    kind: Project
    name: mercari-us-ml-price-suggestion
    experiments:
      - name: "Default"
      - name: "Sneakers"
      - name: "Trading Cards"
    owners:
      - github: "@kouzoh/mercari-price-suggest-us-prod"
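As a rough illustration of the code-owners step, the sketch below turns the "owners" field of the manifest on this slide into GitHub CODEOWNERS lines. The manifest is written as a Python dictionary to keep the example dependency-free. The function name `codeowners_lines` and the choice of which paths each project owns (taken from the directory layout on slide 14) are assumptions; the slides do not show the generated file.

```python
# The manifest from this slide, written as a Python dictionary.
manifest = {
    "kind": "Project",
    "name": "mercari-us-ml-price-suggestion",
    "experiments": [
        {"name": "Default"},
        {"name": "Sneakers"},
        {"name": "Trading Cards"},
    ],
    "owners": [{"github": "@kouzoh/mercari-price-suggest-us-prod"}],
}


def codeowners_lines(manifest):
    """Return CODEOWNERS entries (path pattern followed by owners) for a project.

    Which paths a project owns is an assumption based on the repo layout.
    """
    owners = " ".join(owner["github"] for owner in manifest["owners"])
    name = manifest["name"]
    return [
        f"/pipelines/{name}/ {owners}",
        f"/projects/{name}.yml {owners}",
    ]


lines = codeowners_lines(manifest)
```

CI could concatenate these lines for every manifest under `projects/` and write the result to the repository's CODEOWNERS file.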
  16. Polyaxon Kubeflow Pipelines Component

    1. An init container clones a private repo with a secret.
    2. The main container logs a user in to Polyaxon with a secret.
    3. The main container submits a training job through the Polyaxon API.
    4. The main container tails the logs until the job ends.
    5. The component outputs Project, User, Job ID, and Status for the next step.
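Steps 3 to 5 above can be sketched as a small control loop: submit a job, follow its logs until it reaches a terminal state, then return the outputs. The sketch deliberately avoids the real Polyaxon client. The `client` object with `submit`, `status`, and `logs` methods, the status names, and the `FakeClient` used to make the example runnable are all hypothetical.

```python
import time
from dataclasses import dataclass

TERMINAL_STATUSES = {"succeeded", "failed", "stopped"}  # assumed status names


@dataclass
class JobResult:
    """The four outputs the component passes to the next pipeline step."""
    project: str
    user: str
    job_id: str
    status: str


def run_training_job(client, project, user, spec, poll_seconds=0.0):
    """Submit a job through `client`, follow its logs, and return the outputs."""
    job_id = client.submit(project, spec)
    offset = 0
    while True:
        # Read the status first so the final poll still drains remaining logs.
        status = client.status(job_id)
        lines = client.logs(job_id, offset)
        for line in lines:
            print(line)
        offset += len(lines)
        if status in TERMINAL_STATUSES:
            return JobResult(project, user, job_id, status)
        time.sleep(poll_seconds)


class FakeClient:
    """In-memory stand-in so the sketch runs without a cluster."""

    def __init__(self):
        self._statuses = iter(["running", "running", "succeeded"])
        self._log = ["epoch 1", "epoch 2", "done"]
        self._polls = 0

    def submit(self, project, spec):
        return "job-1"

    def status(self, job_id):
        self._polls += 1
        return next(self._statuses)

    def logs(self, job_id, offset):
        # Reveal one more log line per status poll.
        return self._log[offset:self._polls]


result = run_training_job(
    FakeClient(), "mercari-us-ml-price-suggestion", "ci-bot", {"cmd": "train"}
)
```

Returning the job ID and status as explicit outputs is what lets a downstream step, such as the image build, decide whether to proceed.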
  17. Continuous Training with Polyaxon + KFP

    Mercari engineering | Continuous delivery and automation pipelines in machine learning with Polyaxon and Kubeflow Pipelines
    https://medium.com/mercari-engineering/continuous-delivery-and-automation-pipelines-in-machine-learning-with-polyaxon-and-kubeflow-d6a3668715de
  18. Takeaways

    1. Polyaxon is well suited to scalable, reproducible model exploration.
    2. Monorepo + CI for KFP works well for keeping efficiency and consistency high.
    3. A custom KFP component for Polyaxon lets us move seamlessly from exploration to continuous training.