Efficient Model Exploring and Continuous Delivery With Polyaxon + Kubeflow - KubeCon + CloudNativeCon EU 2021 Virtual

I gave this talk at KubeCon + CloudNativeCon EU 2021 Virtual. Details: https://kccnceu2021.sched.com/event/iE5c

Shotaro Kohama

July 13, 2021

Transcript

  1. Shotaro Kohama, Mercari
    Efficient Model Exploring
    and Continuous Delivery
    With Polyaxon + Kubeflow

  2. What is Mercari?
    ● Mercari is a customer-to-customer
    marketplace where individuals can give new
    life to items around them—that have fallen out
    of use—by selling them to other customers.
    ● Our unique challenge is around pricing - since
    items are in various conditions, we need to
    help customers find the right price to sell their
    items, which is where we leverage ML.
    ● Today, I'm going to talk about the operations
    side of our ML pipeline to enable features like
    our Price Guidance System.

  3. Machine Learning at Mercari US
    Price Guidance System
    ● The Price Suggestion feature recommends a
    viable price range for an item during the listing
    process.
    ● The Smart Pricing feature continuously updates
    the listing price until it hits a user-specified floor
    price or the item gets sold.
    ● An ML model takes item’s title, description,
    category, brand, and condition to suggest the
    listing and floor prices in real-time.
    Mercari engineering | Price Guidance System leveraging Artificial Intelligence Techniques
    https://medium.com/mercari-engineering/price-guidance-system-74358bd96081

  4. Agenda
    0. Machine Learning Development Lifecycle
    1. Model Exploration with Polyaxon
    2. Continuous Training with Kubeflow Pipelines
    3. What we built to accelerate ML project iterations

  5. Agenda
    0. Machine Learning Development Lifecycle
    1. Model Exploration with Polyaxon
    2. Continuous Training with Kubeflow Pipelines
    3. What we built to accelerate ML project iterations

  6. ML Development Lifecycle
    ML projects are highly iterative.
    Accelerating that iteration is key to
    a project's success.
    We can accelerate iterations by
    automating manual processes with
    open-source MLOps and DevOps tools.
    Organizing machine learning projects: project management guidelines.
    https://www.jeremyjordan.me/ml-projects-guide/

  7. ML Project Lifecycle at Mercari US
    Model Exploration with Polyaxon
    Polyaxon is an MLOps tool that supports scalable and
    reproducible model exploration.
    Continuous Training with Kubeflow Pipelines
    Kubeflow Pipelines is an open-source tool for managing ML
    training pipelines. On top of it, we set up a scheduled job that
    builds a Docker image to serve a newly trained model.
    Continuous Delivery with Spinnaker
    Spinnaker is an open-source continuous delivery platform.
    It can trigger a deploy pipeline when an image
    is pushed to an image registry.
    Organizing machine learning projects: project management guidelines.
    https://www.jeremyjordan.me/ml-projects-guide/

  8. Agenda
    0. Machine Learning Development Lifecycle
    1. Model Exploration with Polyaxon
    2. Continuous Training with Kubeflow Pipelines
    3. What we built to accelerate ML project iterations

  9. What is Polyaxon?
    Scalable and Reproducible Experiments
    ● Polyaxon is an MLOps tool that supports scalable
    and reproducible model exploration.
    ● Polyaxon provides a YAML specification for running
    hyperparameter tuning jobs on Kubernetes. The
    tuning jobs run in parallel and scale with a
    cluster autoscaler.
    ● The YAML specification lets other developers
    reproduce an experiment easily.

  10. How to run a job on Polyaxon
    1. Define a Polyaxonfile to run a parameter tuning job.
    2. Write the code to train an ML model.
    3. Upload the Polyaxonfile and the code with the Polyaxon CLI.
    4. Experiments run on Kubernetes.
    My Favorite Point
    Polyaxon builds the Docker image that runs the training code, so a developer doesn’t have to wait for CI to build an image
    after every code change. That keeps the development flow uninterrupted.
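
To illustrate why a declarative tuning specification parallelizes and reproduces well, here is a minimal Python sketch of grid expansion. It is not Polyaxon's implementation and does not follow the Polyaxonfile schema; the `spec` dictionary, its keys, and `expand_grid` are hypothetical names used only for illustration.

```python
from itertools import product

# Hypothetical tuning spec (NOT the Polyaxonfile schema): each key maps to
# the candidate values to try for that hyperparameter.
spec = {
    "learning_rate": [0.01, 0.1],
    "max_depth": [4, 8],
}


def expand_grid(spec):
    """Expand candidate values into one parameter dict per trial."""
    names = sorted(spec)  # fixed key order keeps the expansion deterministic
    combos = product(*(spec[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


trials = expand_grid(spec)
for index, params in enumerate(trials):
    # Each trial is independent of the others, so a scheduler could run
    # them as separate jobs in parallel.
    print(index, params)
```

Because every trial is derived from the same specification, re-running the expansion yields the same set of experiments, which is the property that makes a checked-in spec easy for another developer to reproduce.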

  11. Polyaxon at Mercari US
    How long we’ve been using Polyaxon
    ● About 2 years, since February 2019.
    How many projects/experiments we’ve run
    ● 175 projects
    ● About 87,000 experiments
    What infrastructure we’ve been using
    ● Google Kubernetes Engine (GKE)
    ● Google Cloud Storage for logs, data, and artifacts
    ● Regular and preemptible node pools, for both CPU and GPU
    ● Google Filestore as an NFS Persistent Volume

  12. What is Kubeflow Pipelines?
    Kubeflow Pipelines (KFP) is a
    Machine Learning Workflow Engine
    ● Kubeflow Pipelines is an open-source platform for
    managing end-to-end machine learning
    pipelines.
    ● Kubeflow Pipelines has an integrated
    metadata store. The inputs and outputs of
    a stage will be automatically stored in the
    metadata store.
    ● Kubeflow Pipelines lets a developer easily
    implement reusable components with its
    Python SDK.

  13. Kubeflow Pipelines at Mercari US
    KFP connects Polyaxon and Spinnaker
    for Continuous Model Deployment
    ● KFP Web UI enables a developer to set up a
    scheduled job.
    ● A pipeline submits a training job on Polyaxon
    and builds a docker image to serve a new
    trained model.
    ● Spinnaker can trigger a deployment pipeline
    automatically when a new docker image is
    pushed to a docker registry.
    Mercari engineering | Continuous delivery and automation pipelines in machine learning with Polyaxon and Kubeflow Pipelines
    https://medium.com/mercari-engineering/continuous-delivery-and-automation-pipelines-in-machine-learning-with-polyaxon-and-kubeflow-d6a3668715de

  14. What we built to accelerate iterations
    Monorepo for Kubeflow Pipelines
    We built a monorepo to manage pipeline versions in a git workflow and to share best practices.
    Manifests to manage projects on Polyaxon and KFP
    We defined a manifest that provisions resources on Kubeflow Pipelines and Polyaxon, in the style of infrastructure as code.
    A KFP component to submit a Polyaxon job
    We developed a Kubeflow Pipelines component to submit a job from KFP to Polyaxon.

  15. Monorepo for Kubeflow Pipelines
    KFP + Continuous Integration (CI)
    ● The monorepo contains KFP components and
    a Python package for defining lightweight
    KFP components.
    ● CI detects modified pipelines, then compiles
    and uploads them under the version
    branch_name + commit_hash.
    ● When a branch is merged into the main
    branch, CI uploads the updated
    pipelines to the production cluster.
    $ tree mercari-us-kubeflow-pipelines
    mercari-us-kubeflow-pipelines
    ├── components   # directory for KFP components
    ├── docs         # directory for documents
    ├── package
    │   └── merkfp   # python package for lightweight KFP components
    ├── pipelines    # directory for each project's pipelines
    │   └── mercari-us-ml-price-suggestion
    │       └── train_model.py
    ├── projects     # directory for “project” manifests
    │   └── mercari-us-ml-price-suggestion.yml
    └── scripts      # directory for continuous integration scripts
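
The CI behaviour described on this slide can be sketched roughly as follows. This is an assumption-laden illustration, not the actual scripts in the monorepo: `modified_pipelines` and `pipeline_version` are hypothetical helpers, the list of changed files is assumed to come from something like `git diff --name-only`, and the `-` separator and 7-character short hash are guesses, since the slide only says the version is branch_name + commit_hash.

```python
from pathlib import PurePosixPath


def modified_pipelines(changed_files):
    """Return the project directories under pipelines/ touched by a change set."""
    projects = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Expect pipelines/<project>/<file...>; anything else is ignored.
        if len(parts) >= 3 and parts[0] == "pipelines":
            projects.add(parts[1])
    return sorted(projects)


def pipeline_version(branch_name, commit_hash):
    """Build a version label from the branch name and commit hash."""
    # Separator and short-hash length are assumptions for this sketch.
    return f"{branch_name}-{commit_hash[:7]}"


changed = [
    "pipelines/mercari-us-ml-price-suggestion/train_model.py",
    "docs/README.md",
]
for project in modified_pipelines(changed):
    print(project, pipeline_version("feature-x", "0123456789abcdef"))
```

Only pipelines whose directory actually changed are recompiled and uploaded, which keeps CI time proportional to the size of the change rather than the size of the monorepo.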

  16. “Project” Manifest for KFP and Polyaxon
    Continuous Integration (CI) creates
    resources in an Infrastructure-as-Code style
    ● CI creates KFP experiments and Polyaxon
    projects in both the development and production
    environments to keep them consistent.
    ● CI generates GitHub code owners from
    “owners”, which lets each team approve pull
    requests that modify its project-related code.
    ---
    kind: Project
    name: mercari-us-ml-price-suggestion
    experiments:
      - name: "Default"
      - name: "Sneakers"
      - name: "Trading Cards"
    owners:
      - github: "@kouzoh/mercari-price-suggest-us-prod"
    mercari-ml-price-suggestion-us.yml
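
As a rough illustration of generating code owners from a manifest like this one, the sketch below turns the “owners” entries into lines in GitHub's CODEOWNERS format (a path pattern followed by one or more owners). The manifest is written as a Python dict to avoid a YAML dependency, `codeowners_lines` is a hypothetical helper, and the choice of owned paths is an assumption based on the monorepo layout from the previous slide, not the real CI script.

```python
# Manifest content from the slide, expressed as a dict for this sketch.
manifest = {
    "kind": "Project",
    "name": "mercari-us-ml-price-suggestion",
    "experiments": [
        {"name": "Default"},
        {"name": "Sneakers"},
        {"name": "Trading Cards"},
    ],
    "owners": [{"github": "@kouzoh/mercari-price-suggest-us-prod"}],
}


def codeowners_lines(manifest):
    """Render CODEOWNERS lines for the paths a project is assumed to own."""
    owners = " ".join(owner["github"] for owner in manifest["owners"])
    name = manifest["name"]
    # Assumed ownership: the project's pipeline directory and its manifest.
    paths = [f"/pipelines/{name}/", f"/projects/{name}.yml"]
    return [f"{path} {owners}" for path in paths]


for line in codeowners_lines(manifest):
    print(line)
```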

  17. Polyaxon Kubeflow Pipelines Component
    1. An init container clones a private repo with a secret.
    2. The main container logs a user in to Polyaxon with a secret.
    3. The main container submits a training job through the Polyaxon API.
    4. The main container tails the logs until the job ends.
    5. The component outputs Project, User, Job ID, and Status for the next step.
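
The submit-and-wait part of these steps can be sketched as a simple polling loop. This is only an illustration under stated assumptions: `FakeJobClient` is a stand-in that replays canned statuses and is not the Polyaxon API or client, the status names and job ID are invented, and the real component follows logs rather than polling a status field.

```python
import time


class FakeJobClient:
    """Stand-in for a training service client (hypothetical, not Polyaxon's)."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def submit(self, project, user):
        return 42  # made-up job ID for the sketch

    def status(self, job_id):
        return next(self._statuses)


# Assumed terminal states; the real service may use different names.
TERMINAL = {"succeeded", "failed", "stopped"}


def run_and_wait(client, project, user, poll_seconds=0.0):
    """Submit a job, wait until it reaches a terminal state, return its summary."""
    job_id = client.submit(project, user)
    while True:
        status = client.status(job_id)
        print(f"job {job_id}: {status}")
        if status in TERMINAL:
            break
        time.sleep(poll_seconds)
    # These fields mirror the outputs the component passes to the next step.
    return {"project": project, "user": user, "job_id": job_id, "status": status}


client = FakeJobClient(["running", "running", "succeeded"])
result = run_and_wait(client, "mercari-us-ml-price-suggestion", "ci-bot")
```

Blocking until the job ends is what lets the next pipeline step, such as building the serving image, rely on the returned status instead of checking the training service itself.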

  18. Continuous Training with Polyaxon + KFP
    Mercari engineering | Continuous delivery and automation pipelines in machine learning with Polyaxon and Kubeflow Pipelines
    https://medium.com/mercari-engineering/continuous-delivery-and-automation-pipelines-in-machine-learning-with-polyaxon-and-kubeflow-d6a3668715de

  19. Takeaways
    1. Polyaxon is well suited to scalable and reproducible model exploration.
    2. A monorepo + CI for KFP works well for keeping efficiency and consistency high.
    3. A custom KFP component for Polyaxon lets us move from exploration to continuous training seamlessly.
