Slide 1

Slide 1 text

Efficient Continuous Model Integration with Polyaxon + Kubeflow
Shotaro Kohama
9th MLOps Study Group (MLOps 勉強会) Tokyo (Online)
Jul 14, 2021

Slide 2

Slide 2 text

Confidential & Proprietary 2021 What is Mercari US?

Slide 3

Slide 3 text

Machine Learning at Mercari US
● Price Suggestion Feature
● Smart Pricing Feature
Mercari engineering | Price Guidance System leveraging Artificial Intelligence Techniques https://medium.com/mercari-engineering/price-guidance-system-74358bd96081

Slide 4

Slide 4 text

Agenda
0. Machine Learning Development Lifecycle
1. Model Exploration with Polyaxon
2. Continuous Training with Kubeflow Pipelines
3. What we built to accelerate ML project iterations

Slide 5

Slide 5 text

Agenda
0. Machine Learning Development Lifecycle
1. Model Exploration with Polyaxon
2. Continuous Training with Kubeflow Pipelines
3. What we built to accelerate ML project iterations

Slide 6

Slide 6 text

ML Development Lifecycle
ML projects are highly iterative, so accelerating iteration is key to a project's success.
We accelerate iterations by automating manual processes with open-source MLOps and DevOps tools.
Organizing machine learning projects: project management guidelines. https://www.jeremyjordan.me/ml-projects-guide/

Slide 7

Slide 7 text

ML Development Lifecycle at Mercari US
● Model Exploration with Polyaxon
● Continuous Training with Kubeflow Pipelines (KFP)
● Continuous Delivery with Spinnaker
Organizing machine learning projects: project management guidelines. https://www.jeremyjordan.me/ml-projects-guide/

Slide 8

Slide 8 text

Agenda
0. Machine Learning Development Lifecycle
1. Model Exploration with Polyaxon
2. Continuous Training with Kubeflow Pipelines
3. What we built to accelerate ML project iterations

Slide 9

Slide 9 text

Model Exploration with Polyaxon
[NOTE] This is the Polyaxon v0.6.1 UI. The UI is different in Polyaxon v1.x.

Slide 10

Slide 10 text

Model Exploration with Polyaxon

    ---
    version: 1
    kind: group
    hptuning:
      concurrency: 100
      matrix:
        learning_rate:
          linspace: 0.001:0.1:5
        dropout:
          values: [0.25, 0.3]
        activation:
          values: [relu, sigmoid]
    declarations:
      batch_size: 128
      num_steps: 500
      num_epochs: 1
    build:
      image: tensorflow/tensorflow:2.4.2-py3
      build_steps:
        - pip3 install --no-cache-dir -U polyaxon-helper
    run:
      cmd: python3 model.py --batch_size={{ batch_size }} \
                            --num_steps={{ num_steps }} \
                            --learning_rate={{ learning_rate }} \
                            --dropout={{ dropout }} \
                            --num_epochs={{ num_epochs }} \
                            --activation={{ activation }}

    $ polyaxon run -u -f polyaxon_gridsearch.yml
    ...
    Creating an experiment group with the following definition:
    ----------------  -----------------
    Search algorithm  grid
    Concurrency       5 concurrent runs
    Early stopping    deactivated
    ----------------  -----------------
    Experiment group 1 was created

[NOTE] This is the Polyaxon v0.6.1 specification. The specification is different in Polyaxon v1.x.
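As a rough illustration of what the matrix section above asks for, the following sketch enumerates the grid in plain Python. It assumes that `linspace: 0.001:0.1:5` means five evenly spaced values from 0.001 to 0.1 inclusive (as in NumPy's linspace). The helpers `linspace` and `expand_grid` are made up for this example and are not part of Polyaxon.

```python
from itertools import product


def linspace(start: float, stop: float, num: int) -> list:
    """Return `num` evenly spaced values from start to stop, inclusive."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def expand_grid(matrix: dict) -> list:
    """Return one dict per combination of the values in `matrix`."""
    names = list(matrix)
    return [dict(zip(names, combo)) for combo in product(*matrix.values())]


# Values mirror the matrix section of the Polyaxonfile on this slide.
matrix = {
    "learning_rate": linspace(0.001, 0.1, 5),
    "dropout": [0.25, 0.3],
    "activation": ["relu", "sigmoid"],
}

grid = expand_grid(matrix)
print(len(grid))  # 5 * 2 * 2 = 20 experiments
```

Each of the 20 dicts corresponds to one experiment in the group; the concurrency setting then limits how many of them run at the same time.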

Slide 11

Slide 11 text

How to run an experiment on Polyaxon
1. Prepare the model training code
2. Define the hyperparameter tuning job in a Polyaxonfile
3. Upload the Polyaxonfile and the code with the Polyaxon CLI
4. Polyaxon runs each job on Kubernetes
5. Visualize the experiment results in the UI
My Favorite Point: because the code is uploaded together with the Polyaxonfile, a change can be run interactively right after it is made, which is convenient.
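Step 1 could look like the sketch below: an entry point that accepts the flags passed by the `cmd` in the Polyaxonfile on the previous slide. Only the flag names come from the slides. The `parse_args` and `train` functions, the default values, and the placeholder body are assumptions for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters passed on the command line."""
    parser = argparse.ArgumentParser(description="Model training entry point")
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--num_steps", type=int, default=500)
    parser.add_argument("--num_epochs", type=int, default=1)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--dropout", type=float, default=0.25)
    parser.add_argument("--activation", choices=["relu", "sigmoid"], default="relu")
    return parser.parse_args(argv)


def train(args):
    """Placeholder for the real training loop (hypothetical)."""
    return {"status": "ok", "learning_rate": args.learning_rate}


# Simulate the arguments Polyaxon would substitute for one grid point.
args = parse_args(["--learning_rate=0.05", "--activation=sigmoid"])
print(train(args))
```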

Slide 12

Slide 12 text

Polyaxon at Mercari US
Usage period
● In use since around February 2019, about two and a half years
Projects and experiments (as of May 2021)
● 175 Projects
● About 870,000 Experiments
Infrastructure
● Google Kubernetes Engine (GKE)
● Google Cloud Storage for logs, data, and artifacts
● Regular and preemptible node pools, each with CPU and GPU variants
● Google Filestore as NFS Persistent Volume

Slide 13

Slide 13 text

Continuous Training with Kubeflow Pipelines
Kubeflow Pipelines (KFP) is a Machine Learning Workflow Engine
● KFP is a container-based workflow engine that runs on Kubernetes
● KFP has a metadata store that saves the inputs and outputs of each step
● Pipelines can be written in a DSL using the Python SDK

Slide 14

Slide 14 text

Continuous Model Delivery with Spinnaker
KFP connects Polyaxon and Spinnaker for CD
● KFP runs Polyaxon jobs on a schedule and builds a Docker image for serving the new model
● Spinnaker automatically runs a deployment when a new Docker image is created in the registry
Mercari engineering | Continuous delivery and automation pipelines in machine learning with Polyaxon and Kubeflow Pipelines https://medium.com/mercari-engineering/continuous-delivery-and-automation-pipelines-in-machine-learning-with-polyaxon-and-kubeflow-d6a3668715de

Slide 15

Slide 15 text

Agenda
0. Machine Learning Development Lifecycle
1. Model Exploration with Polyaxon
2. Continuous Training with Kubeflow Pipelines
3. What we built to accelerate ML project iterations

Slide 16

Slide 16 text

What we built to accelerate Iterations
● Monorepo for Kubeflow Pipelines: a monorepo lets us manage pipeline versions in CI and share best practices
● Manifests to manage projects on Polyaxon and KFP: KFP and Polyaxon resources are defined in YAML and managed like Infrastructure as Code
● A KFP component to submit a Polyaxon Job: the component makes it easy to submit a Polyaxon Job from KFP

Slide 17

Slide 17 text

Monorepo for Kubeflow Pipelines

    $ tree mercari-us-kubeflow-pipelines
    mercari-us-kubeflow-pipelines
    ├── components   # directory for KFP components
    ├── docs         # directory for documents
    ├── package
    │   └── merkfp   # python package for lightweight KFP components
    ├── pipelines    # directory for each project pipelines
    │   └── mercari-us-ml-price-suggestion
    │       └── train_model.py
    ├── projects     # directory for “project” manifests
    │   └── mercari-us-ml-price-suggestion.yml
    └── scripts      # directory for scripts on continuous integration

KFP + Continuous Integration
● A Python package defines KFP lightweight components and constants such as Secret names
● Pipeline versions are managed as branch_name + commit_hash
● CI compiles and uploads only the pipelines that were modified
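A minimal sketch of the last two bullets, assuming the version string simply joins the branch name and a shortened commit hash, and that CI receives a list of changed file paths (for example from `git diff --name-only`). The function names, the `-` separator, and the rule that any change under `pipelines/<project>/` marks that project as modified are assumptions, not the actual CI scripts.

```python
from pathlib import PurePosixPath


def pipeline_version(branch_name: str, commit_hash: str, short_len: int = 7) -> str:
    """Build a version label from the branch name and a shortened commit hash."""
    safe_branch = branch_name.replace("/", "-")  # keep the label free of slashes
    return f"{safe_branch}-{commit_hash[:short_len]}"


def modified_pipelines(changed_files: list, root: str = "pipelines") -> list:
    """Return the sorted project directories under `root` touched by the change."""
    projects = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Expect at least pipelines/<project>/<file>.
        if len(parts) >= 3 and parts[0] == root:
            projects.add(parts[1])
    return sorted(projects)


changed = [
    "pipelines/mercari-us-ml-price-suggestion/train_model.py",
    "docs/README.md",  # hypothetical file, ignored because it is outside pipelines/
]
print(pipeline_version("feature/new-model", "0123456789abcdef"))
print(modified_pipelines(changed))
```

Only the projects returned by `modified_pipelines` would then need to be compiled and uploaded under the new version label.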

Slide 18

Slide 18 text

“Project” Manifest for KFP and Polyaxon
CI creates resources like Infrastructure as code
● KFP Experiments and Polyaxon Projects can be defined in YAML
● To keep Dev and Prod consistent, CI creates the resources from the YAML
● GitHub CODEOWNERS is also generated

    ---
    kind: Project
    name: mercari-us-ml-price-suggestion
    experiments:
      - name: "Default"
      - name: "Sneakers"
      - name: "Trading Cards"
    owners:
      - github: "@kouzoh/mercari-price-suggest-us-prod"

mercari-ml-price-suggestion-us.yml
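One way CI could turn such a manifest into CODEOWNERS entries is sketched below. The manifest is written as an already-parsed Python dict to keep the example dependency-free. The mapping from a project name to the `pipelines/<name>/` and `projects/<name>.yml` paths is an assumption based on the monorepo layout from the earlier slide, and `codeowners_lines` is a hypothetical helper.

```python
def codeowners_lines(manifest: dict) -> list:
    """Return CODEOWNERS lines that assign a project's paths to its owners."""
    name = manifest["name"]
    owners = " ".join(owner["github"] for owner in manifest.get("owners", []))
    if not owners:
        return []  # nothing to generate without owners
    paths = [f"/pipelines/{name}/", f"/projects/{name}.yml"]
    return [f"{path} {owners}" for path in paths]


# Same fields as the manifest on this slide, already parsed into a dict.
manifest = {
    "kind": "Project",
    "name": "mercari-us-ml-price-suggestion",
    "experiments": [{"name": "Default"}, {"name": "Sneakers"}, {"name": "Trading Cards"}],
    "owners": [{"github": "@kouzoh/mercari-price-suggest-us-prod"}],
}

for line in codeowners_lines(manifest):
    print(line)
```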

Slide 19

Slide 19 text

Polyaxon Kubeflow Pipelines Component
1. The init container uses a secret to clone the private repo
2. The main container uses a secret to log in to Polyaxon
3. The training job is submitted through the Polyaxon API
4. The component tails the logs while waiting for the job to finish
5. Project, Job ID, Status, and so on are output to the next step
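Steps 3 to 5 can be pictured as a submit-then-poll loop. The sketch below is not the Polyaxon API: `JobClient`, its three methods, the status strings, and `FakeClient` are all hypothetical stand-ins so that the control flow can be run without a cluster.

```python
import time
from typing import List, Protocol


class JobClient(Protocol):
    """Hypothetical interface; the real component would wrap the Polyaxon API."""

    def submit(self, project: str) -> int: ...
    def status(self, job_id: int) -> str: ...
    def new_logs(self, job_id: int) -> List[str]: ...


def run_and_wait(client: JobClient, project: str, poll_seconds: float = 0.0) -> dict:
    """Submit a job, print its logs until it finishes, and return its outputs."""
    job_id = client.submit(project)
    while True:
        for line in client.new_logs(job_id):
            print(line)
        status = client.status(job_id)
        if status in ("succeeded", "failed", "stopped"):
            break
        time.sleep(poll_seconds)
    return {"project": project, "job_id": job_id, "status": status}


class FakeClient:
    """In-memory stand-in that finishes after two status checks."""

    def __init__(self):
        self._statuses = ["running", "succeeded"]

    def submit(self, project: str) -> int:
        return 1

    def status(self, job_id: int) -> str:
        return self._statuses.pop(0)

    def new_logs(self, job_id: int) -> List[str]:
        return ["log line"]


result = run_and_wait(FakeClient(), "mercari-us-ml-price-suggestion")
print(result)
```

Returning a plain dict mirrors step 5: the next pipeline step only needs the project, job ID, and final status, not the client itself.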

Slide 20

Slide 20 text

Continuous Training With Polyaxon + KFP

Slide 21

Slide 21 text

Takeaways
1. Polyaxon helps us achieve scalable and reproducible model exploration
2. Monorepo + CI for KFP keeps pipelines consistent and spreads best practices
3. A custom KFP component for Polyaxon lets us move seamlessly from model exploration to continuous training

Slide 22

Slide 22 text

Thanks!