Efficient Continuous Model Integration with Polyaxon + Kubeflow / Continuous ML Model Integration with Polyaxon and Kubeflow Pipelines

Slides from my talk at the 9th MLOps Study Group Tokyo (Online): https://mlops.connpass.com/event/215133/

Shotaro Kohama

July 13, 2021

Transcript

  1. Efficient Continuous Model Integration with Polyaxon + Kubeflow
    Shotaro Kohama
    9th MLOps Study Group Tokyo (Online), Jul 14, 2021

  2. Confidential & Proprietary 2021
    What is Mercari US?

  3. Machine Learning at Mercari US
    Price Suggestion Feature, Smart Pricing Feature
    Mercari engineering | Price Guidance System leveraging Artificial Intelligence Techniques
    https://medium.com/mercari-engineering/price-guidance-system-74358bd96081

  4. Agenda
    0. Machine Learning Development Lifecycle
    1. Model Exploration with Polyaxon
    2. Continuous Training with Kubeflow Pipelines
    3. What we built to accelerate ML project iterations

  5. Agenda
    0. Machine Learning Development Lifecycle
    1. Model Exploration with Polyaxon
    2. Continuous Training with Kubeflow Pipelines
    3. What we built to accelerate ML project iterations

  6. ML Development Lifecycle
    ML projects are highly iterative.
    Accelerating the iteration is the key to a project's success.
    We accelerate iterations by automating manual processes with
    open-source MLOps and DevOps tools.
    Organizing machine learning projects: project management guidelines.
    https://www.jeremyjordan.me/ml-projects-guide/

  7. ML Development Lifecycle at Mercari US
    Model Exploration with Polyaxon
    Continuous Training with Kubeflow Pipelines (KFP)
    Continuous Delivery with Spinnaker
    Organizing machine learning projects: project management guidelines.
    https://www.jeremyjordan.me/ml-projects-guide/

  8. Agenda
    0. Machine Learning Development Lifecycle
    1. Model Exploration with Polyaxon
    2. Continuous Training with Kubeflow Pipelines
    3. What we built to accelerate ML project iterations

  9. Model Exploration with Polyaxon
    [NOTE] This is the Polyaxon v0.6.1 UI. The UI differs in Polyaxon v1.x.

  10. Model Exploration with Polyaxon
    ---
    version: 1
    kind: group
    hptuning:
      concurrency: 100
      matrix:
        learning_rate:
          linspace: 0.001:0.1:5
        dropout:
          values: [0.25, 0.3]
        activation:
          values: [relu, sigmoid]
    declarations:
      batch_size: 128
      num_steps: 500
      num_epochs: 1
    build:
      image: tensorflow/tensorflow:2.4.2-py3
      build_steps:
        - pip3 install --no-cache-dir -U polyaxon-helper
    run:
      cmd: python3 model.py --batch_size={{ batch_size }} \
                            --num_steps={{ num_steps }} \
                            --learning_rate={{ learning_rate }} \
                            --dropout={{ dropout }} \
                            --num_epochs={{ num_epochs }} \
                            --activation={{ activation }}

    $ polyaxon run -u -f polyaxon_gridsearch.yml
    ...
    Creating an experiment group with the following
    definition:
    ---------------- -----------------
    Search algorithm  grid
    Concurrency       5 concurrent runs
    Early stopping    deactivated
    ---------------- -----------------
    Experiment group 1 was created
    [NOTE] This is the Polyaxon v0.6.1 specification. The specification differs in Polyaxon v1.x.
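
To show how many experiments the grid search above produces, here is a minimal Python sketch that expands the matrix into individual parameter sets. It assumes that `linspace: 0.001:0.1:5` means `start:stop:num` with both endpoints included, as in `numpy.linspace`. The helpers `linspace`, `parse_linspace`, and `expand_grid` are hypothetical names for illustration, not part of Polyaxon.

```python
from itertools import product


def linspace(start: float, stop: float, num: int) -> list[float]:
    """Return num evenly spaced values from start to stop, both inclusive."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def parse_linspace(spec: str) -> list[float]:
    """Parse a 'start:stop:num' string such as '0.001:0.1:5' (assumed format)."""
    start, stop, num = spec.split(":")
    return linspace(float(start), float(stop), int(num))


def expand_grid(matrix: dict[str, list]) -> list[dict]:
    """Return one parameter dict per point of the Cartesian product."""
    names = list(matrix)
    combos = product(*(matrix[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Same search space as the matrix section of the Polyaxonfile.
matrix = {
    "learning_rate": parse_linspace("0.001:0.1:5"),
    "dropout": [0.25, 0.3],
    "activation": ["relu", "sigmoid"],
}
experiments = expand_grid(matrix)
print(len(experiments))  # 5 * 2 * 2 combinations
```

Each resulting dict corresponds to one experiment in the group, and its values fill the `{{ ... }}` placeholders in `run.cmd`.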

  11. How to run an experiment on Polyaxon
    1. Prepare the model training code
    2. Define a hyperparameter tuning job in a Polyaxonfile
    3. Upload the Polyaxonfile and the code with the Polyaxon CLI
    4. Polyaxon runs each job on Kubernetes
    5. Visualize the experiment results in the UI
    My Favorite Point
    Because the code is uploaded together with the Polyaxonfile, a change can be
    run interactively right away, which is convenient
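
For step 1, the training code only has to accept the flags that the Polyaxonfile passes on its command line. The following is a minimal sketch of such an entry point. The flag names come from the `run.cmd` shown earlier, while the defaults, the `train` placeholder, and the overall structure are assumptions rather than the actual `model.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags that the Polyaxonfile's run.cmd passes to model.py."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--num_steps", type=int, default=500)
    parser.add_argument("--num_epochs", type=int, default=1)
    parser.add_argument("--learning_rate", type=float, required=True)
    parser.add_argument("--dropout", type=float, required=True)
    parser.add_argument("--activation", choices=["relu", "sigmoid"], required=True)
    return parser.parse_args(argv)


def train(args):
    """Placeholder for the real training loop; returns the resolved config."""
    return vars(args)


# Example: the arguments one grid point would receive.
config = train(
    parse_args(["--learning_rate=0.001", "--dropout=0.25", "--activation=relu"])
)
print(config)
```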

  12. Polyaxon at Mercari US
    Usage period
    ● In use since around February 2019, roughly two and a half years
    Projects and experiments (as of May 2021)
    ● 175 Projects
    ● About 870,000 Experiments
    Infrastructure
    ● Google Kubernetes Engine
    ● Google Cloud Storage for logs, data, and artifacts
    ● Regular, Preemptible x CPU, GPU node-pools
    ● Google Filestore as NFS Persistent Volume

  13. Continuous Training with Kubeflow Pipelines
    Kubeflow Pipelines (KFP) is a Machine Learning Workflow Engine
    ● KFP is a container-based workflow engine that runs on Kubernetes
    ● KFP has a metadata store that saves the input/output of each step
    ● Pipelines can be written in a DSL with the Python SDK
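
To make the metadata store point concrete, here is a toy illustration in standard-library Python of a workflow that records the input and output of every step. It is deliberately not the KFP SDK: `MetadataStore`, `run_step`, and the two example steps are hypothetical names, and a real pipeline would run each step in its own container.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class MetadataStore:
    """In-memory stand-in for a metadata store (hypothetical, not KFP's)."""

    records: list[dict[str, Any]] = field(default_factory=list)

    def log(self, step: str, inputs: dict[str, Any], output: Any) -> None:
        self.records.append({"step": step, "inputs": inputs, "output": output})


def run_step(
    store: MetadataStore, name: str, func: Callable[..., Any], **inputs: Any
) -> Any:
    """Run one step and record its inputs and output."""
    output = func(**inputs)
    store.log(name, inputs, output)
    return output


def preprocess(raw: list[float]) -> list[float]:
    """Scale values by the largest one so the maximum becomes 1."""
    peak = max(raw)
    return [value / peak for value in raw]


def train(features: list[float]) -> float:
    """Pretend 'model': the mean of the features."""
    return sum(features) / len(features)


store = MetadataStore()
features = run_step(store, "preprocess", preprocess, raw=[2.0, 4.0, 8.0])
model = run_step(store, "train", train, features=features)
for record in store.records:
    print(record["step"], record["inputs"], record["output"])
```

Because the output of one step is passed as the input of the next, the recorded entries also capture the lineage between steps.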

  14. Continuous Model Delivery with Spinnaker
    KFP connects Polyaxon and Spinnaker for CD
    ● KFP runs Polyaxon Jobs on a schedule and builds a docker image to serve the new model
    ● Spinnaker deploys automatically when a new docker image is created in the registry
    Mercari engineering | Continuous delivery and automation pipelines in machine learning with Polyaxon and Kubeflow Pipelines
    https://medium.com/mercari-engineering/continuous-delivery-and-automation-pipelines-in-machine-learning-with-polyaxon-and-kubeflow-d6a3668715de

  15. Agenda
    0. Machine Learning Development Lifecycle
    1. Model Exploration with Polyaxon
    2. Continuous Training with Kubeflow Pipelines
    3. What we built to accelerate ML project iterations

  16. What we built to accelerate Iterations
    Monorepo for Kubeflow Pipelines
    With a monorepo, pipeline versions are managed by CI and best practices can be shared
    Manifests to manage projects on Polyaxon and KFP
    KFP and Polyaxon resources can be defined in YAML and managed like Infrastructure as Code
    A KFP component to submit a Polyaxon Job
    The KFP component makes it easy to submit a Polyaxon Job from KFP

  17. Monorepo for Kubeflow Pipelines
    $ tree mercari-us-kubeflow-pipelines
    mercari-us-kubeflow-pipelines
    ├── components    # directory for KFP components
    ├── docs          # directory for documents
    ├── package
    │   └── merkfp    # python package for lightweight KFP components
    ├── pipelines     # directory for each project pipelines
    │   └── mercari-us-ml-price-suggestion
    │       └── train_model.py
    ├── projects      # directory for “project” manifests
    │   └── mercari-us-ml-price-suggestion.yml
    └── scripts       # directory for scripts on continuous integration
    KFP + Continuous Integration
    ● A python package provides KFP Lightweight components and constants such as Secret names
    ● Pipeline versions are managed as branch_name + commit_hash
    ● CI compiles and uploads only the pipelines that were modified
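
The last two CI points can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual CI script: the version format (slashes replaced with dashes, a 7-character short hash, joined with `-`) and the helper names `pipeline_version` and `changed_pipelines` are hypothetical. Only the `pipelines/<project>/` layout comes from the tree above.

```python
from pathlib import PurePosixPath


def pipeline_version(branch_name: str, commit_hash: str) -> str:
    """Build a version label from the branch name and a short commit hash."""
    safe_branch = branch_name.replace("/", "-")
    return f"{safe_branch}-{commit_hash[:7]}"


def changed_pipelines(changed_files: list[str]) -> list[str]:
    """Return the sorted project directories under pipelines/ that were touched."""
    projects = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Expect pipelines/<project>/<file>; skip files directly under pipelines/.
        if len(parts) >= 3 and parts[0] == "pipelines":
            projects.add(parts[1])
    return sorted(projects)


# Example input, e.g. the output of `git diff --name-only` split into lines.
files = [
    "pipelines/mercari-us-ml-price-suggestion/train_model.py",
    "docs/README.md",
]
print(pipeline_version("feature/new-model", "0123456789abcdef"))
print(changed_pipelines(files))
```

CI would then compile and upload only the projects returned by `changed_pipelines`, tagging each upload with the computed version.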

  18. “Project” Manifest for KFP and Polyaxon
    CI creates resources like Infrastructure as code
    ● KFP Experiments and Polyaxon Projects can be defined in YAML
    ● CI creates them from the YAML to keep Dev and Prod consistent
    ● GitHub CODEOWNERS is generated as well
    ---
    kind: Project
    name: mercari-us-ml-price-suggestion
    experiments:
      - name: "Default"
      - name: "Sneakers"
      - name: "Trading Cards"
    owners:
      - github: "@kouzoh/mercari-price-suggest-us-prod"
    mercari-ml-price-suggestion-us.yml
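
As an illustration of the CODEOWNERS generation, here is a minimal sketch that turns the manifest above into CODEOWNERS entries. The manifest is written as a Python dict to keep the example free of third-party YAML parsers. Which paths each project owns is an assumption based on the monorepo layout, and `codeowners_lines` is a hypothetical helper name.

```python
# The manifest from the slide, already loaded into a dict.
manifest = {
    "kind": "Project",
    "name": "mercari-us-ml-price-suggestion",
    "experiments": [
        {"name": "Default"},
        {"name": "Sneakers"},
        {"name": "Trading Cards"},
    ],
    "owners": [{"github": "@kouzoh/mercari-price-suggest-us-prod"}],
}


def codeowners_lines(manifest: dict) -> list[str]:
    """Map the project's pipeline directory and manifest file to its owners."""
    owners = " ".join(owner["github"] for owner in manifest["owners"])
    name = manifest["name"]
    return [
        f"/pipelines/{name}/ {owners}",  # assumed ownership of the pipeline code
        f"/projects/{name}.yml {owners}",  # assumed ownership of the manifest itself
    ]


print("\n".join(codeowners_lines(manifest)))
```

Generating the file from the same manifest keeps review ownership in step with the resources that CI creates.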

  19. Polyaxon Kubeflow Pipelines Component
    1. An init container clones the private repo using a secret
    2. The main container logs in to Polyaxon using a secret
    3. The training job is submitted through the Polyaxon API
    4. The component tails the logs while waiting for the Job to finish
    5. Project, Job ID, Status, etc. are output to the next step
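
Steps 4 and 5 amount to a wait loop followed by handing a small result to the next step. The sketch below shows that shape only. It does not call the Polyaxon API: the status lookup is injected as a function, and the status names, the `wait_for_job` helper, and the output keys are assumptions for illustration.

```python
import time
from typing import Callable

# Assumed names for statuses after which the job no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "stopped"}


def wait_for_job(
    project: str,
    job_id: int,
    get_status: Callable[[str, int], str],
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status, then return step outputs."""
    while True:
        status = get_status(project, job_id)
        print(f"{project}/{job_id}: {status}")
        if status in TERMINAL_STATUSES:
            return {"project": project, "job_id": job_id, "status": status}
        sleep(poll_seconds)


# Usage with a fake status source standing in for the real API client.
fake_statuses = iter(["scheduled", "running", "succeeded"])
outputs = wait_for_job(
    "mercari-us-ml-price-suggestion",
    1,
    get_status=lambda project, job_id: next(fake_statuses),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(outputs)
```

Injecting `get_status` and `sleep` keeps the loop testable without a cluster; a real component would also stream logs and fail the step when the status is not a success.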

  20. Continuous Training With Polyaxon + KFP

  21. Takeaways
    1. Polyaxon helps us achieve scalable and reproducible model exploration
    2. Monorepo + CI for KFP works to keep consistency and to spread best practices
    3. A custom KFP component for Polyaxon enables us to move forward seamlessly

  22. Thanks!