model development Case study: building a data analytics platform for a major entertainment company with Google Cloud https://youtu.be/BTYO0-avsXI Beyond Interactive: Notebook Innovation at Netflix https://netflixtechblog.com/notebook-innovation-591ee3221233
without code Not required Version control High availability Case study: building a data analytics platform for a major entertainment company with Google Cloud https://youtu.be/BTYO0-avsXI Beyond Interactive: Notebook Innovation at Netflix https://netflixtechblog.com/notebook-innovation-591ee3221233
and automation pipelines in machine learning https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
Accelerators) Massive amount of storage access (IOPS, Network bandwidth) Visualization Version control (code, data, model, and lineage between them) Not required High availability
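The requirement to version code, data, model, and the lineage between them can be illustrated with a minimal sketch. The names `LineageRecord`, `fingerprint`, and `record_training_run` are hypothetical, and the byte strings stand in for real datasets and model files; this is not the API of any particular ML platform.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(data: bytes) -> str:
    """Short content hash used as a version identifier (illustrative)."""
    return hashlib.sha256(data).hexdigest()[:12]


@dataclass(frozen=True)
class LineageRecord:
    """Ties one model artifact to the code and data that produced it."""
    code_version: str  # e.g. a Git commit hash (assumed to be supplied by the caller)
    data_hash: str     # fingerprint of the training dataset
    model_hash: str    # fingerprint of the trained model artifact


def record_training_run(code_version: str, dataset: bytes, model: bytes) -> LineageRecord:
    """Build the lineage record for a single training run."""
    return LineageRecord(code_version, fingerprint(dataset), fingerprint(model))


# Same code, different data: the records differ, so the change is traceable.
run_a = record_training_run("abc123", b"rows-v1", b"weights-v1")
run_b = record_training_run("abc123", b"rows-v2", b"weights-v2")
print(run_a.data_hash != run_b.data_hash)
```

Storing such a record next to every model makes it possible to answer which code and which data produced a given model.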
Moreover, we have to consider dev/staging/prod environments, as well as training/serving skew, which is caused by differences between environments. Why We Need DevOps for ML Data https://www.tecton.ai/blog/devops-ml-data/
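One simple way to notice training/serving skew is to compare per-feature statistics between the two environments. The sketch below is a minimal illustration under assumptions: the function names, the 0.5 threshold, and the sample data are all hypothetical, and real systems usually rely on richer distribution tests.

```python
from statistics import mean, pstdev


def feature_skew(train_values, serving_values):
    """Shift of the serving mean from the training mean, in training standard deviations."""
    train_mean = mean(train_values)
    train_std = pstdev(train_values)
    if train_std == 0:
        # Constant training feature: any change in the serving mean counts as skew.
        return 0.0 if mean(serving_values) == train_mean else float("inf")
    return abs(mean(serving_values) - train_mean) / train_std


def find_skewed_features(train, serving, threshold=0.5):
    """Return the sorted names of features whose shift exceeds the threshold.

    `train` and `serving` map feature name -> list of numeric values.
    """
    return sorted(
        name
        for name in train
        if name in serving and feature_skew(train[name], serving[name]) > threshold
    )


# Hypothetical data: "clicks" is scaled differently in the serving environment.
train = {"age": [20, 30, 40, 50], "clicks": [1, 2, 3, 4]}
serving = {"age": [21, 29, 41, 49], "clicks": [10, 20, 30, 40]}
skewed = find_skewed_features(train, serving)
print(skewed)
```

A check like this can run in each environment (dev, staging, prod) so that a mismatch in preprocessing shows up before it degrades predictions.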
machine learning (ML) workflows on Kubernetes simple, portable and scalable." Kubeflow started as an open-source implementation of Google's internal ML platform (TFX). Today, Kubeflow places no restrictions on libraries or cloud services.
to use Kubernetes or Kubeflow as an ML platform. Both Kubernetes and Kubeflow require a huge amount of effort. Several companies tried Kubeflow and then decided to use a managed ML platform instead.
learning (ML) models faster, with fully managed ML tools for any use case. SageMaker: Build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. Both services originated from their providers' internal ML platforms (Google and Amazon).
fast deliver Managed MLOps at CADDi AI Lab; Image search with OpenSearch and stable operation through added tests. Managed MLOps at CADDi AI Lab https://speakerdeck.com/vaaaaanquish/caddi-ai-labniokerumanezidonamlops
use; cannot update Kubeflow (delete & create); fine-grained logs cost too much (with Prometheus); too expensive to keep watching Kubeflow & Kubernetes; use Vertex AI to avoid managing Kubernetes & Kubeflow
too expensive; tons of YAML files and customizations; hard to scale in the team; use Vertex AI to avoid hosting Kubeflow themselves. Insights and challenges from building an MLOps platform with Kubeflow https://techblog.zozo.com/entry/mlops-platform-kubeflow
reason, they have extreme on-prem clusters; excellence in managing bare metal servers and Kubernetes. Lupus - A Monitoring System for Accelerating MLOps https://speakerdeck.com/line_devday2021/lupus-a-monitoring-system-for-accelerating-mlops
they have extreme on-prem clusters; excellence in managing bare metal servers and Kubernetes; huge investment in Kubernetes. A Kubernetes Operator for continuous model monitoring https://www.slideshare.net/techblogyahoo/kubernetes-operator-251612755
machine learning (ML researchers) They need bare metal servers to: 1. use GPUs and CPUs as much as possible 2. create their own chip (accelerator) and test it on their servers. A Kubernetes Operator for continuous model monitoring https://www.slideshare.net/techblogyahoo/kubernetes-operator-251612755
in managing bare metal servers and Kubernetes. A machine learning platform on Kubernetes: use cases at Rakuten, 覃子麟 (チンツーリン) / Rakuten, Inc. https://www.slideshare.net/rakutentech/kubernetes-144707493?from_action=save Rakuten's scale and the role of the Cloud Platform department https://www.slideshare.net/rakutentech/ss-253221883
training/serving skew. Be careful about using Kubernetes & Kubeflow as an ML platform. The minimum requirement for using Kubernetes as an ML platform is the capability to customize Kubernetes to fit your use cases. Consider a hybrid approach: a managed service for training and an inference service on Kubernetes.