CyberAgent AI Business Division FY2024 MLOps Training, Basics / MLOps Basic

©2024 CyberAgent Inc. Distribution prohibited
Self-introduction
Satsuki Nagae (@nsakki55), Data Scientist/Manager at Dynalyst
Work: ML modeling, data analysis, MLOps
AWS certifications held

Agenda
1. What is MLOps?
2. Elements of MLOps
3. Stages of MLOps

Conclusions of this talk
1. What is MLOps: DevOps for ML
2. Elements of MLOps: experimentation platform, training pipeline, inference service, version management, CI/CD, continuous training, monitoring
3. Stages of MLOps: MLOps maturity models, ML Test Score

What is MLOps?
Machine Learning Operations: the techniques for operating machine learning

History of MLOps
2008: DevOps emerges
2012: The third AI boom arrives
2015: A landmark MLOps paper is published
2018: The term MLOps emerges
2022: The generative AI boom arrives

Google Trends for related terms
Google Trends for "machine learning", "DevOps", and "MLOps". We look at how changes in search interest line up with key events.

The emergence of DevOps
2008: DevOps emerges

What is DevOps?
A software development approach that combines development (Dev) and operations (Ops). It brings together the following practices:
・CI/CD
・Microservices
・IaC
・Monitoring and logging
・Communication and collaboration
Reference: What is DevOps? - DevOps and AWS | AWS

The arrival of the third AI boom
2012: AlexNet appears and the third AI boom begins

AlexNet: the spark of deep learning
・An 8-layer CNN (Convolutional Neural Network) image classification model trained on ImageNet
・Substantially improved performance in the ILSVRC image recognition competition
・Often credited with igniting the current (third) AI boom
Reference: ImageNet Classification with Deep Convolutional Neural Networks

The landmark MLOps paper
2015: The paper "Hidden Technical Debt in Machine Learning Systems" is published

Hidden Technical Debt in Machine Learning Systems
・Paper accepted at NeurIPS 2015 (then called NIPS)
・Introduces risk factors specific to ML
Reference: Hidden Technical Debt in Machine Learning Systems

Hidden Technical Debt in Machine Learning Systems
・The ML code is only a small fraction of an ML system
Source: Hidden Technical Debt in Machine Learning Systems, Figure 1.

The term MLOps appears
2018: The term MLOps is first used at Google Cloud Next

The term MLOps appears
・The term MLOps was first used in the session "What is ML Ops? Best Practices for DevOps for ML (Cloud Next '18)"
・There, MLOps is described as DevOps for ML
Source: YouTube thumbnail of What is ML Ops? Best Practices for DevOps for ML (Cloud Next '18)

The arrival of the generative AI boom
2022: ChatGPT is released and the generative AI boom begins

Ops for LLMs draws attention
・Techniques for putting LLMs to use are attracting attention
・The term LLMOps has also been coined
Source: MLOps Community JP, "LLM (GPT, PaLM, etc.) with MLOps Lightning Talk Event!!!"


The relationship between DevOps and MLOps
Historically, MLOps has been positioned as DevOps for ML.

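How the three major cloud vendors describe MLOps
Google: "MLOps is an ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operation (Ops)."
Source: MLOps: Continuous delivery and automation pipelines in machine learning
Microsoft: "MLOps is based on DevOps principles and practices that increase the efficiency of workflows."
Source: MLOps: Model management, deployment, and monitoring with Azure Machine Learning
Amazon: "MLOps is an ML culture and practice that unifies ML application development (Dev) with ML system deployment and operations (Ops)."
Source: What is MLOps? - Machine Learning Operations Explained - AWS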

Definition of MLOps
・A paper that answers the research question "What is MLOps?"
・Defines MLOps as an engineering practice that draws on three fields: machine learning, software engineering (especially DevOps), and data engineering
Reference: Machine Learning Operations (MLOps): Overview, Definition, and Architecture

Causes of failures in ML systems
Results of a study of outage causes in one large ML pipeline at Google over 15 years: 60 of the 96 outages had causes unrelated to ML.
Source: How ML Breaks: A Decade of Outages for One Large ML Pipeline

Elements of MLOps
A paper that surveys MLOps systematically: Machine Learning Operations (MLOps): Overview, Definition, and Architecture
Source: Figure 4. End-to-end MLOps architecture and workflow with functional components and roles

Elements of MLOps
Experimentation platform, training pipeline, inference service, version management, CI/CD, continuous training, monitoring

Elements of MLOps: Experimentation platform

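Tragedies that can happen in ML research
・Graduate Student Descent*
* A "method" in which graduate students fiddle with hyperparameters until the model works
Reference: How do people come up with all these crazy deep learning architectures?
・Code left behind by predecessors: model-improvement code turns into a closely guarded "secret sauce" that nobody else can follow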

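Standardizing the development environment
・Adopting developer productivity tools
・Shared Python package management
・Containerized runtime environments
・Providing infrastructure for experiments
・Integrations with external data and services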

Experiment management
Track the following values for each model training experiment:
・Loss curves
・Model performance metrics
・Inference latency
・System performance
・Hyperparameters
Source: https://www.wandb.jp/
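A minimal sketch of what per-run tracking could look like, assuming a hypothetical `ExperimentRun` class that keeps everything in memory and serializes to JSON. It is not the W&B API; hosted tracking tools add storage, dashboards, and automatic system-metric collection on top of the same idea.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ExperimentRun:
    """Hypothetical in-memory run record (not the W&B API)."""
    name: str
    hyperparams: Dict[str, float]
    # metric name -> list of (step, value) pairs, so curves can be plotted later
    metrics: Dict[str, List[Tuple[int, float]]] = field(default_factory=dict)

    def log(self, step: int, **values: float) -> None:
        for key, value in values.items():
            self.metrics.setdefault(key, []).append((step, value))

    def to_json(self) -> str:
        return json.dumps({"name": self.name, "hyperparams": self.hyperparams, "metrics": self.metrics})


run = ExperimentRun(name="baseline-lr0.01", hyperparams={"lr": 0.01, "epochs": 3})
for epoch in range(3):
    fake_loss = 1.0 / (epoch + 1)  # stand-in for a real training loss
    run.log(epoch, train_loss=fake_loss)

start = time.perf_counter()
_ = sum(x * 2 for x in range(1000))  # stand-in for model.predict(...)
run.log(2, inference_latency_ms=(time.perf_counter() - start) * 1000)
print(run.to_json())
```

Storing every metric as a (step, value) series lets the same record hold both curves (loss per epoch) and one-off measurements (latency).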

From notebooks to scripts
・Writing ML code in notebooks is a popular approach among data scientists
Source: https://www.kaggle.com/
・For reuse and version control, production code is written as scripts

Training pipeline
Data retrieval → data preprocessing and validation → model training → model validation → model saving → model registration
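The six stages of the training pipeline can be sketched as plain functions chained in order. Everything here is hypothetical: toy data following y = 2x + 1, a one-feature least-squares model, a list standing in for a model registry, and validation on the training rows for brevity (a real pipeline would validate on held-out data and usually run each step as a separate, retryable task in an orchestrator).

```python
import pickle
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

Row = Tuple[float, float]


def fetch_data() -> List[Row]:
    # Hypothetical stand-in for reading from a data warehouse.
    return [(float(x), 2.0 * x + 1.0) for x in range(10)]


def preprocess_and_validate(rows: List[Row]) -> List[Row]:
    # Drop rows with missing values and fail fast if too little data remains.
    clean = [(x, y) for x, y in rows if x is not None and y is not None]
    if len(clean) < 2:
        raise ValueError("not enough valid rows to train")
    return clean


def train(rows: List[Row]) -> Dict[str, float]:
    # Ordinary least squares for a single feature.
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def validate_model(model: Dict[str, float], rows: List[Row], max_mse: float = 1e-6) -> float:
    # Reject the model if its mean squared error exceeds a threshold.
    mse = sum((model["slope"] * x + model["intercept"] - y) ** 2 for x, y in rows) / len(rows)
    if mse > max_mse:
        raise ValueError(f"model rejected: mse={mse}")
    return mse


def save_model(model: Dict[str, float], directory: str) -> Path:
    path = Path(directory) / "model.pkl"
    path.write_bytes(pickle.dumps(model))
    return path


def register_model(registry: List[dict], path: Path, mse: float) -> int:
    # Hypothetical registry: append a new version entry pointing at the saved artifact.
    version = len(registry) + 1
    registry.append({"version": version, "path": str(path), "mse": mse})
    return version


def run_pipeline(registry: List[dict], workdir: str) -> int:
    rows = preprocess_and_validate(fetch_data())
    model = train(rows)
    mse = validate_model(model, rows)
    path = save_model(model, workdir)
    return register_model(registry, path, mse)


registry: List[dict] = []
with tempfile.TemporaryDirectory() as workdir:
    version = run_pipeline(registry, workdir)
print(registry)
```

Because each stage is a separate function with explicit inputs and outputs, any one of them can be tested, rerun, or swapped out without touching the others.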

Inference patterns: online inference
The application sends a request to a model inference endpoint, and a prediction is computed for every inference request.

Inference patterns: batch inference
Predictions are computed in advance by a batch job and stored; the application queries the stored predictions.

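Inference patterns: comparison
Online inference: runs per request; use case: a prediction is needed as soon as the data is generated; optimized for low latency
Batch inference: runs periodically; use case: the result is not needed immediately; optimized for high throughput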
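A toy contrast of the two inference patterns, assuming a hypothetical linear `predict` function and a dict standing in for the prediction store (in practice a database or key-value store). The model is the same in both paths; what changes is when it runs and where the result lives.

```python
from typing import Dict

Features = Dict[str, float]


def predict(features: Features) -> float:
    # Hypothetical model: a fixed linear score.
    return 0.3 * features["clicks"] + 0.1 * features["visits"]


def handle_request(request: Features) -> float:
    # Online inference: the model runs on every request, so latency matters.
    return predict(request)


def run_batch(users: Dict[str, Features]) -> Dict[str, float]:
    # Batch inference: score every user on a schedule and store the results.
    return {user_id: predict(feats) for user_id, feats in users.items()}


def query_prediction(store: Dict[str, float], user_id: str) -> float:
    # The application only looks up a stored value; no model call at request time.
    return store[user_id]


users = {"u1": {"clicks": 10, "visits": 5}, "u2": {"clicks": 2, "visits": 40}}
store = run_batch(users)                            # e.g. a nightly job
print(handle_request({"clicks": 4, "visits": 1}))   # per-request score
print(query_prediction(store, "u2"))                # precomputed score
```

The batch path can only answer for entities that existed when the job ran, which is one reason it suits cases where results are not needed immediately.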

Canary deployment
A load balancer splits traffic between the old and new models: 90% to the old model and 10% to the new one.
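A minimal sketch of a 90/10 canary split. On the slide the load balancer does the split; here, as an assumption for illustration, it is done in application code by hashing a request or user id into 100 buckets. Hashing rather than random choice keeps each user on the same model, which makes old/new comparisons cleaner.

```python
import hashlib
from collections import Counter


def route(request_id: str, canary_percent: int = 10) -> str:
    """Send roughly canary_percent% of traffic to the new model."""
    # Hash the id into one of 100 buckets so a given user always lands on the same model.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "new_model" if bucket < canary_percent else "old_model"


counts = Counter(route(f"user-{i}") for i in range(10_000))
print(counts)  # approximately a 90/10 split
```

Raising `canary_percent` step by step is how the new model's share would grow once it looks healthy.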

Shadow deployment
・Run inference with the new model alongside the old model
・Validate the new model's predictions
Source: Shadow variants - Amazon SageMaker
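A minimal sketch of the shadow pattern, assuming two hypothetical models passed in as plain functions. Only the old model's prediction is returned; the new model's output is logged for comparison. For simplicity this runs both synchronously, whereas managed shadow setups typically mirror traffic asynchronously so the shadow model adds no latency.

```python
from typing import Callable, Dict, List

Model = Callable[[Dict[str, float]], float]


def shadow_predict(features: Dict[str, float], primary: Model, shadow: Model,
                   comparison_log: List[dict]) -> float:
    # Only the old (primary) model's prediction is returned to the caller.
    served = primary(features)
    try:
        candidate = shadow(features)
        comparison_log.append({"served": served, "shadow": candidate,
                               "abs_diff": abs(served - candidate)})
    except Exception as exc:  # a failing shadow model must never affect the response
        comparison_log.append({"served": served, "shadow_error": repr(exc)})
    return served


def old_model(f: Dict[str, float]) -> float:
    return 2.0 * f["x"]


def new_model(f: Dict[str, float]) -> float:
    return 2.0 * f["x"] + 0.5


log: List[dict] = []
result = shadow_predict({"x": 3.0}, old_model, new_model, log)
print(result, log)
```

The comparison log is what the validation step would analyze before deciding to promote the new model.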

What to version
・Code
・Data (Feature Store)
・Models (Model Store)
Data and models are specific to machine learning systems.

Feature Store
Manages features centrally and serves them for both online and batch use.
Source: What is a Feature Store? Components of a Feature Store

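Benefits of a Feature Store
・Feature management: teams can reuse and share features
・Feature computation: feature engineering definitions are managed in one place
・Feature consistency: the same data is used at training and inference time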
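The consistency benefit comes from defining each feature transformation once and reusing it for both training and serving. A minimal sketch, assuming a hypothetical decorator-based registry and plain dicts in place of the offline and online stores; real feature stores add storage, point-in-time joins, and serving APIs that are not modeled here.

```python
import math
from typing import Callable, Dict, List

RawRecord = Dict[str, float]
FEATURE_DEFINITIONS: Dict[str, Callable[[RawRecord], float]] = {}


def feature(name: str):
    # Register a feature transformation once so training and serving share the same code.
    def decorator(fn: Callable[[RawRecord], float]) -> Callable[[RawRecord], float]:
        FEATURE_DEFINITIONS[name] = fn
        return fn
    return decorator


@feature("ctr")
def ctr(raw: RawRecord) -> float:
    return raw["clicks"] / raw["impressions"] if raw["impressions"] else 0.0


@feature("log_impressions")
def log_impressions(raw: RawRecord) -> float:
    return math.log1p(raw["impressions"])


def compute_features(raw: RawRecord) -> Dict[str, float]:
    return {name: fn(raw) for name, fn in FEATURE_DEFINITIONS.items()}


def build_training_set(raw_rows: List[RawRecord]) -> List[Dict[str, float]]:
    # Offline/batch path: features for model training.
    return [compute_features(r) for r in raw_rows]


def materialize_online(raw_by_entity: Dict[str, RawRecord]) -> Dict[str, Dict[str, float]]:
    # Online path: latest features per entity, ready for low-latency lookup at inference time.
    return {entity: compute_features(raw) for entity, raw in raw_by_entity.items()}


raw_by_ad = {"ad1": {"clicks": 5, "impressions": 100}, "ad2": {"clicks": 0, "impressions": 0}}
online_store = materialize_online(raw_by_ad)
training_rows = build_training_set(list(raw_by_ad.values()))
print(online_store["ad1"])
```

Since both paths call `compute_features`, a change to a feature definition reaches training and inference together, which is what prevents training/serving skew.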

Model Store (Model Registry)
Stores each model together with its code, data, metadata, config, environment, and artifacts, and links experiments with the inference service and monitoring.

Benefits of a Model Store
・Reproducibility: the state of a model can be reproduced
・Traceability: the artifacts associated with a model are known
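A minimal sketch of how a model store delivers reproducibility and traceability, assuming a hypothetical in-memory `ModelStore` (not the MLflow or SageMaker API). Each version keeps the serialized model, a content hash, and metadata; the metadata keys and values below (`git_commit`, `data_version`, and so on) are illustrative placeholders.

```python
import hashlib
import json
import pickle
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ModelVersion:
    version: int
    artifact_sha256: str  # content hash ties the metadata to the exact artifact
    metadata: Dict[str, Any]


@dataclass
class ModelStore:
    """Hypothetical in-memory model registry (not the MLflow or SageMaker API)."""
    versions: List[ModelVersion] = field(default_factory=list)
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def register(self, model: Any, metadata: Dict[str, Any]) -> ModelVersion:
        blob = pickle.dumps(model)
        digest = hashlib.sha256(blob).hexdigest()
        self.blobs[digest] = blob
        entry = ModelVersion(len(self.versions) + 1, digest, dict(metadata))
        self.versions.append(entry)
        return entry

    def load(self, version: int) -> Any:
        # Reproducibility: get back exactly the artifact that was registered.
        entry = self.versions[version - 1]
        return pickle.loads(self.blobs[entry.artifact_sha256])

    def lineage(self, version: int) -> str:
        # Traceability: everything recorded about how this version was produced.
        entry = self.versions[version - 1]
        return json.dumps({"version": entry.version, "artifact": entry.artifact_sha256,
                           **entry.metadata}, sort_keys=True)


store = ModelStore()
store.register({"slope": 2.0, "intercept": 1.0},
               {"git_commit": "abc123", "data_version": "2024-04-01",
                "config": {"lr": 0.01}, "env": "python3.11"})
print(store.lineage(1))
```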

CI/CD
Source: Practitioners Guide to Machine Learning Operations (MLOps). Figure 9. A complex CI/CD system for the model deployment process

CI
・Testing: static analysis of code, basic unit tests, code formatting with a linter
・Packaging: bundle the ML code into a deliverable form
ML code is treated the same way as code in an ordinary software system.
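A minimal sketch of the kind of basic unit test a CI job could run on ML code, assuming a hypothetical `normalize` preprocessing helper. The `test_` functions follow pytest's naming convention, so pytest would collect them; a linter and formatter would typically run in the same CI job.

```python
import math
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Scale values to zero mean and unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # constant input: avoid division by zero
    return [(v - mean) / std for v in values]


def test_normalize_has_zero_mean_and_unit_variance() -> None:
    out = normalize([1.0, 2.0, 3.0, 4.0])
    assert abs(sum(out) / len(out)) < 1e-9
    assert abs(sum(v * v for v in out) / len(out) - 1.0) < 1e-9


def test_normalize_handles_constant_input() -> None:
    assert normalize([5.0, 5.0]) == [0.0, 0.0]


if __name__ == "__main__":
    test_normalize_has_zero_mean_and_unit_variance()
    test_normalize_handles_constant_input()
    print("ok")
```

Edge cases such as constant input are worth pinning down in tests because they tend to surface only on unusual production data.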

CD
・Acceptance testing: decide based on model performance, so that problematic models are stopped before they reach production
・Staged deployment: roll out the new model gradually
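A minimal sketch of the two CD steps, assuming AUC as the acceptance metric and a hypothetical 1% → 10% → 50% → 100% rollout schedule; the metric, thresholds, and schedule are all illustrative choices. The acceptance gate compares the candidate with the production model, and the staged rollout advances only while the new model stays healthy.

```python
from typing import Tuple

ROLLOUT_STEPS: Tuple[int, ...] = (1, 10, 50, 100)  # % of traffic; hypothetical schedule


def accept_candidate(candidate_auc: float, production_auc: float,
                     min_auc: float = 0.7, max_drop: float = 0.005) -> bool:
    # Acceptance test: block models below an absolute floor or clearly worse than production.
    return candidate_auc >= min_auc and candidate_auc >= production_auc - max_drop


def next_traffic_percent(current: int, healthy: bool) -> int:
    # Staged deployment: move up one step while the new model stays healthy,
    # otherwise roll back to 0% (all traffic on the old model).
    if not healthy:
        return 0
    for step in ROLLOUT_STEPS:
        if step > current:
            return step
    return current  # already serving 100% of traffic


if accept_candidate(candidate_auc=0.81, production_auc=0.80):
    percent = 0
    while percent < 100:
        percent = next_traffic_percent(percent, healthy=True)  # health check stubbed out
        print(f"new model now serves {percent}% of traffic")
```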

CI/CD/CT
By analogy with CI/CD in DevOps, continuous training is often called CT (Continuous Training).

A problem with existing software: software rot
・A phenomenon in which software quality degrades over time
・It occurs when software is not updated and fails to keep up with changes in its external environment

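The value of data freshness
After Meta (formerly Facebook) changed the training of its CTR prediction model from every 7 days to every day, it reduced the model's loss by 1%*.
In internet advertising, where user preferences shift quickly, data freshness can create value.
* Practical Lessons from Predicting Clicks on Ads at Facebook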

Training triggers
・Schedule: run at a specified time, e.g. with cron
・Manual: run at any time
・Event-driven: run upon detecting a data update or a drop in model accuracy
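A minimal sketch that combines the three trigger types into one decision function, assuming a hypothetical daily 03:00 schedule and an accuracy floor of 0.8. In practice the schedule would live in cron or a workflow scheduler and the events would come from data or monitoring systems; here they are plain arguments.

```python
from datetime import datetime
from typing import Optional


def training_trigger(now: datetime, last_run: Optional[datetime], *, manual: bool = False,
                     new_data_arrived: bool = False, current_accuracy: Optional[float] = None,
                     schedule_hour: int = 3, accuracy_floor: float = 0.8) -> Optional[str]:
    """Return why training should start now, or None. All thresholds are hypothetical."""
    if manual:
        return "manual"
    if new_data_arrived:
        return "event:new_data"
    if current_accuracy is not None and current_accuracy < accuracy_floor:
        return "event:accuracy_drop"
    # Schedule: once a day at or after schedule_hour, a simplified stand-in for cron "0 3 * * *".
    ran_today = last_run is not None and last_run.date() == now.date()
    if now.hour >= schedule_hour and not ran_today:
        return "schedule"
    return None


print(training_trigger(datetime(2024, 4, 1, 3, 0), last_run=datetime(2024, 3, 31, 3, 0)))
```

Returning the reason rather than a bare boolean makes it easy to record why each training run started.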

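Monitoring metrics specific to ML systems
Data
・Data quality: missing values, type checks
・Data drift: distance metrics between data distributions for continuous and categorical features
・Outlier monitoring: detecting large data drift
Model
・Model drift: distance between the distributions of past and current predictions
・Model configuration: metadata recorded at training time
Prediction
・Model evaluation metrics: evaluation metrics computed on predictions in production
・Prediction drift: changes in the distribution of predictions
Source: A Comprehensive Guide on How to Monitor Your Models in Production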
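The monitoring list names "distance metrics" without fixing one. A minimal sketch, assuming the Population Stability Index (PSI) as the drift metric for a continuous feature (one common choice, swapped in here for illustration) plus simple missing-value and type checks. A frequently quoted rule of thumb treats PSI above about 0.25 as a significant shift, but thresholds should be tuned per feature.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10,
        eps: float = 1e-6) -> float:
    """Population Stability Index using equal-width bins over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)  # clip outliers into edge bins
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(reference), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def quality_issues(rows: List[dict], schema: dict) -> List[str]:
    """Report missing values and type mismatches against a {column: type} schema."""
    issues = []
    for i, row in enumerate(rows):
        for col, expected in schema.items():
            if row.get(col) is None:
                issues.append(f"row {i}: {col} is missing")
            elif not isinstance(row[col], expected):
                issues.append(f"row {i}: {col} is not {expected.__name__}")
    return issues


reference = [i / 100 for i in range(1000)]   # training-time feature values
shifted = [x + 5.0 for x in reference]       # production values after a shift
print(psi(reference, reference), psi(reference, shifted))
print(quality_issues([{"age": 30}, {"age": None}, {"age": "30"}], {"age": int}))
```

Applying the same `psi` function to past and current prediction values gives a simple prediction-drift check.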


DevOps maturity model
Expresses an organization's DevOps level in five stages.
Source: DevOps maturity model: types, steps, and evaluation metrics

MLOps maturity model
Describes the phases of MLOps as a series of stages.
Source: MLOps foundation roadmap for enterprises with Amazon SageMaker

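MLOps maturity models of the three major cloud vendors
Amazon: initial phase → repeatable phase → reliable phase → scalable phase
Google: Level 0 manual process → Level 1 ML pipeline automation → Level 2 CI/CD pipeline automation
Microsoft: Level 0 no MLOps → Level 1 DevOps but no MLOps → Level 2 automated training → Level 3 automated model deployment → Level 4 full MLOps automation
Reference: A comparison of the MLOps maturity models of the three major cloud vendors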

Google's MLOps maturity model
Level 0: Manual process
Level 1: ML pipeline automation
Level 2: CI/CD pipeline automation

Level 0: Manual process
・Every step is performed manually
・Infrequent releases
・No CI/CD
・Only the prediction service is deployed
・No monitoring
Source: MLOps: Continuous delivery and automation pipelines in machine learning, "MLOps level 0: Manual process"

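Level 1: ML pipeline automation
・Integration testing
・Continuous training
・The same implementation in development and production
・Componentized code
・Data and model validation
・Feature Store
・Model Store
Source: MLOps: Continuous delivery and automation pipelines in machine learning, "MLOps level 1: ML pipeline automation"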

Level 2: CI/CD pipeline automation
・An environment for validating new models
・CI/CD for pipelines
・Automated CT
・CD for models
・Monitoring
Source: MLOps: Continuous delivery and automation pipelines in machine learning, "MLOps level 2: CI/CD pipeline automation"

Quantifying MLOps
A paper that identifies what ML systems need to test and monitor beyond what traditional software systems required.
Source: Figure 1. ML Systems Require Extensive Testing and Monitoring. The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction

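ML Test Score
Scores an ML system along 4 axes with 7 questions each (28 questions in total):
・Features (0 to 7 points)
・Model (0 to 7 points)
・Infrastructure (0 to 7 points)
・Monitoring (0 to 7 points)
Points per test:
・Results are documented and the test is run manually: 0.5 points
・A system exists that runs the test automatically and repeatedly: 1.0 point
The ML Test Score is the minimum of the four per-axis totals (each 0 to 7 points).
Source: Figure 4. Average scores for interviewed teams.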

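ML Test Score: interpretation
Points: interpretation
0: More of a research project than a production system
(0, 1]: Not comprehensively tested, but efforts are made to improve reliability where possible
(1, 2]: Basic project requirements have been met, but further investment in reliability is needed
(2, 3]: Reasonably tested, but there is still room for more automation
(3, 5]: Strong levels of automated testing and monitoring; suitable even for mission-critical use
> 5: An exceptional ML system
Source: [Abridged Japanese translation] What's your ML test score? A rubric for ML production systems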
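The scoring rule above can be sketched directly: 0.5 points for a manual, documented test, 1.0 for an automated one, per-axis sums, then the minimum across axes. The `team` results below are hypothetical, and the status labels (`"none"`, `"manual"`, `"automated"`) are naming assumptions for this sketch.

```python
from typing import Dict, List

POINTS = {"none": 0.0, "manual": 0.5, "automated": 1.0}
SECTIONS = ("features", "model", "infrastructure", "monitoring")


def ml_test_score(results: Dict[str, List[str]]) -> float:
    """Each section lists 7 test statuses: 'none', 'manual' (documented, run by hand) or 'automated'."""
    totals = []
    for section in SECTIONS:
        statuses = results[section]
        if len(statuses) != 7:
            raise ValueError(f"{section}: expected 7 tests, got {len(statuses)}")
        totals.append(sum(POINTS[s] for s in statuses))
    # The final score is the weakest section, so one neglected area caps the whole system.
    return min(totals)


def interpret(score: float) -> str:
    if score <= 0:
        return "More of a research project than a production system"
    if score <= 1:
        return "Not comprehensively tested, but reliability is being improved where possible"
    if score <= 2:
        return "Basic requirements met; further reliability investment needed"
    if score <= 3:
        return "Reasonably tested; room for more automation"
    if score <= 5:
        return "Strong automated testing and monitoring; fit for mission-critical use"
    return "Exceptional ML system"


team = {
    "features": ["automated"] * 4 + ["manual"] * 3,
    "model": ["automated"] * 7,
    "infrastructure": ["manual"] * 6 + ["none"],
    "monitoring": ["automated"] * 2 + ["none"] * 5,
}
score = ml_test_score(team)
print(score, interpret(score))
```

In this example a strong model section (7.0) does not help: the weak monitoring section (2.0) sets the overall score.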

Summary
1. What is MLOps: DevOps for ML
2. Elements of MLOps: experimentation platform, training pipeline, inference service, version management, CI/CD, continuous training, monitoring
3. Stages of MLOps: MLOps maturity models, ML Test Score
