SageMaker One Step Forward Workshop

© 2021, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Going One Step Further with Machine Learning on AWS: Amazon SageMaker Hands-on
Yoshitaka Haribara, Ph.D., Startup Solutions Architect, AWS

Yoshitaka Haribara (@_hariby), Startup Solutions Architect, Tokyo, Japan
As a solutions architect, I help Japanese startups adopt AWS, focusing on machine learning and quantum computing.

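Team introductions
• Daisuke Sato (@eccyan): Worked at several startups before joining AWS as a Startup SA. Favorite strength exercise: squats!
• Seigo Uchida (@spesnova): Joined AWS after working at companies of all sizes. Supports Japanese startups as a solutions architect.
• Shingo Noguchi (@nog): Did server-side development and served as technical lead at startups and large tech ventures. Favorite onigiri filling: sujiko (salted salmon roe).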

Agenda (http://bit.ly/sage-615)
• Introduction (covered in these slides): Amazon SageMaker recap and recent updates
• Hands-on workshop (SageMaker JumpStart)
  − Notebook1: Training and Hosting a PyTorch Model
  − Notebook2: How to Use Spot Training
• Demos: SageMaker Debugger, SageMaker Experiments, SageMaker Feature Store, MLOps
• Wrap-up: Q&A and survey

Amazon SageMaker recap and recent updates

Amazon SageMaker overview
SageMaker Studio: integrated development environment (IDE) for ML
PREPARE
• SageMaker Ground Truth: label training data for machine learning
• SageMaker Data Wrangler (NEW): aggregate and prepare data for machine learning
• SageMaker Processing: built-in Python, BYO R/Spark
• SageMaker Feature Store (NEW): store, update, retrieve, and share features
• SageMaker Clarify (NEW): detect bias and understand model predictions
BUILD
• SageMaker Studio Notebooks: Jupyter notebooks with elastic compute and sharing
• Built-in and bring-your-own algorithms: dozens of optimized algorithms, or bring your own
• Local Mode: test and prototype on your local machine
• SageMaker Autopilot: automatically create machine learning models with full visibility
• SageMaker JumpStart (NEW): pre-built solutions for common use cases
TRAIN & TUNE
• Managed Training: distributed infrastructure management
• SageMaker Experiments: capture, organize, and compare every step
• Automatic Model Tuning: hyperparameter optimization
• Distributed Training (NEW): training for large datasets and models
• SageMaker Debugger (NEW): debug and profile training runs
• Managed Spot Training: reduce training cost by up to 90%
DEPLOY & MANAGE
• Managed Deployment: fully managed, ultra-low latency, high throughput
• Kubernetes & Kubeflow integration: simplify Kubernetes-based machine learning
• Multi-Model Endpoints: reduce cost by hosting multiple models per instance
• SageMaker Model Monitor: maintain accuracy of deployed models
• SageMaker Edge Manager (NEW): manage and monitor models on edge devices
• SageMaker Pipelines (NEW): workflow orchestration and automation

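Amazon SageMaker Data Wrangler
• Prepare data for machine learning quickly and easily
• Available from the UI of the SageMaker Studio IDE
  − Import data directly into SageMaker from multiple data sources, such as S3, Athena, Redshift, and SageMaker Feature Store
  − Select, query, transform, and visualize data with a single click
• Process data without writing code, using more than 300 built-in transforms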

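SageMaker Feature Store
• A purpose-built repository for storing, updating, retrieving, and sharing the features used for ML training and inference
• Stored feature data is organized into groups and tagged with metadata
• Multiple teams can easily share and reuse feature data, which lowers development cost and speeds up innovation
• Provides a single store of features for both training and real-time inference, so no extra code is needed to keep features consistent
• Integrates with Amazon SageMaker and Amazon SageMaker Pipelines to build automated ML workflows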
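The "single store for training and inference" idea can be made concrete with a toy model. This is a minimal in-memory sketch, not the SageMaker Feature Store API. Every record is appended to an offline history, which is what you would build training sets from. Only the newest record per entity is kept in an online view, which is what real-time inference would read. The class, its methods, and the feature names are hypothetical. The offline/online split mirrors the service's concept.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FeatureGroup:
    """Toy, in-memory stand-in for a feature group (hypothetical, not the SageMaker API)."""
    name: str
    record_identifier: str  # feature that identifies an entity, e.g. "customer_id"
    event_time: str         # feature holding the record's timestamp
    tags: Dict[str, str] = field(default_factory=dict)  # metadata for discovery
    # Offline view: append-only history, used to build training data.
    offline: List[dict] = field(default_factory=list)
    # Online view: latest record per entity, used for real-time inference.
    online: Dict[str, dict] = field(default_factory=dict)

    def put_record(self, record: dict) -> None:
        self.offline.append(dict(record))
        key = str(record[self.record_identifier])
        current = self.online.get(key)
        # Keep only the newest version (by event time) in the online view.
        if current is None or record[self.event_time] >= current[self.event_time]:
            self.online[key] = dict(record)

    def get_record(self, identifier) -> Optional[dict]:
        """Return the latest record for an entity, or None if unknown."""
        return self.online.get(str(identifier))
```

Because training reads the same records that inference reads, the feature logic only has to be written once.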

JumpStart demo

Amazon SageMaker: Development (Jupyter Notebook/Lab, Amazon S3)
The Jupyter Trademark is registered with the U.S. Patent & Trademark Office.

Amazon SageMaker: Development (Jupyter Notebook/Lab, Amazon S3) → Training (Amazon EC2 P3 Instances, Amazon ECR)
Prebuilt container images are provided in advance.

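Amazon SageMaker: Development → Training (Jupyter Notebook/Lab, Amazon S3, Amazon EC2 P3 Instances)
Benefits for training:
• Training instances are launched via the API and stop automatically when training completes
• High-performance instances are billed per second, which makes it easy to cut costs
• A specified number of instances can be launched at once, so distributed training is straightforward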
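The cost benefit of per-second billing plus automatic shutdown is simple arithmetic. You pay for the seconds the job actually runs, multiplied by the number of instances. The sketch below assumes a hypothetical price of 3.6 USD per instance-hour, not a real SageMaker price. For comparison, it also computes what the same job would cost if every started hour were billed in full.

```python
import math


def billed_cost(hourly_price: float, seconds: float, instance_count: int = 1) -> float:
    """Per-second billing: pay only while the training job runs, per instance."""
    return hourly_price / 3600 * seconds * instance_count


def hour_rounded_cost(hourly_price: float, seconds: float, instance_count: int = 1) -> float:
    """Comparison only: cost if every started hour were billed in full."""
    return hourly_price * math.ceil(seconds / 3600) * instance_count


HYPOTHETICAL_PRICE = 3.6  # USD per instance-hour; illustrative, not a real price
```

For a 25-minute job (1,500 seconds) on one instance, per-second billing gives about 1.5 USD. Rounding up to a full hour would give 3.6 USD.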


Amazon SageMaker: Development → Training → Inference (Jupyter Notebook/Lab, Amazon EC2 P3 Instances, Endpoint / Batch transform, Amazon S3, Amazon ECR)

Amazon SageMaker inference: User → Amazon API Gateway → AWS Lambda (AWS SDK) → Endpoint
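In the API Gateway → Lambda → Endpoint pattern, the Lambda function forwards the request body to the endpoint through the AWS SDK. Below is a minimal sketch of such a handler. It assumes the API Gateway Lambda proxy integration, where the request body arrives as a string in `event["body"]`. It also assumes a JSON payload and a hypothetical endpoint name `my-endpoint`. The `runtime_client` parameter is an addition for testing, so a fake client can stand in for `boto3.client("sagemaker-runtime")`.

```python
ENDPOINT_NAME = "my-endpoint"  # hypothetical endpoint name


def lambda_handler(event, context, runtime_client=None):
    """Forward an API Gateway (proxy integration) request body to a SageMaker endpoint."""
    if runtime_client is None:
        import boto3  # imported lazily so the handler can be tested without AWS
        runtime_client = boto3.client("sagemaker-runtime")
    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",  # assumes the model accepts JSON
        Body=event["body"],
    )
    # The response body is a stream; read it and pass the prediction back to API Gateway.
    prediction = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": prediction}
```

Lambda calls the handler with only `(event, context)`, so in production the default path creates the real runtime client.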

Invoking with the SageMaker Python SDK (v2)

```python
import sagemaker
from sagemaker.pytorch import PyTorch  # each framework has its own Estimator class

# Initialize by specifying the training script and other settings
estimator = PyTorch("train.py",
                    role=sagemaker.get_execution_role(),
                    instance_count=1,
                    instance_type="ml.p3.2xlarge",
                    framework_version="1.6.0",
                    py_version="py3")
estimator.fit("s3://mybucket/data/train")  # fit runs training
predictor = estimator.deploy(initial_instance_count=2,  # 2 or more instances gives Multi-AZ
                             instance_type="ml.m5.xlarge")  # deploy creates an endpoint
```

Only minimal changes to your code (e.g., train.py)

```python
import argparse
import os  # needed for os.environ

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # hyperparameters
    parser.add_argument('--epochs', type=int, default=10)
    # input data and model directories, read from environment variables
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    args, _ = parser.parse_known_args()
    # …
```

Paths inside the container (values of the environment variables):
• SM_CHANNEL_TRAIN: /opt/ml/input/data/train
• SM_CHANNEL_TEST: /opt/ml/input/data/test
• SM_MODEL_DIR: /opt/ml/model

In Script Mode, the training script runs as an ordinary Python script. First read the data and model input/output paths from the environment variables. Then write train.py so it reads its input from those paths and saves the model to the model directory. For inference, that saved model is loaded.
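The slide's `os.environ['SM_CHANNEL_TRAIN']` raises `KeyError` when the script runs outside a SageMaker container, for example in a quick local test. One variation is to fall back to the container paths listed above when the variables are missing. This is a sketch under that assumption. The `parse_args` wrapper and the fallback behavior are additions for illustration, not something SageMaker requires.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker paths, falling back to the container defaults."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line arguments, e.g. --epochs 20.
    parser.add_argument("--epochs", type=int, default=10)
    # os.environ.get avoids a KeyError when the SM_* variables are not set.
    parser.add_argument("--train", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument("--test", type=str,
                        default=os.environ.get("SM_CHANNEL_TEST", "/opt/ml/input/data/test"))
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    # parse_known_args ignores any extra arguments instead of failing.
    args, _ = parser.parse_known_args(argv)
    return args
```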

Cut training costs with Managed Spot Training
• Up to 90% cost savings compared with On-Demand
• Interruptions can occur, so write intermediate progress to checkpoints
• Specify the maximum time you are willing to wait

How to call it:

```python
import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch("train.py",
                    role=sagemaker.get_execution_role(),
                    instance_count=1,
                    instance_type="ml.p3.2xlarge",
                    framework_version="1.6.0",
                    py_version="py3",
                    use_spot_instances=True,
                    max_run=1*24*60*60,   # maximum training time in seconds (1 day)
                    max_wait=2*24*60*60,  # maximum wait incl. spot interruptions; at least max_run
                    checkpoint_s3_uri="s3://mybucket/checkpoints",
                    checkpoint_local_path="/opt/ml/checkpoints/")
estimator.fit("s3://mybucket/data/train")  # training with fit works the same way
```
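Managed Spot Training syncs `checkpoint_local_path` with `checkpoint_s3_uri`. However, the training script itself has to write checkpoints and resume from them after an interruption. Below is a minimal sketch of that save/resume loop. It uses JSON as a stand-in for real model state; a PyTorch script would save its model and optimizer state instead. The file name `checkpoint.json` and the toy "loss" state are hypothetical.

```python
import json
import os

CHECKPOINT_DIR = "/opt/ml/checkpoints"  # matches checkpoint_local_path above
CHECKPOINT_FILE = "checkpoint.json"     # hypothetical file name


def load_checkpoint(ckpt_dir):
    """Return (next_epoch, state) from the latest checkpoint, or (0, None) if none exists."""
    path = os.path.join(ckpt_dir, CHECKPOINT_FILE)
    if not os.path.exists(path):
        return 0, None
    with open(path) as f:
        saved = json.load(f)
    return saved["epoch"] + 1, saved["state"]


def save_checkpoint(ckpt_dir, epoch, state):
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, CHECKPOINT_FILE)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp, path)  # atomic rename: an interruption never leaves a half-written file


def train(epochs, ckpt_dir=CHECKPOINT_DIR):
    start_epoch, state = load_checkpoint(ckpt_dir)
    if state is None:
        state = {"loss": 1.0}  # stand-in for freshly initialized model state
    for epoch in range(start_epoch, epochs):
        state = {"loss": state["loss"] * 0.5}  # placeholder for one epoch of real training
        save_checkpoint(ckpt_dir, epoch, state)
    return state
```

If the instance is reclaimed, the restarted job finds the last checkpoint and continues from the following epoch instead of starting over.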

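Train
• SageMaker Distributed Training
  − Easier distributed training with data parallelism and model parallelism
• SageMaker Debugger
  − New profiling features to help use hardware resources efficiently during training
• P4d instances (NVIDIA A100 GPU) are now also available in the Tokyo Region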
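Data parallelism means each worker computes gradients on its own shard of the batch. The gradients are then averaged (an all-reduce) so every worker applies the same update. The sketch below illustrates the concept only, not the SageMaker distributed training library. Workers are simulated in a loop on a one-parameter linear model `y_hat = w * x` with mean squared error. With equal-sized shards, the averaged gradient equals the full-batch gradient.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


def grad_mse(w: float, batch: Batch) -> float:
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def shard(data: Batch, num_workers: int) -> List[Batch]:
    """Split data into equal-sized shards, one per worker (assumes len(data) % num_workers == 0)."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_step(w: float, data: Batch, num_workers: int, lr: float) -> float:
    # Each worker computes a gradient on its own shard (in practice, in parallel on separate GPUs).
    local_grads = [grad_mse(w, s) for s in shard(data, num_workers)]
    # All-reduce: average the gradients so every worker applies the identical update.
    avg_grad = sum(local_grads) / num_workers
    return w - lr * avg_grad
```

Model parallelism, by contrast, splits the model itself across devices. That is needed when the model does not fit on one GPU.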

Accelerators for deep learning on AWS (accelerator and instance family)
• NVIDIA GPU
  − Training: A100 (P4d), V100 32 GB (P3dn) / 16 GB (P3)
  − Inference: T4 (G4dn)
• Intel
  − Training: Habana Gaudi
  − Inference: CPU instances
• AWS
  − Training: AWS Trainium
  − Inference: AWS Inferentia (Inf1), AWS Graviton2 (C6g, etc.)

MLOps and inference

ML model lifecycle and project stakeholders
• Lifecycle stages: Data Sourcing, Data Quality Assurance, Feature Engineering, Model Development, Model Training & Evaluation, Model Deployment & Inference, Production Integration, Model Monitoring
• Stakeholders: Data Engineers, Data Scientists, ML Engineers, SysOps
• Foundation: AWS accounts, controls, dev environments, and MLOps stacks (DevOps tools, artifact repos, ML log insights)
• Across the whole lifecycle: ML workflow automation, model management, continuous delivery

Amazon SageMaker Pipelines overview
• Amazon SageMaker Pipelines: build fully managed ML workflows
  − Stages shown: input data, prepare or transform, train, validate, explain, real-time inference, batch scoring, model drift
• Model registry: catalogs model versions, metrics, approvals, and model deployments
• Automate MLOps with CI/CD and model lineage tracking
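A model registry essentially keeps versioned model entries with their metrics and an approval status, and deployment picks an approved version. The toy sketch below shows that idea; it is not the SageMaker API. The status strings match the values SageMaker uses for model approval (`PendingManualApproval`, `Approved`, `Rejected`). The classes, methods, and S3 paths in the usage are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

APPROVAL_STATUSES = {"PendingManualApproval", "Approved", "Rejected"}


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str
    metrics: Dict[str, float]
    approval_status: str = "PendingManualApproval"  # new versions await review


@dataclass
class ModelPackageGroup:
    """Toy, in-memory model registry (hypothetical, not the SageMaker API)."""
    name: str
    versions: List[ModelVersion] = field(default_factory=list)

    def register(self, artifact_uri: str, metrics: Dict[str, float]) -> ModelVersion:
        mv = ModelVersion(len(self.versions) + 1, artifact_uri, metrics)
        self.versions.append(mv)
        return mv

    def set_status(self, version: int, status: str) -> None:
        if status not in APPROVAL_STATUSES:
            raise ValueError(f"unknown approval status: {status}")
        self.versions[version - 1].approval_status = status

    def latest_approved(self) -> Optional[ModelVersion]:
        """Deployment uses the newest version a reviewer has approved."""
        approved = [v for v in self.versions if v.approval_status == "Approved"]
        return approved[-1] if approved else None
```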

How Amazon SageMaker Pipelines works
Starting a pipeline execution:
• Manually
• A CloudWatch event when data is uploaded
• A code check-in (git push)
Flow: Get input data → Process data → Train model → Validation → Deploy model (acceptable accuracy) or Alert and stop (non-acceptable accuracy)
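The branch at the end of the flow is the key step: deploy only if validation accuracy is acceptable, otherwise alert and stop. SageMaker Pipelines expresses this with a condition step. The plain-Python sketch below only shows the control flow, with every step passed in as a function. The step functions and the threshold are hypothetical placeholders.

```python
from typing import Any, Callable, Tuple


def run_pipeline(
    get_input_data: Callable[[], Any],
    process: Callable[[Any], Tuple[Any, Any]],  # returns (train_data, validation_data)
    train: Callable[[Any], Any],
    evaluate: Callable[[Any, Any], float],      # returns validation accuracy
    deploy: Callable[[Any], None],
    alert: Callable[[str], None],
    threshold: float,
) -> bool:
    """Run the steps in order; deploy only if validation accuracy clears the threshold."""
    train_data, val_data = process(get_input_data())
    model = train(train_data)
    accuracy = evaluate(model, val_data)
    if accuracy >= threshold:  # "acceptable accuracy" branch
        deploy(model)
        return True
    # "non-acceptable accuracy" branch: notify and stop without deploying
    alert(f"accuracy {accuracy:.3f} is below threshold {threshold:.3f}; stopping")
    return False
```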

Other examples of building ML pipelines: AWS Step Functions workflow
• CI/CD: Data Scientists/Developers git push to AWS CodeCommit or a third-party Git repo. A Git webhook triggers AWS CodePipeline, and AWS CodeBuild does a docker push to Amazon Elastic Container Registry (ECR).
• AWS Step Functions workflow: Amazon S3 (raw data) → Amazon SageMaker Processing → Amazon S3 (data: train data, test data) → Amazon SageMaker Training Job / HPO → Amazon S3 (trained model) → Amazon SageMaker Batch Transform / Endpoint deploy
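A Step Functions workflow like this is defined in Amazon States Language (JSON). The skeleton below chains a processing job, a training job, and a batch transform job. It uses the `.sync` SageMaker service integrations, which wait for each job to finish before moving on. It is a sketch only. A real definition also needs a `Parameters` block with the full job request for each task, omitted here, and the state names are hypothetical.

```python
import json


def build_state_machine() -> dict:
    """Amazon States Language skeleton: Processing -> Training -> Batch Transform."""
    return {
        "Comment": "ML pipeline sketch; each Task still needs its job request in Parameters",
        "StartAt": "Preprocess",
        "States": {
            "Preprocess": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createProcessingJob.sync",
                "Next": "Train",
            },
            "Train": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Next": "BatchTransform",
            },
            "BatchTransform": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                "End": True,
            },
        },
    }


# Serialized JSON, as it would be pasted into the console or passed to the API.
definition = json.dumps(build_state_machine(), indent=2)
```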

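Workflow management tools on AWS
Amazon SageMaker Pipelines
• A feature of Amazon SageMaker that enables CI/CD for machine learning
• Runs a series of ML workflow steps, such as data loading and training, on demand or on a schedule
• The results of each step are recorded in SageMaker Experiments, so you can visualize model performance, training parameters, and more
Amazon Managed Workflows for Apache Airflow (MWAA)
• A managed service for building workflows with Apache Airflow
• Runs workflows for ETL jobs and data pipelines as a managed service, so developers can focus on solving business problems
• Airflow metrics are available as CloudWatch metrics, and logs can be forwarded to CloudWatch Logs
AWS Step Functions w/ Data Science SDK (Python)
• A serverless orchestration service
• Orchestrates entire distributed applications and microservices using a mechanism called a "state machine"
• Defined state machines are visualized as "workflows" in the AWS console
• The execution history of each step in a state machine can be traced from the logs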
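All three tools model a workflow as steps with dependencies, and the orchestrator runs each step only after the steps it depends on have finished. The sketch below shows that core idea with the standard library's `graphlib` (Python 3.9+). It computes a valid run order and rejects cyclic definitions. It is not the API of Airflow, Step Functions, or SageMaker Pipelines, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical task names).
dag = {
    "load_data": set(),
    "process_data": {"load_data"},
    "train_model": {"process_data"},
    "compute_statistics": {"process_data"},
    "evaluate_model": {"train_model", "compute_statistics"},
}


def execution_order(graph):
    """Return one valid order to run the tasks; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(graph).static_order())
```

`train_model` and `compute_statistics` share only `process_data` as a dependency, so a real orchestrator could run them in parallel.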

Announcements
• AWS Startup Blog
  − We are often asked how other startups use AWS for machine learning, so we wrote a blog post summarizing SageMaker and Personalize case studies
  − https://aws.amazon.com/jp/blogs/startup/tech-case-study-jp-startup-ai-ml/
• JAWS-UG AI/ML chapter
  − The user group has been revived, and startup customers are among its core members
  − https://jawsug-ai.connpass.com/

Thank you