JAWS-UG AI ML #11 SageMaker Updates

Amazon SageMaker Updates
JAWS-UG AI/ML #11
Yoshitaka Haribara, Ph.D.
Startup Solutions Architect, AWS
© 2021, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Yoshitaka Haribara, Ph.D.
Startup Solutions Architect, Tokyo, Japan
• As a solutions architect for startups in Japan, I advise on the design and construction of machine learning platforms
• Hobby: drums 🥁. Also interested in DAW software (e.g. GarageBand, Logic Pro, Pro Tools)
• Favorite AWS services: Amazon SageMaker, Amazon Braket

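I have also written articles for Software Design: "AWS Technology Course for Startups" (スタートアップのための AWS テクノロジー講座) (2020)
• August issue, Part 4: How to think about adopting machine learning and selecting services
• September issue, Part 5: A machine learning platform for continuously improving models
• October issue, final part: Techniques for improving machine learning performance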

Congratulations on the reboot of the JAWS-UG AI/ML chapter 🎉

Agenda
1. Recap of AWS AI/ML services
2. Updates announced since AWS re:Invent 2020
3. About SageMaker Pipelines

The AWS machine learning stack
AI services: usable without deep machine learning knowledge
• VISION: Amazon Rekognition
• SPEECH: Amazon Polly, Amazon Transcribe (+Medical)
• TEXT: Amazon Comprehend (+Medical), Amazon Translate, Amazon Textract
• SEARCH: Amazon Kendra
• CHATBOTS: Amazon Lex
• PERSONALIZATION: Amazon Personalize
• FORECASTING: Amazon Forecast
• FRAUD: Amazon Fraud Detector
• CONTACT CENTERS: Contact Lens, Voice ID for Amazon Connect
• CODE AND DEVOPS: Amazon CodeGuru, Amazon DevOps Guru (NEW)
• INDUSTRIAL AI: Amazon Monitron (NEW), AWS Panorama + Appliance (NEW), Amazon Lookout for Vision (NEW), Amazon Lookout for Equipment (NEW)
• ANOMALY DETECTION: Amazon Lookout for Metrics (NEW)
• HEALTH AI: Amazon HealthLake (NEW), Amazon Transcribe Medical, Amazon Comprehend Medical
ML services: managed services that streamline the entire machine learning process
• Amazon SageMaker (SAGEMAKER STUDIO IDE): Label data, Aggregate & prepare data (NEW), Store & share features (NEW), Auto ML, Spark/R, Detect bias (NEW), Visualize in notebooks, Pick algorithm, Train models, Tune parameters, Debug & profile (NEW), Deploy in production, Manage & monitor, CI/CD (NEW), Human review
• NEW: Model management for edge devices
• NEW: SageMaker JumpStart
ML frameworks and infrastructure: build and use your own machine learning environment freely
• Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Trainium, Inferentia, FPGA, DeepGraphLibrary

Amazon SageMaker overview
SageMaker Studio: Integrated development environment (IDE) for ML
PREPARE
• SageMaker Ground Truth: Label training data for machine learning
• SageMaker Data Wrangler (NEW): Aggregate and prepare data for machine learning
• SageMaker Processing: Built-in Python, BYO R/Spark
• SageMaker Feature Store (NEW): Store, update, retrieve, and share features
• SageMaker Clarify (NEW): Detect bias and understand model predictions
BUILD
• SageMaker Studio Notebooks: Jupyter notebooks with elastic compute and sharing
• Built-in and Bring-your-own Algorithms: Dozens of optimized algorithms or bring your own
• Local Mode: Test and prototype on your local machine
• SageMaker Autopilot: Automatically create machine learning models with full visibility
• SageMaker JumpStart (NEW): Pre-built solutions for common use cases
TRAIN & TUNE
• Managed Training: Distributed infrastructure management
• SageMaker Experiments: Capture, organize, and compare every step
• Automatic Model Tuning: Hyperparameter optimization
• Distributed Training (NEW): Training for large datasets and models
• SageMaker Debugger (NEW): Debug and profile training runs
• Managed Spot Training: Reduce training cost by up to 90%
DEPLOY & MANAGE
• Managed Deployment: Fully managed, ultra low latency, high throughput
• Kubernetes & Kubeflow Integration: Simplify Kubernetes-based machine learning
• Multi-Model Endpoints: Reduce cost by hosting multiple models per instance
• SageMaker Model Monitor: Maintain accuracy of deployed models
• SageMaker Edge Manager (NEW): Manage and monitor models on edge devices
• SageMaker Pipelines (NEW): Workflow orchestration and automation

There is too much to cover everything here, so please see the AWS Summit Online video: https://www.youtube.com/watch?v=x28_DF5polM

For more detail, see the AWS re:Invent updates recap blog post: https://aws.amazon.com/jp/blogs/news/reinvent-recap-ai-ml-20/

AWS to offer NVIDIA A100 Tensor Core GPU-based Amazon EC2 instances
https://aws.amazon.com/blogs/machine-learning/aws-to-offer-nvidia-a100-tensor-core-gpu-based-amazon-ec2-instances/

Amazon EC2 P4d instances
P4d instances equipped with NVIDIA A100 Tensor Core GPUs
• Offered in a single size only, p4d.24xlarge (8 x A100) (see table)
• GPUs are interconnected by NVSwitch/NVLink at 600 GB/s
• High-speed, EFA-capable network interface at 400 Gbps per instance
• Eight 1 TB NVMe SSDs, giving up to 16 GB/s of throughput in a RAID 0 configuration
• Multi-Instance GPU (MIG) is also supported
https://aws.amazon.com/jp/ec2/instance-types/p4/
* p3dn.24xlarge: 31.212 USD/h

P4d performance
More than 2x faster than P3dn when training a variety of deep learning models
Throughput Improvement
DNN | P3dn FP32 (imgs/sec) | P3dn FP16 (imgs/sec) | P4d TF32 (imgs/sec) | P4d FP16 (imgs/sec) | P4d over P3dn TF32/FP32 | P4d over P3dn FP16
Resnet50 | 3057 | 7413 | 6841 | 15621 | 2.2x | 2.1x
Resnet152 | 1145 | 2644 | 2823 | 5700 | 2.5x | 2.2x
Inception3 | 2010 | 4969 | 4808 | 10433 | 2.4x | 2.1x
Inception4 | 847 | 1778 | 2025 | 3811 | 2.4x | 2.1x
VGG16 | 1202 | 2092 | 4532 | 7240 | 3.8x | 3.5x
Alexnet | 32198 | 50708 | 82192 | 133068 | 2.6x | 2.6x
SSD300 | 1554 | 2918 | 3467 | 6016 | 2.2x | 2.1x
https://aws.amazon.com/jp/blogs/compute/amazon-ec2-p4d-instances-deep-dive/
https://github.com/aws-samples/deep-learning-models
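The two speedup columns follow directly from the throughput columns. As a minimal sketch (the dictionary and the function name `speedups` are illustrative, not part of the slide), the ratios can be recomputed from the table's own numbers:

```python
# Throughput in images/sec, copied from the table above.
# Tuple order: (P3dn FP32, P3dn FP16, P4d TF32, P4d FP16)
THROUGHPUT = {
    "Resnet50": (3057, 7413, 6841, 15621),
    "Resnet152": (1145, 2644, 2823, 5700),
    "Inception3": (2010, 4969, 4808, 10433),
    "Inception4": (847, 1778, 2025, 3811),
    "VGG16": (1202, 2092, 4532, 7240),
    "Alexnet": (32198, 50708, 82192, 133068),
    "SSD300": (1554, 2918, 3467, 6016),
}


def speedups(p3dn_fp32, p3dn_fp16, p4d_tf32, p4d_fp16):
    """Return (TF32-over-FP32, FP16-over-FP16) speedups, rounded to one decimal."""
    return round(p4d_tf32 / p3dn_fp32, 1), round(p4d_fp16 / p3dn_fp16, 1)


for model, row in THROUGHPUT.items():
    tf32_gain, fp16_gain = speedups(*row)
    print(f"{model:<11} {tf32_gain}x  {fp16_gain}x")
```

Every recomputed ratio is at least 2.0, which is consistent with the "more than 2x" headline.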

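Habana Gaudi-based Amazon EC2 instances
EC2 instances equipped with Gaudi accelerators from Habana Labs, designed specifically for training deep learning models
• Deep learning training on 8 Gaudi accelerator cards delivers up to 40% better price performance than current GPU-based EC2 instances
• Supports TensorFlow, PyTorch, and others. Well suited to deep learning training workloads such as natural language processing, object detection and classification, recommendation, and personalization
• In addition to Amazon EC2, support is planned for Amazon EKS/ECS and Amazon SageMaker
Coming in 2021!
https://habana.ai/wp-content/uploads/pdf/2020/Habana%20Gaudi%20customer%20enablement%20on%20AWS%20December%202020.pdf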

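AWS Trainium
A high-performance machine learning training chip designed by AWS
• Provides the best price performance for training ML models in the cloud
• Like AWS Inferentia, it uses the Neuron SDK and supports frameworks such as TensorFlow, MXNet, and PyTorch
• Trainium chips are specifically optimized for deep learning training workloads in applications such as image classification, semantic search, translation, speech recognition, natural language processing, and recommendation engines
• Available through Amazon EC2 instances as well as AWS Deep Learning AMIs and managed services such as Amazon SageMaker, Amazon ECS, EKS, and AWS Batch
Coming in 2021!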

SageMaker Python SDK (v2)
The naming conventions have changed slightly.
https://sagemaker.readthedocs.io/en/stable/v2.html

    import sagemaker
    from sagemaker.pytorch import PyTorch  # Estimator class for each framework

    estimator = PyTorch("train.py",  # initialize with the training script and other settings
                        role=sagemaker.get_execution_role(),
                        instance_count=1,
                        instance_type="ml.p3.2xlarge",
                        framework_version="1.6.0",
                        py_version="py3")
    estimator.fit("s3://mybucket/data/train")  # fit runs the training job
    predictor = estimator.deploy(initial_instance_count=2,  # 2 or more gives Multi-AZ
                                 instance_type="ml.m5.xlarge")  # deploy creates the endpoint
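To make the naming change concrete, here is a minimal sketch of how estimator keyword arguments written for v1 map onto their v2 names. The mapping is a subset taken from the v2 migration guide linked above, and the helper `migrate_estimator_kwargs` is hypothetical, not part of the SDK:

```python
# Subset of estimator parameter renames from SageMaker Python SDK v1 to v2.
# Assumption: taken from the v2 migration guide; not an exhaustive list.
V1_TO_V2 = {
    "train_instance_count": "instance_count",
    "train_instance_type": "instance_type",
    "train_max_run": "max_run",
    "train_use_spot_instances": "use_spot_instances",
    "train_max_wait": "max_wait",
    "train_volume_size": "volume_size",
}


def migrate_estimator_kwargs(v1_kwargs):
    """Hypothetical helper: rename v1 keyword arguments to their v2 names.

    Keys that are not in the mapping are passed through unchanged.
    """
    return {V1_TO_V2.get(key, key): value for key, value in v1_kwargs.items()}


old_style = {
    "train_instance_count": 1,
    "train_instance_type": "ml.p3.2xlarge",
    "framework_version": "1.6.0",
}
new_style = migrate_estimator_kwargs(old_style)
print(new_style)
```

The renamed dictionary uses the same `instance_count` and `instance_type` arguments that appear in the estimator call on the slide.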

SageMaker Pipelines

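Benefits of using Amazon SageMaker Pipelines
• Accelerate machine learning development: build an automated machine learning workflow by writing just a few lines, cutting coding time from months to hours
• Automatically track model artifacts: automatic tracking reduces the effort of manual management
• Scale to thousands of models in production: configure CI/CD pipelines from built-in templates and deploy machine learning models at scale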

Amazon SageMaker Pipelines overview
• Amazon SageMaker Pipelines: build fully managed machine learning workflows
• Model registry: a catalog of model versions, metrics, approvals, and model deployments
• Automate MLOps with CI/CD and model lineage tracking
Diagram labels: Input data, Prepare or transform, Train, Explain, Validate, Real-time inference, Batch scoring, Model drift
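To illustrate what a catalog of model versions, metrics, and approvals can look like, here is a minimal in-memory sketch. It is not the SageMaker API: the classes `ModelVersion` and `ModelRegistry`, the S3 paths, and the metric values are all hypothetical. The status strings are modeled on SageMaker's model approval statuses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str
    metrics: Dict[str, float]
    # New versions wait for a human or automated approval decision.
    approval_status: str = "PendingManualApproval"


@dataclass
class ModelRegistry:
    """Hypothetical in-memory catalog of model versions (illustration only)."""

    versions: List[ModelVersion] = field(default_factory=list)

    def register(self, artifact_uri: str, metrics: Dict[str, float]) -> ModelVersion:
        """Add a new version; version numbers start at 1 and increase by 1."""
        model_version = ModelVersion(len(self.versions) + 1, artifact_uri, dict(metrics))
        self.versions.append(model_version)
        return model_version

    def approve(self, version: int) -> None:
        self.versions[version - 1].approval_status = "Approved"

    def latest_approved(self) -> Optional[ModelVersion]:
        """Return the newest approved version, or None if nothing is approved."""
        for model_version in reversed(self.versions):
            if model_version.approval_status == "Approved":
                return model_version
        return None


registry = ModelRegistry()
registry.register("s3://mybucket/models/v1/model.tar.gz", {"accuracy": 0.91})
registry.register("s3://mybucket/models/v2/model.tar.gz", {"accuracy": 0.94})
registry.approve(1)
print(registry.latest_approved())
```

Only approved versions are candidates for deployment here, so version 2 stays out of production until someone approves it.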

How Amazon SageMaker Pipelines works
Starting a pipeline execution:
• Manually
• A CloudWatch event when data is uploaded
• Code check-in (git push)
Flow: Get input data -> Process data -> Train model -> Validation
• Acceptable accuracy -> Deploy model
• Non-acceptable accuracy -> Alert and stop
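The flow on this slide is a linear sequence followed by a branch on validation accuracy. Here is a minimal sketch of that control flow in plain Python. It does not use the SageMaker Pipelines SDK: the function `run_pipeline`, the toy step functions, and the 0.9 threshold are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def run_pipeline(
    get_input_data: Callable[[], list],
    process_data: Callable[[list], list],
    train_model: Callable[[list], object],
    validate: Callable[[object, list], float],
    min_accuracy: float,
) -> Tuple[str, List[str]]:
    """Run the steps in order, then branch on the validation accuracy."""
    executed = []

    raw = get_input_data()
    executed.append("Get input data")

    processed = process_data(raw)
    executed.append("Process data")

    model = train_model(processed)
    executed.append("Train model")

    accuracy = validate(model, processed)
    executed.append("Validation")

    if accuracy >= min_accuracy:
        executed.append("Deploy model")  # acceptable accuracy
        return "deployed", executed
    executed.append("Alert and stop")  # non-acceptable accuracy
    return "stopped", executed


# Toy stand-ins so the sketch runs without AWS.
outcome, steps = run_pipeline(
    get_input_data=lambda: [1, 2, 3, 4],
    process_data=lambda rows: [r * 2 for r in rows],
    train_model=lambda rows: sum(rows) / len(rows),
    validate=lambda model, rows: 0.93,
    min_accuracy=0.9,
)
print(outcome, steps)
```

In a real pipeline each step would be a managed job, and the branch would be a condition evaluated on the validation metrics.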

Stage: Deploy Production
CloudFormation DeployProduction
https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-pipelines/tabular/customizing_build_train_deploy_project/sagemaker-pipelines-customized-project.ipynb

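Other AWS workflow management tools
AWS Step Functions w/ Data Science SDK (Python)
• Serverless orchestration service
• Orchestrates distributed applications and microservices as a whole through a mechanism called a "state machine"
• A defined state machine is visualized in the AWS console as a "workflow"
• The execution history of each step of the state machine can be traced from the logs
Amazon Managed Workflows for Apache Airflow (MWAA)
• Managed service for building workflows with Apache Airflow
• Runs workflows that execute ETL jobs and data pipelines as a managed service, so developers can focus on solving business problems
• Airflow metrics are handled as CloudWatch metrics, and logs can be forwarded to CloudWatch Logs
Amazon SageMaker Pipelines
• A feature of Amazon SageMaker that enables CI/CD for machine learning
• Runs the sequence of processing steps in a machine learning workflow, such as data loading and training, on demand or at scheduled times
• The results of each step are recorded in SageMaker Experiments, so model quality, training parameters, and so on can be visualized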
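Step Functions state machines are defined in Amazon States Language, a JSON format. Here is a minimal sketch of a two-step definition built as a Python dictionary. It does not use the Data Science SDK mentioned on the slide. The state names and Lambda ARNs are placeholders, and `step_order` is a hypothetical helper that only walks the definition locally.

```python
import json

# Hypothetical two-step workflow in Amazon States Language.
# The Resource ARNs are placeholders, not real functions.
state_machine = {
    "Comment": "Hypothetical two-step ML workflow",
    "StartAt": "ProcessData",
    "States": {
        "ProcessData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:process-data",
            "Next": "TrainModel",
        },
        "TrainModel": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:train-model",
            "End": True,
        },
    },
}


def step_order(definition):
    """Follow Next links from StartAt until a state marked End is reached."""
    order = []
    name = definition["StartAt"]
    while True:
        order.append(name)
        state = definition["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


print(json.dumps(state_machine, indent=2))
print(step_order(state_machine))
```

The chain of `Next` links in this JSON is the structure that the console renders as a workflow graph.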

Demo videos by Julien Simon
• SageMaker Data Wrangler
  − https://www.youtube.com/watch?v=tbGGOic21PU
• SageMaker Feature Store
  − https://www.youtube.com/watch?v=-ydEYWhYlYw
• SageMaker Pipelines
  − https://www.youtube.com/watch?v=Hvz2GGU3Z8g
  − https://www.youtube.com/watch?v=2CF-LBZjTn0

Other content
• AWS blog (SageMaker category)
  − Event reports, use case introductions, and more
    § https://aws.amazon.com/jp/blogs/news/category/artificial-intelligence/sagemaker/
• AWS Startup blog
  − I am often asked how other startups use AWS for machine learning, so I wrote a blog post summarizing SageMaker and Personalize case studies
    § https://aws.amazon.com/jp/blogs/startup/tech-case-study-jp-startup-ai-ml/
• SageMaker Immersion Day
  − Hands-on content
    § https://sagemaker-immersionday.workshop.aws/ja/

Thank you