Accelerated Computing on AWS for NLP

© 2022, Amazon Web Services, Inc. or its affiliates.
Accelerated Computing on AWS for NLP: how to accelerate natural language processing on AWS
Takahiro Kubo, Developer Relations, Machine Learning

Agenda
• Accelerated Computing on AWS for NLP
• AI/ML services offered by AWS
• AWS contributions to open source
• Programs for educational and research institutions, and more

Accelerated Computing on AWS for NLP

Key trends in AI/ML infrastructure
[Figure: Growth in model complexity (# of parameters, log scale from 100M to 10T): ELMo (2018), BERT-Large (2018), GPT-2 (2019), Turing NLG (2020), GPT-3 (2020), Switch-C (2021), …]
1. As the field moves from classical machine learning to deep learning, models keep growing more complex.
2. The time and cost of training models are exploding, stretching from days to weeks.
3. Data scientists and ML engineers are looking for software tools and hardware platforms that fit their use cases and experience.

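What is accelerated computing?
Accelerated computing means speeding up demanding workloads such as machine learning, data analytics, and image processing by running them in parallel on dedicated hardware.
• CPU: fast, but low efficiency
• GPU/FPGA/ASICs: high throughput, high efficiency
For certain classes of applications (deep learning, for example), GPUs, FPGAs, and ASICs enable massive parallelism and much higher efficiency.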

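Amazon EC2 instance options for machine learning
[Figure: instance families arranged from traditional machine learning (inference and training on CPUs) to deep learning (training, inference, and training + inference) on accelerated computing]
• CPU instances (Graviton, Ice Lake, Cascade Lake, and EPYC CPUs): C7g, C6g, C6i, C6a, M6g, M6i, M6a, R6g, R6i, R6a
• Accelerated computing instances: F1 (UltraScale+ FPGA), Inf1 (AWS Inferentia chip), G5g (NVIDIA T4G GPU), G5 (NVIDIA A10G GPU), P4 (NVIDIA A100 GPU), DL1 (Habana accelerator), Trn1 (AWS Trainium chip), Elastic Inference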

Evolution of GPU instances
[Figure: timeline from 2010 to 2022 of G-family instances (training/inference) and P-family instances (distributed training). GPUs shown: NVIDIA Tesla M2050 (CG1), NVIDIA Grid K2 (G2), NVIDIA Tesla K80, NVIDIA Tesla M60, NVIDIA V100 (16GB), NVIDIA V100 (32GB), NVIDIA Tesla T4, AMD Radeon Pro V520, NVIDIA A100, NVIDIA A10G (G5), NVIDIA T4G with an Arm CPU (G5g), followed by AWS Inferentia (Inf1) and AWS Trainium (Trn1)]
• Trn1: up to 50% lower cost
• Inf1: up to 70% lower cost, up to 2.3x higher throughput
• P4d (NVIDIA A100): up to 60% lower cost, 2.5x faster on average

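Choosing GPU instances
• Training, small/medium models (single GPU): G5 (NVIDIA A10G); Trn1 (AWS Trainium), where trn1.2xlarge is worth considering for small/medium models
• Training, medium/large models (multiple GPUs): NVIDIA V100 (32GB), NVIDIA A100; Trn1 if compilation time is small compared with training time
• Inference: NVIDIA Tesla T4; Inf1 (AWS Inferentia); Inf2 has been announced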
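The sketch below gives a rough sense of where the single-GPU vs. multi-GPU split falls. It estimates the minimum number of accelerators needed just to hold a model's weights. This is a back-of-the-envelope heuristic assumed for illustration, not the deck's selection method. It ignores gradients, optimizer state, activations, and framework overhead, which during training usually need several times more memory. The 40 GB per A100 comes from the p4d.24xlarge spec (320 GB of GPU memory across 8 GPUs). The parameter counts (about 340M for BERT-Large, 175B for GPT-3) are approximate public figures.

```python
import math


def min_accelerators_for_weights(num_params: float, bytes_per_param: int,
                                 device_mem_gb: float) -> int:
    """Lower bound on the number of devices needed just to store the weights.

    Rough heuristic: ignores gradients, optimizer state, activations and
    framework overhead, which during training usually need several times
    more memory than the weights alone.
    """
    weight_bytes = num_params * bytes_per_param
    device_bytes = device_mem_gb * 1e9  # decimal GB, as in instance specs
    return math.ceil(weight_bytes / device_bytes)


A100_GB = 40  # p4d.24xlarge: 320 GB GPU memory / 8 A100 GPUs

# BERT-Large-sized model (~340M params) in FP32 (4 bytes/param)
print(min_accelerators_for_weights(340e6, 4, A100_GB))   # 1
# GPT-3-sized model (~175B params) in FP16 (2 bytes/param)
print(min_accelerators_for_weights(175e9, 2, A100_GB))   # 9
```

If the weights alone already need more than one device, the model is clearly in the medium/large, multi-GPU column. Real training runs need substantial headroom beyond this lower bound.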

Amazon EC2 P4d instances
• Powered by NVIDIA A100 Tensor Core GPUs
• Compared with the previous-generation P3 instances, up to 60% lower cost to train machine learning models and 2.5x better performance on average
• As of January 2022, available in US East (N. Virginia and Ohio), US West (Oregon), Europe (Ireland and Frankfurt), and Asia Pacific (Tokyo and Seoul)
https://aws.amazon.com/jp/ec2/instance-types/p4/

Instance size | GPUs (A100) | GPU memory (GB) | vCPUs | Memory (GB) | NVSwitch (GB/s) | Network bandwidth (Gbps) | NVMe SSD (TB)
p4d.24xlarge | 8 | 320 | 96 | 1152 | 600 | 400 | 8

[Figure: A100 vs. V100 improvement (x) in FP64 TFLOPS, FP32 TFLOPS, FP16 TFLOPS, INT8 TOPS, GPU memory bandwidth (GB/s), GPU memory (GB), and NVLink bandwidth (GB/s)]

P4d performance
• BERT-Large model implemented in PyTorch, trained on a Wikipedia corpus dataset: 3x faster than P3 instances*
• ResNet50 model implemented in TensorFlow, trained on the ImageNet 2012 dataset: 2.1x faster than P3 instances*
• Jasper model implemented in PyTorch, trained on the LibriSpeech dataset: 2.3x faster than P3 instances*
* All comparisons are between a single p4d.24xlarge instance and a single p3.16xlarge instance.

Choosing GPU instances, revisited: what is compilation?

AWS Neuron SDK
An SDK that optimizes machine learning on AWS Trainium and AWS Inferentia. It supports PyTorch, TensorFlow, and other frameworks, and can be adopted with only a few lines of code changes.
• Support for the major frameworks
• Neuron compiler
• Neuron runtime
• Profiling tools
Documentation: https://awsdocs-neuron.readthedocs-hosted.com
Sample code: https://github.com/aws-neuron/aws-neuron-samples
Note: on instances other than Trainium/Inferentia, SageMaker Training Compiler also lets you benefit from compilation-based acceleration.
On Trn1, TensorFlow support is planned; MXNet support is not planned.

Trainium example: BERT-Large pre-training
Bring your own model and JIT-compile it to Trainium. Compiling with XLA makes the model runnable on Trainium.

Inferentia example: BERT-Base
• With only a few lines of code changes, a pretrained model can be compiled for Inferentia chips

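EC2 Trn1 instances
• Instances powered by AWS Trainium, a high-performance ML training chip custom-designed by AWS
• Deliver the most cost-effective ML training performance
• Up to 50% lower cost than GPU instances
• Up to 16 AWS Trainium accelerators, 512 GB of high-speed HBM2 memory, and 8 TB of local NVMe SSD
• Up to 800 Gbps of Elastic Fabric Adapter (EFA) network bandwidth
• Trainium accelerators are interconnected with ultra-fast NeuronLink
• Support for major ML frameworks such as TensorFlow and PyTorch
• Train on Trn1, then deploy wherever you like

Instance size | Trainium accelerators | Accelerator memory (GB) | vCPUs | Memory (GB) | NVMe SSD (TB) | EBS bandwidth (Gbps) | Network bandwidth (Gbps) | On-demand price (USD/hour)
trn1.2xlarge | 1 | 32 | 8 | 32 | 0.5 | up to 20 | up to 12.5 | 1.34
trn1.32xlarge | 16 | 512 | 128 | 512 | 8 | 80 | 800 | 21.50
*Prices for US East (N. Virginia) as of October 2022.
https://aws.amazon.com/jp/ec2/instance-types/trn1/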

Best cost to train with Trn1
Amazon EC2 Trn1 instances deliver the best training price-performance
• Higher throughput: Trn1.32xl 1.5x vs. P4d.24xl 1x
• Lower cost-to-train: Trn1.32xl 0.43x vs. P4d.24xl 1x
* Measured on PyTorch BERT-Large
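The cost-to-train comparison combines two effects: the hourly price and the throughput. If we assume a fixed amount of training work, wall-clock time scales with 1 / throughput. Cost-to-train is then proportional to price per hour divided by throughput. The function below is a minimal sketch of that arithmetic. It is not how the 0.43x figure was measured, and the numbers in the example are hypothetical.

```python
def relative_cost_to_train(price_a: float, throughput_a: float,
                           price_b: float, throughput_b: float) -> float:
    """Cost-to-train of instance A relative to instance B (B = 1.0).

    Assumes both instances process the same fixed amount of training work,
    so time is proportional to 1 / throughput and cost = hourly price * time.
    Throughputs only need a common unit (or relative values, e.g. 1.5 vs 1.0).
    """
    cost_a = price_a / throughput_a
    cost_b = price_b / throughput_b
    return cost_a / cost_b


# Hypothetical: A costs 0.6x as much per hour as B and is 1.5x faster.
print(round(relative_cost_to_train(0.6, 1.5, 1.0, 1.0), 2))  # 0.4
```

Read the other way, a 1.5x throughput advantage at equal hourly prices would give only about 0.67x. Any further reduction has to come from a lower hourly price.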

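EC2 Inf1 instances
• Powered by AWS Inferentia, a machine learning inference chip designed by AWS, delivering the lowest cost for running deep learning models in the cloud
• Up to 2.3x higher throughput and up to 70% lower cost per inference compared with GPU instances
• On 6xlarge and 24xlarge, multiple Inferentia chips are connected by high-speed chip-to-chip links
• Network interfaces of up to 100 Gbps
• As of October 2022, available in 23 Regions, including Tokyo

Instance size | Inferentia chips | vCPUs | Memory (GiB) | Storage | EBS bandwidth (Gbps) | Network bandwidth (Gbps) | On-demand price (USD/hour)
inf1.xlarge | 1 | 4 | 8 | EBS only | up to 3.5 | up to 25 | 0.228
inf1.2xlarge | 1 | 8 | 16 | EBS only | up to 3.5 | up to 25 | 0.362
inf1.6xlarge | 4 | 24 | 48 | EBS only | 3.5 | 25 | 1.18
inf1.24xlarge | 16 | 96 | 192 | EBS only | 19 | 100 | 4.721
*Prices for US East (N. Virginia) as of October 2022.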
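The cost-per-inference claims rest on similar arithmetic. At a sustained throughput, the cost of serving requests is the hourly price divided by the number of inferences served per hour. The sketch below assumes the instance is kept fully busy. The throughput of 1,000 inferences per second is a hypothetical value, not a benchmark. The $0.228/hour figure is the inf1.xlarge on-demand price from the table (US East, October 2022).

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_inferences(price_per_hour: float,
                                throughput_per_sec: float) -> float:
    """On-demand cost (USD) of serving one million inferences.

    Assumes the instance is kept fully busy at a sustained throughput;
    idle time, autoscaling and request batching effects are ignored.
    """
    inferences_per_hour = throughput_per_sec * SECONDS_PER_HOUR
    return price_per_hour / inferences_per_hour * 1_000_000


# inf1.xlarge on-demand price from the table; 1,000 inferences/s is hypothetical.
print(round(cost_per_million_inferences(0.228, 1000), 4))  # 0.0633
```

Evaluating this for two instances shows how higher throughput and a lower hourly price compound into a lower cost per inference.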

Many customers have optimized cost-performance with Inf1
https://aws.amazon.com/ec2/instance-types/inf1/#Customer_Testimonials
Hotpot.ai, Amazon Rekognition

Inf1 use cases inside Amazon
• Amazon Alexa: "We have deployed a highly complex text-to-speech model that generates natural, human-like speech, supporting more than 100 million Alexa devices worldwide. Inf1 instances let us cut operating costs by about 30% and improve inference latency by 25% compared with GPU instances."
• Amazon Robotics: "Our system is expected to use more than 1,000 SageMaker hosts in 2022. AWS Inferentia gives us the opportunity to handle rapidly growing traffic at 35% lower cost and 20% higher throughput, without retraining our machine learning models."
• Amazon Prime Video: "After we deployed an image classification model on EC2 Inf1 instances, performance improved 4x and costs fell by up to 40%."

Voices of customers in Japan
Money Forward, Inc.: "Migrating our AI chatbot service to Amazon EC2 Inf1 instances was easy. We completed the migration within two months and launched a large-scale service on AWS Inf1 instances using Amazon Elastic Container Service (ECS). By serving multiple models per Inf1 instance, we reduced inference latency by 97% and inference costs by more than 50% (compared with equivalent GPU-based instances)."
https://aws.amazon.com/jp/builders-flash/202209/create-large-scale-inference-environment/

Trn1 and Inf1 support in AWS managed services
Amazon SageMaker
• The easiest and fastest way to get started with Trn1 and Inf1 instances
• Amazon SageMaker is a fully managed service for quickly building, training, and deploying machine learning models
• Trn1 can be used in SageMaker training jobs
• Inf1 instances and Neuron are integrated into SageMaker, so models can be deployed with one click
Amazon EKS & ECS
• Trn1 and Inf1 are available on Amazon EKS and ECS
• Deep learning training workloads can run on Trn1 instances
• The ideal managed container services for deploying models on Inf1 instances
AWS Deep Learning AMIs & Deep Learning Containers
• Neuron comes preinstalled in the AWS Deep Learning AMIs and AWS Deep Learning Containers

AI/ML services offered by AWS

The AWS AI/ML stack: the broadest and most complete set of machine learning capabilities
AI SERVICES: "machine learning models" offered as managed services, used mainly through web APIs
• Specialized: Code + DevOps (Amazon CodeGuru, Amazon DevOps Guru); Business processes (Amazon Personalize, Amazon Forecast, Amazon Fraud Detector, Amazon Lookout for Metrics); Search (Amazon Kendra); Industrial (Amazon Monitron, Amazon Lookout for Equipment, Amazon Lookout for Vision); Healthcare (Amazon HealthLake, Amazon Comprehend Medical, Amazon Transcribe Medical)
• Chatbots: Amazon Lex
• Text & Documents: Amazon Translate, Amazon Comprehend, Amazon Textract
• Speech: Amazon Polly, Amazon Transcribe, Amazon Transcribe Call Analytics
• Vision: Amazon Rekognition, AWS Panorama
CORE ML SERVICES: the "machine learning model development environment" offered as a managed service, with a Jupyter Notebook-based development environment and MLOps features
• SageMaker Canvas: no-code ML for business analysts
• SageMaker Studio Lab: learn ML
• Amazon SageMaker Studio IDE: label data, prepare data, store features, detect bias, build with notebooks, train models, tune parameters, explain predictions, deploy in production, manage & monitor, CI/CD, manage edge devices
ML FRAMEWORKS & INFRASTRUCTURE: cloud services that provide "machine learning training/inference resources" such as GPU instances and container images
• Frameworks: TensorFlow, PyTorch, Apache MXNet, Hugging Face
• Amazon EC2: CPUs, GPUs, AWS Trainium, Elastic Inference, AWS Inferentia, FPGA, Habana Gaudi
• Deep Learning Containers (DLC)
From here on, this section focuses on CORE ML SERVICES. ML FRAMEWORKS & INFRASTRUCTURE was covered in the previous section.

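AWS provides an efficient, consistent development experience at every stage, from learning machine learning to putting it to work
• Learning, Experimenting, Prototyping: SageMaker Studio Lab, for learning, validating, and prototyping machine learning
• Building, Deploying, Scaling: SageMaker, for building, operating, and scaling machine learning models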

SageMaker Studio Lab is ideal for the learning, experimentation, and prototyping phases of machine learning
• Learning, Experimenting, Prototyping: SageMaker Studio Lab
• Building, Deploying, Scaling: SageMaker

With Studio Lab, you can get an environment suited to learning data science for free, with nothing more than an email address.

Because it is built on JupyterLab, Studio Lab offers the familiar experience data scientists are used to
• A file browser shows the files in your storage
• Windows are managed with tabs
• A terminal is also available

Because it is built on JupyterLab, Studio Lab offers the familiar experience data scientists are used to
• Ships with an extension for working with Git through a GUI
• A debugger is available

Amazon SageMaker Studio is an integrated development environment covering everything from developing machine learning models to deploying them
• Learning, Experimenting, Prototyping: SageMaker Studio Lab
• Building, Deploying, Scaling: SageMaker

For developers who want to focus on ML code, AWS offers managed services that pay down the technical debt of machine learning
"Only a small fraction of real-world ML systems is composed of the ML code"
Source: Hidden Technical Debt in Machine Learning Systems [D. Sculley, et al.], 2015. https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
[Figure: the ML code box surrounded by the supporting components of an ML system (configuration, data collection, data verification, feature extraction, machine resource management, analysis tools, process management tools, serving infrastructure, monitoring), overlaid with the AWS services that address them: Ground Truth, Glue, Clarify, Data Wrangler, Feature Store, Processing Job, Studio, Autopilot, JumpStart, Debugger, Model Monitor, Endpoint, Pipeline, MWAA, Edge, QuickSight, Experiments, Auto Scaling, Training Job]

From Amazon SageMaker Studio, which is built on JupyterLab, you can launch SageMaker's managed services from a single place
[Figure: from Studio, steps around the ML code such as feature extraction are handed off to managed services such as Data Wrangler]

Packaging the ML code you develop to run in Docker containers lets you hand off the scaling and process management of training and inference
[Figure: ML code written in Studio is packaged with AWS Deep Learning Containers (images stored in Amazon ECR) and runs as Amazon SageMaker Training Jobs/Processing Jobs and Endpoints on Amazon EC2 G-family and P-family instances, AWS Trainium, and AWS Inferentia, with data in Amazon S3]

Delegating work to SageMaker is easy with the SageMaker Python SDK
https://sagemaker.readthedocs.io/en/stable/v2.html

```python
import sagemaker
# Each framework has its own Estimator class
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    "train.py",                           # entry-point training script
    role=sagemaker.get_execution_role(),  # IAM role (works inside SageMaker notebooks)
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.8.0",
    py_version="py3",
)

# Train the model on data stored in S3
estimator.fit("s3://mybucket/data/train")

# Deploy the trained model to a real-time endpoint
predictor = estimator.deploy(initial_instance_count=2, instance_type="ml.m5.xlarge")

# Run inference; `data` is an input payload you have prepared beforehand
predictor.predict(data)
```

DL/ML containers supported by SageMaker (as of January 25, 2022)
You can bring your own container to use frameworks that are not supported out of the box.

Category | Framework | SageMaker container supported versions | Notes
Deep Learning | TensorFlow | Script mode: 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6 | EIA, Graviton, Neuron supported
Deep Learning | PyTorch | 0.4.0, 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5, 1.6, 1.7.1, 1.8.0, 1.8.1, 1.9.1 | EIA, Graviton, Neuron supported
Deep Learning | Hugging Face | TensorFlow 2.3, 2.4, 2.5; PyTorch 1.7-1.10 |
Deep Learning | MXNet | 0.12.1, 1.0.0, 1.1.0, 1.2.1, 1.3.0, 1.4.0, 1.4.1, 1.6.0; 1.3.0, 1.4.0, 1.4.1, …, 1.6, 1.7, 1.8 (for Elastic Inference) | EIA, Graviton, Neuron supported
ML | scikit-learn | 0.23.1 |

• SageMaker Python SDK: https://github.com/aws/sagemaker-python-sdk
• TensorFlow: https://github.com/aws/sagemaker-tensorflow-serving-container
• PyTorch: https://sagemaker.readthedocs.io/en/stable/using_pytorch.html
• MXNet: https://sagemaker.readthedocs.io/en/stable/using_mxnet.html
• Sklearn: https://sagemaker.readthedocs.io/en/stable/using_sklearn.html
• Hugging Face: https://github.com/aws/deep-learning-containers/tree/master/huggingface

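Projects can be exported from Studio Lab to SageMaker Studio.
• Large-scale data creation and preprocessing
• Long-running and distributed training
• MLOps / CI/CD
• Production operation and model monitoring
When you need any of the SageMaker capabilities above, you can migrate through a Git repository.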

To help developers move between environments smoothly, AWS offers a consistent development experience built on open-source JupyterLab
• Local (JupyterLab): runs on each developer's own PC
• SageMaker Studio Lab: an environment that is uniform across teams and has sufficient CPU/GPU
• SageMaker Studio: an environment backed by enterprise-grade infrastructure

Video content that teaches, briefly and enjoyably, how to carry out and implement each step from learning machine learning to putting it into practice
• Light Part: how to run projects that bring machine learning into products and services
• Dark Part: how to develop and operate machine learning models efficiently with managed services
https://bit.ly/3M1F9as
https://bit.ly/3927PCN
AWS supports the path from learning machine learning to applying it through content as well as services.

AWS contributions to open source

Improving builder tools through open source

Supporting real-world use of OSS through managed services
• Amazon Keyspaces for Apache Cassandra
• Amazon EKS (Kubernetes)
• FreeRTOS
• AWS RoboMaker (ROS)
• Amazon ElastiCache for Redis, Memcached
• AWS App Mesh (Envoy)
• TorchServe (run PyTorch models)
• AWS Lambda (Firecracker)
• Amazon OpenSearch Service

AI/ML open source projects that AWS contributes to
• AutoGluon: state-of-the-art AutoML (auto.gluon.ai/)
• Apache MXNet: deep learning framework (mxnet.apache.org/)
• Optimized DL frameworks: support for PyTorch, TensorFlow, Apache MXNet
• Hugging Face: integration into the SageMaker Python SDK and DLCs
• Kubernetes support: integration through SageMaker Operators & Components
• Deep Graph Library (DGL): graph neural networks (www.dgl.ai/)
• TorchServe: a model server for PyTorch
• Deep Java Library (DJL): deep learning in Java (djl.ai/)
• Dive into Deep Learning: an interactive deep learning textbook (d2l.ai/)

Jupyter is a tool for creating computational narratives: documents in which people embed code and data as part of a story to communicate their insights to others.
"At AWS, we want to make Jupyter as good as it can be, engage with the open-source community, and improve Jupyter on behalf of our customers and all Jupyter users."
Support for and contributions to Jupyter
Community contributions
• Jupyter Steering Council member (Brian Granger, AWS)
• Code contributions to JupyterLab, JupyterLab Git, Jupyter Server, Notebook, Kernel Gateway, and more
• NumFOCUS Advisory Board member
Improving the Jupyter experience
• Building products for enterprise Jupyter users, such as Amazon SageMaker Studio
• Integration with features such as SSO and notebook sharing

AWS contributions to PyTorch
PyTorch enables efficient model building and fast, flexible experimentation through a user-friendly front end, distributed training, and an ecosystem of tools.
• TorchServe is an open-source PyTorch model serving framework that makes it easy to deploy trained PyTorch models at scale and with high performance, without writing custom code. https://pytorch.org/serve/
• TorchElastic Controller for Kubernetes is a native Kubernetes implementation of TorchElastic that automatically manages the lifecycle of the pods and services that TorchElastic training requires.

Meta selects AWS as a strategic cloud provider
• More than five years of collaboration so far
• Improving the performance of PyTorch deep learning models running on AWS
• Making production deployment easier and faster
"With AWS's global reach and reliability, Meta will continue to deliver innovative experiences to the billions of people around the world who use our products and services, and to customers using PyTorch on AWS." (Jason Kalich, VP of Production Engineering at Meta)
https://press.aboutamazon.com/news-releases/news-release-details/meta-selects-aws-key-long-term-strategic-cloud-provider
https://aws.amazon.com/jp/about-aws/whats-new/2021/12/meta-selects-aws-key-long-term-strategic-cloud-provider/

Making NLP easy for everyone to use: a strong partnership with AWS
• Hugging Face: the most popular open-source company delivering state-of-the-art NLP technology
• AWS: integration with SageMaker, which provides high-performance resources for training Hugging Face NLP models

Programs for educational and research institutions, and more

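Education program
A free curriculum package for educational institutions that teach students aiming to become cloud engineers.
https://aws.amazon.com/jp/training/awsacademy/
The package for educational institutions includes instructor training, course materials, and Learner Lab, an AWS hands-on environment that students can access for free. Learners take the courses offered by each member institution.
Courses
Foundational (20 hours of classroom content)
• AWS Academy Cloud Foundations
• AWS Academy Machine Learning Foundations
Intermediate (40 hours of classroom content)
• AWS Academy Cloud Architecting
• AWS Academy Cloud Developing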

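A free environment for trying machine learning
https://aws.amazon.com/jp/sagemaker/studio-lab/
• Based on JupyterLab, it gives you free access to compute resources on AWS so you can start learning and experimenting with machine learning right away. No credit card required.
• Each user session can use either 12 hours of CPU or 4 hours of GPU, and there is no limit on the number of user sessions.
• At least 15 GB of persistent storage per project.
• When a session expires, SageMaker Studio Lab takes a snapshot of the environment, so you can pick up right where you left off.
• Tightly integrated with GitHub, with full support for the Git command line.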

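Research grant program: Cloud Credit for Research
✓ A program that supports new projects, such as building cloud-hosted services, software, and tools, or moving research processes to the cloud.
✓ Applications are reviewed quarterly.
✓ There is no cap on the amount you can request.
✓ Eligible purposes include validating the migration of a research environment from on-premises to the cloud, training research project members, and building infrastructure for publishing research.
https://aws.amazon.com/jp/government-education/research-and-technical-computing/cloud-credit-for-research/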

DeepRacer Student League
• Open to individuals aged 16 and over, free of charge (no credit card required)
• Hands-on practice with machine learning in the cloud
• Advance through the upper leagues to qualify for the global championship
https://aws.amazon.com/jp/deepracer/student/japan-student-championship/

Careers at AWS
AWS and Amazon are actively hiring both new graduates and experienced professionals.
https://aws.amazon.com/jp/careers/newgraduate/
Contact: [email protected]

Amazon Science hiring and internships
If you are interested in a research career at Amazon/AWS, please refer to these materials.

References

Further information
• https://aws.amazon.com/jp/ec2/instance-types/inf1/
• https://aws.amazon.com/jp/machine-learning/inferentia/
• https://aws.amazon.com/jp/ec2/instance-types/trn1/
• https://aws.amazon.com/jp/machine-learning/trainium/
• https://awsdocs-neuron.readthedocs-hosted.com/

– AWS Blog
• https://aws.amazon.com/jp/blogs/news/aws-trainium-amazon-ec2-trn1-ml-training-part1/
• https://aws.amazon.com/jp/builders-flash/202209/create-large-scale-inference-environment/
• https://aws.amazon.com/jp/solutions/case-studies/amazon-robotics-case-study/
• https://aws.amazon.com/jp/blogs/machine-learning/how-amazon-search-reduced-ml-inference-costs-by-85-with-aws-inferentia/
• https://aws.amazon.com/jp/solutions/case-studies/finchcomputing-case-study/
• https://aws.amazon.com/jp/blogs/news/inference-environment-using-aws-inferentia-and-amazon-ecs-with-aws-cdk-part1/
• https://aws.amazon.com/jp/blogs/news/inference-environment-using-aws-inferentia-and-amazon-ecs-with-aws-cdk-part2/
• https://medium.com/pytorch/democratizing-gpr-ground-penetrating-radar-with-deep-learning-feddd9d2286d

– AWS Blog
• https://aws.amazon.com/jp/blogs/news/how-infojobs-adevinta-improves-nlp-model-prediction-performance-with-aws-inferentia-and-amazon-sagemaker/
• https://aws.amazon.com/jp/blogs/startup/event-report-deep-learning-accelerator-instances/
• https://aws.amazon.com/jp/blogs/news/ec2-event-nttpc-anymotion-inf1-costperformance-optimization/
• https://aws.amazon.com/jp/blogs/news/choose-the-best-ai-accelerator-and-model-compilation-for-computer-vision-inference-with-amazon-sagemaker/
• https://aws.amazon.com/jp/blogs/news/serve-3000-deep-learning-models-on-amazon-eks-with-aws-inferentia-for-under-50-an-hour/
• https://aws.amazon.com/jp/blogs/news/scaling-ad-verification-with-machine-learning-and-aws-inferentia/
• https://aws.amazon.com/jp/blogs/news/achieve-12x-higher-throughput-and-lowest-latency-for-pytorch-natural-language-processing-applications-out-of-the-box-on-aws-inferentia/

🤗 Hugging Face integration: references
AWS Machine Learning Summit
• Accelerate NLP training with Amazon SageMaker https://youtu.be/1LwjUbzcJok
Documentation
• Use Hugging Face with Amazon SageMaker https://docs.aws.amazon.com/sagemaker/latest/dg/hugging-face.html
• Hugging Face on Amazon SageMaker https://huggingface.co/docs/sagemaker/main
• Deploying HuggingFace TorchScript models on AWS using the Neuron SDK https://huggingface.co/docs/transformers/master/en/serialization#deploying-huggingface-torchscript-models-on-aws-using-the-neuron-sdk
Example notebooks
• Train a text classification model on SageMaker with PyTorch and Hugging Face: PyTorch Getting Started Demo.
• Train a text classification model on SageMaker with TensorFlow and Hugging Face: TensorFlow Getting Started example.
• Run data-parallel distributed training with Hugging Face and SageMaker: Distributed Training example.
• Run model-parallel distributed training with Hugging Face and SageMaker: Model Parallelism example.
• Train Hugging Face models on SageMaker using Spot Instances: Spot Instances example.
• Capture custom metrics while training a text classification model with Hugging Face on SageMaker: Training with Custom Metrics example.
• Run distributed TensorFlow training with Hugging Face on SageMaker: Distributed TensorFlow Training example.
• Deploy a Hugging Face model to Inf1 with the Neuron container: Inf1 Neuron Container example.
