Amazon EC2 Silicon Innovation

Slide 1

Slide 1 text

Slide 2

Slide 2 text

© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon EC2 Silicon Innovation: The Technology Behind the Latest Machine Learning Platforms on AWS
Session AWS-33
Keita Watanabe (渡辺啓太)
Senior Solutions Architect, Self-Managed Machine Learning, Compute Division, Amazon Web Services Japan G.K.

Slide 3

Slide 3 text

About this session
Purpose: Introduce the AWS custom chips that are useful for machine learning, and present an example architecture that uses them end to end, from training through inference.
Intended audience
 - Those interested in an overview of AWS custom silicon: the AWS Nitro chips, AWS Graviton processors, and the ML-focused AWS Inferentia and AWS Trainium
 - Those interested in cost-effective machine learning with AWS custom chips
Not covered
 - AWS ML services

Slide 4

Slide 4 text

Keita Watanabe (渡辺啓太)
Senior Solutions Architect, AI/ML Frameworks
Background
• In his current role as a Solutions Architect, supports the development of machine learning systems that do not use managed services
• As an ML Researcher at an autonomous-driving startup, worked on research and development of decision-making systems for autonomous vehicles
• As a Data Scientist at a company operating one of Japan's largest e-commerce sites, worked on research and development of a product image search service
Favorite AWS services
• AWS ParallelCluster
• Amazon EKS
• Amazon EC2

Slide 5

Slide 5 text

Agenda
 - Introduction
 - AWS silicon innovation: AWS's pursuit of cost performance
   - AWS Nitro System
   - AWS Graviton
   - AWS Inferentia and AWS Trainium
 - Technologies behind large-scale machine learning
   - Technologies behind distributed training
   - Technologies behind distributed inference
 - An architecture that uses AWS custom silicon to take a large language model from training through inference, end to end

Slide 6

Slide 6 text

Agenda
 - Introduction
 - AWS silicon innovation: AWS's pursuit of cost performance
   - AWS Nitro System
   - AWS Graviton
   - AWS Inferentia and AWS Trainium
 - Technologies behind large-scale machine learning
   - Technologies behind distributed training
   - Technologies behind distributed inference
 - An architecture that takes a large language model from training through inference, end to end

Slide 7

Slide 7 text

Slide 8

Slide 8 text

Models are growing in size at a remarkable pace
Figure: model size (number of parameters) by year, 2012 to 2022
 - 2012: AlexNet, 62M
 - 2016: YOLO, GNMT, 210M
 - 2018: BERT-L, 340M
 - 2019: GPT-2, 1.5B
 - 2020: GPT-3, 175B
 - 2021: Switch-C, 1.6T

Slide 9

Slide 9 text

Amazon Titan: high-performance foundation models developed responsibly by Amazon
Benefits
• Built on more than 20 years of Amazon experience
• The foundation model Amazon Titan Text automates language tasks such as summarization and text generation
• The foundation model Amazon Titan Embeddings improves the accuracy of search and recommendations
Titan Text: natural language processing (NLP) tasks
Titan Embeddings: tasks such as search and recommendations

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Slide 12

Slide 12 text

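Agenda
 - Introduction
 - AWS silicon innovation: AWS's pursuit of cost performance
   - AWS Nitro System
   - AWS Graviton
   - AWS Inferentia and AWS Trainium
 - Technologies behind large-scale machine learning
   - Technologies behind distributed training
   - Technologies behind distributed inference
 - An architecture that takes a large language model from training through inference, end to end
Problem to solve in this part
Balancing performance and cost: compute resources that can run training and inference of large models efficiently and at low cost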

Slide 13

Slide 13 text

AWS silicon innovation: AWS's pursuit of cost performance
 - AWS Nitro System: hypervisor, network, storage/SSD, security
 - AWS Graviton: powerful and efficient modern processors
 - AWS Inferentia and AWS Trainium: machine learning acceleration

Slide 14

Slide 14 text

Slide 15

Slide 15 text

Slide 16

Slide 16 text

Slide 17

Slide 17 text

Elastic Fabric Adapter (EFA)
A network interface that uses Nitro System hardware to enable high-speed communication between instances
Figure: software stack without and with EFA
 - Without EFA: Application → MPI implementation (user space) → TCP/IP stack → ENA network driver (kernel) → ENA device
 - With EFA: Application → MPI implementation → Libfabric (user space) → EFA kernel driver (kernel) → EFA device

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Slide 20

Slide 20 text

AWS Graviton processors
 - Custom AWS silicon with 64-bit Arm processor cores
 - Rapid innovation, building, and iteration on behalf of customers
 - Optimized for cloud-native workloads
AWS Graviton: powerful and efficient modern processors

Slide 21

Slide 21 text

History of AWS Graviton
 - Graviton (released 2018): first-generation Graviton processor
 - Graviton2: 4x the vCPUs and 7x the CPU performance of Graviton
 - Graviton3 (announced 2021): 25% higher performance than Graviton2; 60% better energy efficiency than x86 instances
 - Graviton3E (announced 2022): optimized for HPC; up to 35% higher compute performance than Graviton3

Slide 22

Slide 22 text

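EC2 instance families powered by Graviton (as of March 15, 2023)
Category | Graviton | Graviton2 | Graviton3 | Graviton3E
General purpose | A1 | M6g, M6gd, T4g (free trial) | M7g (New) | -
Compute optimized | - | C6g, C6gd, C6gn | C7g | C7gn (Preview)
Memory optimized | - | R6g, R6gd, X2gd | R7g (New) | -
Accelerated computing | - | G5g | - | -
Storage optimized | - | Im4gn, Is4gen | - | -
HPC optimized | - | - | - | HPC7g (announced)
Color legend on the slide: orange = available in Tokyo and Osaka; white = available in Tokyo but not Osaka; purple = not available in Tokyo or Osaka
https://www.youtube.com/watch?v=MNHch4kIkyo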

Slide 23

Slide 23 text

Improved machine learning inference performance on Graviton3 instances
Figure: relative inference performance with TensorFlow and PyTorch, c7g.4xl vs. c6g.4xl (axis 0 to 8). Workloads: TF MLPerf Resnet50, TF MLPerf Bert, TF Rec Model, TF NLP Model, PT Torchbench Resnet50, PT MLPerf Bert

Slide 24

Slide 24 text

Slide 25

Slide 25 text

Slide 26

Slide 26 text

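Instances powered by ML chips designed by AWS
 - AWS Inferentia: first-generation ML inference chip. Generally available since December 2019; available in 24 Regions
 - AWS Trainium: high-performance ML training chip. Generally available since October 2022; available in US Regions
 - AWS Inferentia2 (NEW): second-generation ML inference chip. Generally available since April 2023; available in US Regions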

Slide 27

Slide 27 text

Slide 28

Slide 28 text

Amazon EC2 Inf1 instances
 - High throughput: 1.25x for Inf1.xl, relative to a GPU instance at 1x
 - Low cost: cost per inference of 0.3x for Inf1.xl, relative to a GPU instance at 1x
* Measured on PyTorch BERT-Base
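
Note: the two ratios on this slide can be related with a small calculation. This is a minimal sketch that assumes cost per inference equals the hourly instance price divided by the number of inferences served per hour at full utilization; the baseline price and throughput are hypothetical placeholders, not figures from the session.

```python
def cost_per_inference(hourly_price_usd, inferences_per_second):
    """Cost of one inference for an instance kept fully busy."""
    return hourly_price_usd / (inferences_per_second * 3600)

# Hypothetical GPU baseline (placeholders for illustration only).
gpu_price, gpu_throughput = 1.00, 100.0

# Ratios quoted on the slide for Inf1.xl relative to the GPU instance.
throughput_ratio = 1.25
cost_ratio = 0.3

# Under the assumption above: cost_ratio = price_ratio / throughput_ratio,
# so the hourly price ratio implied by the slide's two numbers is:
implied_price_ratio = cost_ratio * throughput_ratio

inf1_cost = cost_per_inference(gpu_price * implied_price_ratio,
                               gpu_throughput * throughput_ratio)
gpu_cost = cost_per_inference(gpu_price, gpu_throughput)
print(f"implied hourly price ratio: {implied_price_ratio:.3f}")
print(f"cost-per-inference ratio:   {inf1_cost / gpu_cost:.2f}")
```

A cost ratio of 0.3x corresponds to 70% lower cost per inference.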

Slide 29

Slide 29 text

Slide 30

Slide 30 text

Inf1 use cases within Amazon: Amazon Alexa
Alexa has deployed a text-to-speech model that generates natural-sounding speech and supports more than 100 million Alexa devices worldwide. It reduced operating costs by 30% and improved inference latency by 25%.
Case study article: https://aws.amazon.com/jp/blogs/news/majority-of-alexa-now-running-on-faster-more-cost-effective-amazon-ec2-inf1-instances/

Slide 31

Slide 31 text

Slide 32

Slide 32 text

Inf1 use cases within Amazon: Amazon Search
The product search engine indexes billions of products and serves hundreds of millions of customers worldwide. It uses Transformer-based natural language processing models and reduced infrastructure costs by 85%.
Case study article: https://aws.amazon.com/jp/blogs/machine-learning/how-amazon-search-reduced-ml-inference-costs-by-85-with-aws-inferentia/

Slide 33

Slide 33 text

Inf1 use cases within Amazon: Amazon Robotics
Uses more than 1,000 SageMaker hosts. Handles rapidly growing traffic at 35% lower cost and 20% higher throughput, without retraining models.
Case study article: https://aws.amazon.com/jp/solutions/case-studies/amazon-robotics-case-study/

Slide 34

Slide 34 text

[Inf1] Voices of customers in Japan: Money Forward, Inc.
"Migrating our AI chatbot service to Amazon EC2 Inf1 instances was easy. We completed the migration within two months and launched a large-scale service on AWS Inf1 instances using Amazon Elastic Container Service (ECS). By serving multiple models per Inf1 instance, we reduced inference latency by 97% and inference costs by more than 50% (compared with comparable GPU-based instances)."

Slide 35

Slide 35 text

Slide 36

Slide 36 text

Slide 37

Slide 37 text

Amazon EC2 Trn1/Trn1n instances
The most cost-effective instances for high-performance ML training
https://aws.amazon.com/jp/ec2/instance-types/trn1/
• Up to 50% lower cost than comparable GPU instances
• Up to 16 AWS Trainium accelerators, 512 GB of high-speed HBM2 memory, and 8 TB of local NVMe SSD
• Up to 1,600 Gbps (Trn1n) of Elastic Fabric Adapter (EFA) network bandwidth
• Trainium accelerators are connected by ultra-high-speed NeuronLink
• Supports major ML frameworks such as TensorFlow and PyTorch

Slide 38

Slide 38 text

Amazon EC2 Trn1/Trn1n instance sizes
Instance size | Trainium accelerators | Accelerator memory | vCPUs | Memory | Network bandwidth | On-Demand price (USD/hour)
Trn1.2xlarge | 1 | 32 GB | 8 | 32 GB | up to 10 Gbps | 1.34
Trn1.32xlarge | 16 | 512 GB | 128 | 512 GB | 800 Gbps | 21.5
Trn1n.32xlarge (NEW) | 16 | 512 GB | 128 | 512 GB | 1600 Gbps | 24.78
* Prices for US East (N. Virginia) as of April 2023
https://aws.amazon.com/jp/ec2/instance-types/trn1/
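
Note: to compare the sizes on an equal footing, the hourly price can be normalized per Trainium accelerator. The sketch below uses only the prices and accelerator counts listed on this slide; the helper name is an illustrative choice.

```python
# (instance size, Trainium accelerators, On-Demand USD/hour) as listed on the slide
TRN1_SIZES = [
    ("trn1.2xlarge", 1, 1.34),
    ("trn1.32xlarge", 16, 21.5),
    ("trn1n.32xlarge", 16, 24.78),
]

def price_per_accelerator_hour(accelerators, hourly_price):
    """Hourly On-Demand price divided by the number of accelerators."""
    return hourly_price / accelerators

per_accel = {name: price_per_accelerator_hour(n, price)
             for name, n, price in TRN1_SIZES}

# Extra hourly cost of Trn1n (1600 Gbps) over Trn1 (800 Gbps) at the 32xlarge size.
trn1n_premium = 24.78 / 21.5 - 1.0

for name, value in per_accel.items():
    print(f"{name}: {value:.3f} USD per accelerator-hour")
print(f"Trn1n premium over Trn1.32xlarge: {trn1n_premium:.1%}")
```

With these list prices, doubling the network bandwidth costs roughly 15% more per hour.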

Slide 39

Slide 39 text

Slide 40

Slide 40 text

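Trn1 use case from a customer in Japan
"To offer 大喜利 AI, an innovative and interactive AI chatbot service that uses humor to come up with funny answers on the spot, we have adopted large language models. Using tensor parallelism and data parallelism, we pre-trained a GPT-based Japanese model on Trn1.32xlarge instances. Training completed within 28 days, at a cost 33% lower than on our previous GPU-based infrastructure. As models continue to grow rapidly in complexity, we look forward to Trn1n instances, which have twice the network bandwidth of Trn1, to speed up training of larger models."
Yohei Kobashi (小橋洋平), Chief Technology Officer (CTO), Watashiha, K.K. (株式会社わたしは)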

Slide 41

Slide 41 text

Trn1 usage example (excerpt)
https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/mnist_mlp/train.py
Point 1: Place the model and the data on Trainium (the same procedure as with a GPU)

Slide 42

Slide 42 text

Trn1 usage example (excerpt)
https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/mnist_mlp/train.py
Point 2: Compilation and execution of the training step
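
Note: the code shown on these two slides did not survive extraction. The following is a minimal sketch of the two points, not the linked sample itself. It assumes PyTorch is installed and that, on Trn1, `torch_xla` is provided by the Neuron SDK; the small MLP, the random stand-in batches, and the CPU fallback are hypothetical additions so that the sketch also runs off-device.

```python
import torch
import torch.nn as nn

# On Trn1, torch_xla exposes Trainium as an XLA device. The CPU fallback is
# an addition for illustration so the sketch runs without the Neuron SDK.
try:
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()
except ImportError:
    xm = None
    device = torch.device("cpu")

torch.manual_seed(0)

# Hypothetical small MLP for 28x28 inputs (not the sample's exact model).
model = nn.Sequential(
    nn.Flatten(), nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 10)
).to(device)  # Point 1: move the model to the device, as with a GPU
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

losses = []
for step in range(3):
    # Point 1 also applies to the data: random stand-in batch moved to the device.
    x = torch.randn(8, 1, 28, 28).to(device)
    y = torch.randint(0, 10, (8,)).to(device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if xm is not None:
        # Point 2: mark_step() closes the lazily traced graph for this step;
        # the graph is compiled on first use and then executed on the device.
        xm.mark_step()
    losses.append(loss.item())

print(losses)
```

Apart from the device handle and the `mark_step()` call, the loop is ordinary PyTorch, which is the point the slides make about similarity to GPU code.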

Slide 43

Slide 43 text

AWS Trainium: high performance, low power, and low cost together
Training BERT Large with AWS Trainium (Trn1 cluster vs. GPU cluster)
 - Training time (hours): 2.3x faster training
 - Power (kilowatts): 47% lower power
 - Training cost (USD): 72% lower cost
Details: Hugging Face Bert-Large, FP32, On-Demand EC2 pricing

Slide 44

Slide 44 text

Slide 45

Slide 45 text

Slide 46

Slide 46 text

Amazon EC2 Inf2 instance sizes
Instance size | Inferentia2 accelerators | Accelerator memory | vCPUs | Memory | Network bandwidth | On-Demand price (USD/hour)
Inf2.xlarge | 1 | 32 GB | 4 | 16 GB | up to 15 Gbps | 0.76
Inf2.8xlarge | 1 | 32 GB | 32 | 128 GB | up to 25 Gbps | 1.97
Inf2.24xlarge | 6 | 192 GB | 96 | 384 GB | 50 Gbps | 6.49
Inf2.48xlarge | 12 | 384 GB | 192 | 768 GB | 100 Gbps | 12.98
* Prices for US East (N. Virginia) as of April 2023

Slide 47

Slide 47 text

Inferentia2: high performance, energy savings, and low cost
BERT-Large with AWS Inferentia2: comparison at the number of instances needed to reach equivalent throughput when running 10 million inferences (Inf2.2xl instances vs. GPU instances)
 - Number of instances: 50% fewer instances
 - Power (watts): 50% energy savings
 - Inference cost (USD): 65% lower cost

Slide 48

Slide 48 text

Inf2 usage example (excerpt)
https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/inference/hf_pretrained_bert_inference_on_trn1.ipynb

Slide 49

Slide 49 text

Stable Diffusion on an Inf2.xlarge instance
50% lower cost (cost per generated image) than GPU instances
Example query 1: "A photo of an astronaut riding a horse on mars"

Slide 50

Slide 50 text

Stable Diffusion on an Inf2.xlarge instance
50% lower cost (cost per generated image) than GPU instances
Example query 2: "a highly detailed matte painting of a man on a hill watching a city"

Slide 51

Slide 51 text

For more about AWS Inferentia and Trainium
https://jawsug-ai.connpass.com/event/261173/
https://resources.awscloud.com/aws-ai-and-machine-learning-japan-aws-innovate/

Slide 52

Slide 52 text

Agenda: covered so far
 - Introduction
 - AWS silicon innovation: AWS's pursuit of cost performance (AWS Nitro System, AWS Graviton, AWS Inferentia and AWS Trainium)
Problem addressed in this part
Balancing performance and cost: compute resources that can run training and inference of large models efficiently and at low cost

Slide 53

Slide 53 text

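Agenda
 - Introduction
 - AWS silicon innovation: AWS's pursuit of cost performance
   - AWS Nitro System
   - AWS Graviton
   - AWS Inferentia and AWS Trainium
 - Technologies behind large-scale machine learning
   - Technologies behind distributed training
   - Technologies behind distributed inference
 - An architecture that takes a large language model from training through inference, end to end
Problem to solve in this part
Scalability: distributed training and distributed inference that can handle training and inference of large models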

Slide 54

Slide 54 text

Slide 55

Slide 55 text

Tensor-parallel distributed training
Figure: training data and an ML model with layers L1 to L4, distributed across Worker #0, Worker #1, and Worker #2
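
Note: the idea of tensor parallelism can be illustrated on a single linear layer. This is a minimal pure-Python sketch, not how any framework implements it: the weight matrix is split by output columns across three simulated workers, each worker multiplies the same input by its shard, and the column blocks are concatenated (the role an all-gather plays in a real cluster). The matrices and helper names are hypothetical.

```python
def matmul(x, w):
    """Multiply a (batch x d_in) matrix by a (d_in x d_out) matrix."""
    return [[sum(xi * wij for xi, wij in zip(row, col)) for col in zip(*w)]
            for row in x]

def split_columns(w, num_workers):
    """Give each worker a contiguous block of output columns of w."""
    d_out = len(w[0])
    assert d_out % num_workers == 0, "d_out must divide evenly across workers"
    width = d_out // num_workers
    return [[row[k * width:(k + 1) * width] for row in w]
            for k in range(num_workers)]

x = [[1.0, 2.0], [3.0, 4.0]]                      # batch=2, d_in=2
w = [[1.0, 0.0, 2.0, -1.0, 0.5, 3.0],
     [0.0, 1.0, -2.0, 1.0, 1.5, -3.0]]            # d_in=2, d_out=6

shards = split_columns(w, num_workers=3)
partials = [matmul(x, shard) for shard in shards]  # one matmul per worker

# "All-gather": concatenate each worker's column block, row by row.
y_parallel = [sum((p[i] for p in partials), []) for i in range(len(x))]
y_reference = matmul(x, w)
print(y_parallel == y_reference)
```

Each worker stores only a third of the layer's weights, which is what lets a layer too large for one accelerator be trained at all.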

Slide 56

Slide 56 text

Transformers
https://arxiv.org/pdf/1909.08053.pdf
• The deep learning model architecture used in recent large-scale natural language processing models
• Efficient training of large Transformer-based models such as GPT requires efficient parallelization of the main components, the Attention block and the MLP block
Trn1 supports more efficient collective communication (eliminating unnecessary communication) through Megatron-LM
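
Note: the MLP-block parallelization described in the linked Megatron-LM paper can be sketched as follows. For Z = GeLU(X·A)·B, the first weight A is split by columns and the second weight B by rows, so each worker applies GeLU to its own slice without communication, and a single all-reduce (sum) at the end recovers the full output. This is a pure-Python illustration with hypothetical matrices, not Megatron-LM or Neuron SDK code.

```python
import math

def matmul(x, w):
    """Multiply a (rows x k) matrix by a (k x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]

def gelu_matrix(m):
    """Exact GeLU applied element-wise."""
    return [[0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in row]
            for row in m]

def mlp_reference(x, a, b):
    return matmul(gelu_matrix(matmul(x, a)), b)

def mlp_tensor_parallel(x, a, b, num_workers):
    width = len(b) // num_workers          # hidden units per worker
    partial_outputs = []
    for k in range(num_workers):
        cols = slice(k * width, (k + 1) * width)
        a_k = [row[cols] for row in a]     # column shard of A
        b_k = b[cols]                      # matching row shard of B
        # No communication needed here: GeLU is element-wise on the shard.
        partial_outputs.append(matmul(gelu_matrix(matmul(x, a_k)), b_k))
    # The single all-reduce: sum the partial outputs from all workers.
    return [[sum(p[i][j] for p in partial_outputs)
             for j in range(len(b[0]))] for i in range(len(x))]

x = [[0.5, -1.0], [2.0, 0.25]]                           # batch=2, d_model=2
a = [[1.0, -0.5, 0.3, 2.0], [0.7, 1.5, -1.2, 0.1]]       # d_model=2, d_hidden=4
b = [[0.2, -1.0], [1.1, 0.4], [-0.6, 0.9], [0.5, 0.5]]   # d_hidden=4, d_model=2

z_parallel = mlp_tensor_parallel(x, a, b, num_workers=2)
z_reference = mlp_reference(x, a, b)
print(z_parallel, z_reference)
```

Splitting A by rows instead would require a synchronization before the non-linearity, which is the unnecessary communication this arrangement avoids.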

Slide 57

Slide 57 text

Slide 58

Slide 58 text

Slide 59

Slide 59 text

Scaling example with second-generation EFA: GPT-3
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/benchmarks/trn1/trn1-performance.html#trn1-performance
 - Example of distributed training on Trn1 using Megatron-LM
   - 1 instance: global minibatch 64
   - 16 instances: global minibatch 1024
 - Trn1 scaling efficiency: 92.74%
Figure: sequences processed per second (axis 0 to 140) against number of instances (1 and 16)
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/megatron_lm_gpt.html#megatron-lm-pretraining-tutorial
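
Note: scaling efficiency is commonly defined as measured throughput on N instances divided by N times the single-instance throughput. The sketch below assumes that definition; the two throughput values are hypothetical, chosen only so the result lands near the 92.74% quoted on the slide, because the chart's actual values are not recoverable from the text.

```python
def scaling_efficiency(throughput_1, throughput_n, n):
    """Measured throughput on n instances relative to ideal linear scaling."""
    return throughput_n / (n * throughput_1)

# Hypothetical throughputs in sequences per second (not read from the chart).
single_instance = 8.0
sixteen_instances = 118.7

efficiency = scaling_efficiency(single_instance, sixteen_instances, n=16)
ideal = 16 * single_instance
print(f"ideal: {ideal:.1f} seq/s, measured: {sixteen_instances:.1f} seq/s, "
      f"efficiency: {efficiency:.2%}")
```

An efficiency of 100% would mean that 16 instances process exactly 16 times as many sequences per second as one instance.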

Slide 60

Slide 60 text

Slide 61

Slide 61 text

Distributed inference with Inf2 (Preview)
Up to 12 chips connected in a ring topology
• High-speed chip-to-chip communication over NeuronLink v2
• 10 TB/s high-bandwidth memory access and 384 GB of large-capacity accelerator memory
NeuronLink v2: high-speed interconnect
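
Note: a rough calculation shows why a model has to be spread over several chips. The sketch counts FP16 weights only (2 bytes per parameter) and ignores activations and the key/value cache, so it is a lower bound; the nominal parameter counts for OPT-30B and OPT-66B are assumptions taken from the model names, and the 32 GB per accelerator and 12 accelerators come from the Inf2 instance table.

```python
import math

BYTES_PER_PARAM_FP16 = 2
ACCELERATOR_MEMORY_GB = 32   # per Inferentia2 accelerator
MAX_ACCELERATORS = 12        # inf2.48xlarge: 12 x 32 GB = 384 GB

def weight_memory_gb(num_params):
    """Memory for FP16 weights only, in decimal GB."""
    return num_params * BYTES_PER_PARAM_FP16 / 10**9

def min_accelerators(num_params):
    """Fewest accelerators whose combined memory holds the weights."""
    return math.ceil(weight_memory_gb(num_params) / ACCELERATOR_MEMORY_GB)

# Nominal parameter counts (assumed from the model names).
models = {"OPT-30B": 30 * 10**9, "OPT-66B": 66 * 10**9}

for name, params in models.items():
    needed = min_accelerators(params)
    fits = needed <= MAX_ACCELERATORS
    print(f"{name}: {weight_memory_gb(params):.0f} GB of weights, "
          f"at least {needed} accelerators, fits on inf2.48xlarge: {fits}")
```

Neither model fits in a single 32 GB accelerator, which is why the chip-to-chip interconnect matters for inference.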

Slide 62

Slide 62 text

AWS Inferentia2 LLM inference performance
 - OPT-30B throughput (tokens/sec; FP16, Seqlen 2048, B16): inf2.48xlarge 619.6 vs. GPU instance 368.6 (65% higher performance)
 - OPT-30B cost per 1M inferences (USD; FP16, Seqlen 2048, B16): inf2.48xlarge $59.15 vs. GPU instance $122.7 (52% lower cost)
 - OPT-66B throughput (tokens/sec; FP16, Seqlen 2048): inf2.48xlarge 351 vs. GPU instance Out of Memory
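
Note: the relative figures can be recomputed from the absolute numbers on this slide. The sketch assumes the larger throughput and the smaller cost belong to inf2.48xlarge, as the slide's captions imply.

```python
# OPT-30B figures as they appear on the slide.
inf2_tokens_per_sec, gpu_tokens_per_sec = 619.6, 368.6
inf2_cost_usd, gpu_cost_usd = 59.15, 122.7   # per 1M inferences

throughput_gain = inf2_tokens_per_sec / gpu_tokens_per_sec - 1.0
cost_reduction = 1.0 - inf2_cost_usd / gpu_cost_usd

print(f"throughput gain: {throughput_gain:.1%}")
print(f"cost reduction:  {cost_reduction:.1%}")
```

The cost reduction rounds to the 52% quoted on the slide; the throughput ratio computed from these two values comes out slightly above the quoted 65%.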

Slide 63

Slide 63 text

Agenda: covered so far
 - Introduction
 - AWS silicon innovation: AWS's pursuit of cost performance
 - Technologies behind large-scale machine learning (distributed training, distributed inference)
Problem addressed in this part
Scalability: distributed training and distributed inference that can handle training and inference of large models

Slide 64

Slide 64 text

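Agenda
 - Introduction
 - AWS silicon innovation: AWS's pursuit of cost performance
   - AWS Nitro System
   - AWS Graviton
   - AWS Inferentia and AWS Trainium
 - Technologies behind large-scale machine learning
   - Technologies behind distributed training
   - Technologies behind distributed inference
 - An architecture that takes a large language model from training through inference, end to end
Problem to solve in this part
Orchestration: orchestrating compute resources made up of many instances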

Slide 65

Slide 65 text

AWS machine learning services
 - Frameworks & Services: Amazon SageMaker, AWS Deep Learning AMIs, Amazon EKS, Amazon ECS, AWS Deep Learning Containers, ML Frameworks
 - Accelerated Compute: Amazon EC2 Trn1/Trn1n, Inf2, Inf1; Trn1 UltraClusters
 - Storage & networking: Elastic Fabric Adapter, Amazon S3, Amazon EBS, Amazon FSx for Lustre, Amazon EFS

Slide 66

Slide 66 text

Slide 67

Slide 67 text

Amazon Elastic Kubernetes Service (Amazon EKS)
• Managed Kubernetes clusters
• OSS and tools from the Kubernetes ecosystem run as-is
• Integration with AWS services such as VPC, FSx for Lustre, and S3

Slide 68

Slide 68 text

Slide 69

Slide 69 text

Kubeflow
A collection of tools that covers each step (plus pipelines) of the Machine Learning Model-Development Lifecycle (MDLC), such as data preprocessing, running training, and deploying models
Components shown: Pipeline, Notebook, Training, Serving

Slide 70

Slide 70 text

Kubeflow on AWS
https://awslabs.github.io/kubeflow-manifests/
• A Kubeflow distribution that can be integrated with AWS managed services
• Supported services (partial list)
  • Amazon S3
  • Amazon Elastic File System
  • Amazon FSx for Lustre
  • Application Load Balancer
  • SageMaker integration through the Amazon SageMaker Components

Slide 71

Slide 71 text

Slide 72

Slide 72 text

Slide 73

Slide 73 text

Slide 74

Slide 74 text

What is Kubeflow Pipelines?
• A tool for building ML pipelines by implementing each step as a Component and combining the Components into a Pipeline as a single sequence of processing
Building blocks
• A UI for reviewing and running Pipelines
• An Engine that schedules Pipeline runs
• A Python SDK for defining, building, and deploying pipelines
• Notebook support for developing and running pipelines with the SDK

Slide 75

Slide 75 text

Slide 76

Slide 76 text

Example: training → inference with Kubeflow Pipelines
• Figure: a pipeline from fine-tuning a BERT model through inference
• The Kubernetes NodeSelector feature lets each step run on a different instance type
  • Bert-train → Trn1
  • Bert-trace → CPU instance
  • Bert-infer → Inf2
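
Note: the step-to-instance mapping above relies on the `nodeSelector` field of a Kubernetes pod spec. The sketch below builds plain pod manifests as Python dictionaries to show where that field sits; it does not use the Kubeflow Pipelines SDK. `node.kubernetes.io/instance-type` is a well-known Kubernetes node label, while the instance sizes, the CPU instance type, and the container image names are hypothetical placeholders.

```python
import json

# Pipeline step -> EC2 instance type (sizes are hypothetical placeholders).
STEP_INSTANCE_TYPES = {
    "bert-train": "trn1.32xlarge",   # training on Trn1
    "bert-trace": "m5.2xlarge",      # model tracing on a CPU instance
    "bert-infer": "inf2.xlarge",     # inference on Inf2
}

def pod_manifest(step, image):
    """Pod manifest that pins one pipeline step to one instance type."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": step},
        "spec": {
            "restartPolicy": "Never",
            # The scheduler only places the pod on nodes carrying this label.
            "nodeSelector": {
                "node.kubernetes.io/instance-type": STEP_INSTANCE_TYPES[step]
            },
            "containers": [{"name": step, "image": image}],
        },
    }

manifests = {
    step: pod_manifest(step, f"example.com/{step}:latest")  # hypothetical images
    for step in STEP_INSTANCE_TYPES
}
print(json.dumps(manifests["bert-train"], indent=2))
```

In a pipeline, each step's pod would carry a selector like this, so that one run can move from Trn1 nodes to CPU nodes to Inf2 nodes within the same cluster.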

Slide 77

Slide 77 text

Slide 78

Slide 78 text

Slide 79

Slide 79 text

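Summary
What a training and inference platform for large models requires
 - Performance and cost: Inf2/Trn1 as compute resources that can run training and inference of large models efficiently and at low cost
 - Scalability
   - Distributed training (Trn1): efficient collective communication with EFA v2, and support for distributed training libraries
   - Distributed inference (Inf2): high-speed chip-to-chip communication with NeuronLink v2
 - Orchestration: EKS and Kubeflow as options for orchestrating large numbers of compute resources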

Slide 80

Slide 80 text

Thank you!
Keita Watanabe (渡辺啓太)
Senior Solutions Architect, AI/ML Frameworks, Amazon Web Services Japan G.K.