Accelerating Machine Learning and Optimizing Costs with Custom-Designed Chips: AWS Inferentia and AWS Trainium / AWS Innovate AWS Inferentia Trainium

From AWS Innovate – AI/ML Edition, held on February 24, 2022

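AWS offers instances powered by NVIDIA GPUs, which are widely used and have a long track record. To provide a broader range of options with higher performance and better price performance, AWS has also developed its own custom-designed chips for machine learning workloads: AWS Inferentia and AWS Trainium. This session gives an overview of, and explains how to use, Inf1 instances for ML inference, powered by AWS Inferentia, and Trn1 instances for ML training, powered by AWS Trainium.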

Watch the video here:
https://resources.awscloud.com/aws-ai-and-machine-learning-japan-aws-innovate/accelerate-machine-learning-and-cost-optimization-with-proprietary-chips-aws-inferentia-and-aws-trainium-jp-aws-innovate

Hiroshi Tokoyo

February 24, 2022

Transcript

  1. © 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS does not offer binding price quotes. AWS pricing is publicly available and is subject to change in accordance with the AWS Customer Agreement available at http://aws.amazon.com/agreement/. Any pricing information included in this document is provided only as an estimate of usage charges for AWS services based on certain information that you have provided. Monthly charges will be based on your actual use of AWS services, and may vary from the estimates provided.
  2. 24 February 2022
  6. AWS Nitro System, SSD, AWS Inferentia, AWS Trainium, AWS Graviton
  8. c6gn.8xlarge: family, generation, features, size
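
The naming scheme on slide 8 (family, generation, features, size) can be sketched as a small parser. The regular expression and the names `InstanceType` and `parse_instance_type` are illustrative assumptions, not an AWS API, and the pattern only covers names shaped like `c6gn.8xlarge`.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str      # e.g. "c", "inf", "trn"
    generation: int  # e.g. 6
    features: str    # e.g. "gn"; empty when the name has no suffix letters
    size: str        # e.g. "8xlarge"


# Assumed shape: letters, digits, optional suffix letters, ".", size.
_PATTERN = re.compile(r"^([a-z]+?)(\d+)([a-z-]*)\.([a-z0-9]+)$")


def parse_instance_type(name: str) -> InstanceType:
    """Split an instance type name into the four parts named on the slide."""
    match = _PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    family, generation, features, size = match.groups()
    return InstanceType(family, int(generation), features, size)


print(parse_instance_type("c6gn.8xlarge"))
print(parse_instance_type("inf1.24xlarge"))
```

Names that do not follow this shape raise `ValueError` rather than being guessed at.
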
  9. Processors and accelerators: Ice Lake CPU, Cascade Lake CPU, Habana accelerator, EPYC CPU, A100, A10G, and T4G GPUs, Graviton CPU, Inferentia chip, Trainium chip, UltraScale+ FPGA. Instance types: C7g, C6g, C6i, C5a, M6g, M6i, M6a, R6g, R6i, R5a, F1, Inf1, G5g, G5, P4, DL1, Trn1, Elastic Inference
  12. • AWS Inferentia • GPU 2.3 70% • (TensorFlow, PyTorch, MXNet) https://aws.amazon.com/ec2/instance-types/inf1/
  13. • 4 • 1~16 Inferentia • 6xlarge 24xlarge Inferentia • 100Gbps • 2022 1 23 • EC2 • Savings Plan *Prices for US East (N. Virginia) as of January 2022
  14. • AWS • 4 Neuron • 128 TOPS (2,000 TOPS @24xlarge) • 2 • 8GB DRAM • FP16, BF16, INT8 • FP32 BF16 • https://aws.amazon.com/machine-learning/inferentia/
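
As a quick check on the slide 14 figures, the per-chip and per-instance numbers can be related with one multiplication. This sketch assumes the 24xlarge size carries 16 Inferentia chips, taken from the upper end of the "1~16 Inferentia" range on slide 13; the function name `aggregate_tops` is hypothetical.

```python
TOPS_PER_CHIP = 128      # per-chip figure from the slide
CHIPS_ON_24XLARGE = 16   # assumption: upper end of "1~16 Inferentia"


def aggregate_tops(chips: int, tops_per_chip: int = TOPS_PER_CHIP) -> int:
    """Peak TOPS of an instance, assuming it scales linearly with chip count."""
    return chips * tops_per_chip


print(aggregate_tops(CHIPS_ON_24XLARGE))
```

Under these assumptions the product is 2,048, which is in line with the rounded 2,000 TOPS quoted for the 24xlarge size.
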
  15. [Figure: dollar cost for G4dn.xl, G5.xl, and Inf1.xl on Yolov5, Resnet50, and Bert-Base; annotated differences of -49%, -68%, and -42%.]
  16. https://awsdocs-neuron.readthedocs-hosted.com/ https://github.com/aws/aws-neuron-sdk
  17. AWS Deep Learning Containers, AWS Deep Learning AMIs, Amazon SageMaker, Amazon Elastic Kubernetes Service, Amazon Elastic Container Service
  18. [Figure: G4dn compared with Inf1.]
  19. https://aws.amazon.com/ec2/instance-types/inf1/#Customer_Testimonials Hotpot.ai, Amazon Rekognition
  20. Amazon Alexa
  22. Growth in model complexity (# of parameters): ELMo (2018), BERT-Large (2018), GPT-2 (2019), Turing NLG (2020), GPT-3 (2020), Switch-C (2021), … [Axis: 100M, 1B, 10B, 100B, 1T, 10T.]
  23. The most cost-efficient DL instance in the cloud. Trn1: BF16/FP16 3.4 PFLOPS; TF32 3.4 PFLOPS; FP32 840 TFLOPS; transistors per chip 55,000,000,000; 3 GHz; 512 GB; 13.1 TB/sec; 768 GB/sec; 800 Gbps EFA
  24. The most cost-efficient DL instance in the cloud
  25. https://aws.amazon.com/machine-learning/trainium/ [Diagram labels: Collective compute, Neuron, Neuron.]
  26. Precision ranges: FP32, TF32, BF16, FP16, cFP8, UINT8. [Figure: normalized performance (0 to 4) of P3dn, P4d, and Trn1 for BF16/FP16, TF32, and FP32; NLP/DLRM and computer vision; annotations >5x, >2.5x, 1.4x.]
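
To make the "precision ranges" on slide 26 concrete, here is a minimal sketch that derives the largest finite value and the spacing just above 1.0 from each format's bit layout. The bit widths are the commonly published ones for these formats and are not stated on the slide; cFP8 and UINT8 are left out, and the class name `FloatFormat` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    name: str
    exponent_bits: int
    mantissa_bits: int  # explicitly stored fraction bits

    @property
    def max_finite(self) -> float:
        """Largest finite value, assuming an IEEE-style biased exponent."""
        bias = 2 ** (self.exponent_bits - 1) - 1
        return (2.0 - 2.0 ** -self.mantissa_bits) * 2.0 ** bias

    @property
    def epsilon(self) -> float:
        """Gap between 1.0 and the next representable value."""
        return 2.0 ** -self.mantissa_bits


# Assumed layouts (exponent bits, fraction bits); not taken from the slide.
FORMATS = [
    FloatFormat("FP32", 8, 23),
    FloatFormat("TF32", 8, 10),
    FloatFormat("BF16", 8, 7),
    FloatFormat("FP16", 5, 10),
]

for fmt in FORMATS:
    print(f"{fmt.name}: max ~{fmt.max_finite:.3e}, epsilon {fmt.epsilon:.3e}")
```

The output illustrates the trade-off between the formats: BF16 and TF32 keep the FP32 exponent width and so a similar range, while FP16 spends more bits on the fraction and covers a much smaller range.
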
  27. https://arxiv.org/pdf/1502.02551.pdf
  28. EC2 UltraClusters: Trn1 instances with 10K+ Trainium chips; petabit non-blocking TOR; petabits/s throughput, billions of IOPS
  29. P3dn 256 GB, P4d 320 GB, Trn1 512 GB; P3dn 300 GB/s, P4d 600 GB/s, Trn1 768 GB/s; P3dn 100 Gb/s, P4d 400 Gb/s, Trn1 800 Gb/s
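
The slide 29 comparison can be turned into ratios with a few lines of arithmetic. The numbers below are the ones on the slide; the metric labels (`memory_GB`, `interconnect_GBps`, `network_Gbps`) are assumptions inferred from the units, and `ratio_vs` is a hypothetical helper.

```python
# Values as read from the slide; metric labels are assumptions based on the units.
SPECS = {
    "memory_GB": {"P3dn": 256, "P4d": 320, "Trn1": 512},
    "interconnect_GBps": {"P3dn": 300, "P4d": 600, "Trn1": 768},
    "network_Gbps": {"P3dn": 100, "P4d": 400, "Trn1": 800},
}


def ratio_vs(baseline: str, target: str = "Trn1") -> dict:
    """Return target/baseline for every metric in SPECS."""
    return {
        metric: values[target] / values[baseline]
        for metric, values in SPECS.items()
    }


for baseline in ("P3dn", "P4d"):
    print(baseline, ratio_vs(baseline))
```

Against P4d, for example, this gives 1.6x on the GB figure, 1.28x on GB/s, and 2x on Gb/s.
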
  30. Strong scaling. [Figure: relative performance (0 to 1.6) by batch size (16 to 1024) for Trn1.32xl and P4d.24xl; Trainium timeline of computation and communication.]
  31. Amazon EC2 Trn1, EC2 Trn1 UltraCluster, Elastic Fabric Adapter; Amazon FSx for Lustre, Amazon S3, Amazon EBS, Amazon EFS; Amazon SageMaker, AWS Deep Learning AMIs, AWS Deep Learning Containers, Amazon EKS, Amazon ECS; PyTorch, TensorFlow
  34. Hiroshi Tokoyo
  35. Appendix
  36. • AWS Inferentia: https://aws.amazon.com/jp/machine-learning/inferentia/
    • Amazon EC2 Inf1: https://aws.amazon.com/ec2/instance-types/inf1/
    • AWS Trainium: https://aws.amazon.com/jp/machine-learning/trainium/
    • Amazon EC2 Trn1: https://aws.amazon.com/ec2/instance-types/trn1/
    • Amazon EC2 Trn1 URL: https://pages.awscloud.com/EC2-Trn1-Preview.html