Slide 4
© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Fundamental architectures for training and inference

Speaker note: explain the hardware components (GPU, Trainium, EFA).

[Architecture diagram: an EKS Nodegroup launched into a Placement Group, with Amazon FSx for Lustre mounted at /scratch, spanning Availability Zone 1 and Availability Zone 2]
Training
• Instances: P5, P4d(e), Trn1, G5
• Scale: POC = 1-64 instances, PROD = 4-100s…
• Care for: EFA, EC2 capacity, shared network
• Cost objective: cost-to-train ($/iteration)
• Tightly coupled, communication heavy, and inter-node latency sensitive

Inference
• Instances: G5, G4dn, Inf1, Inf2, CPU-based instances
• Scale: POC = 1-64 instances, PROD = 1-1000s…
• Care for: scaling latency (predictive, metric, capacity)
• Cost objective: serving at scale and fast ($/inference)
• Loosely coupled, fast scaling in/out, and query latency sensitive
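The two cost objectives differ in shape: training cost accrues per iteration across the whole cluster, while inference cost is amortized across request throughput. A minimal sketch of both calculations; all prices, instance counts, and throughput figures here are illustrative assumptions, not AWS list prices:

```python
def cost_per_iteration(hourly_price, num_instances, seconds_per_iteration):
    """Training cost objective: $/iteration for a tightly coupled cluster.

    The whole cluster bills for the full iteration, so cost scales with
    instance count and iteration wall-clock time.
    """
    cluster_cost_per_second = hourly_price * num_instances / 3600
    return cluster_cost_per_second * seconds_per_iteration


def cost_per_inference(hourly_price, num_instances, requests_per_second):
    """Inference cost objective: $/inference for a loosely coupled fleet.

    Fleet cost is divided by aggregate request throughput, so higher
    utilization directly lowers the per-request cost.
    """
    fleet_cost_per_second = hourly_price * num_instances / 3600
    return fleet_cost_per_second / requests_per_second


# Hypothetical: 16 training instances at $98.32/hr, 5 s per iteration
train_cost = cost_per_iteration(98.32, 16, 5.0)

# Hypothetical: 4 inference instances at $1.006/hr, 200 req/s aggregate
infer_cost = cost_per_inference(1.006, 4, 200.0)
```

This also shows why the levers differ: shrinking iteration time (EFA, placement groups) drives training cost down, while fast scale-in/out keeps the inference fleet sized to actual request volume.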