DeepSeek on AWS - Speaker Deck

Slide 1

Slide 1 text

DeepSeek on AWS
TAI AAI #07 - Scaling AI Performance

Yoshitaka Haribara, Ph.D.
Sr. GenAI/Quantum Startup Solutions Architect, AWS

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Slide 2

Slide 2 text

Agenda
• DeepSeek-R1 and Distilled Models Overview
• Accelerators: NVIDIA H200 GPU and AWS Trainium
• Deployment options on AWS: Bedrock, SageMaker AI, and EC2
• Best practices

Slide 3

Slide 3 text

DeepSeek offers a range of open-weights models and efficient distilled variants.

Base and R1 Models (671B)
• DeepSeek-V3: Base MoE model
• DeepSeek-R1-Zero: Pure reinforcement learning
• DeepSeek-R1: Cold-start data before RL

Distilled Models
• DeepSeek-R1-Distill-Qwen (1.5B, 7B, 14B, 32B)
• DeepSeek-R1-Distill-Llama (8B and 70B)

DeepSeek enables organizations to leverage advanced reasoning capabilities across multiple tasks.

Slide 4

Slide 4 text

Core Capabilities
• Advanced reasoning capabilities optimized for complex problem-solving (e.g. mathematics and coding tasks).
• Reported to perform strongly on the AIME 2024, MATH-500, and SWE-bench Verified benchmarks.
• Reportedly 90-95% more affordable than comparable models.
• 671B-parameter Mixture of Experts (MoE) architecture, with 37B parameters activated per token.
• DeepSeek-R1 requires at least 800 GB of HBM in FP8 format for inference.
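As a rough back-of-the-envelope check on the 800 GB figure, the sketch below estimates weight memory alone from the parameter count and the bytes per parameter. The helper name and the byte-width table are illustrative assumptions, and the estimate deliberately ignores KV cache, activations, and runtime overhead, which is one reason the stated requirement is higher than the raw weight size.

```python
# Rough weight-memory estimate. Illustrative helper, not an official sizing tool.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2, "fp32": 4}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return weight memory in GB (1 GB = 1e9 bytes), excluding KV cache and overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


TOTAL_PARAMS = 671e9  # DeepSeek-R1 total parameters (from the slide)

print(weight_memory_gb(TOTAL_PARAMS, "fp8"))   # weights only, FP8
print(weight_memory_gb(TOTAL_PARAMS, "bf16"))  # weights only, BF16
```

Although only 37B parameters are active per token, an MoE model typically needs all expert weights resident in accelerator memory, so the sizing is driven by the 671B total rather than the active count.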

Slide 5

Slide 5 text

EC2 accelerated compute instances for AI/ML

GPUs (H100, H200, B200, GB200, A100, L40S, L4, A10G)
• G5 (A10G), G6 (L4), G6e (L40S)
• P4 (A100), P5 (H100), P5e (H200), P5en (H200)
• Announced: B200, GB200

AI/ML accelerators and ASICs
• AWS Trainium, Inferentia: Trn1, Trn2, Inf1, Inf2
• Gaudi accelerator: DL1
• Cloud AI100 Standard: DL2q
• Also shown: Radeon GPU, Xilinx accelerator, Xilinx FPGA

Slide 6

Slide 6 text

Accelerated compute architecture

[Diagram: a host (CPUs, NSC, EBS, EFA, PCIe SSDs) connects through a PCIe switching layer to the accelerators (ML chips), which are linked to each other by an ML chip interconnect.]

Slide 7

Slide 7 text

P5 instances
Optimized for AI training and inference
• 900 GB/s NVSwitch for GPU peer-to-peer connections
• Scale-out with non-blocking interconnect, Elastic Fabric Adapter (EFA)

Instance | GPU | GPU memory | CPU | vCPU | Instance memory | Networking | Local storage
P5 | 8 NVIDIA H100 | 640 GB | AMD Milan | 192 | 2 TB | 3200 Gbps EFAv2 | 30 TB SSD
P5e | 8 NVIDIA H200 | 1128 GB | AMD Milan | 192 | 2 TB | 3200 Gbps EFAv2 | 30 TB SSD
P5en | 8 NVIDIA H200 | 1128 GB | Intel SPR | 192 | 2 TB | 3200 Gbps EFAv3 | 30 TB SSD
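To connect this table to the memory requirement stated on the Core Capabilities slide, the following minimal sketch filters the three instance types by total GPU memory. The dictionary simply copies the table values; the function name is hypothetical, and a real sizing decision would also weigh KV cache, batch size, and context length.

```python
# Total GPU memory per instance in GB, copied from the P5 table.
INSTANCE_GPU_MEMORY_GB = {"P5": 640, "P5e": 1128, "P5en": 1128}

# "At least 800 GB" of HBM for DeepSeek-R1 in FP8, per the Core Capabilities slide.
REQUIRED_HBM_GB = 800


def instances_that_fit(required_gb, instances=INSTANCE_GPU_MEMORY_GB):
    """Return the instance names whose total GPU memory meets the requirement."""
    return sorted(name for name, mem in instances.items() if mem >= required_gb)


print(instances_that_fit(REQUIRED_HBM_GB))
```

On these numbers a single P5 (8 x H100, 640 GB) falls short of 800 GB, while the H200-based P5e and P5en clear it on one node.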

Slide 8

Slide 8 text

Bedrock Marketplace implementation
• Bedrock Marketplace enables core DeepSeek-R1 deployment in managed endpoints
• Complete code samples and step-by-step deployment guides provided for quick implementation
• Standard Bedrock security and monitoring features
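A minimal sketch of calling such a managed endpoint with boto3 might look like the following. It assumes the endpoint ARN is passed as `modelId` to `invoke_model`, and that the endpoint accepts an `inputs`/`parameters` JSON body; the body schema, the default parameter values, and both function names are assumptions for illustration, so check the deployment guide for the schema your endpoint actually expects.

```python
import json


def build_request_body(prompt, max_new_tokens=512, temperature=0.6, top_p=0.9):
    """Serialize a text-generation request. The field names are an assumption."""
    return json.dumps({
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_p": top_p,
        },
    })


def invoke_endpoint(endpoint_arn, prompt, region="us-east-1"):
    """Send the prompt to a deployed endpoint and return the parsed JSON response."""
    import boto3  # imported lazily so build_request_body works without the AWS SDK

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(modelId=endpoint_arn, body=build_request_body(prompt))
    return json.loads(response["body"].read())
```

Calling `invoke_endpoint` requires AWS credentials and an already deployed endpoint; the ARN and region here are placeholders supplied by the caller.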

Slide 9

Slide 9 text

Bedrock Marketplace delivers 100+ models from 30+ providers

Providers shown: EvolutionaryScale, Widn, CAMB.AI, Gretel, Arcee AI, Preferred Networks, Writer, Upstage, NCSOFT, Stockmark, Karakuri, John Snow Labs, Liquid, Databricks, CyberAgent, Hugging Face, Stability AI, LG AI Research, Mistral AI, Snowflake, NVIDIA, DeepSeek

Slide 14

Slide 14 text

Tips: Use the proper chat template (model tokenizer)
Example with DeepSeek-R1-Distill-Llama-8B (via Bedrock CMI)

When using the Bedrock Playground, we must add the proper chat template tags for optimal results. E.g.:

    <｜begin▁of▁sentence｜><｜User｜>A man has 53 socks in his drawer: 21 identical blue, 15 identical black and 17 identical red. The lights are out, and he is completely in the dark. How many socks must he take out to make 100 percent certain he has at least one pair of black socks?<｜Assistant｜>

When using the InvokeModel API, we must configure the proper tokenizer to apply the chat template. E.g.:

    from transformers import AutoTokenizer

    # hf_model_id, test_prompt, and continuation are defined by the caller
    tokenizer = AutoTokenizer.from_pretrained(hf_model_id)
    messages = [{"role": "user", "content": test_prompt}]
    # Return the formatted prompt as a string rather than token IDs
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=not continuation)

[Figure: bad quality output (without the chat template) vs. good quality output (with the chat template)]
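To make the effect of the template concrete, here is a minimal pure-Python sketch that reproduces the single-turn layout of the Playground example on this slide. The special-token strings are copied from that example; the function name is hypothetical, and the sketch covers only one user turn with no system prompt, so the tokenizer's own `apply_chat_template` remains the safer choice in practice.

```python
# Special tokens copied from the Playground example on this slide.
BOS = "<｜begin▁of▁sentence｜>"
USER = "<｜User｜>"
ASSISTANT = "<｜Assistant｜>"


def format_single_turn(user_prompt: str) -> str:
    """Wrap one user message in the tags shown in the slide's example."""
    return f"{BOS}{USER}{user_prompt}{ASSISTANT}"


prompt = format_single_turn("How many socks must he take out?")
print(prompt)
```

The trailing assistant tag plays the same role as `add_generation_prompt=True`: it tells the model that the next tokens belong to its own reply.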

Slide 15

Slide 15 text

DeepSeek-R1 Responsible AI concerns

Amazon Bedrock Guardrails (through the ApplyGuardrail API) can provide an extra layer of security and responsible AI measures.
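A hedged sketch of screening text with the ApplyGuardrail API through boto3's `bedrock-runtime` client is shown below. The guardrail identifier and version are placeholders for a guardrail you have already created, and the two helper names are hypothetical; only the `apply_guardrail` call and its request and response fields come from the AWS SDK.

```python
def build_guardrail_request(guardrail_id, guardrail_version, text, source="OUTPUT"):
    """Build keyword arguments for apply_guardrail; source is 'INPUT' or 'OUTPUT'."""
    if source not in ("INPUT", "OUTPUT"):
        raise ValueError("source must be 'INPUT' or 'OUTPUT'")
    return {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,
        "source": source,
        "content": [{"text": {"text": text}}],
    }


def is_blocked(guardrail_id, guardrail_version, text, source="OUTPUT", region="us-east-1"):
    """Return True when the guardrail intervened on the given text."""
    import boto3  # imported lazily so the request builder works without the AWS SDK

    client = boto3.client("bedrock-runtime", region_name=region)
    request = build_guardrail_request(guardrail_id, guardrail_version, text, source)
    response = client.apply_guardrail(**request)
    return response["action"] == "GUARDRAIL_INTERVENED"
```

Because the check is decoupled from model invocation, the same guardrail can be applied to user input before the call (`source="INPUT"`) and to the model's answer afterwards (`source="OUTPUT"`), whichever way the model is hosted.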

Slide 16

Slide 16 text

Enterprise Protection
• Enterprise-grade security features built-in
• Complete data privacy when using AWS services
• No data sharing with model providers
• End-to-end encryption for all operations
• Access controls and governance features
• Compliance with AWS security standards

Slide 17

Slide 17 text

Critical Concerns
• Models hosted by AWS without any communication with DeepSeek servers or APIs
• No customer data used to improve base models
• Enterprise data protection capabilities
• Privacy control through AWS services

Slide 18

Slide 18 text

Model Options
• Distilled models maintain most core capabilities while reducing latency and cost
• Optimized for different computational and performance requirements
• DeepSeek-R1-Distill-Llama offered in 8B and 70B versions
• DeepSeek-R1-Distill-Qwen available in 1.5B, 7B, 14B, 32B variants (SageMaker AI only)

Slide 19

Slide 19 text

Custom Model Import implementation
• Bedrock Custom Model Import enables DeepSeek deployment
• Support for the Llama 8B and 70B distilled DeepSeek-R1 variants
• Complete code samples and step-by-step deployment guides provided for quick implementation
• Standard Bedrock security and monitoring features
• Pricing is on-demand, billed in 5-minute windows starting from the first successful invocation
• Expect cold-start latency and time to scale up or down
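The 5-minute billing window can be illustrated with a small arithmetic sketch. It assumes that a partially used window is billed as a full one and that cost scales linearly with the number of windows; both are assumptions, and `price_per_window` is a hypothetical placeholder rather than a published rate.

```python
import math

WINDOW_SECONDS = 5 * 60  # 5-minute billing window, per the slide


def billed_windows(active_seconds: float) -> int:
    """Number of 5-minute windows covering the active time (assumes rounding up)."""
    if active_seconds <= 0:
        return 0
    return math.ceil(active_seconds / WINDOW_SECONDS)


def estimated_cost(active_seconds: float, price_per_window: float) -> float:
    """Estimated charge; price_per_window is a hypothetical placeholder rate."""
    return billed_windows(active_seconds) * price_per_window


# 12 minutes of activity spans three 5-minute windows under these assumptions.
print(billed_windows(12 * 60))
```

Under these assumptions, short sporadic bursts of traffic are billed per window touched, so batching requests close together tends to cost less than spreading them out.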

Slide 20

Slide 20 text

Trn1/Trn2 instances
Powered by AWS Trainium/Trainium2 custom ML chips
• Optimized for large-scale distributed training workloads
• Trn2 UltraServers with extended NeuronLink for trillion-parameter AI
• Neuron Kernel Interface (NKI) for custom operators

Instance | Accelerators | Accelerator memory | vCPU | Instance memory | Networking
trn1.32xlarge | 16 | 512 GB | 128 | 512 GB | 800 Gbps EFAv2
trn1n.32xlarge | 16 | 512 GB | 128 | 512 GB | 1600 Gbps EFAv2
trn2.48xlarge | 16 | 1.5 TB | 192 | 2 TB | 3.2 Tbps EFAv3

Slide 21

Slide 21 text

AWS Trainium architecture
• The Tensor Engine is based on a power-optimized systolic array
• The AWS Neuron SDK supports typical architectures such as Llama

Slide 22

Slide 22 text

Summary: DeepSeek-R1 deployment options on AWS
1. Amazon Bedrock Marketplace (Amazon SageMaker JumpStart) for the DeepSeek-R1 model
2. Amazon Bedrock Custom Model Import for the DeepSeek-R1-Distill models
3. Amazon EC2 Trn1 instances for the DeepSeek-R1-Distill models

DeepSeek on AWS Blog: https://aws.amazon.com/blogs/aws/deepseek-r1-models-now-available-on-aws/

Slide 24

Slide 24 text

Further reading
• DeepSeek
  • Anthropic CEO Dario Amodei's blog
    https://darioamodei.com/on-deepseek-and-export-controls
• Startup Customer Case Studies on AWS
  • Sakana AI
    https://aws.amazon.com/startups/learn/letting-nature-lead-how-sakana-ai-is-transforming-model-building?lang=en-US
  • ELYZA (Llama2 Speculative Decoding on AWS Inferentia2 chip)
    https://aws.amazon.com/jp/blogs/startup/tech-interview-elyza-2024/
  • LLM Development on Trn1
    https://aws.amazon.com/jp/blogs/machine-learning/unlocking-japanese-llms-with-aws-trainium-innovators-showcase-from-the-aws-llm-development-support-program/