Slide 1

© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Deploy LLMs with AWS Inferentia & Ray to optimize performance and cost
CMP319-R1
Keita Watanabe, Senior Solutions Architect, AWS
Scott Perry, Senior Solutions Architect, AWS

Slide 2

What are we building today?

Slide 3

Purpose-built accelerators for generative AI
• AWS Inferentia: lowest cost per inference in the cloud for running deep learning (DL) models; up to 70% lower cost per inference than comparable Amazon EC2 instances
• AWS Inferentia2: high performance at the lowest cost per inference for LLMs and diffusion models; up to 40% better price performance than comparable Amazon EC2 instances
• AWS Trainium: the most cost-efficient, high-performance training of LLMs and diffusion models; up to 50% savings on training costs over comparable Amazon EC2 instances
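The cost claims above mix two metrics. As a rough illustration, assuming "cost per inference" means the hourly instance price divided by the inferences served per hour, and "price performance" means throughput per dollar, the sketch below converts between them. Every price and throughput in it is hypothetical, not AWS benchmark data.

```python
"""Toy cost-per-inference arithmetic; all prices and throughputs are hypothetical."""


def cost_per_million(price_per_hour: float, inferences_per_second: float) -> float:
    """Dollars per one million inferences for an instance billed by the hour."""
    inferences_per_hour = inferences_per_second * 3600
    return price_per_hour / inferences_per_hour * 1_000_000


def percent_lower(baseline: float, candidate: float) -> float:
    """How much lower `candidate` is than `baseline`, as a percentage."""
    return (baseline - candidate) / baseline * 100


def cost_cut_from_price_perf_gain(gain_percent: float) -> float:
    """Cost-per-inference reduction implied by a price-performance gain.

    Price performance is throughput per dollar, so cost per inference scales
    with 1 / price_performance: a gain g gives a cut of 1 - 1 / (1 + g).
    """
    g = gain_percent / 100
    return (1 - 1 / (1 + g)) * 100


# Hypothetical baseline: $1.00/h serving 10 inferences/s.
baseline = cost_per_million(1.00, 10)
# Hypothetical accelerator instance: $0.50/h serving 15 inferences/s.
candidate = cost_per_million(0.50, 15)

print(f"baseline:  ${baseline:.2f} per 1M inferences")
print(f"candidate: ${candidate:.2f} per 1M inferences")
print(f"cost cut:  {percent_lower(baseline, candidate):.1f}%")
print(f"40% better price performance -> "
      f"{cost_cut_from_price_perf_gain(40):.1f}% lower cost per inference")
```

With these made-up inputs the accelerator instance is about 66.7% cheaper per inference. The last line shows that, under the same definitions, a 40% price-performance gain works out to roughly 28.6% lower cost per inference, so the two kinds of percentage are not interchangeable.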

Slide 4

Llama 2
• High performance
• Open source
• Multiple sizes (7B, 13B, and 70B parameters)
• Multiple variants (pretrained Llama 2 and fine-tuned Llama 2-Chat)
Source: https://arxiv.org/pdf/2307.09288.pdf
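Model size drives how much accelerator memory a Llama 2 deployment needs. A back-of-the-envelope sketch, assuming the nominal parameter counts, 2 bytes per parameter for 16-bit (BF16/FP16) weights and 1 byte for 8-bit weights, and ignoring the KV cache, activations, and runtime overhead:

```python
"""Rough weight-memory estimate for the Llama 2 sizes (weights only)."""

# Nominal Llama 2 sizes, in parameters.
LLAMA2_SIZES = {"7B": 7_000_000_000, "13B": 13_000_000_000, "70B": 70_000_000_000}

# Assumed bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2, "int8": 1}


def weight_gb(num_params: int, dtype: str = "bf16") -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in LLAMA2_SIZES.items():
    print(f"Llama 2 {name}: ~{weight_gb(params):.0f} GB in bf16, "
          f"~{weight_gb(params, 'int8'):.0f} GB in int8")
```

This prints roughly 14, 26, and 140 GB of 16-bit weights for the 7B, 13B, and 70B models; a real deployment needs more than this once the KV cache is included.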

Slide 5

Ray support for Trainium and Inferentia (NEW)
Native support for Amazon EC2 Trn1 and Inf2 instances
• Native support for Trainium and Inferentia is available in the Ray 2.7 release
• Define how many NeuronCores the cluster provides and how many each actor or task requires
• Support for Ray Serve and Ray Train
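Because Ray schedules NeuronCores as a resource, each actor has to request some number of them. The sketch below is a hypothetical sizing helper, not part of Ray or the Neuron SDK. It assumes 16 GB of device memory per NeuronCore (an Inferentia2 chip pairs 32 GB of HBM with two NeuronCores), assumes 24 NeuronCores on the largest Inf2 node, counts weight memory only, and applies a made-up `headroom` factor for the KV cache and runtime overhead. It returns a dictionary shaped like the `resources` argument Ray uses for NeuronCores.

```python
"""Hypothetical NeuronCore sizing helper; not a Ray or Neuron SDK API."""
import math

# Assumption: 32 GB HBM per Inferentia2 chip shared by 2 NeuronCores.
BYTES_PER_CORE = 16 * 10**9
# Assumption: an inf2.48xlarge node exposes 24 NeuronCores (12 chips x 2).
MAX_CORES_PER_NODE = 24


def neuron_core_request(num_params: int, bytes_per_param: int = 2,
                        headroom: float = 1.0) -> dict:
    """Smallest NeuronCore count whose memory holds the (scaled) weights."""
    needed_bytes = num_params * bytes_per_param * headroom
    cores = math.ceil(needed_bytes / BYTES_PER_CORE)
    if cores > MAX_CORES_PER_NODE:
        raise ValueError(
            f"{cores} NeuronCores exceed one node ({MAX_CORES_PER_NODE})")
    # Same shape as a Ray resource request,
    # e.g. @ray.remote(resources={"neuron_cores": cores}).
    return {"neuron_cores": cores}


print(neuron_core_request(7_000_000_000))                 # 7B, 16-bit weights
print(neuron_core_request(70_000_000_000))                # 70B, 16-bit weights
print(neuron_core_request(70_000_000_000, headroom=1.2))  # 70B, 20% headroom
```

The returned dictionary could be passed as `@ray.remote(resources=...)` for an actor or task, or as `ray_actor_options={"resources": ...}` on a Ray Serve deployment. In practice the tensor-parallel degree used when compiling the model for Neuron usually constrains the core count further, so treat this figure as a lower bound.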

Slide 6

Architecture

Slide 7

Let’s code!

Slide 8

Thank you!
Please complete the session survey in the mobile app.