re:Invent 2023 CMP319 Deploy LLMs with AWS Inferentia & Ray to optimize performance and cost

Generative AI and large language models (LLMs) have inspired many organizations to reimagine the experiences they build for their customers. As these sophisticated LLMs are integrated into more applications, developers face the challenge of serving models in high-volume deployments while still meeting performance targets. AWS Inferentia2 is a purpose-built accelerator optimized for performance and cost, while Ray Serve reduces serving latency and is easy to use. In this code talk, learn how to deploy Llama 2 with Ray Serve on AWS Inferentia2 to achieve high performance, low latency, and cost efficiency.
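As background for the flow the talk walks through, the sketch below compiles Llama 2 for NeuronCores with the transformers-neuronx library. It is a minimal sketch, not the session's code; the checkpoint path, tp_degree, and sequence length are illustrative assumptions (depending on the transformers-neuronx version, the checkpoint may first need to be saved in its split format).

import torch
from transformers import AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling

# Shard the model across 8 NeuronCores (illustrative; match tp_degree
# to the NeuronCores available on your inf2 instance).
neuron_model = LlamaForSampling.from_pretrained(
    "./Llama-2-7b",  # assumed path to a local Llama 2 checkpoint
    batch_size=1,
    tp_degree=8,
    amp="f16",
)
neuron_model.to_neuron()  # triggers ahead-of-time compilation for Neuron

tokenizer = AutoTokenizer.from_pretrained("./Llama-2-7b")
input_ids = tokenizer("What is AWS Inferentia2?", return_tensors="pt").input_ids

# Autoregressive sampling runs on the NeuronCores
with torch.inference_mode():
    generated = neuron_model.sample(input_ids, sequence_length=256)
print(tokenizer.decode(generated[0], skip_special_tokens=True))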

Keita Watanabe

February 20, 2024

Transcript

  1. Deploy LLMs with AWS Inferentia & Ray to optimize performance and cost (CMP319-R1)
     Keita Watanabe, Senior Solutions Architect, AWS
     Scott Perry, Senior Solutions Architect, AWS
  2. What are we building today?
  3. Purpose-built accelerators for generative AI
     • AWS Inferentia: Lowest cost per inference in the cloud for running deep learning (DL) models; up to 70% lower cost per inference than comparable Amazon EC2 instances
     • AWS Inferentia2: High performance at the lowest cost per inference for LLMs and diffusion models; up to 40% better price performance than comparable Amazon EC2 instances
     • AWS Trainium: The most cost-efficient, high-performance training of LLMs and diffusion models; up to 50% savings on training costs over comparable Amazon EC2 instances
  4. Llama 2
     • High performance
     • Open source
     • Multiple sizes
     • Multiple variants
     Source: https://arxiv.org/pdf/2307.09288.pdf
  5. Ray support for Trainium and Inferentia (NEW)
     Native support for AWS EC2 Trn1 and Inf2 instances
     • Native support for Trainium and Inferentia available in the Ray 2.7 release
     • Can define the number of NeuronCores required in clusters, actors, and tasks (see the sketch after this slide)
     • Support for Ray Serve and Ray Train
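A minimal sketch of the NeuronCore scheduling described above, assuming Ray 2.7+ running on an Inf2 or Trn1 instance; the core counts are illustrative, and the trivial task and actor bodies stand in for real Neuron inference code:

import ray

# On Inf2/Trn1 instances, Ray 2.7+ auto-detects NeuronCores and exposes
# them as the "neuron_cores" resource for scheduling.
ray.init()

# Reserve 2 NeuronCores for a task
@ray.remote(resources={"neuron_cores": 2})
def run_inference(prompt: str) -> str:
    # A real task would invoke a Neuron-compiled model here
    return prompt

# Reserve 2 NeuronCores for a long-lived actor
@ray.remote(resources={"neuron_cores": 2})
class InferenceWorker:
    def generate(self, prompt: str) -> str:
        return prompt

print(ray.get(run_inference.remote("hello")))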
  6. Architecture
  7. Let's code!
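The deck does not reproduce the live-coding portion, but the session builds a Ray Serve deployment along these lines. This is a hedged sketch, not the session's actual code: the checkpoint path, tp_degree, NeuronCore count, and sequence length are illustrative assumptions.

from ray import serve
from starlette.requests import Request

@serve.deployment(ray_actor_options={"resources": {"neuron_cores": 8}})
class LlamaService:
    def __init__(self):
        from transformers import AutoTokenizer
        from transformers_neuronx.llama.model import LlamaForSampling

        # Compile the model once per replica; path and tp_degree are assumptions.
        self.tokenizer = AutoTokenizer.from_pretrained("./Llama-2-7b")
        self.model = LlamaForSampling.from_pretrained(
            "./Llama-2-7b", batch_size=1, tp_degree=8, amp="f16"
        )
        self.model.to_neuron()  # ahead-of-time compilation for NeuronCores

    async def __call__(self, request: Request) -> str:
        prompt = (await request.json())["prompt"]
        input_ids = self.tokenizer(prompt, return_tensors="pt").input_ids
        output = self.model.sample(input_ids, sequence_length=256)
        return self.tokenizer.decode(output[0], skip_special_tokens=True)

app = LlamaService.bind()
# serve.run(app)  # then POST {"prompt": "..."} to http://127.0.0.1:8000/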
  8. Thank you! Please complete the session survey in the mobile app.