Slide 1

Slide 1 text

© 2023, Amazon Web Services, Inc. or its affiliates. Generative AI on AWS Deep Dive. Sungmin Kim, Sr. AI/ML Specialist SA, AWS

Slide 2

Slide 2 text

Agenda
• Generative AI 101
• Rise of Foundation Models
• 3 Ways to use Foundation Models
  § Model Builders
  § Model Tuners
  § Model Consumers
• Customizing a Foundation Model
  § Prompt Engineering
  § Fine-tuning
  § Pre-training
• Benefits of deploying models in SageMaker
• Use cases

Slide 3

Slide 3 text

Generative AI 101

Slide 4

Slide 4 text

Traditional programming vs. machine learning. Traditional programming: a developer writes the program that turns input into output. Machine learning: a developer supplies training data to an algorithm, which produces the model that turns input into output.
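The contrast on this slide can be sketched in a few lines. This is only an illustration, not part of the deck: the temperature-conversion task, the function names, and the use of single-feature least squares are all assumptions chosen to keep the example small.

```python
# Traditional programming: the developer writes the rule by hand.
def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9.0 / 5.0 + 32.0


# Machine learning: an algorithm estimates the rule from training data.
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = w * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


# Hypothetical training data: inputs paired with the outputs we want.
training_inputs = [-10.0, 0.0, 20.0, 37.0, 100.0]
training_outputs = [celsius_to_fahrenheit(c) for c in training_inputs]

# The "model" is just the learned parameters (w, b).
w, b = fit_line(training_inputs, training_outputs)


def learned_model(celsius: float) -> float:
    return w * celsius + b
```

In the first function the developer supplies the rule; in the second, the developer supplies only examples and the algorithm recovers the parameters.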

Slide 5

Slide 5 text

Machine Learning ⊃ Deep Learning. Diagram: Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning.

Slide 6

Slide 6 text

Deep Learning - Artificial Neuron. Diagram labels: stimulus, transmission.

Slide 7

Slide 7 text

Deep Learning – computing a vast number of weights! Data → ML Algorithm → Model. The data are numbers: 0, 1, 2, ..; 0.01, 0.02, ..; 001, 010, 011, .. The model is the parameter set {w_1, w_2, …, w_n, b} of $f(x) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = W x + b$. An ML algorithm is a method for estimating the parameters (w_1, w_2, …, b) of the formula (the model) f(x) from numeric data. Diagram labels: artificial neuron, deep neural network model.
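The weighted-sum formula on this slide, f(x) = w1*x1 + … + wn*xn + b, can be written out directly. A minimal sketch in plain Python; the function names and sample numbers are illustrative assumptions, and no activation function is applied because the slide's formula has none.

```python
def neuron(weights: list[float], inputs: list[float], bias: float) -> float:
    """Compute f(x) = w1*x1 + w2*x2 + ... + wn*xn + b for one neuron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def layer(weight_rows: list[list[float]], inputs: list[float],
          biases: list[float]) -> list[float]:
    """Apply several neurons to the same inputs (the matrix form W*x + b)."""
    return [neuron(row, inputs, b) for row, b in zip(weight_rows, biases)]


# Hypothetical parameters and inputs, chosen only for illustration.
single_output = neuron([0.5, -1.0, 2.0], [2.0, 3.0, 1.0], 0.25)
layer_output = layer([[1.0, 0.0], [0.0, 1.0]], [3.0, 4.0], [0.5, -0.5])
```

Training a deep network amounts to estimating many such weight rows and biases from data.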

Slide 8

Slide 8 text

ML ⊃ Deep Learning ⊃ Generative AI. Diagram: Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning (CNN, RNN) ⊃ Gen. AI (GAN, DALL-E, Stable Diffusion).

Slide 9

Slide 9 text

Generative AI is transforming AI: image generation, transformation, upscaling. Examples: image transformation; 4x upscaling. Generated by Stable Diffusion 2.0.

Slide 10

Slide 10 text

ML ⊃ Deep Learning ⊃ Generative AI ⊃ LLM. Diagram: Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning (CNN, RNN) ⊃ Gen. AI (GAN, DALL-E, Stable Diffusion) ⊃ LLM (GPT-2, GPT-3, PaLM, LLaMA, Amazon Titan).

Slide 11

Slide 11 text

Large Language Model (LLM): text generation, text summarization, extraction, classification. Examples: Cohere’s transcript summarization API; AI21 Labs’ wordtune read; AI21 Labs’ wordtune.

Slide 12

Slide 12 text

Large Language Model (LLM): code generation. Amazon CodeWhisperer is an ML-powered service that generates code recommendations based on developer comments in natural language and existing code.

Slide 13

Slide 13 text

ML ⊃ Deep Learning ⊃ Generative AI ⊃ LLM. Diagram: Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning (CNN, RNN) ⊃ Gen. AI (GAN, DALL-E, Stable Diffusion) ⊃ LLM (GPT-2, GPT-3, PaLM, LLaMA, Amazon Titan).

Slide 14

Slide 14 text

Generative AI refers to artificial intelligence that can generate novel content
• Reduces the time and cost of developing ML models, so teams can innovate faster
• Applicable to many use cases such as text summarization, question answering, digital art creation, and code generation
• Powered by foundation models pre-trained on large sets of data
• Tasks can be customized for specific domains with minimal fine-tuning
• AI that can produce original content close enough to human-generated content for real-world tasks

Slide 15

Slide 15 text

Rise of Foundation Models

Slide 16

Slide 16 text

AI models are getting bigger … a lot bigger. Chart: parameters by year (2019, 2021, 2022; axis 1B to 6B) with data labels 330M, 175B, and 540B. 1,600x increase in size of model, as measured by number of parameters, 2019–2022.
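A quick arithmetic check of the growth figure on this slide. It assumes the 1,600x claim compares the largest data label (540B) with the smallest (330M); the slide does not state this explicitly.

```python
# Parameter counts taken from the slide's data labels.
params_smallest = 330e6   # 330M
params_middle = 175e9     # 175B
params_largest = 540e9    # 540B

# Ratio of the largest to the smallest model.
overall_growth = params_largest / params_smallest

# Ratio between the two larger models, for comparison.
late_growth = params_largest / params_middle

print(f"overall growth: about {overall_growth:,.0f}x")
print(f"175B to 540B: about {late_growth:.1f}x")
```

Under that assumption the ratio comes out near 1,636, which rounds to the 1,600x quoted on the slide.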

Slide 17

Slide 17 text

Typical ML Workflow. Stages: ML Problem Framing (Real-World Problem → Define ML Problem), Data Preparation (from Raw Data), Build, Training & Tuning, Deploy.

Slide 18

Slide 18 text

Traditional Model: a separate model per task (Model1, Model2, Model3) for summarization, text generation, and question answering.

Slide 19

Slide 19 text

Foundation Model. Diagram labels: Data, Pre-train / Fine-tune, Task-specific training, Foundation Model.

Slide 20

Slide 20 text

Traditional Model vs Foundation Model. Traditional ML models: labeled data → train → deploy, with one model per task (text generation, summarization, information extraction, Q&A, chatbot). Foundation model: unlabeled data → pretrain → one foundation model → adapt to the same tasks.

Slide 21

Slide 21 text

How does generative AI work? Text input → foundation model → output.
• Text generation model (also known as large language model): “Summarize this article …….” → [Text] “………..”
• Image generation model: “a photo of an astronaut riding a horse on mars” → [Image]
• Video: “A young couple walking in rain.” → [Video]
• Audio: “Children singing nature songs” → [Audio]
• Code generation model: “Write Python code to sort array …” → [Code]

Slide 22

Slide 22 text

Generative AI and foundation models. Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. Like all AI, generative AI is powered by ML models: very large models that are pre-trained on vast amounts of data and commonly referred to as Foundation Models (FMs). Abbreviations: NLP = Natural Language Processing; CV = Computer Vision; LLM = Large Language Model; FM = Foundation Model.

Slide 23

Slide 23 text

3 Ways to use generative AI

Slide 24

Slide 24 text

Why do we need to customize a foundation model?
• Specific task
• Closed-domain knowledge
• Current knowledge
• Improving the performance/quality
• Reducing the likelihood of hallucinations
Example: a foundation model initially pretrained on Common Crawl and Wikipedia receives the prompt (question) “Which FMs are supported by Amazon Bedrock?” and returns the response (answer) “Radio X FM & TV”.

Slide 25

Slide 25 text

Ways to use generative AI
• Model Builders: build your own foundation model from scratch
• Model Tuners: start with publicly available foundation models
• Model Consumers: use generative AI services or APIs offered by foundation model vendors

Slide 26

Slide 26 text

Ways to use generative AI
• Model Builders: build your own foundation model from scratch. Expensive, time consuming, and requires deep expertise.
• Model Tuners: start with publicly available foundation models. Substantial undifferentiated work needed to operationalize.
• Model Consumers: use generative AI services or APIs offered by foundation model vendors. No control over data or costs, and no customization support.

Slide 27

Slide 27 text

How to customize a foundation model
Prompt engineering on existing models
• Training duration (and cost): not required
• Customization: no customization on model; customizing the prompt
• Expertise needed: low
Fine tuning
• Training duration (and cost): minutes to hours
• Customization: some (specific task tuning; added domain-specific training data)
• Expertise needed: medium
Pretraining
• Training duration (and cost): days to weeks to months
• Customization: full (NN architecture and size; vocabulary size; context length; training data)
• Expertise needed: high

Slide 28

Slide 28 text

Prompt Engineering on existing models. Diagram labels: Data, Pre-train / Fine-tune, Task-specific training, Foundation Model, Prompt, Generated artifact.

Slide 29

Slide 29 text

Fine-tuning. Diagram labels: Data, Pre-train / Fine-tune, Task-specific training, Foundation Model, Prompt, Generated artifact.

Slide 30

Slide 30 text

Pre-training. Diagram labels: Data, Pre-train, Pre-train / Fine-tune, Task-specific training, Foundation Model, Prompt, Generated artifact.

Slide 31

Slide 31 text

The Generative AI Journey. Stages: prompt engineering on existing models, fine tuning, pretraining, custom foundation models. Personas: Model Builders, Model Tuners, Model Consumers.

Slide 32

Slide 32 text

How do I access foundation models on AWS?
Amazon Bedrock
• The easiest way to build and scale generative AI applications with foundation models (FMs)
• Access a foundation model directly or fine-tune it using an API
• Serverless
• Foundation model providers: Amazon, Anthropic, AI21, Stability
Amazon SageMaker JumpStart
• Machine learning (ML) hub with foundation models (public and proprietary), built-in algorithms, and prebuilt ML solutions
• Deploy an FM as a SageMaker Endpoint (hosting)
• Fine-tuning leverages SageMaker Training jobs
• Choose a SageMaker managed accelerated computing instance
Diagram: Amazon Bedrock exposes foundation models through an API layer (prompt / text embedding, fine-tune). SageMaker JumpStart (model hub, deploy, fine-tune) exposes foundation models through a SageMaker Endpoint API layer (prompt / text embeddings, fine-tune), backed by SageMaker Training and Inference on accelerated computing: Trn1(n), Inf2, P4d, P5.

Slide 33

Slide 33 text

The AWS AI/ML stack: broadest and most complete set of machine learning capabilities
AI SERVICES
• Generative AI (text/vision): Amazon Bedrock (Foundation Models, Amazon Bedrock API)
• Core: Text (Amazon Translate, Amazon Comprehend); Speech (Amazon Polly, Amazon Transcribe); Vision (Amazon Textract, Amazon Rekognition, AWS Panorama)
• Specialized: Business processes (Amazon Personalize, Amazon Forecast, Amazon Fraud Detector, Amazon Lookout for Metrics); Search (Amazon Kendra); Conversation (Amazon Lex, Amazon Transcribe Call Analytics, Contact Lens, Voice ID); Code + DevOps (Amazon CodeGuru, Amazon CodeWhisperer, Amazon DevOps Guru); Industrial (Amazon Monitron, Amazon Lookout for Equipment, Amazon Lookout for Vision); Health (Amazon HealthLake, Amazon Comprehend Medical, Amazon Transcribe Medical, Amazon Omics)
AMAZON SAGEMAKER
• Studio IDE; Canvas (no-code ML for business analysts); Studio Lab (learn ML); Ground Truth (label data); JumpStart (foundation models, solutions); Edge Manager (manage edge devices)
• Prepare data, geospatial ML, store features, build with notebooks, train models, tune parameters, deploy in production, manage and monitor
• CI/CD | Governance | Responsible ML
ML FRAMEWORKS & INFRASTRUCTURE
• PyTorch, Apache MXNet, TensorFlow
• Amazon EC2: CPUs, GPUs, AWS Trainium, AWS Inferentia, FPGA, Habana Gaudi

Slide 34

Slide 34 text

Amazon SageMaker: a fully managed service for easily building, training, and deploying machine learning models. End-to-end machine learning platform; zero setup; flexible model training; pay by the second.

Slide 35

Slide 35 text

Foundation models available through Amazon SageMaker

Slide 36

Slide 36 text

Prompt tuning/engineering

Slide 37

Slide 37 text

Prompt Engineering, a new way of using ML! Conventional ML: data + task input → model → output. Prompt engineering: prompt (input) → model → generated artifact (output).

Slide 38

Slide 38 text

Why Prompt Engineering? To ensure more accurate and useful responses. Diagram: prompt (input) → generated artifact (output).

Slide 39

Slide 39 text

Why Prompt Engineering? Before/after comparison: fine-tune a model with prompt instruction. Diagram: prompt (input) → generated artifact (output).

Slide 40

Slide 40 text

Prompt Engineering Techniques
Many advanced prompting techniques have been designed to improve performance on complex tasks:
• Zero-shot prompts
• Few-shot prompts
• Chain-of-thought (CoT) prompting

Slide 41

Slide 41 text

Zero shot learning. Diagram: prompt (input) → generated artifact (output).

Slide 42

Slide 42 text


Slide 43

Slide 43 text

Few shot learning. The prompt (input) consists of a task description, example inputs 1 to 3, and an output indicator; the model returns the generated response (output).

Slide 44

Slide 44 text

Example of few shot learning
Input (prompt)
• Task description: Movie review sentiment classifier.
• Examples:
  Review: "I loved this movie!" This review is positive.
  Review: "I am not sure, I think the movie was fine." This review is neutral.
  Review: "This movie was a waste of time and money" This review is negative.
• Output indicator: Review: "I really had fun watching this movie" This review is
Output (model): Positive
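The prompt structure on this slide (task description, examples, output indicator) can be assembled programmatically. A minimal sketch: `build_few_shot_prompt` is a hypothetical helper name, not an AWS or SageMaker API, and the resulting string would be sent to whichever model endpoint is in use.

```python
def build_few_shot_prompt(task_description: str,
                          examples: list[tuple[str, str]],
                          new_review: str) -> str:
    """Assemble task description, labeled examples, and an output indicator."""
    lines = [task_description]
    for review, label in examples:
        lines.append(f'Review: "{review}"')
        lines.append(f"This review is {label}.")
    # Output indicator: leave the final label open for the model to complete.
    lines.append(f'Review: "{new_review}"')
    lines.append("This review is")
    return "\n".join(lines)


# Examples copied from the slide.
examples = [
    ("I loved this movie!", "positive"),
    ("I am not sure, I think the movie was fine.", "neutral"),
    ("This movie was a waste of time and money", "negative"),
]

prompt = build_few_shot_prompt(
    "Movie review sentiment classifier.",
    examples,
    "I really had fun watching this movie",
)
print(prompt)
```

Ending the prompt on the unfinished sentence is what steers the model to answer with a single label.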

Slide 45

Slide 45 text

Chain of Thought Prompting. Chain-of-thought (CoT) prompting is a technique for improving the reasoning ability of large language models by prompting them to generate a series of intermediate steps that lead to the final answer of a multi-step problem. Source: https://arxiv.org/pdf/2201.11903.pdf
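A chain-of-thought prompt differs from a plain few-shot prompt in that each worked example includes its intermediate reasoning. The sketch below shows one way to build such a prompt; `build_cot_prompt` is a hypothetical helper, and the arithmetic word problems are worded in the style of the examples in the cited paper rather than taken from this deck.

```python
def build_cot_prompt(worked_examples: list[tuple[str, str, str]],
                     question: str) -> str:
    """Each worked example is (question, reasoning steps, final answer)."""
    parts = []
    for example_question, reasoning, answer in worked_examples:
        parts.append(
            f"Q: {example_question}\n"
            f"A: {reasoning} The answer is {answer}."
        )
    # The new question ends with an open "A:" so the model continues
    # with its own reasoning steps before the final answer.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


worked_examples = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11.",
        "11",
    ),
]

prompt = build_cot_prompt(
    worked_examples,
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?",
)
print(prompt)
```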

Slide 46

Slide 46 text

Multiple ways to do prompt engineering: (1) SageMaker JumpStart, (2) SageMaker Ground Truth Plus. Diagram: prompt engineering against foundation models in SageMaker JumpStart (model hub, deploy, fine-tune).

Slide 47

Slide 47 text

Try-Out Experience
• Try out the models and model prompts without running code or incurring costs
• Available for proprietary models in the top 10 of the HELM (Holistic Evaluation of Language Models) benchmarks, and for public models for comparison purposes
• This is a shared environment in a SageMaker escrow account. Model providers don’t see any of the prompts, but we recommend that you either (1) don’t use confidential or proprietary data in the try-out experience, or (2) create an endpoint to test proprietary data

Slide 48

Slide 48 text

SageMaker Ground Truth Plus: fully managed service by AWS to create high-quality training datasets. Examples: question & answer generation, image captions, text ranking, video captions.

Slide 49

Slide 49 text

SageMaker Ground Truth Plus: Question & Answer Generation

Slide 50

Slide 50 text

SageMaker Ground Truth Plus: Text Ranking

Slide 51

Slide 51 text

SageMaker Ground Truth Plus: Image Captions

Slide 52

Slide 52 text

SageMaker Ground Truth Plus: Video Captions

Slide 53

Slide 53 text

Multiple ways to do prompt engineering: (1) SageMaker JumpStart, (2) SageMaker Ground Truth Plus. Diagram: prompt engineering against foundation models in SageMaker JumpStart (model hub, deploy, fine-tune).

Slide 54

Slide 54 text

Fine-tuning foundation models

Slide 55

Slide 55 text

Foundation models available through Amazon SageMaker

Slide 56

Slide 56 text

Foundation models available through Amazon SageMaker. Proprietary models in gated preview only.

Slide 57

Slide 57 text

SageMaker JumpStart: ML hub for SageMaker customers. 400+ algorithms and pre-trained, state-of-the-art, open-source models from PyTorch Hub, TensorFlow Hub, Hugging Face, etc. Tasks and algorithms/models span vision, text, tabular, and audio.

Slide 58

Slide 58 text

Browse and search SageMaker JumpStart content. Search for topics or problem types, and get relevant results across all content. Browse by content type to explore solutions, models, example notebooks, blogs, and video tutorials.

Slide 59

Slide 59 text

Easy deploy experience
• Training instance type
• Security settings

Slide 60

Slide 60 text

Easy fine-tune experience
• Labeled data set path
• Training instance type
• Hyper-parameters & security settings

Slide 61

Slide 61 text

Demo: Text Generation. Text Generation with HuggingFace Models

Slide 62

Slide 62 text

Easy to use FMs on SageMaker JumpStart
1. Choose foundation models offered by model providers
2. Try out the model and/or deploy: try out models via the AWS Console; deploy the model for inference using SageMaker hosting options, including single node
3. Fine-tune the model and automate the ML workflow: only selected models can be fine-tuned
Data stays in your account, including model, instances, logs, model inputs, and model outputs. Fully integrated with Amazon SageMaker features.

Slide 63

Slide 63 text

SageMaker JumpStart models and features (slide group labels: proprietary models; publicly available)
• Stability AI. Models: Text2Image, Upscaling, In-painting. Tasks: generate photo-realistic images from text input; improve quality of generated images. Features: fine-tuning on SD 2.1 model.
• AlexaTM. Models: AlexaTM 20B. Tasks: machine translation, question answering, summarization, annotation, data generation.
• Co:here. Models: Cohere Command Medium (6B). Tasks: text generation, information extraction, question answering, summarization.
• LightOn. Models: Lyra-Fr (10B). Tasks: text generation, information extraction, question answering, summarization, sentiment analysis, classification.
• AI21. Models: Jurassic-1 Grande Series, Jurassic. Tasks: text generation, long-form generation, summarization, paraphrasing, chat, information extraction, question answering, classification.
• HuggingFace. Models: Flan T-5 models (8 variants); DistilGPT2, GPT2; Bloom models (3 variants). Tasks: machine translation, question answering, summarization, annotation, data generation.

Slide 64

Slide 64 text

Benefits of deploying models on Amazon SageMaker

Slide 65

Slide 65 text

Considerations when deploying models: cost saving opportunity in production (spend: 90% prediction, 10% training); separation of concerns (ML, app interface); security.

Slide 66

Slide 66 text

Benefits of deploying models on Amazon SageMaker: cost saving opportunity in production (spend: 90% prediction, 10% training); separation of concerns (ML, app interface); security.

Slide 67

Slide 67 text

Amazon SageMaker Deployment: Hosting Services. Components: Amazon S3 (training data, model artifacts), Amazon ECR (training image, inference image), Amazon SageMaker.

Slide 68

Slide 68 text

Amazon SageMaker Deployment: Hosting Services. Components: Amazon S3 (training data, model artifacts), Amazon ECR (training image, inference image), Amazon SageMaker. Added in this step: model artifacts in SageMaker hosting.

Slide 69

Slide 69 text

Amazon SageMaker Deployment: Hosting Services. Components: Amazon S3 (training data, model artifacts), Amazon ECR (training image, inference image), Amazon SageMaker. Added in this step: model artifacts and inference image in SageMaker hosting.

Slide 70

Slide 70 text

Amazon SageMaker Deployment: Hosting Services. Components: Amazon S3 (training data, model artifacts), Amazon ECR (training image, inference image), Amazon SageMaker. Added in this step: an endpoint serving the model artifacts and inference image.

Slide 71

Slide 71 text

Amazon SageMaker Deployment: SageMaker Endpoints (Private API). The client sends input data (request) to the model endpoint and receives a prediction (response). Behind the endpoint (deployment / hosting), Elastic Load Balancing distributes requests across an Auto Scaling group of Amazon SageMaker ML compute instances in Availability Zones 1, 2, and 3.
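The request/response exchange with a SageMaker endpoint can be sketched with the boto3 `sagemaker-runtime` client. Several things here are assumptions: the endpoint name and region are placeholders, and the JSON payload keys (`text_inputs`, `max_length`) follow a format used by some JumpStart text models, so check the model's own documentation for its actual schema.

```python
import json


def build_request(prompt: str, max_length: int = 50) -> bytes:
    """Serialize the prompt; the payload keys are an assumed schema."""
    payload = {"text_inputs": prompt, "max_length": max_length}
    return json.dumps(payload).encode("utf-8")


def invoke(endpoint_name: str, prompt: str, region: str = "us-east-1") -> dict:
    """Send input data (request) and return the prediction (response)."""
    import boto3  # imported here so the request builder works without AWS

    client = boto3.client("sagemaker-runtime", region_name=region)
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,       # placeholder name supplied by caller
        ContentType="application/json",
        Body=build_request(prompt),
    )
    # The response body is a stream; read and decode the JSON prediction.
    return json.loads(response["Body"].read())


request_body = build_request("Summarize this article ...", max_length=10)
```

Calling `invoke` requires AWS credentials and a deployed endpoint, so only the request builder runs at import time.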

Slide 72

Slide 72 text

Amazon SageMaker Deployment: SageMaker Endpoints (Public API). The client sends input data (request) through Amazon API Gateway to the model endpoint and receives a prediction (response). Behind the endpoint (deployment / hosting), Elastic Load Balancing distributes requests across an Auto Scaling group of Amazon SageMaker ML compute instances in Availability Zones 1, 2, and 3.

Slide 73

Slide 73 text

Security
• AWS protects the model tuner's/consumer’s data
• AWS protects the model provider’s IP
• The proprietary model package and endpoint are hosted in a SageMaker/Bedrock-owned escrow account
• Containers have no outbound network access

Slide 74

Slide 74 text

Benefits of deploying models on Amazon SageMaker: cost saving opportunity in production (spend: 90% prediction, 10% training); separation of concerns (ML, app interface); security.

Slide 75

Slide 75 text

Amazon SageMaker: deploy a model to serve inference
• Model: single model deployment, multi-model deployment
• Container: single container, multi-container, inference pipelines
• Infrastructure: CPUs, GPUs, serverless
• Invocation: real-time synchronous response (invoke / response); near real-time asynchronous response (invoke / response); offline batch inference (submit / complete)

Slide 76

Slide 76 text

Save costs by deploying on Amazon SageMaker
• Infrastructure cost: compute instances, storage, network
• Operations cost: operating, managing, and maintaining infrastructure
• Security and compliance cost: security and compliance for ML features, encrypting data and models, access policies, track and trace
The chart compares these costs for deploying on SageMaker versus self-managed deployment on Amazon EKS or Amazon ECS.

Slide 77

Slide 77 text

Benefits of deploying models on Amazon SageMaker: cost saving opportunity in production (spend: 90% prediction, 10% training); separation of concerns (ML, app interface); security.

Slide 78

Slide 78 text

Challenges with Inference: cost, latency & throughput, model size.

Slide 79

Slide 79 text

Large Model Inference (LMI) container
• Large ML models with 100 billion+ parameters
• Easily parallelize models across multiple GPUs to fit models into the instance and achieve low latency
• Deploy models on the most performant and cost-effective GPU-based instances or on AWS Inferentia
• Leverage 500GB of Amazon EBS volume per endpoint
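A back-of-the-envelope estimate shows why a model with 100 billion+ parameters has to be split across several GPUs. This sketch counts weights only; the 40 GB per-GPU memory figure and the precisions are illustrative assumptions, and a real deployment also needs room for activations and other runtime state.

```python
import math


def weight_memory_gb(num_parameters: float, bytes_per_parameter: int) -> float:
    """Memory needed just to hold the model weights, in gigabytes."""
    return num_parameters * bytes_per_parameter / 1e9


def min_gpus_for_weights(num_parameters: float, bytes_per_parameter: int,
                         gpu_memory_gb: float) -> int:
    """Smallest number of GPUs whose combined memory fits the weights."""
    needed = weight_memory_gb(num_parameters, bytes_per_parameter)
    return math.ceil(needed / gpu_memory_gb)


# Hypothetical scenario: 100B parameters on GPUs with 40 GB each.
fp16_memory = weight_memory_gb(100e9, 2)          # 2 bytes per weight
fp16_gpus = min_gpus_for_weights(100e9, 2, 40.0)
int8_gpus = min_gpus_for_weights(100e9, 1, 40.0)  # 1 byte per weight

print(f"fp16 weights: {fp16_memory:.0f} GB, at least {fp16_gpus} GPUs")
print(f"int8 weights: at least {int8_gpus} GPUs")
```

Lower precision reduces the number of devices needed, which is one reason quantization is commonly paired with model parallelism.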

Slide 80

Slide 80 text

AWS purpose-built accelerators for deep learning
• AWS Inferentia: lowest cost inference in the cloud for running deep learning models
• AWS Trainium: the most cost-efficient for high-performance training of LLMs and diffusion models
• AWS Inferentia2: high performance at the lowest cost-per-inference for LLMs and diffusion models

Slide 81

Slide 81 text

AWS purpose-built accelerators for deep learning
• AWS Inferentia (inference): lowest cost inference in the cloud for running deep learning models
• AWS Trainium (training): the most cost-efficient for high-performance training of LLMs and diffusion models
• AWS Inferentia2 (inference): high performance at the lowest cost-per-inference for LLMs and diffusion models

Slide 82

Slide 82 text

AWS Trainium: high performance, less power, lower cost. Training BERT Large with AWS Trainium, Trn1 cluster vs. GPU cluster: 2.3x faster training (time to train, hours); 47% less energy (power, kilowatts); 72% lower cost (cost to train, USD). Details: Hugging Face Bert-Large, FP32, On-Demand EC2 pricing.

Slide 83

Slide 83 text

AWS Inferentia2: high performance, less power, lower cost. Real-time deployment of BERT-Large with AWS Inferentia2, Inf2.2xl instances vs. GPU instances: 50% fewer instances (number of instances); 50% less energy (power, watts); 65% lower cost (inference cost, USD).

Slide 84

Slide 84 text

Benchmarking results for Large Model Inference on Amazon SageMaker
Bar chart: latency P90 in seconds (axis 0 to 14), by instance family (p4de, g5, p4d).
Models: BLOOMZ-176B (LMI), GPT-J-6B, OPT-30B, OPT66B, GPT-NeoX-20B, Stable Diffusion 768x768, Stable Diffusion 2 (AITemplate), Stable Diffusion 2 Depth2Img, FlanT5-xxl, BLOOM-176B - Hugging Face…
Data labels as extracted: 2.9, 3.2, 3.1, 1.3, 8.8, 6.1, 1.9, 3.1, 0.3, 12
• Input token size is 5
• Output token size is 50
• Batch size ranges from 1 to 11
• SageMaker Container: LMI
[1] BLOOM-176B was tested on p4de.24xlarge (A100) with batch size 50 and int8, and showed a throughput of 350 tokens/sec.
[2] Stable Diffusion 2.1 model is 768x768 image generation with 50 denoising steps. All other values are default.
[3] Stable Diffusion 2 Depth model using 512x512 image, DDIM scheduler with 512-base model, 62 denoising steps, strength of 0.8 (50 effective steps). All other values are default.
[4] TGI refers to the Hugging Face Text Generation Inference library.

Slide 85

Slide 85 text

Benefits of deploying models in Amazon SageMaker: cost saving opportunity in production (spend: 90% prediction, 10% training); separation of concerns (ML, app interface); security.

Slide 86

Slide 86 text

Use cases

Slide 87

Slide 87 text

Text-to-Image Generation. Source: https://aws.amazon.com/ko/blogs/tech/ai-art-stable-diffusion-sagemaker-jumpstart/

Slide 88

Slide 88 text

Retrieval Augmented Generation (RAG) with LLMs. Source: https://aws.amazon.com/blogs/machine-learning/question-answering-using-retrieval-augmented-generation-with-foundation-models-in-amazon-sagemaker-jumpstart/

Slide 89

Slide 89 text

RAG Example: Medical Chatbot on AWS
Flow (the slide numbers the steps 1 to 6):
• Patient query: “What is HDL cholesterol?”
• Search relevant information in the hospital knowledge sources
• Relevant information for enhanced context: “Put simply, HDL cholesterol can be described as good cholesterol. A low HDL cholesterol level is a cardiovascular risk factor, so the HDL cholesterol level should be raised.”
• Combine the query with the enhanced context. Query: “What is HDL cholesterol?” Enhanced Context: “Put simply, HDL cholesterol can be described as good cholesterol. A low HDL cholesterol level is a cardiovascular risk factor, so the HDL cholesterol level should be raised.”
• Call the inference API
• Generated text response: “HDL cholesterol carries the cholesterol remaining in the blood to the liver to be excreted, so it is called the ‘good cholesterol’ that cleans the blood vessels.”
Infrastructure: foundation models from SageMaker JumpStart (model hub, deploy, fine-tune), served through a SageMaker Endpoint API layer for prompts and text embeddings, with fine-tuning, on SageMaker Training and Inference with accelerated computing (Trn1, Inf2, P4d, P5).
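The flow on this slide (search relevant information, combine the query with the enhanced context, call the inference API) can be sketched without any AWS dependency. This is a toy illustration: retrieval here is simple word overlap standing in for the text-embedding search a real system would use, the knowledge-source sentences are hypothetical, and the final model call is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set used for the toy overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the documents sharing the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_tokens & tokenize(doc)),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Combine the query with the retrieved (enhanced) context."""
    context_block = "\n".join(contexts)
    return f"Enhanced Context: {context_block}\nQuery: {query}\nAnswer:"


# Hypothetical hospital knowledge sources.
knowledge_sources = [
    "HDL cholesterol is often described as good cholesterol; "
    "a low HDL level is a cardiovascular risk factor.",
    "Visiting hours are 9 am to 6 pm on weekdays.",
    "The pharmacy is located on the first floor.",
]

query = "What is HDL cholesterol?"
contexts = retrieve(query, knowledge_sources)
prompt = build_prompt(query, contexts)
print(prompt)  # this string would be sent to the model endpoint
```

Grounding the prompt in retrieved text lets the model answer from current, domain-specific knowledge without fine-tuning.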

Slide 90

Slide 90 text

Summary

Slide 91

Slide 91 text

Generative AI and foundation models. Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. Like all AI, generative AI is powered by ML models: very large models that are pre-trained on vast amounts of data and commonly referred to as Foundation Models (FMs). Abbreviations: NLP = Natural Language Processing; CV = Computer Vision; LLM = Large Language Model; FM = Foundation Model.

Slide 92

Slide 92 text

How does generative AI work? Text input → foundation model → output.
• Text generation model (also known as large language model): “Summarize this article …….” → [Text] “………..”
• Image generation model: “a photo of an astronaut riding a horse on mars” → [Image]
• Video: “A young couple walking in rain.” → [Video]
• Audio: “Children singing nature songs” → [Audio]
• Code generation model: “Write Python code to sort array …” → [Code]

Slide 93

Slide 93 text

Ways to use generative AI
• Model Consumers: generative AI solutions powered by foundation models (Amazon Bedrock, Amazon CodeWhisperer)
• Model Tuners: use Amazon SageMaker Foundation Model Hub & JumpStart with third-party state-of-the-art pre-trained foundation models (GPT-J, Bloom from HF)
• Model Builders: build your own foundation model (Amazon SageMaker)

Slide 94

Slide 94 text

The Generative AI Journey: How to customize a foundation model
Prompt engineering on existing models
• Training duration (and cost): not required
• Customization: no customization on model; customizing the prompt
• Expertise needed: low
Fine tuning
• Training duration (and cost): minutes to hours
• Customization: some (specific task tuning; added domain-specific training data)
• Expertise needed: medium
Pretraining
• Training duration (and cost): days to weeks to months
• Customization: full (NN architecture and size; vocabulary size; context length; training data)
• Expertise needed: high

Slide 95

Slide 95 text

Benefits of deploying models in Amazon SageMaker: cost saving opportunity in production (spend: 90% prediction, 10% training); separation of concerns (ML, app interface); security.

Slide 96

Slide 96 text

Generative AI customer demands: flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions.

Slide 97

Slide 97 text

Why AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions.

Slide 98

Slide 98 text

Why AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions. Flexibility: choose from a variety of FMs built by AI21 Labs, Anthropic, Stability AI, and Amazon to find the model that fits your use case.

Slide 99

Slide 99 text

Why AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions. Secure customization: tailor an FM to your business with a small number of labeled samples. All data is encrypted and never leaves your Amazon Virtual Private Cloud (VPC), so you can trust that your data stays private and confidential.

Slide 100

Slide 100 text

Why AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions. The most cost-effective infrastructure: get the best price performance for generative AI with infrastructure based on AWS-designed ML chips and NVIDIA GPUs. Scale infrastructure cost-effectively to train and run FMs with hundreds of billions of parameters.

Slide 101

Slide 101 text

Why AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions. The easiest way to build with FMs: with convenient controls and integrations, quickly integrate and deploy FMs into applications and workloads running on AWS, using a broad range of AWS capabilities and services such as Amazon SageMaker and Amazon S3.

Slide 102

Slide 102 text

Why AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions. Generative AI-powered solutions: with generative AI built in, services such as Amazon CodeWhisperer, an AI coding assistant, can improve productivity. In addition, you can deploy common generative AI use cases such as call summarization and question answering using AWS sample solutions that combine AWS AI services with leading FMs.

Slide 103

Slide 103 text

Resources
[1] SageMaker Immersion Day. https://sagemaker-immersionday.workshop.aws/
[2] Generative AI on Amazon SageMaker. https://catalog.us-east-1.prod.workshops.aws/workshops/972fd252-36e5-4eed-8608-743e84957f8e/en-US
[3] Generative AI and Data Science on AWS – Large Language Models (LLMs) – Fine-Tuning with PEFT, RLHF. https://catalog.us-east-1.prod.workshops.aws/workshops/f772b430-37d0-4adc-ba65-2f3e229caa5c
[4] Building Generative AI Applications with SageMaker Foundation Models. https://catalog.workshops.aws/building-gen-ai-apps-with-found-models
[5] GenAI Workshop - Deploying Text-to-Image Models using SageMaker JumpStart. https://catalog.us-east-1.prod.workshops.aws/workshops/7c37b5bd-40a7-40d3-beee-fb40de7d451d
[6] Generative AI Workloads on Trainium and Inferentia. https://catalog.us-east-1.prod.workshops.aws/workshops/06367dba-1077-4a51-967c-477dbbbb48b1
[7] GitHub Repository for SageMaker JumpStart Foundation Models Notebooks. https://github.com/aws/amazon-sagemaker-examples/tree/main/introduction_to_amazon_algorithms/jumpstart-foundation-models

Slide 104

Slide 104 text

Thank you!