
Generative AI on AWS - Deep Dive

- Generative AI 101
- Rise of Foundation Models
- 3 Ways to use Foundation Models
(1) Model Builders
(2) Model Tuners
(3) Model Consumers
- Customizing a Foundation Model
(1) Prompt Engineering
(2) Fine-tuning
(3) Pre-training
- Benefits of deploying models in SageMaker
- Use cases

Sungmin Kim

June 27, 2023
Transcript

  1. © 2023, Amazon Web Services, Inc. or its affiliates. ©

    2023, Amazon Web Services, Inc. or its affiliates. Generative AI on AWS Deep Dive Sungmin Kim Sr. AI/ML Specialist SA, AWS
  2. © 2023, Amazon Web Services, Inc. or its affiliates. Agenda

    • Generative AI 101 • Rise of Foundation Models • 3 Ways to use Foundation Models § Model Builders § Model Tuners § Model Consumers • Customizing a Foundation Model § Prompt Engineering § Fine-tuning § Pre-training • Benefits of deploying models in SageMaker • Use cases
  3. © 2023, Amazon Web Services, Inc. or its affiliates. ©

    2023, Amazon Web Services, Inc. or its affiliates. Generative AI 101
  4. © 2023, Amazon Web Services, Inc. or its affiliates. Traditional

    programming vs. machine learning. Traditional programming: the developer writes a program (algorithm) that turns input into output. Machine learning: the developer supplies training data (inputs and outputs), and an ML algorithm learns a model from it.
  5. © 2023, Amazon Web Services, Inc. or its affiliates. Machine

    Learning ⊃ Deep Learning Artificial Intelligence Machine Learning Deep Learning
  6. © 2023, Amazon Web Services, Inc. or its affiliates. Deep

    Learning – the artificial neuron (Neuron): how a stimulus (signal) is transmitted
  7. © 2023, Amazon Web Services, Inc. or its affiliates. Deep

    Learning – computing an enormous number of weights! Data, Model, ML Algorithm. Numbers: 0, 1, 2, …; 0.01, 0.02, …; 001, 010, 011, …. model {w1, w2, …, wn, b}; f(x) = w1*x1 + w2*x2 + … + wn*xn + b = W*x + b. An ML algorithm is a method for estimating the parameters (w1, w2, …, b) of the expression (model) f(x) from numeric data. Artificial neuron; Deep Neural Network Model
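For illustration, a minimal sketch of that single-neuron computation in NumPy; the weight, bias, and input values below are made-up examples, not values from the slide:

```python
import numpy as np

# Hypothetical parameter values; in practice an ML algorithm estimates
# w1, ..., wn and b from numeric training data.
w = np.array([0.01, 0.02, 0.03])   # weights w1, w2, w3
b = 0.1                            # bias
x = np.array([1.0, 2.0, 3.0])      # numeric input features x1, x2, x3

# f(x) = w1*x1 + w2*x2 + ... + wn*xn + b = W*x + b
y = np.dot(w, x) + b
print(y)  # ≈ 0.24
```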
  8. © 2023, Amazon Web Services, Inc. or its affiliates. Deep

    Learning Artificial Intelligence Machine Learning Deep Learning Gen. AI GAN DALL-E Stable Diffusion ML ⊃ Deep Learning ⊃ Generative AI CNN RNN
  9. © 2023, Amazon Web Services, Inc. or its affiliates. Generative

    AI is transforming AI. IMAGE GENERATION, TRANSFORMATION, UPSCALING. Image transformation; 4x upscaling. Generated by Stable Diffusion 2.0
  10. © 2023, Amazon Web Services, Inc. or its affiliates. Deep

    Learning Artificial Intelligence Machine Learning Deep Learning Gen. AI LLM GAN DALL-E Stable Diffusion GPT-2, GPT-3 PaLM LLaMA Amazon Titan ML ⊃ Deep Learning ⊃ Generative AI ⊃ LLM CNN RNN
  11. © 2023, Amazon Web Services, Inc. or its affiliates. Large

    Language Model (LLM). TEXT GENERATION, TEXT SUMMARIZATION, EXTRACTION, CLASSIFICATION. Cohere’s transcript summarization API; AI21 Labs’ wordtune read; AI21 Labs’ wordtune
  12. © 2023, Amazon Web Services, Inc. or its affiliates. Amazon

    CodeWhisperer. CODE GENERATION. ML-powered service that generates code recommendations based on developer comments in natural language and existing code. Large Language Model (LLM)
  13. © 2023, Amazon Web Services, Inc. or its affiliates. Deep

    Learning Artificial Intelligence Machine Learning Deep Learning Gen. AI LLM GAN DALL-E Stable Diffusion GPT-2, GPT-3 PaLM LLaMA Amazon Titan ML ⊃ Deep Learning ⊃ Generative AI ⊃ LLM CNN RNN
  14. © 2023, Amazon Web Services, Inc. or its affiliates. Generative

    AI refers to artificial intelligence that can generate novel content. Reduces time and cost to develop ML models and innovate faster. Applicable to many use cases like text summarization, question answering, digital art creation, code generation, etc. Powered by foundation models pre-trained on large sets of data. Tasks can be customized for specific domains with minimal fine-tuning. AI that can produce original content close enough to human-generated content for real-world tasks.
  15. © 2023, Amazon Web Services, Inc. or its affiliates. ©

    2023, Amazon Web Services, Inc. or its affiliates. Rise of Foundation Models
  16. © 2023, Amazon Web Services, Inc. or its affiliates.

    AI models are getting bigger … A LOT BIGGER: a 1,600x increase in the size of models, as measured by number of parameters, from 2019 to 2022 (chart: parameters by year, roughly 330M in 2019, 175B in 2021, and 540B in 2022).
  17. © 2023, Amazon Web Services, Inc. or its affiliates. Typical

    ML Workflow ML Problem Framing Real-World Problem Define ML Problem Data Preparation Build Training & Tuning Deploy Raw Data
  18. © 2023, Amazon Web Services, Inc. or its affiliates. Summarization

    Text generation Question Answering Model1 Model2 Model3 Traditional Model
  19. © 2023, Amazon Web Services, Inc. or its affiliates. Foundation

    Model Foundation Model Pre-train / Fine-tune Task-specific training Data
  20. © 2023, Amazon Web Services, Inc. or its affiliates. Traditional

    Model vs. Foundation Model. Foundation model: unlabeled data, pretrain once, then adapt the single model to many tasks (text generation, summarization, information extraction, Q&A, chatbot). Traditional ML models: labeled data, train and deploy a separate model for each task (text generation, summarization, information extraction, Q&A, chatbot).
  21. © 2023, Amazon Web Services, Inc. or its affiliates. How

    does generative AI work? A foundation model maps text input to output: “Summarize this article …….” – text generation model (also known as large language model) – [Text] “………..”; “a photo of an astronaut riding a horse on mars” – image generation model – [Image]; “A young couple walking in rain.” – video – [Video]; “Children singing nature songs” – audio – [Audio]; “Write Python code to sort array …” – code generation model – [Code]
  22. © 2023, Amazon Web Services, Inc. or its affiliates. Generative

    AI and foundation models Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. Like all AI, generative AI is powered by ML models—very large models that are pre-trained on vast amounts of data and commonly referred to as Foundation Models (FMs). LLM FM CV NLP NLP – Natural Language Processing CV – Computer Vision LLM – Large Language Model FM – Foundation Model
  23. © 2023, Amazon Web Services, Inc. or its affiliates. ©

    2023, Amazon Web Services, Inc. or its affiliates. 3 Ways to use generative AI
  24. © 2023, Amazon Web Services, Inc. or its affiliates. We

    need to customize a foundation model – why? • Specific tasks • Closed-domain knowledge • Current knowledge • Improving performance/quality • Reducing the likelihood of hallucinations. Example: a foundation model pretrained only on Common Crawl and Wikipedia, given the prompt (question) “Which FMs are supported by Amazon Bedrock?”, responds (answers) “Radio X FM & TV”.
  25. © 2023, Amazon Web Services, Inc. or its affiliates. Ways

    to use generative AI. Model Consumers: use generative AI services or APIs offered by foundation model vendors. Model Tuners: start with publicly available foundation models. Model Builders: build your own foundation model from scratch.
  26. © 2023, Amazon Web Services, Inc. or its affiliates. Ways

    to use generative AI. Model Consumers: use generative AI services or APIs offered by foundation model vendors – no control over data, costs, and no customization support. Model Tuners: start with publicly available foundation models – substantial undifferentiated work needed to operationalize. Model Builders: build your own foundation model from scratch – expensive, time consuming, and requires deep expertise.
  27. © 2023, Amazon Web Services, Inc. or its affiliates. How

    to customize a foundation model. Prompt engineering on existing models / Fine-tuning / Pretraining. Training duration (and cost): not required / minutes to hours / days to weeks to months. Customization: none on the model, only the prompt is customized / some – specific task tuning, added domain-specific training data / full – NN architecture and size, vocabulary size, context length, training data. Expertise needed: low / medium / high.
  28. © 2023, Amazon Web Services, Inc. or its affiliates. Prompt

    Engineering on existing models Foundation Model Pre-train / Fine-tune Task-specific training Data Prompt Generated artifact
  29. © 2023, Amazon Web Services, Inc. or its affiliates. Fine-tuning

    Foundation Model Pre-train / Fine-tune Task-specific training Data Prompt Generated artifact
  30. © 2023, Amazon Web Services, Inc. or its affiliates. Pre-training

    Foundation Model Pre-train / Fine-tune Task-specific training Data Prompt Generated artifact Pre-train
  31. © 2023, Amazon Web Services, Inc. or its affiliates. The

    Generative AI Journey Prompt engineering on existing models Fine tuning Pretraining Custom Foundation Models Model Builders Model Tuners Model Consumers
  32. © 2023, Amazon Web Services, Inc. or its affiliates. SageMaker

    Training and Inference How do I access foundation models on AWS? Amazon Bedrock • The easiest way to build and scale generative AI applications with foundation models (FMs) • Access directly or fine-tune foundation model using API • Serverless • Foundation model providers – Amazon, Anthropic, AI21, Stability Amazon SageMaker JumpStart • Machine learning (ML) hub with foundation models (public and proprietary), • Built-in algorithms, and prebuilt ML solutions • Deploy FM as SageMaker Endpoint (hosting) • Fine-tuning leverages SageMaker Training jobs • Choose SageMaker managed accelerated computing instance API Layer Amazon Bedrock Prompt / text embedding Fine-tune Foundation Models Prompt / text embeddings API Layer SageMaker Endpoint Foundation Models SageMaker Jumpstart Model hub, deploy, fine-tune Accelerated Computing Trn1(n), Inf2, P4d, P5 Fine-tune
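As a rough sketch of the SageMaker JumpStart path described above, using the SageMaker Python SDK; the model ID is just one example from the public catalog, and the payload keys depend on the specific model (check its example notebook):

```python
# Sketch: deploy a public JumpStart foundation model as a SageMaker endpoint.
# Assumes the sagemaker SDK is installed and an execution role is available.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-text2text-flan-t5-xl")  # example model ID
predictor = model.deploy()  # provisions a real-time endpoint (billed while it runs)

# Payload schema varies by model; this is the typical text2text format.
response = predictor.predict({"text_inputs": "Summarize: Amazon SageMaker is ..."})
print(response)

predictor.delete_endpoint()  # clean up to stop incurring cost
```

For the Amazon Bedrock path, by contrast, the foundation model is invoked through a serverless API rather than a self-managed endpoint.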
  33. © 2023, Amazon Web Services, Inc. or its affiliates. The

    AWS AI/ML stack. BROADEST AND MOST COMPLETE SET OF MACHINE LEARNING CAPABILITIES. ML FRAMEWORKS & INFRASTRUCTURE STUDIO IDE AMAZON SAGEMAKER CANVAS No-code ML for business analysts STUDIO LAB Learn ML GROUND TRUTH Label data Prepare data Geospatial ML Store features Build with notebooks Train models Tune parameters Manage and monitor Deploy in production PyTorch, Apache MXNet, TensorFlow Amazon EC2 CPUs GPUs AWS Trainium AWS Inferentia FPGA Habana Gaudi CORE SPECIALIZED AI SERVICES BUSINESS PROCESSES Amazon Personalize Amazon Forecast Amazon Fraud Detector Amazon Lookout for Metrics SEARCH Amazon Kendra CONVERSATION Amazon Lex Amazon Transcribe Call Analytics Contact Lens Voice ID CODE + DEVOPS Amazon CodeGuru Amazon CodeWhisperer Amazon DevOps Guru INDUSTRIAL Amazon Monitron Amazon Lookout for Equipment Amazon Lookout for Vision HEALTH Amazon HealthLake Amazon Comprehend Medical Amazon Transcribe Medical Amazon Omics TEXT Amazon Translate Amazon Comprehend SPEECH Amazon Polly Amazon Transcribe VISION Amazon Textract Amazon Rekognition AWS Panorama CI/CD | GOVERNANCE | RESPONSIBLE ML EDGE MANAGER Manage edge devices Generative AI TEXT/VISION Amazon Bedrock Foundation Models Amazon Bedrock API JUMPSTART Foundation Models Solutions
  34. © 2023, Amazon Web Services, Inc. or its affiliates. End-to-End

    Machine Learning Platform. Zero setup. Flexible model training. Pay by the second ($). Amazon SageMaker – a fully managed service for easily building, training, and deploying machine learning models.
  35. © 2023, Amazon Web Services, Inc. or its affiliates. Foundation

    models available through Amazon SageMaker
  36. © 2023, Amazon Web Services, Inc. or its affiliates. ©

    2023, Amazon Web Services, Inc. or its affiliates. Prompt tuning/engineering
  37. © 2023, Amazon Web Services, Inc. or its affiliates. Data

    + Task input Output Model Prompt Engineering, a new way of using ML! Prompt Generated artifact Input Output Model
  38. © 2023, Amazon Web Services, Inc. or its affiliates. Why

    Prompt Engineering? Prompt Generated artifact Input Output Ensure more accurate and useful responses
  39. © 2023, Amazon Web Services, Inc. or its affiliates. Why

    Prompt Engineering? Before After Fine-tune a model with prompt instruction Prompt Generated artifact Input Output
  40. © 2023, Amazon Web Services, Inc. or its affiliates. Prompt

    Engineering Techniques • Many advanced prompting techniques have been designed to improve performance on complex tasks • Zero-shot prompts • Few-shot prompts • Chain-of-thought (CoT) prompting
  41. © 2023, Amazon Web Services, Inc. or its affiliates. Zero

    shot learning Prompt Generated artifact Input Output
  42. © 2023, Amazon Web Services, Inc. or its affiliates. Few

    shot learning Prompt Task description Example Input 1 Example Input 2 Example Input 3 Output Indicator Generated response Input Output
  43. © 2023, Amazon Web Services, Inc. or its affiliates. Example

    of few shot learning Movie review sentiment classifier. Review: "I loved this movie!" This review is positive. Review: "I am not sure, I think the movie was fine." This review is neutral. Review: "This movie was a waste of time and money" This review is negative. Review: "I really had fun watching this movie" This review is Positive Input Output Model Task Description Examples Output indicator
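The few-shot prompt above is just concatenated text; a small sketch of assembling it programmatically (pure string templating, no AWS calls, reusing the same example reviews):

```python
# Build a few-shot prompt: task description, labeled examples, then the new input
# followed by an output indicator that the model is expected to complete.
def build_few_shot_prompt(task_description, examples, new_input):
    parts = [task_description, ""]
    for review, label in examples:
        parts += [f'Review: "{review}"', f"This review is {label}.", ""]
    parts += [f'Review: "{new_input}"', "This review is"]  # output indicator
    return "\n".join(parts)

examples = [
    ("I loved this movie!", "positive"),
    ("I am not sure, I think the movie was fine.", "neutral"),
    ("This movie was a waste of time and money", "negative"),
]

prompt = build_few_shot_prompt(
    "Movie review sentiment classifier.",
    examples,
    "I really had fun watching this movie",
)
print(prompt)  # send this string to any text-generation model or endpoint
```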
  44. © 2023, Amazon Web Services, Inc. or its affiliates. Chain

    of Thought Prompting • Chain-of-thought (CoT) prompting is a technique for improving the reasoning ability of large language models by prompting them to generate a series of intermediate steps that lead to the final answer of a multi-step problem. Source: https://arxiv.org/pdf/2201.11903.pdf
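For reference, the style of chain-of-thought prompt described above, adapted from the example in the cited paper (arXiv:2201.11903); the exemplar includes intermediate reasoning so the model imitates it:

```python
# A chain-of-thought prompt: the worked example spells out intermediate steps,
# nudging the model to reason step by step before giving its final answer.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more,
how many apples do they have?
A:"""

# Sent to an LLM, the expected completion works through the steps
# (23 - 20 = 3, then 3 + 6 = 9) before stating the final answer, 9.
print(cot_prompt)
```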
  45. © 2023, Amazon Web Services, Inc. or its affiliates. Multiple

    ways to do prompt engineering Prompt Engineering Foundation Models SageMaker Jumpstart Model hub, deploy, fine-tune 1 SageMaker JumpStart 2 SageMaker Ground Truth Plus
  46. © 2023, Amazon Web Services, Inc. or its affiliates. Try-Out

    Experience • Try out the models and model prompts without running code or incurring costs • Available for proprietary models in the Top 10 of the HELM (Holistic Evaluation of Language Models) benchmarks, and for public models for comparison purposes • This is a shared environment in a SageMaker escrow account; model providers don’t see any of the prompts, but we recommend 1/ not using confidential or proprietary data in the try-out experience, or 2/ creating an endpoint to test proprietary data
  47. © 2023, Amazon Web Services, Inc. or its affiliates. SageMaker

    Ground Truth Plus Fully managed service by AWS to create high-quality training datasets Question & Answer Generation Image Captions Text Ranking Video Captions
  48. © 2023, Amazon Web Services, Inc. or its affiliates. SageMaker

    Ground Truth Plus Question & Answer Generation
  49. © 2023, Amazon Web Services, Inc. or its affiliates. Multiple

    ways to do prompt engineering Prompt Engineering Foundation Models SageMaker Jumpstart Model hub, deploy, fine-tune 1 SageMaker JumpStart 2 SageMaker Ground Truth Plus
  50. © 2023, Amazon Web Services, Inc. or its affiliates. ©

    2023, Amazon Web Services, Inc. or its affiliates. Fine-tuning foundation models
  51. © 2023, Amazon Web Services, Inc. or its affiliates. Foundation

    models available through Amazon SageMaker
  52. © 2023, Amazon Web Services, Inc. or its affiliates. Proprietary

    Models in Gated Preview only Foundation models available through Amazon SageMaker
  53. © 2023, Amazon Web Services, Inc. or its affiliates. Tasks

    and algorithms/models for vision, text, tabular, and audio. SageMaker JumpStart: ML hub for SageMaker customers. 400+ algorithms and pre-trained, state-of-the-art, open-source models from PyTorch Hub, TensorFlow Hub, Hugging Face, and more.
  54. © 2023, Amazon Web Services, Inc. or its affiliates. Browse

    and search SageMaker JumpStart content Search for topics or problem types, and get relevant results across all content Browse by content type to explore solutions, models, example notebooks, blogs, and video tutorials
  55. © 2023, Amazon Web Services, Inc. or its affiliates. Easy

    deploy experience • Training instance type • Security Settings
  56. © 2023, Amazon Web Services, Inc. or its affiliates. Easy

    fine-tune experience • Labeled data set path • Training instance type • Hyper-parameters & Security settings
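The same three inputs (labeled dataset path, training instance type, hyperparameters) can be supplied from code; a minimal sketch with the SageMaker Python SDK, where the model ID, S3 path, instance type, and hyperparameter values are placeholders, and only selected JumpStart models support fine-tuning:

```python
from sagemaker.jumpstart.estimator import JumpStartEstimator

# Placeholder model ID; check JumpStart for models that support fine-tuning.
estimator = JumpStartEstimator(
    model_id="huggingface-text2text-flan-t5-xl",
    instance_type="ml.g5.2xlarge",          # training instance type (assumption)
    hyperparameters={"epochs": "3"},        # merged with the model's defaults
)

# Labeled dataset in your S3 bucket; the channel name and data format
# depend on the chosen model (see its fine-tuning example notebook).
estimator.fit({"training": "s3://my-bucket/my-labeled-data/"})

# Deploy the fine-tuned model as a SageMaker endpoint for inference.
predictor = estimator.deploy()
```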
  57. © 2023, Amazon Web Services, Inc. or its affiliates. Demo

    Text Generation Text Generation with HuggingFace Models
  58. © 2023, Amazon Web Services, Inc. or its affiliates. Try

    out models via the AWS Console. Easy to use FMs on SageMaker JumpStart: 1/ choose foundation models offered by model providers; 2/ try out the model and/or deploy it for inference using SageMaker hosting options (including single-node); 3/ fine-tune the model (only selected models can be fine-tuned) and automate the ML workflow. Data stays in your account, including the model, instances, logs, model inputs, and model outputs. Fully integrated with Amazon SageMaker features.
  59. © 2023, Amazon Web Services, Inc. or its affiliates. Stability

    AI – Models: Text2Image, Upscaling, In-painting. Tasks: generate photo-realistic images from text input; improve the quality of generated images. Features: fine-tuning on the SD 2.1 model. AlexaTM – Models: AlexaTM 20B. Tasks: machine translation, question answering, summarization, annotation, data generation. Co:here – Models: Cohere Command Medium (6B). Tasks: text generation, information extraction, question answering, summarization. LightOn – Models: Lyra-Fr (10B). Tasks: text generation, information extraction, question answering, summarization, sentiment analysis, classification. AI21 – Models: Jurassic-1 Grande Series, Jurassic. Tasks: text generation, long-form generation, summarization, paraphrasing, chat, information extraction, question answering, classification. (These are proprietary models.) HuggingFace – Models: Flan T-5 models (8 variants), DistilGPT2, GPT2, Bloom models (3 variants). Tasks: machine translation, question answering, summarization, annotation, data generation. (Publicly available.) SageMaker JumpStart models and features
  60. © 2023, Amazon Web Services, Inc. or its affiliates. ©

    2023, Amazon Web Services, Inc. or its affiliates. Benefits of deploying models on Amazon SageMaker
  61. © 2023, Amazon Web Services, Inc. or its affiliates. Considerations

    when deploying models: cost saving opportunity in production (spend: 90% prediction, 10% training), separation of concerns (ML / app interface), and security.
  62. © 2023, Amazon Web Services, Inc. or its affiliates. Benefits

    of deploying models on Amazon SageMaker: cost saving opportunity in production (spend: 90% prediction, 10% training), separation of concerns (ML / app interface), and security.
  63. © 2023, Amazon Web Services, Inc. or its affiliates. Amazon

    SageMaker Deployment Hosting Services Inference Image Training Image Training Data Model artifacts Amazon SageMaker Amazon S3 Amazon ECR
  64. © 2023, Amazon Web Services, Inc. or its affiliates. Amazon

    SageMaker Deployment Hosting Services Inference Image Training Image Training Data Model artifacts Amazon SageMaker Amazon S3 Amazon ECR Model artifacts
  65. © 2023, Amazon Web Services, Inc. or its affiliates. Amazon

    SageMaker Deployment Hosting Services Inference Image Training Image Training Data Model artifacts Amazon SageMaker Amazon S3 Amazon ECR Model artifacts Inference Image
  66. © 2023, Amazon Web Services, Inc. or its affiliates. Amazon

    SageMaker Deployment Hosting Services Inference Image Training Image Training Data Model artifacts Endpoint Amazon SageMaker Amazon S3 Amazon ECR Model artifacts Inference Image
  67. © 2023, Amazon Web Services, Inc. or its affiliates. Amazon

    SageMaker Deployment SageMaker Endpoints (Private API) Auto Scaling group Availability Zone 1 Availability Zone 2 Availability Zone 3 Elastic Load Balancing Model Endpoint Client Deployment / Hosting Amazon SageMaker ML Compute Instances Input Data (Request) Prediction (Response)
  68. © 2023, Amazon Web Services, Inc. or its affiliates. Amazon

    SageMaker Deployment SageMaker Endpoints (Public API) Auto Scaling group Availability Zone 1 Availability Zone 2 Availability Zone 3 Elastic Load Balancing Model Endpoint Amazon API Gateway Client Deployment / Hosting Amazon SageMaker ML Compute Instances Input Data (Request) Prediction (Response)
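In both the private and the public (API Gateway-fronted) patterns above, the last hop is a call to the SageMaker runtime; a minimal sketch with boto3, where the endpoint name and the JSON payload schema are assumptions that depend on the deployed model:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")  # the client behind the "Invoke" arrow

response = runtime.invoke_endpoint(
    EndpointName="my-fm-endpoint",            # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": "a photo of an astronaut riding a horse on mars"}),
)

prediction = json.loads(response["Body"].read())
print(prediction)
```

In the public-API pattern, this same call is typically made by a backend (for example, an AWS Lambda function) behind Amazon API Gateway rather than directly by the client.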
  69. • AWS protects model tuner/consumer’s data • AWS protects model

    provider’s IP • The proprietary model package and endpoint are hosted in a SageMaker/Bedrock-owned escrow account • Containers have no outbound network access Security
  70. © 2023, Amazon Web Services, Inc. or its affiliates. Benefits

    of deploying models on Amazon SageMaker: cost saving opportunity in production (spend: 90% prediction, 10% training), separation of concerns (ML / app interface), and security.
  71. © 2023, Amazon Web Services, Inc. or its affiliates. Single

    model deployment or multi-model deployment; single container, multi-container, or inference pipelines; deploy a model to serve inference with a real-time synchronous response, a serverless endpoint, a near-real-time asynchronous response (invoke/response), or offline batch inference (submit/complete), on GPUs or CPUs. Amazon SageMaker manages the model, container, and infrastructure. (A serverless example is sketched below.)
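As one illustration of the options listed above, a hedged sketch of a serverless deployment with the SageMaker Python SDK; the model ID, memory size, and concurrency values are assumptions, and serverless inference runs on CPU only, so large FMs usually use real-time or asynchronous endpoints on GPU instances instead:

```python
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.serverless import ServerlessInferenceConfig

# Placeholder model ID; serverless endpoints are CPU-only, scale to zero, and
# bill per request duration, so they suit smaller models with spiky traffic.
model = JumpStartModel(model_id="huggingface-text2text-flan-t5-small")

predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=6144,   # 1024-6144 MB
        max_concurrency=5,
    )
)

print(predictor.endpoint_name)
```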
  72. © 2023, Amazon Web Services, Inc. or its affiliates. Save

    costs by deploying on Amazon SageMaker. Cost categories: infrastructure cost (compute instances, storage, network), operations cost (operating, managing, and maintaining infrastructure), and security and compliance cost (security and compliance for ML features, encrypting data and models, access policies, track and trace). Comparison: deploy on SageMaker vs. self-managed deployment on Amazon EKS or Amazon ECS.
  73. © 2023, Amazon Web Services, Inc. or its affiliates. Benefits

    of deploying models on Amazon SageMaker: cost saving opportunity in production (spend: 90% prediction, 10% training), separation of concerns (ML / app interface), and security.
  74. © 2023, Amazon Web Services, Inc. or its affiliates. Cost

    Challenges with Inference Latency & Throughput Model Size
  75. © 2023, Amazon Web Services, Inc. or its affiliates. Large

    Model Inference (LMI) container Large ML models with 100 billion + parameters Easily parallelize models across multiple GPUs to fit models into the instance and achieve low latency Deploy models on the most performant and cost- effective GPU-based instances or on AWS Inferentia Leverage 500GB of Amazon EBS volume per endpoint
  76. © 2023, Amazon Web Services, Inc. or its affiliates. AWS

    purpose built accelerators for deep learning AWS Inferentia Lowest cost inference in the cloud for running deep learning models AWS Trainium The most cost-efficient for high performance training of LLMs and diffusion models AWS Inferentia2 High-performance at the lowest cost-per-inference for LLMs and diffusion models
  77. © 2023, Amazon Web Services, Inc. or its affiliates. AWS

    purpose built accelerators for deep learning AWS Inferentia Lowest cost inference in the cloud for running deep learning models AWS Trainium The most cost-efficient for high performance training of LLMs and diffusion models AWS Inferentia2 High-performance at the lowest cost-per-inference for LLMs and diffusion models Inference Training
  78. © 2023, Amazon Web Services, Inc. or its affiliates. AWS

    Trainium: High performance, less power, lower cost. TRAINING BERT-LARGE WITH AWS TRAINIUM. Details: Hugging Face BERT-Large, FP32, On-Demand EC2 pricing. 2.3x faster training (time to train in hours, GPU cluster vs. Trn1 cluster); 47% less energy (power in kilowatts, GPU cluster vs. Trn1 cluster); 72% lower cost (cost to train in USD, GPU cluster vs. Trn1 cluster).
  79. © 2023, Amazon Web Services, Inc. or its affiliates. AWS

    Inferentia2: High performance, less power, lower cost. REAL-TIME DEPLOYMENT: BERT-LARGE WITH AWS INFERENTIA2. 50% fewer instances (number of instances, GPU instances vs. Inf2.2xl instances); 50% less energy (power in watts, GPU instances vs. Inf2.2xl); 65% lower cost (inference cost in USD, GPU instances vs. Inf2.2xl).
  80. © 2023, Amazon Web Services, Inc. or its affiliates.

    Benchmarking results for Large Model Inference on Amazon SageMaker (chart: P90 latency in seconds for BLOOMZ-176B (LMI), GPT-J-6B, OPT-30B, OPT-66B, GPT-NeoX-20B, Stable Diffusion 768x768, Stable Diffusion 2 (AITemplate), Stable Diffusion 2 Depth2Img, FlanT5-XXL, and BLOOM-176B with Hugging Face TGI, on p4de, g5, and p4d instances). Test setup: input token size 5, output token size 50, batch size ranging from 1 to 11, SageMaker LMI container. [1] BLOOM-176B was tested on p4de.24xlarge (A100) with batch size 50 and int8 and showed throughput of 350 tokens/sec. [2] The Stable Diffusion 2.1 model is 768x768 image generation with 50 denoising steps; all other values are default. [3] The Stable Diffusion 2 Depth model uses a 512x512 image, DDIM scheduler with the 512-base model, 62 denoising steps, and a strength of 0.8 (50 effective steps); all other values are default. [4] TGI refers to the Hugging Face Text Generation Inference library.
  81. © 2023, Amazon Web Services, Inc. or its affiliates. Benefits

    of deploying models in Amazon SageMaker: cost saving opportunity in production (spend: 90% prediction, 10% training), separation of concerns (ML / app interface), and security.
  82. © 2023, Amazon Web Services, Inc. or its affiliates. ©

    2023, Amazon Web Services, Inc. or its affiliates. Use cases
  83. © 2023, Amazon Web Services, Inc. or its affiliates. Text-to-Image

    Generation Source: https://aws.amazon.com/ko/blogs/tech/ai-art-stable-diffusion-sagemaker-jumpstart/
  84. © 2023, Amazon Web Services, Inc. or its affiliates. Retrieval

    Augmented Generation (RAG) with LLMs Source: https://aws.amazon.com/blogs/machine-learning/question-answering-using-retrieval-augmented-generation-with-foundation-models-in-amazon-sagemaker-jumpstart/
  85. © 2023, Amazon Web Services, Inc. or its affiliates. RAG

    Example: Medical Chatbot on AWS. A patient asks: “What is HDL cholesterol?” The query is used to search relevant information in the hospital knowledge sources; the retrieved passage provides enhanced context: “Simply put, HDL cholesterol can be called the ‘good’ cholesterol. A low HDL cholesterol level is a cardiovascular risk factor, so the HDL level should be raised.” The query is combined with this enhanced context and sent as a prompt / text embeddings through the API layer to a SageMaker endpoint, which calls the inference API of the foundation model; the generated text response is: “HDL cholesterol carries the cholesterol remaining in the blood to the liver to be excreted, so it is called the ‘good cholesterol’ that cleans the blood vessels.” Building blocks: SageMaker JumpStart (model hub, deploy, fine-tune), SageMaker training and inference, foundation models, and accelerated computing (Trn1, Inf2, P4d, P5).
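A compact sketch of the retrieval-augmented flow shown above, with a hypothetical endpoint name and a stubbed knowledge-base lookup standing in for the hospital knowledge sources; payload keys vary by model:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def invoke(endpoint_name, payload):
    """Call a SageMaker endpoint with a JSON payload (schema varies by model)."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())

def search_knowledge_base(query, top_k=1):
    """Hypothetical retrieval step: in practice, embed the query (e.g., with an
    embedding endpoint) and search a vector index built from the hospital
    knowledge sources; here a canned passage is returned for illustration."""
    return [
        "Simply put, HDL cholesterol can be called the 'good' cholesterol. "
        "A low HDL level is a cardiovascular risk factor, so it should be raised."
    ][:top_k]

query = "What is HDL cholesterol?"
context = "\n".join(search_knowledge_base(query))

# Combine the query with the enhanced context and call the LLM endpoint.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
)
answer = invoke("my-llm-endpoint", {"text_inputs": prompt})  # placeholder endpoint
print(answer)
```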
  86. © 2023, Amazon Web Services, Inc. or its affiliates. ©

    2023, Amazon Web Services, Inc. or its affiliates. Summary
  87. © 2023, Amazon Web Services, Inc. or its affiliates. Generative

    AI and foundation models Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. Like all AI, generative AI is powered by ML models—very large models that are pre-trained on vast amounts of data and commonly referred to as Foundation Models (FMs). LLM FM CV NLP NLP – Natural Language Processing CV – Computer Vision LLM – Large Language Model FM – Foundation Model
  88. © 2023, Amazon Web Services, Inc. or its affiliates. How

    does generative AI work? A foundation model maps text input to output: “Summarize this article …….” – text generation model (also known as large language model) – [Text] “………..”; “a photo of an astronaut riding a horse on mars” – image generation model – [Image]; “A young couple walking in rain.” – video – [Video]; “Children singing nature songs” – audio – [Audio]; “Write Python code to sort array …” – code generation model – [Code]
  89. © 2023, Amazon Web Services, Inc. or its affiliates. Ways

    to use generative AI. Model Builders: build your own foundation model (Amazon SageMaker). Model Tuners: use the Amazon SageMaker Foundation Model Hub & JumpStart with third-party state-of-the-art pre-trained foundation models (GPT-J, Bloom from HF). Model Consumers: generative AI solutions powered by foundation models (Amazon Bedrock, Amazon CodeWhisperer).
  90. © 2023, Amazon Web Services, Inc. or its affiliates. Prompt

    engineering on existing models / Fine-tuning / Pretraining. Training duration (and cost): not required / minutes to hours / days to weeks to months. Customization: none on the model, only the prompt is customized / some – specific task tuning, added domain-specific training data / full – NN architecture and size, vocabulary size, context length, training data. Expertise needed: low / medium / high. The Generative AI Journey: how to customize a foundation model.
  91. © 2023, Amazon Web Services, Inc. or its affiliates. Benefits

    of deploying models in Amazon SageMaker: cost saving opportunity in production (spend: 90% prediction, 10% training), separation of concerns (ML / app interface), and security.
  92. © 2023, Amazon Web Services, Inc. or its affiliates. Generative

    AI customers’ demands: flexibility, the most cost-effective infrastructure, secure customization, the easiest way to build with FMs, and generative AI-powered solutions.
  93. © 2023, Amazon Web Services, Inc. or its affiliates. Why

    AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions.
  94. © 2023, Amazon Web Services, Inc. or its affiliates. Why

    AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions. Choose from a wide range of FMs built by AI21 Labs, Anthropic, Stability AI, and Amazon to find the model that best fits your use case.
  95. © 2023, Amazon Web Services, Inc. or its affiliates. Why

    AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions. Customize FMs for your business with only a small number of labeled examples. All data is encrypted and never leaves your Amazon Virtual Private Cloud (VPC), so you can trust that your data remains private and confidential.
  96. © 2023, Amazon Web Services, Inc. or its affiliates. Why

    AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions. Get the best price performance for generative AI with infrastructure built on AWS-designed ML chips and NVIDIA GPUs. Scale the infrastructure cost-effectively to train and run FMs with hundreds of billions of parameters.
  97. © 2023, Amazon Web Services, Inc. or its affiliates. Why

    AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions. With convenient controls and integrations, use the broad set of AWS capabilities and services, such as Amazon SageMaker and Amazon S3, to quickly integrate and deploy FMs into applications and workloads running on AWS.
  98. © 2023, Amazon Web Services, Inc. or its affiliates. Why

    AWS for generative AI? Flexibility; the most cost-effective infrastructure; secure customization; the easiest way to build with FMs; generative AI-powered solutions. Because generative AI comes built in, services such as Amazon CodeWhisperer, an AI coding companion, help boost productivity. In addition, you can deploy common generative AI use cases such as call summarization and question answering using AWS sample solutions that combine AWS AI services with leading FMs.
  99. © 2023, Amazon Web Services, Inc. or its affiliates. Resources

    [1] SageMaker Immersion Day https://sagemaker-immersionday.workshop.aws/ [2] Generative AI on Amazon SageMaker https://catalog.us-east-1.prod.workshops.aws/workshops/972fd252-36e5-4eed-8608-743e84957f8e/en-US [3] Generative AI and Data Science on AWS – Large Language Models (LLMs) – Fine-Tuning with PEFT, RLHF https://catalog.us-east-1.prod.workshops.aws/workshops/f772b430-37d0-4adc-ba65-2f3e229caa5c [4] Building Generative AI Applications with SageMaker Foundation Models https://catalog.workshops.aws/building-gen-ai-apps-with-found-models [5] GenAI Workshop - Deploying Text-to-Image Models using SageMaker JumpStart https://catalog.us-east-1.prod.workshops.aws/workshops/7c37b5bd-40a7-40d3-beee-fb40de7d451d [6] Generative AI Workloads on Trainium and Inferentia https://catalog.us-east-1.prod.workshops.aws/workshops/06367dba-1077-4a51-967c-477dbbbb48b1 [7] Github Repository for SageMaker JumpStart Foundation Models Notebooks https://github.com/aws/amazon-sagemaker-examples/tree/main/introduction_to_amazon_algorithms/jumpstart-foundation-models
  100. © 2023, Amazon Web Services, Inc. or its affiliates. Thank

    you! © 2023, Amazon Web Services, Inc. or its affiliates.