AWS_UG_BCN_Eduardo_Ordax Talk

© 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Eduardo Ordax
Principal Go-to-Market, Generative AI
Amazon Web Services
https://www.linkedin.com/in/eordax/

• >$200bn market expected for Generative AI by 2030 [3]
• 425% increase in VC investment in Generative AI since 2020 [1]
• ~80% of current AI research is focused on Generative AI [2]
• 7% global GDP growth expected due to Generative AI [4]
• Economic impact: 18% of all jobs could be automated on an employment-weighted basis [5]
• 3x average spend on LLMs (2023 vs 2024)

[Chart: Hype over time, marked "We are here"]

[Chart: Capability over time, marked "We are here"]

[Chart: Adoption over time, marked "We are here"]

We are at the very early days of Generative AI
[Chart categories: Experimenting, Internal piloting, External piloting, Production, None of the above]

What is MLOps?
MLOps: the combination of people, processes, and technology to productionize ML solutions efficiently.

MLOps vs LLMOps Differentiators
• MLOps (people, process, technology): productionize ML solutions efficiently.
• LLMOps: productionize Large Language Model-based solutions.

MLOps vs LLMOps Differentiators
• LLMOps Processes & People: providers, fine-tuners, and consumers
• Foundation Model Customization: prompt engineering, RAG, fine-tuning
• Proprietary vs open source
• Evaluation & Monitoring: human feedback, toxicity, bias…
• Model Deployment: API endpoints vs self-deployment
• Technology: adapted to new processes

How to Interact with GenAI

Generative AI Lifecycle

Generative AI User Types & Skills
• Providers: entities who build foundation models from scratch and provide them as a product to tuners and consumers.
  Skills: deep end-to-end ML, NLP, and data science expertise; a labeler "squad".
• Fine-Tuners: fine-tune foundation models from providers to fit custom requirements, and orchestrate the deployment of the model as a service for use by consumers.
  Skills: strong end-to-end ML expertise and knowledge of model deployment and inference; strong domain knowledge for tuning, including prompt engineering.
• Consumers: interact with Generative AI services from a provider or tuner, by text prompting or a visual interface, to complete desired actions.
  Skills: no ML expertise required. Mostly application developers or end-users with an understanding of the service capabilities. Only prompt engineering is required for better results.
[Diagram labels: MLOps, LLMOps]

MLOps People & Processes
• Areas: Experimentation, Data, ML Governance, Platform Administration, Model Build, Model Test, Model Deployment, ML Production Environment
• Personas: Business Stakeholders, Product Owner, Lead Data Scientists, Data Scientists, ML Engineers, MLOps Engineers, Data Engineers, Data Owners, System Administrators, Security, Model Approvers, Auditors & Compliance, ML Consumers
• Processes:
  • Provision infrastructure
  • Provide user access
  • Provide data access
  • Ingest data
  • Prepare, combine and catalogue data
  • Visualize data
  • Prove that ML can solve a business problem, i.e. PoC
  • Automate model build/training providing scaled data
  • Automate model testing and guardrails
  • Serving and monitoring the model testing
  • Centralized model and code artifact storage/versioning/auditing

LLMOps People & Processes
Extends the MLOps areas, personas, and processes listed above with:
• Area: Application
• Personas: Generative AI Developers (AppDev), Prompt Engineers / Testers, Generative AI End-users, Fine-Tuners, Data Labelers / Editors
• Interfaces: Web UI (for end-users and for data labelers / editors)

The Journey of

The Journey of Consumers
• Model Selection
• Model Evaluation
• API
• Fine Tuning
• RAG (Basic & Advanced)
• Prompt Engineering
• Integrate into App
• Monitor Application
• Execute Actions
• Model Deployment (MLOps is required)

Select Foundation Model - Consumers
Step 1. Understand top proprietary and open source FM capabilities

Step 1: Understand Top FM Capabilities

Select Foundation Model - Consumers
Step 1. Understand top proprietary and open source FM capabilities
  Quick shortlisting: use a small set of test prompts based on the task
Step 2. Test & evaluate the top selected FMs (e.g. top 3)
  Use case-based benchmarking: evaluate the models based on predefined prompts and outputs (prompt catalog)
Step 3. Select the best FM based on your priorities
  Priority-based decision: select based on business priorities (cost, latency, precision)
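
The use case-based benchmarking in Step 2 can be sketched roughly as follows. This is a minimal illustration, not the method used in the talk: the prompt catalog contents, the `token_overlap` scoring function, and the stand-in models `fm_a` and `fm_b` are all hypothetical. In practice each model callable would wrap a real FM endpoint and the score would come from a proper metric or human review.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt catalog: (prompt, expected output) pairs for one use case.
PROMPT_CATALOG: List[Tuple[str, str]] = [
    ("Summarize: The cat sat on the mat.", "A cat sat on a mat."),
    ("Translate to Spanish: good morning", "buenos dias"),
]


def token_overlap(expected: str, actual: str) -> float:
    """Illustrative score: fraction of expected tokens found in the output."""
    expected_tokens = set(expected.lower().split())
    actual_tokens = set(actual.lower().split())
    if not expected_tokens:
        return 0.0
    return len(expected_tokens & actual_tokens) / len(expected_tokens)


def benchmark(
    models: Dict[str, Callable[[str], str]],
    catalog: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Run every catalog prompt through every model and average the scores."""
    results = {}
    for name, generate in models.items():
        total = sum(
            token_overlap(expected, generate(prompt))
            for prompt, expected in catalog
        )
        results[name] = total / len(catalog)
    return results


# Stand-in models (assumptions); real ones would call an FM API.
def fm_a(prompt: str) -> str:
    return "A cat sat on a mat." if prompt.startswith("Summarize") else "buenos dias"


def fm_b(prompt: str) -> str:
    return "a cat" if prompt.startswith("Summarize") else "hola"


scores = benchmark({"fm_a": fm_a, "fm_b": fm_b}, PROMPT_CATALOG)
shortlist = sorted(scores, key=scores.get, reverse=True)
```

Keeping the catalog as plain data makes it easy to rerun the same prompts whenever a new candidate model appears.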

The broadest choice of Foundation Models
• Amazon (text summarization, generation, Q&A, search, image generation): Titan Text Premier, Titan Text Lite, Titan Text Express, Titan Text Embeddings, Titan Text Embeddings V2, Titan Multimodal Embeddings, Titan Image Generator
• Anthropic (summarization, complex reasoning, writing, coding): Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku, Claude 2.1, Claude 2, Claude Instant
• Meta (Q&A and reading comprehension): Llama 3.1 8B, Llama 3.1 70B, Llama 3.1 405B, Llama 2 13B, Llama 2 70B
• Cohere (text generation, search, classification): Command, Command Light, Embed English, Embed Multilingual, Command R+, Command R
• Stability AI (high-quality images and art): Stable Diffusion XL 1.0, Stable Diffusion XL 0.8
• AI21 Labs (contextual answers, summarization, paraphrasing): Jurassic-2 Ultra, Jurassic-2 Mid, Jamba Instruct
• Mistral AI (text summarization, Q&A, text classification, text completion, code generation): Mistral Large, Mistral Small, Mistral 7B, Mixtral 8x7B, Mixtral 8x22B

Step 2: Evaluate Top FMs

Step 3: Select the best FM based on priorities
Trade-off dimensions: Speed, Precision, Cost
• High speed, smaller model, lower precision, smaller cost
• Lower speed, larger model, higher precision, larger cost
Priorities: P0: lower cost; P1: precision; no priority
Model  Evaluation score  HIL/LLM feedback    Cost  Speed
FM1    5/5               <Feedback summary>  $$$$  ⚡⚡
FM2    4/5               <Feedback summary>  $     ⚡
FM3    3/5               <Feedback summary>  $$$   ⚡
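
One way to turn the priority-based decision into something repeatable is a weighted score over the three dimensions. The sketch below is an assumption-laden illustration: the numeric encoding of the slide's symbols (counting `$` and `⚡`), the normalization, and both weight sets are hypothetical choices, not values from the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    name: str
    eval_score: int   # evaluation score out of 5
    cost_level: int   # number of "$" symbols, 1 (cheapest) to 4
    speed_level: int  # number of "⚡" symbols, 1 to 2


# Values transcribed from the slide's example table.
CANDIDATES: List[Candidate] = [
    Candidate("FM1", eval_score=5, cost_level=4, speed_level=2),
    Candidate("FM2", eval_score=4, cost_level=1, speed_level=1),
    Candidate("FM3", eval_score=3, cost_level=3, speed_level=1),
]

# Hypothetical weightings for two business priorities.
COST_FIRST: Dict[str, float] = {"precision": 0.2, "cost": 0.6, "speed": 0.2}
PRECISION_FIRST: Dict[str, float] = {"precision": 0.6, "cost": 0.2, "speed": 0.2}


def weighted_score(candidate: Candidate, weights: Dict[str, float]) -> float:
    """Normalize each dimension to 0..1 (higher is better) and combine."""
    precision = candidate.eval_score / 5
    cheapness = 1 - (candidate.cost_level - 1) / 3  # cheapest model scores 1.0
    speed = candidate.speed_level / 2
    return (
        weights["precision"] * precision
        + weights["cost"] * cheapness
        + weights["speed"] * speed
    )


def select_model(candidates: List[Candidate], weights: Dict[str, float]) -> str:
    """Return the name of the candidate with the highest weighted score."""
    return max(candidates, key=lambda c: weighted_score(c, weights)).name


cost_first_choice = select_model(CANDIDATES, COST_FIRST)
precision_first_choice = select_model(CANDIDATES, PRECISION_FIRST)
```

With these particular weights the cost-first priority picks FM2 and the precision-first priority picks FM1, which matches the intuition on the slide that the best model depends on what the business values most.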

Amazon Bedrock Model Evaluation
• Playground: evaluate directly in the playground as you try out different models; compare across cost, latency, and usage dimensions.
  When: use the playground as you narrow down on the use case and identify the FM.
• Programmatic: evaluate as part of your application development lifecycle or model customization. Metrics include accuracy, toxicity, and robustness.
  When: use programmatic evaluation as you iterate on the use case or the model.
• Human-in-the-loop, bring your own team: use the framework offered to organize your team for your evaluation. Better suited for subjective criteria such as brand voice, clarity, and tone.
  When: bring your own team as you start testing your first prototype or get ready for pilot.
• Human-in-the-loop, AWS managed team: leverage AWS expertise for an expert evaluation, in case you don’t have the budget or expertise in-house.
  When: use an AWS managed team as you get ready for the production launch of your application.

Amazon Bedrock Consumption Options
• On-demand
  • Pay-as-you-go, no commitment
  • Pricing based on input and output token count for LLMs
  • Great for prototyping, POCs, and small workloads with more relaxed requirements for throughput and latency
• Provisioned throughput
  • Reserve throughput (input/output tokens per minute) at a fixed cost
  • Flexible commitment term of 1 month or 6 months
  • Provision sufficient throughput to meet your application’s performance requirements
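
Token-based pricing can be estimated with simple arithmetic before committing to either option. The sketch below is illustrative only: the per-1,000-token prices, the request volume, and the fixed provisioned cost are placeholder assumptions, not actual Amazon Bedrock prices.

```python
def on_demand_cost(
    input_tokens: int,
    output_tokens: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
) -> float:
    """Cost of one request when billed per input and output token."""
    return (
        input_tokens / 1000 * price_in_per_1k
        + output_tokens / 1000 * price_out_per_1k
    )


def cheaper_option(on_demand_monthly: float, provisioned_monthly: float) -> str:
    """Compare a monthly on-demand estimate with a fixed provisioned cost."""
    return "on-demand" if on_demand_monthly <= provisioned_monthly else "provisioned"


# Placeholder numbers (assumptions), chosen only to make the arithmetic visible.
PRICE_IN_PER_1K = 0.003    # USD per 1,000 input tokens
PRICE_OUT_PER_1K = 0.015   # USD per 1,000 output tokens
MONTHLY_REQUESTS = 100_000
PROVISIONED_MONTHLY = 2000.0  # USD, hypothetical fixed commitment

per_request = on_demand_cost(1_000, 500, PRICE_IN_PER_1K, PRICE_OUT_PER_1K)
monthly_on_demand = MONTHLY_REQUESTS * per_request
recommendation = cheaper_option(monthly_on_demand, PROVISIONED_MONTHLY)
```

A comparison like this only covers cost. Throughput and latency requirements, which the slide lists as the main reason to provision, still need to be checked separately.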

Foundation Model Customization
Prompt Engineering vs RAG vs Fine Tuning

Prompt Engineering Lifecycle
• Create prompt: build from scratch or start from a template; modify for the task
• Link prompts: a single use case may need several FMs coming together
• Iterate & refine: try out variations of the prompt; try out models
• Test prompt: set up scripts to test with user inputs; track results
• Archive prompt: snapshot the prompt; compare with production
• Deploy prompt: need the ability to roll back and roll forward
• Monitor prompt: track performance and improve for the use case

Amazon Bedrock Prompt Engineering
1. Prompt templates: pre-defined for common use cases; by domain
2. Prompt flows: plumbs the output of one prompt into the input of the next; visual designer
3. Prompt variations, migration: variations for easy iteration; migrate models
4. Prompt testing: built-in interface to test and inspect; visualize flow
5. Prompt versioning: snapshot the prompt; A/B testing
6. Prompt deployment: alias as a pointer to a version; perform CI/CD
7. Prompt monitoring: track performance after deployment; dashboard
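
The versioning and deployment ideas above (snapshot the prompt, use an alias as a pointer to a version, roll back and forward) can be sketched with a small in-memory registry. `PromptRegistry` and its methods are hypothetical names invented for this illustration; this is not the Amazon Bedrock API.

```python
from typing import Dict, List


class PromptRegistry:
    """Hypothetical in-memory store of immutable prompt snapshots."""

    def __init__(self) -> None:
        self._versions: List[str] = []      # index + 1 is the version number
        self._aliases: Dict[str, int] = {}  # alias name -> version number

    def snapshot(self, template: str) -> int:
        """Store a new immutable version and return its version number."""
        self._versions.append(template)
        return len(self._versions)

    def point_alias(self, alias: str, version: int) -> None:
        """Deploy, roll back, or roll forward by moving the alias pointer."""
        if not 1 <= version <= len(self._versions):
            raise ValueError(f"unknown prompt version: {version}")
        self._aliases[alias] = version

    def render(self, alias: str, **variables: str) -> str:
        """Fill the template that the alias currently points to."""
        template = self._versions[self._aliases[alias] - 1]
        return template.format(**variables)


registry = PromptRegistry()
v1 = registry.snapshot("Summarize the following text:\n{text}")
v2 = registry.snapshot("Summarize the following text in one sentence:\n{text}")

registry.point_alias("prod", v2)  # deploy the newer prompt
registry.point_alias("prod", v1)  # roll back without touching application code
rendered = registry.render("prod", text="LLMOps extends MLOps.")
```

Because the application only ever references the alias, swapping versions is a pointer change rather than a code change, which is what makes rollback cheap.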

RAG in action
• Indexing: Data Source → Splitting into chunks → Generating Embeddings → Vector DB
• Querying: User Query → Generating Embeddings → Retrieval (Vector DB) → Augment User Query → Respond to the user
1. Index your data sources into embeddings
2. Set up your LLM & Retriever chain
3. Use your RAG-powered chatbot!
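
The indexing and querying flow above can be sketched end to end with only the standard library. This is a toy illustration under stated assumptions: `embed` uses bag-of-words counts in place of a real embedding model, `VectorStore` is a hypothetical in-memory stand-in for a vector database, and the final call to an LLM is left out.

```python
import math
from collections import Counter
from typing import List, Tuple


def split_into_chunks(text: str, chunk_size: int = 8) -> List[str]:
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: List[Tuple[Counter, str]] = []

    def index(self, chunks: List[str]) -> None:
        for chunk in chunks:
            self.items.append((embed(chunk), chunk))

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        query_vec = embed(query)
        ranked = sorted(
            self.items, key=lambda item: cosine(query_vec, item[0]), reverse=True
        )
        return [chunk for _, chunk in ranked[:k]]


def augment(query: str, context: List[str]) -> str:
    """Build the augmented prompt that would be sent to the LLM."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


document = (
    "Amazon Bedrock offers foundation models through an API. "
    "Knowledge bases connect models to company data sources."
)
store = VectorStore()
store.index(split_into_chunks(document))

query = "How do knowledge bases connect to data"
retrieved = store.retrieve(query, k=1)
prompt = augment(query, retrieved)
```

Swapping `embed` for a real embedding model and `VectorStore` for a managed vector database keeps the same three-step shape: index, retrieve, augment.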


Amazon Bedrock Knowledge Bases
• Securely connect FMs to data sources for RAG to deliver more relevant responses
• Fully managed RAG workflow including ingestion, retrieval, and augmentation
• Built-in session context management for multi-turn conversations
• Automatic citations with retrievals to improve transparency
[Diagram labels: User query, Knowledge Bases for Amazon Bedrock, Augmented prompt, Model (Anthropic Claude, Meta Llama, Amazon Titan Text, AI21 Labs Jurassic-2), Answer; steps numbered 1 to 6]

Amazon Bedrock Model Customization
• Purpose: maintaining model accuracy for your domain
  Data need: large number of unlabeled datasets
• Purpose: maximizing accuracy for specific tasks
  Data need: small number of labeled examples

Foundation Model Customization
Prompt Engineering vs RAG vs Fine Tuning

The Journey of Consumers
• Model Selection
• Model Evaluation
• API
• Fine Tuning
• RAG (Basic & Advanced)
• Prompt Engineering
• Integrate into App
• Monitor Application
• Execute Actions

Monitoring & Observability
• Monitoring Prompts: standalone metrics such as readability can be informative. Use LLMs to check for toxicity. Embedding distances from reference prompts are useful metrics.
• Functional Monitoring: number of requests, latency, token usage, costs, and error rates.
• Monitoring Responses: check responses against what you expect: relevance, hallucinations, sentiment, toxicity, or harmful content. Prompt leakage can be detected by monitoring responses and comparing them to the database of prompt instructions.
• Alerting and Thresholds: automatically block the response if the input prompt triggers an alert. The same applies to screening responses for PII leakage, toxicity, and other quality metrics.
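
The functional monitoring and alerting items above can be sketched as follows. Everything here is an illustrative assumption: the `RequestLog` fields, the threshold values, and the email regular expression used as a crude stand-in for PII screening. A production setup would rely on a dedicated monitoring stack and proper PII detection.

```python
import re
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RequestLog:
    latency_ms: float
    input_tokens: int
    output_tokens: int
    error: bool


def functional_metrics(logs: List[RequestLog]) -> Dict[str, float]:
    """Aggregate request count, latency, token usage, and error rate.

    Assumes at least one log entry.
    """
    return {
        "requests": len(logs),
        "avg_latency_ms": mean(log.latency_ms for log in logs),
        "total_tokens": sum(log.input_tokens + log.output_tokens for log in logs),
        "error_rate": sum(log.error for log in logs) / len(logs),
    }


# Hypothetical alert thresholds.
THRESHOLDS: Dict[str, float] = {"avg_latency_ms": 2000.0, "error_rate": 0.05}


def triggered_alerts(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the names of metrics that exceed their threshold."""
    return [name for name, limit in thresholds.items() if metrics[name] > limit]


# Crude illustration of response screening: block anything that looks like an email.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def should_block(response: str) -> bool:
    return bool(EMAIL_PATTERN.search(response))


logs = [
    RequestLog(latency_ms=800, input_tokens=100, output_tokens=50, error=False),
    RequestLog(latency_ms=1200, input_tokens=200, output_tokens=100, error=False),
    RequestLog(latency_ms=4500, input_tokens=150, output_tokens=0, error=True),
    RequestLog(latency_ms=900, input_tokens=120, output_tokens=60, error=False),
]
metrics = functional_metrics(logs)
alerts = triggered_alerts(metrics, THRESHOLDS)
```

Quality checks such as toxicity, relevance, or hallucination detection would plug in next to `should_block`, typically by calling another model rather than a regular expression.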

The Journey of Providers
• Data Ingestion
• Data Cleaning & Preparation
• Feature Engineering
• Distributed Training
• Model Evaluation
• Model Deployment
• Monitoring & Feedback

The Journey of Providers
• Data Labeling
• Data Cleaning & Preparation
• Feature Engineering
• Fine Tuning (training job)
• Model Evaluation
• Model Deployment
• Monitoring & Feedback

• Amazon Q: For Business, For Developers, Amazon Q in Amazon QuickSight, Amazon Q in Amazon Connect, Supply Chain
• Amazon Bedrock: Guardrails, Agents, Customization capabilities, Prompt engineering, Model evaluation
• Infrastructure: GPUs, Inferentia, Trainium, Amazon SageMaker, EC2 Capacity Blocks, Neuron, UltraClusters, EFA, Nitro
• Audiences: Producers & Tuners, Consumers, End-User

What does the future hold for Generative AI?
• Regulations
• Model Routing
• Multi-modality
• Agentic AI
• AI Embedded everywhere
• Small Language Models
• Open Source
• Next Gen Frontier Models

Beyond the Hype
• Focus on what's possible today, not what's yet to come tomorrow
• LLMs are just a small component of your prod environment
• No single LLM can do everything; flexibility is key
• We're still in the early days; invest in your team first
• Less "prompting" and more "engineering"
