Slide 1

AWS Unicorn Day
SEOUL | MARCH 20, 2024
© 2024, Amazon Web Services, Inc. or its affiliates.

Slide 2

RAG Architecture: From Concept to Implementation
Sungmin Kim, Sr. AI/ML Specialist SA, AWS

Slide 3

Agenda
• Generative AI on AWS
• RAG Concepts
• Vector Embedding for Semantic Search
• Vector Databases in AWS
• Knowledge Bases for Amazon Bedrock

Slide 4

Introduction to generative AI
Image generation: “A golden retriever wearing glasses and a hat in a portrait painting”
Chat:
Question: What is generative AI?
Generative AI is a branch of AI that focuses on creating new data. It is a subset of machine learning. The goal of generative AI is to create new data that is similar to the data that was used to train the model.

Slide 5

How does generative AI work?
A foundation model maps text input to output:
• Text generation model (also known as a large language model): “Summarize this article …….” → [Text]
• Image generation model: “a photo of an astronaut riding a horse on mars” → [Image]
• Video: “A young couple walking in rain.” → [Video]
• Audio: “Children singing nature songs” → [Audio]
• Code generation model: “Write Python code to sort array …” → [Code]

Slide 6

Foundation Models

Slide 7

Generative AI is powered by Foundation Models (FMs)
• Traditional ML: train separate models on labeled data and deploy one model per task (text generation, summarization, information extraction, Q&A, chatbot).
• Foundation model: pretrain a single model on unlabeled data, then adapt it to many tasks (text generation, summarization, information extraction, Q&A, chatbot).

Slide 8

Generative AI and Foundation Models
Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. Like all AI, generative AI is powered by ML models: very large models that are pre-trained on vast amounts of data and commonly referred to as Foundation Models (FMs).
FM – Foundation Model
LLM – Large Language Model
NLP – Natural Language Processing
CV – Computer Vision

Slide 9

How do I access Foundation Models on AWS?
• Amazon SageMaker JumpStart: model hub to deploy and fine-tune; prompts / text embeddings go through an API layer to a SageMaker endpoint hosting the foundation model; training and inference run on accelerated computing (Trn1(n), Inf2, P4d, P5).
• Amazon Bedrock: prompts / text embeddings and fine-tuning go through the Bedrock API layer directly to the foundation models.

Slide 10

SageMaker JumpStart

Slide 11

SageMaker JumpStart: ML hub for SageMaker customers
400+ algorithms and pre-trained, state-of-the-art, open-source models from PyTorch Hub, TensorFlow Hub, Hugging Face, and more.
• Vision (image classification, image embedding, object detection, semantic segmentation): ResNet, Inception, MobileNet, SSD, Faster RCNN, YOLO, and more
• Text (sentence classification, text classification, question answering, summarization, text generation, translation, named-entity recognition): FLAN T5 XL, AlexaTM, Stable Diffusion, BERT, RoBERTa, DistilBERT, Distilbart xsum, GPT2, ELECTRA, and more
• Tabular (classification, regression, time-series): LightGBM, CatBoost, AutoGluon, TabTransformer, XGBoost, DeepAR
• Audio (audio embedding): TRILL, TRILLsson, TRILL-Distilled, FRILL

Slide 12

Browse and search SageMaker JumpStart content
• Search for topics or problem types, and get relevant results across all content
• Browse by content type to explore solutions, models, example notebooks, blogs, and video tutorials

Slide 13

Easy deploy experience
• Inference instance type
• Security settings

Slide 14

Easy fine-tune experience
• Labeled data set path
• Training instance type
• Hyperparameters and security settings

Slide 15

Three ways to use foundation models through SageMaker JumpStart
• SageMaker Studio (one-click deploy)
• SageMaker notebooks
• SageMaker console (gated preview)

Slide 16

Amazon Bedrock
Single API to build with generative AI

Slide 17

Amazon Bedrock supports leading foundation models
• Amazon Titan: summarization, image and text generation and search, Q&A
• Command & Embed (Cohere): text generation, search, classification
• Jurassic-2 (AI21 Labs): contextual answers, summarization, paraphrasing
• Claude 3, Claude 2.1 & Claude Instant (Anthropic): multimodal reasoning, vision capabilities, translation, summarization, writing, coding
• Stable Diffusion XL 1.0 (Stability AI): high-quality images and art
• Llama 2 (Meta): dialogue use cases and language tasks
• Mistral 7B & Mixtral 8x7B (Mistral AI): text summarization, Q&A, text classification, text completion, code generation

Slide 18

Single API to build with generative AI
• Pass the model ID, type of content, and body of the request
• Body includes the prompt and execution parameters
• Returns model response and metadata
• Handles text-to-text, text-to-image, image-to-image, and more
• Supports current and future Amazon Titan models, third-party models, and even fine-tuned models

Slide 19

Bedrock core API: InvokeModel
Works with Amazon Titan models, third-party models, and fine-tuned models.

bedrock.invoke_model(
    modelId=model_id,
    contentType="…",
    accept="…",
    body=body)

Slide 20

Bedrock core API: Streaming responses
• Users can start reading the response as soon as the first chunk is available
• For more information, see the Amazon Bedrock Developer Guide

response = bedrock.invoke_model_with_response_stream(
    modelId=model_id,
    body=body)

stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            print(json.loads(chunk.get('bytes').decode()))

Slide 21

invoke_model() – Anthropic Claude
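A minimal sketch of such a call with boto3's bedrock-runtime client, assuming the legacy Claude text-completions request format (prompt / max_tokens_to_sample); the model ID and prompt text are illustrative:

import json
import boto3

# Invoke an Anthropic Claude model through the Bedrock runtime API.
bedrock = boto3.client("bedrock-runtime")

body = json.dumps({
    "prompt": "\n\nHuman: What is Retrieval-Augmented Generation?\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 0.5,
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-instant-v1",
    contentType="application/json",
    accept="application/json",
    body=body,
)

# The legacy Claude response body carries the generated text under "completion".
completion = json.loads(response["body"].read())["completion"]
print(completion)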

Slide 22

invoke_model() – Titan Embeddings
“This is just some input I want to get embeddings for” → [0.752, 0.271, -0.325, 0.195, 0.837, -0.945] (an n-dimensional vector)
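A minimal sketch of the embeddings call, assuming the Amazon Titan text embeddings model ID and boto3's bedrock-runtime client:

import json
import boto3

# Generate a text embedding with the Amazon Titan Embeddings model.
bedrock = boto3.client("bedrock-runtime")

body = json.dumps({"inputText": "This is just some input I want to get embeddings for"})

response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    contentType="application/json",
    accept="application/json",
    body=body,
)

# The response body returns the vector under "embedding" (a list of floats).
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding), embedding[:6])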

Slide 23

Integrated with LangChain
• New LLM and embeddings classes for Amazon Bedrock
• Includes code for using the LLM class in a conversation chain
• Includes code for creating an embedding from text

pip install langchain

from langchain.llms import Bedrock
from langchain.embeddings import BedrockEmbeddings

llm = Bedrock()
print(llm("what is generative AI?"))
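A minimal sketch expanding on the bullets above, assuming the legacy (pre-0.1) LangChain module layout; the model IDs are illustrative:

import boto3
from langchain.llms import Bedrock
from langchain.embeddings import BedrockEmbeddings
from langchain.chains import ConversationChain

bedrock_client = boto3.client("bedrock-runtime")

# Use the Bedrock LLM class inside a conversation chain.
llm = Bedrock(model_id="anthropic.claude-instant-v1", client=bedrock_client)
conversation = ConversationChain(llm=llm)
print(conversation.predict(input="What is generative AI?"))

# Create an embedding from text with the Bedrock embeddings class.
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock_client)
vector = embeddings.embed_query("I am at AWS Unicorn Day 2024")
print(len(vector))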

Slide 24


Slide 25

Two ways to access Foundation Models on AWS
Amazon SageMaker JumpStart
• Machine learning (ML) hub with foundation models (public and proprietary), built-in algorithms, and prebuilt ML solutions
• Deploy an FM as a SageMaker endpoint (hosting)
• Fine-tuning leverages SageMaker training jobs
• Choose a SageMaker managed accelerated computing instance (Trn1(n), Inf2, P4d, P5)
Amazon Bedrock
• The easiest way to build and scale generative AI applications with foundation models (FMs)
• Access directly or fine-tune a foundation model using the API
• Serverless
• Foundation model providers: Amazon, Anthropic, AI21, Stability, and many more

Slide 26

We need to customize a Foundation Model. Why?
• Hallucination: Q: “Tell me about the incident, recorded in the Annals of the Joseon Dynasty, in which King Sejong threw a MacBook Pro.” A: “The incident of King Sejong throwing a MacBook Pro is recorded in the Annals of the Joseon Dynasty. It took place in 1443 (the 25th year of Sejong's reign) at a banquet held in the palace garden with the scholars of the Hall of Worthies. In the middle of the banquet, Sejong suddenly threw the MacBook Pro to the floor.”
• Current knowledge: Q: “Tell me the closing price of Amazon stock on March 20, 2024.” A: “I'm sorry, but my information is current only as of March 2022, so I cannot provide the latest stock prices. For up-to-date information, please check a real-time stock quote site or financial news.”
• Closed-domain knowledge: Q: “I'm an Amazon employee; show me the manual for responding to customer inquiries.” A: “Unfortunately, I cannot provide the manual directly. If you work at Amazon, you can get accurate information and guidance from internal training materials, your team leader, or a training coordinator.”

Slide 27

We need to customize a Foundation Model. Why?
• Specific task
• Closed-domain knowledge
• Current knowledge
• Improving performance/quality
• Reducing the likelihood of hallucinations
Example: a foundation model whose initial pretraining covered Common Crawl and Wikipedia, asked “Which FMs are supported by Amazon Bedrock?”, may respond with “Radio X FM & TV”.

Slide 28

Prompt Engineering, a new way of using ML
Input (prompt): task description, example inputs 1-3, output indicator → LLM → Output: generated response

Slide 29

Prompt Engineering, a new way of using ML
Prompt (model input):
Task description: Movie review sentiment classifier.
Examples:
Review: "I loved this movie!" This review is positive.
Review: "I am not sure, I think the movie was fine." This review is neutral.
Review: "This movie was a waste of time and money" This review is negative.
Output indicator:
Review: "I really had fun watching this movie" This review is
Model output: Positive
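A minimal sketch that assembles the few-shot prompt above as a plain string; how the string is then sent to a model (for example via invoke_model) follows the earlier examples and is not shown here:

# Build the few-shot prompt: task description, examples, output indicator.
examples = [
    ('"I loved this movie!"', "positive"),
    ('"I am not sure, I think the movie was fine."', "neutral"),
    ('"This movie was a waste of time and money"', "negative"),
]

prompt = "Movie review sentiment classifier.\n\n"            # task description
for review, label in examples:                               # examples
    prompt += f"Review: {review}\nThis review is {label}.\n\n"
prompt += 'Review: "I really had fun watching this movie"\nThis review is'  # output indicator

print(prompt)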

Slide 30

How to do prompt augmentation?
Add relevant context to the prompt before sending it to the model, for example:
• “The MacBook Pro is Apple's Macintosh notebook line, first unveiled at Macworld Expo on January 10, 2006.”
• “Sejong was the fourth king of Joseon. He reigned from 1418 to 1450; he was invested as crown prince in June 1418 and ascended the throne in August when Taejong abdicated in his favor.”
Prompt + context → generated response

Slide 31

Retrieval-Augmented Generation (RAG) with LLMs
1. Query
2. Search relevant information in the knowledge sources
3. Relevant information returned for enhanced context
4. Prompt + query + enhanced context sent to the LLM endpoint
5. Generated text response
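A minimal sketch of the five steps above as application code; search_knowledge_sources and invoke_llm are hypothetical helpers standing in for whatever retriever and model endpoint you use:

def answer_with_rag(query: str) -> str:
    # 1-2. Search relevant information in the knowledge sources
    relevant_docs = search_knowledge_sources(query, top_k=3)

    # 3. Build the enhanced context from the retrieved snippets
    context = "\n\n".join(doc["text"] for doc in relevant_docs)

    # 4. Send prompt + query + enhanced context to the LLM endpoint
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

    # 5. Return the generated text response
    return invoke_llm(prompt)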

Slide 32

How does RAG work?
1. User query: “What is HDL cholesterol?”
2. Search relevant information in the medical knowledge sources
3. Relevant information for enhanced context: “Simply put, HDL cholesterol is the good cholesterol. A low HDL cholesterol level is a cardiovascular risk factor, so the level should be raised.”
4. Prompt + query + enhanced context sent through the API layer to the foundation model
5. Generated text response: “HDL cholesterol carries the cholesterol remaining in the blood to the liver to be excreted, so it is called the ‘good cholesterol’ that cleans the blood vessels.”

Slide 33

Search Relevant Information for Enhanced Context

Slide 34

Improving search results with semantic search
Query: “bright color golf shoes”
• Without semantic search: keyword matches such as Golf Shoes Bag, Golf Shoe Case, Stretcher for Golf Shoes, Plastic Golf Tees
• With semantic search: actual golf shoes in bright colors

Slide 35

How to capture semantic relationships between words?
• Male-Female: King → Queen, Man → Woman
• Verb tense: walking → walked, swimming → swam
• Country-Capital: Spain → Madrid, Korea → Seoul, Japan → Tokyo, China → Beijing, Russia → Moscow, U.S.A. → Washington, D.C.

Slide 36

Words as vectors
“I am at AWS Unicorn Day 2024” → [0.743, 0.720, -0.325, 0.195, 0.835, -0.945] (an n-dimensional vector)

Slide 37

What is a vector embedding?
• A numerical representation of words or sentences, used in NLP
• With embeddings, NLP models can easily perform tasks such as querying, classification, and applying machine learning algorithms on textual data
“I am at AWS Unicorn Day 2024” → [0.743, 0.720, -0.325, 0.195, 0.835, -0.945], an n-dimensional vector with magnitude and direction

Slide 38

Vector similarity search
• Similar vectors = similar meaning and context
• Deliver similarity search results based on vector proximity by comparing vector distances
Example: Amazon Music recommendations pair your favorite song with a song that has a high chance of being liked.

Slide 39

Similarity metrics for vector search
• Euclidean (L2): useful for counts/measurements, recommendation systems
• Dot product: useful for collaborative filtering
• Cosine similarity: useful for semantic search and document classification
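A minimal sketch of the three metrics with NumPy; the vectors are arbitrary example values:

import numpy as np

a = np.array([0.743, 0.720, -0.325])
b = np.array([0.752, 0.271, -0.325])

euclidean = np.linalg.norm(a - b)                       # L2 distance (smaller = closer)
dot = np.dot(a, b)                                      # dot product (larger = more similar)
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine similarity in [-1, 1]

print(euclidean, dot, cosine)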

Slide 40

Generative AI models represent unstructured data as vector embeddings
Unstructured data (text, image, audio) → embedding model → vector embeddings

Slide 41

Vector embedding for semantic search (vector search)
Unstructured data (text, image, audio) → embedding model → vector embeddings → search space
Semantic search ≈ vector search

Slide 42

Vector database as a semantic search engine
Unstructured data (text, image, audio) → embedding model → vector embeddings → vector database (the search space)

Slide 43

RAG using a vector database
Data ingestion workflow plus the text generation workflow: 1. query → 2. search relevant information in the knowledge sources → 3. relevant information for enhanced context → 4. prompt + query + enhanced context → LLM endpoint → 5. generated text response

Slide 44

RAG using a vector database: data ingestion workflow

Slide 45

invoke_model() – Titan Embeddings (Amazon Bedrock)
“This is just some input I want to get embeddings for” → [0.752, 0.271, -0.325, 0.195, 0.837, -0.945] (an n-dimensional vector)

Slide 46

Document chunking strategies for RAG: data ingestion workflow
Document → document chunks → embeddings model → vector embeddings → vector store

Slide 47

Document chunking strategies for RAG
Strategies illustrated: sentence splitting, page splitting, and page splitting with overlap between adjacent chunks.
Sample document used for each strategy: “Generative AI (GenAI) and large language models (LLMs), which will be available through Amazon Bedrock and Amazon Titan, are changing how developers and enterprises solve traditionally complex problems in natural language processing and understanding. Among the benefits LLMs offer are more powerful and engaging conversational AI experiences in customer service applications and improved employee productivity through more intuitive and accurate responses. For these use cases, however, it is important that GenAI applications implementing the conversational experience meet two key criteria: restrict responses to company data to mitigate model hallucinations (incorrect statements), and filter responses according to the end user's content access permissions.”
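A minimal sketch of one such strategy, fixed-size chunking with overlap; the chunk sizes and the sample document variable are illustrative assumptions:

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    # Split the text into word windows of chunk_size, sliding by chunk_size - overlap.
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks

document = "Generative AI and large language models are changing how ..."  # full document text
for i, chunk in enumerate(chunk_text(document, chunk_size=50, overlap=5)):
    print(i, len(chunk.split()))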

Slide 48

Vector databases for RAG in AWS (vector search)
• Amazon OpenSearch Service
• Amazon OpenSearch Serverless
• Amazon DocumentDB (with MongoDB compatibility)
• Amazon Aurora PostgreSQL
• Amazon RDS for PostgreSQL
• Amazon MemoryDB for Redis

Slide 49

Knowledge databases for RAG in AWS
• Full-text search: Amazon Kendra, Amazon OpenSearch Service
• Vector search: Amazon OpenSearch Service, Amazon OpenSearch Serverless, Amazon DocumentDB (with MongoDB compatibility), Amazon Aurora PostgreSQL, Amazon RDS for PostgreSQL, Amazon MemoryDB for Redis
• Hybrid search: Amazon OpenSearch Service

Slide 50

RAG Implementation Patterns

Slide 51

Retrieval-Augmented Generation (RAG) with LLMs
1. Query
2. Search relevant information in the knowledge sources
3. Relevant information returned for enhanced context
4. Prompt + query + enhanced context sent to the LLM endpoint
5. Generated text response

Slide 52

Knowledge databases for RAG in AWS
• Full-text search: Amazon Kendra, Amazon OpenSearch Service
• Vector search: Amazon OpenSearch Service, Amazon OpenSearch Serverless, Amazon DocumentDB (with MongoDB compatibility), Amazon Aurora PostgreSQL, Amazon RDS for PostgreSQL, Amazon MemoryDB for Redis
• Hybrid search: Amazon OpenSearch Service

Slide 53

Full-text search – Amazon Kendra
1. User sends prompt + query to the generative AI application
2. Query sent to Amazon Kendra, which indexes the knowledge sources
3. Relevant information returned for enhanced context
4. Prompt + query + enhanced context sent to Amazon Bedrock (text generation)
5. Generated text response returned to the user

Slide 54

Full-text search – Amazon OpenSearch
1. User sends prompt + query to the generative AI application
2. Query sent to Amazon OpenSearch Service, which indexes the knowledge sources
3. Relevant information returned for enhanced context
4. Prompt + query + enhanced context sent to Amazon Bedrock (text generation)
5. Generated text response returned to the user

Slide 55

Knowledge databases for RAG in AWS
• Full-text search: Amazon Kendra, Amazon OpenSearch Service
• Vector search: Amazon OpenSearch Service, Amazon OpenSearch Serverless, Amazon DocumentDB (with MongoDB compatibility), Amazon Aurora PostgreSQL, Amazon RDS for PostgreSQL, Amazon MemoryDB for Redis
• Hybrid search: Amazon OpenSearch Service

Slide 56

Vector search – Amazon Aurora PostgreSQL
Ingestion: document chunks from the knowledge sources are embedded with Amazon Bedrock and stored as vector embeddings in Aurora PostgreSQL.
1. User sends prompt + query to the generative AI application
2. Query
3. Query embedding generated with Amazon Bedrock (vector embedding)
4. Relevant information retrieved from Aurora PostgreSQL for enhanced context
5. Prompt + query + enhanced context sent to Amazon Bedrock (text generation)
6. Generated text response returned to the user

Slide 57

Knowledge databases for RAG in AWS
• Full-text search: Amazon Kendra, Amazon OpenSearch Service
• Vector search: Amazon OpenSearch Service, Amazon OpenSearch Serverless, Amazon DocumentDB (with MongoDB compatibility), Amazon Aurora PostgreSQL, Amazon RDS for PostgreSQL, Amazon MemoryDB for Redis
• Hybrid search: Amazon OpenSearch Service

Slide 58

Hybrid (= full-text + vector) search – Amazon OpenSearch
Ingestion: document chunks from the knowledge sources are embedded with Amazon Bedrock and stored as vector embeddings in Amazon OpenSearch Service.
1. User sends prompt + query to the generative AI application
2. Query
3. Query embedding generated with Amazon Bedrock (vector embedding)
4. Relevant information retrieved from Amazon OpenSearch Service for enhanced context
5. Prompt + query + enhanced context sent to Amazon Bedrock (text generation)
6. Generated text response returned to the user

Slide 59

Amazon OpenSearch – Hybrid search (full-text search + vector search)
• Pre-filtering, post-filtering, and efficient filtering options
• Compound queries with various scoring and boosting options
• Supports up to 1,000 fields with various data types, including text, numeric, Boolean, geo-location

Example k-NN query combined with a full-text filter:

{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": { "range": { "price": { "lt": 8 } } }
        }
      },
      "script": {
        "lang": "knn",
        "source": "knn_score",
        "params": {
          "field": "my_vector",
          "query_value": [9.9, 9.9],
          "space_type": "l2"
        }
      }
    }
  }
}
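A minimal sketch of running the query above with the opensearch-py client; the endpoint, index name, and field names are illustrative assumptions:

from opensearchpy import OpenSearch

# Connect to the OpenSearch domain (authentication options omitted for brevity).
client = OpenSearch(hosts=[{"host": "my-domain.example.com", "port": 443}], use_ssl=True)

# The same script_score k-NN query shown on the slide.
query = {
    "query": {
        "script_score": {
            "query": {"bool": {"filter": {"range": {"price": {"lt": 8}}}}},
            "script": {
                "lang": "knn",
                "source": "knn_score",
                "params": {
                    "field": "my_vector",
                    "query_value": [9.9, 9.9],
                    "space_type": "l2",
                },
            },
        }
    }
}

response = client.search(index="my-index", body=query)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"])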

Slide 60

Retrieval-Augmented Generation (RAG) with LLMs
1. Query
2. Search relevant information in the knowledge sources
3. Relevant information returned for enhanced context
4. Prompt + query + enhanced context sent to the LLM endpoint
5. Generated text response

Slide 61

Two ways to access Foundation Models on AWS
• Amazon SageMaker JumpStart: model hub to deploy and fine-tune; prompts / text embeddings go through an API layer to a SageMaker endpoint hosting the foundation model; training and inference run on accelerated computing (Trn1(n), Inf2, P4d, P5).
• Amazon Bedrock: prompts / text embeddings and fine-tuning go through the Bedrock API layer directly to the foundation models.

Slide 62

Full-text search – Amazon Kendra + SageMaker
1. User sends prompt + query to the generative AI application
2. Query sent to Amazon Kendra, which indexes the knowledge sources
3. Relevant information returned for enhanced context
4. Prompt + query + enhanced context sent to the text generation model (Amazon Bedrock or an LLM hosted on Amazon SageMaker)
5. Generated text response returned to the user

Slide 63

Vector search – Amazon Aurora PostgreSQL + SageMaker
Same flow as before, with embedding and text generation served either by Amazon Bedrock or by models hosted on Amazon SageMaker:
1. User sends prompt + query to the generative AI application
2. Query
3. Query embedding generated (vector embedding)
4. Relevant information retrieved from Aurora PostgreSQL for enhanced context
5. Prompt + query + enhanced context sent to the text generation model
6. Generated text response returned to the user

Slide 64

Hybrid search – Amazon OpenSearch + SageMaker
Same flow, with Amazon OpenSearch Service as the store and embedding / text generation served either by Amazon Bedrock or by models hosted on Amazon SageMaker:
1. User sends prompt + query to the generative AI application
2. Query
3. Query embedding generated (vector embedding)
4. Relevant information retrieved from Amazon OpenSearch Service for enhanced context
5. Prompt + query + enhanced context sent to the text generation model
6. Generated text response returned to the user

Slide 65

DEMO: Generative AI application with the RAG approach
1. Prompt + query from the user to the generative AI application
2. Query sent to an Amazon Kendra index with data ingested
3. Relevant information returned for enhanced context
4. Prompt + query + enhanced context sent to Amazon SageMaker (LLM for text generation)
5. Generated text response
6. Response from the LLM to the user
(source: https://tinyurl.com/y3b7fvfh)

Slide 66

However, when it comes to implementing RAG, there are challenges:
• Creating vector embeddings for large volumes of data
• Incremental updates to the vector store
• Managing multiple data sources
• Scaling the retrieval mechanism
• Orchestration
• Coding effort

Slide 67

Knowledge Bases for Amazon Bedrock

Slide 68

Knowledge Bases for Amazon Bedrock
Use retrieval-augmented generation (RAG): question → vector database → Bedrock foundation model → answer
• Connect FMs to data sources, including the vector engine for Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL, Pinecone, and Redis Enterprise Cloud
• Enable automatic data source detection
• Easily retrieve relevant data and augment the prompt
• Provide citations

Slide 69

End-to-end RAG workflow
• Data ingestion workflow: data source → document chunks → embeddings model → vector store
• Text generation workflow: user input → embeddings model → embedding (e.g., [-0.02, 0.89, -0.38, -0.53, 0.95, 0.17]) → semantic search against the vector store → context → prompt augmentation → large language model → response

Slide 70

RAG using a vector database: data ingestion workflow
Data source → document chunks → embeddings model → vector embeddings → vector store

Slide 71

RAG using Knowledge Bases for Amazon Bedrock: data ingestion workflow
Data source → document chunks → embeddings model → vector embeddings → vector store, managed end to end by Knowledge Bases for Amazon Bedrock

Slide 72

Data ingestion workflow – Knowledge Bases for Amazon Bedrock
Fully managed data ingestion workflow: new data in the data source → document chunks → embeddings model → vector store
• Choose your data source (Amazon S3): support for incremental updates; multiple data file formats supported
• Choose your chunking strategy: fixed chunks, no chunking, or default (200 tokens)
• Choose your embedding model: Amazon Titan, Cohere
• Choose your vector store: OpenSearch Serverless, Amazon Aurora PostgreSQL, Pinecone, Redis Enterprise
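Once the knowledge base and its S3 data source exist, ingestion of new data can be (re)triggered programmatically. A minimal sketch using boto3's bedrock-agent control-plane client; the knowledge base and data source IDs are placeholders:

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Start a sync of the data source into the knowledge base's vector store.
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB_ID",      # placeholder: ID of an existing knowledge base
    dataSourceId="DS_ID",         # placeholder: ID of its S3 data source
)

# Check the status of the ingestion job.
status = bedrock_agent.get_ingestion_job(
    knowledgeBaseId="KB_ID",
    dataSourceId="DS_ID",
    ingestionJobId=job["ingestionJob"]["ingestionJobId"],
)
print(status["ingestionJob"]["status"])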

Slide 73

Fully managed data ingestion – Knowledge Bases for Amazon Bedrock
Automated and fully managed data ingestion using Knowledge Bases for Amazon Bedrock: new data in the data source → document chunks → embeddings model → vector store
• Choose your data source (Amazon S3): support for incremental updates; multiple data file formats supported
• Choose your chunking strategy: fixed chunks, no chunking, or default (200 tokens)
• Choose your embedding model: Amazon Titan, Cohere
• Choose your vector store: OpenSearch Serverless, Amazon Aurora, Pinecone, Redis Enterprise

Slide 74

Retrieval and generation – Knowledge Bases for Amazon Bedrock
Text generation workflow: user input → embeddings model → embedding (e.g., [-0.02, 0.89, -0.38, -0.53, 0.95, 0.17]) → semantic search against the knowledge base → context → prompt augmentation → large language model → response

Slide 75

RetrieveAndGenerate API – Knowledge Bases for Amazon Bedrock (fully managed RAG)
User query → RetrieveAndGenerate API → generate the query embedding → retrieve similar documents from the knowledge base → augment the query with the retrieved documents → generate the response with the LLM → generated response returned to the user
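A minimal sketch of the call, assuming boto3's bedrock-agent-runtime client; the knowledge base ID and model ARN are placeholders:

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

# Fully managed RAG: retrieval, prompt augmentation, and generation in one call.
response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What is HDL cholesterol?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1",
        },
    },
)

print(response["output"]["text"])  # generated answer
print(response["citations"])       # source citations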

Slide 76

Customize RAG workflows using the Retrieve API – Knowledge Bases for Amazon Bedrock
User query → Retrieve API → generate the query embedding → retrieve similar documents from the knowledge base → retrieved documents returned to your customized RAG workflow, which assembles the context, augments the prompt, and calls the large language model for the response
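A minimal sketch of the Retrieve call with the same client; the knowledge base ID is a placeholder, and the downstream prompt assembly is left to your own workflow (as in the earlier RAG sketch):

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

# Retrieval only: returns the matching chunks, not a generated answer.
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KB_ID",  # placeholder
    retrievalQuery={"text": "What is HDL cholesterol?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
)

for result in response["retrievalResults"]:
    print(result["score"], result["content"]["text"])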

Slide 77

RAG using Amazon Bedrock and OpenSearch Serverless
Ingestion: document chunks from the knowledge sources are embedded with Amazon Bedrock and stored as vector embeddings in Amazon OpenSearch Serverless.
1. User sends prompt + query to the generative AI application
2. Query
3. Query embedding generated with Amazon Bedrock (vector embedding)
4. Relevant information retrieved from Amazon OpenSearch Serverless for enhanced context
5. Prompt + query + enhanced context sent to Amazon Bedrock (text generation)
6. Generated text response returned to the user

Slide 78

RAG using Knowledge Bases for Amazon Bedrock
Same flow, with Knowledge Bases for Amazon Bedrock managing retrieval against Amazon OpenSearch Serverless:
1. User sends prompt + query to the generative AI application
2. Query
3. Query embedding generated with Amazon Bedrock (vector embedding)
4. Relevant information retrieved for enhanced context
5. Prompt + query + enhanced context sent to Amazon Bedrock (text generation)
6. Generated text response returned to the user

Slide 79

Knowledge Bases integration with LangChain

Slide 80

import boto3
from langchain.llms import Bedrock
from langchain.retrievers.bedrock import AmazonKnowledgeBasesRetriever
from langchain.chains import RetrievalQA

bedrock_client = boto3.client("bedrock-runtime")

model_kwargs_claude = {"temperature": 0, "top_k": 10, "max_tokens_to_sample": 3000}
llm = Bedrock(model_id="anthropic.claude-instant-v1",
              model_kwargs=model_kwargs_claude,
              client=bedrock_client)

retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id=kb_id,  # ID of an existing knowledge base
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 3}})

qa = RetrievalQA.from_chain_type(llm=llm,
                                 chain_type="stuff",
                                 retriever=retriever,
                                 return_source_documents=True,
                                 chain_type_kwargs={"prompt": claude_prompt})  # claude_prompt: prompt template defined elsewhere

answer = qa(query)
print(answer)


Slide 82

Demo - Knowledge Bases for Amazon Bedrock

Slide 83

Demo - Knowledge Bases for Amazon Bedrock

Slide 84

RAG use cases
• Improved content quality: e.g., helps reduce hallucinations and connect with recent knowledge, including enterprise data
• Contextual chatbots and question answering: e.g., enhance chatbot capabilities by integrating with real-time data
• Personalized search: e.g., searching based on the user's previous search history and persona
• Real-time data summarization: e.g., retrieving and summarizing transactional data from databases or API calls

Slide 85

Recap
• Generative AI on AWS: Amazon SageMaker, Amazon Bedrock
• RAG concepts
• Vector databases
• Knowledge Bases for Amazon Bedrock (question → vector database → Bedrock foundation model → answer)

Slide 86

Resources
[1] Generative AI on Amazon SageMaker Workshop: https://catalog.us-east-1.prod.workshops.aws/workshops/972fd252-36e5-4eed-8608-743e84957f8e/en-US
[2] Building with Amazon Bedrock and LangChain Workshop: https://catalog.workshops.aws/building-with-amazon-bedrock/en-US
[3] Building a Chatbot using Amazon Bedrock and Kendra Workshop: https://catalog.us-east-1.prod.workshops.aws/workshops/27eb3134-4f33-4689-bb73-269e4273947a/en-US
[4] RAG Architecture Recipes with sample code: https://www.awsstartup.io/startup-recipe/architecture-recipes#generative-ai

Slide 87

We look forward to your valuable feedback. Please take part in the session survey after the talk!

Slide 88

Thank you