The Ultimate RAG Showdown

https://my.prairie.cards/u/moritalous The Ultimate RAG Showdown Kendra, KB for Bedrock, etc...

Self-introduction
Name: Morita Kazuaki
• AWS Ambassador (2023-)
• AWS Top Engineer (2020-)
• AWS All Certifications Engineer (2024)
• AWS Community Builder (2024)
X / Qiita / GitHub: @moritalous
Image: "Jumping deer with japanese temple", created by Amazon Titan Image Generator

I published a book on Bedrock (co-authored)

What is RAG?
• RAG (Retrieval-Augmented Generation) is a technique that provides external information to generative AI to produce answers.
• It helps prevent generative AI from producing "hallucinations".
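The idea can be sketched in a few lines. This is a minimal illustration only, assuming a toy word-overlap retriever in place of a real search service; the function names `retrieve` and `build_prompt` and the sample documents are hypothetical, and the resulting prompt would be sent to whichever LLM you use.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by the number of words shared with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved text before the question so the LLM answers from it."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


documents = [
    "Kendra is a managed enterprise search service.",
    "OpenSearch Service supports vector search and hybrid search.",
    "Deer live near temples in Nara.",
]
question = "What is Kendra?"
prompt = build_prompt(question, retrieve(question, documents))
```

Because the answer is generated from the retrieved passages rather than from the model's memory alone, the model has less room to hallucinate.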

What is RAG?
[Figure: Gartner hype cycle, with RAG at the "Peak of inflated expectations"]
Source: Gartner https://www.gartner.co.jp/ja/newsroom/press-releases/pr-20240807-future-oriented-infra-tech-hc

My expectations are too high

Let's compare various RAG configurations and pit them against each other.

Entry No.1
Knowledge bases for Amazon Bedrock
Nickname: KB4AB

What is Knowledge bases for Amazon Bedrock?
• A Bedrock capability for building RAG
• Can be built using only the management console
• Feature updates are frequent

Knowledge bases for Amazon Bedrock architecture
[Architecture diagram. Components inside Knowledge bases for Amazon Bedrock: Text Extraction, Chunk Split, Embeddings (for documents and for the question), retrieval, Answer Generation. Data source: S3. Vector store: OpenSearch Serverless. Input: Question. Output: Answer.]

Retrieve and generate an answer with a single API call
(There is also an API that only performs retrieval.)

    import boto3

    # knowledgeBaseId and modelArn are assumed to be defined elsewhere.
    client = boto3.client("bedrock-agent-runtime")


    def retrieve_and_generate(question: str):
        response = client.retrieve_and_generate(
            input={"text": question},
            retrieveAndGenerateConfiguration={
                "knowledgeBaseConfiguration": {
                    "knowledgeBaseId": knowledgeBaseId,
                    "modelArn": modelArn,
                    # Break a complex question down into sub-queries before searching
                    "orchestrationConfiguration": {
                        "queryTransformationConfiguration": {"type": "QUERY_DECOMPOSITION"}
                    },
                    # Combine vector search and keyword search
                    "retrievalConfiguration": {
                        "vectorSearchConfiguration": {"overrideSearchType": "HYBRID"}
                    },
                },
                "type": "KNOWLEDGE_BASE",
            },
        )
        return response
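For the retrieve-only API mentioned on this slide, a call might look like the sketch below. It assumes the `retrieve` operation of the boto3 `bedrock-agent-runtime` client; the helper names `build_retrieve_request` and `retrieve_only` and the `top_k` default are hypothetical and not taken from the deck.

```python
def build_retrieve_request(question: str, knowledge_base_id: str, top_k: int = 5) -> dict:
    """Build the keyword arguments for the bedrock-agent-runtime `retrieve` call."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": question},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": top_k,
                "overrideSearchType": "HYBRID",
            }
        },
    }


def retrieve_only(client, question: str, knowledge_base_id: str) -> list[str]:
    """Return the retrieved chunk texts without generating an answer.

    `client` is assumed to be boto3.client("bedrock-agent-runtime").
    """
    response = client.retrieve(**build_retrieve_request(question, knowledge_base_id))
    return [result["content"]["text"] for result in response["retrievalResults"]]
```

Separating retrieval from generation lets you pass the chunks to any LLM, at the cost of writing the generation step yourself.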

Scoring of KB4AB
• Ease of environment construction: ☆☆☆
  Can be built simply by operating the management console. There is also a "Quick create" feature that automatically creates OpenSearch Serverless.
• Abundant features: ☆☆
  Features are updated frequently, and features for building Advanced RAG were added recently. You can easily apply selected RAG optimization methods.
• Extensibility: ☆
  When new methods or new LLMs emerge, they cannot be used immediately.
• Japanese Support: ☆
  The OpenSearch Serverless index created by the "Quick create" feature does not include settings for Japanese.

The Ultimate RAG Showdown
                                  KB4AB
Ease of environment construction  ☆☆☆
Abundant features                 ☆☆
Extensibility                     ☆
Japanese Support                  ☆

Entry No.2
Build generative AI apps with Kendra
Nickname: KendRAG

What is Kendra?
• Managed enterprise search service
• A wide range of data source connectors are available
• Supports not only document searches but also FAQ-style searches

KendRAG architecture
[Architecture diagram. GenAI App: Generate search query (Bedrock), Retrieval, Answer Generation (Bedrock). Kendra: Text Extraction, Chunk Split. Data source: S3. Input: Question. Output: Answer.]

Process 1) Generate search query
This step creates search queries from the user's question before searching. It uses a function provided by Cohere Command R/R+.

    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime")


    def generate_search_query(question: str):
        result = bedrock_runtime.converse(
            modelId="cohere.command-r-plus-v1:0",
            # Ask Command R+ to return search queries only, not an answer
            additionalModelRequestFields={"search_queries_only": True},
            additionalModelResponseFieldPaths=["/search_queries"],
            messages=[
                {
                    "role": "user",
                    "content": [{"text": question}],
                }
            ],
        )
        # Extract the query strings from the model-specific response fields
        return [
            query["text"]
            for query in result["additionalModelResponseFields"]["search_queries"]
        ]

Example: "Which regions have Kendra and Bedrock with Claude 3.5?" produces:
- Regions where Kendra is provided
- Regions where Bedrock has Claude 3.5

Process 2) Retrieval
Fetch relevant documents from Kendra.

    import boto3

    # kendra_index_id is assumed to be defined elsewhere.
    kendra = boto3.client("kendra")

    # Fields of each result item that are passed on to answer generation
    DOCUMENT_KEYS = ["Id", "DocumentId", "DocumentTitle", "Content", "DocumentURI"]


    def fetching_relevant_documents(queries: list[str]):
        items = []
        for query in queries:
            response = kendra.retrieve(
                IndexId=kendra_index_id,
                QueryText=query,
                # Restrict the search to Japanese documents
                AttributeFilter={
                    "EqualsTo": {"Key": "_language_code", "Value": {"StringValue": "ja"}}
                },
            )
            items.extend(
                {k: v for k, v in item.items() if k in DOCUMENT_KEYS}
                for item in response["ResultItems"]
            )
        return items

Process 3) Answer Generation
Generate the answer in Bedrock. Passing documents this way works well with the Cohere Command R API.

    import boto3

    bedrock_runtime = boto3.client("bedrock-runtime")


    # documents is the list of dicts returned by fetching_relevant_documents
    def generating_response(question: str, documents: list[dict]):
        result = bedrock_runtime.converse(
            modelId="cohere.command-r-plus-v1:0",
            # Command R+ accepts the retrieved documents as a model-specific field
            additionalModelRequestFields={"documents": documents},
            messages=[
                {
                    "role": "user",
                    "content": [{"text": question}],
                }
            ],
        )
        return result["output"]["message"]["content"][0]["text"]
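The three processes can be chained as in the sketch below. This is only an illustration: `answer_question` and `dedupe_by_id` are hypothetical helpers, and both the de-duplication of passages returned for more than one query and the fallback to the raw question are assumptions, not something shown on the slides.

```python
from typing import Callable


def dedupe_by_id(items: list[dict]) -> list[dict]:
    """Keep the first occurrence of each passage, identified by its "Id" field."""
    seen: set[str] = set()
    unique = []
    for item in items:
        if item["Id"] not in seen:
            seen.add(item["Id"])
            unique.append(item)
    return unique


def answer_question(
    question: str,
    generate_queries: Callable[[str], list[str]],
    fetch_documents: Callable[[list[str]], list[dict]],
    generate_answer: Callable[[str, list[dict]], str],
) -> str:
    """Run query generation, retrieval, and answer generation in order."""
    # Assumption: fall back to the raw question if no search query is generated
    queries = generate_queries(question) or [question]
    documents = dedupe_by_id(fetch_documents(queries))
    return generate_answer(question, documents)
```

With the functions from the slides, the call would be `answer_question(question, generate_search_query, fetching_relevant_documents, generating_response)`. Passing the steps in as arguments also makes it easy to swap the LLM or the search backend.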

Scoring of KendRAG
• Ease of environment construction: ☆☆
  You need to build a generative AI app, but with the many generative AI frameworks available, such as LangChain, this is not that difficult.
• Abundant features: ☆
  You can use the search function provided by Kendra. The connection to the generative AI has to be developed yourself.
• Extensibility: ☆☆☆
  You can try out and incorporate various techniques for improving RAG accuracy. It is also easy to change the generative AI and the search database.
• Japanese Support: ☆☆
  Kendra officially supports Japanese.

The Ultimate RAG Showdown
                                  KB4AB   KendRAG
Ease of environment construction  ☆☆☆     ☆☆
Abundant features                 ☆☆      ☆
Extensibility                     ☆       ☆☆☆
Japanese Support                  ☆       ☆☆

Entry No.3
Building RAG API with OpenSearch
Nickname: OpenSearchRAG

What is OpenSearch Service?
• An AWS managed service for the open-source OpenSearch
• Features that can be used for RAG are being added actively
  ◦ Vector search, Neural search, Hybrid search
  ◦ Integration with external AI models such as Bedrock and SageMaker
  ◦ Text chunking
  ◦ Reranking
  ◦ Conversational search, RAG

OpenSearchRAG architecture
[Architecture diagram. OpenSearch Service ingest pipeline: Chunk Split, Embeddings. OpenSearch Service search pipeline: Embeddings, Retrieval, Reranking, Answer Generation. External models: Bedrock, SageMaker. Also shown: Data Source, Text Extraction. Input: Question. Output: Answer.]

OpenSearch's search API
By defining a search pipeline, you can get RAG results just by calling the search API.

    # client (an OpenSearch client), index_name and titan_model_id are
    # assumed to be defined elsewhere.
    def search(query: str):
        response = client.search(
            index=index_name,
            body={
                "_source": {"exclude": ["body_chunk_embedding"]},
                "query": {
                    "hybrid": {
                        "queries": [
                            # Keyword search on the chunk text
                            {"match": {"body_chunk": {"query": query}}},
                            # Vector search on the chunk embeddings
                            {
                                "nested": {
                                    "score_mode": "max",
                                    "path": "body_chunk_embedding",
                                    "query": {
                                        "neural": {
                                            "body_chunk_embedding.knn": {
                                                "query_text": query,
                                                "model_id": titan_model_id,
                                            }
                                        }
                                    },
                                }
                            },
                        ],
                    }
                },
                "ext": {
                    # Parameters for the rerank and RAG processors of the pipeline
                    "rerank": {"query_context": {"query_text": query}},
                    "generative_qa_parameters": {
                        "llm_model": "litellm",
                        "llm_question": query,
                        "context_size": 4,
                    },
                },
            },
            params={"search_pipeline": "hybrid-rerank-search-pipeline"},
        )

        context = [hit["_source"] for hit in response["hits"]["hits"]]
        for tmp in context:
            del tmp["body_chunk"]

        return {
            "answer": response["ext"]["retrieval_augmented_generation"]["answer"],
            "context": context,
        }
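The slide does not show the search pipeline itself. A definition along the following lines would fit the query above; it is a sketch under assumptions, not the deck's actual configuration. It uses the OpenSearch `normalization-processor`, `rerank`, and `retrieval_augmented_generation` processors; the model IDs, the min-max/arithmetic-mean settings, and the helper names are hypothetical.

```python
def build_search_pipeline(rerank_model_id: str, llm_model_id: str) -> dict:
    """Build a search pipeline body: hybrid score normalization, reranking, then RAG."""
    return {
        # Normalize and combine the scores of the keyword and vector sub-queries
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {"technique": "arithmetic_mean"},
                }
            }
        ],
        "response_processors": [
            # Rerank the hits with a model registered in OpenSearch
            {
                "rerank": {
                    "ml_opensearch": {"model_id": rerank_model_id},
                    "context": {"document_fields": ["body_chunk"]},
                }
            },
            # Generate the answer from the top hits
            {
                "retrieval_augmented_generation": {
                    "model_id": llm_model_id,
                    "context_field_list": ["body_chunk"],
                }
            },
        ],
    }


def register_search_pipeline(client, name: str, body: dict):
    """Register the pipeline; `client` is assumed to be an opensearch-py client."""
    return client.transport.perform_request("PUT", f"/_search/pipeline/{name}", body=body)
```

Processor order matters here: reranking runs before answer generation so that the LLM sees the reranked hits.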

Scoring of OpenSearchRAG
• Ease of environment construction: ☆
  Built by combining various OpenSearch Service functions. The OpenSearch documentation only explains the individual functions, so putting them together is difficult.
• Abundant features: ☆☆
  Functions are being expanded actively with RAG in mind, including hybrid search, reranking, and chunk splitting.
• Extensibility: ☆
  Limited to the range of functions supported by OpenSearch.
• Japanese Support: ☆☆
  Searches tailored to Japanese can be performed using the kuromoji and Sudachi plugins.
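As an illustration of the Japanese settings, an index body using the kuromoji plugin might look like the sketch below. The analyzer name `ja_text`, the choice of token filters, and the helper name are assumptions; the deck's own evaluation used the Sudachi analyzer, whose settings differ.

```python
def japanese_index_body(text_field: str = "body_chunk") -> dict:
    """Build index settings and mappings that analyze a text field with kuromoji."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "ja_text": {
                        "type": "custom",
                        "tokenizer": "kuromoji_tokenizer",
                        "filter": [
                            "kuromoji_baseform",        # reduce inflected words to base form
                            "kuromoji_part_of_speech",  # drop particles and similar tokens
                            "ja_stop",                  # remove Japanese stop words
                            "lowercase",
                        ],
                    }
                }
            }
        },
        "mappings": {
            "properties": {text_field: {"type": "text", "analyzer": "ja_text"}}
        },
    }
```

With an opensearch-py client, the body could be passed to `client.indices.create(index=index_name, body=japanese_index_body())`.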

The Ultimate RAG Showdown
                                  KB4AB   KendRAG   OpenSearchRAG
Ease of environment construction  ☆☆☆     ☆☆        ☆
Abundant features                 ☆☆      ☆         ☆☆
Extensibility                     ☆       ☆☆☆       ☆
Japanese Support                  ☆       ☆☆        ☆☆

Accuracy evaluation of RAG

Accuracy evaluation of RAG
• The evaluation was carried out using "Ragas", a framework for quantitatively evaluating the accuracy of RAG.
• The following four indicators were used: faithfulness, answer_relevancy, context_precision, context_recall
  https://docs.ragas.io/en/stable/concepts/metrics/index.html
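To give a feel for what these numbers mean, here is a simplified sketch of how three of the metrics reduce to ratios once the per-item verdicts are known. In Ragas the verdicts come from an LLM judge; here they are passed in as plain booleans, and the function names are hypothetical. answer_relevancy is computed differently (from embedding similarity) and is not covered.

```python
def verdict_ratio(verdicts: list[bool]) -> float:
    """Share of positive verdicts.

    faithfulness: claims in the answer that are supported by the context.
    context_recall: statements in the reference answer found in the context.
    """
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def context_precision(relevance: list[bool]) -> float:
    """Average of precision@k over the ranks k that hold a relevant chunk.

    `relevance` lists, in ranked order, whether each retrieved chunk is relevant.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0
```

For example, `context_precision([True, False, True])` averages precision@1 = 1 and precision@3 = 2/3, so relevant chunks ranked lower pull the score down.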

Verification conditions for accuracy evaluation
• AWS What's New articles published in 2024 in Japanese (1,267 articles in total)
• Questions and answers were generated using Ragas and used as test data (200 articles)
• Test data generation and evaluation were performed using GPT-4o mini
Ragas generated question: How does billable usage now appear in Amazon CloudWatch metrics for AWS Config?
Ragas generated answer: Amazon CloudWatch metrics for monitoring AWS Config data usage now display only billable usage, eliminating the need to show non-billable usage, enabling you to validate your AWS Config configuration and usage, and associate billable usage with associated costs.

Specific conditions for each architecture
KB4AB
• LLM for answer generation: Claude 3 Sonnet
• LLM for embeddings: Titan Embeddings v2
• Other features: Hierarchical chunking, Break down queries, Hybrid search
KendRAG
• LLM for answer generation: Command R+
• LLM for embeddings: unused
• Other features: Generate search query, Japanese Index
OpenSearchRAG
• LLM for answer generation: Mistral Large 2
• LLM for embeddings: Titan Embeddings v2
• Other features: Sudachi analyzer, Reranking with Cohere Rerank3, Hybrid search, Text chunking, Japanese Support

Results announcement

Accuracy evaluation results
                   KB4AB              KendRAG       OpenSearchRAG
                   (Claude 3 Sonnet)  (Command R+)  (Mistral Large 2)
faithfulness       0.7904             0.8423        0.6197
answer_relevancy   0.6914             0.6723        0.7345
context_precision  0.9346             0.8814        0.8981
context_recall     0.9675             0.9342        0.9400
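The table can be read per metric with a few lines of code. The sketch below only restates the scores from this slide; the helper name `best_per_metric` is hypothetical.

```python
scores = {
    "KB4AB": {
        "faithfulness": 0.7904,
        "answer_relevancy": 0.6914,
        "context_precision": 0.9346,
        "context_recall": 0.9675,
    },
    "KendRAG": {
        "faithfulness": 0.8423,
        "answer_relevancy": 0.6723,
        "context_precision": 0.8814,
        "context_recall": 0.9342,
    },
    "OpenSearchRAG": {
        "faithfulness": 0.6197,
        "answer_relevancy": 0.7345,
        "context_precision": 0.8981,
        "context_recall": 0.9400,
    },
}


def best_per_metric(table: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each metric, return the configuration with the highest score."""
    metrics = next(iter(table.values())).keys()
    return {
        metric: max(table, key=lambda name: table[name][metric]) for metric in metrics
    }


winners = best_per_metric(scores)
```

Each configuration comes out on top for at least one metric, so no single configuration wins across the board.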

Example of answer generation (KB4AB)
Ragas generated question: How does billable usage now appear in Amazon CloudWatch metrics for AWS Config?
KB4AB generated answer: Amazon CloudWatch metrics for AWS Config now show only billable usage. They no longer show usage for resource types that are not recorded, providing a clearer view of your billable usage. This enables you to use CloudWatch metrics to validate your AWS Config configuration and usage, and to correlate billable usage with associated costs.

Example of answer generation (KendRAG)
Ragas generated question: How does billable usage now appear in Amazon CloudWatch metrics for AWS Config?
KendRAG generated answer: AWS Config monitors resource configuration changes by generating configuration items (CIs) for each resource type recorded. Now you can see only the usage that is billable because with this launch, Amazon CloudWatch metrics for AWS Config usage only show usage for the resource types that are recorded.

Example of answer generation (OpenSearchRAG)
Ragas generated question: How does billable usage now appear in Amazon CloudWatch metrics for AWS Config?
OpenSearchRAG generated answer: The difference between AWS Config and Amazon CloudWatch Metrics is that AWS Config is a service for recording configuration changes of resources and auditing for compliance, while Amazon CloudWatch Metrics is a service for collecting and monitoring performance data of resources. AWS Config helps you track configuration changes of resources and check compliance, while CloudWatch Metrics helps you monitor the performance of your resources in real time.

Accuracy evaluation considerations
• The evaluation results varied widely, so it is not possible to say which configuration is best.
• Looking at the individual responses, I felt that none of the configurations performed poorly; all of them reached a reasonable level of performance.
• The results may change depending on the evaluation conditions. An "evaluation of the evaluation method" may be necessary.
  ◦ LLM used for evaluation
  ◦ Document format used for evaluation
• I hope that RAG evaluation will become possible with Bedrock's model evaluation function.

Overall results
                                  KB4AB   KendRAG   OpenSearchRAG
Ease of environment construction  ☆☆☆     ☆☆        ☆
Abundant features                 ☆☆      ☆         ☆☆
Extensibility                     ☆       ☆☆☆       ☆
Japanese Support                  ☆       ☆☆        ☆☆
Accuracy                          ☆☆      ☆☆        ☆☆

The verification code is published on GitHub.
I had a hard time building the OpenSearchRAG configuration, so please take a look.
https://github.com/moritalous/ultimate_rag_showdown

ARIGATO
