Building RAG with Java and Semantic Kernel

Agenda
1. Chat with your private data
2. What is RAG
3. Java RAG solution deep dive

Chat with your private data
• App or Copilot agent
• API & SDK
• Azure OpenAI
• Documents: Word, PDF, Markdown, websites, etc.
• Business Data & API: databases, data lakes, REST APIs, etc.

Azure AI Chat Java Reference Template
All Azure AI App templates: https://aka.ms/azai
Java implementation: https://github.com/Azure-Samples/azure-search-openai-demo-java

Azure AI Chat Java Reference Template
App Service | Azure Container Apps | Azure Kubernetes Service

Getting Started
1. Project setup options:
   • GitHub Codespaces
   • VS Code with Dev Containers extension
   • Local development: Java 17, Maven 3.8.x, Azure Developer CLI, Node.js 14+, Git, PowerShell 7+ (pwsh)
2. cd into one deployment option folder:
   • deploy/aca
   • deploy/aks
   • deploy/app-service
3. Run ‘azd auth login’ and ‘azd up’

What is RAG

Working with OpenAI
• Azure OpenAI, hosted on Azure: GPT-35-turbo, GPT-4o, ADA
• Chat Completion API: AI app devs send a prompt and receive a completion
• Embedding API: text in, embeddings out [ 13 33 34 … ]
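Embeddings are numeric vectors, so two texts can be compared by comparing their vectors. A minimal sketch of cosine similarity, a common comparison measure; the class and method names are hypothetical and not taken from the sample repository:

```java
/** Illustrative helper (hypothetical name), not part of the sample repository. */
final class EmbeddingMath {
    private EmbeddingMath() {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector. */
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Vectors pointing in the same direction score close to 1, unrelated (orthogonal) vectors score 0.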

Retrieval Augmented Generation flow – for private documents
• Chat flow (users): Ask question, Search Info, Generate Answer
• Indexing flow (admins): Configure/Manage, Push/Pull data, Move to cloud, Text extract, Generate Embeddings, Index content
• Services: Documents, Azure Storage, Azure AI Services Document Intelligence, Azure OpenAI (gpt-35-turbo, gpt4, ada-2)
• Retrieval System: Full Text, Vector Store
• Building blocks: Info Retriever, AI Orchestration, Data Loading, Data Chunking, Embeddings Generation

Solution Deep Dive

Retrieval Augmented Generation – Building blocks
LLM Models
• Gpt-3.5-turbo
• Gpt-4-turbo
• Gpt-4o
• ADA-002
Vector Storage
• Embeddings
• Semantic search
Prompt Templating
• Handlebars
• Jinja2
Orchestration
• Chat History
• Function Calls
• Planning
• Agents
• Built-in chains
Documents Processing
• Documents Loader
• Document Parser
• Text Chunker

Semantic Kernel
https://github.com/microsoft/semantic-kernel-java
• Microsoft open source
• Languages: .NET, Python, Java
• AI Services abstract LLM model capabilities and interactions.
• Plugins are SK units of work orchestrated by the kernel.

Java RAG Implementation Options
Plain Java Azure SDK
• Low-level AI orchestration
• Useful to understand RAG behind the scenes
• Streaming support
• A ton of boilerplate code
Semantic Kernel
• Simplified AI orchestration
• Common RAG building block abstractions
• Doesn’t support streaming (yet)

Project Structure
Folder              Description
app                 App code components
deploy              Infrastructure code. Bicep and Azure Developer CLI configuration. Indexing scripts.
app/backend         Chat flow - Java
app/frontend        Chat UI - React application
app/indexer         Indexing flow - Java
deploy/app-service  App Service. 1 app: frontend and backend are colocated in 1 Spring Boot app. Indexer runs locally.
deploy/aca          Azure Container Apps. Microservice architecture. 2 Spring Boot apps. 1 Nginx app.
deploy/aks          Azure Kubernetes Service. Microservice architecture. 2 Spring Boot apps. 1 Nginx app.
data                Preloaded PDF documents about job descriptions, company policies and benefits

Implementing RAG Chat flow

React API client
• Single Page Application: React and TypeScript
• api.ts entry point
https://github.com/Azure-Samples/azure-search-openai-demo-java/tree/main/app/frontend/src/api

Chat Endpoint
• REST controller providing the entry point for the RAG chat.
• An APPLICATION_NDJSON_VALUE based API is used for streaming responses.
• Streaming is supported only for PlainJavaChatApproach.
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/backend/src/main/java/com/microsoft/openai/samples/rag/chat/controller/ChatController.java
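NDJSON (newline-delimited JSON) streams one complete JSON object per line, so a client can parse each chunk as soon as it arrives. A minimal sketch of that wire format using only the JDK; the class name and the "content" field are assumptions for illustration and are not the sample's actual response schema:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative NDJSON writer (hypothetical), not the sample's ChatController. */
final class NdjsonSketch {
    private NdjsonSketch() {}

    /** Escapes the characters that must be escaped inside a JSON string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the 4-digit hex form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Writes each chunk as one JSON object followed by a newline. */
    static String toNdjson(List<String> chunks) {
        return chunks.stream()
                .map(chunk -> "{\"content\":\"" + escape(chunk) + "\"}\n")
                .collect(Collectors.joining());
    }
}
```

In a real streaming endpoint each line would be flushed to the client as it is produced rather than joined into one string.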

Semantic Kernel Plugins and Functions
RAG plugin with semantic functions
• AnswerConversation
• ExtractKeywords
azure-search-openai-demo-java/app/backend/src/main/resources/semantickernel/Plugins/RAG (Azure-Samples/azure-search-openai-demo-java on GitHub)
InformationFinder plugin with native functions
• searchFromConversation
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/backend/src/main/java/com/microsoft/openai/samples/rag/retrieval/semantickernel/AzureAISearchPlugin.java

Prompt Template and Chat Completion configuration with SK
Handlebars templating engine
• Parameter replacement
• Loops, conditional logic
• Direct calls to plugin functions
Chat completion parameters:
• Temperature
• top_p
• max_tokens
https://github.com/Azure-Samples/azure-search-openai-demo-java/tree/main/app/backend/src/main/resources/semantickernel/Plugins/RAG/AnswerConversation
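Parameter replacement is the simplest templating feature: named placeholders in the prompt are swapped for values at call time. A minimal sketch of the idea using a regular expression; this is not the Handlebars engine (no loops, conditionals, or function calls), and the class name and placeholder names are hypothetical:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative {{name}} substitution only; not a Handlebars implementation. */
final class PromptTemplateSketch {
    private PromptTemplateSketch() {}

    // Matches {{name}} with optional whitespace around the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    /** Replaces each placeholder with its argument; unknown names become empty strings. */
    static String render(String template, Map<String, String> args) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = args.getOrDefault(matcher.group(1), "");
            // quoteReplacement keeps '$' and '\' in values from being treated as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For example, rendering "Sources:\n{{sources}}\nQuestion: {{question}}" with a sources and a question argument yields the prompt text sent to the chat completion call.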

Azure AI Search Retriever
• Uses the Azure AI Search API to retrieve documents from the search index.
• Query keywords are extracted from the whole chat conversation with an additional call to OpenAI.
• Retrieval mode: text, vectors, hybrid.
• The OpenAI embedding API converts the user's query text to an embeddings vector (vector or hybrid mode).
• Hybrid search improves search results by mixing text search and vector search.
• Can be further simplified with the SK VectorStore abstraction and the AzureAISearchVectorStore implementation: no need to create an explicit database search plugin, and it provides features for performing similarity searches over databases.
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/backend/src/main/java/com/microsoft/openai/samples/rag/retrieval/AzureAISearchRetriever.java
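To illustrate how a text ranking and a vector ranking can be mixed into one result list, here is a minimal sketch of Reciprocal Rank Fusion, a common fusion technique. Azure AI Search performs hybrid ranking on the service side; this code is not taken from the sample, and the class name, document ids, and the constant k are assumptions:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative Reciprocal Rank Fusion (hypothetical helper, not the sample's retriever). */
final class HybridRankSketch {
    private HybridRankSketch() {}

    /**
     * Each document scores 1 / (k + rank) in every ranking it appears in (rank starts at 1);
     * the scores are summed and documents are returned by descending total score.
     */
    static List<String> fuse(List<String> textRanking, List<String> vectorRanking, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : List.of(textRanking, vectorRanking)) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return fused;
    }
}
```

A document that ranks well in both lists rises above one that ranks first in only one of them, which is the intuition behind hybrid retrieval.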

Azure AI Search Retriever as SK native plugin
• Plain Java class.
• @DefineKernelFunction class method annotation.
• @DefineFunctionParameter method parameters annotation.
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/backend/src/main/java/com/microsoft/openai/samples/rag/retrieval/semantickernel/AzureAISearchPlugin.java
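The general mechanism behind annotation-based native plugins can be sketched with plain reflection: an orchestrator scans a plain Java object for annotated methods and invokes them by name. The @SketchFunction annotation and all class names below are hypothetical stand-ins defined only for this example; they are not the Semantic Kernel annotations, and the real kernel does considerably more:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative annotation-driven function discovery; not Semantic Kernel code. */
final class NativePluginSketch {
    private NativePluginSketch() {}

    /** Hypothetical stand-in for a function-level annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface SketchFunction {
        String name();
        String description();
    }

    /** A plain Java class whose annotated method is exposed as a callable function. */
    static class InformationFinder {
        @SketchFunction(name = "searchFromConversation", description = "Finds sources for a conversation")
        public String searchFromConversation(String conversation) {
            // A real plugin would query the search index here.
            return "sources for: " + conversation;
        }
    }

    /** Collects the annotated methods of a plugin instance, keyed by function name. */
    static Map<String, Method> discover(Object plugin) {
        Map<String, Method> functions = new LinkedHashMap<>();
        for (Method method : plugin.getClass().getDeclaredMethods()) {
            SketchFunction function = method.getAnnotation(SketchFunction.class);
            if (function != null) {
                functions.put(function.name(), method);
            }
        }
        return functions;
    }

    /** Looks up a function by name and invokes it with the given arguments. */
    static Object invoke(Object plugin, String functionName, Object... args) {
        Method method = discover(plugin).get(functionName);
        if (method == null) {
            throw new IllegalArgumentException("Unknown function: " + functionName);
        }
        try {
            method.setAccessible(true);
            return method.invoke(plugin, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Function call failed: " + functionName, e);
        }
    }
}
```

The plugin class itself stays free of framework base classes; the annotations carry the name and description that an orchestrator needs.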

RAG Flow with SK
1. Configure the kernel:
   • AI service backed by the AzureOpenAI client.
   • AnswerConversation semantic plugin function as an external file.
   • InformationFinder native plugin function as a decorated Java class.
2. Implement the chat flow:
   • Retrieve relevant documents using the chat conversation: ask the kernel to trigger the searchFromConversation native plugin function.
   • Build an SK function context with the retrieved sources and the chat conversation.
   • Ask the kernel to generate an answer using the AnswerConversation function from the RAG plugin, providing the function arguments.
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/backend/src/main/java/com/microsoft/openai/samples/rag/chat/approaches/semantickernel/JavaSemanticKernelChainsChatApproach.java
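The chat flow above boils down to two steps: retrieve, then generate. A minimal sketch with the kernel replaced by two hypothetical interfaces, so it runs without Semantic Kernel or Azure dependencies; the interface and class names are assumptions, not the sample's types:

```java
import java.util.List;

/** Illustrative retrieve-then-generate flow; the interfaces stand in for kernel functions. */
final class RagFlowSketch {

    /** Stand-in for the native search function (hypothetical interface). */
    interface Retriever {
        List<String> searchFromConversation(List<String> conversation);
    }

    /** Stand-in for the semantic answer function (hypothetical interface). */
    interface AnswerGenerator {
        String answer(String sources, List<String> conversation);
    }

    private final Retriever retriever;
    private final AnswerGenerator generator;

    RagFlowSketch(Retriever retriever, AnswerGenerator generator) {
        this.retriever = retriever;
        this.generator = generator;
    }

    /** Step 1: retrieve sources for the conversation. Step 2: generate a grounded answer. */
    String chat(List<String> conversation) {
        List<String> sources = retriever.searchFromConversation(conversation);
        String sourcesBlock = String.join("\n", sources);
        return generator.answer(sourcesBlock, conversation);
    }
}
```

In tests both interfaces can be supplied as lambdas; in an application they would wrap the search index call and the chat completion call.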

Implementing RAG Indexing flow

CLI AddCommand
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/indexer/cli/src/main/java/com/microsoft/openai/samples/indexer/CLI.java
• picocli-based implementation
• Triggered by the deploy/app-service/scripts/prepdocs scripts
AddCommand - indexing running locally (App Service option):
1. Create Azure AI Search index fields
2. Scan local directory
3. Use DocumentProcessor to orchestrate indexing
4. Upload file to Azure Blob Storage for citation details

DocumentProcessor
Orchestrates document processing logic:
1. Parse the PDF file into pages
2. Split pages into text chunks
3. Load text chunks into the Azure AI Search index
Text chunk embeddings are generated and stored in the Azure AI Search index.
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/indexer/core/src/main/java/com/microsoft/openai/samples/indexer/DocumentProcessor.java

PDF Parser with Document Intelligence
• Uses Azure Document Intelligence OCR capabilities
• Tabular data is converted to HTML table text
• For simple documents (no tabular data), use the local PDF parser ItextPdfParser.java
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/indexer/core/src/main/java/com/microsoft/openai/samples/indexer/parser/DocumentIntelligencePDFParser.java

Text Splitter
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/indexer/core/src/main/java/com/microsoft/openai/samples/indexer/parser/TextSplitter.java
• Splits page text into smaller sections
• Handles section overlaps
• Handles text with HTML tables
Section length and overlap can be configured. Default:
• Section length: 1000 chars
• Overlap length: 100 chars
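A minimal sketch of fixed-length splitting with overlap, using the defaults listed above (1000 and 100 characters) as parameters. It is a simplification under stated assumptions: it cuts at raw character offsets and does not handle HTML tables as the sample's TextSplitter does; the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sliding-window splitter; not the sample's TextSplitter. */
final class OverlapSplitterSketch {
    private OverlapSplitterSketch() {}

    /** Splits text into sections of at most sectionLength chars, each sharing overlap chars with the previous one. */
    static List<String> split(String text, int sectionLength, int overlap) {
        if (sectionLength <= 0 || overlap < 0 || overlap >= sectionLength) {
            throw new IllegalArgumentException("Require sectionLength > overlap >= 0");
        }
        List<String> sections = new ArrayList<>();
        int step = sectionLength - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + sectionLength, text.length());
            sections.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window already reached the end of the text
            }
        }
        return sections;
    }
}
```

With the defaults, a 2500-character page yields three sections starting at offsets 0, 900, and 1800, so each boundary region appears in two sections.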

Index Manager
• Creates index resources in Azure AI Search
• Uploads text sections and related embeddings vectors to the index
  • Text chunk is stored in the ‘content’ index field
  • Embeddings vector is stored in the ‘embedding’ index field
  • Original file name, page numbers, and category are stored as additional metadata
EmbeddingService abstracts use of the Azure OpenAI embedding model
• Embeddings requests are arranged in batches to improve performance
• Retries with an exponential backoff policy in case of HTTP throttling
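Batching and retry with exponential backoff can be sketched with the JDK alone. This is a simplified illustration, not the sample's EmbeddingService: the class name, batch size, and delays are assumptions, and it retries on any RuntimeException where real code would retry only on throttling responses:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative batching and backoff helpers (hypothetical names). */
final class EmbeddingBatchSketch {
    private EmbeddingBatchSketch() {}

    /** Partitions items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            result.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return result;
    }

    /** Runs the call, doubling the wait after each failure, up to maxAttempts attempts. */
    static <T> T withRetry(Supplier<T> call, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while backing off", ie);
                }
                delay *= 2; // exponential backoff
            }
        }
    }
}
```

Each batch would be sent as one embeddings request wrapped in withRetry, so a throttled request is retried without resubmitting the other batches.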

Custom Data Ingestion/Indexing
• Custom PDFs go in the data folder
• Indexing is not a delta (incremental) process
• A Java process runs locally when the App Service deployment is selected.
• An indexer microservice runs on the container orchestrator when ACA or AKS is selected.

Learn More
• Intro to intelligent apps, grounding, vector databases
• Prompt engineering
• Grounding LLMs
• Chunking large documents
• Semantic Kernel
• Hybrid Retrieval in Azure AI Search

Thanks

Davide Antelmo
