Building RAG with Java and Semantic Kernel

Slide 1

Slide 1 text

Slide 2

Slide 2 text

Agenda
1. Chat with your private data
2. What is RAG
3. Java RAG solution deep dive

Slide 3

Slide 3 text

Chat with your private data
[Diagram labels: App or Copilot agent; API & SDK; Azure OpenAI]
• Documents: Word, PDF, Markdown, websites, etc.
• Business Data & API: databases, data lakes, REST APIs, etc.

Slide 4

Slide 4 text

Azure AI Chat Java Reference Template
• All Azure AI App templates: https://aka.ms/azai
• Java implementation: https://github.com/Azure-Samples/azure-search-openai-demo-java

Slide 5

Slide 5 text

Azure AI Chat Java Reference Template
Deployment options: App Service | Azure Container Apps | Azure Kubernetes Service

Slide 6

Slide 6 text

Getting Started
1. Project setup options:
   • GitHub Codespaces
   • VS Code with Dev Containers extension
   • Local development, which requires:
     • Java 17
     • Maven 3.8.x
     • Azure Developer CLI
     • Node.js 14+
     • Git
     • PowerShell 7+ (pwsh)
2. cd into one deployment option folder:
   • deploy/aca
   • deploy/aks
   • deploy/app-service
3. Run 'azd auth login' and 'azd up'

Slide 7

Slide 7 text

What is RAG

Slide 8

Slide 8 text

Working with Open AI
[Diagram: AI app developers call Azure OpenAI models hosted on Azure]
• Chat Completion API (GPT-35-turbo, GPT-4o): prompt in, completion out
• Embedding API (ADA): text in, embeddings out, e.g. [ 13 33 34 … ]

Slide 9

Slide 9 text

Retrieval Augmented Generation flow – for private documents
[Diagram components: Documents; Azure Storage; Azure AI Services (Document Intelligence); Retrieval System (Full Text, Vector Store); Azure OpenAI (gpt-35-turbo, gpt4, ada-2); AI Orchestration]
• Indexing flow: admins configure/manage the system and move documents to the cloud (push/pull data). Steps: text extract, index content, generate embeddings. Stages: data loading, data chunking, embeddings generation.
• Chat flow: users ask a question. Steps: search info (info retriever), then generate answer.

Slide 10

Slide 10 text

Solution Deep Dive

Slide 11

Slide 11 text

Retrieval Augmented Generation – Building blocks
LLM Models
• Gpt-3.5-turbo
• Gpt-4-turbo
• Gpt-4o
• ADA-002
Vector Storage
• Embeddings
• Semantic search
Prompt Templating
• Handlebars
• Jinja2
Orchestration
• Chat History
• Function Calls
• Planning
• Agents
• Built-in chains
Documents Processing
• Documents Loader
• Document Parser
• Text Chunker

Slide 12

Slide 12 text

Semantic Kernel
https://github.com/microsoft/semantic-kernel-java
• Microsoft open source
• Languages: .NET, Python, Java
• AI Services abstract LLM capabilities and interactions.
• Plugins are SK units of work orchestrated by the kernel.

Slide 13

Slide 13 text

Java RAG Implementation Options
Plain Java Azure SDK
• Low-level AI orchestration
• Useful to understand RAG behind the scenes
• Streaming support
• A lot of boilerplate code
Semantic Kernel
• Simplified AI orchestration
• Abstractions for common RAG building blocks
• Doesn't support streaming (yet)

Slide 14

Slide 14 text

Project Structure
Folder               Description
app                  App code components
deploy               Infrastructure code: Bicep and Azure Developer CLI configuration. Indexing scripts.
app/backend          Chat flow (Java)
app/frontend         Chat UI (React application)
app/indexer          Indexing flow (Java)
deploy/app-service   App Service. One app: frontend and backend are colocated in one Spring Boot app. The indexer runs locally.
deploy/aca           Azure Container Apps. Microservice architecture: 2 Spring Boot apps, 1 Nginx app.
deploy/aks           Azure Kubernetes Service. Microservice architecture: 2 Spring Boot apps, 1 Nginx app.
data                 Preloaded PDF documents about job descriptions, company policies and benefits

Slide 15

Slide 15 text

Implementing RAG Chat flow

Slide 16

Slide 16 text

React API client
• Single-page application built with React and TypeScript
• api.ts is the entry point
https://github.com/Azure-Samples/azure-search-openai-demo-java/tree/main/app/frontend/src/api

Slide 17

Slide 17 text

Chat Endpoint
• REST controller providing the entry point for the RAG chat.
• An APPLICATION_NDJSON_VALUE-based API is used for the streaming response.
• Streaming is supported only for PlainJavaChatApproach.
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/backend/src/main/java/com/microsoft/openai/samples/rag/chat/controller/ChatController.java
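To make the NDJSON idea concrete, here is a minimal sketch in plain Java, not the controller's actual code. NDJSON means one complete JSON object per line, so a client can parse each answer chunk as soon as its line arrives. The class name NdjsonSketch and the "content" field are assumptions made for illustration only.

```java
import java.util.List;

/** Illustrative only: serializes answer chunks as newline-delimited JSON. */
final class NdjsonSketch {

    /** Escapes the characters that must not appear raw inside a JSON string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Emits one JSON object per chunk, each terminated by a newline. */
    static String toNdjson(List<String> chunks) {
        StringBuilder out = new StringBuilder();
        for (String chunk : chunks) {
            // "content" is a hypothetical field name, not the sample's schema.
            out.append("{\"content\":\"").append(escape(chunk)).append("\"}\n");
        }
        return out.toString();
    }
}
```

In a real streaming endpoint each line would be flushed to the client as it is produced rather than collected into one string.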

Slide 18

Slide 18 text

Semantic Kernel Plugins and Functions
RAG plugin with semantic functions
• AnswerConversation
• ExtractKeywords
Location: app/backend/src/main/resources/semantickernel/Plugins/RAG in the Azure-Samples/azure-search-openai-demo-java repository on GitHub
InformationFinder plugin with native functions
• searchFromConversation
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/backend/src/main/java/com/microsoft/openai/samples/rag/retrieval/semantickernel/AzureAISearchPlugin.java

Slide 19

Slide 19 text

Prompt Template and Chat Completion configuration with SK
Handlebars templating engine
• Parameter replacement
• Loops, conditional logic
• Direct calls to plugin functions
Chat completion parameters
• Temperature
• top_p
• max_tokens
https://github.com/Azure-Samples/azure-search-openai-demo-java/tree/main/app/backend/src/main/resources/semantickernel/Plugins/RAG/AnswerConversation
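Parameter replacement, the first templating feature listed, can be illustrated with a few lines of plain Java. This is a toy sketch, not Handlebars and not the Semantic Kernel API. It substitutes only {{name}} placeholders and does not support loops, conditionals or function calls. The class name and the placeholder names used in the test are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: replaces {{name}} placeholders with argument values. */
final class PromptTemplateSketch {

    // Matches {{name}} with optional whitespace inside the braces.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    /** Unknown placeholders are replaced with an empty string. */
    static String render(String template, Map<String, String> args) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        // quoteReplacement keeps '$' and '\' in values from being treated as group references.
        return matcher.replaceAll(
                match -> Matcher.quoteReplacement(args.getOrDefault(match.group(1), "")));
    }
}
```

A real template engine also handles escaping, nested data and helper functions, which is why the sample delegates this work to Handlebars.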

Slide 20

Slide 20 text

Azure AI Search Retriever
• Uses the Azure AI Search API to retrieve documents from the search index.
• Query keywords are extracted from the whole chat conversation with an additional call to OpenAI.
• Retrieval modes: text, vectors, hybrid.
• The OpenAI embedding API converts the user's query text to an embeddings vector (vector or hybrid mode).
• Hybrid search improves search results by mixing text search and vector search.
• Can be further simplified with the SK VectorStore abstraction and its AzureAISearchVectorStore implementation: no explicit database search plugin is needed, and it provides features for performing similarity searches over databases.
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/backend/src/main/java/com/microsoft/openai/samples/rag/retrieval/AzureAISearchRetriever.java
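The sketch below shows one way to merge a text ranking and a vector ranking into a single list. It uses Reciprocal Rank Fusion (RRF), the method Azure AI Search documents for hybrid queries. The service performs this merge server-side, so the retriever does not need code like this. The class name, the constant k and the document ids are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: Reciprocal Rank Fusion over several ranked id lists. */
final class RrfSketch {

    /**
     * Each document scores 1 / (k + rank) in every ranking that contains it
     * (rank starts at 1); the scores are summed and sorted in descending order.
     */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        // List.sort is stable, so ties keep first-seen order.
        ids.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return ids;
    }
}
```

A document that ranks well in both lists rises above one that appears in only a single list, which is the intuition behind mixing text and vector search.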

Slide 21

Slide 21 text

Azure AI Search Retriever as SK native plugin
• Plain Java class.
• @DefineKernelFunction annotation on the class method.
• @DefineFunctionParameter annotation on the method parameters.
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/backend/src/main/java/com/microsoft/openai/samples/rag/retrieval/semantickernel/AzureAISearchPlugin.java

Slide 22

Slide 22 text

RAG Flow with SK
1. Configure the kernel:
   • AI service backed by the AzureOpenAI client.
   • AnswerConversation semantic plugin function as an external file.
   • InformationFinder native plugin function as a decorated Java class.
2. Implement the chat flow:
   • Retrieve relevant documents using the chat conversation: ask the kernel to trigger the SearchFromConversation native plugin function.
   • Build an SK function context with the retrieved sources and the chat conversation.
   • Ask the kernel to generate an answer using the AnswerConversation function from the RAG plugin, providing the function arguments.
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/backend/src/main/java/com/microsoft/openai/samples/rag/chat/approaches/semantickernel/JavaSemanticKernelChainsChatApproach.java
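The chat flow steps can be sketched in plain Java without any Semantic Kernel dependency. The two interfaces below are hypothetical stand-ins for the native search function and the semantic answer function. They are not SK types, and the real sample invokes both through the kernel.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: retrieve sources, build arguments, generate an answer. */
final class RagFlowSketch {

    /** One chat message; role is e.g. "user" or "assistant". */
    static final class ChatTurn {
        final String role;
        final String content;

        ChatTurn(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    /** Hypothetical stand-in for the SearchFromConversation native function. */
    interface Retriever {
        List<String> searchFromConversation(List<ChatTurn> conversation);
    }

    /** Hypothetical stand-in for the AnswerConversation semantic function. */
    interface AnswerGenerator {
        String answerConversation(String sources, String conversation);
    }

    private final Retriever retriever;
    private final AnswerGenerator generator;

    RagFlowSketch(Retriever retriever, AnswerGenerator generator) {
        this.retriever = retriever;
        this.generator = generator;
    }

    String run(List<ChatTurn> conversation) {
        // Step 1: retrieve relevant documents using the whole conversation.
        List<String> sources = retriever.searchFromConversation(conversation);

        // Step 2: build the function arguments (sources and conversation as text).
        String sourcesText = String.join("\n", sources);
        String conversationText = conversation.stream()
                .map(turn -> turn.role + ": " + turn.content)
                .collect(Collectors.joining("\n"));

        // Step 3: generate the answer grounded in the retrieved sources.
        return generator.answerConversation(sourcesText, conversationText);
    }
}
```

Keeping retrieval and generation behind small interfaces makes the flow easy to test with fakes before wiring in real services.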

Slide 23

Slide 23 text

Implementing RAG Indexing flow

Slide 24

Slide 24 text

CLI AddCommand
• Picocli-based implementation
• Triggered by the deploy/app-service/scripts/prepdocs scripts
AddCommand: indexing running locally (App Service option)
1. Create Azure AI Search index fields
2. Scan the local directory
3. Use DocumentProcessor to orchestrate indexing
4. Upload the file to Azure Blob Storage for citation details
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/indexer/cli/src/main/java/com/microsoft/openai/samples/indexer/CLI.java

Slide 25

Slide 25 text

DocumentProcessor
Orchestrates the document processing logic:
1. Parse the PDF file into pages
2. Split pages into text chunks
3. Load text chunks into the Azure AI Search index
Text chunk embeddings are generated and stored in the Azure AI Search index.
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/indexer/core/src/main/java/com/microsoft/openai/samples/indexer/DocumentProcessor.java

Slide 26

Slide 26 text

PDF Parser with Document Intelligence
• Uses Azure Document Intelligence OCR capabilities
• Tabular data is converted to HTML table text
• For simple documents (no tabular data), use the local PDF parser ItextPdfParser.java
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/indexer/core/src/main/java/com/microsoft/openai/samples/indexer/parser/DocumentIntelligencePDFParser.java

Slide 27

Slide 27 text

Text Splitter
• Splits page text into smaller sections
• Handles section overlaps
• Handles text with HTML tables
• Section length and overlap can be configured. Defaults:
  • Section length: 1000 chars
  • Overlap length: 100 chars
https://github.com/Azure-Samples/azure-search-openai-demo-java/blob/main/app/indexer/core/src/main/java/com/microsoft/openai/samples/indexer/parser/TextSplitter.java
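As a rough illustration of section length and overlap, the sketch below slides a fixed-size window over the text. It is a simplification and not the repository's TextSplitter: it ignores sentence boundaries and HTML tables. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: fixed-size sections where neighbours share `overlap` characters. */
final class OverlapSplitterSketch {

    static List<String> split(String text, int sectionLength, int overlap) {
        if (sectionLength <= 0 || overlap < 0 || overlap >= sectionLength) {
            throw new IllegalArgumentException("require 0 <= overlap < sectionLength");
        }
        List<String> sections = new ArrayList<>();
        int step = sectionLength - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + sectionLength, text.length());
            sections.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the window reached the end of the text
            }
        }
        return sections;
    }
}
```

With the defaults from the slide, split(pageText, 1000, 100) advances 900 characters per section, so the last 100 characters of one section reappear at the start of the next. The overlap reduces the chance that a sentence relevant to a query is cut in half at a section boundary.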

Slide 28

Slide 28 text

Index Manager
• Creates index resources in Azure AI Search
• Uploads text sections and their embeddings vectors to the index:
  • The text chunk is stored in the 'content' index field
  • The embeddings vector is stored in the 'embedding' index field
  • Original file name, page numbers and category are stored as additional metadata
EmbeddingService abstracts the use of the Azure OpenAI embedding model
• Embeddings requests are arranged in batches to improve performance
• Retries with an exponential backoff policy in case of HTTP throttling
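Batching and retry with exponential backoff can be sketched as follows. This is not the EmbeddingService code. ThrottledException is a hypothetical marker for a throttled response such as HTTP 429, and the batch size and delay values are left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Illustrative only: batch inputs and retry throttled calls with growing delays. */
final class EmbeddingBatchSketch {

    /** Hypothetical marker for a throttled response (e.g. HTTP 429). */
    static final class ThrottledException extends RuntimeException {}

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    /** Delay for a 0-based attempt: baseMillis * 2^attempt, capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        // The shift is limited to 20 so modest base delays cannot overflow.
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, maxMillis);
    }

    /** Calls the function, sleeping and retrying while it reports throttling. */
    static <I, O> O callWithRetry(Function<I, O> call, I input,
                                  int maxAttempts, long baseMillis, long maxMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.apply(input);
            } catch (ThrottledException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // retries exhausted
                }
                try {
                    Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", interrupted);
                }
            }
        }
    }
}
```

An indexer could pass each batch from toBatches through callWithRetry, so that a throttled batch is retried after a growing delay without failing the whole run. Production code often adds random jitter to the delay as well.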

Slide 29

Slide 29 text

Custom Data Ingestion/Indexing
• Put custom PDFs in the data folder.
• Indexing is not a delta (incremental) process.
• The indexer runs locally as a Java process when the App Service deployment is selected.
• The indexer runs as a microservice on the container orchestrator when ACA or AKS is selected.

Slide 30

Slide 30 text

Learn More
• Intro to intelligent apps, grounding, vector databases
• Prompt engineering
• Grounding LLMs
• Chunking large documents
• Semantic Kernel
• Hybrid Retrieval in Azure AI Search

Slide 31

Slide 31 text

Thanks