Slide 1

Slide 1 text

Building a RAG App to Chat with Your Data
Pamela Fox, Python Cloud Advocate
@pamelafox
pamelafox.org

Slide 2

Slide 2 text

LLMs: Large Language Models

Slide 3

Slide 3 text

LLM: Large Language Model
An LLM is a model that is so large that it achieves general-purpose language understanding and generation.

Input:
Review: This movie sucks.
Sentiment: negative
Review: I love this movie.
Sentiment:

Output (from the LLM):
positive
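The input on this slide is a few-shot prompt: labeled examples followed by one unlabeled item that the model completes. A minimal sketch of assembling such a prompt is below; the function name `build_few_shot_prompt` is an assumption made for illustration and is not from the talk.

```python
def build_few_shot_prompt(examples, new_review):
    """Format labeled examples plus one unlabeled review as a completion prompt."""
    lines = []
    for review, sentiment in examples:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {sentiment}")
    # The final "Sentiment:" is left blank for the model to fill in.
    lines.append(f"Review: {new_review}")
    lines.append("Sentiment:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    [("This movie sucks.", "negative")],
    "I love this movie.",
)
print(prompt)
```

The resulting string would be sent to a model as-is; the model's completion ("positive" on the slide) is the output.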

Slide 4

Slide 4 text

Popular LLMs

Creator | Models | Hosts
--- | --- | ---
OpenAI | GPT3.5, GPT4, GPT4o, ada-002, text-embedding-3 | OpenAI.com, Azure OpenAI, GitHub Models
Microsoft | Phi-3, Phi-3-mini, Phi-3.5 | Azure AI Models Catalog, GitHub Models
Cohere | Command R, R+, Embed | Azure AI Models Catalog, GitHub Models, Google Vertex AI
Mistral | Nemo, Small, Large | Azure AI Models Catalog, GitHub Models, Google Vertex AI
Meta | LlaMA 3.1-8B, 70B, 405B | Azure AI Models Catalog, GitHub Models, Google Vertex AI
Google | Gemini | Google Vertex AI
Anthropic | Claude 3 | AWS Bedrock, Google Vertex AI

https://azure.microsoft.com/products/ai-services/openai-service
https://azure.microsoft.com/products/ai-model-catalog
https://github.com/marketplace/models

Slide 5

Slide 5 text

Using LLMs locally
https://ollama.com/
Ollama is a tool for easily running local LLMs on your computer.
You can also run it from GitHub Codespaces:
aka.ms/ollama-python: Ollama Python Playground

Slide 6

Slide 6 text

Using LLMs from Azure OpenAI

POST https://SERVICE_NAME.openai.azure.com/openai/deployments/DEPLOYMENT_NAME/chat/completions?api-version=2024-02-15-preview
Authorization: Bearer AUTH_TOKEN
Content-Type: application/json

{
  "messages": [
    {"role": "system", "content": "You are an AI assistant that loves emojis."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "max_tokens": 800,
  "temperature": 0.7,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "top_p": 0.95,
  "stop": null
}

https://github.com/pamelafox/python-openai-demos
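The same request can be assembled from Python. The sketch below uses only the standard library and builds the request without sending it, since sending needs a real service name, deployment, and token. The helper name `build_chat_request` is an assumption for illustration; the linked python-openai-demos repository may use a different approach.

```python
import json
import urllib.request


def build_chat_request(service_name, deployment_name, auth_token, question,
                       api_version="2024-02-15-preview"):
    """Build (but do not send) the chat completions request shown on the slide."""
    url = (
        f"https://{service_name}.openai.azure.com/openai/deployments/"
        f"{deployment_name}/chat/completions?api-version={api_version}"
    )
    body = {
        "messages": [
            {"role": "system", "content": "You are an AI assistant that loves emojis."},
            {"role": "user", "content": question},
        ],
        "max_tokens": 800,
        "temperature": 0.7,
        "frequency_penalty": 0,
        "presence_penalty": 0,
        "top_p": 0.95,
        "stop": None,  # serialized as JSON null
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "SERVICE_NAME", "DEPLOYMENT_NAME", "AUTH_TOKEN", "What is the capital of France?"
)

# Sending is left commented out because the placeholders above are not real:
# with urllib.request.urlopen(request) as response:
#     answer = json.load(response)["choices"][0]["message"]["content"]
```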

Slide 7

Slide 7 text

The limitations of LLMs
• Outdated public knowledge
• No internal knowledge

Slide 8

Slide 8 text

Integrating domain knowledge
Fine tuning: Learn new skills (permanently)
Retrieval Augmented Generation: Learn new facts (temporarily)
High cost, time

Slide 9

Slide 9 text

Retrieval Augmented Generation

Slide 10

Slide 10 text

RAG: Retrieval Augmented Generation

User Question: How fast is the Prius V?

Document Search:
vehicle | year | msrp | acceleration
--- | --- | --- | ---
Prius (1st Gen) | 1997 | 24509.74 | 7.46
Prius (2nd Gen) | 2000 | 26832.25 | 7.97
Prius (3rd Gen) | 2009 | 24641.18 | 9.6
Prius V | 2011 | 27272.28 | 9.51
Prius C | 2012 | 19006.62 | 9.35
Prius PHV | 2012 | 32095.61 | 8.82
Prius C | 2013 | 19080.0 | 8.7
Prius | 2013 | 24200.0 | 10.2
Prius Plug-in | 2013 | 32000.0 | 9.17

Large Language Model: The Prius V has an acceleration of 9.51 seconds from 0 to 60 mph.

Slide 11

Slide 11 text

RAG with Azure OpenAI GPT models

POST https://SERVICE_NAME.openai.azure.com/openai/deployments/DEPLOYMENT_NAME/chat/completions?api-version=2024-02-15-preview
Authorization: Bearer {{$dotenv TOKEN}}
Content-Type: application/json

{"messages": [
  {"role":"system",
   "content":"You are a helpful assistant that answers questions about cars based off a hybrid car data set. You must use the data set to answer the questions, you should not provide any info that is not in the provided sources. Sources are provided as a Markdown table."
  },
  {"role":"user",
   "content": "How fast is the Prius V?\n\nSources: vehicle | year | msrp | acceleration | mpg | class\n --- | --- | --- | --- | --- | --- |\nPrius (1st Gen) | 1997 | 24509.74 | 7.46 | 41.26 | Compact|\nPrius (2nd Gen) | 2000 | 26832.25 | 7.97...
  }]}

https://github.com/pamelafox/python-openai-demos/blob/main/retrieval_augmented_generation.py
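The request above is the core of RAG: search first, then paste the matching rows into the user message. Below is a minimal sketch of that assembly step, using a few rows from the earlier slide. The keyword search is a deliberately naive stand-in, and the function names are assumptions; the linked `retrieval_augmented_generation.py` may be organized differently.

```python
SYSTEM_PROMPT = (
    "You are a helpful assistant that answers questions about cars based off a "
    "hybrid car data set. You must use the data set to answer the questions, "
    "you should not provide any info that is not in the provided sources. "
    "Sources are provided as a Markdown table."
)

# A few rows copied from the slide; a real app would load the full data set.
ROWS = [
    {"vehicle": "Prius (1st Gen)", "year": 1997, "msrp": 24509.74, "acceleration": 7.46},
    {"vehicle": "Prius V", "year": 2011, "msrp": 27272.28, "acceleration": 9.51},
    {"vehicle": "Prius C", "year": 2012, "msrp": 19006.62, "acceleration": 9.35},
]


def search(question, rows):
    """Toy keyword search: keep rows whose vehicle name shares a word with the question."""
    words = set(question.lower().replace("?", "").split())
    return [row for row in rows if words & set(row["vehicle"].lower().split())]


def to_markdown_table(rows):
    """Render matching rows as the Markdown table the system prompt promises."""
    if not rows:
        return ""
    columns = list(rows[0].keys())
    lines = [" | ".join(columns), " | ".join("---" for _ in columns)]
    lines += [" | ".join(str(row[column]) for column in columns) for row in rows]
    return "\n".join(lines)


def build_rag_messages(question, rows):
    """Combine the question with retrieved sources into chat messages."""
    sources = to_markdown_table(search(question, rows))
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{question}\n\nSources: {sources}"},
    ]


messages = build_rag_messages("How fast is the Prius V?", ROWS)
```

The `messages` list would then go into the `"messages"` field of the chat completions request.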

Slide 12

Slide 12 text

The benefits of RAG
• Up-to-date public knowledge
• Access to internal knowledge

Slide 13

Slide 13 text

RAG in the wild
• GitHub Copilot (RAG on your VSCode workspace)
• Teams Copilot (RAG on your chats)
• Bing Copilot (RAG on the web)

Slide 14

Slide 14 text

Building a RAG app

Slide 15

Slide 15 text

Types of RAG

RAG flows:
• Simple RAG
• Advanced RAG with Query rewriting

RAG sources:
• Structured data (DB tables)
• Documents (PDF, MD, etc.)

Slide 16

Slide 16 text

Simple RAG

User Question: What's the best shoe for hiking?

Search:
[101]: Name: TrekExtreme Hiking Shoes Price: 135.99 Brand: Raptor Elite Type: Footwear Description: The Trek Extreme hiking shoes by Raptor Elite are built to ensure any trail. …

Large Language Model: For great hiking shoes, consider the TrekExtreme Hiking Shoes [1] or the Trailblaze Steel-Blue Hiking Shoes [2]

Slide 17

Slide 17 text

Simple RAG (on PostgreSQL)
Azure OpenAI + Azure PostgreSQL Flexible Server + Azure Container Apps
Code: aka.ms/rag-postgres
Demo: aka.ms/rag-postgres/demo

Slide 18

Slide 18 text

Advanced RAG with query re-writing

User Question: what's a good shoe for a mountain trale?

Large Language Model (query rewriting): mountain trail shoe

Search:
[101]: Name: TrekExtreme Hiking Shoes Price: 135.99 Brand: Raptor Elite Type: Footwear Description: The Trek Extreme hiking shoes by Raptor Elite are built to ensure any trail. …

Large Language Model: For great hiking shoes, consider the TrekExtreme Hiking Shoes [1] or the Trailblaze Steel-Blue Hiking Shoes [2]
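The difference from simple RAG is one extra LLM call before the search. The sketch below shows only that control flow, with each step passed in as a function; the three `fake_*` helpers are hypothetical stand-ins so the example runs without a model or database, and none of these names come from the talk.

```python
def advanced_rag(question, rewrite_query, search, answer):
    """Run the two-step flow: rewrite the question, search, then answer from sources."""
    search_query = rewrite_query(question)  # first LLM call: clean up the question
    sources = search(search_query)          # retrieval uses the rewritten query
    return answer(question, sources)        # second LLM call: answer from the sources


# Hypothetical stand-ins for the LLM and the search index.
PRODUCTS = {
    101: "TrekExtreme Hiking Shoes: The Trek Extreme hiking shoes by Raptor Elite "
         "are built to ensure any trail.",
}


def fake_rewrite(question):
    # A real app would ask the LLM for a search query; this returns the slide's result.
    return "mountain trail shoe"


def fake_search(query):
    terms = query.lower().split()
    return [
        f"[{product_id}]: {text}"
        for product_id, text in PRODUCTS.items()
        if any(term in text.lower() for term in terms)
    ]


def fake_answer(question, sources):
    return f"Answering {question!r} using {len(sources)} source(s)."


result = advanced_rag(
    "what's a good shoe for a mountain trale?", fake_rewrite, fake_search, fake_answer
)
```

Passing the steps in as functions keeps the flow testable; swapping `fake_rewrite` and `fake_answer` for real model calls would not change `advanced_rag` itself.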

Slide 19

Slide 19 text

Advanced RAG (on PostgreSQL)

Slide 20

Slide 20 text

Query rewriting with function calling

User Question: Do you sell climbing gear cheaper than $30?

Large Language Model with function calling:
search_database( "climbing_gear", {"column": "price", "operator" : "<", "value" : "30" } )
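One way to get a structured call like this is to describe the function to the model as a tool and then validate the arguments it returns before touching the database. The sketch below assumes the OpenAI chat completions "tools" format; the argument names (`search_query`, `price_filter`) and the helper `tool_call_to_filter` are assumptions for illustration, not the schema used in the demo app.

```python
import json

# Hypothetical tool definition; the model would receive this in the "tools" field.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_database",
        "description": "Search the product database using a query and an optional price filter.",
        "parameters": {
            "type": "object",
            "properties": {
                "search_query": {"type": "string"},
                "price_filter": {
                    "type": "object",
                    "properties": {
                        "column": {"type": "string", "enum": ["price"]},
                        "operator": {"type": "string", "enum": ["<", "<=", ">", ">=", "="]},
                        "value": {"type": "string"},
                    },
                },
            },
            "required": ["search_query"],
        },
    },
}

ALLOWED_COLUMNS = {"price"}
ALLOWED_OPERATORS = {"<", "<=", ">", ">=", "="}


def tool_call_to_filter(arguments_json):
    """Turn the model's JSON arguments into a query, a WHERE fragment, and parameters."""
    args = json.loads(arguments_json)
    price_filter = args.get("price_filter")
    if price_filter is None:
        return args["search_query"], "", []
    column, operator = price_filter["column"], price_filter["operator"]
    # Never interpolate model output into SQL without checking it against an allow-list.
    if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
        raise ValueError(f"Unsupported filter: {column} {operator}")
    return args["search_query"], f"{column} {operator} %s", [float(price_filter["value"])]


query, where, params = tool_call_to_filter(
    '{"search_query": "climbing gear", '
    '"price_filter": {"column": "price", "operator": "<", "value": "30"}}'
)
```

The value is passed as a query parameter rather than formatted into the SQL string, so only the allow-listed column and operator ever reach the statement text.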

Slide 21

Slide 21 text

Advanced RAG with query rewriting via function calling

User Question: Do you sell climbing gear cheaper than $30?

LLM with function calling:
Input: “Do you sell…”
Output: “climbing gear”, price < 30

Search:
[12]: Name: SummitStone Chalk Bag Price: 29.99 Brand: Grolltex Type: Climbing Description: The SummitStone Chalk Bag in forest green is a must-have for climbers seeking adventure. …

LLM:
Input: “Do you sell…” plus the search results
Output: We offer 2 climbing bags for your budget: SummitStone Chalk Bag [1], Guardian Blue Chalk Bag [2]

Slide 22

Slide 22 text

RAG on documents

User Question: Do my company perks cover underwater activities?

Search:
PerksPlus.pdf#page=2: Some of the lessons covered under PerksPlus include: · Skiing and snowboarding lessons · Scuba diving lessons · Surfing lessons · Horseback riding lessons These lessons provide employees with the opportunity to try new things, challenge themselves, and improve their physical skills. …

Large Language Model: Yes, your company perks cover underwater activities such as scuba diving lessons [1]

Slide 23

Slide 23 text

RAG on documents
Azure OpenAI + Azure AI Search + Azure App Service
Supports simple and advanced flows (Ask tab vs. Chat tab)
Code: aka.ms/ragchat
Demo: aka.ms/ragchat/demo

Slide 24

Slide 24 text

RAG data sources

Documents (Unstructured data): PDFs, docx, pptx, md, html, images
• You need an ingestion process for extracting, splitting, vectorizing, and storing document chunks.
• You need a way to search the vectorized chunks.

Database rows (Structured data)
• You need a way to vectorize target columns with an embedding model.
• You need a way to search the vectorized rows.

+ LLM for answering the question based on the data
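The document path (split, vectorize, store, search) can be sketched in a few lines. The example below is a toy under stated assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, the index is an in-memory list rather than a search service, and all function names and chunk sizes are illustrative choices.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=40, overlap=10):
    """Split text into word windows that overlap so sentences are not cut off blindly."""
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, max(len(words) - overlap, 1), step)
    ]


def embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_chunks(question, index, top=3):
    """Return the chunks whose vectors are closest to the question's vector."""
    question_vector = embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(question_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top]]


document = (
    "PerksPlus covers skiing and snowboarding lessons. "
    "It also covers scuba diving lessons and surfing lessons. "
    "Horseback riding lessons are included too."
)
# "Storing" here is just a list of (chunk, vector) pairs.
index = [(chunk, embed(chunk)) for chunk in split_into_chunks(document, max_words=8, overlap=2)]
best = search_chunks("Are scuba diving lessons covered?", index, top=1)
```

For structured data the same `embed` and `search_chunks` steps would apply, with the text of the target columns taking the place of document chunks.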

Slide 25

Slide 25 text

RAG components

Component | Examples
--- | ---
Ingestion: Tools for processing data into a format that can be indexed and processed by an LLM | Azure: Document Intelligence. Local: PyMuPDF, BeautifulSoup
Retriever: A knowledge base that can efficiently retrieve sources that match a user query (ideally supports both vector and full-text search) | Azure: Azure AI Search, Azure CosmosDB. Local: PostgreSQL, Qdrant, Pinecone
LLM: A model that can answer questions based on the provided sources, and can include citations | OpenAI: GPT 3.5, GPT 4, GPT-4o. Azure AI Studio: Meta Llama3, Mistral, Cohere R+. Anthropic: Claude 3.5. Google: Gemini 1.5
Orchestrator (optional): A way to organize calls to the retriever and LLM | Community: Langchain, Llamaindex. Microsoft: Semantic Kernel, Autogen
Features | Chat history, Feedback buttons, Text-to-speech, User login, File upload, Access control, etc.
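When a retriever supports both vector and full-text search, the two result lists have to be merged somehow. The slide does not say how; one common technique is reciprocal rank fusion (RRF), sketched below as an illustration rather than as what any listed product does internally. The document IDs are made up, and `k=60` is a conventional default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each document scores 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from a full-text search and a vector search.
keyword_results = ["doc_a", "doc_b", "doc_c"]
vector_results = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([keyword_results, vector_results])
```

Documents that rank well in both lists rise to the top, which is why hybrid retrieval tends to be less noisy than vector search alone.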

Slide 26

Slide 26 text

Evaluating a RAG app

Slide 27

Slide 27 text

What's a high quality answer?

Question: Do the perks cover underwater activities?

Yes, underwater activities are included as part of the PerksPlus program. Some of the underwater activities covered under PerksPlus include scuba diving lessons [PerksPlus.pdf#page=3].

Yes, according to the information provided in the PerksPlus.pdf document, underwater activities such as scuba diving are covered under the program.

Yes, the perks provided by the PerksPlus Health and Wellness Reimbursement Program cover a wide range of fitness activities, including underwater activities such as scuba diving. The program aims to support employees' physical health and overall well-being, so it includes various lessons and experiences that promote health and wellness. Scuba diving lessons are specifically mentioned as one of the activities covered under PerksPlus. Therefore, if an employee wishes to pursue scuba diving as a fitness-related activity, they can expense it through the PerksPlus program.

• Is it clear and understandable?
• Is it correct? (relative to the knowledge base)
• Is it formatted in the desired manner?

Slide 28

Slide 28 text

Factors that affect answer quality

Question

Knowledge Search:
• Search engine
• Search mode (keyword, vector, ...)
• # of results returned
• Search query cleaning step
• Data preparation

Large Language Model:
• Model
• Temperature
• Max tokens
• Message history
• System prompt

Slide 29

Slide 29 text

Batch evaluation steps
1. Generate ground truth data (~200 QA pairs)
2. Evaluate with different parameters
3. Compare the metrics and answers across evaluations

Slide 30

Slide 30 text

Step 1: Generate ground truth data
The ground truth data is the ideal answer for a question. It can be generated synthetically, but manual curation is necessary!
Flow: Database results go to a Large Language Model, which produces Q/A pairs for human review.

Slide 31

Slide 31 text

Step 2: Evaluate app against ground truth
Compute LLM metrics and custom metrics for every question in ground truth. This can be run locally or in a CI workflow.
Flow: The azure-ai-evals SDK sends each question to the app endpoint, then sends the response + ground truth in a prompt to Azure OpenAI, which returns metrics.
Metrics: gpt_coherence, gpt_groundedness, gpt_relevance, length, citation_match, latency
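The GPT-based metrics need a model call, but the custom ones (length, citation_match, latency) are plain code. The sketch below shows one plausible way to compute them; it is not the azure-ai-evals SDK, the citation format is taken from the `[PerksPlus.pdf#page=3]` style seen earlier in the deck, and `fake_app` is a hypothetical stand-in for a call to the app endpoint.

```python
import re
import time

# Matches bracketed file citations such as [PerksPlus.pdf#page=3].
CITATION_PATTERN = re.compile(r"\[([^\[\]]+\.\w+(?:#page=\d+)?)\]")


def citation_match(response, ground_truth):
    """True when every citation in the ground truth also appears in the response."""
    expected = set(CITATION_PATTERN.findall(ground_truth))
    found = set(CITATION_PATTERN.findall(response))
    return expected.issubset(found)


def evaluate(app, ground_truth_rows):
    """Call the app once per question and record the custom metrics."""
    results = []
    for row in ground_truth_rows:
        start = time.perf_counter()
        response = app(row["question"])
        latency = time.perf_counter() - start
        results.append({
            "question": row["question"],
            "length": len(response),
            "citation_match": citation_match(response, row["truth"]),
            "latency": latency,
        })
    return results


def fake_app(question):
    # Stand-in for an HTTP call to the app endpoint.
    return "Yes, scuba diving lessons are covered [PerksPlus.pdf#page=3]."


ground_truth = [{
    "question": "Do the perks cover underwater activities?",
    "truth": "Yes, PerksPlus covers scuba diving lessons [PerksPlus.pdf#page=3].",
}]
results = evaluate(fake_app, ground_truth)
```

Because these metrics are deterministic, they are cheap to run on every question in a CI workflow alongside the slower GPT-based ones.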

Slide 32

Slide 32 text

Step 3: Compare metrics across evals
Results can be stored in a repo, a storage container, or a specialized tool like Azure AI Studio.
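Comparing runs usually comes down to averaging each metric per run and looking at the difference from a baseline. A minimal sketch is below; the two result sets are made-up numbers for illustration only, and the function names are assumptions.

```python
from statistics import mean


def summarize(results):
    """Average every metric across the questions in one evaluation run."""
    metric_names = results[0].keys()
    return {name: mean(float(row[name]) for row in results) for name in metric_names}


def compare(baseline, candidate):
    """Report candidate minus baseline for each metric (positive means higher)."""
    baseline_summary = summarize(baseline)
    candidate_summary = summarize(candidate)
    return {
        name: round(candidate_summary[name] - baseline_summary[name], 4)
        for name in baseline_summary
    }


# Illustrative values only; real runs would cover the full ground truth set.
baseline_run = [
    {"gpt_groundedness": 4, "citation_match": True},
    {"gpt_groundedness": 5, "citation_match": False},
]
candidate_run = [
    {"gpt_groundedness": 5, "citation_match": True},
    {"gpt_groundedness": 5, "citation_match": True},
]
delta = compare(baseline_run, candidate_run)
```

Reporting differences rather than raw scores keeps the focus on whether a change helped relative to the previous configuration.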

Slide 33

Slide 33 text

Lessons learnt from evaluations
• RAG on data that you know; LLMs can be convincingly wrong.
• What works better for 3 Qs doesn't always work better for 200.
• Don't trust absolute metrics, trust relative metrics.
• Vector search can be noisy! Use wisely.
• Your model choice makes a huge difference.
• Remove the fluff from prompts; it doesn't matter.

Slide 34

Slide 34 text

Resources

Slide 35

Slide 35 text

More RAG approaches
• GraphRAG: https://www.microsoft.com/research/project/graphrag/
• RAFT (RAG + FineTuning): https://github.com/ShishirPatil/gorilla/tree/main/raft https://github.com/Azure-Samples/raft-distillation-recipe
• Agentic RAG: https://www.youtube.com/live/aQ4yQXeB1Ss

Slide 36

Slide 36 text

RAGHack streams
Week 1: September 3 - 8 (calendar days: Tuesday 3 through Sunday 8)
Topics: RAG 101 .NET Azure AI Studio Python Langchain4J LangchainJS Responsible AI MongoDB Azure AI Search PostgreSQL Azure SQL GraphRAG Multi-channels
Week 2: September 9 - 14 (calendar days: Monday 9 through Sunday 15)
Topics: Semantic Kernel Spring AI Vision models Internationalization VSCode Agent Agentic RAG Code Interpreter AI Studio Advanced AutoGen Data Access Control Fine Tuning Model catalog Evaluations
aka.ms/raghack/streams

Slide 37

Slide 37 text

More topics to explore
• Vector embeddings: https://github.com/pamelafox/visual-exploration-vectors
• Fine-tuning small models: https://github.com/microsoft/Phi-3CookBook
• Building GPT models from scratch: https://www.youtube.com/@AndrejKarpathy https://sebastianraschka.com/books/
• Working with PyTorch: https://pytorchstepbystep.com/ https://learn.microsoft.com/training/paths/pytorch-fundamentals/
Ping me if you're looking for other resources or code samples! @pamelafox