Building a RAG-powered AI chat app with Python and VS Code

Building a RAG-powered AI chat app with Python and VS
Code aka.ms/rag-postgres-slides

Today we’ll discuss… •LLMs: Large Language Models •RAG: Retrieval Augmented
Generation •RAG on PostgreSQL

LLM: Large Language Model An LLM is a model that
is so large that it achieves general-purpose language understanding and generation. Review: This movie sucks. Sentiment: negative Review: I love this movie: Sentiment: Input LLM positive Output

Popular LLMs Model # of Parameters Creator Uses GPT 3.5
175 B OpenAI ChatGPT, Copilots, APIs GPT 4 Undisclosed OpenAI Gemini Undisclosed Google Bard Claude 2,3 130 B Anthropic APIs LlaMA 70 B Meta OSS Mistral-7B, Mixtral 7 B Mistral AI OSS Phi-3 3.8B Microsoft Research OSS

Using OpenAI LLMs: Python response = openai.chat.completions.create( messages = [
{ "role": "system", "content": "You are a helpful assistant with very flowery language" }, { "role": "user", "content": "what python frameworks does pamela use?" } ]) for event in response: print(event.choices[0].message.content) https://github.com/pamelafox/python-openai-demos

The limitations of LLMs Outdated public knowledge No internal knowledge

Incorporating domain knowledge Prompt engineering Fine tuning Retrieval Augmented Generation
In-context learning Learn new skills (permanently) Learn new facts (temporarily)

Retrieval Augmented Generation

RAG on documents Search deploying-django-apps-to-azure.html: Since I'm on the Python
team, I've been working on templates for the top Python frameworks, Django, Flask, and FastAPI.\nMy first finished template is for a Django app with PostgreSQL database that gets deployed to Azure App Service with a PostgreSQL flexible server. … Large Language Model Pamela uses the following Python frameworks: Flask 1, FastAPI 2, Django3 User Question What python frameworks does Pamela use?

The importance of the search step Garbage in, garbage out:
If the search results don’t contain a good answer, the LLM will be unable to answer or will answer wrongly. Noisy input: If the LLM receives too much information, it may not find the correct answer amidst the noise. Source: Lost in the Middle: How Language Models Use Long Contexts, Liu et al. arXiv:2307.03172 50 55 60 65 70 75 5 15 25 Accuracy Number of documents in input context

Optimal search strategy Vector Keywords Fusion (RRF) Reranking model Vector
search is best for finding semantically related matches Keyword search is best for exact matches (proper nouns, numbers, etc) Hybrid search combines vector search and keyword search, optimally using Reciprocal-Rank-Fusion for merging results and a ML model to re-rank results after aka.ms/ragrelevance

What is the RAG searching? Documents (Unstructured data) PDFs, docx,
pptx, md, html, images You need an ingestion process for extracting, splitting, vectorizing, and storing document chunks. Azure options: Azure AI Search with Document Intelligence, OpenAI embedding models, Integrated Vectorization Database rows (Structured data) PostgreSQL, MongoDB, Qdrant, etc. You need a way to vectorize & search target columns. Azure options: • PostgreSQL+pgvector • CosmosMongoDB+vector • Container Apps services (Milvus, Qdrant, Weaviate) + OpenAI embedding models aka.ms/ragchat aka.ms/rag-postgres

RAG on PostgreSQL aka.ms/rag-postgres

RAG on database rows Search [12]: Name: Summit Climbing Harness
Description: Conquer the highest peaks with the Raptor Elite Summit Climbing Harness. This durable and lightweight harness, available in bold red, provides maximum comfort and safety while scaling tricky routes. Price:109.99 … Large Language Model Consider these products: Summit Climbing Harness1 Apex Climbing Harness2 User Question Do you sell climbing gear?

Example database schema class Item(Base): __tablename__ = "items" id: Mapped[int]
= mapped_column(primary_key=True, autoincrement=True) type: Mapped[str] = mapped_column() brand: Mapped[str] = mapped_column() name: Mapped[str] = mapped_column() description: Mapped[str] = mapped_column() price: Mapped[float] = mapped_column() embedding: Mapped[Vector] = mapped_column(Vector(1536)) Defined with SQLAlchemy ORM:

RAG with hybrid search Embedding Model Large Language Model Consider
these products: Summit Climbing Harness1 Apex Climbing Harness2 User Question Do you sell climbing gear? [[0.0014615238, - 0.015594152, - 0.0072768144, - 0.012787478,…] “Do you sell…” “Do you sell…” [12]: Name: Summit Climbing Harness Description: Conquer the highest peaks with the Raptor Elite Summit Climbing Harness. This durable and lightweight harness, available in bold red, provides maximum comfort and safety while scaling tricky routes. Price:109.99 … “Do you sell…” Hybrid Search

Step 1: Vectorize user query embedding = await openai_client.embeddings.create( model=model_name,
input=user_query )

Step 2: Hybrid search of PostgreSQL table WITH semantic_search AS
( SELECT id, RANK () OVER (ORDER BY embedding <=> %(embedding)s) AS rank FROM items ORDER BY embedding <=> %(embedding)s LIMIT 20 ), keyword_search AS ( SELECT id, RANK () OVER (ORDER BY ts_rank_cd(to_tsvector('english’, description), query) DESC) FROM items, plainto_tsquery('english', %(query)s) query WHERE to_tsvector('english', description) @@ query ORDER BY ts_rank_cd(to_tsvector('english', description), query) DESC LIMIT 20 ) SELECT COALESCE(semantic_search.id, keyword_search.id) AS id, COALESCE(1.0 / (%(k)s + semantic_search.rank), 0.0) + COALESCE(1.0 / (%(k)s + keyword_search.rank), 0.0) AS score FROM semantic_search FULL OUTER JOIN keyword_search ON semantic_search.id = keyword_search.id ORDER BY score DESC LIMIT 5 https://github.com/pgvector/pgvector-python/blob/master/examples/hybrid_search_rrf.py

RAG with filters

RAG with filters Embedding Model Large Language Model We offer
2 climbing bags for your budget: SummitStone Chalk Bag 1 Guardian Blue Chalk Bag 2 User Question Do you sell climbing gear cheaper than $30? [[0.0014615238, - 0.015594152, - 0.0072768144, - 0.012787478,…] “Do you sell…” “Do you sell…” [12]: Name: SummitStone Chalk Bag Price:29.99 Brand:Grolltex Type:Climbing Description: The SummitStone Chalk Bag in forest green is a must- have for climbers seeking adventure. … “Do you sell…” Hybrid Search price < 30

Query rewriting phase User Question Do you sell climbing gear
cheaper than $30? Large Language Model with function calling search_database( "climbing_gear", {"operator" : "<", "value" : "30" } )

RAG with filters Embedding Model LLM We offer 2 climbing
bags for your budget: SummitStone Chalk Bag 1 Guardian Blue Chalk Bag 2 User Question Do you sell climbing gear cheaper than $30? [[0.0014615238, - 0.015594152, - 0.0072768144, - 0.012787478,…] ”climbing gear” “Do you sell…” [12]: Name: SummitStone Chalk Bag Price:29.99 Brand:Grolltex Type:Climbing Description: The SummitStone Chalk Bag in forest green is a must- have for climbers seeking adventure. … Hybrid Search price < 30 LLM with function calling “Do you sell…” “climbing gear”

Get started with RAG

RAG orchestration libraries Project Languages Langchain https://www.langchain.com/ Python, TypeScript, Java
Llamaindex https://docs.llamaindex.ai/ Python, TypeScript (Microsoft) Semantic Kernel https://github.com/microsoft/semantic-kernel Python, .NET (Microsoft) PromptFlow https://github.com/microsoft/promptflow Python

RAG chat app starter repositories GitHub repository Technologies pamelafox/rag-on-postgres Python
backend, React frontend, PostgreSQL Azure-Samples/azure-search-openai-demo Python backend, React frontend, Azure AI search Azure-Samples/azure-search-openai-javascript NodeJS backend, Web components frontend, Azure AI search Azure-Samples/azure-search-openai-demo-csharp .NET backend, Blazor Web Assembly frontend, Azure AI Search Azure-Samples/azure-search-openai-demo-java Java backend, React frontend, Azure AI Search microsoft/chat-copilot .NET backend w/Semantic Kernel, React frontend, Azure AI Search

Thank you! Any questions? Find me online at: @pamelafox pamelafox.org

Building a RAG-poweredAI chat appwith Python ...

Building a RAG-powered AI chat app with Python and VS Code

Pamela Fox

More Decks by Pamela Fox

Other Decks in Technology

Featured

Transcript