Slide 1

Slide 1 text

Proprietary + Confidential Aug 2022 Retrieval Augmented Generation —————————————— From dumb implementation to serious results @[email protected]

Slide 2

Slide 2 text

Proprietary + Confidential github.com/ datastaxdevs/ conference-2024-devoxx youtube.com/watch?v=RN7thifOmkI

Slide 3

Slide 3 text

Proprietary + Confidential LangChain4j

Slide 4

Slide 4 text

Proprietary + Confidential Introduction & Naïve RAG Ingestion Techniques Retrieval Techniques Function Calling, Agentic RAG Evaluation, Security, Data lifecycle, and beyond 01 02 03 04 05 Agenda

Slide 5

Slide 5 text

Proprietary + Confidential Introduction & naïve RAG 01

Slide 6

Slide 6 text

Proprietary + Confidential No knowledge about your data Work with a limited context window Limitations of Large Language Models Knowledge limited to their cut-off date Can produce hallucinations

Slide 7

Slide 7 text

Proprietary + Confidential RAG is a pattern to let you prompt an LLM with & about your data Retrieval Augmented Generation ❶ Retrieval — User asks a question, RAG retrieves relevant info from external sources ❷ Augmentation — Retrieved information is used to augment LLM's input, providing it with context and grounding ❸ Generation — LLM generates a response based on both its internal knowledge and the retrieved information

Slide 8

Slide 8 text

Proprietary + Confidential Reduced Hallucinations Less nonsensical or irrelevant outputs Benefits of RAG Improved Accuracy More accurate & reliable responses grounded in factual information Up-to-date Information Responses as fresh as your stored data, overcoming the limitation of model knowledge Explainability Retrieved sources help explain how the LLM generated its responses

Slide 9

Slide 9 text

Proprietary + Confidential RAG LLM Vector DB vector embeddings chunks DOCS calculate split store vector + chunk ❶ INGESTION

Slide 10

Slide 10 text

Proprietary + Confidential RAG Chatbot app LLM Vector DB vector embeddings chunks DOCS calculate prompt vector embedding split calculate find similar answer context + prompt + chunks store vector + chunk ❶ INGESTION ❷ RETRIEVAL

Slide 11

Slide 11 text

Proprietary + Confidential RAG Chatbot app LLM Vector DB vector embeddings chunks DOCS calculate prompt vector embedding split calculate find similar answer context + prompt + chunks store vector + chunk ❶ INGESTION ❷ RETRIEVAL Docs Loading & parsing

Slide 12

Slide 12 text

Proprietary + Confidential RAG Chatbot app LLM Vector DB vector embeddings chunks DOCS calculate prompt vector embedding split calculate find similar answer context + prompt + chunks store vector + chunk ❶ INGESTION ❷ RETRIEVAL TOO BIG? TOO SMALL?

Slide 13

Slide 13 text

Proprietary + Confidential RAG Chatbot app LLM Vector DB vector embeddings chunks DOCS calculate prompt vector embedding split calculate find similar answer context + prompt + chunks store vector + chunk ❶ INGESTION ❷ RETRIEVAL IS chunk CONTEXT RELEVANT?

Slide 14

Slide 14 text

Proprietary + Confidential RAG Chatbot app LLM Vector DB vector embeddings chunks DOCS calculate prompt vector embedding split calculate find similar answer context + prompt + chunks store vector + chunk ❶ INGESTION ❷ RETRIEVAL DO WE EMBED & STORE ALL THE INFO?

Slide 15

Slide 15 text

Proprietary + Confidential RAG Chatbot app LLM Vector DB vector embeddings chunks DOCS calculate prompt vector embedding split calculate find similar answer context + prompt + chunks store vector + chunk ❶ INGESTION ❷ RETRIEVAL IS A QUESTION CLOSE TO ITS ANSWER?

Slide 16

Slide 16 text

Proprietary + Confidential RAG Chatbot app LLM Vector DB vector embeddings chunks DOCS calculate prompt vector embedding split calculate find similar answer context + prompt + chunks store vector + chunk ❶ INGESTION ❷ RETRIEVAL DID THE LLM REALLY INCLUDE THE ANSWER?

Slide 17

Slide 17 text

Proprietary + Confidential Ingestion techniques 02

Slide 18

Slide 18 text

Proprietary + Confidential Ingestion LLM Vector DB vector embeddings chunks DOCS calculate split store vector + chunk

Slide 19

Slide 19 text

Proprietary + Confidential Ingestion LLM Vector DB vector embeddings chunks DOCS calculate split store vector + chunk Can we improve chunking?

Slide 20

Slide 20 text

Proprietary + Confidential ● Context expansion splitting ○ parent / child, sliding window ● Hypothetical Questions ○ generate relevant questions ● Contextual retrieval ○ recent article from Anthropic ● Semantic chunking ○ find semantic boundaries Chunking techniques

Slide 21

Slide 21 text

Proprietary + Confidential Illustration with a Wikipedia article about Berlin

Slide 22

Slide 22 text

Proprietary + Confidential Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union. Raw text

Slide 23

Slide 23 text

Proprietary + Confidential Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union. 100 characters split

Slide 24

Slide 24 text

Proprietary + Confidential Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union. 100 characters split, with 20 of overlap

Slide 25

Slide 25 text

Proprietary + Confidential Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union. Chunking by sentence

Slide 26

Slide 26 text

Proprietary + Confidential Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union. Parent (context) / child (embedding) chunking Embed sentence, but return context

Slide 27

Slide 27 text

Proprietary + Confidential Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union. Sentence sliding window chunking Embed sentence, but return context arxiv.org/pdf/2406.00456v1

Slide 28

Slide 28 text

Proprietary + Confidential Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union. Hypothetical Questions Embedding questions: ● What is the capital and largest city of Germany? ● What is the population of Berlin? ● Which state is Berlin located in? ● What is the name of the state surrounding Berlin? ● What is the name of the capital of the state surrounding Berlin? pixion.co/blog/rag-strategies-hypothetical-questions-hyde

Slide 29

Slide 29 text

Proprietary + Confidential Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union. Contextual Retrieval (by Anthropic) Embed chunk “in context”: Berlin's population within its city limits is the largest in the European Union. www.anthropic.com/news/contextual-retrieval

Slide 30

Slide 30 text

Proprietary + Confidential Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union. Semantic Chunking Use an embedding model, to find very dissimilar consecutive chunks. Break at the boundaries. @GregKamradt x.com/GregKamradt/status/1738276097471754735

Slide 31

Slide 31 text

Proprietary + Confidential Ingestion LLM Vector DB vector embeddings chunks DOCS calculate split store vector + chunk A better embedding model?

Slide 32

Slide 32 text

Proprietary + Confidential Ingestion LLM Vector DB vector embeddings chunks DOCS calculate split store vector + chunk A more powerful database?

Slide 33

Slide 33 text

Proprietary + Confidential Retrieval techniques 03

Slide 34

Slide 34 text

Proprietary + Confidential Retrieval Chatbot app LLM Vector DB prompt vector embedding calculate find similar answer context + prompt + chunks

Slide 35

Slide 35 text

Proprietary + Confidential Retrieval Chatbot app LLM Vector DB prompt vector embedding calculate find similar answer context + prompt + chunks Query compression

Slide 36

Slide 36 text

Proprietary + Confidential Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union. Query compression: a dialog is not a question Dialogue: - What is the capital of Germany? - Where is it situated? - How many inhabitants are there? Compressed query: - How many inhabitants live in Berlin? https://docs.langchain4j.dev/tutorials/rag/#query-transformer

Slide 37

Slide 37 text

Proprietary + Confidential Retrieval Chatbot app LLM Vector DB prompt vector embedding calculate find similar answer context + prompt + chunks Query transformation

Slide 38

Slide 38 text

Proprietary + Confidential Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union. Hypothetical Document Embedding User query: What is the population of Berlin? Hypothetical answer (provided by LLM): There are 3 million inhabitants in Berlin https://arxiv.org/pdf/2212.10496

Slide 39

Slide 39 text

Proprietary + Confidential Retrieval Chatbot app LLM prompt vector embedding calculate find similar answer context + prompt + chunks Vector DB Query routing web search

Slide 40

Slide 40 text

Proprietary + Confidential Retrieval Chatbot app LLM Vector DB prompt vector embedding calculate find similar answer context + prompt + chunks Metadata filtering

Slide 41

Slide 41 text

Proprietary + Confidential Retrieval Chatbot app LLM Vector DB prompt vector embedding calculate find similar answer context + prompt + chunks Reranking results

Slide 42

Slide 42 text

Proprietary + Confidential Retrieval Chatbot app LLM Vector DB prompt vector embedding calculate find similar answer context + prompt + chunks Response caching

Slide 43

Slide 43 text

Proprietary + Confidential Function calling, Agentic RAG 04 My name is AI, AI-007

Slide 44

Slide 44 text

Proprietary + Confidential Function calling Chatbot app Gemini What’s the weather like in Paris? It’s sunny in Paris! External API or service user prompt + getWeather(String) function contract call getWeather(“Paris”) for me please 󰚦 getWeather(“Paris”) {“forecast”:”sunny”} function response is {“forecast”:”sunny”} Answer: “It’s sunny in Paris!”

Slide 45

Slide 45 text

Proprietary + Confidential AI Agents Different types, and capabilities ● Reflection ○ Chain-of-Thought, self reflection & correction, self grading ● Planning ○ Create a multi-step plan of action ● Tool use ○ Multiple function calling ● Multi-agent collaboration ○ Chain several LLMs and/or RAG searches

Slide 46

Slide 46 text

Proprietary + Confidential Agentic RAG Berlin’s origins, population, geographic situation 🧠 Agentic Assistant 🧠 —————————————— 1) Identify topics 2) Create questions 3) RAG search 4) Collect answers & generate final report 🛠 History/Geography Tool 🛠 ————————————————— 1) Execute RAG search 2) Call topic assistant to summarize topic 🧠 Topic Assistant 🧠 —————————————— 1) Study topic answers 2) Create a report summary on the topic TOPICAL REPORTS FINAL REPORT Vector database TOPICAL REPORT

Slide 47

Slide 47 text

Proprietary + Confidential Evaluation, security, data lifecycle, and beyond 05

Slide 48

Slide 48 text

Proprietary + Confidential Many evaluation metrics: ● ROUGE (summarization) ● BLEU (translation) ● VIEScore (multimodal) ● Answer relevancy ● Faithfulness ● Contextual precision ● Contextual recall ● Contextual relevancy Evaluation is critical! ● Hallucination ● Tool correctness ● Bias ● Toxicity ● G-Eval ● Conversational metrics ○ Conversation completeness ○ Conversation relevancy ○ Knowledge retention RAGAS, DeepEVAL for inspiration

Slide 49

Slide 49 text

Proprietary + Confidential ● Prepare a dataset of questions and golden responses ● Use your RAG pipeline to answer those questions ● Use an LLM as a judge to gauge the quality of your RAG results, against a set of metrics LLM as Judge This generated response is correct!

Slide 50

Slide 50 text

Proprietary + Confidential OWASP Top 10 for LLM Applications

Slide 51

Slide 51 text

Proprietary + Confidential Security and Data Privacy ● Anonymize data (like Google Cloud Data Loss Prevention) ● Don’t log PII details ● Use local models when possible ● Separate tenants for compliance with data protection laws

Slide 52

Slide 52 text

Proprietary + Confidential ● Your data isn’t stale, it’s alive ● When a document is updated, ○ chunking has changed ○ old chunks need to be retired ● Chunk metadata should track document origin, last update timestamps or document versions ● Prepare an update schedule Data Lifecycle Data is staying alive

Slide 53

Slide 53 text

Proprietary + Confidential Time to conclude!

Slide 54

Slide 54 text

Proprietary + Confidential Mintaka: A complex, natural, and multilingual dataset for end-to-end question answering. arXiv preprint arXiv:2210.01613 There are easy questions… and hard ones! Type Description Example Yes/No Answer is a Yes or No Has Lady Gaga ever made a song with Ariana Grande? Comparative Compare 2 items by an attribute Is Mont Blanc taller than Mount Rainier? Generic Simple questions Where was Michael Phelps born? Intersection Requires multiple conditions Which movie was directed by Denis Villeneuve and stars Timothee Chalamet? Ordinal Based on item's position in a list Who was the last Ptolemaic ruler of Egypt? Count Answer requires counting How many astronauts have been elected to Congress? Difference Contains a negation Which Mario Kart game did Yoshi not appear in? Superlative Max or Min of given attribute Who was the youngest tribute in the Hunger Games? Multi-hop Requires multiple steps to answer Who was the quarterback of the team that won Super Bowl 50?

Slide 55

Slide 55 text

Proprietary + Confidential Lots of techniques, which one to pick?

Slide 56

Slide 56 text

Proprietary + Confidential Table of Contents (220 pages): • First Look at LangChain4j • Understanding LangChain4j • Getting Started • Accessing Models • Invoking Models • Extending Models • Processing Documents • Handling Embeddings • Retrieval-Augmented Generation • AI Services • Putting It All Together • Summary agoncal.teachable.com amazon.com/author/agoncal

Slide 57

Slide 57 text

Proprietary + Confidential Thanks for your attention (is all you need?) github.com/ datastaxdevs/ conference-2024-devoxx @[email protected]