RAG: from dumb implementation to serious results

Embarking on your RAG journey may seem effortless, but achieving satisfying results often proves challenging. Inaccurate, incomplete, or outdated answers, suboptimal document retrieval, and poor text chunking can quickly dampen your initial enthusiasm.

In this session, we'll leverage LangChain4j to elevate your RAG implementations. We'll explore:
* Advanced Chunking Strategies: Optimize document segmentation for improved context and relevance.
* Query Refinement Techniques: Expand and compress queries to enhance retrieval accuracy.
* Metadata Filtering: Leverage metadata to pinpoint the most relevant documents.
* Document Reranking: Reorder retrieved documents for optimal result presentation.
* Data Lifecycle Management: Implement processes to maintain data freshness and relevance.
* Evaluation and Presentation: Assess the effectiveness of your RAG pipeline and deliver results that meet user expectations.

Join us as we transform your simplistic RAG experience from one of frustration into one that delights your users with meaningful and accurate answers.

Guillaume Laforge

October 11, 2024

Transcript

  1. Proprietary + Confidential. Agenda
    01 Introduction & Naïve RAG
    02 Ingestion Techniques
    03 Retrieval Techniques
    04 Function Calling, Agentic RAG
    05 Evaluation, Security, Data lifecycle, and beyond
  2. Limitations of Large Language Models
    • No knowledge about your data
    • Work with a limited context window
    • Knowledge limited to their cut-off date
    • Can produce hallucinations
  3. Retrieval Augmented Generation: RAG is a pattern to let you prompt an LLM with & about your data
    ❶ Retrieval: User asks a question, RAG retrieves relevant info from external sources
    ❷ Augmentation: Retrieved information is used to augment LLM's input, providing it with context and grounding
    ❸ Generation: LLM generates a response based on both its internal knowledge and the retrieved information
  4. Benefits of RAG
    • Reduced Hallucinations: Less nonsensical or irrelevant outputs
    • Improved Accuracy: More accurate & reliable responses grounded in factual information
    • Up-to-date Information: Responses as fresh as your stored data, overcoming the limitation of model knowledge
    • Explainability: Retrieved sources help explain how the LLM generated its responses
  5. RAG, ❶ INGESTION (diagram): DOCS are split into chunks, vector embeddings are calculated for the chunks, and each vector + chunk is stored in the Vector DB
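  6. RAG, ❶ INGESTION and ❷ RETRIEVAL (diagram)
    ❶ INGESTION: DOCS are split into chunks, vector embeddings are calculated, and each vector + chunk is stored in the Vector DB
    ❷ RETRIEVAL: the chatbot app calculates the vector embedding of the prompt, finds similar chunks in the Vector DB, and sends context + prompt + chunks to the LLM, which returns the answer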
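
The two phases in the diagram can be sketched in a few lines of plain Java. This is a toy illustration, not the talk's implementation: the `NaiveRag` class and its methods are hypothetical names, and a bag-of-words count stands in for a real embedding model, so similarity here only reflects shared words.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NaiveRag {

    // Assumption: toy stand-in for an embedding model (word counts, not a real embedding).
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> vector = new HashMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) vector.merge(word, 1, Integer::sum);
        }
        return vector;
    }

    // Cosine similarity between two sparse word-count vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // In-memory "vector DB": chunk i is stored next to vector i.
    private final List<String> chunks = new ArrayList<>();
    private final List<Map<String, Integer>> vectors = new ArrayList<>();

    // Ingestion: calculate a vector per chunk and store vector + chunk.
    void ingest(List<String> newChunks) {
        for (String chunk : newChunks) {
            chunks.add(chunk);
            vectors.add(embed(chunk));
        }
    }

    // Retrieval: embed the question and return the k most similar chunks.
    List<String> retrieve(String question, int k) {
        Map<String, Integer> q = embed(question);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) order.add(i);
        order.sort(Comparator.comparingDouble((Integer i) -> -cosine(q, vectors.get(i))));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, order.size()); i++) result.add(chunks.get(order.get(i)));
        return result;
    }

    // Augmentation: the prompt that would be sent to the LLM.
    static String augment(String question, List<String> context) {
        return "Answer using only this context:\n" + String.join("\n", context)
                + "\n\nQuestion: " + question;
    }
}
```

The string returned by `augment` is what a real application would pass to its chat model; the generation step itself is left out of the sketch.
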
  7. Same RAG diagram, annotated: Docs Loading & parsing
  8. Same RAG diagram, annotated: TOO BIG? TOO SMALL?
  9. Same RAG diagram, annotated: IS chunk CONTEXT RELEVANT?
  10. Same RAG diagram, annotated: DO WE EMBED & STORE ALL THE INFO?
  11. Same RAG diagram, annotated: IS A QUESTION CLOSE TO ITS ANSWER?
  12. Same RAG diagram, annotated: DID THE LLM REALLY INCLUDE THE ANSWER?
  13. Ingestion (diagram: DOCS, split, chunks, calculate vector embeddings, store vector + chunk in the Vector DB). Can we improve chunking?
  14. Chunking techniques
    • Context expansion splitting
      ◦ parent / child, sliding window
    • Hypothetical Questions
      ◦ generate relevant questions
    • Contextual retrieval
      ◦ recent article from Anthropic
    • Semantic chunking
      ◦ find semantic boundaries
  15. Raw text: Berlin is the capital and largest city of Germany, both by area and by population. Its more than 3.85 million inhabitants make it the European Union's most populous city, as measured by population within city limits. The city is also one of the states of Germany, and is the third smallest state in the country in terms of area. Berlin is surrounded by the state of Brandenburg, and Brandenburg's capital Potsdam is nearby. The urban area of Berlin has a population of over 4.5 million and is therefore the most populous urban area in Germany. The Berlin-Brandenburg capital region has around 6.2 million inhabitants and is Germany's second-largest metropolitan region after the Rhine-Ruhr region, and the sixth-biggest metropolitan region by GDP in the European Union.
  16. 100 characters split (shown on the same Berlin passage as slide 15)
  17. 100 characters split, with 20 characters of overlap (shown on the same Berlin passage as slide 15)
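
A fixed-size split with overlap is simple enough to write by hand. The sketch below is a minimal illustration with a hypothetical `FixedSizeChunker` class; calling `split(text, 100, 20)` corresponds to the setting named on the slide. It cuts purely on character counts, so words and sentences can be broken in the middle.

```java
import java.util.ArrayList;
import java.util.List;

class FixedSizeChunker {

    // Splits text into chunks of at most `size` characters; consecutive chunks
    // share `overlap` characters. Requires 0 <= overlap < size.
    static List<String> split(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // the last chunk reached the end of the text
        }
        return chunks;
    }
}
```
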
  18. Chunking by sentence (shown on the same Berlin passage as slide 15)
  19. Parent (context) / child (embedding) chunking: embed sentence, but return context (shown on the same Berlin passage as slide 15)
  20. Sentence sliding window chunking: embed sentence, but return context (shown on the same Berlin passage as slide 15). arxiv.org/pdf/2406.00456v1
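
"Embed sentence, but return context" can be sketched as follows. The `SentenceWindowChunker` class, its `Chunk` type, and the `window` parameter are hypothetical names for this illustration, and the sentence splitter is deliberately naive (it breaks after `.`, `!` or `?` followed by whitespace). Each chunk keeps two texts: the single sentence to embed, and the sentence plus its neighbours to hand to the LLM.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceWindowChunker {

    // `toEmbed` is the sentence used for similarity search;
    // `context` is the wider text returned to the LLM when the sentence matches.
    static final class Chunk {
        final String toEmbed;
        final String context;

        Chunk(String toEmbed, String context) {
            this.toEmbed = toEmbed;
            this.context = context;
        }
    }

    // Naive splitter: breaks after sentence-ending punctuation followed by whitespace.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) result.add(s);
        }
        return result;
    }

    // For sentence i, the context spans sentences i - window .. i + window.
    static List<Chunk> chunk(String text, int window) {
        List<String> s = sentences(text);
        List<Chunk> chunks = new ArrayList<>();
        for (int i = 0; i < s.size(); i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(s.size(), i + window + 1);
            chunks.add(new Chunk(s.get(i), String.join(" ", s.subList(from, to))));
        }
        return chunks;
    }
}
```

Parent / child chunking follows the same idea, except that the returned context is a fixed parent block (for example the whole paragraph) rather than a window centred on the sentence.
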
  21. Hypothetical Questions (shown on the same Berlin passage as slide 15). Embedding questions:
    • What is the capital and largest city of Germany?
    • What is the population of Berlin?
    • Which state is Berlin located in?
    • What is the name of the state surrounding Berlin?
    • What is the name of the capital of the state surrounding Berlin?
    pixion.co/blog/rag-strategies-hypothetical-questions-hyde
  22. Contextual Retrieval (by Anthropic), shown on the same Berlin passage as slide 15. Embed chunk “in context”: Berlin's population within its city limits is the largest in the European Union. www.anthropic.com/news/contextual-retrieval
  23. Semantic Chunking (shown on the same Berlin passage as slide 15): use an embedding model to find very dissimilar consecutive chunks, and break at those boundaries. @GregKamradt x.com/GregKamradt/status/1738276097471754735
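
The boundary search can be sketched as below. This is a simplified illustration, not the method from the linked post: `SemanticChunker` is a hypothetical class, a bag-of-words count stands in for the embedding model, and a fixed similarity `threshold` is assumed where a real implementation would more likely derive the cut-off from the distribution of distances.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SemanticChunker {

    // Assumption: toy stand-in for an embedding model (word counts).
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> vector = new HashMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) vector.merge(word, 1, Integer::sum);
        }
        return vector;
    }

    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Groups consecutive sentences and starts a new chunk whenever the similarity
    // between a sentence and the previous one drops below `threshold`.
    static List<String> chunk(List<String> sentences, double threshold) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Map<String, Integer> previous = null;
        for (String sentence : sentences) {
            Map<String, Integer> vector = embed(sentence);
            if (previous != null && cosine(previous, vector) < threshold) {
                chunks.add(current.toString()); // semantic boundary: close the chunk
                current = new StringBuilder();
            }
            if (current.length() > 0) current.append(' ');
            current.append(sentence);
            previous = vector;
        }
        if (current.length() > 0) chunks.add(current.toString());
        return chunks;
    }
}
```
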
  24. Ingestion (same diagram as slide 13). A better embedding model?
  25. Ingestion (same diagram as slide 13). A more powerful database?
  26. Retrieval (diagram): the chatbot app calculates the vector embedding of the prompt, finds similar chunks in the Vector DB, and sends context + prompt + chunks to the LLM, which returns the answer
  27. Retrieval (same diagram as slide 26), annotated: Query compression
  28. Query compression: a dialog is not a question (shown next to the same Berlin passage as slide 15)
    Dialogue:
    - What is the capital of Germany?
    - Where is it situated?
    - How many inhabitants are there?
    Compressed query:
    - How many inhabitants live in Berlin?
    https://docs.langchain4j.dev/tutorials/rag/#query-transformer
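
One way to picture query compression is as a prompt that asks a model to rewrite the last message into a standalone question. The sketch below assumes a hypothetical `QueryCompressor` class and models the LLM as a plain `Function<String, String>` (prompt in, completion out); the prompt wording is illustrative and is not taken from LangChain4j.

```java
import java.util.List;
import java.util.function.Function;

class QueryCompressor {

    // Assumption: any chat model can be adapted to "prompt in, completion out".
    private final Function<String, String> llm;

    QueryCompressor(Function<String, String> llm) {
        this.llm = llm;
    }

    // Builds the rewriting prompt from the earlier turns and the last user message.
    static String buildPrompt(List<String> history, String followUp) {
        StringBuilder sb = new StringBuilder();
        sb.append("Rewrite the last user message as one standalone question ")
          .append("that can be understood without the conversation.\n\nConversation:\n");
        for (String turn : history) sb.append("- ").append(turn).append('\n');
        sb.append("\nLast user message: ").append(followUp).append("\nStandalone question:");
        return sb.toString();
    }

    // Returns the query to embed and search with.
    String compress(List<String> history, String followUp) {
        if (history.isEmpty()) return followUp; // nothing to resolve, keep the query as-is
        return llm.apply(buildPrompt(history, followUp)).trim();
    }
}
```

With the dialogue from the slide, a capable model would be expected to return something close to "How many inhabitants live in Berlin?", which is then used for retrieval instead of the ambiguous last message.
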
  29. Retrieval (same diagram as slide 26), annotated: Query transformation
  30. Hypothetical Document Embedding (shown next to the same Berlin passage as slide 15)
    User query: What is the population of Berlin?
    Hypothetical answer (provided by LLM): There are 3 million inhabitants in Berlin
    https://arxiv.org/pdf/2212.10496
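
The idea on this slide is that the hypothetical answer, not the question, is what gets embedded for the similarity search, since an answer-shaped text tends to sit closer to the stored chunks. A minimal sketch, assuming a hypothetical `Hyde` helper and modelling both the LLM and the embedding model as plain functions:

```java
import java.util.function.Function;

class Hyde {

    // Returns the vector to search with: the embedding of an LLM-written
    // hypothetical answer rather than the embedding of the question itself.
    static double[] queryVector(String question,
                                Function<String, String> llm,            // assumption: prompt -> completion
                                Function<String, double[]> embedder) {   // assumption: text -> vector
        String hypotheticalAnswer = llm.apply(
                "Write a short passage that answers the question.\nQuestion: " + question);
        return embedder.apply(hypotheticalAnswer);
    }
}
```

The hypothetical answer may be factually wrong (as in the slide's example); it is only used to locate similar chunks and is never shown to the user.
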
  31. Retrieval (same diagram as slide 26), annotated: Query routing (Vector DB, web search)
  32. Retrieval (same diagram as slide 26), annotated: Metadata filtering
  33. Retrieval (same diagram as slide 26), annotated: Reranking results
  34. Retrieval (same diagram as slide 26), annotated: Response caching
  35. Function calling (diagram: user, Chatbot app, Gemini, External API or service)
    • User to Chatbot app: "What's the weather like in Paris?"
    • Chatbot app to Gemini: user prompt + getWeather(String) function contract
    • Gemini to Chatbot app: call getWeather("Paris") for me please
    • Chatbot app to External API or service: getWeather("Paris"), which returns {"forecast":"sunny"}
    • Chatbot app to Gemini: function response is {"forecast":"sunny"}
    • Gemini to Chatbot app: Answer: "It's sunny in Paris!"
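
The application side of this exchange boils down to a lookup: the model names a function and an argument, the app runs the matching code and sends the result back. The sketch below uses a hypothetical `ToolDispatcher` class with single-string arguments for brevity; real frameworks pass structured arguments and handle the round trip with the model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ToolDispatcher {

    // Function name -> implementation. Arguments and results are plain strings here.
    private final Map<String, Function<String, String>> tools = new HashMap<>();

    void register(String name, Function<String, String> tool) {
        tools.put(name, tool);
    }

    // Executes the call the model asked for and returns the text sent back to the model.
    String dispatch(String functionName, String argument) {
        Function<String, String> tool = tools.get(functionName);
        if (tool == null) {
            return "{\"error\":\"unknown function " + functionName + "\"}";
        }
        return tool.apply(argument);
    }
}
```

For the slide's example, the app would register `getWeather`, and when the model requests `getWeather("Paris")`, `dispatch` returns the JSON that is forwarded to the model as the function response.
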
  36. AI Agents: different types, and capabilities
    • Reflection
      ◦ Chain-of-Thought, self reflection & correction, self grading
    • Planning
      ◦ Create a multi-step plan of action
    • Tool use
      ◦ Multiple function calling
    • Multi-agent collaboration
      ◦ Chain several LLMs and/or RAG searches
  37. Agentic RAG (diagram), example subject: Berlin's origins, population, geographic situation
    • 🧠 Agentic Assistant: 1) Identify topics 2) Create questions 3) RAG search 4) Collect answers & generate final report
    • 🛠 History/Geography Tool: 1) Execute RAG search 2) Call topic assistant to summarize topic
    • 🧠 Topic Assistant: 1) Study topic answers 2) Create a report summary on the topic
    • Other elements: Vector database, TOPICAL REPORTS, FINAL REPORT
  38. Evaluation is critical! Many evaluation metrics:
    • ROUGE (summarization)
    • BLEU (translation)
    • VIEScore (multimodal)
    • Answer relevancy
    • Faithfulness
    • Contextual precision
    • Contextual recall
    • Contextual relevancy
    • Hallucination
    • Tool correctness
    • Bias
    • Toxicity
    • G-Eval
    • Conversational metrics
      ◦ Conversation completeness
      ◦ Conversation relevancy
      ◦ Knowledge retention
    RAGAS, DeepEVAL for inspiration
  39. LLM as Judge
    • Prepare a dataset of questions and golden responses
    • Use your RAG pipeline to answer those questions
    • Use an LLM as a judge to gauge the quality of your RAG results, against a set of metrics
    (Judge's verdict shown on the slide: "This generated response is correct!")
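
The three steps above amount to a small evaluation loop. In the sketch below, `JudgeEvaluation`, the `Judge` interface, and the single accuracy score are assumptions made for illustration; in practice the judge would wrap an LLM call and could score several metrics rather than return one boolean.

```java
import java.util.Map;
import java.util.function.Function;

class JudgeEvaluation {

    // Assumption: the judge (typically an LLM behind this interface) compares
    // the generated answer with the golden one for a given question.
    interface Judge {
        boolean isCorrect(String question, String golden, String generated);
    }

    // Runs every question through the RAG pipeline and returns the share of
    // answers the judge accepts (0.0 for an empty dataset).
    static double accuracy(Map<String, String> goldenByQuestion,
                           Function<String, String> ragPipeline,
                           Judge judge) {
        if (goldenByQuestion.isEmpty()) return 0.0;
        int correct = 0;
        for (Map.Entry<String, String> e : goldenByQuestion.entrySet()) {
            String generated = ragPipeline.apply(e.getKey());
            if (judge.isCorrect(e.getKey(), e.getValue(), generated)) correct++;
        }
        return (double) correct / goldenByQuestion.size();
    }
}
```

Tracking this score across pipeline changes (chunk size, reranking, prompt wording) shows whether a change helped or hurt on the same dataset.
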
  40. Security and Data Privacy
    • Anonymize data (for example with Google Cloud Data Loss Prevention)
    • Don't log PII details
    • Use local models when possible
    • Separate tenants for compliance with data protection laws
  41. Data Lifecycle: Data is staying alive
    • Your data isn't stale, it's alive
    • When a document is updated,
      ◦ chunking has changed
      ◦ old chunks need to be retired
    • Chunk metadata should track document origin, last update timestamps or document versions
    • Prepare an update schedule
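
Retiring old chunks is straightforward once each chunk carries its document origin and version. The sketch below uses a hypothetical in-memory `ChunkStore`; with a real vector database the same step would be a delete filtered on the document id metadata, followed by inserting the freshly computed chunks.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkStore {

    // Chunk metadata: which document the chunk came from, and which version of it.
    static final class StoredChunk {
        final String documentId;
        final int version;
        final String text;

        StoredChunk(String documentId, int version, String text) {
            this.documentId = documentId;
            this.version = version;
            this.text = text;
        }
    }

    private final List<StoredChunk> chunks = new ArrayList<>();

    // Replaces all chunks of a document: the old ones are retired, the new ones stored.
    void upsertDocument(String documentId, int version, List<String> newChunks) {
        chunks.removeIf(c -> c.documentId.equals(documentId));
        for (String text : newChunks) {
            chunks.add(new StoredChunk(documentId, version, text));
        }
    }

    // Texts currently stored for one document, in insertion order.
    List<String> textsOf(String documentId) {
        List<String> result = new ArrayList<>();
        for (StoredChunk c : chunks) {
            if (c.documentId.equals(documentId)) result.add(c.text);
        }
        return result;
    }

    int size() {
        return chunks.size();
    }
}
```
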
  42. There are easy questions… and hard ones!
    Source: Mintaka: A complex, natural, and multilingual dataset for end-to-end question answering. arXiv preprint arXiv:2210.01613

    | Type | Description | Example |
    | Yes/No | Answer is a Yes or No | Has Lady Gaga ever made a song with Ariana Grande? |
    | Comparative | Compare 2 items by an attribute | Is Mont Blanc taller than Mount Rainier? |
    | Generic | Simple questions | Where was Michael Phelps born? |
    | Intersection | Requires multiple conditions | Which movie was directed by Denis Villeneuve and stars Timothee Chalamet? |
    | Ordinal | Based on item's position in a list | Who was the last Ptolemaic ruler of Egypt? |
    | Count | Answer requires counting | How many astronauts have been elected to Congress? |
    | Difference | Contains a negation | Which Mario Kart game did Yoshi not appear in? |
    | Superlative | Max or Min of given attribute | Who was the youngest tribute in the Hunger Games? |
    | Multi-hop | Requires multiple steps to answer | Who was the quarterback of the team that won Super Bowl 50? |
  43. Table of Contents (220 pages):
    • First Look at LangChain4j
    • Understanding LangChain4j
    • Getting Started
    • Accessing Models
    • Invoking Models
    • Extending Models
    • Processing Documents
    • Handling Embeddings
    • Retrieval-Augmented Generation
    • AI Services
    • Putting It All Together
    • Summary
    agoncal.teachable.com
    amazon.com/author/agoncal
  44. Thanks for your attention (is all you need?) github.com/datastaxdevs/conference-2024-devoxx @[email protected]