

"Talk to your data": Improving RAG solutions based on real-world experiences

Slides for my talk at Technical Summit

Sebastian Gingter

October 15, 2024


Transcript

  1. ‘Talk to your data’: Improving RAG solutions based on real-world experiences.
     Sebastian Gingter | Developer Consultant | Thinktecture AG | [email protected]
  2. Retrieval-augmented generation (RAG): indexing & (semantic) search.
     [Diagram. Indexing/embedding path: cleanup & split → text → embedding model → embedding → save to vector DB. QA path: question → embedding model → embedding → query vector DB → relevant text + question → LLM.]
     Improving RAG solutions based on real-world experiences. `Talk to your data`
  3. Vectors from your embedding model. [Chart: vector-space visualization.]
  4. Important: select your embedding model carefully for your use case, e.g.:
     ▪ intfloat/multilingual-e5-large-instruct: ~50% hit rate
     ▪ T-Systems-onsite/german-roberta-sentence-transformer-v2: <70% hit rate
     ▪ danielheinz/e5-base-sts-en-de: >80% hit rate
     ▪ Fine-tuning the embedding model might be an option
     ▪ As of now: treat embedding models as exchangeable commodities!
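Hit rates like the ones above come from checking, for a labeled set of queries, whether the expected chunk lands in the top results. A minimal sketch of such an evaluation, using a toy bag-of-words embedding as a stand-in for a real model (the vocabulary, documents, and `hit_rate` helper are illustrative, not from the talk):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hit_rate(labeled_queries, embed, index, k=1):
    """Fraction of queries whose expected chunk appears in the top-k results."""
    hits = 0
    for query, expected_id in labeled_queries:
        qv = embed(query)
        ranked = sorted(index, key=lambda e: cosine(qv, e["vector"]), reverse=True)
        if expected_id in [e["id"] for e in ranked[:k]]:
            hits += 1
    return hits / len(labeled_queries)

# Toy embedding: word counts over a tiny vocabulary (stand-in for a real model).
VOCAB = ["train", "ticket", "policy", "vacation"]
def toy_embed(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

index = [
    {"id": "d1", "vector": toy_embed("what to do when you missed the last train")},
    {"id": "d2", "vector": toy_embed("vacation policy and remaining vacation days")},
]
queries = [("last train missed", "d1"), ("how many vacation days", "d2")]
```

Swapping `toy_embed` for calls to two different real embedding models, while keeping the same labeled queries, gives exactly the kind of per-model comparison the slide reports.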
  5. Steps of indexing: ▪ Loading ▪ Clean-up ▪ Splitting ▪ Embedding ▪ Storing
  6. Loading:
     ▪ Import documents from different sources, in different formats
     ▪ LangChain has very strong support for loading data
     ▪ Support for cleanup
     ▪ Support for splitting
     https://python.langchain.com/docs/integrations/document_loaders
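Stripped to its essence, the loading step is reading files and keeping their origin as metadata so answers can cite their source later. A minimal plain-Python sketch (the `load_documents` helper and its dict shape are illustrative, not LangChain's actual `Document` API):

```python
from pathlib import Path

def load_documents(folder, pattern="*.txt"):
    """Read matching files and keep their origin as metadata."""
    docs = []
    for path in sorted(Path(folder).glob(pattern)):
        docs.append({
            "text": path.read_text(encoding="utf-8"),
            "metadata": {"source": str(path)},  # reference back to the file
        })
    return docs
```

Real loaders additionally handle formats like PDF, HTML, or Office documents; that is where LangChain's large catalogue of document-loader integrations pays off.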
  7. Clean-up:
     ▪ HTML tags
     ▪ Formatting information
     ▪ Normalization: lowercasing; stemming, lemmatization; removing punctuation & stop words
     ▪ Enrichment: tagging; keywords, categories; metadata
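A minimal sketch of such a clean-up pass in plain Python (the regex tag-stripping and the tiny stop-word list are illustrative; a real pipeline would use a proper HTML parser and a full stop-word list, and stemming/lemmatization needs a library such as NLTK or spaCy):

```python
import re
import string

STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to"}  # tiny illustrative list

def clean(text):
    """Strip HTML tags, lowercase, drop punctuation and stop words."""
    text = re.sub(r"<[^>]+>", " ", text)   # remove HTML tags
    text = text.lower()                    # normalization: lowercasing
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)
```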
  8. Splitting (text segmentation). The document is too large / has too much content / is not concise enough. Strategies:
     ▪ by size (text length)
     ▪ by character (\n\n)
     ▪ by paragraph, sentence, words (until small enough)
     ▪ by size (tokens)
     ▪ overlapping chunks (token-wise)
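The overlapping-chunks strategy can be sketched with a simple word-based splitter (real splitters work on characters or tokens; here `chunk_size` and `overlap` are counted in words for simplicity):

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split into overlapping chunks; sizes are in words, not tokens."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the end of the text
    return chunks
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk, which noticeably helps retrieval quality.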
  9. Indexing with vector databases. [Diagram: split (smaller) parts → embedding model → embedding (a, b, c, …) → vector database. Metadata: reference to the original document.]
  10. Retrieval. [Diagram: query “What is the name of the teacher?” → embedding model → embedding (a, b, c, …) → vector database → weighted result: Doc. 1: 0.86, Doc. 2: 0.84, Doc. 3: 0.79 → (answer generation).]
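The index-then-query flow of the last two slides can be condensed into a toy in-memory vector store (a stand-in for a real vector database such as Qdrant or Chroma; the class and the toy embedding in the usage below are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    """In-memory stand-in for a vector database."""
    def __init__(self, embed):
        self.embed = embed
        self.entries = []

    def add(self, text, metadata=None):
        """Index a chunk: store its embedding plus a reference to the source."""
        self.entries.append({"vector": self.embed(text), "text": text,
                             "metadata": metadata or {}})

    def search(self, query, k=3):
        """Return the top-k chunks by cosine similarity to the query."""
        qv = self.embed(query)
        scored = [(cosine(qv, e["vector"]), e) for e in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

The weighted result on the slide corresponds to the `(score, entry)` pairs `search` returns; the top-scoring chunks are what gets handed to the LLM for answer generation.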
  11. Indexing II: Not good enough?
  12. Not good enough? [Slide with question mark.]
  13. Not good enough?
      ▪ Semantic search is just search
      ▪ It’s only as good as your embeddings
      ▪ Garbage in -> garbage out
  14. HyDE (Hypothetical Document Embeddings): search for a hypothetical document. [Diagram: query “What should I do, if I missed the last train?” → LLM (e.g. GPT-3.5-turbo) with prompt “Write a company policy that contains all information which will answer the given question: {QUERY}” → hypothetical document → embedding model → embedding (a, b, c, …) → vector database → weighted result: Doc. 3: 0.86, Doc. 2: 0.81, Doc. 1: 0.81.] https://arxiv.org/abs/2212.10496
  15. What else? Downsides of HyDE:
      ▪ Each request needs to be transformed through an LLM (slow & expensive)
      ▪ Many requests will probably be very similar to each other
      ▪ Each time a different hypothetical document is generated, even for an extremely similar request
      ▪ This leads to very different results each time
      ▪ Idea: alternative indexing. Transform the document, not the query.
  16. Alternative indexing. HyQE: Hypothetical Question Embedding. [Diagram: chunk of document → LLM (e.g. GPT-3.5-turbo) with prompt “Write 3 questions, which are answered by the following document.” → transformed document → embedding model → embedding (a, b, c, …) → vector database. Metadata: content of the original chunk.]
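Sketched with the same kind of stubbed dependencies, HyQE moves the LLM call from query time to indexing time: each chunk is turned into a few hypothetical questions, and the original chunk travels along as metadata (the helper and the store's dict shape are illustrative assumptions):

```python
def index_with_hyqe(chunks, llm, embed, store):
    """HyQE: embed generated questions; keep the original chunk in metadata."""
    for chunk in chunks:
        prompt = ("Write 3 questions, which are answered by the "
                  f"following document:\n{chunk}")
        for question in llm(prompt):        # one LLM call per chunk, at indexing time
            store.append({
                "vector": embed(question),
                "metadata": {"original_chunk": chunk},  # what retrieval returns
            })
```

At query time the user's question is matched question-to-question, which is a much closer semantic fit than question-to-document, and the answer is generated from the original chunk stored in the metadata.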
  17. Alternative indexing: retrieval. [Diagram: query “What should I do, if I missed the last train?” → embedding model → embedding (a, b, c, …) → vector database → weighted result: Doc. 3: 0.89, Doc. 1: 0.86, Doc. 2: 0.76 → original document from metadata.]
  18. Demo: comparing embeddings (“Talk to your data”).
  19. Retrieval-augmented generation (RAG): indexing & (semantic) search. [Same overview diagram as slide 2.]
  20. Recap: Not good enough?
      ▪ Tune text cleanup, segmentation, splitting
      ▪ HyDE or HyQE or alternative indexing
      ▪ How many questions? With or without summary?
      ▪ Other approaches:
        ▪ only generate a summary
        ▪ extract the “intent” from user input and search by that
        ▪ transform document and query to a common search embedding
        ▪ HyKSS: Hybrid Keyword and Semantic Search https://www.deg.byu.edu/papers/HyKSS.pdf
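The hybrid idea behind HyKSS can be sketched as a weighted blend of a keyword score and an embedding score (the scoring formula and the `alpha` weight are illustrative, not the paper's actual ranking function):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query, doc, embed, alpha=0.5):
    """Blend exact keyword overlap with semantic (embedding) similarity."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    keyword = len(q_words & d_words) / max(len(q_words), 1)  # fraction of query words hit
    semantic = cosine(embed(query), embed(doc))
    return alpha * keyword + (1 - alpha) * semantic
```

Keyword matching catches exact terms (names, product codes) that embeddings sometimes blur, while the semantic part still finds paraphrases; blending both tends to be more robust than either alone.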
  21. Conclusion:
      ▪ Semantic search is a first and fast generative-AI business use case
      ▪ The quality of results depends heavily on data quality and the preparation pipeline
      ▪ Always evaluate approaches with your own data & queries
      ▪ The actual/final approach is more involved than it seems at first glance
      ▪ The RAG pattern can produce breathtakingly good results