Improving RAG solutions based on real-world experiences ("Talk to your data")

[Diagram: the basic RAG pipeline. Indexing/embedding path: text -> embedding model -> embedding -> save to vector DB. QA path: question -> embedding model -> query against the vector DB -> relevant text, passed together with the question to the LLM.]
Important: the embedding model matters. In our case, e.g.:
▪ intfloat/multilingual-e5-large-instruct: ~50% hit rate
▪ T-Systems-onsite/german-roberta-sentence-transformer-v2: <70% hit rate
▪ danielheinz/e5-base-sts-en-de: >80% hit rate
▪ Fine-tuning the embedding model might be an option
▪ As of now: treat embedding models as exchangeable commodities!
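Since the best-scoring model can change with the data, it helps to keep the model behind a single dispatch point. A minimal sketch of such an "exchangeable commodity" setup (the lambdas are toy stand-ins, not real embedders):

```python
# Sketch: hide embedding models behind one dispatch function so they stay
# exchangeable. The lambdas are toy stand-ins, not real models.
from typing import Callable, Dict, List

EmbedFn = Callable[[str], List[float]]

EMBEDDERS: Dict[str, EmbedFn] = {
    # In real code these would call e.g. sentence-transformers models.
    "intfloat/multilingual-e5-large-instruct": lambda text: [float(len(text)), 1.0],
    "danielheinz/e5-base-sts-en-de": lambda text: [float(len(text)), 2.0],
}

def embed(model_name: str, text: str) -> List[float]:
    """Swapping the model becomes a one-line configuration change."""
    return EMBEDDERS[model_name](text)
```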
Loading
▪ LangChain has very strong support for loading data
▪ Support for cleanup
▪ Support for splitting
https://python.langchain.com/docs/integrations/document_loaders
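For illustration, a stdlib-only stand-in for the load step: it reads files into simple records with `page_content` plus `metadata`, the same shape LangChain's document loaders produce. The `load_directory` helper is hypothetical, not a LangChain API.

```python
# Stdlib stand-in for the loading step: read files into Document records
# (page_content + metadata), mirroring the shape LangChain loaders return.
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_directory(path: str, pattern: str = "*.txt") -> list[Document]:
    docs = []
    for file in sorted(Path(path).glob(pattern)):
        text = file.read_text(encoding="utf-8").strip()  # trivial cleanup step
        docs.append(Document(page_content=text, metadata={"source": file.name}))
    return docs
```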
Splitting (Text Segmentation)
▪ by size (text length)
▪ by character (\n\n)
▪ by paragraph, sentence, words (until small enough)
▪ by size (tokens)
▪ overlapping chunks (token-wise)
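The last bullet, token-wise overlapping chunks, can be sketched as follows; whitespace tokens stand in for a real tokenizer (e.g. tiktoken), and the default sizes are arbitrary assumptions:

```python
# Sketch: size-based splitting with token-wise overlap. Whitespace tokens
# stand in for a real tokenizer; chunk_size/overlap defaults are arbitrary.
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    assert 0 <= overlap < chunk_size
    tokens = text.split()
    chunks = []
    step = chunk_size - overlap  # consecutive windows share `overlap` tokens
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):  # last window reached the end
            break
    return chunks
```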
[Diagram: query flow. The query "What is the name of the teacher?" is embedded and matched against the vector database; the ranked hits (Doc. 1: 0.86, Doc. 2: 0.84, Doc. 3: 0.79) form a weighted result that feeds answer generation.]
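The ranked hits in the diagram are typically cosine-similarity scores; a minimal sketch of that scoring step (a plain dict plays the vector database):

```python
# Sketch: rank stored chunks by cosine similarity to the query embedding,
# producing scored hits like the Doc. 1: 0.86 / Doc. 2: 0.84 list above.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 3):
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```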
Hypothetical Document Embeddings (HyDE), https://arxiv.org/abs/2212.10496

[Diagram: the query "What should I do, if I missed the last train?" is sent to an LLM, e.g. GPT-3.5-turbo, with the prompt "Write a company policy that contains all information which will answer the given question: {QUERY}". The resulting hypothetical document is run through the embedding model, and its embedding queries the vector database (Doc. 3: 0.86, Doc. 2: 0.81, Doc. 1: 0.81) to form the weighted result.]
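The flow in the diagram boils down to a few lines: the LLM writes a hypothetical answer document, and the search runs on its embedding instead of the raw query's. In this sketch `llm`, `embed` and `vector_db` are placeholder dependencies, not concrete APIs:

```python
# Sketch of the HyDE flow: embed a generated hypothetical document,
# not the user query. llm/embed/vector_db are injected placeholders.
HYDE_PROMPT = (
    "Write a company policy that contains all information "
    "which will answer the given question: {query}"
)

def hyde_search(query: str, llm, embed, vector_db, k: int = 3):
    hypothetical_doc = llm(HYDE_PROMPT.format(query=query))
    return vector_db.search(embed(hypothetical_doc), k=k)
```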
What else?
▪ Each query has to be transformed through an LLM (slow & expensive)
▪ A lot of requests will probably be very similar to each other
▪ Each time a different hypothetical document is generated, even for an extremely similar request
▪ Leads to very different results each time
▪ Idea: alternative indexing: transform the document, not the query
[Diagram: alternative indexing. A chunk of the document goes to an LLM with the prompt "Write 3 questions, which are answered by the following document." The transformed document (the generated questions) is run through the embedding model and stored in the vector database, with the content of the original chunk attached as metadata.]
[Diagram: query time. The query "What should I do, if I missed the last train?" is embedded and matched in the vector database (Doc. 3: 0.89, Doc. 1: 0.86, Doc. 2: 0.76); the weighted result returns the original document from the metadata.]
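Both sides of this variant fit in a short sketch: at indexing time the generated questions are embedded while the original chunk rides along as metadata, and at query time the best-matching question hands back that chunk. `llm`, `embed` and `similarity` are placeholder dependencies, and a plain list plays the vector store:

```python
# Sketch of question-based indexing: embed generated QUESTIONS, keep the
# original chunk as metadata, return the chunk at query time.
def index_chunk(chunk: str, llm, embed, store: list, n_questions: int = 3) -> None:
    prompt = (f"Write {n_questions} questions, which are answered "
              f"by the following document.\n{chunk}")
    for question in llm(prompt):
        store.append({"vector": embed(question), "metadata": {"chunk": chunk}})

def retrieve(user_question: str, embed, store: list, similarity) -> str:
    best = max(store, key=lambda entry: similarity(embed(user_question), entry["vector"]))
    return best["metadata"]["chunk"]  # answer generation sees the ORIGINAL chunk
```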
[Diagram: recap of the basic RAG pipeline (indexing/embedding and QA paths).]
Recap: Not good enough?
▪ HyQE or alternative indexing
  ▪ How many questions?
  ▪ With or without a summary?
▪ Other approaches
  ▪ Only generate a summary
  ▪ Extract the "intent" from the user input and search by that
  ▪ Transform document and query into a common search embedding
  ▪ HyKSS: Hybrid Keyword and Semantic Search, https://www.deg.byu.edu/papers/HyKSS.pdf
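A hybrid keyword-plus-semantic score in the spirit of HyKSS might be sketched like this; the keyword measure and the 50/50 weighting are illustrative assumptions, not taken from the paper:

```python
# Sketch: blend a keyword-overlap score with a semantic similarity score.
# The overlap measure and the default alpha = 0.5 are arbitrary choices.
def hybrid_score(query: str, doc: str, semantic_sim: float, alpha: float = 0.5) -> float:
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    keyword = len(q_words & d_words) / len(q_words) if q_words else 0.0
    return alpha * keyword + (1.0 - alpha) * semantic_sim
```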
Conclusion
▪ `Talk to your data` is a relevant AI business use-case
▪ Quality of results depends heavily on data quality and the preparation pipeline
▪ Always evaluate approaches with your own data & queries
▪ The actual/final approach is more involved than it seems at first glance
▪ The RAG pattern can produce breathtakingly good results