Slide 1

‘Talk to your data’: Improving RAG solutions based on real-world experiences
Sebastian Gingter | Developer Consultant | Thinktecture AG | [email protected]

Slide 2

Introduction

Slide 3

Retrieval-augmented generation (RAG): Indexing & (semantic) search
[Diagram: Indexing/Embedding: text → cleanup & split → embedding model → embedding → save to vector DB. QA: question → embedding model → embedding → query vector DB → relevant text + question → LLM → answer]

Slide 4

Vectors from your embedding model
[Chart: visualization of the embedding vectors an embedding model produces]
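To make this concrete, here is a small sketch (not from the talk) that produces such a vector with the sentence-transformers library; the model name is one of the examples from the next slide.

```python
# A minimal sketch (assumption: sentence-transformers is installed) that
# shows the raw vector an embedding model produces for a piece of text.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("danielheinz/e5-base-sts-en-de")
vector = model.encode("What is the name of the teacher?")

print(vector.shape)  # e.g. (768,) - one float per embedding dimension
print(vector[:5])    # the first few components of the vector
```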

Slide 5

Important

▪ Select your embedding model carefully for your use case (a hit-rate measurement sketch follows the list), e.g.:
  ▪ intfloat/multilingual-e5-large-instruct: ~50% hit rate
  ▪ T-Systems-onsite/german-roberta-sentence-transformer-v2: <70% hit rate
  ▪ danielheinz/e5-base-sts-en-de: >80% hit rate
▪ Fine-tuning the embedding model might be an option
▪ As of now: treat embedding models as exchangeable commodities!
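A hedged sketch of how such hit rates could be measured on your own data: for each test question, check whether the expected chunk appears in the top-k results. `my_chunks` and `my_tests` are placeholders for your own evaluation set.

```python
# Hypothetical hit-rate evaluation for candidate embedding models.
from sentence_transformers import SentenceTransformer, util

def hit_rate(model_name: str, chunks: list[str],
             tests: list[tuple[str, int]], k: int = 3) -> float:
    """Fraction of test questions whose expected chunk is in the top-k."""
    model = SentenceTransformer(model_name)
    chunk_embeddings = model.encode(chunks, convert_to_tensor=True)
    hits = 0
    for question, expected_index in tests:
        query_embedding = model.encode(question, convert_to_tensor=True)
        scores = util.cos_sim(query_embedding, chunk_embeddings)[0]
        hits += expected_index in scores.topk(k).indices.tolist()
    return hits / len(tests)

# my_chunks: list of text chunks; my_tests: (question, index of expected chunk)
for name in ["intfloat/multilingual-e5-large-instruct",
             "danielheinz/e5-base-sts-en-de"]:
    print(name, hit_rate(name, my_chunks, my_tests))
```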

Slide 6

Indexing

Slide 7

Steps of indexing

▪ Loading
▪ Clean-up
▪ Splitting
▪ Embedding
▪ Storing

Slide 8

Loading

▪ Import documents from different sources, in different formats
▪ LangChain has very strong support for loading data (see the loader sketch below)
▪ Support for clean-up
▪ Support for splitting
https://python.langchain.com/docs/integrations/document_loaders
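A minimal loading sketch with one of LangChain's community document loaders; the file name is hypothetical, and PyPDFLoader additionally needs the pypdf package.

```python
# Load a PDF into LangChain Documents (one per page, with source metadata).
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("company-policies.pdf")  # hypothetical file
documents = loader.load()
print(len(documents), documents[0].metadata)
```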

Slide 9

Clean-up

▪ HTML tags
▪ Formatting information
▪ Normalization
  ▪ lowercasing
  ▪ stemming, lemmatization
  ▪ remove punctuation & stop words (sketch below)
▪ Enrichment
  ▪ tagging
  ▪ keywords, categories
  ▪ metadata
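A simple clean-up sketch, assuming HTML input and English text; for real pipelines, libraries such as BeautifulSoup (tags) and spaCy or NLTK (stemming, lemmatization, stop words) are the more robust choice.

```python
import re

# Tiny illustrative stop-word set; real pipelines use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "or"}

def clean(text: str) -> str:
    text = re.sub(r"<[^>]+>", " ", text)   # strip HTML tags
    text = text.lower()                    # normalization: lowercasing
    text = re.sub(r"[^\w\s]", " ", text)   # remove punctuation
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)                 # remove stop words
```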

Slide 10

Splitting (Text Segmentation)

▪ Document is too large / too much content / not concise enough
▪ by size (text length)
▪ by character (\n\n)
▪ by paragraph, sentence, words (until small enough)
▪ by size (tokens)
▪ overlapping chunks (token-wise; sketch below)
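A splitting sketch using LangChain's RecursiveCharacterTextSplitter, which falls back from paragraphs (\n\n) to sentences and words until the chunks are small enough; the sizes here are illustrative.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # maximum chunk size in characters
    chunk_overlap=50,  # overlapping chunks preserve context at boundaries
)
chunks = splitter.split_documents(documents)  # `documents` from the loading step
```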

Slide 11

Splitting (Semantic chunking)
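One way to implement semantic chunking is LangChain's experimental SemanticChunker, which splits where the embedding similarity between adjacent sentences drops; a sketch, assuming the langchain_experimental package and an OpenAI embedding model.

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# Splits at semantic breakpoints instead of fixed sizes.
chunker = SemanticChunker(OpenAIEmbeddings())
semantic_chunks = chunker.split_documents(documents)
```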

Slide 12

Indexing into vector databases
[Diagram: document → split (smaller) parts → embedding model → embedding (a, b, c, …) → vector database; metadata: reference to the original document]
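An indexing sketch with Chroma standing in for the vector database (any vector store works; this requires the langchain-chroma package).

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

db = Chroma.from_documents(
    documents=chunks,              # the split parts from the previous step
    embedding=OpenAIEmbeddings(),  # illustrative embedding model
    persist_directory="./index",   # where the index is persisted
)
# Each chunk's metadata keeps the reference to its original document.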

Slide 13

Retrieval (Search)

Slide 14

Retrieval
[Diagram: query “What is the name of the teacher?” → embedding model → embedding (a, b, c, …) → vector database → weighted result (Doc. 1: 0.86, Doc. 2: 0.84, Doc. 3: 0.79) → answer generation]
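The corresponding retrieval sketch, assuming the `db` built during indexing; note that Chroma returns distances here, so lower scores mean more similar.

```python
results = db.similarity_search_with_score("What is the name of the teacher?", k=3)
for document, score in results:
    print(score, document.page_content[:80])
```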

Slide 15

Indexing II: Not good enough?

Slide 16

Not good enough?

Slide 17

Not good enough?

▪ Semantic search is just search
▪ It’s only as good as your embeddings
▪ Garbage in -> garbage out

Slide 18

HyDE (Hypothetical Document Embeddings)

▪ Search for a hypothetical document instead of the raw query (sketch below)
[Diagram: query “What should I do, if I missed the last train?” → LLM (e.g. GPT-3.5-turbo) with prompt “Write a company policy that contains all information which will answer the given question: {QUERY}” → hypothetical document → embedding model → embedding (a, b, c, …) → vector database → weighted result (Doc. 3: 0.86, Doc. 2: 0.81, Doc. 1: 0.81)]
https://arxiv.org/abs/2212.10496
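A HyDE sketch following the slide's prompt, assuming the OpenAI Python client and the `db` vector store from before; model and prompt are illustrative.

```python
from openai import OpenAI

client = OpenAI()

def hyde_search(query: str, k: int = 3):
    # 1. Let the LLM write a hypothetical document that would answer the query.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Write a company policy that contains all information "
                       f"which will answer the given question: {query}",
        }],
    )
    hypothetical_document = completion.choices[0].message.content
    # 2. Search with the hypothetical document instead of the raw query.
    return db.similarity_search(hypothetical_document, k=k)

docs = hyde_search("What should I do, if I missed the last train?")
```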

Slide 19

What else?

▪ Downsides of HyDE:
  ▪ Each request needs to be transformed through an LLM (slow & expensive)
  ▪ Many requests will probably be very similar to each other
  ▪ Yet a different hypothetical document is generated each time, even for extremely similar requests
  ▪ This leads to very different results each time
▪ Idea: alternative indexing
  ▪ Transform the document, not the query

Slide 20

Alternative indexing: HyQE (Hypothetical Question Embedding)
[Diagram: chunk of document → LLM (e.g. GPT-3.5-turbo) with prompt “Write 3 questions, which are answered by the following document.” → transformed document → embedding model → embedding (a, b, c, …) → vector database; metadata: content of the original chunk]
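A HyQE indexing sketch along the lines of the diagram; `client` and `db` are the ones from the earlier sketches, and the prompt and metadata key are illustrative.

```python
from langchain_core.documents import Document

def index_chunk_hyqe(chunk: str):
    # 1. Let the LLM write questions that this chunk answers.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Write 3 questions, which are answered by the "
                       f"following document:\n\n{chunk}",
        }],
    )
    questions = completion.choices[0].message.content
    # 2. Embed the questions; keep the original chunk in the metadata.
    db.add_documents([Document(
        page_content=questions,              # this is what gets embedded
        metadata={"original_chunk": chunk},  # content of the original chunk
    )])
```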

Slide 21

Alternative indexing: Retrieval
[Diagram: query “What should I do, if I missed the last train?” → embedding model → embedding (a, b, c, …) → vector database → weighted result (Doc. 3: 0.89, Doc. 1: 0.86, Doc. 2: 0.76) → original document from metadata]
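Retrieval against such a HyQE index then searches with the user's question and returns the original chunks from the metadata; a sketch matching the indexing snippet above.

```python
def hyqe_search(query: str, k: int = 3):
    hits = db.similarity_search(query, k=k)            # match against questions
    return [hit.metadata["original_chunk"] for hit in hits]  # return originals

chunks_for_answer = hyqe_search("What should I do, if I missed the last train?")
```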

Slide 22

Demo: Comparing embeddings

Slide 23

Conclusion

Slide 24

Retrieval-augmented generation (RAG): Indexing & (semantic) search
[Diagram: RAG overview repeated from slide 3: indexing pipeline (text → cleanup & split → embedding model → save to vector DB) and QA pipeline (question → embedding model → query vector DB → relevant text + question → LLM)]

Slide 25

Recap: Not good enough?

▪ Tune text clean-up, segmentation, splitting
▪ HyDE or HyQE or other alternative indexing
  ▪ How many questions?
  ▪ With or without summary?
▪ Other approaches
  ▪ Only generate a summary
  ▪ Extract the “intent” from the user input and search by that
  ▪ Transform document and query into a common search embedding
  ▪ HyKSS: Hybrid Keyword and Semantic Search (a toy hybrid sketch follows below) https://www.deg.byu.edu/papers/HyKSS.pdf
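As a rough illustration of the hybrid idea (inspired by, but in no way implementing, HyKSS), a toy scorer that blends keyword overlap with the vector similarity:

```python
def hybrid_score(query: str, document_text: str,
                 vector_score: float, alpha: float = 0.5) -> float:
    """Blend of keyword overlap and semantic similarity (both in 0..1)."""
    terms = query.lower().split()
    keyword_score = sum(term in document_text.lower() for term in terms) / len(terms)
    return alpha * keyword_score + (1 - alpha) * vector_score
```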

Slide 26

Conclusion

▪ Semantic search is a first and fast generative AI business use case
▪ Quality of results depends heavily on data quality and the preparation pipeline
▪ Always evaluate approaches with your own data & queries
▪ The actual / final approach is more involved than it seems at first glance
▪ The RAG pattern can produce breathtakingly good results

Slide 27

Thank you!
Sebastian Gingter
https://thinktecture.com/sebastian-gingter
https://github.com/thinktecture-labs/talk-to-your-data