

AI’s Secret Weapon: Turning Documents into Knowledge (CWYD)

This talk is about Chat with your documents (CWYD) and how to turn documents into knowledge. It covers the basics of RAG and semantic search, and compares full-document RAG with chunked RAG. To ensure quality, it also takes a short detour into data-driven AI development.

Debut presentation at https://www.meetup.com/cloud-native-night/events/302568956

Alexander Eimer

September 05, 2024

Transcript

  1. About me • Senior Software Engineer @QAware GmbH • Currently

    working in an AI / RAG project at a large German corporation • Platform engineer who shifted to AI • Passionate about delivering quality in software and platforms QAware | 2
  2. QAware | 4 Bernd moves out of his flat and

    needs to know what the contract obliges him to do when he leaves his old flat. 🏚 ➡ 🏡
  3. What is “Chat with your documents”? QAware | 8 Using

    an LLM to get answers and findings from documents (like txt, pdf, docx or any other format with prose text)
  4. Why do we want CWYD? QAware | 9 Find, summarize

    and polish information It will fundamentally change the way we interact with information, as information gathering becomes faster and easier than ever before
  5. What is RAG - Retrieval Augmented Generation? ▪ Retrieval: Using

    some kind of search to find the relevant information for a given question. ▪ Augmented: Injecting those facts into the prompt to the Language Model alongside the question. ▪ Generation: The Language Model generates the answer and hopefully has facts it can refer to. QAware | 11
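The three steps above can be sketched in a few lines of Python. Everything here (the toy corpus, the keyword-based `retrieve`, the prompt shape) is an illustrative stand-in, not a real implementation:

```python
# Minimal sketch of the Retrieval-Augmented Generation loop.
# The corpus and retrieval logic are toy stand-ins for a real search engine.

CORPUS = {
    "rental agreement": "Paragraph 8: all keys must be returned in person to the landlord.",
    "company history": "QAware was founded in 2005; cloud work began in 2010.",
}

def retrieve(question: str) -> str:
    """Retrieval: find the document most relevant to the question (toy keyword match)."""
    for title, text in CORPUS.items():
        if any(word in question.lower() for word in title.split()):
            return text
    return ""

def build_prompt(question: str, context: str) -> str:
    """Augmentation: inject the retrieved facts into the prompt."""
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Generation would then be: answer = ai_model.complete(prompt)
question = "What does my rental agreement say about keys?"
prompt = build_prompt(question, retrieve(question))
```

The generation step is deliberately left as a comment, since it is just one call to whatever LLM client is in use.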
  6. Asking Wikipedia 🙈 Prompts often contain a few examples. Examples

    can be automatically retrieved from a database with document retrieval, sometimes using a vector database. Given a query, a document retriever is called to retrieve the most relevant documents. Wikipedia (en) - Prompt Engineering QAware | 12
  7. Chatbots and AI Assistants: Variants and stages ChatGPT or other

    chatbots with common knowledge → ChatGPT with organization-specific knowledge → Specialized AI Assistant ▪ Retrieval Augmented Generation ▪ Transfer learning ▪ Self-trained models ▪ Process automation (effort vs. benefit) ▪ Easy and cheap ▪ Needs guidelines on data protection & compliance
  8. Why shouldn’t I train an LLM instead? QAware | 15

    • Bad retrieval of detailed information • Inflexible, as training takes time • Access rights can’t be guaranteed • Expensive to train
  9. RAG example: rental agreement QAware | 16 What are my

    obligations when I want to move out of my flat? ai_model.complete(“ {{ system_prompt }} {{ chat_history }} {{ document }} {{ user_input }} “) You need to paint the walls in white color. Also, you need to make sure that all stains in the floor are removed and everything is swept clean before you leave.
  10. RAG example: Actual prompt QAware | 17 ai_model.complete(“ You are

    a friendly ChatBot called QAbot. You never get rude or disrespectful. --- Your chat history so far was: N/A --- The user has the following document as further input: Rental agreement This is a rental agreement between [...] --- The userinput is: What are my obligations when I want to move out of my flat? “)
  11. RAG example: rental agreement QAware | 18 What are my

    obligations when I want to move out of my flat? You need to paint the walls in white color. Also, you need to make sure that all stains in the floor are removed and everything is swept clean before you leave. Thanks, but do I also need to return the key or can I just throw it away? According to the rental agreement in paragraph 8 you need to return all keys in person to the landlord.
  12. RAG example: Second request QAware | 19 ai_model.complete(“ You are

    a friendly ChatBot called QAbot. You never get rude or disrespectful. --- Your chat history so far was: USER: What are my obligations when I want to move out of my flat? BOT: You need to paint the walls in white color. Also, you need to make sure that all stains in the floor are removed and everything is swept clean before you leave. --- The user has the following document as further input: Rental agreement This is a rental agreement between [...] --- The userinput is: Thanks, but do I also need to return the key or can I just throw it away? “)
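A minimal sketch of how such a prompt can be assembled from its parts. The function and parameter names mirror the placeholders on the slides ({{ system_prompt }}, {{ chat_history }}, {{ document }}, {{ user_input }}); the actual `ai_model.complete()` call is left out:

```python
# Assembling the prompt from the template parts shown on the slides.

def build_prompt(system_prompt: str, chat_history: str, document: str, user_input: str) -> str:
    return (
        f"{system_prompt}\n---\n"
        f"Your chat history so far was:\n{chat_history}\n---\n"
        f"The user has the following document as further input:\n{document}\n---\n"
        f"The userinput is:\n{user_input}"
    )

history = (
    "USER: What are my obligations when I want to move out of my flat?\n"
    "BOT: You need to paint the walls in white color."
)
prompt = build_prompt(
    "You are a friendly ChatBot called QAbot. You never get rude or disrespectful.",
    history,
    "Rental agreement\nThis is a rental agreement between [...]",
    "Thanks, but do I also need to return the key or can I just throw it away?",
)
```

Note how the chat history grows with every turn, so the second request carries the first answer along — that is all the "memory" the bot has.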
  13. There are two base types QAware | 21 Full context

    — the LLM has the full document in context: • The LLM gets the whole document as input • Only feasible from GPT-4-turbo on, as recall was too bad before • Allows “real” research, as all information is present • ⇒ Allows summarization • Possibly expensive Chunks / parts — the LLM has only parts of the document: • This is usually what is meant when people talk about RAG • Needs an external component to find the relevant parts of the document • No “real” research possible • ⇒ Can’t summarize the document • Challenge: finding the correct chunks • Possibly cheaper
  14. Full document only possible since good recall QAware | 23

    https://www.databricks.com/blog/long-context-rag-performance-llms
  15. Retrieval methods QAware | 24 • Retrieval Augmented Generation (RAG)

    • Retrieval methods ◦ Keyword search ◦ Semantic search ◦ APIs ▪ Confluence search ▪ Google ▪ Notion ▪ … ◦ Knowledge graphs ◦ HyDE ◦ RAPTOR ◦ … ⬅ most used
  16. Chunking • Cut input text into (useful) chunks ◦ Recursive

    chunking ◦ Semantic chunking • Each chunk should have one “semantics” ⇒ not mixing multiple topics • Chunks are given to the prompt ⇒ RAG QAware | 27
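Recursive chunking can be sketched like this, assuming a simple separator hierarchy. This toy version drops the separators themselves and is far simpler than real splitters (e.g. LangChain's RecursiveCharacterTextSplitter), but shows the idea of splitting on the coarsest boundary first:

```python
# Recursive chunking sketch: split on the largest separator first,
# then recurse into pieces that are still too long.

SEPARATORS = ["\n\n", "\n", ". ", " "]

def recursive_chunk(text: str, max_len: int = 200, seps=SEPARATORS) -> list[str]:
    if len(text) <= max_len or not seps:
        return [text]
    chunks = []
    for part in text.split(seps[0]):
        if len(part) <= max_len:
            chunks.append(part)
        else:
            # Part is still too long: recurse with the next-finer separator.
            chunks.extend(recursive_chunk(part, max_len, seps[1:]))
    return [c for c in chunks if c.strip()]
```

Splitting paragraph-first (and only then by sentence or word) is what keeps each chunk close to "one semantics" instead of mixing topics.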
  17. Chunking (simplified example) QAware unites people who share a passion

    for coding. We have a deep expertise in software engineering, and a keen eye for perfectly tailored solutions. Since 2005, we’ve been boosting corporations, medium-sized companies, and startups with exceptional quality and productivity in agile software engineering. Our journey as cloud pioneers began in 2010. We are asked when legacy applications need to be analyzed, stabilized or remediated. When they need to go lightweight in the cloud. And we are the ones who build new systems with craft pride, inventiveness and passion. You benefit from excellent code. From outstanding quality. From forward-thinking innovation. And from insights into the real levers for more impact. QAware | 28
  18. Chunking (simplified example) QAware unites people who share a passion

    for coding. We have a deep expertise in software engineering, and a keen eye for perfectly tailored solutions. Since 2005, we’ve been boosting corporations, medium-sized companies, and startups with exceptional quality and productivity in agile software engineering. Our journey as cloud pioneers began in 2010. We are asked when legacy applications need to be analyzed, stabilized or remediated. When they need to go lightweight in the cloud. And we are the ones who build new systems with craft pride, inventiveness and passion. You benefit from excellent code. From outstanding quality. From forward-thinking innovation. And from insights into the real levers for more impact. QAware | 29
  19. What is semantic “similarity”? QAware | 33 Semantic similarity is

    a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. Wikipedia (en) - semantic similarity
  20. It’s all about vectors [...] • Semantics can be represented

    as a big vector • Embedding models (deterministically) calculate a vector for a text input ◦ ada-002 is used very often (vector length: 1536) ◦ text-embedding-3-large is newer (vector length: 3072) • Cosine similarity as the search function ◦ Vectors “close together” have similar semantics ◦ Quite fast to calculate QAware | 34
  21. What is a vector? (simplified) • It’s very hard to

    explain • Naively speaking: each dimension of the vector represents one category • The input can be more than one word QAware | 35
  22. What is a vector? (simplified) QAware | 36

    Input          | Color | Yellow | Car  | BMW  | House
    Apple          | -0.7  |  0.3   | -1.0 | -1.0 | -1.0
    BMW x5         |  0.2  |  0.1   |  0.9 |  1.0 | -1.0
    Renault        |  0.2  |  0.3   |  0.9 | -0.8 | -1.0
    Traffic Light  |  0.5  |  0.8   |  0.4 | -0.2 | -1.0
    Heating system | -0.9  | -1.0   | -0.9 | -0.5 |  0.8
    Garage         | -0.8  | -0.9   |  0.8 |  0.2 |  0.8
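With a subset of the toy vectors from this table, cosine similarity can be computed directly; as expected, the two cars end up much closer to each other than to the heating system:

```python
import math

# Cosine similarity over toy vectors from the table above
# (dimensions: Color, Yellow, Car, BMW, House).

VECTORS = {
    "BMW x5":         [0.2, 0.1, 0.9, 1.0, -1.0],
    "Renault":        [0.2, 0.3, 0.9, -0.8, -1.0],
    "Heating system": [-0.9, -1.0, -0.9, -0.5, 0.8],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
```

For example, `cosine(VECTORS["BMW x5"], VECTORS["Renault"])` is clearly positive, while the similarity to the heating system is negative — the two cars share most dimensions.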
  23. Vector DBs • Vector databases store vectors with (meta)data •

    Search functions operate on the vectors • Examples ◦ Weaviate ◦ Chroma ◦ pgvector (Postgres) ◦ Cassandra QAware | 37
  24. Chunking + Semantics + Vectors + DB Bring it all

    together to get RAG: • Chunk the text to get usable parts for the prompt • Derive the semantics with an embedding model • Represent the semantic value as a vector • Store and find the vectors with their chunks in a DB QAware | 38
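The four steps can be tied together in a toy in-memory "vector DB". The `embed()` function here is a fake character-frequency stand-in for a real embedding model, so this only illustrates the plumbing (store vectors with chunks, search by cosine similarity), not real semantics:

```python
import math

def embed(text: str) -> list[float]:
    # FAKE embedding: character-frequency vector. A real model
    # (e.g. text-embedding-3-large) returns thousands of learned dimensions.
    return [float(text.lower().count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Toy in-memory store: vectors alongside their chunks."""

    def __init__(self):
        self.entries: list[tuple[list[float], str]] = []

    def add(self, chunk: str) -> None:
        self.entries.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 1) -> list[str]:
        qv = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(qv, e[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]
```

A real vector database adds persistence, metadata filters, and approximate-nearest-neighbor indexes on top of exactly this pattern.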
  25. SaaS Example: Azure AI Search QAware | 39 Phase 2:

    Retrieval Phase 1: Ingestion https://learn.microsoft.com/de-de/azure/search/vector-search-overview
  26. Full text vs Chunks QAware | 41 Which variant to

    choose? ⇒ It depends… It’s a trade-off between • Money ◦ Full context probably uses more input tokens ◦ Semantic search needs an embedding model and a database • Time ◦ Shorter input requests get answered faster ⇒ Let’s see if context caching becomes a thing soon?! ◦ Running a semantic search can also take some time • Effort ◦ Full context can be used by everybody ⇒ only needs the LLM ◦ Storing and finding chunks is more complex • Document size ◦ Full context only works if the document fits into the LLM’s max input context ◦ Documents of any size can be chunked
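A back-of-the-envelope version of the "money" trade-off. The token counts and the price per 1,000 input tokens below are made-up placeholders, not real vendor pricing, and the chunked side would additionally pay for embeddings and the database:

```python
# ASSUMED placeholder price, not real vendor pricing.
PRICE_PER_1K_INPUT_TOKENS = 0.01

def request_cost(input_tokens: int) -> float:
    """Cost of one request, based on input tokens only."""
    return input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

full_document_tokens = 80_000  # whole rental agreement + prompt (assumed)
chunked_tokens = 3_000         # a few retrieved chunks + prompt (assumed)

full_cost = request_cost(full_document_tokens)   # 0.80 per request
chunk_cost = request_cost(chunked_tokens)        # 0.03 per request
```

Per request the chunked variant is far cheaper here, but the fixed costs of embedding and storing all chunks have to be amortized over many requests before that pays off.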
  27. Research has not been idle: RAG Survey Paper Dec 2023

    QAware | 43 What is RAG? Naive RAG
  28. Why test AI / RAG systems? QAware | 45 How

    do you know • …your system prompt is good? • …that a change to the prompt improves the answers? • …that the correct information is retrieved? • …that the answers are friendly and not aggressive? • …that the answer quality does not regress over time?
  29. How to test AI/RAG systems? QAware | 47 • Choose

    the metrics to check • Create a set of test questions and ground truths • Set up a(nother) LLM to rate the answers • Enable the app to return the chunks • Run the tests (regularly)
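These steps can be sketched as a small test harness. Both the app under test and the judge are stubs here; in a real setup both would be LLM calls (e.g. wired up via RAGAS or promptfoo):

```python
# Sketch of the testing loop: run each test question through the app,
# then let a judge rate the answer against the ground truth.

TEST_SET = [
    {"question": "When was the first super bowl?",
     "ground_truth": "The first superbowl was held on January 15, 1967"},
]

def app(question: str) -> dict:
    # Stub: a real app would retrieve chunks and call the LLM,
    # returning the answer AND the chunks it used.
    return {"answer": "The first superbowl was held on January 15, 1967",
            "contexts": ["The Super Bowl....season since 1966,"]}

def judge(answer: str, ground_truth: str) -> float:
    # Stub judge: a real setup prompts another LLM for a 0..1 score.
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

def run_tests() -> float:
    """Average judge score over the whole test set."""
    scores = [judge(app(case["question"])["answer"], case["ground_truth"])
              for case in TEST_SET]
    return sum(scores) / len(scores)
```

Running this regularly (e.g. in CI) is what catches answer-quality regressions over time.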
  30. Which metrics can be used? QAware | 48 https://freeplay.ai/blog/using-llms-to-automatically-evaluate-rag-prompts-pipelines •

    Faithfulness How much hallucination? • Context Relevance How good was the retrieval? • Answer Relevance How good was the answer in relation to the question? • … There are many more
  31. Create a test set QAware | 49 Test set content

    • Question A question for the app to answer • Answer The answer the app’s LLM returned for the question • Contexts The used chunks given to the app’s LLM • Ground truth The correct answer for the question
  32. Example test data QAware | 50

    data_samples = {
        'question': [
            'When was the first super bowl?',
            'Who won the most super bowls?'],
        'answer': [
            'The first superbowl was held on January 15, 1967',
            'The most super bowls have been won by The New England Patriots'],
        'contexts': [
            ['The Super Bowl....season since 1966,', 'replacing the NFL...in February.'],
            ['The Green Bay Packers...Green Bay, Wisconsin.', 'The Packers compete...Football Conference']],
        'ground_truth': [
            'The first superbowl was held on January 15, 1967',
            'The New England Patriots have won the Super Bowl a record six times']
    }
    https://docs.ragas.io/en/stable/howtos/applications/data_preparation.html
  33. Frameworks QAware | 52 • RAGAS https://docs.ragas.io Framework to build

    your own tests • promptfoo https://www.promptfoo.dev UI-driven all-in-one application
  34. It does exist (... or not)! QAware | 54 Based

    on big-context LLMs and advanced RAG technology, the perfect search exists. So it is an academically solved issue! Well… only if there is enough time and money.
  35. QAware | 56 Bernd chose to use the full-document

    approach and is happy now. It was easy to use and it worked ad hoc.
  36. Key learnings • In the long term, RAG or similar

    technologies will revolutionise the way we search and research information. • Retrieval is a difficult topic. Especially semantic search is difficult to implement in practice. • Research is strong: with RAG, the solution space has exploded, but most of the advanced techniques won’t survive. • Following and evaluating the innovations is a full-time job.
  37. Q&A

  38. qaware.de QAware GmbH Mainz Rheinstraße 4 C 55116 Mainz Tel.

    +49 6131 21569-0 [email protected] twitter.com/qaware linkedin.com/company/qaware-gmbh xing.com/companies/qawaregmbh slideshare.net/qaware github.com/qaware