

AI’s Secret Weapon: Turning Documents into Knowledge (CWYD)

This talk is about Chat with your documents (CWYD) and how to turn documents into knowledge. It covers the basics of RAG and semantic search, and compares full-document RAG with chunked RAG. To ensure quality, it also takes a short detour into data-driven AI development.

Debut presentation at https://www.meetup.com/cloud-native-night/events/302568956

Alexander Eimer

September 05, 2024

Transcript

  1. About me • Senior Software Engineer @QAware GmbH • Currently

    working in an AI / RAG project at a large German corporation • Platform engineer who shifted to AI • Passionate about delivering quality in software and platforms QAware | 2
  2. QAware | 4 Bernd moves out of his flat and

    needs to know what the contract obliges him to do when he leaves his old flat. 🏚 ➡ 🏡
  3. What is “Chat with your documents”? QAware | 8 Using

    an LLM to get answers and findings from documents (like txt, pdf, docx or any other format with prose text)
  4. Why do we want CWYD? QAware | 9 Find, summarize

    and polish information It will fundamentally change the way we interact with information, as information gathering becomes faster and easier than ever before
  5. What is RAG - Retrieval Augmented Generation? ▪ Retrieval: Using

    some kind of search to find the relevant information for a given question. ▪ Augmented: Injecting those facts into the prompt to the Language Model alongside the question. ▪ Generation: The Language Model generates the answer and hopefully has facts it can refer to. QAware | 11
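The three steps above can be sketched in a few lines of Python. Everything here (the toy corpus, the keyword-based `retrieve`, the prompt shape) is an illustrative stand-in, not a real implementation:

```python
# Minimal sketch of the Retrieval-Augmented Generation loop.
# The corpus and retrieval logic are toy stand-ins for a real search engine.

CORPUS = {
    "rental agreement": "Paragraph 8: all keys must be returned in person to the landlord.",
    "company history": "QAware was founded in 2005; cloud work began in 2010.",
}

def retrieve(question: str) -> str:
    """Retrieval: find the document most relevant to the question (toy keyword match)."""
    for title, text in CORPUS.items():
        if any(word in question.lower() for word in title.split()):
            return text
    return ""

def build_prompt(question: str, context: str) -> str:
    """Augmentation: inject the retrieved facts into the prompt."""
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Generation would then be: answer = ai_model.complete(prompt)
question = "What does my rental agreement say about keys?"
prompt = build_prompt(question, retrieve(question))
```

The generation step is deliberately left as a comment, since it is just one call to whatever LLM client is in use.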
  6. Asking Wikipedia 🙈 Prompts often contain a few examples. Examples

    can be automatically retrieved from a database with document retrieval, sometimes using a vector database. Given a query, a document retriever is called to retrieve the most relevant documents. Wikipedia (en) - Prompt Engineering QAware | 12
  7. Chatbots and AI Assistants: Variants and stages ChatGPT or other

    chatbots with common knowledge → ChatGPT with organization-specific knowledge → Specialized AI Assistant ▪ Retrieval Augmented Generation ▪ Transfer learning ▪ Self-trained models ▪ Process automation (effort vs. benefit) ▪ Easy and cheap ▪ Needs guidelines on data protection & compliance
  8. Why shouldn’t I train an LLM instead? QAware | 15

    • Bad retrieval of detailed information • Inflexible, as training takes time • Access rights can’t be guaranteed • Expensive to train
  9. RAG example: rental agreement QAware | 16 What are my

    obligations when I want to move out of my flat? ai_model.complete(“ {{ system_prompt }} {{ chat_history }} {{ document }} {{ user_input }} “) You need to paint the walls in white color. Also, you need to make sure that all stains in the floor are removed and everything is swept clean before you leave.
  10. RAG example: Actual prompt QAware | 17 ai_model.complete(“ You are

    a friendly ChatBot called QAbot. You never get rude or disrespectful. --- Your chat history so far was: N/A --- The user has the following document as further input: Rental agreement This is a rental agreement between [...] --- The userinput is: What are my obligations when I want to move out of my flat? “)
  11. RAG example: rental agreement QAware | 18 What are my

    obligations when I want to move out of my flat? You need to paint the walls in white color. Also, you need to make sure that all stains in the floor are removed and everything is swept clean before you leave. Thanks, but do I also need to return the key or can I just throw it away? According to the rental agreement in paragraph 8 you need to return all keys in person to the landlord.
  12. RAG example: Second request QAware | 19 ai_model.complete(“ You are

    a friendly ChatBot called QAbot. You never get rude or disrespectful. --- Your chat history so far was: USER: What are my obligations when I want to move out of my flat? BOT: You need to paint the walls in white color. Also, you need to make sure that all stains in the floor are removed and everything is swept clean before you leave. --- The user has the following document as further input: Rental agreement This is a rental agreement between [...] --- The userinput is: Thanks, but do I also need to return the key or can I just throw it away? “)
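A minimal sketch of how such a prompt can be assembled from its parts. The function and parameter names mirror the placeholders on the slides ({{ system_prompt }}, {{ chat_history }}, {{ document }}, {{ user_input }}); the actual `ai_model.complete()` call is left out:

```python
# Assembling the prompt from the template parts shown on the slides.

def build_prompt(system_prompt: str, chat_history: str, document: str, user_input: str) -> str:
    return (
        f"{system_prompt}\n---\n"
        f"Your chat history so far was:\n{chat_history}\n---\n"
        f"The user has the following document as further input:\n{document}\n---\n"
        f"The userinput is:\n{user_input}"
    )

history = (
    "USER: What are my obligations when I want to move out of my flat?\n"
    "BOT: You need to paint the walls in white color."
)
prompt = build_prompt(
    "You are a friendly ChatBot called QAbot. You never get rude or disrespectful.",
    history,
    "Rental agreement\nThis is a rental agreement between [...]",
    "Thanks, but do I also need to return the key or can I just throw it away?",
)
```

Note how the chat history grows with every turn, so the second request carries the first answer along — that is all the "memory" the bot has.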
  13. There are two base types QAware | 21 Full context

    — the LLM has the full document in context: • The LLM gets the whole document as input • Only feasible from GPT-4-turbo on, as recall was too bad before • Allows “real” research, as all information is present • ⇒ Allows summarization • Possibly expensive Chunks / parts — the LLM has only parts of the document: • This is usually what is meant when people talk about RAG • Needs an external component to find the relevant parts of the document • No “real” research possible • ⇒ Can’t summarize the document • Challenge: finding the correct chunks • Possibly cheaper
  14. Full document only possible since good recall QAware | 23

    https://www.databricks.com/blog/long-context-rag-performance-llms
  15. Retrieval methods QAware | 24 • Retrieval Augmented Generation (RAG)

    • Retrieval methods ◦ Keyword search ◦ Semantic search ◦ APIs ▪ Confluence search ▪ Google ▪ Notion ▪ … ◦ Knowledge graphs ◦ HyDE ◦ RAPTOR ◦ … ⬅ most used
  16. Chunking • Cut input text into (useful) chunks ◦ Recursive

    chunking ◦ Semantic chunking • Each chunk should have one “semantics” ⇒ not mixing multiple topics • Chunks are given to the prompt ⇒ RAG QAware | 27
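Recursive chunking can be sketched like this, assuming a simple separator hierarchy. This toy version drops the separators themselves and is far simpler than real splitters (e.g. LangChain's RecursiveCharacterTextSplitter), but shows the idea of splitting on the coarsest boundary first:

```python
# Recursive chunking sketch: split on the largest separator first,
# then recurse into pieces that are still too long.

SEPARATORS = ["\n\n", "\n", ". ", " "]

def recursive_chunk(text: str, max_len: int = 200, seps=SEPARATORS) -> list[str]:
    if len(text) <= max_len or not seps:
        return [text]
    chunks = []
    for part in text.split(seps[0]):
        if len(part) <= max_len:
            chunks.append(part)
        else:
            # Part is still too long: recurse with the next-finer separator.
            chunks.extend(recursive_chunk(part, max_len, seps[1:]))
    return [c for c in chunks if c.strip()]
```

Splitting paragraph-first (and only then by sentence or word) is what keeps each chunk close to "one semantics" instead of mixing topics.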
  17. Chunking (simplified example) QAware unites people who share a passion

    for coding. We have a deep expertise in software engineering, and a keen eye for perfectly tailored solutions. Since 2005, we’ve been boosting corporations, medium-sized companies, and startups with exceptional quality and productivity in agile software engineering. Our journey as cloud pioneers began in 2010. We are asked when legacy applications need to be analyzed, stabilized or remediated. When they need to go lightweight in the cloud. And we are the ones who build new systems with craft pride, inventiveness and passion. You benefit from excellent code. From outstanding quality. From forward-thinking innovation. And from insights into the real levers for more impact. QAware | 28
  18. Chunking (simplified example) QAware unites people who share a passion

    for coding. We have a deep expertise in software engineering, and a keen eye for perfectly tailored solutions. Since 2005, we’ve been boosting corporations, medium-sized companies, and startups with exceptional quality and productivity in agile software engineering. Our journey as cloud pioneers began in 2010. We are asked when legacy applications need to be analyzed, stabilized or remediated. When they need to go lightweight in the cloud. And we are the ones who build new systems with craft pride, inventiveness and passion. You benefit from excellent code. From outstanding quality. From forward-thinking innovation. And from insights into the real levers for more impact. QAware | 29
  19. What is semantic “similarity”? QAware | 33 Semantic similarity is

    a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. Wikipedia (en) - semantic similarity
  20. It’s all about vectors [...] • Semantics can be represented

    as a big vector • Embedding models (deterministically) calculate a vector for a text input ◦ ada-002 is used very often (vector length: 1536) ◦ text-embedding-3-large is newer (vector length: 3072) • Cosine similarity as the search function ◦ Vectors “close together” have similar semantics ◦ Quite fast to calculate QAware | 34
  21. What is a vector? (simplified) • It’s very hard to

    explain • Naively speaking: each dimension of the vector represents one category • The input can be more than one word QAware | 35
  22. What is a vector? (simplified) QAware | 36

    Input          | Color | Yellow | Car  | BMW  | House
    Apple          | -0.7  |  0.3   | -1.0 | -1.0 | -1.0
    BMW x5         |  0.2  |  0.1   |  0.9 |  1.0 | -1.0
    Renault        |  0.2  |  0.3   |  0.9 | -0.8 | -1.0
    Traffic Light  |  0.5  |  0.8   |  0.4 | -0.2 | -1.0
    Heating system | -0.9  | -1.0   | -0.9 | -0.5 |  0.8
    Garage         | -0.8  | -0.9   |  0.8 |  0.2 |  0.8
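With a subset of the toy vectors from this table, cosine similarity can be computed directly; as expected, the two cars end up much closer to each other than to the heating system:

```python
import math

# Cosine similarity over toy vectors from the table above
# (dimensions: Color, Yellow, Car, BMW, House).

VECTORS = {
    "BMW x5":         [0.2, 0.1, 0.9, 1.0, -1.0],
    "Renault":        [0.2, 0.3, 0.9, -0.8, -1.0],
    "Heating system": [-0.9, -1.0, -0.9, -0.5, 0.8],
}

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
```

For example, `cosine(VECTORS["BMW x5"], VECTORS["Renault"])` is clearly positive, while the similarity to the heating system is negative — the two cars share most dimensions.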
  23. Vector DBs • Vector databases store vectors with (meta)data •

    Search functions operate on the vectors • Examples ◦ Weaviate ◦ Chroma ◦ pgvector (Postgres) ◦ Cassandra QAware | 37
  24. Chunking + Semantics + Vectors + DB Bring it all

    together to get RAG: • Chunk the text to get usable parts for the prompt • Derive the semantics with an embedding model • Represent the semantic value as a vector • Store and find the vectors with their chunks in a DB QAware | 38
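The four steps can be tied together in a toy in-memory "vector DB". The `embed()` function here is a fake character-frequency stand-in for a real embedding model, so this only illustrates the plumbing (store vectors with chunks, search by cosine similarity), not real semantics:

```python
import math

def embed(text: str) -> list[float]:
    # FAKE embedding: character-frequency vector. A real model
    # (e.g. text-embedding-3-large) returns thousands of learned dimensions.
    return [float(text.lower().count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Toy in-memory store: vectors alongside their chunks."""

    def __init__(self):
        self.entries: list[tuple[list[float], str]] = []

    def add(self, chunk: str) -> None:
        self.entries.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 1) -> list[str]:
        qv = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(qv, e[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]
```

A real vector database adds persistence, metadata filters, and approximate-nearest-neighbor indexes on top of exactly this pattern.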
  25. SaaS Example: Azure AI Search QAware | 39 Phase 2:

    Retrieval Phase 1: Ingestion https://learn.microsoft.com/de-de/azure/search/vector-search-overview
  26. Full text vs Chunks QAware | 41 Which variant to

    choose? ⇒ It depends… It’s a trade-off between • Money ◦ Full context probably uses more input tokens ◦ Semantic search needs an embedding model and a database • Time ◦ Shorter input requests get answered faster ⇒ Let’s see if context caching becomes a thing soon?! ◦ Running a semantic search can also take some time • Effort ◦ Full context can be used by everybody ⇒ only needs the LLM ◦ Storing and finding chunks is more complex • Document size ◦ Full context only works if the document fits into the LLM’s max input context ◦ Documents of any size can be chunked
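A back-of-the-envelope version of the "money" trade-off. The token counts and the price per 1,000 input tokens below are made-up placeholders, not real vendor pricing, and the chunked side would additionally pay for embeddings and the database:

```python
# ASSUMED placeholder price, not real vendor pricing.
PRICE_PER_1K_INPUT_TOKENS = 0.01

def request_cost(input_tokens: int) -> float:
    """Cost of one request, based on input tokens only."""
    return input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

full_document_tokens = 80_000  # whole rental agreement + prompt (assumed)
chunked_tokens = 3_000         # a few retrieved chunks + prompt (assumed)

full_cost = request_cost(full_document_tokens)   # 0.80 per request
chunk_cost = request_cost(chunked_tokens)        # 0.03 per request
```

Per request the chunked variant is far cheaper here, but the fixed costs of embedding and storing all chunks have to be amortized over many requests before that pays off.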
  27. Research has not been idle: RAG Survey Paper Dec 2023

    QAware | 43 What is RAG? Naive RAG
  28. Why test AI / RAG systems? QAware | 45 How

    do you know • …your system prompt is good? • …that a change to the prompt improves the answers? • …that the correct information is retrieved? • …that the answers are friendly and not aggressive? • …that the answer quality does not regress over time?
  29. How to test AI/RAG systems? QAware | 47 • Choose

    the metrics to check • Create a set of test questions and ground truths • Set up a(nother) LLM to rate the answers • Enable the app to return the chunks • Run the tests (regularly)
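These steps can be sketched as a small test harness. Both the app under test and the judge are stubs here; in a real setup both would be LLM calls (e.g. wired up via RAGAS or promptfoo):

```python
# Sketch of the testing loop: run each test question through the app,
# then let a judge rate the answer against the ground truth.

TEST_SET = [
    {"question": "When was the first super bowl?",
     "ground_truth": "The first superbowl was held on January 15, 1967"},
]

def app(question: str) -> dict:
    # Stub: a real app would retrieve chunks and call the LLM,
    # returning the answer AND the chunks it used.
    return {"answer": "The first superbowl was held on January 15, 1967",
            "contexts": ["The Super Bowl....season since 1966,"]}

def judge(answer: str, ground_truth: str) -> float:
    # Stub judge: a real setup prompts another LLM for a 0..1 score.
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

def run_tests() -> float:
    """Average judge score over the whole test set."""
    scores = [judge(app(case["question"])["answer"], case["ground_truth"])
              for case in TEST_SET]
    return sum(scores) / len(scores)
```

Running this regularly (e.g. in CI) is what catches answer-quality regressions over time.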
  30. Which metrics can be used? QAware | 48 https://freeplay.ai/blog/using-llms-to-automatically-evaluate-rag-prompts-pipelines •

    Faithfulness How much hallucination? • Context Relevance How good was the retrieval? • Answer Relevance How good was the answer in relation to the question? • … There are many more
  31. Create a test set QAware | 49 Test set content

    • Question A question for the app to answer • Answer The answer the app’s LLM returned for the question • Contexts The used chunks given to the app’s LLM • Ground truth The correct answer for the question
  32. Example test data QAware | 50

    data_samples = {
        'question': [
            'When was the first super bowl?',
            'Who won the most super bowls?'],
        'answer': [
            'The first superbowl was held on January 15, 1967',
            'The most super bowls have been won by The New England Patriots'],
        'contexts': [
            ['The Super Bowl....season since 1966,', 'replacing the NFL...in February.'],
            ['The Green Bay Packers...Green Bay, Wisconsin.', 'The Packers compete...Football Conference']],
        'ground_truth': [
            'The first superbowl was held on January 15, 1967',
            'The New England Patriots have won the Super Bowl a record six times']
    }
    https://docs.ragas.io/en/stable/howtos/applications/data_preparation.html
  33. Frameworks QAware | 52 • RAGAS https://docs.ragas.io Framework to build

    your own tests • promptfoo https://www.promptfoo.dev UI-driven all-in-one application
  34. It does exist (... or not)! QAware | 54 Based

    on big-context LLMs and advanced RAG technology, the perfect search exists. So it is an academically solved issue! Well… only if there is enough time and money.
  35. QAware | 56 Bernd chose to use the full-document

    approach and is happy now. It was easy to use and it worked ad hoc.
  36. Key learnings • In the long term, RAG or similar

    technologies will revolutionise the way we search and research information. • Retrieval is a difficult topic. Especially semantic search is difficult to implement in practice. • Research is strong: with RAG, the solution space has exploded, but most of the advanced techniques won’t survive. • Following and evaluating the innovations is a full-time job.
  37. Q&A

  38. qaware.de QAware GmbH Mainz Rheinstraße 4 C 55116 Mainz Tel.

    +49 6131 21569-0 [email protected] twitter.com/qaware linkedin.com/company/qaware-gmbh xing.com/companies/qawaregmbh slideshare.net/qaware github.com/qaware