
RAG from the trenches

Aletheia
October 15, 2024

Transcript

  1. Bounding RAG to your data
     RAG does LLM "grounding":
     - Provide relevant data to the application
     - Secure private data (compliance, confidentiality)
     - Gemini has a 1M-token context, so is it all done? 1M can still be short - the haystack effect (https://arxiv.org/html/2407.01370v1)

  2. How to split your data into pieces? Chunking
     • Fixed-size chunking (with overlap) - see the sketch below
     • Semantic chunking
     • Language-model-based chunking
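
A minimal sketch of the first option, fixed-size chunking with overlap, in Python. Character-based splitting and the chunk_size/overlap values are assumptions for illustration, not taken from the deck:

```python
# Fixed-size chunking with overlap (character-based, illustrative values).
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size]
            for start in range(0, len(text), step)
            if text[start:start + chunk_size]]

sample = "RAG pipelines split documents into chunks before embedding them. " * 20
print([len(chunk) for chunk in chunk_text(sample)])  # e.g. [500, 500, 420]
```
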
  3. Task types
     Optimized embeddings for various task types, such as document retrieval, question answering, and fact verification. Task types are labels that optimize the embeddings the model generates based on your intended use case, and they can improve embedding quality (see the sketch below).
     - textembedding-gecko@003
     - text-embedding-004
     - text-multilingual-embedding-002
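
A hedged sketch of requesting task-typed embeddings with the Vertex AI SDK and text-embedding-004; the project ID, region, and sample texts are placeholders:

```python
# Embed documents and queries with different task types so the vectors are
# optimized for retrieval-style matching.
import vertexai
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder project
model = TextEmbeddingModel.from_pretrained("text-embedding-004")

doc_input = TextEmbeddingInput(
    "MongoDB Atlas stores vector embeddings next to the source data.",
    task_type="RETRIEVAL_DOCUMENT",
)
query_input = TextEmbeddingInput(
    "Which database keeps embeddings alongside the original documents?",
    task_type="RETRIEVAL_QUERY",
)

doc_vector = model.get_embeddings([doc_input])[0].values
query_vector = model.get_embeddings([query_input])[0].values
print(len(doc_vector), len(query_vector))  # 768-dimensional vectors
```
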
  4. Search the knowledge base: use both keyword and semantic search
     • Vector search captures the semantic meaning of the query
     • Keyword search identifies exact matches for specific terms
     • Enhances RAG in domain-specific applications (medicine, legal, …)
     • Improves the accuracy of standard RAG by increasing focus (one way to combine the two result sets is sketched below)
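
One common way to combine keyword and vector hits is reciprocal rank fusion; the sketch below is a generic illustration (the deck does not prescribe a fusion method), with made-up result lists:

```python
# Reciprocal rank fusion (RRF) of two ranked result lists.
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists by summing 1 / (k + rank) for each document."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. from a full-text ($search) query
vector_hits = ["doc1", "doc4", "doc3"]   # e.g. from a $vectorSearch query
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # doc1 and doc3 rise to the top
```
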
  5. MongoDB Atlas Vector Search: embeddings and metadata stored alongside the data
     • Unified query interface
     • Store vector embeddings alongside the original data and metadata in Atlas (see the document shape sketched below)
     • Instant synchronization
     • Data isolation by design within the same organization
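
A sketch of what storing everything together can look like: a chunk document holding text, embedding, and metadata, plus an Atlas Vector Search index definition. Field names and dimensions are assumptions:

```python
# Hypothetical chunk document: text, embedding, and filterable metadata together.
chunk_document = {
    "text": "MongoDB Atlas stores vectors next to the source data.",
    "embedding": [0.01, -0.03, 0.12],  # truncated; 768 floats in practice
    "metadata": {"source": "handbook.pdf", "category": "database"},
}

# Matching Atlas Vector Search index definition for that document shape.
vector_index_definition = {
    "fields": [
        {"type": "vector", "path": "embedding",
         "numDimensions": 768, "similarity": "cosine"},
        {"type": "filter", "path": "metadata.category"},  # enables pre-filtering
    ]
}
```
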
  6. MongoDB Atlas Vector Search: perform hybrid queries
     • Create an Atlas Vector Search index on the embeddings field
     • Store vector embeddings in MongoDB Atlas
     • Supports popular frameworks: LangChain, LlamaIndex, Semantic Kernel, Haystack, Spring AI
     • Use aggregation pipelines to build complex queries (see the pipeline sketch below)
     • $vectorSearch must be the first stage of any pipeline where it appears
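
A sketch of such a pipeline with PyMongo, keeping $vectorSearch as the first stage; the connection string, database/collection names, index name, and query vector are placeholders:

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
collection = client["rag"]["chunks"]

query_vector = [0.02, -0.11, 0.07]  # embedding of the user query (768 floats in practice)

pipeline = [
    {   # $vectorSearch must be the first stage of the pipeline
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,  # overfetch candidates for better recall
            "limit": 10,
        }
    },
    {   # later stages can reshape results or surface the similarity score
        "$project": {"text": 1, "metadata": 1,
                     "score": {"$meta": "vectorSearchScore"}}
    },
]

for doc in collection.aggregate(pipeline):
    print(round(doc["score"], 3), doc["text"][:60])
```
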
  7. Filtering: enhance RAG accuracy with a traditional filtering step
     • Use a small and fast model to extract metadata
     • Gemma 2B performs fast metadata extraction from documents
     • Filter results based on metadata (pre- and post-retrieval); a pre-filter example is sketched below
     • Parse incoming queries and match them against labeled metadata
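
A sketch of pre-filtering inside $vectorSearch, assuming the metadata field was indexed with type "filter" (as in the index sketch above) and that the incoming query was already tagged, e.g. by a small model such as Gemma 2B:

```python
# Pre-filtering candidates by metadata before the vector similarity ranking.
filtered_vector_search = {
    "$vectorSearch": {
        "index": "vector_index",
        "path": "embedding",
        "queryVector": [0.02, -0.11, 0.07],  # placeholder query embedding
        "numCandidates": 100,
        "limit": 10,
        "filter": {"metadata.category": {"$eq": "database"}},  # tag from query parsing
    }
}
```

Post-filtering can instead be done with a $match stage after $vectorSearch, at the cost of possibly returning fewer than the requested number of results.
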
  8. Metadata extraction: how to generate metadata for filtering?
     • GLiNER (Generalist and Lightweight Named Entity Recognition) tags and labels chunks so that irrelevant chunks can be filtered out (non-matching chunks are not generated) - see the sketch below
     • Leverage Gemini 1.5 Pro capabilities: build a document summary; multi-modal processing to map spatial meaning
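
A sketch of tagging a chunk with the GLiNER Python package; the checkpoint and label set are illustrative choices, not prescribed by the deck:

```python
# Tag a chunk with entity labels, then turn the entities into metadata.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")

chunk = "The contract between Acme Corp and the City of Turin was signed in 2023."
labels = ["organization", "location", "date"]

entities = model.predict_entities(chunk, labels)
metadata = {e["label"]: e["text"] for e in entities}
print(metadata)  # e.g. {'organization': 'Acme Corp', 'location': 'Turin', 'date': '2023'}
```
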
  9. Query rewriting: enhance RAG accuracy with a query-rewriting step
     • Query tagging and filtering (GLiNER)
     • Rewrite-Retrieve-Read (Gemma 2B)
     • Trainable Rewrite-Retrieve-Read (DSPy) - sketched below
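
A minimal sketch of a query rewriter expressed as a DSPy module; the backing LM, signature fields, and example question are assumptions. The trainable variant mentioned in the deck would additionally run a DSPy optimizer over such a module against retrieval metrics:

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # placeholder; any supported LM works

class RewriteQuery(dspy.Signature):
    """Rewrite a user question into a focused retrieval query."""
    question = dspy.InputField()
    search_query = dspy.OutputField()

rewriter = dspy.Predict(RewriteQuery)
result = rewriter(question="that mongo vector thing, what has to come first in the pipeline?")
print(result.search_query)  # rewritten query is then sent to the retriever
```
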
  10. Fine-tune text embeddings
     • Domain-specific models (legal, medical, etc.)
     • Select an appropriate model (not every model is equal)
     • Fine-tune with positive and negative object pairs (see the sketch below)
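
A sketch of pair/triplet-based fine-tuning with sentence-transformers; the base model and the single toy triplet are placeholders, and a real run needs a much larger dataset:

```python
# Fine-tune an embedding model on (anchor, positive, negative) triplets.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder base model

train_examples = [
    InputExample(texts=[
        "What is the notice period in the lease?",            # anchor (query)
        "The lease may be terminated with 60 days' notice.",  # positive chunk
        "The warranty covers manufacturing defects only.",    # negative chunk
    ]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=1)
train_loss = losses.TripletLoss(model)  # pulls positives closer, pushes negatives away

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=0)
model.save("finetuned-domain-embeddings")
```
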
  11. Re-ranking: enhance RAG accuracy with a re-ranking step
     • Overfetch results from the retriever
     • Apply a re-ranker model
     • Reorder the results (see the sketch below)
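
A sketch of the overfetch-then-rerank pattern with a cross-encoder from sentence-transformers; the checkpoint, query, and candidate texts are illustrative:

```python
# Score each (query, chunk) pair with a cross-encoder and reorder by score.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "Which stage must come first in an Atlas vector search pipeline?"
candidates = [  # in practice, e.g. the top 50 overfetched results from $vectorSearch
    "$vectorSearch must be the first stage of any pipeline where it appears.",
    "Atlas keeps data and indexes instantly synchronized.",
    "Gemma 2B performs fast metadata extraction from documents.",
]

scores = reranker.predict([(query, c) for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]
print(reranked[0])  # the most relevant chunk is passed to the LLM first
```
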