
RAG from the trenches

Aletheia
October 15, 2024

Transcript

  1. Bounding RAG to your data
     RAG does LLM "grounding":
     - Provide relevant data to the application
     - Secure private data (compliance, confidentiality)
     - Gemini has a 1M-token context, so is it all done? 1M can still be short - the haystack effect (https://arxiv.org/html/2407.01370v1)

  2. How to split your data into pieces? Chunking
     • Fixed-size chunking (with overlap) - see the sketch below
     • Semantic chunking
     • Language-model-based chunking
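
A minimal sketch of the first option, fixed-size chunking with overlap, in Python. Character-based splitting and the chunk_size/overlap values are assumptions for illustration, not taken from the deck:

```python
# Fixed-size chunking with overlap (character-based, illustrative values).
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size]
            for start in range(0, len(text), step)
            if text[start:start + chunk_size]]

sample = "RAG pipelines split documents into chunks before embedding them. " * 20
print([len(chunk) for chunk in chunk_text(sample)])  # e.g. [500, 500, 420]
```
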
  3. Task types
     Optimized embeddings for various task types, such as document retrieval, question answering, and fact verification. Task types are labels that optimize the embeddings the model generates based on your intended use case, and they can improve embedding quality (see the sketch below).
     - textembedding-gecko@003
     - text-embedding-004
     - text-multilingual-embedding-002
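
A hedged sketch of requesting task-typed embeddings with the Vertex AI SDK and text-embedding-004; the project ID, region, and sample texts are placeholders:

```python
# Embed documents and queries with different task types so the vectors are
# optimized for retrieval-style matching.
import vertexai
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder project
model = TextEmbeddingModel.from_pretrained("text-embedding-004")

doc_input = TextEmbeddingInput(
    "MongoDB Atlas stores vector embeddings next to the source data.",
    task_type="RETRIEVAL_DOCUMENT",
)
query_input = TextEmbeddingInput(
    "Which database keeps embeddings alongside the original documents?",
    task_type="RETRIEVAL_QUERY",
)

doc_vector = model.get_embeddings([doc_input])[0].values
query_vector = model.get_embeddings([query_input])[0].values
print(len(doc_vector), len(query_vector))  # 768-dimensional vectors
```
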
  4. Search the knowledge base: use both keyword and semantic search
     • Vector search captures the semantic meaning of the query
     • Keyword search identifies exact matches for specific terms
     • Enhances RAG in domain-specific applications (medicine, legal, …)
     • Improves the accuracy of standard RAG by increasing focus (one way to combine the two result sets is sketched below)
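
One common way to combine keyword and vector hits is reciprocal rank fusion; the sketch below is a generic illustration (the deck does not prescribe a fusion method), with made-up result lists:

```python
# Reciprocal rank fusion (RRF) of two ranked result lists.
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists by summing 1 / (k + rank) for each document."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. from a full-text ($search) query
vector_hits = ["doc1", "doc4", "doc3"]   # e.g. from a $vectorSearch query
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # doc1 and doc3 rise to the top
```
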
  5. MongoDB Atlas Vector Search: embeddings and metadata stored alongside the data
     • Unified query interface
     • Store vector embeddings alongside the original data and metadata in Atlas (see the document shape sketched below)
     • Instant synchronization
     • Data isolation by design within the same organization
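
A sketch of what storing everything together can look like: a chunk document holding text, embedding, and metadata, plus an Atlas Vector Search index definition. Field names and dimensions are assumptions:

```python
# Hypothetical chunk document: text, embedding, and filterable metadata together.
chunk_document = {
    "text": "MongoDB Atlas stores vectors next to the source data.",
    "embedding": [0.01, -0.03, 0.12],  # truncated; 768 floats in practice
    "metadata": {"source": "handbook.pdf", "category": "database"},
}

# Matching Atlas Vector Search index definition for that document shape.
vector_index_definition = {
    "fields": [
        {"type": "vector", "path": "embedding",
         "numDimensions": 768, "similarity": "cosine"},
        {"type": "filter", "path": "metadata.category"},  # enables pre-filtering
    ]
}
```
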
  6. MongoDB Atlas Vector Search: perform hybrid queries
     • Create an Atlas Vector Search index on the embeddings field
     • Store vector embeddings in MongoDB Atlas
     • Supports popular frameworks: LangChain, LlamaIndex, Semantic Kernel, Haystack, Spring AI
     • Use aggregation pipelines to build complex queries (see the pipeline sketch below)
     • $vectorSearch must be the first stage of any pipeline where it appears
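
A sketch of such a pipeline with PyMongo, keeping $vectorSearch as the first stage; the connection string, database/collection names, index name, and query vector are placeholders:

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
collection = client["rag"]["chunks"]

query_vector = [0.02, -0.11, 0.07]  # embedding of the user query (768 floats in practice)

pipeline = [
    {   # $vectorSearch must be the first stage of the pipeline
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,  # overfetch candidates for better recall
            "limit": 10,
        }
    },
    {   # later stages can reshape results or surface the similarity score
        "$project": {"text": 1, "metadata": 1,
                     "score": {"$meta": "vectorSearchScore"}}
    },
]

for doc in collection.aggregate(pipeline):
    print(round(doc["score"], 3), doc["text"][:60])
```
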
  7. Filtering: enhance RAG accuracy with a traditional filtering step
     • Use a small and fast model to extract metadata
     • Gemma 2B performs fast metadata extraction from documents
     • Filter results based on metadata (pre- and post-retrieval); a pre-filter example is sketched below
     • Parse incoming queries and match them against labeled metadata
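
A sketch of pre-filtering inside $vectorSearch, assuming the metadata field was indexed with type "filter" (as in the index sketch above) and that the incoming query was already tagged, e.g. by a small model such as Gemma 2B:

```python
# Pre-filtering candidates by metadata before the vector similarity ranking.
filtered_vector_search = {
    "$vectorSearch": {
        "index": "vector_index",
        "path": "embedding",
        "queryVector": [0.02, -0.11, 0.07],  # placeholder query embedding
        "numCandidates": 100,
        "limit": 10,
        "filter": {"metadata.category": {"$eq": "database"}},  # tag from query parsing
    }
}
```

Post-filtering can instead be done with a $match stage after $vectorSearch, at the cost of possibly returning fewer than the requested number of results.
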
  8. Metadata extraction: how to generate metadata for filtering?
     • GLiNER (Generalist and Lightweight Named Entity Recognition) tags and labels chunks so that irrelevant chunks can be filtered out (non-matching chunks are not generated) - see the sketch below
     • Leverage Gemini 1.5 Pro capabilities: build a document summary; multi-modal processing to map spatial meaning
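
A sketch of tagging a chunk with the GLiNER Python package; the checkpoint and label set are illustrative choices, not prescribed by the deck:

```python
# Tag a chunk with entity labels, then turn the entities into metadata.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")

chunk = "The contract between Acme Corp and the City of Turin was signed in 2023."
labels = ["organization", "location", "date"]

entities = model.predict_entities(chunk, labels)
metadata = {e["label"]: e["text"] for e in entities}
print(metadata)  # e.g. {'organization': 'Acme Corp', 'location': 'Turin', 'date': '2023'}
```
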
  9. Query rewriting: enhance RAG accuracy with a query-rewriting step
     • Query tagging and filtering (GLiNER)
     • Rewrite-Retrieve-Read (Gemma 2B)
     • Trainable Rewrite-Retrieve-Read (DSPy) - sketched below
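
A minimal sketch of a query rewriter expressed as a DSPy module; the backing LM, signature fields, and example question are assumptions. The trainable variant mentioned in the deck would additionally run a DSPy optimizer over such a module against retrieval metrics:

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # placeholder; any supported LM works

class RewriteQuery(dspy.Signature):
    """Rewrite a user question into a focused retrieval query."""
    question = dspy.InputField()
    search_query = dspy.OutputField()

rewriter = dspy.Predict(RewriteQuery)
result = rewriter(question="that mongo vector thing, what has to come first in the pipeline?")
print(result.search_query)  # rewritten query is then sent to the retriever
```
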
  10. Fine-tune text embeddings
     • Domain-specific models (legal, medical, etc.)
     • Select an appropriate model (not every model is equal)
     • Fine-tune with positive and negative object pairs (see the sketch below)
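
A sketch of pair/triplet-based fine-tuning with sentence-transformers; the base model and the single toy triplet are placeholders, and a real run needs a much larger dataset:

```python
# Fine-tune an embedding model on (anchor, positive, negative) triplets.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder base model

train_examples = [
    InputExample(texts=[
        "What is the notice period in the lease?",            # anchor (query)
        "The lease may be terminated with 60 days' notice.",  # positive chunk
        "The warranty covers manufacturing defects only.",    # negative chunk
    ]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=1)
train_loss = losses.TripletLoss(model)  # pulls positives closer, pushes negatives away

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=0)
model.save("finetuned-domain-embeddings")
```
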
  11. Re-ranking: enhance RAG accuracy with a re-ranking step
     • Overfetch results from the retriever
     • Apply a re-ranker model
     • Reorder the results (see the sketch below)
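
A sketch of the overfetch-then-rerank pattern with a cross-encoder from sentence-transformers; the checkpoint, query, and candidate texts are illustrative:

```python
# Score each (query, chunk) pair with a cross-encoder and reorder by score.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "Which stage must come first in an Atlas vector search pipeline?"
candidates = [  # in practice, e.g. the top 50 overfetched results from $vectorSearch
    "$vectorSearch must be the first stage of any pipeline where it appears.",
    "Atlas keeps data and indexes instantly synchronized.",
    "Gemma 2B performs fast metadata extraction from documents.",
]

scores = reranker.predict([(query, c) for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]
print(reranked[0])  # the most relevant chunk is passed to the LLM first
```
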