From Hybrid Retrieval to RAG With Haystack and OpenSearch

Open NLP Meetup, Bilge Yücel & David Tippett, October 19th, 2023

David Tippett, Senior Developer Advocate, AWS • Twitter: @dtaivpp • LinkedIn: David Tippett • GitHub: @dtaivpp
Bilge Yücel, Developer Advocate, deepset • Twitter: @bilgeycl • LinkedIn: Bilge Yucel • GitHub: @bilgeyucel

Agenda
1. Search Applications and Retrieval
2. Keyword-based vs Semantic Search
3. Hybrid Retrieval with Haystack and OpenSearch
4. Hybrid Retrieval to RAG
5. Q&A

Search and Retrieval



Search Applications
• Find the right document: Document Search
• Answer questions: Extractive QA, Generative QA

Document Retrieval


Why retrieval matters:
1. Inference takes time. Relevant search results reduce the amount of content a model has to look through.
2. Relevant search results provide LLMs with the context they need to generate an answer.

What is Haystack?
• Fully open-source framework built in Python for custom LLM applications
• Provides the tools developers need to build state-of-the-art NLP systems
• Building blocks: Pipelines & Components
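To make "Pipelines & Components" concrete, here is a toy sketch of the idea: components are plain functions that read from and write to a shared dict, and a pipeline runs them in order. `ToyPipeline`, `add_step`, and both components are hypothetical names for illustration only; this is not Haystack's `Pipeline` API, which supports more than a straight chain of steps.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical component type: reads the shared state, returns values to merge into it.
Component = Callable[[Dict[str, object]], Dict[str, object]]


class ToyPipeline:
    """A straight chain of components (illustration only, not Haystack's Pipeline)."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Component]] = []

    def add_step(self, name: str, component: Component) -> None:
        self.steps.append((name, component))

    def run(self, inputs: Dict[str, object]) -> Dict[str, object]:
        state = dict(inputs)
        for _name, component in self.steps:
            # Merge each component's outputs so later components can use them.
            state.update(component(state))
        return state


def normalize_query(state: Dict[str, object]) -> Dict[str, object]:
    return {"query": str(state["query"]).lower()}


def keyword_filter(state: Dict[str, object]) -> Dict[str, object]:
    terms = set(str(state["query"]).split())
    docs = state["documents"]
    # Keep documents that share at least one lowercase term with the query.
    return {"documents": [d for d in docs if terms & set(d.lower().split())]}


pipe = ToyPipeline()
pipe.add_step("normalize", normalize_query)
pipe.add_step("retrieve", keyword_filter)
result = pipe.run({
    "query": "OpenSearch",
    "documents": ["OpenSearch is a search engine", "Haystack builds pipelines"],
})
print(result["documents"])  # ['OpenSearch is a search engine']
```

Each component only needs to agree on the keys it reads and writes, which is what makes components swappable.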


So what is OpenSearch?
[Diagram: an OpenSearch Cluster with cluster manager, coordinating, and data nodes, plus Dashboards]

OpenSearch’s Community: 10,000+ stars • 320M+ downloads • 22,700+ pull requests • 500+ contributors

What kind of retrieval should you use?
🦄 What kind of queries do you expect?
🦄 What kind of data do you have?

Keyword Search vs Semantic Search: Wild West
Source: https://opensearch.org/blog/semantic-search-solutions/

Keyword-based Search vs Semantic Retrieval
• Keyword-based Retrieval
  ◦ Sparse vectors
  ◦ TF-IDF, BM25
  ◦ “Exact” match
  ◦ Fast, lightweight, effective
• Semantic Retrieval
  ◦ Dense embeddings
  ◦ Trained models: OpenAI, Cohere, Sentence Transformers…
  ◦ Captures semantic similarity
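As a sketch of what the keyword-based side computes, here is a plain-Python Okapi BM25 scorer. Assumptions: lowercase whitespace tokenization, k1=1.2 and b=0.75, and a Lucene-style IDF; OpenSearch's analyzers and scoring details differ, so treat this as an illustration of the formula, not OpenSearch's implementation.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for `query`."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenization
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no exact term match, no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for document length.
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "hybrid retrieval combines bm25 and embeddings",
    "semantic search uses dense embeddings",
    "opensearch supports bm25",
]
print(bm25_scores("bm25", docs))
```

The second document scores 0 because it never contains "bm25", however related it is; the shorter third document outscores the first thanks to length normalization.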

Examples: Haystack Tutorials

Keyword-based vs Semantic
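For the semantic side, retrieval ranks documents by the similarity of their embeddings to the query embedding. A minimal sketch using cosine similarity; the 3-dimensional vectors are made-up stand-ins for what an embedding model (OpenAI, Cohere, Sentence Transformers…) would output, and the document IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dense_retrieve(query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]], top_k: int = 2) -> List[str]:
    # Rank document IDs by cosine similarity to the query, highest first.
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:top_k]


# Hypothetical toy "embeddings"; real models output hundreds of dimensions.
doc_vecs = {
    "car-repair": [0.9, 0.1, 0.0],
    "automobile-maintenance": [0.85, 0.2, 0.05],
    "cake-recipe": [0.0, 0.1, 0.95],
}
query_vec = [0.88, 0.15, 0.02]  # imagine the embedding of "how do I fix my vehicle?"
print(dense_retrieve(query_vec, doc_vecs))
```

Unlike BM25, "automobile-maintenance" can rank highly without sharing a single word with the query, because only vector closeness matters.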

What if I have domain-specific data and want better performance? 🤔
• Train your own embedding model
• ⭐ Hybrid Retrieval ⭐

What is Hybrid Retrieval?
• It’s the combination of sparse and dense retrievers: Keyword-based Search + Semantic Search = Hybrid Retrieval
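One common way to combine the two result lists is reciprocal rank fusion (RRF), which only looks at ranks, so BM25 scores and embedding similarities never have to be on the same scale. Haystack 1.x's `JoinDocuments` can merge retriever outputs this way; the standalone sketch below shows the arithmetic only and is not that node's code. k=60 is the constant commonly used since the original RRF paper; the doc IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of doc IDs into one list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Every list a document appears in adds 1 / (k + rank).
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc_a", "doc_b", "doc_c"]   # keyword-based ranking
dense_hits = ["doc_c", "doc_a", "doc_d"]  # semantic ranking
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents found by both retrievers (doc_a, doc_c) rise to the top, while documents seen by only one retriever still make the list.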

Hybrid Retrieval on Haystack Website

Indexing Pipeline

Hybrid Retrieval Pipeline

Hybrid Retrieval Pipeline: query clauses
Using OpenSearch, we can customize the query the BM25Retriever sends, for example:
• fields
• fuzziness
• operator
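The three knobs map onto an OpenSearch `multi_match` query clause. A sketch that builds such a query body as a Python dict; the helper name `build_bm25_query` and the example field names (`content`, `title`) are assumptions, and how the body is handed to the retriever depends on the Haystack version, so treat the wiring as an assumption too.

```python
import json
from typing import Dict, List


def build_bm25_query(query: str, fields: List[str], fuzziness: str = "AUTO", operator: str = "or") -> Dict:
    """Hypothetical helper: an OpenSearch Query DSL body with a multi_match clause."""
    return {
        "query": {
            "multi_match": {
                "query": query,
                "fields": fields,        # which fields to search; "title^2" boosts title matches
                "fuzziness": fuzziness,  # tolerate typos; "AUTO" scales with term length
                "operator": operator,    # "and" requires every query term to match
            }
        }
    }


body = build_bm25_query("hybrid retrival", ["content", "title^2"], operator="and")
print(json.dumps(body, indent=2))
```

With fuzziness enabled, the misspelled "retrival" can still match "retrieval"; with `operator="and"`, a document must match both terms.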

Recap
1. Search Applications and Retrieval ✅
2. Keyword-based vs Semantic Search ✅
3. Hybrid Retrieval with Haystack and OpenSearch ✅
4. Hybrid Retrieval to RAG
5. Q&A

What is Retrieval Augmented Generation (RAG)?

RAG Pipeline with Hybrid Retrieval


• Connection to LLMs
• Receives input
• Sends a prompt
• Returns a response
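The LLM step boils down to: take the question and the retrieved documents, render them into a prompt, send it to a model, and return the reply. A minimal sketch; the prompt wording, `build_prompt`, `rag_answer`, and `fake_llm` are hypothetical, and `fake_llm` stands in for a real LLM client so the example runs offline.

```python
from typing import Callable, List


def build_prompt(question: str, documents: List[str]) -> str:
    # Number the retrieved documents so the model can refer to them.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer the question using only the documents below.\n"
        f"Documents:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


def rag_answer(question: str, documents: List[str], llm: Callable[[str], str]) -> str:
    prompt = build_prompt(question, documents)  # receives input, builds the prompt
    return llm(prompt)                          # sends the prompt, returns the response


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (OpenAI, Cohere, a local model, ...).
    return f"(model saw {prompt.count('[')} documents)"


docs = ["OpenSearch 2.10 was released on Sept 25, 2023.", "Haystack is an open-source framework."]
print(rag_answer("When was OpenSearch 2.10 released?", docs, fake_llm))
```

Passing the model in as a callable keeps retrieval and generation decoupled, so the same prompt-building step works with any provider.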

OpenSearch Roadmap
• 2.9 (released July 24, 2023): Pre-filtering in FAISS, ML Connectors, Search Pipelines, Byte-sized Vectors
• 2.10 (released Sept 25, 2023): Conversational Memory, RAG Pipelines, Hybrid Search
• 2.11 (released Oct 18, 2023): Multimodal Search, Neural Search Default Models, Model Statistics API, Sparse Retrieval
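OpenSearch 2.10's Hybrid Search takes a different route from rank fusion: as we understand it, a search pipeline normalizes the keyword and neural scores (for example with min-max normalization) and then combines them (for example with a weighted arithmetic mean). The plain-Python sketch below reproduces only that arithmetic; it is not OpenSearch code, and the weights, doc IDs, and the rule that a missing document scores 0 are assumptions.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one result list's scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_combine(
    keyword: Dict[str, float],
    semantic: Dict[str, float],
    weights: Tuple[float, float] = (0.3, 0.7),
) -> List[Tuple[str, float]]:
    kw, sem = min_max(keyword), min_max(semantic)
    total = sum(weights)
    combined = {}
    for doc in set(kw) | set(sem):
        # Assumption: a doc missing from one list gets 0 for that sub-query.
        combined[doc] = (weights[0] * kw.get(doc, 0.0) + weights[1] * sem.get(doc, 0.0)) / total
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


bm25 = {"a": 12.0, "b": 8.0, "c": 2.0}   # raw BM25 scores (unbounded)
knn = {"b": 0.91, "c": 0.85, "d": 0.40}  # raw vector similarity scores
print(hybrid_combine(bm25, knn))
```

Normalizing first matters because a raw BM25 score of 12 would otherwise swamp a similarity score of 0.91.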

Resources and where to find us
• Follow 👇 Haystack on Twitter
• Discover 👇 What’s coming in Haystack 2.0
• Join 👇 Haystack Community on Discord


RAG Pipeline with Hybrid Retrieval: Rankers
• SentenceTransformersRanker
• DiversityRanker
• LostInTheMiddleRanker
• RecentnessRanker
• CohereRanker
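The idea behind a "lost in the middle" ranker is that LLMs tend to use information at the start and end of a long context better than information in the middle, so the most relevant documents go at both ends. A minimal sketch of one such ordering, assuming the input is already sorted from most to least relevant; it is not Haystack's LostInTheMiddleRanker code.

```python
from typing import List


def lost_in_the_middle_order(docs_by_relevance: List[str]) -> List[str]:
    """Place the most relevant docs at both ends and the least relevant in the middle."""
    front: List[str] = []
    back: List[str] = []
    for i, doc in enumerate(docs_by_relevance):
        if i % 2 == 0:
            front.append(doc)  # 1st, 3rd, 5th... fill from the start
        else:
            back.append(doc)   # 2nd, 4th, 6th... fill from the end
    return front + back[::-1]


print(lost_in_the_middle_order(["d1", "d2", "d3", "d4", "d5"]))  # ['d1', 'd3', 'd5', 'd4', 'd2']
```

The top document opens the context, the runner-up closes it, and the weakest document (d5) lands in the middle.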

DiversityRanker
[Diagram: Hybrid Retrieval, Diversity Ranker, and Context Window]
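A sketch of the greedy idea behind a diversity ranker, as we understand Haystack's DiversityRanker: start with the document most similar to the query, then repeatedly add the remaining document with the lowest average similarity to those already chosen, so near-duplicates sink toward the end of the context window. The 2-D vectors and doc IDs are made-up toy data, not real embeddings.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_similarity(vec: Sequence[float], chosen: List[Sequence[float]]) -> float:
    return sum(cosine(vec, c) for c in chosen) / len(chosen)


def diversity_order(query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]]) -> List[str]:
    remaining = dict(doc_vecs)
    # Step 1: the document most similar to the query goes first.
    first = max(remaining, key=lambda d: cosine(query_vec, remaining[d]))
    selected = [first]
    del remaining[first]
    while remaining:
        chosen_vecs = [doc_vecs[s] for s in selected]
        # Step 2: pick the doc least similar, on average, to everything selected so far.
        nxt = min(remaining, key=lambda d: mean_similarity(remaining[d], chosen_vecs))
        selected.append(nxt)
        del remaining[nxt]
    return selected


doc_vecs = {"a": [0.95, 0.05], "b": [0.9, 0.1], "c": [0.6, 0.8]}  # "b" nearly duplicates "a"
print(diversity_order([1.0, 0.0], doc_vecs))  # ['a', 'c', 'b']
```

Pure relevance order would be a, b, c; the diversity ordering moves the near-duplicate "b" behind "c", which adds new information.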
