
From Hybrid Retrieval to RAG With Haystack and OpenSearch

Slide deck of the presentation by David Tippett from OpenSearch and Bilge Yucel from deepset at the Open NLP Meetup: https://www.meetup.com/open-nlp-meetup/events/296555345/

In their talk, David and Bilge explore the benefits of hybrid search, including hybrid search in retrieval-augmented generation (RAG) pipelines. They explain how they used hybrid search on the Haystack website itself to build a search application for the tutorials, using custom Haystack pipelines and OpenSearch. They also show how to build flexible search and LLM applications with open-source tools and models.

Recording of the session: https://youtu.be/Gn5SV8q887s?si=qVUIUCoA6w7M8URB

Bilge Yücel

October 19, 2023

Transcript

  1. From Hybrid Retrieval to RAG With Haystack and OpenSearch
    Open NLP Meetup, Bilge Yücel & David Tippett, October 19th, 2023
  2. David Tippett, Senior Developer Advocate, AWS • Twitter: @dtaivpp • LinkedIn: David Tippett • GitHub: @dtaivpp
    Bilge Yücel, Developer Advocate, deepset • Twitter: @bilgeycl • LinkedIn: Bilge Yucel • GitHub: @bilgeyucel
  3. Agenda
    1 Search Applications and Retrieval
    2 Keyword-based vs Semantic Search
    3 Hybrid Retrieval with Haystack and OpenSearch
    4 Hybrid Retrieval to RAG
    5 Q&A
  4. Search Applications
    • Find the right document: Document Search
    • Answer questions: Extractive QA, Generative QA
  5. Document Retrieval
    1. Inference takes time. Relevant search results reduce the amount of content a model has to look through.
    2. Relevant search results provide LLMs with the context they need to generate an answer.
  6. What is Haystack?
    • Fully open-source framework built in Python for custom LLM applications
    • Provides the tools developers need to build state-of-the-art NLP systems
    • Building blocks: Pipelines & Components
  7. What kind of retrieval should you use?
    🦄 What kind of queries do you expect?
    🦄 What kind of data do you have?
  8. Keyword Search vs Semantic Search: Wild West
    Source: https://opensearch.org/blog/semantic-search-solutions/
  9. Keyword-based Search vs Semantic Retrieval
    • Keyword-based Retrieval
      ◦ Sparse vectors
      ◦ TF-IDF, BM25
      ◦ “Exact” match
      ◦ Fast, lightweight, effective
    • Semantic Retrieval
      ◦ Dense embeddings
      ◦ Trained models: OpenAI, Cohere, Sentence Transformers…
      ◦ Captures semantic similarity
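Slide 9 names BM25 as a typical keyword-based scorer. Below is a minimal, self-contained Okapi BM25 sketch in plain Python, assuming naive lowercase whitespace tokenization and the Lucene-style IDF. `bm25_scores` is a hypothetical helper, not a Haystack or OpenSearch API; real engines add analyzers, stemming, and inverted indexes.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Assumption: tokens are produced by a lowercase whitespace split.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # term absent: contributes nothing
            # Lucene-style IDF, always non-negative
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b)
            denom = freq + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "hybrid retrieval with opensearch",
    "semantic search with embeddings",
    "keyword search with bm25",
]
scores = bm25_scores("keyword search", docs)
```

The third document matches both query terms and scores highest, the second matches only "search", and the first matches nothing and scores 0. This "exact match" behavior is what makes keyword retrieval fast and predictable, and also why it misses paraphrases that dense embeddings can catch.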
  10. What if I have domain-specific data and want better performance? 🤔
    • Train your own embedding model
    • ⭐ Hybrid Retrieval ⭐
  11. What is Hybrid Retrieval?
    • It’s the combination of sparse and dense retrievers: keyword-based search + semantic search = hybrid retrieval
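Hybrid retrieval needs some way to merge the sparse and dense result lists. One common rank-based choice is reciprocal rank fusion (RRF); the deck does not say which merge strategy the talk used, so treat this as an illustrative sketch. `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is simply a commonly used default.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Documents found by several retrievers rise.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["d1", "d2", "d3"]   # keyword (sparse) results
dense_ranking = ["d3", "d1", "d4"]  # embedding (dense) results
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
# fused == ["d1", "d3", "d2", "d4"]
```

RRF only looks at ranks, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales.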
  12. Hybrid Retrieval Pipeline: query clauses
    Using OpenSearch, we can set custom fields for the BM25Retriever:
    • fields
    • fuzziness
    • operator
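Slide 12 lists `fields`, `fuzziness`, and `operator` as knobs for the keyword side. These are standard options of OpenSearch's `multi_match` query. The sketch below only builds such a query body as a plain Python dict; `build_bm25_query` and the example field names (`title`, `content`) are hypothetical, and how the body is handed to Haystack's BM25Retriever depends on the Haystack version, so it is not shown.

```python
import json


def build_bm25_query(query, fields, fuzziness="AUTO", operator="or", top_k=10):
    """Build an OpenSearch query body using a multi_match clause.

    - fields: which document fields to search; "title^2" boosts the title
    - fuzziness: tolerated edit distance for typos ("AUTO" scales with term length)
    - operator: "or" matches any query term, "and" requires all of them
    """
    return {
        "size": top_k,
        "query": {
            "multi_match": {
                "query": query,
                "fields": fields,
                "fuzziness": fuzziness,
                "operator": operator,
            }
        },
    }


body = build_bm25_query("hybrid retrieval", ["title^2", "content"], operator="and")
print(json.dumps(body, indent=2))
```

Requiring `and` tightens precision for multi-word queries, while `fuzziness` recovers recall lost to typos; tuning both per field is one way to shape the sparse half before it is combined with dense results.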
  13. Recap
    1 Search Applications and Retrieval ✅
    2 Keyword-based vs Semantic Search ✅
    3 Hybrid Retrieval with Haystack and OpenSearch ✅
    4 Hybrid Retrieval to RAG
    5 Q&A
  14. RAG Pipeline with Hybrid Retrieval
    • Connection to LLMs
    • Receives input
    • Sends a prompt
    • Returns a response
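Slide 14 describes the LLM-facing step of the pipeline: it receives input, sends a prompt, and returns a response. A minimal sketch of that flow, assuming the retrieved documents are already plain strings. `build_prompt`, `rag_answer`, the template wording, and the `llm` callable are hypothetical stand-ins, not Haystack's API; plug in a real model client where `llm` is called.

```python
from typing import Callable, List

# Hypothetical prompt template: grounds the answer in the retrieved documents
PROMPT_TEMPLATE = (
    "Answer the question using only the documents below.\n"
    "Documents:\n{documents}\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, documents: List[str]) -> str:
    """Render the retrieved documents and the user question into one prompt."""
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return PROMPT_TEMPLATE.format(documents=numbered, question=question)


def rag_answer(question: str, documents: List[str], llm: Callable[[str], str]) -> str:
    """Receive input, send a prompt to the LLM, and return its response.

    `llm` is any callable mapping a prompt string to a completion string.
    """
    prompt = build_prompt(question, documents)
    return llm(prompt)


# Usage with a stub in place of a real LLM client
def fake_llm(prompt: str) -> str:
    return "stub answer"

answer = rag_answer(
    "What is hybrid retrieval?",
    ["Hybrid retrieval combines sparse and dense retrievers."],
    fake_llm,
)
```

Keeping the LLM behind a plain callable makes the retrieval and prompting steps testable without network access.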
  15. OpenSearch Roadmap
    • 2.9 (released July 24, 2023): Pre-Filtering in FAISS, ML Connectors, Search Pipelines, Byte-Sized Vectors
    • 2.10 (released Sept 25, 2023): Conversational Memory, RAG Pipelines, Hybrid Search
    • 2.11 (released Oct 18, 2023): Multi-model Search, Neural Search Default Models, Model Statistics API, Sparse Retrieval
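Slide 15 lists Hybrid Search in OpenSearch 2.10. One score-based way to combine a keyword sub-query with a vector sub-query is to normalize each one's scores to a common range and then take a weighted mean. The sketch below uses min-max normalization and a weighted arithmetic mean in plain Python. It only illustrates the idea: the helper names are hypothetical, and treating a document missing from one sub-query as scoring 0 there is an assumption, not necessarily what OpenSearch does.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} map into the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: map every doc to 1.0 to avoid dividing by zero
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_weighted_mean(score_maps, weights):
    """Weighted arithmetic mean of normalized scores from several sub-queries.

    Assumption: a doc absent from a sub-query contributes 0 for that sub-query.
    Returns (doc_id, combined_score) pairs, best first.
    """
    normalized = [min_max_normalize(m) for m in score_maps]
    total_weight = sum(weights)
    docs = set().union(*normalized)
    combined = {
        doc: sum(w * n.get(doc, 0.0) for w, n in zip(weights, normalized)) / total_weight
        for doc in docs
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


bm25 = {"d1": 12.0, "d2": 7.0, "d3": 2.0}   # unbounded BM25 scores
knn = {"d2": 0.91, "d3": 0.85, "d4": 0.40}  # similarity scores
ranked = combine_weighted_mean([bm25, knn], weights=[0.3, 0.7])
```

Normalizing first matters because raw BM25 scores can be much larger than vector similarities; without it, the keyword side would dominate regardless of the weights.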
  16. Resources and where to find us
    Follow 👇 Haystack on Twitter
    Discover 👇 What’s coming in Haystack 2.0
    Join 👇 Haystack Community on Discord
  17. RAG Pipeline with Hybrid Retrieval
    • SentenceTransformersRanker
    • DiversityRanker
    • LostInTheMiddleRanker
    • RecentnessRanker
    • CohereRanker
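Slide 17 lists rankers that reorder retrieved documents before they reach the LLM. LostInTheMiddleRanker targets the observation that LLMs tend to use information at the start and end of a long context better than information in the middle. The sketch below shows one simple way to produce that layout; `lost_in_the_middle_order` is a hypothetical helper, and Haystack's ranker may order documents differently in detail.

```python
def lost_in_the_middle_order(docs_by_relevance):
    """Reorder docs (given most relevant first) so the strongest ones sit at
    both ends of the context and the weakest ones end up in the middle."""
    front, back = [], []
    for i, doc in enumerate(docs_by_relevance):
        # Alternate: 1st, 3rd, 5th... go to the front; 2nd, 4th... to the back
        (front if i % 2 == 0 else back).append(doc)
    # Reverse the back half so the 2nd most relevant doc lands last
    return front + back[::-1]


ordered = lost_in_the_middle_order([1, 2, 3, 4, 5])
# ordered == [1, 3, 5, 4, 2]
```

With five documents ranked 1 (best) to 5, the two best end up first and last, and the weakest sits in the middle where it is least likely to be used anyway.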