
Finding the needle: a deep dive into the rewriting of Haystack


Haystack is an open-source framework for composing NLP tools into applications, with a particular focus on Large Language Models. Haystack was built before the “ChatGPT revolution”. Like many others in this industry, we had to question all our existing assumptions in order to adapt, and we had to do it fast.

In this talk, we'll explore the motivations behind the refactoring, the challenges we faced, and the outcomes achieved through this intensive process. From rethinking many of the original abstractions, all the way up to growing a vibrant community of users and contributors, we’ll share the key strategies and techniques employed during this journey.

Whether you're a seasoned open-source contributor or a curious enthusiast, this talk promises to uncover valuable insights and lessons learned from the evolution of Haystack.

Massimiliano Pippi

June 10, 2024



  1. What is Haystack? Haystack is an open-source framework for building production-ready AI applications. It lets you quickly try out the latest AI models while being flexible and easy to use. Think chatbots, document search, question answering. TL;DR: anything that makes use of modern language models and AI techniques.
  2. Question Answering: Given the documents, answer the question. Documents: {{ documents }} Question: {{ question }}
     Summarization: Summarize the following text. Text: {{ text }}
     Question Generation: Given the following document, generate some questions. Document: {{ document }} Question:
  3. To Build AI Applications: Text Embeddings, Retrieval, Vector Databases, Prompting & LLMs. Maybe some classifiers... Maybe reranking...? ...and on and on and on.
  4. A Little Bit of History: Haystack was first officially released in 2020 (read: before the LLM boom). It covered semantic search, extractive QA, retrieval, even preparing and writing documents into a vector database. It was built around components (nodes) and pipelines, which allowed users to combine their desired language models with their data source. 👍 The pipeline-component structure is a great abstraction for building composable LLM applications. 👎 Some assumptions were made, as well as some design mishaps.
  5. Haystack 1.x: We lived, we learned, we evolved 💪 (1) Special-General Mixture, (2) Overexposure, (3) Information Leakage, (4) Conjoined Components.
  6. Haystack 2.0: Pipelines & Components ✅ A component is responsible for ONE. THING. ONLY. Descriptive components. UNIFORM INTERFACE. Pipelines that are able to branch and cycle. (Diagram labels: Component, Component, Component, Pipeline.) 📚 Announced March 11, 2024
  7. Components: A simple interface for the creation of components. User-defined number of inputs and outputs. 📚 Announced March 11, 2024
  8. Components: A simple interface for the creation of components. User-defined number of inputs and outputs. (Diagram labels: Query, documents.) 📚 Announced March 11, 2024
  9. Everything is a graph, including LLM Applications (and that’s handy). Tuana Çelik (@TUANACELIK, /IN/TUANACELIK). 📍 Ballroom A 📆 May 18th, 1:30-2:15 PM
  10. 📍 Find us at Exhibit Hall A, Booth 330. 🥂 AI Happy Hour with DataStax: 📍 The Standard Market and Pint House 📆 May 17th, 6 PM
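
Slides 6 to 8 describe components that each do one thing, expose a uniform interface with a user-defined number of inputs and outputs, and are wired together by a pipeline. The following is a minimal sketch of that idea in plain Python, reusing the question-answering prompt from slide 2. It is not Haystack's actual API: `KeywordRetriever`, `PromptBuilder`, `MiniPipeline`, and every method name here are hypothetical and exist only for illustration. The sketch also assumes components are added in the order they should run, so it does not cover the branching and cycling pipelines mentioned on slide 6.

```python
import re
from typing import Any, Dict, List, Tuple


class KeywordRetriever:
    """Hypothetical component: one input (query), one output (documents)."""

    def __init__(self, documents: List[str]):
        self.documents = documents

    def run(self, query: str) -> Dict[str, Any]:
        # Keep every document that shares at least one word with the query.
        words = set(re.findall(r"\w+", query.lower()))
        hits = [d for d in self.documents
                if words & set(re.findall(r"\w+", d.lower()))]
        return {"documents": hits}


class PromptBuilder:
    """Hypothetical component: fills {{ name }} placeholders in a template."""

    def __init__(self, template: str):
        self.template = template

    def run(self, **inputs: Any) -> Dict[str, Any]:
        def fill(match: "re.Match[str]") -> str:
            value = inputs[match.group(1)]
            # Lists (e.g. retrieved documents) are joined into one string.
            return " ".join(value) if isinstance(value, list) else str(value)

        prompt = re.sub(r"\{\{\s*(\w+)\s*\}\}", fill, self.template)
        return {"prompt": prompt}


class MiniPipeline:
    """Hypothetical pipeline: runs components in the order they were added."""

    def __init__(self) -> None:
        self.components: Dict[str, Any] = {}
        self.connections: List[Tuple[str, str, str, str]] = []

    def add_component(self, name: str, component: Any) -> None:
        self.components[name] = component

    def connect(self, sender: str, receiver: str) -> None:
        # Both arguments use the form "component_name.socket_name".
        src, src_output = sender.split(".")
        dst, dst_input = receiver.split(".")
        self.connections.append((src, src_output, dst, dst_input))

    def run(self, data: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
        pending = {name: dict(data.get(name, {})) for name in self.components}
        results: Dict[str, Dict[str, Any]] = {}
        for name, component in self.components.items():
            output = component.run(**pending[name])
            results[name] = output
            # Forward outputs to the inputs of connected components.
            for src, src_output, dst, dst_input in self.connections:
                if src == name:
                    pending[dst][dst_input] = output[src_output]
        return results


TEMPLATE = ("Given the documents, answer the question.\n"
            "Documents: {{ documents }}\n"
            "Question: {{ question }}")

pipeline = MiniPipeline()
pipeline.add_component("retriever", KeywordRetriever([
    "Haystack is an open-source framework.",
    "Pipelines connect components.",
]))
pipeline.add_component("prompt_builder", PromptBuilder(TEMPLATE))
pipeline.connect("retriever.documents", "prompt_builder.documents")

question = "What is Haystack?"
result = pipeline.run({
    "retriever": {"query": question},
    "prompt_builder": {"question": question},
})
print(result["prompt_builder"]["prompt"])
```

Because every component exposes the same `run` method and returns a dictionary of named outputs, the pipeline needs no knowledge of what a component does. It only needs to know which output feeds which input, which is the "Query in, documents out" shape shown on slide 8.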