
Building a Petabyte-Scale Vector Store


A laptop proof of concept won’t cut it for this impending era of generative AI. Let’s dig into the mechanics of building and using a petabyte-scale vector store and the future of handling data in generative AI models. This talk will focus on the work in the Apache Cassandra® project to develop a vector store capable of handling petabytes of data, discussing why this capacity is critical for future AI applications. I will also connect how this pertains to the exciting new generation of AI technologies like Improved Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and Forward-Looking Active Retrieval Augmented Generation (FLARE) that all contribute to the growing need for such scalable solutions. Finally, we’ll discuss the importance of planning for future scalability and how to effectively manage AI agents in this new age of data.

Cedrick Lunven

October 20, 2023


Transcript

  1. ©2023 DataStax. – All rights reserved
 Building a Petabyte-Scale Vector Store
 Cédrick Lunven
 Software Engineer, Developer Advocate
 Developer Tools & Generative AI, DataStax

  2. Cédrick Lunven (clunven, clun)
 Software Engineer
 ❖ Trainer
 ❖ Public Speaker
 ❖ Developer Tooling (SDK, CLI)
 ❖ Developer Apps
 ❖ Team Lead
 ❖ Creator of ff4j (ff4j.org)
 ❖ Helping with Langchain4j
  3. Agenda
 Retrieval-Augmented Generation (RAG) principles
 Apache Cassandra™ as a Vector Database
 Introduction to JVector
 Integrating the ecosystem
 Road to real-time generative AI

  4. Are (effective) prompts good enough?
 Prompt structure:
 • Context: I am XXX, I want to YYY in order to ZZZ (objectives)
 • Tasks: I want you to create this and that because…
 • Roles and persona: You are an assistant; we are targeting AI developers
 • Constraints: format, style, must-haves, forbidden topics; if you do not know, say you do not know
 The prompt goes through a conversation agent to the LLM chat completion API (REST APIs).
  5. Aside: LLM parameters
 • Temperature
   Controls the "creativity" of generations.
   Temp = 0: always use the highest-probability next token.
   Temp = 1: maximum creativity.
 • Top K
   Limits selection of the next token to the top K matches by probability.
 • Top P
   Selects tokens based on the sum of their probabilities.
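The three decoding knobs above can be sketched in Python. This is a toy illustration, not how an inference engine actually implements sampling; `logits` is a hypothetical {token: score} dict:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick the next token from a {token: logit} dict using the three knobs
    described above. A toy sketch; real engines work on tensors."""
    rng = rng or random.Random()
    if temperature == 0:
        # Greedy decoding: always the highest-probability token.
        return max(logits, key=logits.get)
    # Temperature rescales logits before softmax: <1 sharpens, >1 flattens.
    scaled = {t: l / temperature for t, l in logits.items()}
    z = max(scaled.values())
    probs = {t: math.exp(l - z) for t, l in scaled.items()}
    total = sum(probs.values())
    probs = {t: p / total for t, p in probs.items()}
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]          # keep only the K most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for token, p in ranked:          # nucleus sampling: smallest set whose
            kept.append((token, p))      # cumulative probability reaches top_p
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    tokens, weights = zip(*ranked)
    return rng.choices(tokens, weights=weights, k=1)[0]
```

For example, `sample_next_token({"cat": 2.0, "dog": 1.0}, temperature=0)` is deterministic and returns `"cat"`, while higher temperatures spread probability mass over more tokens.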

  6. Few-Shot Learning
 Providing LLMs with a small number of examples is enough for them to learn specific tasks.
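A hypothetical illustration of the idea: the labeled examples embedded in the prompt are what teach the model the task, with no fine-tuning involved. The reviews and labels below are made up for the sketch:

```python
# Two worked examples are enough to establish the task and output format.
EXAMPLES = [
    ("The battery died after two days.", "negative"),
    ("Setup took thirty seconds. Love it.", "positive"),
]

def few_shot_prompt(new_input):
    """Build a few-shot classification prompt from the examples above."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in EXAMPLES:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The prompt ends mid-pattern, so the model's natural completion is the label.
    lines.append(f"Review: {new_input}")
    lines.append("Sentiment:")
    return "\n".join(lines)
```

Sending `few_shot_prompt("Arrived broken and support never answered.")` to a chat completion API would then elicit a single-word label as the continuation.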
  7. Self-Consistency
 Sampling multiple reasoning paths can produce better results. (March 2022)
 How does it work in practice?
 • The task needs to have a correct answer
 • Different reasoning paths need to be explored
 • Effectively achieved by increasing the temperature, top-p, and top-k parameters of the LLM
 • You need to sample 5-20 reasoning paths
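The recipe above boils down to a majority vote over sampled answers. In this sketch, `ask_llm` is a hypothetical callable that returns one temperature-sampled final answer per call:

```python
from collections import Counter

def self_consistency(ask_llm, question, n_paths=10):
    """Sample several reasoning paths (high temperature gives diverse chains
    of thought) and keep the most frequent final answer. Returns the winning
    answer and its agreement ratio across the sampled paths."""
    answers = [ask_llm(question) for _ in range(n_paths)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n_paths
```

The agreement ratio is a useful side product: a low ratio signals the model is not consistent on this question, so the winning answer deserves less trust.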
  8. Least-to-Most Prompting
 LLMs can solve simpler subproblems first and combine the answers to reach a final solution. (May 2022)
  9. Research and Revise
 Ask the LLM to review its generation, retrieve documents to verify facts, and make edits. (Oct 2022)
  10. Lost in the Middle
 Relevant information in the middle of long input contexts tends to get ignored. (July 2023)
  11. Limitations of "LLM-only" mode
 • The LLM can be outdated
 • The LLM does not know your data
 • The LLM is not tuned: hard steerability
 • The LLM hallucinates if not properly prompted
 • The LLM works with a limited input window (tokens)
  12. Retrieval-Augmented Generation
 The prompt keeps the same structure (context, tasks, roles and persona, constraints) but is augmented with unstructured data issued from SEMANTIC SEARCH: new pieces of text that give the LLM more information to specialize its response.
  13. Data Ingestion: Vectorization
 Your data (e.g. your website) goes through a DOCUMENT SPLITTER, then a sentence transformer or embeddings API, which maps each chunk into the vector space. The prompt is embedded into a vector the same way.
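The DOCUMENT SPLITTER step can be as simple as a fixed-size splitter with overlap; this is a naive stand-in for the splitters that frameworks like LangChain or LlamaIndex provide:

```python
def split_document(text, chunk_size=200, overlap=40):
    """Naive fixed-size splitter with overlap: each returned chunk is later
    embedded into one vector. The overlap keeps sentences that straddle a
    chunk boundary retrievable from either side. Real splitters also respect
    sentence and paragraph boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Chunk size matters downstream: as the later "Retrieval and generation are not the same" slide notes, the size of the embedded text affects how documents cluster in the embedding space.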
  14. Vector DATABASE: Semantic Search
 The prompt vector is compared against the ingested chunk vectors with a SIMILARITY SEARCH (ANN) over the vector space.
  15. Vector DATABASE: Retrieval-Augmented Generation
 PROMPT → vector → SIMILARITY SEARCH → PROMPT+RAG → LLM chat completion → ANSWER
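The pipeline on this slide can be sketched end to end. Everything here is an assumption-laden toy: `embed` is a hashing-trick stand-in for a real sentence transformer or embeddings API, and `chat_completion` stands in for the LLM chat completion API:

```python
import math

def embed(text, dim=32):
    """Toy embedding via the hashing trick: bucket word counts, then
    L2-normalize. A real pipeline calls an embedding model here."""
    v = [0.0] * dim
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def similarity(a, b):
    # Cosine similarity; inputs are already unit vectors, so a dot product.
    return sum(x * y for x, y in zip(a, b))

def rag_answer(question, documents, chat_completion, top_k=2):
    """PROMPT -> vector -> SIMILARITY SEARCH -> PROMPT+RAG -> ANSWER."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: similarity(q, embed(d)), reverse=True)
    context = "\n".join(ranked[:top_k])
    prompt = (f"Answer using only this context:\n{context}\n\n"
              f"Question: {question}\nAnswer:")
    return chat_completion(prompt)
```

In a real deployment the `sorted(...)` line is replaced by an ANN query against the vector store, since scanning every document does not scale to petabytes.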
  16. Retrieval and generation are not the same
 Seems obvious, but standard RAG examples generally treat both steps exactly the same.
 Retrieval:
 • It is important to find the correct/best-matching documents
 • The varying size of embedded text affects how documents cluster in the embedding space
 Generation:
 • Models support large context windows
 • They are good at summarizing and extracting facts and data from long documents
  17. FLARE: Forward-Looking Active REtrieval
 1. Query the knowledge base for the question (vector search)
 2. Extract more questions from the knowledge base (LLM)
 3. Query the knowledge base with all questions (vector search)
 4. Summarize and check the answer (LLM)
 Repeat 1-4 until the answer looks good.
 https://arxiv.org/abs/2305.06983
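The four steps can be sketched as a loop. This is an illustrative reading of the slide, not the paper's reference implementation; `vector_search` and `llm` are hypothetical callables standing in for the knowledge base and the model:

```python
def flare(question, vector_search, llm, max_rounds=3):
    """Sketch of the FLARE loop from the slide.
    vector_search(query) -> list of passages
    llm(task, passages)  -> str"""
    answer = ""
    # 1. Query the knowledge base for the question (vector search).
    passages = vector_search(question)
    for _ in range(max_rounds):
        # 2. Extract more questions from the retrieved passages (LLM).
        follow_ups = llm("extract follow-up questions", passages).splitlines()
        # 3. Query the knowledge base with all questions (vector search).
        for q in [question, *follow_ups]:
            passages.extend(vector_search(q))
        # 4. Summarize and check the answer (LLM).
        answer = llm("summarize and answer", passages)
        if llm("is this answer complete? yes/no", [answer]) == "yes":
            return answer  # repeat 1-4 until the answer looks good
    return answer
```

Each round issues several vector searches, which is why FLARE-style agents multiply the query load on the vector store compared with single-shot RAG.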
  18. Vector Database?
 A vector database performs SIMILARITY SEARCH over a vector space. Requirements:
 • Handling a LOT of vectors (chunking)
 • Effective search algorithms
 • Performant, resilient
 • Dynamic (vector sizes)
 • Metadata filtering
 • Keyword search
 • Semantic caching
 • Chat history
 • Key-value cache

  19. NoSQL Distributed Database
 Installation = 1 NODE
 ✔ Capacity ≈ 2-4 TB
 ✔ Throughput = LOTS of Tx/sec/core
 Communication:
 ✔ Gossiping
 ✔ No master (peer-to-peer)
 DataCenter (DC) | Ring
  20. NoSQL Distributed Database: the sensors_by_network table
 network | sensor | temperature
 --------+--------+------------
 forest  | f001   | 92
 forest  | f002   | 88
 volcano | v001   | 210
 sea     | s001   | 45
 sea     | s002   | 50
 home    | h001   | 72
 road    | r001   | 105
 road    | r002   | 110
 ice     | i001   | 35
 car     | c001   | 69
 dog     | d001   | 40
 car     | c002   | 70
 Partition key: network. Primary key: (network, sensor).
  21. Distributed
 The partitions are spread across the ring's nodes:
 forest: (f001, 92), (f002, 88) · sea: (s001, 45), (s002, 50) · car: (c001, 69), (c002, 70) · volcano: (v001, 210) · home: (h001, 72) · road: (r002, 110), (r001, 105) · ice: (i001, 35) · dog: (d001, 40)
  22. Apache Cassandra
 • High Availability / Always On: every second of downtime translates into lost revenue
 • Linear Scalability / Hyper Scalability: millions of operations per day, hour, or second
 • Global Distribution / Data Everywhere: on-premises, hybrid, multi-cloud, centralized, or edge
 • Low Latency / Faster Pace: every millisecond of latency has consequences

  23. 5 Hard Problems We're Solving
 • Scale-out capabilities: no upper limits
 • Garbage collection: pruning obsolete index information
 • Effective use of disk: enabling high throughput
 • Composability: predicates, term-based searches (a.k.a. hybrid search)
 • Concurrency: non-blocking, multi-threaded index construction
 https://thenewstack.io/5-hard-problems-in-vector-search-and-how-cassandra-solves-them/

  24. New Data Model
 CREATE TABLE IF NOT EXISTS vsearch.products (
     id          int PRIMARY KEY,
     name        TEXT,
     description TEXT,
     item_vector VECTOR<FLOAT, 5>  // 5-dimensional embedding
 );

  25. Creating a Vector Search Index
 CREATE CUSTOM INDEX IF NOT EXISTS ann_index
     ON vsearch.products(item_vector)
     USING 'StorageAttachedIndex';

  26. Searching for Neighbors
 SELECT * FROM vsearch.products
     ORDER BY item_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
     LIMIT 1;

  id | description                                      | item_vector                 | name
 ----+--------------------------------------------------+-----------------------------+---------------------
   5 | A deep learning display that controls your mood  | [0.1, 0.05, 0.08, 0.3, 0.6] | Vision Vector Frame
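`ORDER BY … ANN OF` returns the rows whose stored vectors are most similar to the query vector. Conceptually, assuming cosine similarity (the default for SAI vector indexes) and using exact brute force instead of the approximate index, it computes something like this sketch; the second product row is made up for contrast:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

products = {
    5: [0.1, 0.05, 0.08, 0.3, 0.6],   # "Vision Vector Frame" from the slide
    7: [0.9, 0.8, 0.1, 0.05, 0.05],   # hypothetical second product
}
query = [0.15, 0.1, 0.1, 0.35, 0.55]

# Equivalent of ORDER BY item_vector ANN OF <query> LIMIT 1, by brute force.
best_id = max(products, key=lambda pid: cosine_similarity(query, products[pid]))
```

The index exists precisely because this brute-force scan is O(rows × dimensions): approximate structures trade a little recall for sub-linear search over billions of vectors.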

  27. Integration
 Composes with partitioning:
     SELECT * FROM demo WHERE partition_id = ?
         ORDER BY embedding ANN OF ? LIMIT 100
 Composes with other SAI indexes:
     SELECT * FROM demo WHERE (c1 = ? AND c2 = ?) OR c3 = ?
         ORDER BY embedding ANN OF ? LIMIT 5
 Global ANN everywhere:
     SELECT * FROM demo ORDER BY embedding ANN OF ? LIMIT 10

  28. HNSW: Hierarchical Navigable Small World
 As seen in:
 • Lucene
   ◦ Elastic
   ◦ Solr
   ◦ Mongo
 • Weaviate
 • Qdrant
 • Astra (June 2023)
 • …

  29. Build around LLMs & embeddings
 Your GenAI application sits between the LLM and the embeddings API, and there is quite a lot of logic to handle:
 • Prompt templating
 • Memory of the past / context management
 • Domain-specific knowledge
 • Retrieval-Augmented Generation
 • Caching
 • Prompt versioning/management
 • Storage (vector or otherwise)
 • Reranking (e.g. MMR)
 • Agents
 • Doc ingestion
 • "Chain of thought"
 • …
  30. Top Frameworks (Python ecosystem)
 LangChain
 • Messy, bloated, but it's everywhere
 • Broader coverage of everything around LLMs
 • "Must know"
 LlamaIndex
 • More considerate in its growth
 • Mainly data ingestion and storage/retrieval
 • "Should know"
 Semantic Kernel
 • Still a lesser player
 • Auto-planner feature
 • Possibly the best-structured
 • "Sem… what?"
 Others: probably not worth checking right now.
 Ranking signals: current GitHub star count (~60k), TILs mentioning the framework (27); GitHub forks and StackOverflow tell the same story.
  31. Introducing Vector Search with AI Apps
 AI-powered applications: ChatBot | AI Assistant | Copilot | Text Generation | Search Engine | Multi-Model Similarity Search
 Use cases: ChatMemory, Semantic Cache, Meta-Data Filtering, KV Cache, Prompt Template, RAG, LLM History
 Frameworks: 🦜🔗 Langchain, 🦙 Llama Index, Semantic Kernel
 Cassandra tables & indexes: vector vector<float, X>, metadata_s map<text, text>, body_blob text, keys
 LLM APIs: chat completion, conversation agent, embeddings (REST)
  32. CassIO: dedicated models for your queries
 PlainCassandraTable (map data into prompts):
     CREATE TABLE BaseTable (
         row_id     text PRIMARY KEY,
         body_blob  text,
         metadata_s map<text, text>
     )
 MetadataVectorCassandraTable (VectorStore: store embeddings as knowledge for the LLM):
     CREATE TABLE $name (
         row_id          text PRIMARY KEY,
         attributes_blob text,
         body_blob       text,
         metadata_s      map<text, text>,
         vector          vector<float, $dim>
     )
 ClusteredMetadataVectorCassandraTable (VectorStore, partition-aware):
     CREATE TABLE $name (
         partition_id    text,
         row_id          text,
         attributes_blob text,
         body_blob       text,
         metadata_s      map<text, text>,
         vector          vector<float, 1536>,
         PRIMARY KEY (partition_id, row_id)
     )
 Each model exposes create, put, and search over a connection, with 🦜🔗 Langchain adapters for VectorStore, ChatMemory, Semantic Cache, Meta-Data Filtering, and KV Cache.
  33. My First Application (Gitpod): Genai-Demo
 Text generation with Astra DB VECTOR and OpenAI.
 • LLM provider: OpenAI (embeddings API + chat completion API) via the OpenAI Java client (com.theokanning.openai-gpt3-java)
 • Storage: astra-sdk-vector, MetadataVectorCassandraTable, Cassandra drivers vector module
     CREATE TABLE philosophers (
         partition_id    text,
         row_id          text,
         attributes_blob text,
         body_blob       text,
         metadata_s      map<text, text>,
         vector          vector<float, 1536>,
         PRIMARY KEY (partition_id, row_id)
     )
  34. My Second Application: Genai-Demo-week2
 Text generation with Astra DB VECTOR and VertexAI.
 • LLM provider: VertexAI (embeddings API + chat completion API) via langchain4j-vertex-ai
 • Langchain4j integration: langchain4j-cassandra (vector support in the SDK), astra-sdk-vector, MetadataVectorCassandraTable, Cassandra drivers vector module
 • Uses the same philosophers table (partition_id + row_id primary key, vector<float, 1536>)
  35. What is LangStream?
 • Framework for developing generative AI applications
 • Runtime environment for generative AI applications
 • Data integration platform to bring relevant and recent data to gen AI
 • Powered by proven technology: Kubernetes, Kafka, Kafka Connect

  36. LangStream
 Runs on Kubernetes and Kafka. Agents (chat completion, OSS embedding) connect to vector DBs and AI services; the LangStream operator, the control plane and API, and a WebSocket gateway manage it all.