
Building a Petabyte-Scale Vector Store


A laptop proof of concept won’t cut it for this impending era of generative AI. Let’s dig into the mechanics of building and using a petabyte-scale vector store and the future of handling data in generative AI models. This talk will focus on the work in the Apache Cassandra® project to develop a vector store capable of handling petabytes of data, discussing why this capacity is critical for future AI applications. I will also connect how this pertains to the exciting new generation of AI technologies like Improved Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and Forward-Looking Active Retrieval Augmented Generation (FLARE) that all contribute to the growing need for such scalable solutions. Finally, we’ll discuss the importance of planning for future scalability and how to effectively manage AI agents in this new age of data.

Cedrick Lunven

October 20, 2023


Transcript

  1. ©2023 DataStax. – All rights reserved
 Building a Petabyte-Scale Vector Store
 Cédrick Lunven
 Software Engineer, Developer Advocate
 Developer Tools & Generative AI, DataStax

  2. Cédrick Lunven (clunven, clun)
 Software Engineer
 ❖ Trainer
 ❖ Public Speaker
 ❖ Developer Tooling (SDK, CLI)
 ❖ Developer Apps
 ❖ Team Lead
 ❖ Creator of ff4j (ff4j.org)
 ❖ Helping with Langchain4j
  3. Agenda
 Retrieval-Augmented Generation (RAG) principles
 Apache Cassandra™ as a Vector Database
 Introduction to JVector
 Integrating the ecosystem
 Road to real-time generative AI

  4. Are (effective) prompts good enough?
 Prompt structure:
 • Context: I am XXX, I want to YYY in order to ZZZ (objectives)
 • Tasks: I want you to create this and that because…
 • Roles and persona: You are an assistant; we are targeting AI developers
 • Constraints: format, style, must-haves, forbidden topics; if you do not know, say you do not know
 The prompt goes through a conversation agent to the LLM chat completion API (REST APIs).
  5. Aside: LLM parameters
 • Temperature
   Controls the "creativity" of generations.
   Temp = 0: always use the highest-probability next token.
   Temp = 1: maximum creativity.
 • Top K
   Limits selection of the next token to the top K matches by probability.
 • Top P
   Selects tokens based on the sum of their probabilities.
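The three decoding knobs above can be sketched in Python. This is a toy illustration, not how an inference engine actually implements sampling; `logits` is a hypothetical {token: score} dict:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Pick the next token from a {token: logit} dict using the three knobs
    described above. A toy sketch; real engines work on tensors."""
    rng = rng or random.Random()
    if temperature == 0:
        # Greedy decoding: always the highest-probability token.
        return max(logits, key=logits.get)
    # Temperature rescales logits before softmax: <1 sharpens, >1 flattens.
    scaled = {t: l / temperature for t, l in logits.items()}
    z = max(scaled.values())
    probs = {t: math.exp(l - z) for t, l in scaled.items()}
    total = sum(probs.values())
    probs = {t: p / total for t, p in probs.items()}
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]          # keep only the K most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for token, p in ranked:          # nucleus sampling: smallest set whose
            kept.append((token, p))      # cumulative probability reaches top_p
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    tokens, weights = zip(*ranked)
    return rng.choices(tokens, weights=weights, k=1)[0]
```

For example, `sample_next_token({"cat": 2.0, "dog": 1.0}, temperature=0)` is deterministic and returns `"cat"`, while higher temperatures spread probability mass over more tokens.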

  6. Few-Shot Learning
 Providing LLMs with a small number of examples is enough for them to learn specific tasks.
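A hypothetical illustration of the idea: the labeled examples embedded in the prompt are what teach the model the task, with no fine-tuning involved. The reviews and labels below are made up for the sketch:

```python
# Two worked examples are enough to establish the task and output format.
EXAMPLES = [
    ("The battery died after two days.", "negative"),
    ("Setup took thirty seconds. Love it.", "positive"),
]

def few_shot_prompt(new_input):
    """Build a few-shot classification prompt from the examples above."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in EXAMPLES:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The prompt ends mid-pattern, so the model's natural completion is the label.
    lines.append(f"Review: {new_input}")
    lines.append("Sentiment:")
    return "\n".join(lines)
```

Sending `few_shot_prompt("Arrived broken and support never answered.")` to a chat completion API would then elicit a single-word label as the continuation.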
  7. Self-Consistency
 Sampling multiple reasoning paths can produce better results. (March 2022)
 How does it work in practice?
 • The task needs to have a correct answer
 • Different reasoning paths need to be explored
 • Effectively achieved by increasing the temperature, top-p, and top-k parameters of the LLM
 • You need to sample 5-20 reasoning paths
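The recipe above boils down to a majority vote over sampled answers. In this sketch, `ask_llm` is a hypothetical callable that returns one temperature-sampled final answer per call:

```python
from collections import Counter

def self_consistency(ask_llm, question, n_paths=10):
    """Sample several reasoning paths (high temperature gives diverse chains
    of thought) and keep the most frequent final answer. Returns the winning
    answer and its agreement ratio across the sampled paths."""
    answers = [ask_llm(question) for _ in range(n_paths)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n_paths
```

The agreement ratio is a useful side product: a low ratio signals the model is not consistent on this question, so the winning answer deserves less trust.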
  8. Least-to-Most Prompting
 LLMs can solve simpler subproblems first and combine the answers to reach a final solution. (May 2022)
  9. Research and Revise
 Ask the LLM to review its generation, retrieve documents to verify facts, and make edits. (Oct 2022)
  10. Lost in the Middle
 Relevant information in the middle of long input contexts tends to get ignored. (July 2023)
  11. Limitations of "LLM-only" mode
 • The LLM can be outdated
 • The LLM does not know your data
 • The LLM is not tuned: hard steerability
 • The LLM hallucinates if not properly prompted
 • The LLM works with a limited input window (tokens)
  12. Retrieval-Augmented Generation
 The prompt keeps the same structure (context, tasks, roles and persona, constraints) but is augmented with unstructured data issued from SEMANTIC SEARCH: new pieces of text that give the LLM more information to specialize its response.
  13. Data Ingestion: Vectorization
 Your data (e.g. your website) goes through a DOCUMENT SPLITTER, then a sentence transformer or embeddings API, which maps each chunk into the vector space. The prompt is embedded into a vector the same way.
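The DOCUMENT SPLITTER step can be as simple as a fixed-size splitter with overlap; this is a naive stand-in for the splitters that frameworks like LangChain or LlamaIndex provide:

```python
def split_document(text, chunk_size=200, overlap=40):
    """Naive fixed-size splitter with overlap: each returned chunk is later
    embedded into one vector. The overlap keeps sentences that straddle a
    chunk boundary retrievable from either side. Real splitters also respect
    sentence and paragraph boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Chunk size matters downstream: as the later "Retrieval and generation are not the same" slide notes, the size of the embedded text affects how documents cluster in the embedding space.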
  14. Vector DATABASE: Semantic Search
 The prompt vector is compared against the ingested chunk vectors with a SIMILARITY SEARCH (ANN) over the vector space.
  15. Vector DATABASE: Retrieval-Augmented Generation
 PROMPT → vector → SIMILARITY SEARCH → PROMPT+RAG → LLM chat completion → ANSWER
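The pipeline on this slide can be sketched end to end. Everything here is an assumption-laden toy: `embed` is a hashing-trick stand-in for a real sentence transformer or embeddings API, and `chat_completion` stands in for the LLM chat completion API:

```python
import math

def embed(text, dim=32):
    """Toy embedding via the hashing trick: bucket word counts, then
    L2-normalize. A real pipeline calls an embedding model here."""
    v = [0.0] * dim
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def similarity(a, b):
    # Cosine similarity; inputs are already unit vectors, so a dot product.
    return sum(x * y for x, y in zip(a, b))

def rag_answer(question, documents, chat_completion, top_k=2):
    """PROMPT -> vector -> SIMILARITY SEARCH -> PROMPT+RAG -> ANSWER."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: similarity(q, embed(d)), reverse=True)
    context = "\n".join(ranked[:top_k])
    prompt = (f"Answer using only this context:\n{context}\n\n"
              f"Question: {question}\nAnswer:")
    return chat_completion(prompt)
```

In a real deployment the `sorted(...)` line is replaced by an ANN query against the vector store, since scanning every document does not scale to petabytes.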
  16. Retrieval and generation are not the same
 Seems obvious, but standard RAG examples generally treat both steps exactly the same.
 Retrieval:
 • It is important to find the correct/best-matching documents
 • The varying size of embedded text affects how documents cluster in the embedding space
 Generation:
 • Models support large context windows
 • They are good at summarizing and extracting facts and data from long documents
  17. FLARE: Forward-Looking Active REtrieval
 1. Query the knowledge base for the question (vector search)
 2. Extract more questions from the knowledge base (LLM)
 3. Query the knowledge base with all questions (vector search)
 4. Summarize and check the answer (LLM)
 Repeat 1-4 until the answer looks good.
 https://arxiv.org/abs/2305.06983
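The four steps can be sketched as a loop. This is an illustrative reading of the slide, not the paper's reference implementation; `vector_search` and `llm` are hypothetical callables standing in for the knowledge base and the model:

```python
def flare(question, vector_search, llm, max_rounds=3):
    """Sketch of the FLARE loop from the slide.
    vector_search(query) -> list of passages
    llm(task, passages)  -> str"""
    answer = ""
    # 1. Query the knowledge base for the question (vector search).
    passages = vector_search(question)
    for _ in range(max_rounds):
        # 2. Extract more questions from the retrieved passages (LLM).
        follow_ups = llm("extract follow-up questions", passages).splitlines()
        # 3. Query the knowledge base with all questions (vector search).
        for q in [question, *follow_ups]:
            passages.extend(vector_search(q))
        # 4. Summarize and check the answer (LLM).
        answer = llm("summarize and answer", passages)
        if llm("is this answer complete? yes/no", [answer]) == "yes":
            return answer  # repeat 1-4 until the answer looks good
    return answer
```

Each round issues several vector searches, which is why FLARE-style agents multiply the query load on the vector store compared with single-shot RAG.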
  18. Vector Database?
 A vector database performs SIMILARITY SEARCH over a vector space. Requirements:
 • Handling a LOT of vectors (chunking)
 • Effective search algorithms
 • Performant, resilient
 • Dynamic (vector sizes)
 • Metadata filtering
 • Keyword search
 • Semantic caching
 • Chat history
 • Key-value cache

  19. NoSQL Distributed Database
 Installation = 1 NODE
 ✔ Capacity ≈ 2-4 TB
 ✔ Throughput = LOTS of Tx/sec/core
 Communication:
 ✔ Gossiping
 ✔ No master (peer-to-peer)
 DataCenter (DC) | Ring
  20. NoSQL Distributed Database: the sensors_by_network table
 network | sensor | temperature
 --------+--------+------------
 forest  | f001   | 92
 forest  | f002   | 88
 volcano | v001   | 210
 sea     | s001   | 45
 sea     | s002   | 50
 home    | h001   | 72
 road    | r001   | 105
 road    | r002   | 110
 ice     | i001   | 35
 car     | c001   | 69
 dog     | d001   | 40
 car     | c002   | 70
 Partition key: network. Primary key: (network, sensor).
  21. Distributed
 The partitions are spread across the ring's nodes:
 forest: (f001, 92), (f002, 88) · sea: (s001, 45), (s002, 50) · car: (c001, 69), (c002, 70) · volcano: (v001, 210) · home: (h001, 72) · road: (r002, 110), (r001, 105) · ice: (i001, 35) · dog: (d001, 40)
  22. Apache Cassandra
 • High Availability / Always On: every second of downtime translates into lost revenue
 • Linear Scalability / Hyper Scalability: millions of operations per day, hour, or second
 • Global Distribution / Data Everywhere: on-premises, hybrid, multi-cloud, centralized, or edge
 • Low Latency / Faster Pace: every millisecond of latency has consequences

  23. 5 Hard Problems We're Solving
 • Scale-out capabilities: no upper limits
 • Garbage collection: pruning obsolete index information
 • Effective use of disk: enabling high throughput
 • Composability: predicates, term-based searches (a.k.a. hybrid search)
 • Concurrency: non-blocking, multi-threaded index construction
 https://thenewstack.io/5-hard-problems-in-vector-search-and-how-cassandra-solves-them/

  24. New Data Model
 CREATE TABLE IF NOT EXISTS vsearch.products (
     id          int PRIMARY KEY,
     name        TEXT,
     description TEXT,
     item_vector VECTOR<FLOAT, 5>  // 5-dimensional embedding
 );

  25. Creating a Vector Search Index
 CREATE CUSTOM INDEX IF NOT EXISTS ann_index
     ON vsearch.products(item_vector)
     USING 'StorageAttachedIndex';

  26. Searching for Neighbors
 SELECT * FROM vsearch.products
     ORDER BY item_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
     LIMIT 1;

  id | description                                      | item_vector                 | name
 ----+--------------------------------------------------+-----------------------------+---------------------
   5 | A deep learning display that controls your mood  | [0.1, 0.05, 0.08, 0.3, 0.6] | Vision Vector Frame
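`ORDER BY … ANN OF` returns the rows whose stored vectors are most similar to the query vector. Conceptually, assuming cosine similarity (the default for SAI vector indexes) and using exact brute force instead of the approximate index, it computes something like this sketch; the second product row is made up for contrast:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

products = {
    5: [0.1, 0.05, 0.08, 0.3, 0.6],   # "Vision Vector Frame" from the slide
    7: [0.9, 0.8, 0.1, 0.05, 0.05],   # hypothetical second product
}
query = [0.15, 0.1, 0.1, 0.35, 0.55]

# Equivalent of ORDER BY item_vector ANN OF <query> LIMIT 1, by brute force.
best_id = max(products, key=lambda pid: cosine_similarity(query, products[pid]))
```

The index exists precisely because this brute-force scan is O(rows × dimensions): approximate structures trade a little recall for sub-linear search over billions of vectors.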

  27. Integration
 Composes with partitioning:
     SELECT * FROM demo WHERE partition_id = ?
         ORDER BY embedding ANN OF ? LIMIT 100
 Composes with other SAI indexes:
     SELECT * FROM demo WHERE (c1 = ? AND c2 = ?) OR c3 = ?
         ORDER BY embedding ANN OF ? LIMIT 5
 Global ANN everywhere:
     SELECT * FROM demo ORDER BY embedding ANN OF ? LIMIT 10

  28. HNSW: Hierarchical Navigable Small World
 As seen in:
 • Lucene
   ◦ Elastic
   ◦ Solr
   ◦ Mongo
 • Weaviate
 • Qdrant
 • Astra (June 2023)
 • …

  29. Build around LLMs & embeddings
 Your GenAI application sits between the LLM and the embeddings API, and there is quite a lot of logic to handle:
 • Prompt templating
 • Memory of the past / context management
 • Domain-specific knowledge
 • Retrieval-Augmented Generation
 • Caching
 • Prompt versioning/management
 • Storage (vector or otherwise)
 • Reranking (e.g. MMR)
 • Agents
 • Doc ingestion
 • "Chain of thought"
 • …
  30. Top Frameworks (Python ecosystem)
 LangChain
 • Messy, bloated, but it's everywhere
 • Broader coverage of everything around LLMs
 • "Must know"
 LlamaIndex
 • More considerate in its growth
 • Mainly data ingestion and storage/retrieval
 • "Should know"
 Semantic Kernel
 • Still a lesser player
 • Auto-planner feature
 • Possibly the best-structured
 • "Sem… what?"
 Others: probably not worth checking right now.
 Ranking signals: current GitHub star count (~60k), TILs mentioning the framework (27); GitHub forks and StackOverflow tell the same story.
  31. Introducing Vector Search with AI Apps
 AI-powered applications: ChatBot | AI Assistant | Copilot | Text Generation | Search Engine | Multi-Model Similarity Search
 Use cases: ChatMemory, Semantic Cache, Meta-Data Filtering, KV Cache, Prompt Template, RAG, LLM History
 Frameworks: 🦜🔗 Langchain, 🦙 Llama Index, Semantic Kernel
 Cassandra tables & indexes: vector vector<float, X>, metadata_s map<text, text>, body_blob text, keys
 LLM APIs: chat completion, conversation agent, embeddings (REST)
  32. CassIO: dedicated models for your queries
 PlainCassandraTable (map data into prompts):
     CREATE TABLE BaseTable (
         row_id     text PRIMARY KEY,
         body_blob  text,
         metadata_s map<text, text>
     )
 MetadataVectorCassandraTable (VectorStore: store embeddings as knowledge for the LLM):
     CREATE TABLE $name (
         row_id          text PRIMARY KEY,
         attributes_blob text,
         body_blob       text,
         metadata_s      map<text, text>,
         vector          vector<float, $dim>
     )
 ClusteredMetadataVectorCassandraTable (VectorStore, partition-aware):
     CREATE TABLE $name (
         partition_id    text,
         row_id          text,
         attributes_blob text,
         body_blob       text,
         metadata_s      map<text, text>,
         vector          vector<float, 1536>,
         PRIMARY KEY (partition_id, row_id)
     )
 Each model exposes create, put, and search over a connection, with 🦜🔗 Langchain adapters for VectorStore, ChatMemory, Semantic Cache, Meta-Data Filtering, and KV Cache.
  33. My First Application (Gitpod): Genai-Demo
 Text generation with Astra DB VECTOR and OpenAI.
 • LLM provider: OpenAI (embeddings API + chat completion API) via the OpenAI Java client (com.theokanning.openai-gpt3-java)
 • Storage: astra-sdk-vector, MetadataVectorCassandraTable, Cassandra drivers vector module
     CREATE TABLE philosophers (
         partition_id    text,
         row_id          text,
         attributes_blob text,
         body_blob       text,
         metadata_s      map<text, text>,
         vector          vector<float, 1536>,
         PRIMARY KEY (partition_id, row_id)
     )
  34. My Second Application: Genai-Demo-week2
 Text generation with Astra DB VECTOR and VertexAI.
 • LLM provider: VertexAI (embeddings API + chat completion API) via langchain4j-vertex-ai
 • Langchain4j integration: langchain4j-cassandra (vector support in the SDK), astra-sdk-vector, MetadataVectorCassandraTable, Cassandra drivers vector module
 • Uses the same philosophers table (partition_id + row_id primary key, vector<float, 1536>)
  35. What is LangStream?
 • Framework for developing generative AI applications
 • Runtime environment for generative AI applications
 • Data integration platform to bring relevant and recent data to gen AI
 • Powered by proven technology: Kubernetes, Kafka, Kafka Connect

  36. LangStream
 Runs on Kubernetes and Kafka. Agents (chat completion, OSS embedding) connect to vector DBs and AI services; the LangStream operator, the control plane and API, and a WebSocket gateway manage it all.