Slide 1

Slide 1 text

©2023 DataStax. – All rights reserved
 Building a Petabyte-scale Vector Store
 Cédrick Lunven
 Software Engineer, Developer Advocate
 Developer Tools & Generative AI, DataStax
 


Slide 2

Slide 2 text

 Cédrick Lunven (clunven)
 Software Engineer
 ❖ Trainer
 ❖ Public Speaker
 ❖ Developer Tooling (sdk, cli)
 ❖ Developer Apps
 ❖ Team Lead
 ❖ Creator of ff4j (ff4j.org)
 ❖ Helping with LangChain4j

Slide 3

Slide 3 text

 Agenda
 Retrieval Augmented Generation (RAG) principles
 Apache Cassandra™ as a Vector Database
 Introduction to jVector
 Integrating with the ecosystem
 Road to Real-Time generative AI


Slide 4

Slide 4 text

 Retrieval Augmented Generation


Slide 5

Slide 5 text

 What is an LLM?


Slide 6

Slide 6 text



Slide 7

Slide 7 text

 Are (effective) prompts good enough?
 Prompt
 ● Context: I am XXX, I want to YYY in order to ZZZ (objectives)
 ● Tasks: I want you to create this and that because…
 ● Roles and Persona: You are an assistant. We are targeting AI developers.
 ● Constraints: format, style, must-haves, forbidden items. If you do not know, say you do not know.
 Diagram: Conversation Agent (API, REST API) and LLM (Chat completion API, REST API)

Slide 8

Slide 8 text

 Aside: LLM parameters
 ● Temperature
 Controls the “creativity” (randomness) of generations.
 Temp = 0: always use the highest-probability next token
 Temp = 1: sample from the model's unscaled distribution; higher values are more random
 
 ● Top K
 Limit selection of the next token to the K most probable candidates
 
 ● Top P
 Limit selection to the smallest set of tokens whose cumulative probability reaches P
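
The three parameters above can be illustrated with a minimal sampling sketch. It assumes the model exposes raw next-token scores (logits) as a dictionary. The function name `sample_next_token` and the toy scores are hypothetical, and real inference stacks apply the same ideas on tensors, sometimes in a different order.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick one token from a {token: logit} dict using temperature, top-k and top-p."""
    if temperature == 0:
        # Greedy decoding: always return the highest-scoring token.
        return max(logits, key=logits.get)

    # Temperature rescales the logits before the softmax.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(score - peak) for tok, score in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, weight / total) for tok, weight in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    if top_k is not None:
        # Top K: keep only the K most probable tokens.
        ranked = ranked[:top_k]

    if top_p is not None:
        # Top P: keep the smallest prefix whose cumulative probability reaches P.
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    tokens = [tok for tok, _ in ranked]
    probs = [prob for _, prob in ranked]
    # random.choices renormalizes the remaining weights.
    return rng.choices(tokens, weights=probs, k=1)[0]


toy_logits = {"cat": 2.0, "dog": 1.0, "fish": 0.1}  # hypothetical scores
greedy = sample_next_token(toy_logits, temperature=0)
```

With `temperature=0` the call is deterministic; with any other setting the result depends on the random generator passed as `rng`.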


Slide 9

Slide 9 text

 Few-Shot Learning
 Providing LLMs with a small number of examples is enough for them to learn specific tasks.

Slide 10

Slide 10 text

 Chain of Thought (CoT)
 LLMs can reason! Jan 2022

Slide 11

Slide 11 text

 Self-Consistency
 Sampling multiple reasoning paths can produce better results. March 2022
 How does it work in practice?
 
 ● Task needs to have a correct answer
 ● Different reasoning paths need to be explored
 ● Effectively achieved by increasing temperature, top p, and top k parameters with LLMs
 ● Need to sample 5-20 reasoning paths
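
The voting step behind self-consistency can be sketched as follows. This is a minimal illustration, assuming a caller-supplied function that runs one sampled reasoning path and returns only its final answer. The names `self_consistent_answer` and `sample_answer` are hypothetical, and the canned answers stand in for real LLM calls.

```python
from collections import Counter


def self_consistent_answer(sample_answer, n_paths=10):
    """Sample n_paths reasoning paths and return the most frequent final answer.

    sample_answer is a zero-argument callable (hypothetical) that would call an
    LLM with a non-zero temperature and return the final answer as a string.
    """
    answers = [sample_answer() for _ in range(n_paths)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_paths


# Stand-in for five sampled reasoning paths (made-up answers, no LLM involved).
canned = iter(["18", "18", "26", "18", "17"])
answer, agreement = self_consistent_answer(lambda: next(canned), n_paths=5)
```

The agreement ratio is a rough confidence signal: a low value suggests the reasoning paths disagree and the answer deserves a second look.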

Slide 12

Slide 12 text

 Least-to-Most Prompting
 LLMs can break a problem into simpler subproblems, solve them in order, and combine the results into a final solution. May 2022

Slide 13

Slide 13 text

 Research and revise
 Ask the LLM to review the generation, retrieve documents to verify facts, and make edits. Oct 2022

Slide 14

Slide 14 text

 Lost in the middle
 Relevant information in the middle of long input contexts gets ignored. July 2023

Slide 15

Slide 15 text

 Not good enough…


Slide 16

Slide 16 text

 Limitations of “LLM only” mode
 ● The LLM can be outdated
 ● The LLM does not know your data
 ● The LLM is not tuned, so it is hard to steer
 ● The LLM hallucinates if not properly prompted
 ● The LLM works with a limited input window (tokens)
 Diagram: Conversation Agent (API, REST API) and LLM (Chat completion API, REST API)

Slide 17

Slide 17 text

 Retrieval-Augmented Generation
 Prompt
 ● Context: I am XXX, I want to YYY in order to ZZZ (objectives)
 ● Tasks: I want you to create this and that because…
 ● Roles and Persona: You are an assistant. We are targeting AI developers.
 ● Constraints: format, style, must-haves, forbidden items. If you do not know, say you do not know.
 ● New: unstructured data retrieved by SEMANTIC SEARCH, pieces of text that give the LLM more information to specialize the response
 Diagram: Conversation Agent (API, REST API) and LLM (Chat completion API, REST API)
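
The prompt structure on this slide can be sketched as a small template function. This is only an illustration: `build_rag_prompt`, its parameters, and the section labels are hypothetical, and the sample values are made up.

```python
def build_rag_prompt(context, task, persona, constraints, retrieved_chunks, question):
    """Assemble a prompt from the slide's four sections plus retrieved text."""
    sections = [
        f"Context: {context}",
        f"Task: {task}",
        f"Persona: {persona}",
        "Constraints: " + "; ".join(constraints),
        "Retrieved documents:",
        # Number each chunk so the model can refer to it explicitly.
        *[f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)],
        f"Question: {question}",
    ]
    return "\n".join(sections)


prompt = build_rag_prompt(
    context="I am a developer who wants to add search to a shop application",
    task="Explain which storage option fits",
    persona="You are an assistant for AI developers",
    constraints=["Answer in three sentences", "If you do not know, say you do not know"],
    retrieved_chunks=["Cassandra supports a vector data type."],  # made-up chunk
    question="Can I store embeddings in Cassandra?",
)
```

Keeping the retrieved text in its own labeled section makes it easy to swap the retrieval step without touching the rest of the prompt.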

Slide 18

Slide 18 text

 Data Ingestion: Vectorization
 Diagram labels: Your DATA; Your Website; DOCUMENT SPLITTER; Sentence Transformer; Embeddings API; vector; Vector Space; PROMPT; Conversation Agent (API, REST API); LLM (Chat completion API, REST API)

Slide 19

Slide 19 text

 Semantic Search
 Diagram labels: Vector DATABASE; Your DATA; Your Website; DOCUMENT SPLITTER; Sentence Transformer; Embeddings API; vector; Vector Space; SIMILARITY SEARCH (ANN); PROMPT; Conversation Agent (API, REST API); LLM (Chat completion API, REST API)

Slide 20

Slide 20 text

 Retrieval-Augmented Generation
 Diagram labels: Vector DATABASE; Your DATA; Your Website; DOCUMENT SPLITTER; Sentence Transformer; Embeddings API; vector; Vector Space; SIMILARITY SEARCH; PROMPT; PROMPT+RAG; ANSWER; Conversation Agent (API, REST API); LLM (Chat completion API, REST API)
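
The flow in the last three diagrams (split, embed, similarity search, augmented prompt) can be sketched end to end. This is a toy under stated assumptions: the `embed` function is a bag-of-words stand-in for a real embeddings API, the search is exact rather than ANN, and all names and the sample text are hypothetical.

```python
import math
from collections import Counter


def split_document(text, chunk_size=8):
    """DOCUMENT SPLITTER: fixed-size word windows (real splitters are smarter)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Stand-in for an embeddings API: a sparse bag-of-words vector."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, store, k=2):
    """SIMILARITY SEARCH: exact top-k by cosine (a vector database would use ANN)."""
    query_vector = embed(question)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


document = (
    "Cassandra is a distributed NoSQL database. "
    "It scales linearly by adding nodes. "
    "Vector search finds the nearest embeddings to a query vector."
)
# Ingestion: split the document and store (chunk, vector) pairs.
store = [(chunk, embed(chunk)) for chunk in split_document(document, chunk_size=6)]

# Query time: retrieve the best chunk and build the PROMPT+RAG text.
question = "How does vector search work?"
context = retrieve(question, store, k=1)
prompt = "Answer using only this context:\n" + "\n".join(context) + "\nQuestion: " + question
```

The resulting `prompt` is what would be sent to the chat completion API. Only the retrieval step changes when the in-memory list is replaced by a vector database.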

Slide 21

Slide 21 text

 Retrieval and generation are not the same
 Seems obvious, but… standard RAG examples generally treat both steps exactly the same.
 Retrieval:
 ● Important to find the correct/best matching documents
 ● Varying size of embedded text will affect how documents cluster in the embedding space
 Generation:
 ● Models support large context windows
 ● Models are good at summarizing and at extracting facts and data from long documents

Slide 22

Slide 22 text

 Some genAI apps only need retrieval
 Shop App

Slide 23

Slide 23 text

 FLARE: Forward-Looking Active REtrieval
 1 Query knowledge base for question (Vector Search)
 2 Extract more questions from knowledge base (LLM)
 3 Query knowledge base with all questions (Vector Search)
 4 Summarize and check answer (LLM)
 Repeat 1–4 until answer looks good
 https://arxiv.org/abs/2305.06983
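
The four-step loop on this slide can be sketched with pluggable callables. This follows the slide's simplified description, not necessarily the full method in the linked paper. The function `flare_style_answer` and its four callable parameters are hypothetical, and the stubs below replace real vector search and LLM calls.

```python
def flare_style_answer(question, vector_search, extract_questions, summarize,
                       looks_good, max_rounds=3):
    """Iterate retrieval and generation until the answer passes a check."""
    questions = [question]
    answer = ""
    for _ in range(max_rounds):
        documents = vector_search(questions)                 # 1. query knowledge base
        follow_ups = extract_questions(question, documents)  # 2. extract more questions (LLM)
        questions = [question] + follow_ups
        documents = vector_search(questions)                 # 3. query with all questions
        answer = summarize(question, documents)              # 4. summarize and check (LLM)
        if looks_good(answer):
            break
    return answer


# Stubs that record calls instead of hitting a database or an LLM.
search_calls = []


def fake_search(questions):
    search_calls.append(list(questions))
    return [f"doc for {q}" for q in questions]


result = flare_style_answer(
    "q",
    vector_search=fake_search,
    extract_questions=lambda question, documents: ["follow-up"],
    summarize=lambda question, documents: " | ".join(documents),
    looks_good=lambda answer: "follow-up" in answer,
)
```

The `max_rounds` cap is a design choice added here so that a check that never passes cannot loop forever.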

Slide 24

Slide 24 text

 Vector Database?
 Diagram labels: Vector DATABASE; Vector Space; SIMILARITY SEARCH
 ● Handling A LOT of vectors (chunking)
 ● Effective Search Algorithms
 ● Performant, Resilient
 ● Dynamic (vector sizes)
 
 ● Meta Data Filtering
 ● Keyword search
 ● Semantic Caching
 ● Chat History
 ● Key Value cache


Slide 25

Slide 25 text

 Apache Cassandra™ as a vector database


Slide 26

Slide 26 text

 Apache Cassandra® Undisputed Leader of Scale and Reliability


Slide 27

Slide 27 text

 NoSQL Distributed database
 1 Installation = 1 NODE
 ✔ Capacity = ~2-4 TB
 ✔ Throughput = LOTS Tx/sec/core
 Communication:
 ✔ Gossiping
 ✔ No Master (peer-to-peer)
 DataCenter (DC) | Ring

Slide 28

Slide 28 text

 NoSQL Distributed database
 Table: sensors_by_network (diagram labels: Partition Key, Primary Key)
  network | sensor | temperature
 ---------+--------+-------------
  forest  | f001   | 92
  forest  | f002   | 88
  volcano | v001   | 210
  sea     | s001   | 45
  sea     | s002   | 50
  home    | h001   | 72
  road    | r001   | 105
  road    | r002   | 110
  ice     | i001   | 35
  car     | c001   | 69
  dog     | d001   | 40
  car     | c002   | 70

Slide 29

Slide 29 text

 Distributed
 Rows grouped by network:
 forest: f001 92, f002 88
 sea: s001 45, s002 50
 car: c001 69, c002 70
 volcano: v001 210
 home: h001 72
 road: r001 105, r002 110
 ice: i001 35
 dog: d001 40
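
The idea of grouping rows by partition key and spreading partitions over nodes can be sketched as follows. This is an illustration only: it assumes `network` is the partition key, uses an MD5 hash with a modulo as a stand-in for Cassandra's partitioner and token ring, and the three node names are hypothetical.

```python
import hashlib
from collections import defaultdict

# Rows from the sensors_by_network slide: (network, sensor, temperature).
ROWS = [
    ("forest", "f001", 92), ("forest", "f002", 88), ("volcano", "v001", 210),
    ("sea", "s001", 45), ("sea", "s002", 50), ("home", "h001", 72),
    ("road", "r001", 105), ("road", "r002", 110), ("ice", "i001", 35),
    ("car", "c001", 69), ("dog", "d001", 40), ("car", "c002", 70),
]
NODES = ["node-1", "node-2", "node-3"]  # hypothetical cluster


def node_for(partition_key, nodes=NODES):
    """Map a partition key to a node with a stable hash (not Cassandra's real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return nodes[token % len(nodes)]


def place_rows(rows):
    """Return {node: {network: [(sensor, temperature), ...]}}."""
    placement = defaultdict(lambda: defaultdict(list))
    for network, sensor, temperature in rows:
        # Every row with the same partition key lands on the same node.
        placement[node_for(network)][network].append((sensor, temperature))
    return placement


placement = place_rows(ROWS)
```

Because the node is derived from the partition key alone, a query restricted to one network only needs to contact the node that owns that partition.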

Slide 30

Slide 30 text

 Apache Cassandra
 ● High Availability (Always On): every second of downtime translates into lost revenue
 ● Linear Scalability (Hyper Scalability): millions of operations per day, hour, or second
 ● Global Distribution (Data Everywhere): on-premises, hybrid, multi-cloud, centralized, or edge
 ● Low Latency (Faster Pace): every millisecond of latency has consequences


Slide 31

Slide 31 text

 5 Hard Problems We’re Solving
 ● Scale-Out Capabilities: No upper limits
 ● Garbage Collection: Pruning obsolete index information
 ● Effective Use of Disk: Enabling high throughput
 ● Composability: Predicates, term-based searches (aka Hybrid Search)
 ● Concurrency: Non-blocking, multi-threaded index construction
 https://thenewstack.io/5-hard-problems-in-vector-search-and-how-cassandra-solves-them/


Slide 32

Slide 32 text



Slide 33

Slide 33 text

 New Data Model
 CREATE TABLE IF NOT EXISTS vsearch.products (
   id int PRIMARY KEY,
   name TEXT,
   description TEXT,
   item_vector VECTOR<FLOAT, 5> // 5-dimensional embedding
 );


Slide 34

Slide 34 text

 Creating a Vector Search Index
 CREATE CUSTOM INDEX IF NOT EXISTS ann_index
   ON vsearch.products(item_vector)
   USING 'StorageAttachedIndex';


Slide 35

Slide 35 text

 Searching for Neighbors
 SELECT * FROM vsearch.products
   ORDER BY item_vector ANN OF [0.15, 0.1, 0.1, 0.35, 0.55]
   LIMIT 1;

  id | description                                     | item_vector                 | name
 ----+-------------------------------------------------+-----------------------------+---------------------
   5 | A deep learning display that controls your mood | [0.1, 0.05, 0.08, 0.3, 0.6] | Vision Vector Frame
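
What the `ANN OF` query approximates can be sketched as an exact nearest-neighbor search. The sketch assumes cosine similarity as the ranking function. Only row 5 comes from the slide; the two other rows are hypothetical and exist only so that there is something to rank.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, rows, limit=1):
    """Exact top-`limit` search; an ANN index trades some accuracy for speed."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query, row["item_vector"]),
        reverse=True,
    )
    return ranked[:limit]


rows = [
    # Row shown on the slide.
    {"id": 5, "name": "Vision Vector Frame", "item_vector": [0.1, 0.05, 0.08, 0.3, 0.6]},
    # Hypothetical rows, added for illustration only.
    {"id": 101, "name": "hypothetical item A", "item_vector": [0.9, 0.1, 0.0, 0.0, 0.0]},
    {"id": 102, "name": "hypothetical item B", "item_vector": [0.0, 0.8, 0.6, 0.0, 0.0]},
]
best = nearest([0.15, 0.1, 0.1, 0.35, 0.55], rows, limit=1)[0]
```

A scan like this costs one similarity computation per row, which is why a dedicated index is needed once the table holds many vectors.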


Slide 36

Slide 36 text

 Integration
 Composes with partitioning
   SELECT * FROM demo WHERE partition_id = ? ORDER BY embedding ANN OF ? LIMIT 100
 Composes with other SAI indexes
   SELECT * FROM demo WHERE (c1 = ? AND c2 = ?) OR c3 = ? ORDER BY embedding ANN OF ? LIMIT 5
 Global ANN everywhere
   SELECT * FROM demo ORDER BY embedding ANN OF ? LIMIT 10


Slide 37

Slide 37 text

 Not good enough… What about the real similarity search?


Slide 38

Slide 38 text

 An introduction to jVector


Slide 39

Slide 39 text

 HNSW - Hierarchical Navigable Small World
 As seen in:
 
 ● Lucene
 ○ Elastic
 ○ Solr
 ○ Mongo
 ● Weaviate
 ● Qdrant
 ● Astra (June 2023)
 ● …


Slide 40

Slide 40 text

 DiskANN and Product Quantization

Slide 41

Slide 41 text

 Concurrency World


Slide 42

Slide 42 text



Slide 43

Slide 43 text



Slide 44

Slide 44 text

 Integrating with the ecosystem


Slide 45

Slide 45 text

 Build around LLM & embeddings
 Diagram labels: LLM (any); Embeddings (any); Your GenAI application
 Quite a lot of logic to handle all this…
 ● Prompt templating
 ● Memory of the past/context mgmt
 ● Domain-specific knowledge
 ● Retrieval-Augmented Generation
 ● Caching
 ● Prompt versioning/mgmt
 ● Storage (vector or otherwise)
 ● Reranking (e.g. MMR)
 ● Agents
 ● Doc ingestion
 ● "Chain of thought"
 ● …

Slide 46

Slide 46 text

 Top Frameworks (Python ecosystem)
 LangChain ("must know")
 ● Messy, bloated: but it's everywhere
 ● Broader coverage of everything around LLMs
 LlamaIndex ("should know")
 ● More considerate in its growth
 ● Mainly data ingestion and storage/retrieval
 Semantic Kernel ("Sem… what?")
 ● Still a lesser player
 ● Auto-planner feature
 ● Possibly the best-structured
 …
 ● Probably not worth checking right now
 Chart labels: current GitHub star count: ~60k; TILs mentioning framework: 27; GH forks and StackOverflow tell the same story

Slide 47

Slide 47 text

 Introducing Vector Search with AI Apps
 Diagram labels:
 ● AI-Powered Application: ChatBot | AI Assistant | Copilot
 ● Text Generation; Search Engine; Multi Model; Similarity Search
 ● ChatMemory; Semantic Cache; Meta-Data Filtering; KV Cache; Prompt Template; RAG; LLM History
 ● 🦜🔗 LangChain; 🦙 LlamaIndex; Semantic Kernel
 ● Cassandra: vector vector; metadata_s map; body_blob text; keys
 ● Row labels: Use Cases; Queries; Tables & Indexes
 ● LLM (Chat completion API, REST API); Conversation Agent (API, REST API); Embeddings API (REST API)

Slide 48

Slide 48 text

 CASSIO: Dedicated models for your Queries
 CREATE TABLE $name (
   row_id text PRIMARY KEY,
   attributes_blob text,
   body_blob text,
   metadata_s map,
   vector vector
 )
 CREATE TABLE $name (
   partition_id text,
   row_id text,
   attributes_blob text,
   body_blob text,
   metadata_s map,
   vector vector,
   PRIMARY KEY (partition_id, row_id)
 )
 CREATE TABLE BaseTable (
   row_id text PRIMARY KEY,
   body_blob text,
   metadata_s map
 )
 ● Prompt Template (map data into prompts): PlainCassandraTable
 ● VectorStore (store embeddings as knowledge for LLM): MetadataVectorCassandraTable
 ● VectorStore + partition aware (store embeddings as knowledge for LLM): ClusteredMetadataVectorCassandraTable
 Other diagram labels: LLM Providers (Embeddings API, Chat completion API, REST API); create / put / search; connection; 🦜🔗 LangChain adapters for VectorStore, ChatMemory, Semantic Cache, Meta-Data Filtering, KV Cache

Slide 49

Slide 49 text

 My First Application
 Diagram labels: GITPOD; Genai-Demo; Text Generation; Astra DB VECTOR; OpenAI LLM Providers (Embeddings API, Chat completion API, REST API); OpenAI JAVA CLIENT com.theokanning.openai-gpt3-java; astra-sdk-vector; MetadataVectorCassandraTable; Cassandra Drivers; Module Vector
 CREATE TABLE philosophers (
   partition_id text,
   row_id text,
   attributes_blob text,
   body_blob text,
   metadata_s map,
   vector vector,
   PRIMARY KEY (partition_id, row_id)
 )

Slide 50

Slide 50 text

 My Second Application
 Diagram labels: GenAI Application; Genai-Demo-week2; Text Generation; Astra DB VECTOR; VertexAI LLM Provider (Embeddings API, Chat completion API, REST API); Langchain4j; langchain4j-cassandra (Support of SDK Vector); Langchain4j-vertex-ai; astra-sdk-vector; MetadataVectorCassandraTable; Cassandra Drivers; Module Vector
 CREATE TABLE philosophers (
   partition_id text,
   row_id text,
   attributes_blob text,
   body_blob text,
   metadata_s map,
   vector vector,
   PRIMARY KEY (partition_id, row_id)
 )

Slide 51

Slide 51 text

 Not good enough… Demo & Code


Slide 52

Slide 52 text

 Real-Time Generative AI


Slide 53

Slide 53 text

 What is LangStream?
 What is it?
 ● Framework for developing generative AI applications
 ● Runtime environment for generative AI applications 
 ● Data integration platform to bring relevant and recent data to gen AI
 ● Powered by proven technology: Kubernetes, Kafka, Kafka Connect
 


Slide 54

Slide 54 text

 LangStream
 Diagram labels: Kubernetes; Kafka; Agent: chat completion; Vector DB; AI Services; Agent: OSS embedding; LangStream; LangStream Operator; DB; Control Plane and API; WebSocket Gateway

Slide 55

Slide 55 text

 Not good enough? Let us talk :)


Slide 56

Slide 56 text
