Slide 1

Slide 1 text

Ruby On RAG: Building AI Use Cases for Fun and Profit

Slide 2

Slide 2 text

Hello! I am Landon Gray, Founder & AI Engineer @ Identus Consulting

Slide 3

Slide 3 text

Overview
◉ Fun
○ What is RAG?
○ What problem does it solve?
○ How does it work?
■ Indexing
■ Retrieval & Generation
◉ Profit - Practical
○ Demo

Slide 4

Slide 4 text

What is RAG?

Slide 5

Slide 5 text

“Retrieval Augmented Generation (RAG) is a way to augment the LLM's knowledge with additional information.”

Slide 6

Slide 6 text

Typical LLM Query-Response Cycle

Slide 7

Slide 7 text

Hallucinations (making stuff up)

Slide 8

Slide 8 text

Non-Relevant Data

Slide 9

Slide 9 text

How do we deal with this?

Slide 10

Slide 10 text

Try Adding Documents? Limited Context Window!

Slide 11

Slide 11 text

Context Window & Tokens
Token: a token can be thought of as a piece of a word.
Context Window: the maximum number of tokens that can be used in a single request, inclusive of both input and output tokens.
https://platform.openai.com/docs/models
https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them

Slide 12

Slide 12 text

Context Window & Tokens: Rule of Thumb
1 token ~= 4 chars in English
100 tokens ~= 75 words
Example: “Ruby is a fine programming language” = 6 tokens
GPT-4o Context Window = 128,000 tokens ~= 96,000 words
https://platform.openai.com/docs/models
https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
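The rule of thumb above is easy to sketch in Ruby. This is only a rough estimate, and the helper names here (`approx_token_count`, `fits_in_context?`) are made up for illustration:

```ruby
# Rough token estimate using the ~4 characters per token rule of thumb.
def approx_token_count(text)
  (text.length / 4.0).ceil
end

# Check a text against a model's context window (GPT-4o: 128,000 tokens).
def fits_in_context?(text, context_window = 128_000)
  approx_token_count(text) <= context_window
end
```

For real counts, use a tokenizer; the rule of thumb only holds for typical English prose.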

Slide 13

Slide 13 text

96,000 Words Visualized: To Kill a Mockingbird (approx. 100,000 words)

Slide 14

Slide 14 text

96,000 Words Visualized: Ender’s Game (approx. 100,000 words)

Slide 15

Slide 15 text

96,000 Words Visualized: 1984 (approx. 88,000 words)

Slide 16

Slide 16 text

How do we calculate the token size of various documents?

Slide 17

Slide 17 text

OpenAI - Tokenizer
OpenAI has a tool to help us understand how many tokens a piece of text might contain.

Slide 18

Slide 18 text

tiktoken_ruby
tiktoken_ruby is a fast BPE (Byte Pair Encoding) tokenizer for use with OpenAI's models.

Slide 19

Slide 19 text

Limited Context Window

Slide 20

Slide 20 text

Indexing

Slide 21

Slide 21 text

Chunking

Slide 22

Slide 22 text

Chunking (Text Splitting)
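Chunking can be as simple as a fixed-size character splitter with overlap. A minimal sketch (method name and defaults are illustrative; real splitters usually respect sentence or paragraph boundaries):

```ruby
# Naive fixed-size chunker with overlap (character-based).
# Overlap keeps context that would otherwise be cut at a chunk boundary.
def chunk_text(text, size: 200, overlap: 50)
  step = size - overlap
  chunks = []
  index = 0
  while index < text.length
    chunks << text[index, size]
    index += step
  end
  chunks
end
```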

Slide 23

Slide 23 text

Pass chunks to the LLM. Which chunks?

Slide 24

Slide 24 text

We’ll get back to that…

Slide 25

Slide 25 text

Embeddings

Slide 26

Slide 26 text

Embedding = Array of Floats

Slide 27

Slide 27 text

Embedding = Vector = Array of Floats

Slide 28

Slide 28 text

Chunks to Array of Floats (Embedding)

Slide 29

Slide 29 text

“Vector embeddings are a way to convert words, sentences, and other data into numbers that capture their meaning and relationships.”

Slide 30

Slide 30 text

Efficient Semantic Search

Slide 31

Slide 31 text

Cake, Water, Milk, Cookie: Relationships?

Slide 32

Slide 32 text

Cake, Water, Milk, Cookie → Drink, Dessert

Slide 33

Slide 33 text

Drink: Water, Milk. Dessert: Cake, Cookie.

Slide 34

Slide 34 text

Meaning Maintained! (Drink: Water, Milk. Dessert: Cake, Cookie.)

Slide 35

Slide 35 text

Store

Slide 36

Slide 36 text

Store our Embeddings

Slide 37

Slide 37 text

Quick Recap - Indexing: Chunking → Embedding → Store (Vector Database)
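The three indexing steps can be tied together in a toy sketch. Here `fake_embed` is a hypothetical stand-in for a real embedding model, and a plain Hash plays the role of the vector database:

```ruby
# Hypothetical stand-in for an embedding model/API call:
# maps text to a tiny 2-element vector (length, vowel count).
def fake_embed(text)
  [text.length.to_f, text.count("aeiou").to_f]
end

# Indexing pipeline: chunk the document, embed each chunk,
# and store chunk => embedding in an in-memory "vector database".
def index_document(text, chunk_size: 100)
  chunks = text.chars.each_slice(chunk_size).map(&:join)
  chunks.to_h { |chunk| [chunk, fake_embed(chunk)] }
end
```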

Slide 38

Slide 38 text

Time to get back to that thing…

Slide 39

Slide 39 text

Pass chunks to the LLM. Which chunks?

Slide 40

Slide 40 text

Pass chunks to the LLM

Slide 41

Slide 41 text

Similarity Search
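Similarity between two embeddings is commonly measured with cosine similarity; a minimal pure-Ruby sketch (helper names are illustrative):

```ruby
# Dot product of two equal-length vectors (arrays of floats).
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Cosine similarity: 1.0 means the vectors point the same way
# (similar meaning); 0.0 means they are unrelated (orthogonal).
def cosine_similarity(a, b)
  dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)))
end
```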

Slide 42

Slide 42 text

Observations

Slide 43

Slide 43 text


Slide 44

Slide 44 text


Slide 45

Slide 45 text

Graph our data

Slide 46

Slide 46 text


Slide 47

Slide 47 text

Vector Store (Vector Space)

Slide 48

Slide 48 text


Slide 49

Slide 49 text


Slide 50

Slide 50 text


Slide 51

Slide 51 text


Slide 52

Slide 52 text


Slide 53

Slide 53 text

Similarity Search

Slide 54

Slide 54 text


Slide 55

Slide 55 text

Find 3 chunks similar to Star Wars

Slide 56

Slide 56 text


Slide 57

Slide 57 text


Slide 58

Slide 58 text

Find 3 chunks similar to Star Wars

Slide 59

Slide 59 text

Find 3 chunks similar to Star Wars

Slide 60

Slide 60 text

Find 3 chunks similar to Star Wars
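The “find 3 chunks similar to Star Wars” step can be sketched as a brute-force nearest-neighbor search. The titles and 2-D vectors below are invented for illustration; real embeddings have hundreds of dimensions and come from a model:

```ruby
# Cosine similarity between two embedding vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Rank every stored chunk against the query embedding, keep the top k.
def top_k(store, query_embedding, k = 3)
  store.max_by(k) { |_chunk, embedding| cosine_similarity(embedding, query_embedding) }
       .map(&:first)
end

# Toy vector store: chunk text => made-up embedding.
STORE = {
  "Star Wars: A New Hope"   => [0.90, 0.10],
  "The Empire Strikes Back" => [0.85, 0.15],
  "Star Trek"               => [0.80, 0.20],
  "Cooking with Butter"     => [0.10, 0.90]
}.freeze
```

Real vector databases avoid this full scan with approximate nearest-neighbor indexes, but the idea is the same.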

Slide 61

Slide 61 text

Retrieval & Generation

Slide 62

Slide 62 text

Retrieval & Generation

Slide 63

Slide 63 text

Quick Recap: Similarity Search

Slide 64

Slide 64 text

Generate a prompt
Query:
Search Results:

Slide 65

Slide 65 text

Clarification: Query vs Prompt
Query: input text generated by some human or system.
Prompt: the final text input ingested by the LLM, which often contains the Query.

Slide 66

Slide 66 text

Generate a prompt
Query:
Search Results:
Prompt:
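Assembling the prompt from the query and the search results is plain string templating; the instruction wording and method name here are illustrative only:

```ruby
# Turn the user's query plus the retrieved chunks into the final
# prompt text that gets sent to the LLM.
def build_prompt(query, search_results)
  <<~PROMPT
    Answer the question using only the context below.

    Context:
    #{search_results.join("\n")}

    Question: #{query}
  PROMPT
end
```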

Slide 67

Slide 67 text

Pass to the LLM to get a Response

Slide 68

Slide 68 text

Quick Recap - Search & Generation

Slide 69

Slide 69 text

That’s It!

Slide 70

Slide 70 text


Slide 71

Slide 71 text

Demo

Slide 72

Slide 72 text

Thank You