Slide 1

Slide 1 text

Let’s Build A Custom ChatGPT-Like Bot With AI APIs
Tamar Twena-Stern

Slide 2

Slide 2 text

Tamar Twena-Stern
Twitter: @sterntwena
LinkedIn: /tamarstern
tamar.twena@gmail.com

Slide 3

Slide 3 text

Writing A Question-Answering Bot - A Custom (Chat)GPT - On A Pre-Defined Knowledge Base With AI APIs

Slide 4

Slide 4 text

● LLMs and how they work
● LLM building blocks
● Architecture for writing a custom GPT + the motives behind it
● Code + demo
● Taking up the scale

Slide 5

Slide 5 text

What Is An LLM And How Does It Work?

Slide 6

Slide 6 text

Large language models (LLMs) are machine learning models that can comprehend and generate human language text. They work by analyzing massive data sets of language.

Slide 7

Slide 7 text

To Understand How An LLM Works At A High Level, We Need To Understand Its Layers

Slide 8

Slide 8 text

Layer 1 - Machine Learning Algorithms

Slide 9

Slide 9 text

Layer 2 - Deep Learning - Neural Networks - Models With A Large Number Of Parameters And Complicated Relations

[Diagram: layers of neurons in a neural network; example classifier output - Animal/Probability: Cat 0.1, Dog 0.9]

Slide 10

Slide 10 text

To Solve Text Problems - Embed Words - Turn A Sentence Into A Sequence Of Numeric Inputs

Slide 11

Slide 11 text

Layer 3 - The LLMs

Slide 12

Slide 12 text

Layer 3 - LLM - A Classification Problem To Predict The Next Word

"The child likes to play with the ___"

Word     Probability
zebra    0.1
ball     0.7
shoe     0.05

Slide 13

Slide 13 text

LLMs Are Generative AI - They Generate Text By Feeding The Predicted Word Back In As The Next Input

Slide 14

Slide 14 text

Fun Fact
ChatGPT (GPT-3) - 175B Parameters
Human Brain - ~100B Neurons

Slide 15

Slide 15 text

LLM Building Blocks That You Should Know

Slide 16

Slide 16 text

LLM Tokens

Example - a sentence the model sees as a sequence of tokens: "Node.js is an open-source, cross-platform JavaScript runtime environment that allows developers to run code outside the web browser."

Slide 17

Slide 17 text

Temperature - An LLM Setting That Lets It Pick Words With Smaller Probability

"The child likes to play with the ___"

Word     Probability
zebra    0.1
ball     0.7
shoe     0.05

Slide 18

Slide 18 text

LLM Response With Low Temperature

Slide 19

Slide 19 text

High Temperature Leads To A “Confused” Response

Slide 20

Slide 20 text

When Using An LLM, You Need To Give It A Prompt

Prompt: “You are a helpful assistant that helps developers find bugs in their code”

Slide 21

Slide 21 text

LLMs Can Hallucinate - Generate A Grammatically Correct Answer With Incorrect Data

Slide 22

Slide 22 text

The Prompt Can Be Used To Avoid Hallucinations

Prompt: “You are a helpful assistant that helps developers find bugs in their code. If you don’t know the answer, say you don’t know. Never make up an answer”

Slide 23

Slide 23 text

LLM Context Window

Slide 24

Slide 24 text

The Custom (Chat) GPT Architecture

Slide 25

Slide 25 text

I Want To Integrate My Bot With My Custom Data - How Do I Do That?

Slide 26

Slide 26 text

Option 1 : Fine Tune The LLM And Train It With The Context Data

Slide 27

Slide 27 text

Why Is Fine-Tuning Not The Best Approach?
● Expensive and slow
● Risks "catastrophic forgetting", where the model loses knowledge from its pre-training
● Risk of generating irrelevant output
● Requires access to a large amount of training data

Slide 28

Slide 28 text

What If I Could Send The Data To The LLM With Each Question?

Slide 29

Slide 29 text

In older-generation LLMs the context window is limited, but in newer models it is huge

Slide 30

Slide 30 text

Why Is Transferring All The Data To The Context Window Not A Good Idea?

[Diagram: all of the data being pushed into the LLM]

Slide 31

Slide 31 text

A Big Effort Just To Do A CTRL+F In A Text Document

Slide 32

Slide 32 text

RAG - An Architecture To Feed The Context Window With The Relevant Data For Each Question

Slide 33

Slide 33 text

Step 1 - RAG Data Ingestion Pipeline

Slide 34

Slide 34 text

Step 2 - RAG - Question-Answer Bot Architecture

Slide 35

Slide 35 text

Let’s Start To See Some Code!

Slide 36

Slide 36 text

LLM Providers And Models That You Can Use
● GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o
● Gemini, PaLM
● Llama

Slide 37

Slide 37 text

We will develop our bot with an OpenAI model and LangChain - a framework for developing applications powered by large language models (LLMs).

Slide 38

Slide 38 text

Initialize OpenAI Model

import { ChatOpenAI } from "@langchain/openai";

// initialize the LLM
const llm = new ChatOpenAI({ model: "gpt-4o" });

Slide 39

Slide 39 text

RAG Data Ingestion Pipeline

Slide 40

Slide 40 text

Load Knowledge Base Documents With LangChain Loaders

import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { KNOWLEDGE_BASE_PATH } from "./consts";

// Map each file extension in the knowledge base directory to a matching loader
const loadDocuments = async () => {
  const directoryLoader = new DirectoryLoader(KNOWLEDGE_BASE_PATH, {
    ".pdf": (path) => new PDFLoader(path),
    ".txt": (path) => new TextLoader(path),
    ".csv": (path) => new CSVLoader(path),
  });
  const loadedDocs = await directoryLoader.load();
  return loadedDocs;
};

export default loadDocuments;

Slide 41

Slide 41 text

Split Documents With LangChain Text Splitter

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Split each document into overlapping chunks so related context is not cut mid-idea
const splitDocuments = async (loadedDocs) => {
  const textSplitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1500,
    chunkOverlap: 100,
  });
  const splitDocs = await textSplitter.splitDocuments(loadedDocs);
  return splitDocs;
};

export default splitDocuments;

Slide 42

Slide 42 text

RAG Data Ingestion Pipeline

Slide 43

Slide 43 text

Chroma DB is an open-source vector store used for storing and retrieving vector embeddings to be used later by large language models.

version: '3.9'
networks:
  net:
    driver: bridge
services:
  chromadb:
    image: chromadb/chroma:latest
    volumes:
      - :/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
      - PERSIST_DIRECTORY=/chroma/chroma # this is the default path, change it as needed
      - ANONYMIZED_TELEMETRY=${ANONYMIZED_TELEMETRY:-TRUE}
    ports:
      - 8000:8000
    networks:
      - net

Slide 44

Slide 44 text

Saving Text Chunks To Chroma DB

import { OpenAIEmbeddings } from "@langchain/openai";
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { KNOWLEDGE_BASE_DB_COLLECTION_NAME } from './consts';

const storeAsVectors = async (splitDocs) => {
  try {
    // Embedding model
    const embeddingsModel = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
    // Insert the chunks in 5 batches rather than one huge request
    const sectionLength = Math.ceil(splitDocs.length / 5);
    for (let i = 0; i < 5; i++) {
      const docsToInsert = splitDocs.slice(i * sectionLength, (i + 1) * sectionLength);
      await Chroma.fromDocuments(docsToInsert, embeddingsModel, {
        collectionName: KNOWLEDGE_BASE_DB_COLLECTION_NAME,
      });
    }
  } catch (e) {
    console.error("Error when inserting documents:", e);
  }
};

export default storeAsVectors;

Slide 45

Slide 45 text

RAG - Question-Answer Bot Architecture

Slide 46

Slide 46 text

Context Retriever - To Retrieve The Relevant Data From The DB

import { OpenAIEmbeddings } from "@langchain/openai";
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { KNOWLEDGE_BASE_DB_COLLECTION_NAME } from "../dataInjest/consts";

const generateContextRetreiver = async () => {
  // Connect to the existing collection with the same embedding model used at ingestion
  const vectorStore = await Chroma.fromExistingCollection(
    new OpenAIEmbeddings({ model: "text-embedding-3-small" }),
    { collectionName: KNOWLEDGE_BASE_DB_COLLECTION_NAME }
  );
  // Return the 6 chunks most similar to each question
  const retriever = vectorStore.asRetriever({ k: 6, searchType: "similarity" });
  return retriever;
};

export default generateContextRetreiver;

Slide 47

Slide 47 text

RAG - Question-Answer Bot Architecture

Slide 48

Slide 48 text

Prompt Creation

import { PromptTemplate } from "@langchain/core/prompts";

const createPrompt = async () => {
  const template = `You are an assistant for question-answering tasks.
Use the following pieces of context to answer the question.
If you don't know the answer, say that you don't know. Never make up an answer.
Context: {context}
Question: {question}`;
  return PromptTemplate.fromTemplate(template);
};

export default createPrompt;

Slide 49

Slide 49 text

Bot Wrapped In An Express Server

import express from 'express';
import bodyParser from 'body-parser';

const app = express();
const port = 8080;
app.use(bodyParser.json());

// retriever and ragChain are created at startup (see the previous slides)

// Request to answer question with llm
app.post('/api/question', async (req, res) => {
  const userQuestion = req.body.userQuestion;
  if (!userQuestion) {
    return res.status(400).json({ error: 'User question required' });
  }
  const retrievedChunks = await retriever.invoke(userQuestion);
  const response = await ragChain.invoke({
    question: userQuestion.trim(),
    context: retrievedChunks,
  });
  res.status(200).json({ answer: response });
});

app.listen(port, () => {
  console.log(`Server is running on http://localhost:${port}`);
});

Slide 50

Slide 50 text

Live Demo - Let’s See The System In Action

Slide 51

Slide 51 text

How Do We Implement This Architecture To Support Large Scale?

Slide 52

Slide 52 text

Scale The API Layer

Slide 53

Slide 53 text

At High Scale - Improve The Indexing And Search Method In The Context Retriever

import { OpenAIEmbeddings } from "@langchain/openai";
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { KNOWLEDGE_BASE_DB_COLLECTION_NAME } from "../dataInjest/consts";

const generateContextRetreiver = async () => {
  const vectorStore = await Chroma.fromExistingCollection(
    new OpenAIEmbeddings({ model: "text-embedding-3-small" }),
    { collectionName: KNOWLEDGE_BASE_DB_COLLECTION_NAME }
  );
  // Plain k-nearest-neighbour similarity search - the first thing to revisit at scale
  const retriever = vectorStore.asRetriever({ k: 6, searchType: "similarity" });
  return retriever;
};

export default generateContextRetreiver;

Slide 54

Slide 54 text

RAG Data Ingestion Pipeline

Slide 55

Slide 55 text

In The Code You Can See That Loading, Splitting And Saving Are Done In Memory

import loadDocuments from './dataLoader.js';
import splitDocuments from './dataSplitter.js';
import storeAsVectors from './saveDocumentsToDB.js';

// The entire knowledge base is loaded, split, and stored in a single in-memory pass
const ingestDocuments = async () => {
  const loadedDocuments = await loadDocuments();
  const splitDocs = await splitDocuments(loadedDocuments);
  await storeAsVectors(splitDocs);
};

export default ingestDocuments;

Slide 56

Slide 56 text

Async Iterators To Load And Split The Data In Chunks

import fs from 'node:fs/promises';
import path from 'node:path';

// Walk the directory tree lazily, yielding one full file path at a time
async function* getDirectoriesAndFiles(filePath) {
  const entries = await fs.readdir(filePath, { withFileTypes: true });
  for (const entry of entries) {
    const entryPath = path.join(filePath, entry.name);
    if (entry.isDirectory()) {
      yield* getDirectoriesAndFiles(entryPath);
    } else {
      yield entryPath;
    }
  }
}

async function loadAndSplitData(rootPath) {
  // Only one file's documents are in memory at any point in time
  for await (const fileSystemEntry of getDirectoriesAndFiles(rootPath)) {
    // Do document loading, splitting and saving here
  }
}

Slide 57

Slide 57 text

Suggested Architecture For Ingestion Pipeline With S3 Bucket

Slide 58

Slide 58 text

Tamar Twena-Stern
Twitter: @sterntwena
LinkedIn: /tamarstern
tamar.twena@gmail.com