
Let's Build A ChatGPT Like Bot With AI APIs

Tamar Twena-Stern

November 04, 2024


Transcript

  1. Writing A Question-Answering Bot - Custom (Chat) GPT

    - On A Pre-Defined Knowledge Base With AI APIs
  2. • LLMs and how they work
    • LLM building blocks
    • Architecture to write a Custom GPT + motives
    • Code + demo
    • Taking up the scale
  3. Large language models (LLMs) are machine learning models that can

    comprehend and generate human language text. They work by analyzing massive data sets of language.
  4. To Understand How An LLM Works At A High Level - We

    Need To Understand Its Layers
  5. Layer 2 - Deep Learning - Neural Networks - Models

    With A Large Number Of Parameters And Complicated Relations Between Neurons

    Example classifier output:
    Animal  Probability
    Cat     0.1
    Dog     0.9
  6. To Solve Text Problems - Embedding Words - Turn A

    Sentence Into A Sequence Of Numeric Inputs
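    A minimal sketch of what embedding looks like in code, using the same
    OpenAIEmbeddings model the deck uses later (assumes OPENAI_API_KEY is set
    in the environment):

    import { OpenAIEmbeddings } from "@langchain/openai";

    const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
    // The sentence becomes a fixed-length vector of numbers
    const vector = await embeddings.embedQuery("The child likes to play with the ball");
    console.log(vector.length); // e.g. 1536 dimensions for this model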
  7. Layer 3 - LLM - A Classification Problem To Predict The Next Word

    "The child likes to play with the ___"

    Word   Probability
    zebra  0.1
    ball   0.7
    shoe   0.05
  8. LLM Tokens - the model splits text into tokens. Example sentence:

    "Node.js is an open-source, cross-platform JavaScript runtime environment that allows developers to run code outside the web browser."
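    As an illustration (not part of the talk's code), you can count the tokens
    in that sentence with the js-tiktoken package - the encoding name here is an
    assumption about which tokenizer family the model uses:

    import { getEncoding } from "js-tiktoken";

    const enc = getEncoding("cl100k_base"); // tokenizer family of GPT-3.5/GPT-4 era models
    const tokens = enc.encode("Node.js is an open-source, cross-platform JavaScript runtime environment.");
    console.log(tokens.length); // the number of tokens the model actually sees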
  9. Temperature - An LLM Setting To Pick Words With Smaller Probability

    "The child likes to play with the ___"

    Word   Probability
    zebra  0.1
    ball   0.7
    shoe   0.05
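    In code, temperature is just a constructor option - a minimal sketch with
    LangChain's ChatOpenAI (the values are illustrative):

    import { ChatOpenAI } from "@langchain/openai";

    // 0 -> near-deterministic, almost always picks the most likely word ("ball")
    const preciseLlm = new ChatOpenAI({ model: "gpt-4o", temperature: 0 });
    // Higher values -> lower-probability words ("zebra") get picked more often
    const creativeLlm = new ChatOpenAI({ model: "gpt-4o", temperature: 1.2 });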
  10. When Using An LLM, You Need To Give It

    A Prompt. prompt - "You are a helpful assistant that helps developers find bugs in their code"
  11. The Prompt Can Be Used To Avoid Hallucinations. prompt -

    "You are a helpful assistant that helps developers find bugs in their code. If you don't know the answer, say you don't know. Never make up an answer"
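    A minimal sketch of passing such a prompt as a system message with LangChain
    (the example question is made up):

    import { ChatOpenAI } from "@langchain/openai";

    const llm = new ChatOpenAI({ model: "gpt-4o" });
    const response = await llm.invoke([
      ["system", "You are a helpful assistant that helps developers find bugs in their code. " +
        "If you don't know the answer, say you don't know. Never make up an answer."],
      ["human", "Why does `[1, 2, 10].sort()` return [1, 10, 2] in JavaScript?"],
    ]);
    console.log(response.content);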
  12. Why It Is Not The Best Approach?

    • Expensive and slow
    • Risks "catastrophic forgetting", where the model loses knowledge from its pre-training
    • Risk of generating irrelevant output
    • Requires access to a large amount of training data
  13. In older-generation LLMs the context window was limited, but

    in newer models the context window is huge
  14. LLM Models That You Can Use

    • GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o
    • Gemini, PaLM
    • Llama
  15. We will develop our bot with an OpenAI model and LangChain

    - a framework for developing applications powered by large language models (LLMs).
  16. Initialize OpenAI Model

    import { ChatOpenAI } from "@langchain/openai";

    // Initialize the LLM
    const llm = new ChatOpenAI({ model: "gpt-4o" });
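    Assuming OPENAI_API_KEY is set in the environment (LangChain picks it up
    automatically), a quick sanity check looks like this:

    const reply = await llm.invoke("Say hello in one short sentence");
    console.log(reply.content);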
  17. Load Knowledge Base Documents With LangChain Loaders

    import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
    import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
    import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";
    import { TextLoader } from "langchain/document_loaders/fs/text";
    import { KNOWLEDGE_BASE_PATH } from "./consts";

    const loadDocuments = async () => {
      const directoryLoader = new DirectoryLoader(KNOWLEDGE_BASE_PATH, {
        ".pdf": (path) => new PDFLoader(path),
        ".txt": (path) => new TextLoader(path),
        ".csv": (path) => new CSVLoader(path),
      });
      const loadedDocs = await directoryLoader.load();
      return loadedDocs;
    }

    export default loadDocuments;
  18. Split Documents With LangChain Text Splitter

    import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

    const splitDocuments = async (loadedDocs) => {
      const textSplitter = new RecursiveCharacterTextSplitter({
        chunkSize: 1500,
        chunkOverlap: 100
      });
      const splitDocs = await textSplitter.splitDocuments(loadedDocs);
      return splitDocs;
    }

    export default splitDocuments;
  19. Chroma DB is an open-source vector store used for storing

    and retrieving vector embeddings to be used later by large language models.

    version: '3.9'
    networks:
      net:
        driver: bridge
    services:
      chromadb:
        image: chromadb/chroma:latest
        volumes:
          - <PATH_ON_YOUR_LOCAL_MACHINE>:/chroma/chroma
        environment:
          - IS_PERSISTENT=TRUE
          - PERSIST_DIRECTORY=/chroma/chroma # this is the default path, change it as needed
          - ANONYMIZED_TELEMETRY=${ANONYMIZED_TELEMETRY:-TRUE}
        ports:
          - 8000:8000
        networks:
          - net
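    With that file saved as docker-compose.yml, `docker compose up -d` starts
    Chroma listening on http://localhost:8000, the default address the LangChain
    Chroma client connects to.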
  20. Saving Text Chunks To Chroma DB

    import { OpenAIEmbeddings } from "@langchain/openai";
    import { Chroma } from "@langchain/community/vectorstores/chroma";
    import { KNOWLEDGE_BASE_DB_COLLECTION_NAME } from './consts';

    const storeAsVectors = async (splitDocs) => {
      try {
        // Embedding model
        const embeddingsModel = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
        // Insert the documents in 5 batches to avoid one huge request
        const sectionLength = Math.ceil(splitDocs.length / 5);
        for (let i = 0; i < 5; i++) {
          const docsToInsert = splitDocs.slice(i * sectionLength, (i + 1) * sectionLength);
          await Chroma.fromDocuments(docsToInsert, embeddingsModel, {
            collectionName: KNOWLEDGE_BASE_DB_COLLECTION_NAME,
          });
        }
      } catch (e) {
        console.error("Error when inserting documents:", e);
      }
    }

    export default storeAsVectors;
  21. Context Retriever - To Retrieve The Accurate Data From The DB

    import { OpenAIEmbeddings } from "@langchain/openai";
    import { Chroma } from "@langchain/community/vectorstores/chroma";
    import { KNOWLEDGE_BASE_DB_COLLECTION_NAME } from "../dataInjest/consts";

    const generateContextRetriever = async () => {
      const vectorStore = await Chroma.fromExistingCollection(
        new OpenAIEmbeddings({ model: "text-embedding-3-small" }),
        { collectionName: KNOWLEDGE_BASE_DB_COLLECTION_NAME }
      );
      const retriever = vectorStore.asRetriever({ k: 6, searchType: "similarity" });
      return retriever;
    }

    export default generateContextRetriever;
  22. Prompt Creation

    import { PromptTemplate } from "@langchain/core/prompts";

    const createPrompt = async () => {
      const template = `You are an assistant for question-answering tasks.
    Use the following pieces of context to answer the question.
    If you don't know the answer, say that you don't know. Never make up an answer.
    Context: {context}
    Question: {question}`;
      return PromptTemplate.fromTemplate(template);
    }

    export default createPrompt;
  23. Bot Wrapped In An Express Server

    import express from 'express';
    import bodyParser from 'body-parser';

    const app = express();
    const port = 8080;
    app.use(bodyParser.json());

    // Request to answer question with llm
    // (retriever and ragChain are built from the modules shown earlier)
    app.post('/api/question', async (req, res) => {
      const userQuestion = req.body.userQuestion;
      if (!userQuestion) {
        return res.status(400).json({ error: 'User question required' });
      }
      const retrievedChunks = await retriever.invoke(userQuestion);
      const response = await ragChain.invoke({
        question: userQuestion.trim(),
        context: retrievedChunks,
      });
      res.status(200).json({ answer: response });
    });

    app.listen(port, () => {
      console.log(`Server is running on http://localhost:${port}`);
    });
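    The deck does not show how ragChain itself is built; a plausible sketch,
    assuming LangChain's stuff-documents chain (which accepts the retrieved
    chunks as `context`), would be:

    import { StringOutputParser } from "@langchain/core/output_parsers";
    import { createStuffDocumentsChain } from "langchain/chains/combine_documents";

    // llm comes from slide 16, createPrompt from slide 22, the retriever from slide 21
    const ragChain = await createStuffDocumentsChain({
      llm,
      prompt: await createPrompt(),
      outputParser: new StringOutputParser(),
    });

    You can then query the bot with:
    curl -X POST http://localhost:8080/api/question -H "Content-Type: application/json" -d '{"userQuestion": "What is a context window?"}'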
  24. In High Scale - Improve The Indexing And Search Method

    In The Context Retriever

    import { OpenAIEmbeddings } from "@langchain/openai";
    import { Chroma } from "@langchain/community/vectorstores/chroma";
    import { KNOWLEDGE_BASE_DB_COLLECTION_NAME } from "../dataInjest/consts";

    const generateContextRetriever = async () => {
      const vectorStore = await Chroma.fromExistingCollection(
        new OpenAIEmbeddings({ model: "text-embedding-3-small" }),
        { collectionName: KNOWLEDGE_BASE_DB_COLLECTION_NAME }
      );
      // The k and searchType parameters are the knobs to tune at scale
      const retriever = vectorStore.asRetriever({ k: 6, searchType: "similarity" });
      return retriever;
    }

    export default generateContextRetriever;
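    One concrete way to improve the search (an assumption - the deck only names
    the goal): switch the retriever from plain similarity search to Maximal
    Marginal Relevance (MMR), which re-ranks a wider candidate set for
    diversity, provided the underlying vector store supports it:

    const retriever = vectorStore.asRetriever({
      k: 6,
      searchType: "mmr",
      searchKwargs: { fetchK: 30 }, // fetch more candidates, then re-rank for diversity
    });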
  25. In The Code You Can See That Loading, Splitting And

    Saving Is Done In Memory

    import loadDocuments from './dataLoader.js';
    import splitDocuments from './dataSplitter.js';
    import storeAsVectors from './saveDocumentsToDB.js';

    const ingestDocuments = async () => {
      const loadedDocuments = await loadDocuments();
      const splitDocs = await splitDocuments(loadedDocuments);
      await storeAsVectors(splitDocs);
    }

    export default ingestDocuments;
  26. Async Iterators To Load And Split The Data In Chunks

    import fs from 'node:fs/promises';
    import path from 'node:path';

    async function* getDirectoriesAndFiles(filePath) {
      const entries = await fs.readdir(filePath, { withFileTypes: true });
      for (const entry of entries) {
        const fullPath = path.join(filePath, entry.name);
        if (entry.isDirectory()) {
          // Recurse into sub-directories
          yield* getDirectoriesAndFiles(fullPath);
        } else {
          // Yield the full path so the caller can actually load the file
          yield fullPath;
        }
      }
    }

    async function loadAndSplitData(rootPath) {
      for await (const fileSystemEntry of getDirectoriesAndFiles(rootPath)) {
        // Do document loading, splitting and saving here - one file at a time,
        // so the whole knowledge base never has to fit in memory
      }
    }