Slide 1

Slide 1 text

Catching the AI train by Andrei Bondarev

Slide 2

Slide 2 text

About Me
13 years with Ruby ❤️
Built software and led teams in the federal government, SaaS consumer products, and B2B enterprise verticals
Currently: Architect / Fractional CTO
Created the Langchain.rb library
GitHub - andreibondarev/langchainrb: Build LLM-backed Ruby applications

Slide 3

Slide 3 text

Can we ignore the AI train?

Slide 4

Slide 4 text

The Gen AI promise
"Generative AI's impact on productivity could add the equivalent of $2.6 trillion to $4.4 trillion annually in value to the global economy."
"75% of the value delivered will be across 4 areas: Customer Operations, Marketing & Sales, Software Engineering, and R&D."
"Generative AI will be automating work activities that take up 60-70% of employees' time today."
"Half of today's work activities could be automated between 2030 and 2060."

Slide 5

Slide 5 text

Coatue AI Report

Slide 6

Slide 6 text

Why (not) Ruby?
Monoliths are back in fashion
Pragmatic community
OOP / good software development fundamentals
Ruby ~ Python

Slide 7

Slide 7 text

What is Generative AI?
Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio/video, etc.
Large Language Models (LLMs): deep-learning artificial neural networks (models) with general-purpose language understanding and generation.
Exploded in popularity after the "Attention Is All You Need" (2017) research paper that introduced the Transformer architecture.

Slide 8

Slide 8 text

LLMs excel at
Structuring data: collecting and converting unstructured data into structured data.
Summarizing data: contextualizing a large body of text and producing a summary.
Classifying data: bucketing a large body of text into topics.
…and many other tasks.

Slide 9

Slide 9 text

Problems with LLMs
Hallucinations: the model generates incorrect or nonsensical text.
Outdated data: for example, GPT-4 was trained on data up to April 2023.
Relevant knowledge is not used.
…there may be a solution… 🤔

Slide 10

Slide 10 text

Retrieval Augmented Generation (RAG)
A technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.
1. Generate vector embeddings from the user's question.
2. Retrieve relevant documents by running a similarity search in a vector database.
3. Construct the RAG prompt to send to the LLM.
4. Get the response back from the LLM in natural language.
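The four steps above can be sketched end-to-end. The embedder and vector store below are toy stand-ins (a made-up character-frequency "embedding" and an in-memory store) so the flow runs without API keys; in a real pipeline these would be an embedding model and a vector database, and step 4 would call an actual LLM:

```ruby
# Toy "embedder": deterministic character-frequency vector over a-z.
# A real pipeline would call an embedding model here instead.
class ToyEmbedder
  def embed(text)
    counts = Array.new(26, 0)
    text.downcase.each_char do |c|
      i = c.ord - 97
      counts[i] += 1 if i.between?(0, 25)
    end
    counts
  end
end

# Toy in-memory vector store that ranks documents by cosine similarity.
class ToyVectorStore
  def initialize(embedder)
    @embedder = embedder
    @docs = []
  end

  def add(text)
    @docs << [text, @embedder.embed(text)]
  end

  # Step 2: similarity search over stored document embeddings.
  def similarity_search(query_vec, k: 1)
    @docs.max_by(k) { |_, vec| cosine(query_vec, vec) }.map(&:first)
  end

  private

  def cosine(a, b)
    dot = a.zip(b).sum { |x, y| x * y }
    dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }) + 1e-9)
  end
end

def rag_answer(question, store, embedder)
  query_vec = embedder.embed(question)                     # 1. embed the question
  context = store.similarity_search(query_vec).join("\n")  # 2. retrieve documents
  # 3. construct the RAG prompt:
  prompt = "Answer using only this context:\n#{context}\n\nQuestion: #{question}"
  # 4. a real pipeline would now send the prompt to an LLM and return its reply.
  prompt
end

embedder = ToyEmbedder.new
store = ToyVectorStore.new(embedder)
store.add("Ruby 3.3 added the Prism parser.")
store.add("The capital of France is Paris.")
puts rag_answer("What is the capital of France?", store, embedder)
```

Even with the fake embedder, the question about France retrieves the France document rather than the Ruby one, which is the core retrieval idea.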

Slide 11

Slide 11 text

RAG Workflow

Slide 12

Slide 12 text

Vector Embeddings
A machine learning technique to represent data in an N-dimensional space.
LLMs encode the meaning behind texts in the embedding space, or "latent space".
OpenAI's text-embedding-ada-002 model uses 1536 dimensions.
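A toy illustration of the idea: real models like text-embedding-ada-002 produce 1536-dimensional vectors, but these hand-made 3-D vectors show the same property, that semantically similar texts land close together in the embedding space:

```ruby
# Made-up 3-D "embeddings" purely for illustration; real embeddings
# come from a model and have hundreds or thousands of dimensions.
EMBEDDINGS = {
  "cat" => [0.90, 0.80, 0.10],
  "dog" => [0.85, 0.75, 0.20],
  "car" => [0.10, 0.20, 0.90]
}

# Cosine similarity: 1.0 means same direction, ~0 means unrelated.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

puts cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"]) # high: similar meaning
puts cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"]) # low: different meaning
```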

Slide 13

Slide 13 text

(News titles embeddings generated with SBERT embedding model)

Slide 14

Slide 14 text

Similarity Search
Also called "vector search" or "semantic search"
Search by meaning (not keyword search)
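"Search by meaning" can be contrasted with keyword search in a few lines. The document vectors and query vector below are made up for illustration: a keyword search for "automobile" misses a document that only says "car", while a similarity search over embeddings still finds it, because the two words map to nearby points in the embedding space:

```ruby
# Made-up 2-D document embeddings, standing in for real model output.
docs = {
  "My car won't start in the cold" => [0.90, 0.10],
  "Best sourdough bread recipe"    => [0.10, 0.90]
}

# Keyword search: exact substring match misses the synonym entirely.
keyword_hits = docs.keys.select { |text| text.downcase.include?("automobile") }
p keyword_hits # => []

# Semantic search: rank by dot product against the query embedding.
query_vec = [0.88, 0.12] # imagine: the embedding of "automobile troubles"
best = docs.max_by { |_, vec| vec.zip(query_vec).sum { |a, b| a * b } }.first
p best
```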

Slide 15

Slide 15 text

Embedding Space

Slide 16

Slide 16 text

RAG Prompt
instructions: to enforce a format or style of response
context: i.e. relevant data/documents
question: i.e. the user's original question

prompt = Langchain::Prompt.load_from_path("rag_prompt.yml")
prompt.format(instructions:, context:, question:)
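A sketch of what the template behind a file like rag_prompt.yml might contain, with the variable substitution done in plain Ruby so it runs standalone (Langchain::Prompt loads the template and its input variables from YAML, but the formatting step amounts to the same substitution). The template text here is an assumption, not the actual file from the talk:

```ruby
# Hypothetical RAG prompt template with three input variables.
TEMPLATE = <<~PROMPT
  {instructions}

  Context:
  {context}

  Question: {question}
  Answer:
PROMPT

# Substitute each {variable} placeholder with its value.
def format_prompt(template, vars)
  vars.reduce(template) { |t, (key, value)| t.gsub("{#{key}}", value) }
end

puts format_prompt(TEMPLATE,
  instructions: "Answer only from the context. If unsure, say so.",
  context: "Langchain.rb pairs vector databases with LLM providers.",
  question: "What does Langchain.rb do?")
```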

Slide 17

Slide 17 text

Putting it all together
(McKinsey: Generative AI technology could drive value across an entire organization by revolutionizing internal knowledge management systems. Knowledge workers spend about one day each work week searching for and gathering information.)

Slide 18

Slide 18 text

Optimizing RAG
Human evals (👍🏻 / 👎🏼)
RAGAS metrics:
Faithfulness - ensuring the retrieved context can act as a justification for the generated answer
Context Relevance - the context is focused, with little to no irrelevant information
Answer Relevance - the answer addresses the actual question

ragas = Langchain::Evals::Ragas::Main.new(llm: llm)
ragas.score(answer: "", question: "", context: "")
#=> {
#   ragas_score: 0.6601257446503674,
#   answer_relevance_score: 0.9573145866787608,
#   context_relevance_score: 0.6666666666666666,
#   faithfulness_score: 0.5
# }
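The composite ragas_score works out to the harmonic mean of the three component metrics, which the numbers in the example output above confirm. The harmonic mean penalizes a single weak metric harder than an arithmetic average would, so one bad dimension (here, faithfulness at 0.5) drags the overall score down:

```ruby
# Harmonic mean: n divided by the sum of reciprocals.
def harmonic_mean(values)
  values.size / values.sum { |v| 1.0 / v }
end

component_scores = [
  0.9573145866787608, # answer relevance
  0.6666666666666666, # context relevance
  0.5                 # faithfulness
]

puts harmonic_mean(component_scores) # ≈ 0.6601, matching ragas_score above
```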

Slide 19

Slide 19 text

OpenAI optimizing a RAG pipeline for a customer

Slide 20

Slide 20 text

Vector Search DBs X LLMs Matrix
Pair up any vector search DB with any LLM. Identical APIs. Lower vendor lock-in. Optionality.

                   Chroma  Pgvector  Pinecone  Weaviate  …
Google Vertex AI     ✅       ✅        ✅        ✅      ✅
AWS Bedrock          ✅       ✅        ✅        ✅      ✅
OpenAI               ✅       ✅        ✅        ✅      ✅
Local Llama 2        ✅       ✅        ✅        ✅      ✅
…                    ✅       ✅        ✅        ✅      ✅

Slide 21

Slide 21 text

AI Agents
Autonomous (or semi-autonomous) general-purpose LLM-powered programs
Can use Tools (APIs, other systems)
Work best with powerful LLMs
Can be used to automate workflows/business processes and execute multi-step tasks

Slide 22

Slide 22 text

Demo

Slide 23

Slide 23 text

Recap / Wrapping up 🫡
AI is emerging as the centerpiece of every tech stack
Generative AI, its use cases, and its problems
RAG:
Vector embeddings
Similarity search
RAG prompt (prompt engineering)
Evals
Ruby ought to adapt and address the growing AI needs.

Slide 24

Slide 24 text

Questions ❓ Twitter: www.twitter.com/@rushing_andrei LinkedIn: www.linkedin.com/in/andreibondarev Email: [email protected] 🤖 Available for contract work.