Slide 1

Slide 1 text

Building LLM-powered applications in Ruby
Wroclove.rb 2024, April 14, 2024
by Andrei Bondarev

Slide 2

Slide 2 text

My Impact

Slide 3

Slide 3 text

What is Generative AI?
Large Language Models (LLMs): deep learning artificial neural networks (models) with general-purpose language understanding and generation.
They exploded in popularity after the "Attention Is All You Need" (2017) research paper, which introduced the Transformer architecture.

Slide 4

Slide 4 text

The GenAI Impact
Supervised learning: get labeled data (1 month), train AI model on data (3 months), deploy (run) model (3 months).
Prompt-based AI: specify prompt (few days), deploy model (few days).

Slide 5

Slide 5 text

LLMs excel at: structuring data, summarizing data, classifying data, translating languages, generating content, answering questions.
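
For example, a classification task can be a single chat call. A minimal sketch with langchainrb; the prompt, labels, and sample ticket are illustrative assumptions, not from the slides:

require "langchain"

# Assumes OPENAI_API_KEY is set in the environment.
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# Classify a support ticket into one of three hypothetical labels.
response = llm.chat(messages: [{
  role: "user",
  content: "Classify this support ticket as billing, bug, or feature_request. " \
           "Reply with the label only.\n\nTicket: \"I was charged twice this month.\""
}])

puts response.chat_completion # => "billing"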

Slide 6

Slide 6 text

LLM in every stack

Slide 7

Slide 7 text

AI-centric tech stack

Slide 8

Slide 8 text

Decision tree traversal

Slide 9

Slide 9 text

Business logic (in code)
The Ruby on Rails promise: "Developers focus on writing business logic and not reinventing engineering solutions."
Old World (before AI): business logic in models and service objects.
New World (after AI): business logic in prompts.

Slide 10

Slide 10 text

Business logic (in prompts)
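
As an illustration of what "business logic in a prompt" can look like, a refund policy might live in the system prompt instead of in conditionals. This is a hypothetical example, not the prompt shown on the slide:

# Hypothetical example: a refund policy expressed as prompt instructions
# rather than as branches in a service object.
SYSTEM_PROMPT = <<~PROMPT
  You are a customer support assistant.
  Refund rules:
  - Orders younger than 30 days: approve the refund.
  - Orders older than 30 days: offer store credit instead.
  - Digital goods: never refund; explain the policy politely.
  Always answer in one short paragraph.
PROMPT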

Slide 11

Slide 11 text

AI Agents
Autonomous (or semi-autonomous) general-purpose LLM-powered programs.
Can use Tools (APIs, other systems) via "Function Calling" (see the sketch below).
Work best with powerful LLMs.
Can be used to automate workflows/business processes and execute multi-step tasks.
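
To make "Function Calling" concrete, here is a minimal sketch of a tool definition in the OpenAI function-calling format; the get_weather function and its parameters are hypothetical:

# A hypothetical tool the LLM may choose to call. The model returns the
# function name plus JSON arguments; application code performs the real API call.
weather_tool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string", description: "City name, e.g. Wroclaw" }
      },
      required: ["city"]
    }
  }
}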

Slide 12

Slide 12 text

Agent Reliability
[Chart] As an agent's responsibilities, number of tasks, and decision-tree complexity grow (simpler to complex), reliability decreases (reliable to unreliable).

Slide 13

Slide 13 text

Reliability x Focus
[Quadrant chart] Axes: Reliable vs. Unreliable, General vs. Focused. AGI sits in the reliable-and-general quadrant, pre-Transformers systems in the reliable-and-focused quadrant, and today's LLM systems fall in the remaining quadrants.

Slide 14

Slide 14 text

Slow adoption
1. Fast changing
2. IP / Copyright issues
3. Lack of tooling
4. Risky

THE CITY (NYC News): "NYC AI Chatbot Touted by Adams Tells Businesses to Break the Law". The Microsoft-powered bot says bosses can take workers' tips and that landlords can discriminate based on source of income. That's not right.
aibusiness.com: "Air Canada Held Responsible for Chatbot's Hallucinations". Air Canada's chatbot gave a traveler wrong airfare information. The traveler sued when the airline refused to give a refund.
GM Authority: "GM Dealer Chat Bot Agrees To Sell 2024 Chevy Tahoe For $1". One customer recently managed to trick a dealer chat bot into agreeing to sell them a new 2024 Chevy Tahoe for just $1.

Slide 15

Slide 15 text

Prompt Engineering
Alchemy
Randomness ¯\_(ツ)_/¯
[Large Language Models as Optimizers](https://arxiv.org/pdf/2309.03409.pdf)

Slide 16

Slide 16 text

Jailbreaking
The process of getting a GenAI model to do or say unintended things through prompting.
Many-shot jailbreaking: including large amounts of text in a specific configuration forces LLMs to produce potentially harmful responses, despite being trained not to do so.

Slide 17

Slide 17 text

LLM Deficiencies
Hallucinations: the model generates incorrect or nonsensical text.
Outdated data: for example, GPT-4 was trained on data up to April 2023.
Relevant knowledge is not used.

Slide 18

Slide 18 text

Retrieval Augmented Generation (RAG)

Slide 19

Slide 19 text

Detailed explanation
1. Generate vector embeddings from the user's question.
2. Retrieve relevant documents by running a similarity search in a vector database.
3. Construct the RAG prompt to send to the LLM.
4. Get the response back from the LLM in natural language.
A minimal code sketch of these steps follows below.
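
A minimal sketch of these four steps with langchainrb; the Weaviate connection settings, index name, and documents are illustrative assumptions:

require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

# Vector database client; connection details are placeholders.
vector_db = Langchain::Vectorsearch::Weaviate.new(
  url: ENV["WEAVIATE_URL"],
  api_key: ENV["WEAVIATE_API_KEY"],
  index_name: "Documents",
  llm: llm
)

# Index some documents (steps 1-2 depend on these being embedded and stored).
vector_db.add_texts(texts: [
  "Wroclove.rb is a Ruby conference held in Wroclaw.",
  "Langchain.rb is a Ruby framework for building LLM-powered applications."
])

# Steps 1-4 in one call: embed the question, retrieve similar documents,
# build the RAG prompt, and return the LLM's natural-language answer.
answer = vector_db.ask(question: "What is Langchain.rb?")
puts answer # exact return format depends on the langchainrb version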

Slide 20

Slide 20 text

Vector embeddings
An approach to cluster data in an N-dimensional space, organized by its meaning.
LLMs encode the meaning behind texts in the embedding space, or "latent space".
OpenAI's text-embedding-ada-002 model uses 1536 dimensions.
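
Generating an embedding with langchainrb looks roughly like this. A sketch: the sample sentence is illustrative, and the dimensionality depends on which embedding model the client is configured to use:

# Embed a sentence and inspect the resulting vector.
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

response = llm.embed(text: "Ruby is a programmer-friendly language.")
vector = response.embedding   # => [-0.012, 0.094, ...] (an array of floats)

# With text-embedding-ada-002 the vector has 1536 dimensions.
puts vector.length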

Slide 21

Slide 21 text

Vector or "Latent" space

Slide 22

Slide 22 text

Semantic/Vector search
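
A semantic (vector) search returns the documents closest to the query in embedding space, without generating an answer. A short sketch reusing the vector_db client from the earlier example; k is the number of nearest neighbors to return:

# Retrieve the 3 documents most similar to the query (no answer generation).
results = vector_db.similarity_search(query: "Ruby frameworks for LLMs", k: 3)
results.each { |doc| puts doc }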

Slide 23

Slide 23 text

RAG prompt
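
The RAG prompt itself is usually just a template that splices the retrieved context in front of the user's question. A hypothetical template for illustration (langchainrb builds its own internally):

# Hypothetical RAG prompt template; {context} and {question} are filled in
# with the retrieved documents and the user's question.
RAG_PROMPT = <<~PROMPT
  Answer the question using only the context below.
  If the answer is not in the context, say "I don't know."

  Context:
  {context}

  Question: {question}
PROMPT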

Slide 24

Slide 24 text

Naive RAG

Slide 25

Slide 25 text

Evals
Human evals
GPT-4 RAGAS metric (0.0 - 1.0 score):
Faithfulness: ensuring the retrieved context can act as a justification for the generated answer.
Context Relevance: the context is focused, with little to no irrelevant information.
Answer Relevance: the answer addresses the actual question.
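
langchainrb ships a RAGAS implementation. A sketch, reusing the llm client from the earlier examples and assuming the Langchain::Evals::Ragas API (verify the exact names against the current langchainrb README):

# RAG evaluation sketch; returns per-metric scores between 0.0 and 1.0.
ragas = Langchain::Evals::Ragas::Main.new(llm: llm)

scores = ragas.score(
  question: "What is Langchain.rb?",
  answer:   "Langchain.rb is a Ruby framework for LLM-powered applications.",
  context:  "Langchain.rb is a Ruby framework for building LLM-powered applications."
)
# => hash with ragas_score, faithfulness_score, context_relevance_score, answer_relevance_score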

Slide 26

Slide 26 text

Advanced RAG Strategy

Slide 27

Slide 27 text

Interfaces ❌ Open chat ✅ Intelligent quick actions

Slide 28

Slide 28 text

langchainrb ⭐
The most popular Ruby framework for building LLM-powered applications.
Use cases: LLMs, Agents, Retrieval-Augmented Generation.

Slide 29

Slide 29 text

Langchain.rb Approach
Vendor-agnostic: support for the most popular providers.
Best practices: staying on top of emerging research.
Batteries-included: ease of use.

Slide 30

Slide 30 text

Langchain.rb Approach
Common interface for LLMs:
llm = Langchain::LLM::OpenAI.new # Cohere, Gemini, Mistral, and more
llm.chat()
llm.embed()
Common interface for vectorsearch DBs:
vector_db = Langchain::Vectorsearch::Weaviate.new # Chroma, Qdrant, and more
vector_db.add_texts()
vector_db.similarity_search()
vector_db.ask() # Naive RAG

Slide 31

Slide 31 text

Building an Assistant
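
A sketch of building a tool-using Assistant with langchainrb; constructor arguments vary between versions, so treat the exact keywords as assumptions and check the README:

# Build an assistant that can call a tool via function calling.
llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

assistant = Langchain::Assistant.new(
  llm: llm,
  instructions: "You are a helpful assistant for a Ruby conference.",
  tools: [Langchain::Tool::Calculator.new] # built-in tool (may need an extra gem dependency)
)

assistant.add_message(content: "What is 15% of 2480?")
assistant.run(auto_tool_execution: true) # lets the assistant execute the tool call itself

puts assistant.messages.last.content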

Slide 32

Slide 32 text

Why (not) Ruby?
Pragmatism
Focus on business outcomes
OOP principles
"There's a gem for that"

Slide 33

Slide 33 text

My Most Popular Tweet
Andrei Bondarev on Twitter / X:
1. Ruby Developer is sad a library exists in Python but not in Ruby.
2. Ruby Developer copy-pastes said library's files one by one into ChatGPT.
3. ChatGPT converts Python code to Ruby code.
4. Ruby Developer fixes a few undefined errors and imports the correct…

Slide 34

Slide 34 text

Learnings from OSS
Be responsive, be friendly, be helpful.

Slide 35

Slide 35 text

Langchain.rb Contributors

Slide 36

Slide 36 text

Langchain.rb Discord

Slide 37

Slide 37 text

"We do not have it all figured out yet…" "But we're optimistic!" —AI Engineering Community

Slide 38

Slide 38 text

Thank you! Questions? @rushing_andrei @andreibondarev in/andreibondarev [email protected]