[Rocky Mountain Ruby 2024] - Building AI Agents in Ruby

The Coatue AI report puts AI models at the center of the modern tech stacks that application developers will build on going forward. It is not controversial to say that the Ruby ecosystem lags in its support for and adoption of AI, ML, and data science (DS) libraries. If we'd like to stay relevant in the future, we need to start building the foundations now. We'll look at what Generative AI is, what kinds of applications developers in other communities are building, and how Ruby can be used to build similar applications today. We'll cover Retrieval Augmented Generation (RAG), vector embeddings and semantic search, prompt engineering, and what the state of the art (SOTA) in evaluating LLM output looks like today. We will also cover AI Agents, semi-autonomous general-purpose LLM-backed applications, and what they're capable of today. We'll make the case that Ruby is a great language for building these applications because of its strengths and its incredible ecosystem. After the slides, we'll build an AI Agent in 15 minutes.

Andrei Bondarev

October 08, 2024

Transcript

  1. GenAI Impact
    Before: 1 month to label data, 3 months to train a custom model, 3 months to deploy (optimize).
    After: a few days of prompt engineering, a few weeks for basic RAG (if needed), a few days to deploy.
  2. (Re-)Rise of AI Agents
    1950s: Intelligent Machines
    1970s-1980s: Expert Systems
    1990s-2000s: Software Agents
    2010s: Chatbots
    2020s: LLMs as Agents
  3. AI Agent
    Definition: An autonomous software system capable of perceiving its environment, making decisions, and taking actions to achieve specific goals.
    Environment awareness, decision-making, action-taking.
  4. Agent vs Assistant
    Conversational Assistant: a conversational system that continuously takes directions from a human.
    Autonomous Agent: an autonomous system that independently executes a task (like a background job).
  5. Use-cases
    Automating business processes
    Mundane low-IQ tasks
    Personal assistant (co-pilot)
    Time-consuming tasks
    Tasks in a consulting business:
      Creating invoices from timesheets
      Categorizing business expenses
      Writing project proposals (incl. service offering, meeting notes)
      Writing job descriptions
      Writing JIRA tickets
  6. Reasoning & Planning
    Cornerstone for problem-solving, decision-making and critical analysis.
    Primary forms of reasoning:
      Deductive: drawing a specific conclusion from general facts.
      Inductive: making a broad generalization from specific observations.
      Abductive: finding the simplest explanation for an observation.
    Plan formulation: decomposing a top-level task into numerous sub-tasks.
    Plan reflection: leveraging a feedback mechanism to reflect upon a plan and evaluate its merits.
  7. Chain-of-Thought (CoT)
    Paper: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022)
    Forcing the AI to explain its reasoning.
    [Figure: example responses without and with Chain-of-Thought prompting]
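
As a rough illustration of slide 7, the difference between the two prompting styles can be sketched in plain Ruby. The helper names (`plain_prompt`, `cot_prompt`) and the wording of the instruction are assumptions made for this example, not taken from the talk or the paper; the resulting string would be sent to whichever LLM client you use.

```ruby
# Hypothetical helpers: build a prompt without and with a Chain-of-Thought instruction.
COT_INSTRUCTION = "Think step by step and explain your reasoning before giving the final answer."

# Baseline prompt: the model is asked for the answer directly.
def plain_prompt(question)
  "Question: #{question}\nAnswer:"
end

# Chain-of-Thought prompt: same question, plus an explicit request to reason first.
def cot_prompt(question)
  "Question: #{question}\n#{COT_INSTRUCTION}\nAnswer:"
end

question = "A t-shirt costs 20 dollars. How much do 3 t-shirts cost with a 10% discount?"
puts plain_prompt(question)
puts "---"
puts cot_prompt(question)
```

The only difference between the two prompts is the added instruction, which keeps the comparison between the two styles easy to test.
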
  8. Tool Calling
    Use tools to do the following:
      Get data from external sources (APIs)
      Get real-time data
      Take actions
      Execute deterministic tasks
    [Figure: without tools vs. using the tool (Code Interpreter)]
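
The idea on slide 8 can be sketched without any LLM library: the model emits a structured request, and the application looks up and runs the matching tool. Everything below is hypothetical, including the `TOOLS` registry, the `execute_tool_call` helper, and the shape of the tool-call hash; real providers each define their own format.

```ruby
require "time"

# Hypothetical tool registry: each tool has a description (shown to the LLM)
# and a callable that the application executes on the LLM's behalf.
TOOLS = {
  "calculator" => {
    description: "Multiplies two numbers exactly (a deterministic task)",
    call: ->(args) { args.fetch("a") * args.fetch("b") }
  },
  "current_time" => {
    description: "Returns the current UTC time as an ISO 8601 string (real-time data)",
    call: ->(_args) { Time.now.utc.iso8601 }
  }
}.freeze

# Executes a tool call of the assumed shape { "name" => ..., "arguments" => {...} }.
def execute_tool_call(tool_call)
  tool = TOOLS.fetch(tool_call.fetch("name")) do |name|
    raise ArgumentError, "Unknown tool: #{name}"
  end
  tool[:call].call(tool_call.fetch("arguments", {}))
end

# Pretend the LLM responded with this tool call instead of guessing the product.
tool_call = { "name" => "calculator", "arguments" => { "a" => 1234, "b" => 5678 } }
puts execute_tool_call(tool_call) # prints 7006652
```

Keeping the tools in a plain hash makes it easy to list their descriptions in a prompt and to reject calls to tools that do not exist.
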
  9. AI Agent diagram
    [Diagram: an AI Agent connected to LLMs, Tools, Triggers, Instructions, Memory, and the User. Connection labels: Store/Retriever, Take Actions, Reason/Plan, Business logic, Converse.]
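
One way to read the diagram on slide 9 is as a loop: the agent keeps a memory of messages, asks the LLM what to do next, runs a tool if one is requested, and otherwise replies to the user. The sketch below is a minimal, hypothetical version of that loop. The `Agent` class, the reply hash shape, and the `fake_llm` stub are all assumptions for illustration (no real LLM is called), and the mapping of diagram labels to lines of code is my own reading of the slide.

```ruby
# A hypothetical, minimal agent: instructions + memory + tools + an LLM callable.
class Agent
  attr_reader :memory

  def initialize(instructions:, llm:, tools: {}, max_steps: 5)
    @instructions = instructions # business logic, written as a prompt
    @llm = llm                   # callable: (instructions, memory) -> Hash
    @tools = tools               # tool name => callable taking an arguments Hash
    @max_steps = max_steps       # guard against endless tool loops
    @memory = []                 # conversation history
  end

  def run(user_message)
    @memory << { role: "user", content: user_message }
    @max_steps.times do
      reply = @llm.call(@instructions, @memory) # reason / plan
      if reply[:tool]
        result = @tools.fetch(reply[:tool]).call(reply[:arguments] || {}) # take action
        @memory << { role: "tool", name: reply[:tool], content: result.to_s }
      else
        @memory << { role: "assistant", content: reply[:content] } # converse
        return reply[:content]
      end
    end
    raise "Agent did not finish within #{@max_steps} steps"
  end
end

# Stub standing in for a real LLM: requests the inventory tool once, then answers.
fake_llm = lambda do |_instructions, memory|
  tool_message = memory.find { |m| m[:role] == "tool" }
  if tool_message
    { content: "We have #{tool_message[:content]} shirts in stock." }
  else
    { tool: "check_inventory", arguments: { sku: "TSHIRT-M" } }
  end
end

agent = Agent.new(
  instructions: "You help customers of a t-shirt store.",
  llm: fake_llm,
  tools: { "check_inventory" => ->(args) { args[:sku] == "TSHIRT-M" ? 12 : 0 } }
)
puts agent.run("Do you have medium shirts?") # prints "We have 12 shirts in stock."
```

Passing the LLM in as a callable keeps the loop testable: the stub can be swapped for a real client without changing the agent.
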
  10. Nerds & Threads
    Selling comfortable nerdy t-shirts for software engineers who work from home.
    [Diagram: an AI Agent connected to Customer Management, Email Service, Payment Gateway Service, Order Management, Inventory Management, and Shipping Service.]
  11. Business logic (in code)
    The Ruby on Rails promise: "Developers focus on writing business logic and not the 'plumbing'"
    Old World (before AI): business logic in models and service objects.
    New World (after AI): business logic in prompts.
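
To make the contrast on slide 11 concrete, here is the same made-up rule written both ways. The refund policy, the `RefundPolicy` class, and the `REFUND_INSTRUCTIONS` prompt are invented for this example and do not come from the talk.

```ruby
# "Old world": the rule lives in code, inside a service object.
class RefundPolicy
  MAX_DAYS = 30

  def eligible?(days_since_purchase:, worn:)
    days_since_purchase <= MAX_DAYS && !worn
  end
end

# "New world": the same rule expressed as instructions handed to an LLM-backed agent.
REFUND_INSTRUCTIONS = <<~PROMPT
  You are a customer support agent for a t-shirt store.
  Refund policy:
  - Approve refunds requested within 30 days of purchase for unworn items.
  - Otherwise, politely decline the refund.
PROMPT

puts RefundPolicy.new.eligible?(days_since_purchase: 10, worn: false) # prints true
puts REFUND_INSTRUCTIONS
```

Changing the policy means a code change and a deploy in the first version, and an edit to a string in the second, which is one reading of "changing requirements on the fly".
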
  12. Why would you use this?
    Changing requirements on the fly
    Intelligence in your process
    Tackling complex workflows
  13. Evaluations
    Benchmarks: comparing to a large dataset of question-answer pairs.
    "LLM as a Judge": asking an LLM whether the answer fits a list of criteria.
  14. Benchmarks
    Hugging Face dataset: gretelai/gsm8k-synthetic-diverse-405b
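
Both evaluation styles from the slides on evaluations and benchmarks can be sketched in a few lines of Ruby. The function names, the PASS/FAIL reply format, and the stubbed model are assumptions for illustration; a real benchmark run would load a dataset of question-answer pairs and call an actual model, and answer matching is usually more forgiving than the exact string comparison used here.

```ruby
# Benchmark-style evaluation: the share of questions where the model's answer
# exactly matches the expected answer from the dataset.
def benchmark_accuracy(dataset, model)
  correct = dataset.count { |pair| model.call(pair[:question]).strip == pair[:answer] }
  correct.to_f / dataset.size
end

# "LLM as a Judge": build a prompt asking a judge model to check an answer
# against a list of criteria, one verdict per line.
def judge_prompt(question:, answer:, criteria:)
  numbered = criteria.map.with_index(1) { |criterion, i| "#{i}. #{criterion}" }.join("\n")
  <<~PROMPT
    Question: #{question}
    Answer: #{answer}
    For each criterion below, reply PASS or FAIL on its own line:
    #{numbered}
  PROMPT
end

# Interprets the judge's reply: passes only if every verdict line is PASS.
def passes?(judge_reply)
  verdicts = judge_reply.lines.map(&:strip).reject(&:empty?)
  verdicts.any? && verdicts.all? { |line| line == "PASS" }
end

dataset = [
  { question: "2 + 2", answer: "4" },
  { question: "3 * 3", answer: "9" }
]
fake_model = ->(question) { question == "2 + 2" ? "4" : "10" } # stub: one right, one wrong

puts benchmark_accuracy(dataset, fake_model) # prints 0.5
puts judge_prompt(question: "Where is my order?", answer: "It ships tomorrow.",
                  criteria: ["Is polite", "Answers the question"])
```
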
  15. Agent Reliability
    [Chart: as Responsibilities, # of Tasks, and the Decision Tree go from SIMPLER to COMPLEX, Reliability goes from RELIABLE to UNRELIABLE. Simpler INCREASES reliability; more complex DECREASES it.]
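
One simple way to see why reliability drops as an agent takes on more tasks is to treat each step as succeeding independently with the same probability, so a chain of n steps succeeds with probability p to the power n. This model, the `chain_reliability` helper, and the 95% figure are assumptions chosen for illustration; they are not from the talk, and real agent steps are rarely independent.

```ruby
# Probability that every step in a chain succeeds, assuming each step
# succeeds independently with probability per_step_success.
def chain_reliability(per_step_success, steps)
  per_step_success**steps
end

# Print the end-to-end success rate for chains of increasing length.
[1, 5, 10, 20].each do |steps|
  puts format("%2d steps: %.1f%%", steps, chain_reliability(0.95, steps) * 100)
end
```

Under this assumption, even a high per-step success rate erodes quickly as steps are added, which matches the direction of the chart.
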
  16. System reliability
    Modern software fails because:
      Dependencies
      Doesn't scale
      Cloud outages
      Cyber attacks
      Insufficient testing (bugs)
    AI systems fail because:
      Inaccurate or incomplete data / bias in data
      Compute limits
      Cloud outages
      Adversarial attacks
      Black box behavior
      Unclear liability & accountability
    Engineering problems that will be solved.