AI Agents 101: Architecting an SRE Kubernetes Agent

Move beyond passive chatbots. Discover how to build intelligent, autonomous AI agents that can perceive, reason, and act within real-world cloud infrastructure. This presentation breaks down the complete architecture of a custom Site Reliability Engineering (SRE) Agent designed to transform Kubernetes operations from reactive monitoring to autonomous observability.

Patrick Eichler

May 15, 2026

Transcript

  1. AI Agents 101: Architecting an SRE Kubernetes Agent
    Move beyond passive chatbots. Discover how autonomous AI agents perceive, reason, and act in the real world by connecting to custom data and external tools in Kubernetes.
    Patrick Eichler, YunaCloud
  2. Who Am I?
    • Freelancer at YunaCloud (https://yunacloud.com)
    • Kubernetes Professional (Kubestronaut)
    • Google Cloud Architect (multi-certified)
    • Site Reliability Engineer (bridge between developers and the infrastructure)
  3. Demo: SRE Kubernetes AI Agent
    • The core intention of my SRE AI Agent is to transform Kubernetes operations from reactive monitoring to intelligent, autonomous observability, and to actively resolve issues and make changes via GitOps.
    • By combining the reasoning capabilities of generative AI with structured, relational data, the agent bridges the gap between raw cluster metrics and human-readable root-cause analysis.
    • It is built to securely democratize SRE knowledge, reducing Mean Time to Resolution (MTTR) while enforcing strict enterprise-grade guardrails.
  4. What is an AI Agent?
    • Definition: AI agents are software systems that use artificial intelligence to autonomously pursue goals and complete complex workflows on behalf of users.
    • How are they different from regular bots?
      ◦ Autonomy: They make independent decisions to reach a defined goal without hand-holding.
      ◦ Complexity: They handle complex, multi-step actions rather than reacting to simple commands.
      ◦ Learning: They employ machine learning to adapt, improve performance, and learn from past mistakes.
  5. What is an AI Agent?
    • The AI Agent: This is the technical execution code, or the hands of the operation. Counterintuitively, the agent is the only dumb actor: it doesn't think or decide; it only executes exactly what the LLM tells it to do.
    • The Agentic System: This is the complete product or ecosystem you interact with, which includes the user, the LLM, the agent, the tools, and the environment.
  6. Real-World Examples
    • Data Analytics: An agent can act as a data engineer. A user can say, "Show me why sales dipped in Q3", and the agent will autonomously find the database, clean the data, write the SQL code to analyze it, and generate visual charts.
    • Healthcare & Life Sciences: Agents can summarize massive amounts of clinical research or help hospitals automate administrative tasks like coordinating a patient's journey from intake to scheduling, freeing up doctors for actual patient care.
    • Software Development: Developers use agents to automatically review code repositories, spot bugs, and even generate and test code fixes autonomously.
    • Customer Service: Instead of a rigid bot that just links to an FAQ page, an AI agent can securely access a customer's specific account, understand their unique problem, and process a complex refund or troubleshooting sequence without human intervention.
  7. The Agent Core
    • Custom-built Backend-For-Frontend (BFF) application running as a container in Kubernetes
    • Backend: NestJS (TypeScript) - Operations Executor
    • Frontend: Angular - Interactive chat interface
    • Features:
      ◦ Executes multiple LLM-driven tool calls (K8s, Prometheus, RAG) concurrently
      ◦ Backend dynamically merges MCP-provided tools with local RAG tools
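
The two features on this slide, concurrent tool calls and a merged tool registry, can be sketched in a few lines. The deck's backend is NestJS (TypeScript); the Python below is only an illustration, and every tool name, argument, and return string is a hypothetical stand-in rather than the agent's real code.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Tool = Callable[[dict], Awaitable[str]]

# Hypothetical stand-ins for tools discovered from MCP servers.
async def k8s_get_pods(args: dict) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"pods in {args['namespace']}: payment-svc-1 Running"

async def prometheus_query(args: dict) -> str:
    await asyncio.sleep(0.01)
    return f"result for {args['query']}: 0.93"

# Hypothetical local RAG tool defined inside the backend itself.
async def rag_search(args: dict) -> str:
    await asyncio.sleep(0.01)
    return f"runbook hit for '{args['question']}'"

def merge_tools(mcp_tools: Dict[str, Tool], local_tools: Dict[str, Tool]) -> Dict[str, Tool]:
    """Combine both registries, refusing silent name collisions."""
    clash = mcp_tools.keys() & local_tools.keys()
    if clash:
        raise ValueError(f"duplicate tool names: {sorted(clash)}")
    return {**mcp_tools, **local_tools}

async def run_tool_calls(tools: Dict[str, Tool], calls: List[Tuple[str, dict]]) -> List[str]:
    """Run every tool call requested in one LLM turn concurrently."""
    return await asyncio.gather(*(tools[name](args) for name, args in calls))

TOOLS = merge_tools(
    {"k8s_get_pods": k8s_get_pods, "prometheus_query": prometheus_query},
    {"rag_search": rag_search},
)
```

`asyncio.gather` returns results in the order the calls were passed in, so each result can be matched back to the tool call that produced it.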
  8. The Agent Core with LLM Enhancement
    • Key Benefits of LLM Integration:
      ◦ Enhanced intent understanding: Interprets nuanced, context-rich user prompts.
      ◦ Rich, human-like responses: Synthesizes retrieved data from multiple services to generate conversational answers.
      ◦ Advanced multi-step logic: Leverages the LLM's reasoning capabilities to orchestrate complex sequences.
  9. What is an LLM?
    • An LLM (Large Language Model) is the center of an AI Agent and acts as its brain. It gives the agent the ability to process information, understand language, and reason through problems.
    • Core Capabilities: It is the engine that gives the agent its ability to process incoming information, comprehend human language, and apply reasoning to solve complex problems.
    • The "Closed-Book" Constraint: On its own, querying a standard LLM is like asking it to take a "closed-book exam". It must rely entirely on the static knowledge it memorized during its initial training phase.
  10. The Limitation of LLMs
    • The Risk of Hallucination: Because of this closed-book nature, if the LLM does not actually know the answer to a question, it might guess or hallucinate incorrect information unless it is augmented with external, trusted data.
    • The Compulsion to Generate: By design, an LLM's primary function is to predict the next word and resolve the user's prompt. Because of this core mechanic, its default "instinct" is to produce a confident, plausible-sounding answer.
  11. The Limitation of LLMs
    • Stateless (A Blank Slate): The LLM has no inherent memory of past interactions once a request is finished. It relies entirely on the context passed to it in the immediate prompt.
    • Probabilistic (Not Strictly Deterministic): While the model calculates the same mathematical probabilities given the same input and settings, it samples from those probabilities to generate the response. This means the actual outcome can vary with each request, resulting in different text or images even from the exact same prompt.
    • Autoregressive: It generates answers sequentially, predicting the very next word (token) based on all the words that came before it.
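
The "probabilistic" point can be made concrete with a toy next-token step. This sketch assumes made-up logits for three hypothetical tokens; it is not how any particular model is served. It shows only that the probabilities are identical on every call while the sampled token need not be.

```python
import math
import random
from typing import Dict

def softmax(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Turn raw scores into probabilities; same input always gives same output."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample_token(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

def greedy_token(probs: Dict[str, float]) -> str:
    """Always pick the most likely token (deterministic decoding)."""
    return max(probs, key=probs.get)

# Hypothetical logits for the next token after "To fix the pod, we should ..."
LOGITS = {"restart": 2.0, "rollback": 1.5, "scale": 0.5}
PROBS = softmax(LOGITS)
```

Greedy decoding removes the randomness from this step, which is why low-temperature settings are often chosen when repeatable behavior matters more than variety.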
  12. The Agent Core with Memorization
    • Key Benefits of Memorization:
      ◦ Resolves Statelessness: The agent remembers findings from step 1 and bases later decisions on them.
      ◦ Personalization: The agent learns user preferences.
      ◦ Experience Accumulation: Builds up historical interactions rather than relying on model retraining.
  13. The Memory Spectrum
    • Short-Term Memory (STM): Immediate working context (chat history). Analogous to RAM. Resets after a session or when the context window overflows.
    • Episodic Memory (LTM): Stores specific sequences of past events. "Last time the database crashed, we rolled back the new configuration".
    • Semantic Memory (LTM): Generalized factual knowledge stored as vector embeddings. "The standard timeout for the API is 5 seconds".
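
Short-term memory behaves like a bounded buffer. A minimal sketch, assuming the limit is counted in messages (real context windows are limited in tokens) and using a hypothetical `ShortTermMemory` class that is not part of the deck:

```python
from collections import deque
from typing import Deque, Dict, List

class ShortTermMemory:
    """Keeps only the most recent messages, like a fixed-size working context."""

    def __init__(self, max_messages: int) -> None:
        # A deque with maxlen silently drops the oldest entry on overflow.
        self._messages: Deque[Dict[str, str]] = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def context(self) -> List[Dict[str, str]]:
        """Return the messages that would be sent with the next prompt."""
        return list(self._messages)

    def reset(self) -> None:
        """End of session: the working context is gone."""
        self._messages.clear()
```

Anything that has to survive the overflow or the reset belongs in one of the long-term stores described above.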
  14. What happens when you combine Reasoning, State, and Execution?
    • The LLM: Reasoning Engine - Understands intent, formulates plans, and makes logical decisions based on prompts.
    • Memorization: State & Context - Retrieves runbooks (Semantic) and recalls past incident fixes (Episodic) via Redis.
    • Core Logic: Orchestration & Tools - Manages loops, handles API calls, executes code, and enforces fail-safes.
  15. What is MCP?
    • MCP (Model Context Protocol) is an open standard that enables AI assistants to safely and easily access external data sources and tools. It was published by Anthropic in November 2024 and is hosted by the Linux Foundation.
    • The Core Analogy: You can think of MCP as the "USB-C for AI". Just like REST standardized resource interactions for web APIs, MCP standardizes how AI models and agent runtimes discover and use tools.
  16. Kubernetes MCP
    • Goal: To serve as the action layer, or the "Hands", of the SRE AI Agent by executing commands against the Kubernetes cluster.
    • Architecture Placement: It acts as a bridge connecting the AI Agent's core logic directly to the Kubernetes API.
    • Capabilities: It exposes model-controlled, action-oriented tools that have side effects on the environment.
    • In a real-world flow, the agent uses this MCP to execute specific commands such as kubectl get pods, kubectl apply, kubectl logs, and kubectl rollout restart to resolve issues.
  17. Prometheus MCP
    • Goal: To empower the AI Agent with real-time observability by allowing it to autonomously check system metrics.
    • Architecture Placement: It sits behind the agent's core container to interface directly with the Prometheus API.
    • Capabilities: It functions primarily to provide application-controlled, read-only context and data sources without introducing side effects to the environment.
    • For example, if an alert fires for high memory usage, the agent uses the Prometheus MCP to check the metrics and understand the state of the cluster.
  18. GitHub MCP
    • Goal: To connect the AI Agent to your organization's version control and code repositories via the GitHub Server.
    • Architecture Placement: It is deployed as a containerized MCP server connecting the AI core to the GitHub Server.
    • Capabilities: It enables autonomous software development workflows: the agent can automatically review code repositories, spot bugs, and generate or test code fixes.
  19. What is RAG?
    • RAG: Retrieval-Augmented Generation
    • The easiest way to explain RAG is to use the open-book exam analogy:
      ◦ If you ask a standard LLM a question, it's taking a closed-book exam. It has to rely purely on whatever it memorized during its initial training. If it doesn't know the answer, it might guess (hallucinate).
      ◦ RAG turns it into an open-book exam. It allows the AI to search through a specific, trusted stack of documents to find the exact facts before it writes its answer.
  20. Why RAG is Essential for SRE Agents
    • Serving as the Agent's Long-Term, Private Memory: Ingesting internal documentation, SRE runbooks, historical incident postmortems, or architecture wikis.
    • Instant Deployment Updates: Upload new cluster requirements or policies as documents directly to the vector database. The agent learns these specifications instantly without expensive model retraining, ensuring alignment with the latest standards.
  21. Why RAG is Essential for SRE Agents
    • Real-Time Accuracy: You can simply drop a new PDF or policy into a vector database, and the agent instantly knows about it without expensive model retraining.
    • Hallucination Control: It steers the LLM to answer only from the retrieved documents in its context window, drastically reducing the chance of it making things up.
    • Verifiability: Because the agent retrieves a document to answer the question, it can cite its sources.
  22. Real Challenges of Agents in Production
    • Hallucinated Tool Calls: The LLM might invoke the wrong tool or pass incorrect parameters.
    • Unintended Loops: Agents can get stuck in infinite reasoning loops without clear stopping conditions.
    • Over-permissioning: Agents can accidentally drop databases if given root access. Even if explicitly prompted not to touch production, they can hallucinate a destructive command.
    • Compounding Errors: In a multi-step chain, an early mistake can cascade through subsequent steps before anyone notices.
    • The Solution: Observability, guardrails, and human-in-the-loop design are absolutely non-negotiable.
  23. Roadmap: Upcoming Enhancements
    • Role-Based Access Control (RBAC) & Secure Login:
      ◦ Implement IAM and authenticated login to mitigate production risks such as over-permissioning and accidental data deletion.
      ◦ Enable granular security where the agent's ability to act is restricted based on the specific permissions of the logged-in user.
      ◦ Transition from broad access to human-in-the-loop approval gates based on assigned SRE roles.
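
Restricting the agent's ability to act by the logged-in user's role can be sketched as a simple permission lookup. The role names and the verb sets below are assumptions chosen for illustration; the deck does not specify them, and a real deployment would rely on Kubernetes RBAC and IAM rather than an in-process table.

```python
from typing import Dict, Set

# Hypothetical mapping from user role to the kubectl-style verbs it may trigger.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"get", "logs"},
    "sre": {"get", "logs", "rollout-restart"},
    "admin": {"get", "logs", "rollout-restart", "apply"},
}

def is_allowed(role: str, verb: str) -> bool:
    """Unknown roles get an empty permission set (deny by default)."""
    return verb in ROLE_PERMISSIONS.get(role, set())

def authorize(role: str, verb: str) -> None:
    """Raise before the agent executes anything the user could not do themselves."""
    if not is_allowed(role, verb):
        raise PermissionError(f"role '{role}' may not perform '{verb}'")
```

The check runs in the agent core before a tool call is dispatched, so a hallucinated `apply` requested on behalf of a viewer is rejected regardless of what the LLM proposed.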
  24. Roadmap: Upcoming Enhancements
    • Refined Reliability & GitOps Alignment:
      ◦ Improve internal system prompts to enhance the agent's operating instructions and step-by-step reasoning reliability.
      ◦ Standardize the agent's execution loop to strictly follow GitOps patterns, ensuring all cluster changes are version-controlled and auditable.
      ◦ Enhance the "Think" phase of the ReAct loop to better align with enterprise-grade deployment guardrails.
  25. Roadmap: Upcoming Enhancements
    • Jira MCP Integration:
      ◦ Deploy a dedicated Jira MCP server to expand the Agentic System into internal ticketing and project management tools.
      ◦ Empower the agent to perform actions in the Kubernetes cluster based on tickets.
      ◦ Empower the agent to perform actions in Jira itself, such as updating ticket statuses or retrieving incident context.
  26. Key Takeaways
    • Beyond Chatbots: AI Agents combine reasoning, memory, and action to autonomously resolve issues.
    • Standardized Integration: The Model Context Protocol (MCP) acts as the universal USB-C for AI, securely connecting models to external tools like Kubernetes, Prometheus, and GitHub.
    • Grounded & Verifiable: Retrieval-Augmented Generation (RAG) serves as the agent's private, long-term memory. This controls hallucinations by anchoring decisions in your actual runbooks and wikis.
    • Security is Non-Negotiable: True autonomy requires strict enterprise guardrails. Human-in-the-loop design, RBAC, and LLM Firewalls (like GCP Model Armor) ensure production environments remain safe.
  27. The Anatomy and Loop of an Agent
    • Beyond Chatbots: Agents don't just answer questions; they perform tasks.
    • The Four Pillars:
      ◦ The "Brain" (The AI Model): At the center of an AI agent is usually a Large Language Model (LLM). This acts as the brain, giving the agent the ability to process information, understand language, and reason through problems.
      ◦ Memory: Just like humans, agents need memory to be effective. They use short-term memory to keep track of a current task, and long-term memory to recall historical data and past interactions so they don't have to start from scratch every time.
      ◦ Planning: An agent can take a massive goal and break it down into smaller, actionable steps. It evaluates potential actions and chooses the best strategic path forward based on the desired outcome.
      ◦ Tools (Action): An agent isn't trapped in a chat box. It can be connected to outside software, like databases, search engines, or coding environments, allowing it to execute real-world tasks.
    • Autonomy: Capable of operating with varying degrees of human oversight.
  28. The Continuous Loop
    • AI agents generally operate on a continuous, intelligent loop:
      ◦ Perception (Reasoning): The agent takes in a prompt or data (like sensory data or system alerts) and uses its brain to understand the context and what needs to be done.
      ◦ Planning: It sets goals, creates a step-by-step roadmap, and selects the right digital tools for the job.
      ◦ Action: It executes the plan. This could mean writing a piece of code, searching the web, analyzing a spreadsheet, querying a database, or sending an email.
      ◦ Reflection: After acting, the agent evaluates the results. Did the code work? Did the search find the right answer? If not, it learns from the feedback, adjusts its plan, and tries again.
  29. The ReAct Loop (Reason + Act)
    • The foundational pattern powering almost every modern agent is ReAct (Reasoning + Acting).
    • The agent continuously alternates between three phases:
      ◦ Think (Reason): The LLM acts as the strategist, deciding what needs to happen.
      ◦ Act (Execute Tools): The agent executes the specific tools proposed.
      ◦ Observe (Result): The system takes the result and feeds it back into the context window for the next step.
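
The Think / Act / Observe cycle can be sketched with a scripted stand-in for the LLM. Everything here is hypothetical: `fake_llm`, the `get_pods` tool, and the returned strings exist only to make the loop runnable. The `max_steps` budget is one simple stopping condition against runaway loops.

```python
from typing import Callable, Dict, List

def fake_llm(context: List[str]) -> Dict[str, str]:
    """Stand-in for the LLM: inspect pods first, then give a final answer."""
    if not any(line.startswith("Observation:") for line in context):
        return {"action": "get_pods", "input": "shop"}
    return {"final": "payment-svc is in CrashLoopBackOff"}

# Hypothetical tool registry; a real agent would call the Kubernetes API here.
TOOLS: Dict[str, Callable[[str], str]] = {
    "get_pods": lambda namespace: f"{namespace}/payment-svc CrashLoopBackOff",
}

def react(question: str,
          llm: Callable[[List[str]], Dict[str, str]],
          tools: Dict[str, Callable[[str], str]],
          max_steps: int = 5) -> str:
    context = [f"Question: {question}"]
    for _ in range(max_steps):
        decision = llm(context)                        # Think
        if "final" in decision:
            return decision["final"]
        tool = tools.get(decision["action"])
        if tool is None:
            # Report a hallucinated tool name back instead of crashing.
            observation = f"unknown tool {decision['action']}"
        else:
            observation = tool(decision["input"])      # Act
        context.append(f"Observation: {observation}")  # Observe
    raise RuntimeError("step budget exhausted without a final answer")
```

The agent code itself makes no decisions: it only dispatches what the LLM proposed and appends the result to the context for the next turn.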
  30. How to build an AI Agent?
    • 1. Define Foundation & Design
      ◦ Establish Purpose: Clearly define what the agent will do, its use cases, and its limitations.
      ◦ Craft the Prompt: Design the system prompt to give the agent its specific goals, role, persona, and operating instructions.
  31. How to build an AI Agent?
    • 2. Integrate Core Components
      ◦ Choose the LLM: Select the right underlying Large Language Model (LLM) by weighing factors like capabilities, cost, and speed.
      ◦ Equip with Tools: Connect the agent to the outside world using APIs and custom functions.
      ◦ Build Memory: Set up memory systems (like vector databases or episodic memory) so the agent can remember past interactions and access stored knowledge.
  32. MCP Architecture & Primitives
    • The Problem it Solves: Before MCP, connecting AI to tools required custom integrations for every single combination, known as the N x M integration problem (e.g., 4 models x 4 tools = 16 custom integrations). MCP standardizes this, so you build an MCP server once, and any MCP-compatible AI can use it (4 models + 4 servers = 8 total implementations).
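
The arithmetic behind the N x M claim is worth spelling out, because the gap widens quickly as either side grows. The function names below are hypothetical; the 4-and-4 figures come from the slide.

```python
def point_to_point_integrations(models: int, tools: int) -> int:
    """Without a shared protocol, every model needs its own adapter per tool."""
    return models * tools

def mcp_implementations(models: int, servers: int) -> int:
    """With MCP, each model implements a client once and each tool a server once."""
    return models + servers
```

For the slide's example this gives 16 versus 8; at 10 models and 20 tools it is 200 versus 30.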
  33. MCP Architecture & Primitives
    • The 3-Layer Architecture:
      ◦ MCP Host: The AI application the user interacts with (e.g., your terminal or chatbot).
      ◦ MCP Client: Lives inside the Host and maintains a dedicated 1:1 connection with one MCP server.
      ◦ MCP Server: An external, modular process that exposes specific capabilities.
  34. MCP Architecture & Primitives
    • The 3 Primitives exposed by servers:
      ◦ Tools: Model-controlled, action-oriented functions that have side effects (e.g., running kubectl).
      ◦ Resources: Application-controlled, read-only context and data sources without side effects (e.g., reading logs or API responses).
      ◦ Prompts: User-controlled, reusable templates for common workflows.
  35. MCP vs. API
    • Target Audience: APIs are designed specifically for developers to connect applications to services, whereas MCP servers are built directly for AI models and agent runtimes to safely interact with tools.
    • Integration Approach: APIs require custom plumbing and hardcoded integrations. In contrast, MCP features built-in discovery and model-friendly descriptions, allowing the AI to dynamically figure out how to use the tools without custom glue code.
    • The Execution Workflow: The API workflow relies on manual programming (Developer -> Code -> Endpoint -> Service), while the MCP workflow is autonomous (LLM -> MCP Hub -> Discover & Use -> Tools).
  36. RAG: Ingestion
    • Before the AI agent can troubleshoot your cluster, you have to give it your internal documentation (like your SRE runbooks, past incident postmortems, and architecture wiki pages):
      ◦ The system chops these large technical documents into smaller, digestible chunks (like individual paragraphs or configuration blocks).
      ◦ It then uses a mathematical process to convert the meaning of the text into numbers (called embeddings).
      ◦ These numbers are stored in a specialized filing system, often called a Vector Database.
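
The three ingestion steps (chunk, embed, store) can be sketched end to end with the standard library. The embedding here is a deliberately crude word-count vector, not a learned semantic model, and the runbook text and the in-memory list standing in for a vector database are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

def chunk_paragraphs(document: str) -> List[str]:
    """Split a document into paragraph-sized chunks on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]

def embed(text: str) -> Dict[str, float]:
    """Toy embedding: L2-normalised word counts (an assumption, not a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}

def ingest(document: str) -> List[Tuple[str, Dict[str, float]]]:
    """Return (chunk, vector) pairs; a real system would write these to a vector DB."""
    return [(chunk, embed(chunk)) for chunk in chunk_paragraphs(document)]

# Hypothetical runbook excerpt.
RUNBOOK = """payment-svc handles checkout payments.

If payment-svc restarts repeatedly, check its memory limits."""

VECTOR_STORE = ingest(RUNBOOK)
```

Keeping the original chunk text next to its vector matters: the vector is used for search, but the text is what later gets handed to the LLM.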
  37. RAG: Retrieval
    • When an on-call engineer asks the AI a question (e.g., "What is the purpose of payment-svc?"), the system doesn't immediately send that question to the LLM.
      ◦ First, it converts the engineer's question into numbers.
      ◦ It then searches the Vector Database to find the text chunks that are mathematically most similar to the question.
      ◦ It pulls out the exact paragraph from your internal architecture wiki specifying the purpose.
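
"Mathematically most similar" usually means a similarity score such as cosine similarity between the question vector and each chunk vector. A minimal sketch, assuming a toy word-count embedding and three invented wiki chunks (a production system would use a learned embedding model and a vector database index):

```python
import math
import re
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (an assumption, not a semantic model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity; a Counter returns 0 for words it has not seen."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical chunks that ingestion would have stored earlier.
CHUNKS = [
    "payment-svc processes checkout payments for the web shop.",
    "The ingress controller terminates TLS for all public routes.",
    "Prometheus scrapes metrics from every namespace every 30 seconds.",
]

def retrieve(question: str, chunks: List[str], top_k: int = 1) -> List[str]:
    """Rank chunks by similarity to the question and keep the best top_k."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)
    return ranked[:top_k]
```

With word counts the match is purely lexical; learned embeddings are what let a question about "billing" find a chunk that only mentions "payments".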
  38. RAG: Augmentation
    • Once the relevant facts are retrieved from the Vector Database, the system still needs to hand them over to the AI so it can formulate an answer.
      ◦ Context Injection: The system takes the engineer's original question and combines it with the specific text chunks it just retrieved (e.g., the wiki paragraph stating "The payment service connects to PayPal on port 8443").
      ◦ The "Open-Book" Prompt: It constructs a massive, consolidated prompt, essentially handing the LLM its "open-book exam" containing your private infrastructure docs, and sends it to the reasoning engine.
      ◦ Grounded Generation: The LLM reads the injected context alongside the query to generate a highly accurate, grounded response. This allows the AI to accurately answer questions about your private environment and even cite its internal sources, without hallucinating incorrect configurations.
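
Context injection is, at its simplest, string assembly. The prompt wording and the `build_prompt` helper below are assumptions for illustration; the deck does not show the agent's actual template.

```python
from typing import List

def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine retrieved chunks and the user's question into one grounded prompt."""
    # Number the chunks so the model can cite them as [1], [2], ...
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{sources}\n\n"
        f"Question: {question}\n"
        "Cite the source numbers you used."
    )
```

Numbering the sources is what makes citation possible later: the answer can point back to `[1]`, and the backend can map that index to the originating document.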
  39. Agent Design Patterns
    • Coordinator (Dynamic Router): A central coordinator agent decomposes a request and dispatches subtasks to specialized agents. Ideal for workflows needing adaptive routing at runtime.
    • Human-in-the-loop: The workflow explicitly pauses at checkpoints for a person to approve, correct, or provide input before continuing. This is critical for high-stakes, destructive operations.
    • Review & Critique: A generator produces output, and a critic evaluates it against criteria and approves or returns feedback.
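
The human-in-the-loop pattern can be sketched as a gate in front of the executor. The verb list is a coarse assumption (for instance, it also pauses on harmless `kubectl rollout status`), and `approve` and `run` are hypothetical callbacks standing in for a review UI and the real command runner.

```python
from typing import Callable

# Assumed set of kubectl subcommands that should pause for a human.
DESTRUCTIVE_VERBS = {"apply", "delete", "rollout"}

def needs_approval(command: str) -> bool:
    """True when the command is a kubectl call whose verb is on the list."""
    parts = command.split()
    return len(parts) > 1 and parts[0] == "kubectl" and parts[1] in DESTRUCTIVE_VERBS

def execute_with_gate(command: str,
                      approve: Callable[[str], bool],
                      run: Callable[[str], str]) -> str:
    """Run read-only commands directly; pause destructive ones for a reviewer."""
    if needs_approval(command) and not approve(command):
        return "rejected by reviewer"
    return run(command)
```

In a chat interface, `approve` would typically render the proposed command and wait for an explicit confirmation before `run` is ever called.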
  40. GCP Model Armor
    • The LLM Firewall: A fully managed service that acts as an infrastructure-level firewall, screening both inbound prompts and outbound responses in real time.
    • Prompt Injection & Jailbreak Defense: Detects and blocks manipulative inputs trying to override system instructions or hijack the agent's tools (e.g., preventing an attacker from executing malicious kubectl commands).
  41. GCP Model Armor
    • Sensitive Data Leakage Prevention: Integrates with Sensitive Data Protection (SDP) to filter, mask, or block API keys, cloud credentials, or PII from being leaked in outputs or passed to external APIs.
    • Malicious URL & Content Filtering: Identifies and blocks phishing links and harmful content embedded in prompts or generated responses.
    • GKE Native Integration: Integrates seamlessly with the GKE Inference Gateway, applying centralized security policies directly at the infrastructure level.
  42. Resources
    Patrick Eichler, Cloud Computing, Cloud Technologies & IoT - SRH Berlin
    • https://cloud.google.com/use-cases/ai-agents
    • https://cloud.google.com/security/products/model-armor
    • https://modelcontextprotocol.io
    • https://github.com/modelcontextprotocol/servers