AI Agents 101: Architecting an SRE Kubernetes Agent

Move beyond passive chatbots. Discover how autonomous AI agents perceive, reason, and act in the real world by connecting to custom data and external tools in Kubernetes.

Patrick Eichler, YunaCloud

Who Am I?
• Freelancer at YunaCloud (https://yunacloud.com)
• Kubernetes Professional (Kubestronaut)
• Google Cloud Architect (multi-certified)
• Site Reliability Engineer (bridge between developers and the infrastructure)

Demo: SRE Kubernetes AI Agent
• The core intention of my SRE AI Agent is to transform Kubernetes operations from reactive monitoring into intelligent, autonomous observability, and to actively resolve issues and make changes via GitOps.
• By combining the reasoning capabilities of generative AI with structured, relational data, the agent bridges the gap between raw cluster metrics and human-readable root-cause analysis.
• It is built to securely democratize SRE knowledge, reducing Mean Time to Resolution (MTTR) while enforcing strict enterprise-grade guardrails.

What is an AI Agent?
• Definition: AI agents are software systems that use artificial intelligence to autonomously pursue goals and complete complex workflows on behalf of users.
• How are they different from regular bots?
◦ Autonomy: They make independent decisions to reach a defined goal without hand-holding.
◦ Complexity: They handle complex, multi-step actions rather than reacting to simple commands.
◦ Learning: They employ machine learning to adapt, improve performance, and learn from past mistakes.

What is an AI Agent?
• The AI Agent: This is the technical execution code, or the hands of the operation. Counterintuitively, the agent code is the only "dumb" actor: it doesn't think or decide; it only executes exactly what the LLM tells it to do.
• The Agentic System: This is the complete product or ecosystem you interact with, which includes the user, the LLM, the agent, the tools, and the environment.

Real-World Examples
• Data Analytics: An agent can act as a data engineer. A user can say, "Show me why sales dipped in Q3", and the agent will autonomously find the database, clean the data, write the SQL to analyze it, and generate charts.
• Healthcare & Life Sciences: Agents can summarize large volumes of clinical research or help hospitals automate administrative tasks, such as coordinating a patient's journey from intake to scheduling, freeing up doctors for actual patient care.
• Software Development: Developers use agents to review code repositories, spot bugs, and generate and test code fixes autonomously.
• Customer Service: Instead of a rigid bot that just links to an FAQ page, an AI agent can securely access a customer's account, understand their specific problem, and process a complex refund or troubleshooting sequence without human intervention.

Demo: Architecture

The Agent Core
• Custom-built Backend-For-Frontend (BFF) application running as a container in Kubernetes
• Backend: NestJS (TypeScript) - Operations Executor
• Frontend: Angular - Interactive chat interface
• Features:
◦ Executes multiple LLM-driven tool calls (K8s, Prometheus, RAG) concurrently
◦ Backend dynamically merges MCP-provided tools with local RAG tools
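
The two features above (merging tool registries and running tool calls concurrently) can be sketched as follows. This is a minimal illustration in Python rather than the NestJS backend from the demo; the tool functions, their names, and their return strings are hypothetical stand-ins, not the author's code.

```python
import asyncio

# Hypothetical stand-ins for tools discovered from MCP servers.
async def k8s_get_pods(namespace: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"pods in {namespace}: payment-svc-0 (Running)"

async def prometheus_query(expr: str) -> str:
    await asyncio.sleep(0.01)
    return f"result of {expr}: 0.93"

# Hypothetical local RAG tool that lives in the backend itself.
async def rag_search(question: str) -> str:
    await asyncio.sleep(0.01)
    return f"runbook excerpt for: {question}"

def merge_tools(mcp_tools: dict, local_tools: dict) -> dict:
    """Combine both registries and refuse silent name collisions."""
    clashes = set(mcp_tools) & set(local_tools)
    if clashes:
        raise ValueError(f"duplicate tool names: {sorted(clashes)}")
    return {**mcp_tools, **local_tools}

async def run_tool_calls(tools: dict, calls: list) -> list:
    """Run every requested call concurrently; results keep request order."""
    coros = [tools[call["name"]](**call["args"]) for call in calls]
    return await asyncio.gather(*coros)

TOOLS = merge_tools(
    {"k8s_get_pods": k8s_get_pods, "prometheus_query": prometheus_query},
    {"rag_search": rag_search},
)
CALLS = [
    {"name": "k8s_get_pods", "args": {"namespace": "prod"}},
    {"name": "prometheus_query", "args": {"expr": "up"}},
    {"name": "rag_search", "args": {"question": "payment-svc purpose"}},
]
RESULTS = asyncio.run(run_tool_calls(TOOLS, CALLS))
```

Rejecting duplicate names at merge time is one possible design choice; it keeps an MCP-provided tool from silently shadowing a local one.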

The Agent Core with LLM Enhancement
• Key Benefits of LLM Integration:
◦ Enhanced intent understanding: Interprets nuanced, context-rich user prompts.
◦ Rich, human-like responses: Synthesizes retrieved data from multiple services to generate conversational answers.
◦ Advanced multi-step logic: Leverages the LLM's reasoning capabilities to orchestrate complex sequences.

What is an LLM?
• An LLM (Large Language Model) sits at the center of an AI agent and acts as its brain.
• Core Capabilities: It is the engine that lets the agent process incoming information, comprehend human language, and apply reasoning to solve complex problems.
• The "Closed-Book" Constraint: Querying a standard LLM on its own is like asking it to take a closed-book exam. It must rely entirely on the static knowledge it memorized during training.

The Limitation of LLMs
• The Risk of Hallucination: Because of this closed-book nature, an LLM that does not know the answer to a question may guess or hallucinate incorrect information unless it is augmented with external, trusted data.
• The Compulsion to Generate: By design, an LLM's primary function is to predict the next word and complete the user's prompt. Its default "instinct" is therefore to produce a confident, plausible-sounding answer, whether or not it has the facts.

The Limitation of LLMs
• Stateless (A Blank Slate): The LLM has no inherent memory of past interactions once a request is finished. It relies entirely on the context passed to it in the current prompt.
• Probabilistic (Not Strictly Deterministic): Given the same input and settings, the model computes the same probabilities, but it then samples from them to generate the response. The output can therefore differ between requests, producing different text or images from the exact same prompt.
• Autoregressive: It generates answers sequentially, predicting the next token (roughly, a word or word fragment) based on all the tokens that came before it.
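
To make the "probabilistic" point concrete, here is a minimal sketch of next-token sampling. The four-word vocabulary and the logit values are invented for illustration; a real model scores tens of thousands of tokens, and real decoders add further controls that are not shown here.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """The probabilities are deterministic; the draw from them is not."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores for the word following "The fix is to ...".
VOCAB = ["restart", "rollback", "scale", "ignore"]
LOGITS = [2.0, 1.5, 0.5, -1.0]

PROBS = softmax(LOGITS)  # identical on every call
DRAWS = [sample_next_token(VOCAB, LOGITS) for _ in range(5)]  # may differ per run
```

Passing a seeded `random.Random` instance as `rng` makes the draws repeatable, which is roughly what fixing a seed does in an inference API that supports one.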

The Agent Core with Memorization
• Key Benefits of Memorization:
◦ Resolves Statelessness: The agent remembers findings from step 1 and bases later decisions on them.
◦ Personalization: The agent learns user preferences.
◦ Experience Accumulation: The agent builds on historical interactions instead of requiring model retraining.

The Memory Spectrum
• Short-Term Memory (STM): Immediate working context (chat history). Analogous to RAM. Resets after a session or when the context window overflows.
• Episodic Memory (LTM): Stores specific sequences of past events. "Last time the database crashed, we rolled back the new configuration."
• Semantic Memory (LTM): Generalized factual knowledge stored as vector embeddings. "The standard timeout for the API is 5 seconds."
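
The three memory types can be sketched with plain data structures. This is a simplified illustration under loud assumptions: the class and method names are hypothetical, and a keyword lookup and a dictionary stand in for the vector embeddings the slide describes.

```python
from collections import deque

class AgentMemory:
    """Toy model of the memory spectrum; not a production design."""

    def __init__(self, stm_limit: int = 3):
        # Short-term memory: bounded, so the oldest turn falls out on overflow.
        self.short_term = deque(maxlen=stm_limit)
        # Episodic memory: specific past events and what was done about them.
        self.episodic = []
        # Semantic memory: general facts (a dict stands in for a vector store).
        self.semantic = {}

    def remember_turn(self, message: str) -> None:
        self.short_term.append(message)

    def record_episode(self, event: str, action: str) -> None:
        self.episodic.append({"event": event, "action": action})

    def recall_episodes(self, keyword: str) -> list:
        return [e for e in self.episodic if keyword.lower() in e["event"].lower()]

memory = AgentMemory(stm_limit=3)
for turn in ["alert fired", "checked pods", "read logs", "found OOMKilled"]:
    memory.remember_turn(turn)
memory.record_episode("Database crashed", "rolled back the new configuration")
memory.semantic["api_timeout_seconds"] = 5
```

After the fourth turn, the first one ("alert fired") is no longer in short-term memory, which mirrors what happens when a context window overflows.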

What happens when you combine Reasoning, State, and Execution?
• The LLM: Reasoning Engine - Understands intent, formulates plans, and makes logical decisions based on prompts.
• Memorization: State & Context - Retrieves runbooks (semantic) and recalls past incident fixes (episodic) via Redis.
• Core Logic: Orchestration & Tools - Manages loops, handles API calls, executes code, and enforces fail-safes.

What is an MCP?
• The Model Context Protocol (MCP) is an open standard that enables AI assistants to safely and easily access external data sources and tools. It was published by Anthropic in November 2024 and is hosted by the Linux Foundation.
• The Core Analogy: You can think of MCP as the "USB-C for AI". Just as REST standardized resource interactions for web APIs, MCP standardizes how AI models and agent runtimes discover and use tools.

MCP: Demo Architecture

Kubernetes MCP
• Goal: To serve as the action layer (the "hands") of the SRE AI Agent by executing commands against the Kubernetes cluster.
• Architecture Placement: It acts as a bridge connecting the AI Agent's core logic to the Kubernetes API.
• Capabilities: It exposes model-controlled, action-oriented tools that have side effects on the environment.
• In a real-world flow, the agent uses this MCP server to run commands such as kubectl get pods, kubectl logs, kubectl apply, and kubectl rollout restart to diagnose and resolve issues.


Prometheus MCP
• Goal: To give the AI Agent real-time observability by allowing it to check system metrics autonomously.
• Architecture Placement: It sits behind the agent's core container and interfaces directly with the Prometheus API.
• Capabilities: It primarily provides application-controlled, read-only context and data sources without introducing side effects to the environment.
• For example, if an alert fires for high memory usage, the agent uses the Prometheus MCP server to check the metrics and understand the state of the cluster.


GitHub MCP
• Goal: To connect the AI Agent to your organization's version control and code repositories via the GitHub server.
• Architecture Placement: It is deployed as a containerized MCP server connecting the AI core to the GitHub server.
• Capabilities: It enables autonomous software development workflows: the agent can review code repositories, spot bugs, and generate or test code fixes.


What is RAG?
• RAG: Retrieval-Augmented Generation
• The easiest way to explain RAG is to use the open-book exam analogy:
◦ If you ask a standard LLM a question, it's taking a closed-book exam. It has to rely purely on whatever it memorized during its initial training. If it doesn't know the answer, it might guess (hallucinate).
◦ RAG turns it into an open-book exam. It allows the AI to search through a specific, trusted stack of documents to find the exact facts before it writes its answer.

Why RAG is Essential for SRE Agents
• Serving as the Agent's Long-Term, Private Memory: Ingests internal documentation, SRE runbooks, historical incident postmortems, and architecture wikis.
• Instant Deployment Updates: Upload new cluster requirements or policies as documents directly to the vector database. The agent can use these specifications as soon as they are indexed, without expensive model retraining, which keeps it aligned with the latest standards.

Why RAG is Essential for SRE Agents
• Real-Time Accuracy: You can drop a new PDF or policy into a vector database, and the agent can draw on it as soon as it is indexed, without expensive model retraining.
• Hallucination Control: It instructs the LLM to answer only from the retrieved documents in its context window, substantially reducing the chance of it making things up.
• Verifiability: Because the agent retrieves a document to answer the question, it can cite its sources.

SRE Agent RAG Retrieval

Real Challenges of Agents in Production
• Hallucinated Tool Calls: The LLM might invoke the wrong tool or pass incorrect parameters.
• Unintended Loops: Agents can get stuck in infinite reasoning loops without clear stopping conditions.
• Over-permissioning: Agents can accidentally drop databases if given root access. Even if explicitly prompted not to touch production, they can hallucinate a destructive command.
• Compounding Errors: In a multi-step chain, an early mistake can cascade through subsequent steps before anyone notices.
• The Solution: Observability, guardrails, and human-in-the-loop design are non-negotiable.
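
One guardrail against hallucinated tool calls and over-permissioning is to validate every proposed call against an allowlist before anything runs. The sketch below assumes a hypothetical registry with two tools; the tool names, argument names, and error strings are illustrative only.

```python
# Hypothetical allowlist: tool name -> required arguments and risk flag.
ALLOWED_TOOLS = {
    "get_pods": {"required": {"namespace"}, "destructive": False},
    "rollout_restart": {"required": {"namespace", "deployment"}, "destructive": True},
}

def validate_tool_call(call: dict, allow_destructive: bool = False) -> list:
    """Return a list of problems; an empty list means the call may run."""
    name = call.get("name")
    args = call.get("args", {})
    spec = ALLOWED_TOOLS.get(name)
    if spec is None:
        return [f"unknown tool: {name}"]  # the LLM invented a tool
    problems = []
    missing = spec["required"] - set(args)
    if missing:
        problems.append(f"missing arguments: {sorted(missing)}")
    unexpected = set(args) - spec["required"]
    if unexpected:
        problems.append(f"unexpected arguments: {sorted(unexpected)}")
    if spec["destructive"] and not allow_destructive:
        problems.append("destructive tool requires approval")
    return problems
```

For example, `validate_tool_call({"name": "drop_database", "args": {}})` reports an unknown tool, so the call never reaches the cluster regardless of what the prompt said.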

Roadmap: Upcoming Enhancements
• Role-Based Access Control (RBAC) & Secure Login:
◦ Implement IAM and authenticated login to mitigate production risks such as over-permissioning and accidental data deletion.
◦ Enable granular security, where the agent's ability to act is restricted to the permissions of the logged-in user.
◦ Transition from broad access to human-in-the-loop approval gates based on assigned SRE roles.

Roadmap: Upcoming Enhancements
• Refined Reliability & GitOps Alignment:
◦ Improve the internal system prompts to sharpen the agent's operating instructions and make its step-by-step reasoning more reliable.
◦ Standardize the agent's execution loop to strictly follow GitOps patterns, ensuring all cluster changes are version-controlled and auditable.
◦ Enhance the "Think" phase of the ReAct loop to better align with enterprise-grade deployment guardrails.

Roadmap: Upcoming Enhancements
• Jira MCP Integration:
◦ Deploy a dedicated Jira MCP server to extend the agentic system into internal ticketing and project management tools.
◦ Empower the agent to act on the Kubernetes cluster based on tickets.
◦ Empower the agent to act in Jira itself, such as updating ticket statuses or retrieving incident context.

Key Takeaways
• Beyond Chatbots: AI Agents combine reasoning, memory, and action to autonomously resolve issues.
• Standardized Integration: The Model Context Protocol (MCP) acts as the universal USB-C for AI, securely connecting models to external tools like Kubernetes, Prometheus, and GitHub.
• Grounded & Verifiable: Retrieval-Augmented Generation (RAG) serves as the agent's private, long-term memory. This controls hallucinations by anchoring decisions in your actual runbooks and wikis.
• Security is Non-Negotiable: True autonomy requires strict enterprise guardrails. Human-in-the-loop design, RBAC, and LLM firewalls (like GCP Model Armor) help keep production environments safe.

Thank you! Questions?
Patrick Eichler
Cloud Computing, Cloud Technologies & IoT - SRH Berlin

More Useful Stuff For the Home

The Anatomy and Loop of an Agent
• Beyond Chatbots: Agents don't just answer questions; they perform tasks.
• The Four Pillars:
◦ The "Brain" (The AI Model): At the center of an AI agent is usually a Large Language Model (LLM). This acts as the brain, giving the agent the ability to process information, understand language, and reason through problems.
◦ Memory: Just like humans, agents need memory to be effective. They use short-term memory to keep track of a current task, and long-term memory to recall historical data and past interactions so they don't have to start from scratch every time.
◦ Planning: An agent can take a large goal and break it down into smaller, actionable steps. It evaluates potential actions and chooses the best path forward based on the desired outcome.
◦ Tools (Action): An agent isn't trapped in a chat box. It can be connected to outside software (databases, search engines, or coding environments), allowing it to execute real-world tasks.
• Autonomy: Capable of operating with varying degrees of human oversight.


The Continuous Loop
• AI agents generally operate on a continuous, intelligent loop:
◦ Perception (Reasoning): The agent takes in a prompt or data (such as sensor data or system alerts) and uses its brain to understand the context and what needs to be done.
◦ Planning: It sets goals, creates a step-by-step roadmap, and selects the right digital tools for the job.
◦ Action: It executes the plan. This could mean writing a piece of code, searching the web, analyzing a spreadsheet, querying a database, or sending an email.
◦ Reflection: After acting, the agent evaluates the results. Did the code work? Did the search find the right answer? If not, it learns from the feedback, adjusts its plan, and tries again.

The ReAct Loop (Reason + Act)
• The foundational pattern powering almost every modern agent is ReAct (Reasoning + Acting).
• The agent continuously alternates between three phases:
◦ Think (Reason): The LLM acts as the strategist, deciding what needs to happen.
◦ Act (Execute Tools): The agent executes the specific tools proposed.
◦ Observe (Result): The system takes the result and feeds it back into the context window for the next step.
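
The three phases can be sketched as a small loop. In this minimal illustration a scripted function stands in for the LLM call, and the tool, its output, and the step budget are hypothetical; a real agent would send the accumulated context to a model instead.

```python
def scripted_llm(context: list) -> dict:
    """Hypothetical stand-in for an LLM: picks the next step from the context."""
    has_observation = any(line.startswith("observation:") for line in context)
    if not has_observation:
        return {"thought": "Check pod status first.",
                "tool": "get_pods", "args": {"namespace": "prod"}}
    return {"thought": "I have enough information.",
            "answer": "payment-svc is in CrashLoopBackOff."}

def get_pods(namespace: str) -> str:
    # Hypothetical tool result; a real tool would query the cluster.
    return f"{namespace}/payment-svc-0 CrashLoopBackOff"

TOOLS = {"get_pods": get_pods}

def react_loop(question: str, llm, tools: dict, max_steps: int = 5):
    context = [f"question: {question}"]
    for _ in range(max_steps):  # hard stop guards against endless loops
        step = llm(context)                            # Think
        context.append(f"thought: {step['thought']}")
        if "answer" in step:
            return step["answer"], context
        result = tools[step["tool"]](**step["args"])   # Act
        context.append(f"observation: {result}")       # Observe
    raise RuntimeError("step budget exhausted without an answer")

ANSWER, TRACE = react_loop("Why is checkout failing?", scripted_llm, TOOLS)
```

The agent code here only dispatches what the "LLM" proposes and feeds results back, and `max_steps` is one simple stopping condition.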

Simplified Architecture

How to build an AI Agent?
• 1. Define Foundation & Design
◦ Establish Purpose: Clearly define what the agent will do, its use cases, and its limitations.
◦ Craft the Prompt: Design the system prompt to give the agent its specific goals, role, persona, and operating instructions.

How to build an AI Agent?
• 2. Integrate Core Components
◦ Choose the LLM: Select the right underlying Large Language Model (LLM) by weighing factors like capabilities, cost, and speed.
◦ Equip with Tools: Connect the agent to the outside world using APIs and custom functions.
◦ Build Memory: Set up memory systems (like vector databases or episodic memory) so the agent can remember past interactions and access stored knowledge.

MCP Architecture & Primitives
• The Problem it Solves: Before MCP, connecting AI to tools required a custom integration for every model-tool combination, known as the N x M integration problem (e.g., 4 models x 4 tools = 16 custom integrations). MCP standardizes this: you build an MCP server once, and any MCP-compatible AI can use it (4 models + 4 servers = 8 total implementations).
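
The arithmetic behind the N x M problem can be written out directly. The function names below are illustrative only.

```python
def point_to_point_integrations(models: int, tools: int) -> int:
    """Without a shared protocol, every model-tool pair needs its own glue."""
    return models * tools

def mcp_implementations(models: int, servers: int) -> int:
    """With MCP, each model gets one client and each tool gets one server."""
    return models + servers

BEFORE = point_to_point_integrations(4, 4)  # 16
AFTER = mcp_implementations(4, 4)           # 8
```

The gap widens as the ecosystem grows: at 10 models and 20 tools, the same functions give 200 integrations versus 30 implementations.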

MCP Architecture & Primitives
• The 3-Layer Architecture:
◦ MCP Host: The AI application the user interacts with (e.g., your terminal or chatbot).
◦ MCP Client: Lives inside the Host and maintains a dedicated 1:1 connection with one MCP server.
◦ MCP Server: An external, modular process that exposes specific capabilities.

MCP Architecture & Primitives
• The 3 Primitives exposed by servers:
◦ Tools: Model-controlled, action-oriented functions that have side effects (e.g., running kubectl).
◦ Resources: Application-controlled, read-only context and data sources without side effects (e.g., reading logs or API responses).
◦ Prompts: User-controlled, reusable templates for common workflows.

MCP vs. API
• Target Audience: APIs are designed for developers to connect applications to services, whereas MCP is built for AI models and agent runtimes to safely interact with tools.
• Integration Approach: APIs require custom plumbing and hardcoded integrations. In contrast, MCP features built-in discovery and model-friendly descriptions, allowing the AI to work out dynamically how to use the tools without custom glue code.
• The Execution Workflow: The API workflow relies on manual programming (Developer -> Code -> Endpoint -> Service), while the MCP workflow is autonomous (LLM -> MCP Hub -> Discover & Use -> Tools).

RAG: Ingestion
• Before the AI agent can troubleshoot your cluster, you have to give it your internal documentation (like your SRE runbooks, past incident postmortems, and architecture wiki pages):
◦ The system chops these large technical documents into smaller, digestible chunks (like individual paragraphs or configuration blocks).
◦ It then uses a mathematical process to convert the meaning of the text into numbers (called embeddings).
◦ These numbers are stored in a specialized filing system, often called a Vector Database.
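
The ingestion steps above can be sketched end to end. This is a toy illustration: the hashed word-count "embedding" and the in-memory list are assumptions standing in for a real embedding model and vector database, and the document text is invented.

```python
import re
import zlib

def chunk_document(text: str) -> list:
    """Split on blank lines so each paragraph becomes one chunk."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def toy_embed(text: str, dims: int = 16) -> list:
    """Toy embedding: hashed word counts. Real models capture meaning."""
    vector = [0.0] * dims
    for word in re.findall(r"[a-z0-9-]+", text.lower()):
        vector[zlib.crc32(word.encode()) % dims] += 1.0
    return vector

def ingest(doc_id: str, text: str, store: list) -> int:
    """Chunk, embed, and store one document; returns the chunk count."""
    chunks = chunk_document(text)
    for index, chunk in enumerate(chunks):
        store.append({
            "id": f"{doc_id}#{index}",
            "text": chunk,
            "vector": toy_embed(chunk),
        })
    return len(chunks)

# Hypothetical wiki page with two paragraphs.
WIKI_PAGE = """payment-svc handles checkout payments for the web shop.

Restart it with a rolling update to avoid dropped requests."""

VECTOR_STORE = []
CHUNK_COUNT = ingest("wiki/payment-svc", WIKI_PAGE, VECTOR_STORE)
```

Keeping the original text next to each vector is what later lets the agent quote and cite the chunk it retrieved.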

RAG: Retrieval
• When an on-call engineer asks the AI a question (e.g., "What is the purpose of payment-svc?"), the system doesn't immediately send that question to the LLM.
◦ First, it converts the engineer's question into numbers.
◦ It then searches the Vector Database to find the text chunks that are mathematically most similar to the question.
◦ It pulls out the exact paragraph from your internal architecture wiki specifying the purpose.
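
"Mathematically most similar" usually means a similarity score such as cosine similarity between the question vector and each chunk vector. The sketch below uses word counts as a toy embedding, so it matches on shared words rather than meaning; the chunk texts are invented for illustration.

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Toy embedding: a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: 1.0 for the same direction, 0.0 for no overlap."""
    dot = sum(a[term] * b[term] for term in a)  # Counter gives 0 if missing
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list, top_k: int = 1) -> list:
    """Rank stored chunks by similarity to the question."""
    query = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, toy_embed(c)), reverse=True)
    return ranked[:top_k]

# Hypothetical chunks already ingested from internal docs.
CHUNKS = [
    "payment-svc handles checkout payments for the web shop.",
    "The ingress controller terminates TLS for all public routes.",
    "Prometheus scrapes metrics from every namespace.",
]
TOP = retrieve("What is the purpose of payment-svc?", CHUNKS)
```

A production system would precompute the chunk vectors at ingestion time and let the vector database perform the nearest-neighbor search instead of re-embedding every chunk per query.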

RAG: Augmentation
• Once the relevant facts are retrieved from the Vector Database, the system still needs to hand them over to the AI so it can formulate an answer.
◦ Context Injection: The system takes the engineer's original question and combines it with the specific text chunks it just retrieved (e.g., the wiki paragraph stating "The payment service connects to PayPal on port 8443").
◦ The "Open-Book" Prompt: It constructs one consolidated prompt, essentially handing the LLM its "open-book exam" containing your private infrastructure docs, and sends it to the reasoning engine.
◦ Grounded Generation: The LLM reads the injected context alongside the query to generate a grounded response. This allows the AI to answer questions about your private environment accurately and to cite its internal sources, with far less risk of hallucinating incorrect configurations.
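
Context injection amounts to string assembly. A minimal sketch follows, assuming retrieved chunks arrive as dictionaries with `source` and `text` keys; the key names, the source label, and the instruction wording are hypothetical.

```python
def build_grounded_prompt(question: str, retrieved: list) -> str:
    """Combine the question with retrieved chunks into one open-book prompt."""
    sources = "\n".join(
        f"[{number}] ({chunk['source']}) {chunk['text']}"
        for number, chunk in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{sources}\n\n"
        f"Question: {question}\n"
    )

RETRIEVED = [
    {"source": "wiki/payment-svc",
     "text": "The payment service connects to PayPal on port 8443."},
]
PROMPT = build_grounded_prompt("Which port does payment-svc use for PayPal?", RETRIEVED)
```

Numbering the chunks gives the model something concrete to cite, and the "say so" instruction offers it an alternative to guessing when retrieval comes back empty or off-topic.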

Agent Design Patterns
• Coordinator (Dynamic Router): A central coordinator agent decomposes a request and dispatches subtasks to specialized agents. Ideal for workflows needing adaptive routing at runtime.
• Human-in-the-loop: The workflow explicitly pauses at checkpoints for a person to approve, correct, or provide input before continuing. This is critical for high-stakes, destructive operations.
• Review & Critique: A generator produces output, and a critic evaluates it against criteria and either approves it or returns feedback.
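
The human-in-the-loop pattern can be sketched as a gate in front of the executor. In this illustration the list of mutating verbs is an assumption and deliberately incomplete, and `execute` and `approve` are hypothetical callbacks supplied by the caller (for example, a chat prompt asking the on-call engineer to confirm).

```python
# Assumed subset of kubectl verbs that change cluster state.
MUTATING_VERBS = {"apply", "delete", "rollout", "scale"}

def needs_approval(command: list) -> bool:
    """True when a kubectl-style command (argv list) could change the cluster."""
    return len(command) > 1 and command[0] == "kubectl" and command[1] in MUTATING_VERBS

def run_with_gate(command: list, execute, approve) -> str:
    """Pause for a human decision before any mutating command runs."""
    if needs_approval(command) and not approve(command):
        return "rejected by reviewer"
    return execute(command)

# Example wiring with stand-in callbacks.
EXECUTED = []

def fake_execute(command: list) -> str:
    EXECUTED.append(command)
    return "done"

READ_RESULT = run_with_gate(["kubectl", "get", "pods"], fake_execute, lambda c: False)
WRITE_RESULT = run_with_gate(["kubectl", "delete", "pod", "payment-svc-0"],
                             fake_execute, lambda c: False)
```

Read-only commands pass straight through, while the delete is stopped because the reviewer callback declined it.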

GCP Model Armor
• The LLM Firewall: A fully managed service that acts as an infrastructure-level firewall, screening both inbound prompts and outbound responses in real time.
• Prompt Injection & Jailbreak Defense: Detects and blocks manipulative inputs that try to override system instructions or hijack the agent's tools (e.g., preventing an attacker from executing malicious kubectl commands).

GCP Model Armor
• Sensitive Data Leakage Prevention: Integrates with Sensitive Data Protection (SDP) to filter, mask, or block API keys, cloud credentials, or PII from being leaked in outputs or passed to external APIs.
• Malicious URL & Content Filtering: Identifies and blocks phishing links and harmful content embedded in prompts or generated responses.
• GKE Native Integration: Integrates with the GKE Inference Gateway, applying centralized security policies directly at the infrastructure level.

Resources
• https://cloud.google.com/use-cases/ai-agents
• https://cloud.google.com/security/products/model-armor
• https://modelcontextprotocol.io
• https://github.com/modelcontextprotocol/servers
