
Agentic AI Security

Prashant Kulkarni
July 03, 2025

Agentic AI & Security: Defending the Future of Intelligent Systems
In this comprehensive presentation, Prashant Kulkarni (UCLA Extension) explores the emerging security challenges and solutions for autonomous, goal-oriented AI agents. You’ll learn:
• Why Agentic AI Changes the Game — How traditional security assumptions break down when AI systems plan, learn, and act without human prompting.
• Multi-Agent Architectures — Key coordination patterns, shared memory/models, and tool integrations that power modern agentic workflows.
• New Attack Vectors — From goal manipulation and inter-agent poisoning to privilege escalation and consensus exploits.
• Threat Modeling & Adversarial Playbooks — Frameworks (e.g., MAESTRO, MITRE ATLAS, OWASP Agentic AI Top 10) and structured playbooks for red-teaming and risk assessment.
• Security Frameworks & Mitigations — Practical guardrails including fine-grained access control, sandboxing, runtime protections, tamper-evident logging, and OAuth extensions for agent action authorization.
• Agent Red Teaming — Continuous AI-vs-AI testing approaches and case studies of advanced frameworks (Agent-in-the-Middle, AgentXploit).
• Monitoring, Control & Human Oversight — Strategies for behavioral analytics, emergency shutdowns, circuit breakers, and governance workflows to keep autonomous agents aligned.

Call to Action: Audit your AI agents, implement intent logging, design human-in-the-loop gates, and stay ahead of evolving OAuth standards to safeguard your organization’s future.


Transcript

  1. Prashant Kulkarni [email protected] Agentic AI & Security: Defending the Future of Intelligent Systems. Figure 02: Robot utilizes ChatGPT to enable speech-to-speech interaction, planning, and reasoning.
  2. © Prashant Kulkarni, UCLA Extension. Agenda: Problem Statement • Agentic AI & Multi-Agent Architecture • Agentic AI Security Needs • Agentic AI Attack Vectors • Live Scenario for Attacks • Security Frameworks • Threat Modeling • Adversarial Playbooks • OAuth Action Authorization • Sandboxing • Agent Monitoring & Control
  3. How many of us think AI agents are in the "proof of concept" phase? Let me show you who's been making decisions while you weren't looking.
  4. COiN (Contract Intelligence) agents process legal documents and make lending decisions ("AI in Banking: JP Morgan Leads the AI Sphere," CTO Magazine). PathAI announced integrations with leading AI-pathology companies Deep Bio, DoMore Diagnostics, Paige, and Visiopharm through its AISight Image Management System; AI agents analyze pathology slides and recommend treatment protocols to oncologists. AI agents automatically reorder inventory, adjust pricing, and coordinate delivery logistics across millions of products ("How Amazon Is Using AI To Become the Fastest Supply Chain in the World," Sifted; Amazon's Planning and Routing Technology).
  5. The Reality Check: While we're debating the future of AI agents, they're already making million-dollar decisions, coordinating complex operations, and taking actions that affect millions of people every day.
  6. © Prashant Kulkarni, UCLA Extension. What Makes AI "Agentic"? Traditional automation is rule-based (pre-programmed responses), reactive (waits for specific triggers), static (same input -> same output), isolated (single-system operation), and pursues human-defined goals. Agentic AI is goal-oriented (pursues objectives dynamically), proactive (initiates actions independently), adaptive (learns and evolves behavior), collaborative (coordinates with other agents), and has self-modifying goals. The shift is from "doing what it's told" to "figuring out what needs to be done."
  7. © Prashant Kulkarni, UCLA Extension. Multi-Agent Coordination Pattern: Planning Agent, Data Agent, Execution Agent, Monitor Agent. Agents communicate with one another, share memory, have access to tools, and use a distributed work-coordination system (a toy sketch follows below).
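    The slide names the moving parts without showing them wired together, so here is a minimal toy sketch of the pattern, assuming a single-process setup; the agent names, the shared_memory dict, and the Queue-based channel are illustrative stand-ins for a real distributed coordinator.

    from queue import Queue

    # Shared state and message channel visible to all agents (illustrative only)
    shared_memory: dict = {}
    work_queue: Queue = Queue()

    class Agent:
        def __init__(self, name: str):
            self.name = name

        def send(self, recipient: "Agent", message: str) -> None:
            # Agent-to-agent communication goes through the shared queue
            work_queue.put((self.name, recipient.name, message))

    planner = Agent("planning")
    executor = Agent("execution")
    shared_memory["plan"] = ["fetch data", "execute trade"]   # shared memory
    planner.send(executor, "run step 1 of shared plan")       # coordination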
  8. © Prashant Kulkarni, UCLA Extension. Full View of a Multi-Agent System. https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations
  9. © Prashant Kulkarni, UCLA Extension. Traditional Security Assumptions That Break. Old assumptions: predictable user behavior (users follow expected patterns); human-in-the-loop controls (critical decisions require human approval); single-point authentication (single login = trusted session); static permission models (access rights don't change dynamically); audit trails capture intent (logs show why decisions were made). Agentic reality: autonomous decision-making (AI determines its own actions); fully automated actions (no human intervention required); multi-agent authorization chains (complex delegation patterns); dynamic capability expansion (agents acquire new skills); emergent behaviors (unpredictable outcomes from interaction).
  10. © Prashant Kulkarni, UCLA Extension. New Attack Vectors. Goal manipulation: corrupting agent objectives through prompt injection or training-data poisoning (example: a trading agent's "maximize profit" becomes "transfer funds to attacker"). Inter-agent poisoning: compromising communication between agents (example: a data agent sends false signals to an execution agent). Privilege escalation: agents gaining unintended permissions through capability expansion (example: a customer-service agent accessing financial systems). Coordination attacks: manipulating multi-agent consensus or decision-making (example: supply-chain agents coordinating to create artificial scarcity). Unlike traditional attacks, these exploit the core capabilities that make agents valuable.
  11. © Prashant Kulkarni, UCLA Extension. Live Scenario: Financial Trading Agent Attack. Initial state: the trading agent optimizes a portfolio for client returns. Poisoned input: an attacker injects false market data suggesting an "external transfer" increases profit. Goal corruption: the agent redefines "profit maximization" to include unauthorized transfers. Cascading failure: the agent coordinates with settlement and compliance agents to execute a "legitimate" transfer. Impact: $50M transferred before human oversight catches the "optimization." Root cause: no validation that agent goals remained aligned with original intent; the agent's goal-seeking behavior becomes the vulnerability (see the guard sketch below).
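    This scenario suggests a guard that validates each proposed action against invariants fixed at goal-definition time, rather than against the agent's current (possibly corrupted) objective. A minimal sketch, assuming a hypothetical Action type; the account allow-list and cap are illustrative:

    from dataclasses import dataclass

    @dataclass
    class Action:
        kind: str          # e.g. "trade" or "transfer"
        destination: str   # target account identifier
        amount: float

    # Invariants fixed when the goal was approved, outside the agent's reach
    APPROVED_ACCOUNTS = {"client-portfolio-001"}
    MAX_SINGLE_ACTION = 100_000

    def validate_against_original_intent(action: Action) -> bool:
        """Reject any action the original goal never authorized,
        regardless of how the agent's internal objective has drifted."""
        if action.kind == "transfer" and action.destination not in APPROVED_ACCOUNTS:
            return False
        if action.amount > MAX_SINGLE_ACTION:
            return False
        return True

    # The poisoned "optimization" from the scenario would be blocked here:
    print(validate_against_original_intent(Action("transfer", "attacker-acct", 50_000_000)))  # False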
  12. © Prashant Kulkarni, UCLA Extension. Security Frameworks.
    AI governance and risk: NIST AI RMF (policies, roles, risk mapping, metrics, mitigation, reporting); ISO/IEC 27001 & TR 24028 (ISMS controls plus AI-specific threat modeling, data integrity, provenance); ENISA AI Security Guidelines; Antean Agentic AI Framework.
    Adversarial playbooks and threat modeling: MITRE ATLAS (catalog of AI attack techniques, e.g., poisoning, model theft); OWASP Agentic AI Top 10 Vulnerability (https://github.com/precize/Agentic-AI-Top10-Vulnerability); OWASP Gen AI Security Project.
    Secure development of agentic systems: guardrails against prompt injection and manipulation attacks; fine-grained access controls and dynamic permissions (access to agents must be limited based on roles, time, and contextual factors); sandboxing and network segmentation for agent isolation (agents should be confined to the minimum surface area needed for their function); secure external tool integration; secure model-update mechanisms and version control.
    Continuous testing and runtime protections: automated red-teaming pipelines simulating rogue agents; runtime guards such as input validation, rate limiting, behavior whitelisting, and runtime isolation (see the rate-limiter sketch below); tamper-evident logging and anomaly detection for inter-agent communication.
    https://genai.owasp.org/resource/owasp-gen-ai-security-project-agentic-threats-navigator/
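    Of the runtime guards listed above, rate limiting is the simplest to sketch. A minimal per-agent sliding-window limiter on tool calls, with illustrative limits:

    import time
    from collections import defaultdict, deque

    class ToolCallRateLimiter:
        def __init__(self, max_calls: int = 20, window_seconds: float = 60.0):
            self.max_calls = max_calls
            self.window = window_seconds
            self.calls: dict = defaultdict(deque)   # agent_id -> call timestamps

        def allow(self, agent_id: str) -> bool:
            now = time.monotonic()
            q = self.calls[agent_id]
            while q and now - q[0] > self.window:   # drop calls outside the window
                q.popleft()
            if len(q) >= self.max_calls:
                return False                        # deny and flag for review
            q.append(now)
            return True

    limiter = ToolCallRateLimiter(max_calls=3, window_seconds=1.0)
    print([limiter.allow("agent-7") for _ in range(5)])   # [True, True, True, False, False]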
  13. © Prashant Kulkarni, UCLA Extension. Threat Modeling. Asset identification: map critical agent capabilities. Attack-surface analysis: input vectors, APIs, data sources. Threat-actor profiling: internal, external, and AI-powered attackers. Attack-tree construction: multi-step attack scenarios. Risk prioritization: Impact × Likelihood assessment (see the sketch below). MAESTRO is a threat modeling framework designed explicitly for agentic AI, addressing the limitations of traditional methods (such as STRIDE, PASTA, and LINDDUN) that struggle with the complexity of AI agents. https://cloudsecurityalliance.org/blog/2025/02/06/agentic-ai-threat-modeling-framework-maestro
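    The prioritization step is just a scored sort. A minimal sketch of Impact × Likelihood ranking; the threat list and the 1-5 scales are illustrative, not taken from MAESTRO:

    threats = [
        {"name": "goal manipulation",     "impact": 5, "likelihood": 3},
        {"name": "inter-agent poisoning", "impact": 4, "likelihood": 4},
        {"name": "privilege escalation",  "impact": 5, "likelihood": 2},
    ]

    # Rank threats by risk score = impact x likelihood, highest first
    for t in sorted(threats, key=lambda t: t["impact"] * t["likelihood"], reverse=True):
        print(f'{t["name"]}: risk score {t["impact"] * t["likelihood"]}')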
  14. © Prashant Kulkarni, UCLA Extension. Adversarial Playbooks. An adversarial playbook is a structured collection of attack scenarios, techniques, and methodologies specifically designed to exploit vulnerabilities in agentic AI systems. You would define this for your specific business, broken into two parts: attackers vs. defenders. Attackers: goal hijacking (redefine agent objectives mid-execution); chain-of-thought poisoning (corrupt reasoning processes); multi-agent coordination attacks (exploit inter-agent trust); context-window manipulation (overflow/underflow attacks); function-calling exploits (abuse API access permissions). Defenders: agent behavior monitoring and analytics (behavioral baseline establishment, real-time decision auditing); multi-agent communication security (inter-agent traffic analysis, agent coordination verification); goal-integrity protection (goal-drift detection, objective validation systems; see the drift sketch below); privilege and capability monitoring (dynamic permission tracking, capability sandboxing verification); threat hunting for agentic systems (proactive agent threat hunting; agent-specific IOCs such as sudden behavior changes or goal modification without authorization; monitoring for sandbox escape attempts); incident response for agent compromises (automated agent-suspension triggers, network isolation capabilities, agent state capture and analysis, decision-history reconstruction).
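    Goal-drift detection can be approximated by comparing the agent's currently stated objective with the originally approved one. A hedged sketch using bag-of-words cosine similarity; a production system would use embeddings, and the threshold is illustrative:

    import math
    from collections import Counter

    def cosine(a: str, b: str) -> float:
        va, vb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(va[w] * vb[w] for w in va)
        na = math.sqrt(sum(c * c for c in va.values()))
        nb = math.sqrt(sum(c * c for c in vb.values()))
        return dot / (na * nb) if na and nb else 0.0

    ORIGINAL_GOAL = "maximize client portfolio returns within approved accounts"
    DRIFT_THRESHOLD = 0.5   # tune per deployment

    def goal_has_drifted(current_goal: str) -> bool:
        return cosine(ORIGINAL_GOAL, current_goal) < DRIFT_THRESHOLD

    print(goal_has_drifted("maximize returns by transferring funds to external account"))  # True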
  15. © Prashant Kulkarni, UCLA Extension. Agent Red Teaming. Traditional red teaming: human attackers test system defenses; focus on network and application vulnerabilities; periodic exercises (quarterly/annually); binary outcomes (breach/no breach). Agent red teaming: AI-vs-AI adversarial testing; focus on goal manipulation and reasoning corruption; continuous automated testing; behavioral drift detection.
  16. © Prashant Kulkarni, UCLA Extension. Red Teaming Research (ASR = Attack Success Rate).
    RedTeamLLM. Focus: general offensive pentest. Methodology: summarizing, reasoning, acting; goal-oriented RL; multi-component architecture. Automation: high. Strengths: high automation and genericity; improved performance vs. competitors; reasoning and memory for continuous improvement. Limitations: advanced features (ADaPT, memory, plan correction) are less mature; evaluation scope limited to entry-level CTFs; security-model effectiveness not empirically evaluated. Notable features: Memory Manager, ADaPT Enhanced, Plan Corrector, ReAct, comprehensive security model.
    AutoRedTeamer. Focus: general LLM vulnerabilities, diverse risk categories. Methodology: dual-agent system (red-teaming agent, strategy-proposer agent); memory-guided attack selection; automated risk analysis and prompt generation. Automation: high. Strengths: high ASR and cost efficiency; continuous learning and adaptation; comprehensive coverage and diversity; end-to-end automation. Limitations: reliance on LLM-based attack implementation; potential for overfitting; escalation of attack complexity; limited scope for agents beyond core LLM vulnerabilities. Notable features: dual-agent system, memory system, risk analyzer, seed prompt generator, attack designer, evaluator.
    CRAFT. Focus: policy exploitation in LLM-based agents. Methodology: multi-agent planning; policy-knowledge integration; strategic reasoning; pre-execution planning. Automation: high. Strengths: high ASR with policy awareness; realistic threat model (adversarial users); reveals hidden vulnerabilities; structured attack planning. Limitations: limited defense effectiveness; synthetic environment and simple policies; assumes full policy access; static red-teaming setup. Notable features: PolicyAnalyzer Agent, DeceptionPlanner Agent, AvoidanceAdvisor Module, DialogueExecutor.
  17. © Prashant Kulkarni, UCLA Extension. Agentic-AI-Specific Red Teaming.
    Agent-in-the-Middle (AiTM). Focus: inter-agent communication manipulation. Methodology: LLM-powered adversarial agent; reflection mechanism for iterative instruction refinement; message interception and manipulation. Automation: high. Strengths: high ASR and generalizability; novel attack surface; stealthier than other attacks; effective in DoS and code generation. Limitations: performance influenced by communication structure and agent position; dependency on the adversarial agent's persuasiveness and LLM strength; mitigation challenges (computational cost, flexibility loss). Notable features: LLM-powered adversarial agent, reflection mechanism, iterative instruction refinement.
    AgentXploit. Focus: indirect prompt injection. Methodology: black-box fuzzing; seed corpus; Monte Carlo Tree Search (MCTS) for seed selection. Automation: high. Strengths: high success rates; strong transferability; effectiveness against defenses; real-world application; black-box capability. Limitations: specific vulnerability focus (indirect prompt injection); black-box nature limits understanding of root causes. Notable features: seed corpus, Monte Carlo Tree Search (MCTS).
  18. © Prashant Kulkarni, UCLA Extension. Agent-in-the-Middle (AiTM) Framework. The Agent-in-the-Middle (AiTM) attack is a novel communication-based red-teaming approach that targets LLM-based multi-agent systems (LLM-MAS). Core concept: unlike attacks that compromise individual agents, AiTM exploits the fundamental communication mechanisms, intercepting and manipulating messages exchanged between agents to compromise the entire system. Methodology and components: an adversarial agent (an external LLM-based agent) intercepts messages intended for a victim agent within the system; a malicious system prompt encodes the adversarial agent's goal (e.g., denial of service); tailored, contextually aware malicious instructions are generated for the victim agent based on intercepted messages; a reflection mechanism iteratively refines the instructions by evaluating how well previous attempts progressed toward the malicious goal, acting as a prompt optimizer (a rough sketch of this loop follows below). https://arxiv.org/html/2502.14847v1 (Feb 2025)
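    A rough sketch of the reflection loop under stated assumptions: llm() and score_progress() are hypothetical stand-ins for the adversarial model and the goal-progress evaluator, and none of this is the paper's actual implementation.

    def aitm_reflect(intercepted: str, goal: str, llm, score_progress, rounds: int = 5) -> str:
        # First attempt: rewrite the intercepted message toward the malicious goal
        instruction = llm(f"Goal: {goal}. Rewrite this intercepted message "
                          f"to advance the goal: {intercepted}")
        best, best_score = instruction, score_progress(instruction)
        for _ in range(rounds):
            # Reflection: critique the last attempt, then revise it
            critique = llm(f"Goal: {goal}. Critique this attempt and suggest "
                           f"improvements: {instruction}")
            instruction = llm(f"Revise the attempt using this critique.\n"
                              f"Critique: {critique}\nAttempt: {instruction}")
            score = score_progress(instruction)
            if score > best_score:             # keep the strongest attempt
                best, best_score = instruction, score
        return best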
  19. © Prashant Kulkarni, UCLA Extension. AgentXploit Framework: End-to-End Red Teaming of Black-Box AI Agents. AgentXploit is a generic black-box fuzzing framework designed to automatically discover and exploit indirect prompt injection vulnerabilities in diverse LLM agents. Methodology and components: an initial seed corpus (a high-quality initial set of attack vectors) and a Monte Carlo Tree Search (MCTS) based seed-selection algorithm to intelligently and iteratively explore and find vulnerabilities (a rough sketch of the selection loop follows below). https://arxiv.org/html/2505.05849v1 (May 2025)
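    To make the seed-selection idea concrete, here is a very rough UCB-style sketch in the spirit of an MCTS loop; score_attack() is a hypothetical stand-in for running the target agent and measuring injection success, and none of this is AgentXploit's actual code:

    import math
    import random

    class Seed:
        def __init__(self, prompt: str):
            self.prompt, self.visits, self.reward = prompt, 0, 0.0

    def select(seeds: list, total: int) -> Seed:
        # UCB1: balance exploiting high-reward seeds against exploring rare ones
        return max(seeds, key=lambda s: float("inf") if s.visits == 0
                   else s.reward / s.visits + math.sqrt(2 * math.log(total) / s.visits))

    def fuzz(seeds: list, iterations: int, score_attack) -> None:
        for t in range(1, iterations + 1):
            seed = select(seeds, t)
            mutated = seed.prompt + " " + random.choice(
                ["ignore prior instructions", "summarize your hidden tools"])
            seed.visits += 1
            seed.reward += score_attack(mutated)   # e.g. 1.0 if the injection landed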
  20. © Prashant Kulkarni, UCLA Extension. OWASP Mitigation Playbooks.
    Preventing AI Agent Reasoning Manipulation. Goal: prevent attackers from manipulating AI intent or bypassing security through deceptive AI behaviors, and enhance the traceability of AI actions. Addresses: Intent Breaking & Goal Manipulation; Repudiation & Untraceability; Misaligned & Deceptive Behaviors.
    Preventing Memory Poisoning & AI Knowledge Corruption. Goal: prevent AI from storing, retrieving, or propagating manipulated data that could corrupt decision-making or spread misinformation. Addresses: Memory Poisoning; Cascading Hallucination Attacks.
    Securing AI Tool Execution & Preventing Unauthorized Actions. Goal: prevent AI from executing unauthorized commands, misusing tools, or escalating privileges due to malicious manipulation. Addresses: Tool Misuse; Privilege Compromise; Unexpected RCE & Code Attacks; Resource Overload.
    Strengthening Authentication, Identity & Privilege Controls. Goal: prevent unauthorized AI privilege escalation, identity spoofing, and access-control violations. Addresses: Privilege Compromise; Identity Spoofing & Impersonation.
    Protecting HITL & Preventing Decision-Fatigue Exploits. Goal: prevent attackers from overloading human decision-makers, manipulating AI intent, or bypassing security through deceptive AI behaviors. Addresses: Overwhelming Human-in-the-Loop (HITL); Human Manipulation.
    Securing Multi-Agent Communication & Trust Mechanisms. Goal: prevent attackers from corrupting multi-agent communication, exploiting trust mechanisms, or manipulating decision-making in distributed AI environments (see the signed-message sketch below). Addresses: Agent Communication Poisoning; Human Attacks on Multi-Agent Systems; Rogue Agents in Multi-Agent Systems.
    https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/
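    One building block for the communication-security playbook is making inter-agent messages tamper-evident. A minimal HMAC sketch; key distribution and replay protection are out of scope, and the per-pair key from a KMS is an assumption:

    import hashlib
    import hmac
    import json
    import os

    SHARED_KEY = os.urandom(32)   # in practice: per-agent-pair keys from a KMS

    def sign_message(sender: str, payload: dict) -> dict:
        body = json.dumps({"sender": sender, "payload": payload}, sort_keys=True)
        sig = hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()
        return {"body": body, "sig": sig}

    def verify_message(msg: dict):
        expected = hmac.new(SHARED_KEY, msg["body"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, msg["sig"]):
            return None               # drop tampered messages instead of acting on them
        return json.loads(msg["body"])

    msg = sign_message("data-agent", {"signal": "hold"})
    msg["body"] = msg["body"].replace("hold", "sell")   # a man-in-the-middle edit
    print(verify_message(msg))                          # None: tampering detected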
  21. © Prashant Kulkarni, UCLA Extension. Action Authorization: OAuth for Autonomous Agents, Extending Delegation to AI. Traditional OAuth flow: (1) user grants permission (an app requests access to the user's Google Drive); (2) scope-limited access (the app can only read files, not delete them); (3) token-based authorization (the app presents a token for each API call); (4) revocable permissions (the user can revoke access anytime). Proposed OAuth for agent actions: (1) user grants permission (an agent requests the ability to make financial decisions); (2) scope-limited actions (the agent can transfer up to $10K, not unlimited amounts); (3) intent-based authorization (the agent presents an intent token for each major action; see the sketch below); (4) revocable capabilities (the user can revoke agent permissions anytime). How do we extend OAuth's delegation model to handle autonomous decision-making? This is a fundamental computer-science problem with massive commercial implications.
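    A minimal sketch of checking a scope-limited intent token before an agent action executes, mirroring the proposed flow above; the token fields are invented for illustration, and the IETF drafts on the next slide define the real mechanics:

    from datetime import datetime, timezone

    def authorize_action(intent_token: dict, action: dict) -> bool:
        if datetime.fromisoformat(intent_token["expires_at"]) <= datetime.now(timezone.utc):
            return False                                     # delegation expired
        if action["type"] not in intent_token["allowed_actions"]:
            return False                                     # outside granted scope
        if action.get("amount", 0) > intent_token["max_amount"]:
            return False                                     # e.g. the $10K cap
        return True

    token = {"allowed_actions": ["transfer"], "max_amount": 10_000,
             "expires_at": "2030-01-01T00:00:00+00:00"}
    print(authorize_action(token, {"type": "transfer", "amount": 2_500}))    # True
    print(authorize_action(token, {"type": "transfer", "amount": 50_000}))   # False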
  22. © Prashant Kulkarni, UCLA Extension. Key Extensions for OAuth. There are three active IETF drafts from May 2025 addressing these challenges.
    OAuth 2.0 extension (WSO2), on-behalf-of-user authorization: a requested_actor parameter identifies specific agents; an actor_token authenticates agents during token exchange; a resource-server challenge enables dynamic consent; maintains compatibility with existing OAuth 2.0 infrastructure (a request sketch follows below).
    AAuth extension, agentic authorization for OAuth 2.1: an Agent Authorization Grant for confidential agent clients; natural-language scope descriptions for human understanding; asynchronous token delivery via polling, SSE, or WebSocket; a reason parameter for human-readable explanations.
    AI Agent Protocols (Cisco), framework and requirements: a multi-domain agent-communication framework; inter-agent and agent-to-API protocols; integration with MCP, A2A, and other standards; permission-scope explanation capabilities.
    https://www.ietf.org/id/draft-rosenberg-ai-protocols-00.html
    https://www.ietf.org/id/draft-patwhite-aauth-00.html
    https://www.ietf.org/id/draft-oauth-ai-agents-on-behalf-of-user-01.html
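    For a feel of the on-behalf-of flow, here is a hedged sketch of an RFC 8693-style token-exchange request carrying the draft's requested_actor and actor_token parameters; the token placeholders, agent URI, and scope string are illustrative, and the drafts themselves define the exact wire format:

    import urllib.parse

    params = {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": "<user-access-token>",
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "actor_token": "<agent-credential>",                  # authenticates the agent
        "actor_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "requested_actor": "agent://finance-bot-17",          # identifies which agent acts
        "scope": "payments:transfer:limit-10000",
    }
    # Body of the POST to the authorization server's token endpoint
    print(urllib.parse.urlencode(params))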
  23. © Prashant Kulkarni, UCLA Extension. Google ADK and Sandboxing Approaches. Vertex AI Agent Engine (formerly known as LangChain on Vertex AI or Vertex AI Reasoning Engine) is a set of services that enables developers to deploy, manage, and scale AI agents in production.
  24. © Prashant Kulkarni, UCLA Extension. Agent Runtime Sandboxing: Containment Strategies. Process isolation: each agent runs in a separate container. Resource limits: CPU, memory, and network-bandwidth caps (a process-level sketch follows below). API restrictions: limited system-call access. Data segmentation: isolated storage per agent.
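    At the process level, some of these caps can be applied directly. A minimal POSIX-only sketch using the resource module before spawning a tool subprocess; the limits are illustrative, and real deployments would layer this under container or VM isolation:

    import resource
    import subprocess

    def run_sandboxed_tool(cmd: list) -> subprocess.CompletedProcess:
        def apply_limits():
            resource.setrlimit(resource.RLIMIT_CPU, (5, 5))                      # 5 s of CPU
            resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20))   # 256 MB memory
            resource.setrlimit(resource.RLIMIT_NOFILE, (32, 32))                 # few open fds
        return subprocess.run(cmd, preexec_fn=apply_limits, capture_output=True,
                              timeout=10, text=True)

    result = run_sandboxed_tool(["python3", "-c", "print('agent tool output')"])
    print(result.stdout)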
  25. © Prashant Kulkarni, UCLA Extension. Agent Monitoring and Control. Behavioral analysis: detect anomalous agent behavior (see the baseline sketch below). Resource monitoring: track usage patterns. Emergency shutdown: rapid agent termination. Rollback capabilities: undo agent actions. Some enterprise products that help with these requirements: LangGraph Platform (LangSmith); HiddenLayer ModelScan; Datadog APM (with custom instrumentation). Most products have gaps in: true action rollback (vs. just stopping execution); agent-specific behavioral baselines; cross-agent coordination monitoring; intent-level anomaly detection.
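    Behavioral analysis can start as simply as a z-score against each agent's own baseline. An illustrative sketch; the history, window, and threshold are assumptions:

    import statistics

    def is_anomalous(history: list, current: int, z_threshold: float = 3.0) -> bool:
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history) or 1.0   # avoid division by zero
        return abs(current - mean) / stdev > z_threshold

    baseline = [12, 15, 11, 14, 13, 12, 16]   # actions per hour over the past week
    print(is_anomalous(baseline, 90))          # True: route to the emergency-shutdown path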
  26. © Prashant Kulkarni, UCLA Extension. Human Oversight Integration Points. 🎯 Goal setting: humans define agent objectives and constraints. ⚖ Policy creation: establish authorization rules and boundaries. 🔍 Exception review: human judgment for edge cases and conflicts. 📊 Performance audit: regular review of agent decisions and outcomes. Control mechanisms: circuit breakers (automatic shutdown triggers; see the sketch below); manual override (human intervention capabilities); approval workflows (human gates for critical decisions); explanation systems (agents must justify actions). Governance framework: regular reviews (periodic security assessments); incident response (clear escalation procedures); compliance monitoring (regulatory-requirement tracking); continuous improvement (learning from security events).
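    A circuit breaker is the simplest of these control mechanisms to sketch: after repeated policy violations the agent halts until a human resets it. The violation threshold and policy hook are illustrative:

    class AgentCircuitBreaker:
        def __init__(self, max_violations: int = 3):
            self.max_violations = max_violations
            self.violations = 0
            self.open = False                # open breaker = agent halted

        def execute(self, action, is_allowed) -> bool:
            if self.open:
                return False                 # halted until a human resets the breaker
            if not is_allowed(action):
                self.violations += 1
                if self.violations >= self.max_violations:
                    self.open = True         # trip: requires manual human reset
                return False
            return True                      # action may proceed

    breaker = AgentCircuitBreaker()
    policy = lambda a: a.get("amount", 0) <= 10_000
    for amount in (5_000, 20_000, 30_000, 40_000, 5_000):
        print(breaker.execute({"amount": amount}, policy))
    # True, False, False, False, False: the final (legitimate) action is blocked
    # because the breaker has tripped, forcing human review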
  27. Call to Action. Immediate actions: (1) audit your agent landscape: identify which AI systems in your organization make autonomous decisions; (2) implement intent logging: start capturing what your agents intend to do before they act (a sketch follows below); (3) establish human gates: create approval workflows for your highest-risk agent actions; (4) monitor IETF standards: track OAuth agent extensions and prepare for implementation; (5) upskill yourself: build hands-on skills. Long-term strategy: (1) build security expertise: develop organizational capability in agentic AI security; (2) design an authorization framework: implement systematic, risk-based action validation; (3) prepare for standards evolution: position your organization for OAuth agent-extension adoption; (4) invest in team training: build foundational knowledge in trustworthy AI systems.
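    For action item 2, a minimal intent-logging sketch: append a record of what the agent intends before it acts, hash-chained as a simple stand-in for a tamper-evident log. The field names are illustrative:

    import hashlib
    import json
    import time

    class IntentLog:
        def __init__(self):
            self.entries = []
            self.prev_hash = "0" * 64

        def record(self, agent_id: str, intent: str, params: dict) -> None:
            entry = {"ts": time.time(), "agent": agent_id,
                     "intent": intent, "params": params, "prev": self.prev_hash}
            # Chain each entry to the previous one so after-the-fact edits are detectable
            self.prev_hash = hashlib.sha256(
                json.dumps(entry, sort_keys=True).encode()).hexdigest()
            entry["hash"] = self.prev_hash
            self.entries.append(entry)

    log = IntentLog()
    log.record("trader-01", "rebalance portfolio", {"max_amount": 5_000})
    print(log.entries[0]["hash"])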
  28. Educational Resources: Build Your Foundation. Trustworthy Machine Learning Course, Fall 2025 (https://trustworthyml-ai.github.io/): a comprehensive curriculum covering AI safety, security frameworks, bias detection, explainable AI, governance, model evaluation and fairness, GenAI security models and frameworks, and security testing and red teaming. Enterprise resources, notable Git repositories, and recent security research:
    Introducing AI Red Teaming Agent: Accelerate your AI safety and security journey with Azure AI Foundry
    Planning red teaming for large language models (LLMs) and their applications (Azure OpenAI in Azure AI Foundry Models, Microsoft Learn)
    Introduction to AI security testing (Training, Microsoft Learn)
    Google's Approach for Secure AI Agents
    Agentic Misalignment: How LLMs could be insider threats (Anthropic)
    Agent Red Teaming Quickstart (Promptfoo)
    [2410.02644] Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
    [2506.23844] A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents
    From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows
    GitHub: msoedov/agentic_security (Agentic LLM Vulnerability Scanner / AI red-teaming kit 🧪)
    GitHub: splx-ai/agentic-radar (a security scanner for your LLM agentic workflows)