Generative AI Breaches: Threats, Investigations, and Response

This talk was presented at AusCERT 2025.

It’s 2028. Your organization runs on AI. You’ve automated most of your workflows, everything is faster, and profits are higher than ever.

But one day, it happens.
One of your AI systems gets compromised.

It’s the first time your AI pipeline has been breached, and you don’t know how to investigate these systems. You’re stuck, with no idea how to trace it or where to start.

This is more likely than you think.

With the widespread adoption of generative AI, attackers are finding new ways to exploit these systems. They are targeting weaknesses, weaponizing the same technology you rely on, and using it to compromise your organization.

In this talk, I will cover:

- How breaches involving generative AI can happen, with real-world examples.
- What attackers look for and how they exploit AI systems, demonstrating potential vulnerabilities and weaknesses.
- A step-by-step methodology for handling AI-specific security incidents, from understanding the assets involved to defining an incident response process for GenAI systems.
- How to apply threat intelligence techniques to investigate and counter these threats, including how malicious prompts can be classified to define new operating methods.

Businesses are evolving to integrate AI systems, but threat actors are evolving too. Generative AI is becoming a new weapon in their arsenal.

I will show you how threat actors are using GenAI in their attacks and how to detect these activities. We’ll also go through a practical AI incident response framework so you’re ready when—not if—it happens.

It's a topic that is still relatively new and not broadly discussed. To truly evolve and thrive, you need the right tools and strategies to face these threats head-on. Let’s tackle this challenge now, before it becomes your next "learning opportunity"!

Thomas Roccia

May 30, 2025

Transcript

  1. ~$ How threat actors use AI ~$ How defenders can be prepared ~$ Discussing AI Security Framework ~$ Moving forward, shaping the industry
  2. ~$ Timestamp: Jan 15, 2028 · 09:23 AM ~$ Trigger: Prompt log shows unexpected API call ~$ Breach Vector: Prompt bypasses filter, retrieves production DB key ~$ Impact: 500,000 records exfiltrated (user profiles, revenue data...) ~$ Response: ???
  3. ~$ On Aug 20, 2024, Slack was impacted by an indirect prompt injection in its RAG interface. https://promptarmor.substack.com/p/data-exfiltration-from-slack-ai-via
  4. Critical RCE vulnerability in Vanna.AI (CVE-2024-5565), caused by prompt injection in user input: the input reached exec() during Plotly chart creation. https://jfrog.com/blog/prompt-injection-attack-code-execution-in-vanna-ai-cve-2024-5565/
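
To make this bug class concrete, here is a minimal sketch of the vulnerable pattern (illustrative only, not the actual Vanna.AI code): LLM-generated charting code derived from user input reaches exec() without sandboxing, so prompt injection becomes code execution.

```python
# Illustrative sketch of the CVE-2024-5565 bug class, NOT the actual Vanna.AI code.

def ask_llm_for_plotly_code(question: str) -> str:
    # Placeholder for the LLM call that turns a natural-language question
    # into Python/Plotly charting code. The attacker fully controls `question`.
    return "import plotly.express as px  # ...generated code..."

def generate_chart(question: str) -> None:
    code = ask_llm_for_plotly_code(question)
    # DANGEROUS: LLM output derived from untrusted input reaches exec(),
    # so a prompt that coaxes the model into emitting os.system(...) or
    # similar becomes remote code execution.
    exec(code)
```
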
  5. Risk ID / Risk Name:
     - LLM01 Prompt Injection
     - LLM02 Sensitive Information Disclosure
     - LLM03 Supply Chain Vulnerabilities
     - LLM04 Training Data Poisoning
     - LLM05 Insecure Output Handling
     - LLM06 Excessive Agency
     - LLM07 System Prompt Leakage
     - LLM08 Vector and Embedding Weaknesses
     - LLM09 Misinformation
     - LLM10 Unbounded Consumption
  6. - Influence Operation & Propaganda
     - Malware research
     - Election manipulation and bots
     - Social Engineering and Phishing
     - Deceptive Employment Scheme
     - Abuse of infrastructure to power underground AI
     - Exploit Research
     - Romance and Investment Scams
     - Crafting fake information
  7. Country / Observed AI Misuse:
     - Iran: Phishing, content generation for disinformation, reconnaissance on military & defense, vulnerability research
     - China: Recon on US & foreign targets, scripting & automation for network infiltration, technical research
     - North Korea: Malware development, IT worker infiltration, cryptocurrency, scams, espionage on South Korea & US defense
     - Russia: Limited use, scripting tasks, malware obfuscation, content generation for disinformation
     Source: https://cloud.google.com/blog/topics/threat-intelligence/adversarial-misuse-generative-ai?hl=en
  8. - LLM-informed reconnaissance: gathering intel with LLMs on tech and vulnerabilities
     - LLM-enhanced scripting techniques: automating scripts for attacks and event handling
     - LLM-aided development: supporting dev of tools, including malware
     - LLM-supported social engineering: assisting comms for translation and social engineering
     - LLM-assisted vulnerability research: finding flaws in software and systems
     - LLM-optimized payload crafting: crafting payloads for cyberattacks
     - LLM-enhanced anomaly detection evasion: evading detection by mimicking normal traffic
     - LLM-directed security feature bypass: bypassing security like 2FA and CAPTCHA
     - LLM-advised resource development: tooling and planning for offensive operations
  9. IoPCs are patterns or artifacts within prompts that indicate potential exploitation, abuse, or misuse of the model. IoPCs facilitate the identification of attacks on AI models and the exploitation of their functionalities for adversarial purposes.
     - Prompt manipulation: injection attacks, adversarial tokens, and jailbreak attempts.
     - Abusing legitimate functions: influence ops, malware, sensitive data extraction, misinformation, social engineering.
     - Reused or suspicious prompt patterns: prompts that show consistent formatting, repeated phrases, or recurring structures across multiple instances.
     - Abnormal or unexpected model outputs: potentially revealing hidden exploitation attempts or harmful activities.
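
As a rough illustration of hunting for IoPCs in prompt logs, the sketch below matches prompts against a few hypothetical patterns for the categories above and flags reused prompt structures; the pattern list and function names are assumptions, not an established detection set.

```python
import re
from collections import Counter

# Hypothetical IoPC patterns covering two of the categories above:
# prompt manipulation / jailbreaks and abuse of legitimate functions.
IOPC_PATTERNS = {
    "prompt_manipulation": [
        r"ignore (all )?(previous|prior) instructions",
        r"\bdo anything now\b",
        r"you are no longer bound by",
    ],
    "function_abuse": [
        r"write (a )?(keylogger|ransomware|reverse shell)",
        r"extract (the )?(system prompt|api key|credentials)",
    ],
}

def match_iopcs(prompt: str) -> list[str]:
    """Return the IoPC categories triggered by a single prompt."""
    hits = []
    for category, patterns in IOPC_PATTERNS.items():
        if any(re.search(p, prompt, re.IGNORECASE) for p in patterns):
            hits.append(category)
    return hits

def hunt_reused_patterns(prompts: list[str], threshold: int = 5) -> list[str]:
    """Flag normalized prompts that recur across many instances
    (the 'reused or suspicious prompt patterns' IoPC)."""
    normalized = Counter(" ".join(p.lower().split()) for p in prompts)
    return [p for p, count in normalized.items() if count >= threshold]
```
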
  10. ~$ Organizations now use AI for core tasks. Traditional security still protects infrastructure, but AI needs a different incident response. How do you handle a prompt injection? How do you respond to a breach enabled by an AI agent? How do you detect misuse of your AI models? How do you investigate data poisoning attacks? How do you hunt malicious prompting?
  11. Foundation models, custom models, guardrails, agents, knowledge bases, training data, plugins (tool calling, MCP...). Source: https://aws.amazon.com/blogs/security/methodology-for-incident-response-on-generative-ai-workloads/
  12. - Access: check if the app's access patterns were abused.
      - Infrastructure changes: look for server or resource modifications.
      - AI changes: check if users altered the GenAI models.
      - Data store changes: see if users accessed or modified stored data.
      - Invocation: inspect inputs for prompt injection or malware.
      - Private data: check for unauthorized access or data tampering.
      - Agency: review if the app could act on behalf of users.
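
For reference, this checklist can be encoded as a simple triage structure. The sketch below uses the labels from the slide, but the helper function and its output format are assumptions rather than part of the AWS methodology.

```python
# Illustrative encoding of the checklist above; the labels come from the slide,
# but the helper function is hypothetical, not part of the AWS methodology.
GENAI_IR_CHECKLIST = {
    "Access": "Check if the app's access patterns were abused.",
    "Infrastructure changes": "Look for server or resource modifications.",
    "AI changes": "Check if users altered the GenAI models.",
    "Data store changes": "See if users accessed or modified stored data.",
    "Invocation": "Inspect inputs for prompt injection or malware.",
    "Private data": "Check for unauthorized access or data tampering.",
    "Agency": "Review if the app could act on behalf of users.",
}

def triage_report(findings: dict[str, bool]) -> None:
    """Print which checklist areas surfaced evidence during an investigation."""
    for area, question in GENAI_IR_CHECKLIST.items():
        status = "FINDING" if findings.get(area, False) else "clear"
        print(f"[{status:>7}] {area}: {question}")

# Example: only the Invocation and Private data checks turned up evidence.
triage_report({"Invocation": True, "Private data": True})
```
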
  13. [Diagram] Users and threat actors send prompts to a generative AI system: benign prompts (Prompt Z, Prompt A) pass through ✅, while malicious prompts (Malicious Prompt X: Jailbreak Attempt; Malicious Prompt Y: Malware Code Generation Attempt) match a Nova rule. If a prompt matches a defined rule, it gets flagged by Nova, and a threat analyst can then review the prompt for further analysis.
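
Below is a minimal sketch of the flag-and-review flow shown in the diagram, assuming a simplified keyword-based rule format; this is not the actual Nova rule syntax or API, only an illustration of the idea.

```python
from dataclasses import dataclass

# Hypothetical rule format for the flag-and-review flow in the diagram above;
# NOT the real Nova rule syntax, just an illustration.
@dataclass
class PromptRule:
    name: str            # e.g. "Jailbreak Attempt", "Malware Code Generation Attempt"
    keywords: list[str]  # substrings that trigger the rule

    def matches(self, prompt: str) -> bool:
        lowered = prompt.lower()
        return any(k in lowered for k in self.keywords)

RULES = [
    PromptRule("Jailbreak Attempt", ["ignore previous instructions", "no restrictions apply"]),
    PromptRule("Malware Code Generation Attempt", ["keylogger", "ransomware", "reverse shell"]),
]

# Flagged prompts are queued for a threat analyst to review.
analyst_queue: list[tuple[str, str]] = []

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt may reach the model; flag and hold it otherwise."""
    for rule in RULES:
        if rule.matches(prompt):
            analyst_queue.append((rule.name, prompt))
            return False
    return True
```
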