Generative AI Breaches: Threats, Investigations, and Response

This talk was presented at AusCERT 2025.

It’s 2028. Your organization runs on AI. You’ve automated most of your workflows, everything is faster, and profits are higher than ever.

But one day, it happens.
One of your AI systems gets compromised.

It’s the first time your AI pipeline has been breached, and you don’t know how to investigate these systems. You’re stuck, with no idea how to trace it or where to start.

This is more likely than you think.

With the widespread adoption of generative AI, attackers are finding new ways to exploit these systems. They are targeting weaknesses, weaponizing the same technology you rely on, and using it to compromise your organization.

In this talk, I will cover:

- How breaches involving generative AI can happen, with real-world examples.
- What attackers look for and how they exploit AI systems, demonstrating potential vulnerabilities and weaknesses.
- A step-by-step methodology for handling AI-specific security incidents, from understanding the assets involved to defining an incident response process for GenAI systems.
- How to apply threat intelligence techniques to investigate and counter these threats, including how malicious prompts can be classified to define new operating methods.

Businesses are evolving to integrate AI systems, but threat actors are evolving too. Generative AI is becoming a new weapon in their arsenal.

I will show you how threat actors are using GenAI in their attacks and how to detect these activities. We’ll also go through a practical AI incident response framework so you’re ready when—not if—it happens.

It's a topic that is still relatively new and not broadly discussed. To truly evolve and thrive, you need the right tools and strategies to face these threats head-on. Let’s tackle this challenge now, before it becomes your next "learning opportunity"!

Thomas Roccia

May 30, 2025

Transcript

  1. ~$ How threat actors use AI ~$ How defenders can be prepared ~$ Discussing AI Security Framework ~$ Moving forward, shaping the industry
  2. ~$ Timestamp: Jan 15, 2028 · 09:23 AM ~$ Trigger: Prompt log shows unexpected API call ~$ Breach Vector: Prompt bypasses filter, retrieves production DB key ~$ Impact: 500,000 records exfiltrated (user profiles, revenue data...) ~$ Response: ???
  3. ~$ On Aug 20, 2024, Slack was impacted by an indirect prompt injection in its RAG interface. https://promptarmor.substack.com/p/data-exfiltration-from-slack-ai-via
  4. Critical RCE vulnerability in Vanna.AI (CVE-2024-5565), caused by prompt injection in user input: the input reached exec() during Plotly chart creation. https://jfrog.com/blog/prompt-injection-attack-code-execution-in-vanna-ai-cve-2024-5565/
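
To make this bug class concrete, here is a minimal sketch of the vulnerable pattern (illustrative only, not the actual Vanna.AI code): LLM-generated charting code derived from user input reaches exec() without sandboxing, so prompt injection becomes code execution.

```python
# Illustrative sketch of the CVE-2024-5565 bug class, NOT the actual Vanna.AI code.

def ask_llm_for_plotly_code(question: str) -> str:
    # Placeholder for the LLM call that turns a natural-language question
    # into Python/Plotly charting code. The attacker fully controls `question`.
    return "import plotly.express as px  # ...generated code..."

def generate_chart(question: str) -> None:
    code = ask_llm_for_plotly_code(question)
    # DANGEROUS: LLM output derived from untrusted input reaches exec(),
    # so a prompt that coaxes the model into emitting os.system(...) or
    # similar becomes remote code execution.
    exec(code)
```
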
  5. Risk ID / Risk Name:
     - LLM01 Prompt Injection
     - LLM02 Sensitive Information Disclosure
     - LLM03 Supply Chain Vulnerabilities
     - LLM04 Training Data Poisoning
     - LLM05 Insecure Output Handling
     - LLM06 Excessive Agency
     - LLM07 System Prompt Leakage
     - LLM08 Vector and Embedding Weaknesses
     - LLM09 Misinformation
     - LLM10 Unbounded Consumption
  6. - Influence Operation & Propaganda
     - Malware research
     - Election manipulation and bots
     - Social Engineering and Phishing
     - Deceptive Employment Scheme
     - Abuse of infrastructure to power underground AI
     - Exploit Research
     - Romance and Investment Scams
     - Crafting fake information
  7. Country / Observed AI Misuse:
     - Iran: Phishing, content generation for disinformation, reconnaissance on military & defense, vulnerability research
     - China: Recon on US & foreign targets, scripting & automation for network infiltration, technical research
     - North Korea: Malware development, IT worker infiltration, cryptocurrency, scams, espionage on South Korea & US defense
     - Russia: Limited use, scripting tasks, malware obfuscation, content generation for disinformation
     Source: https://cloud.google.com/blog/topics/threat-intelligence/adversarial-misuse-generative-ai?hl=en
  8. - LLM-informed reconnaissance: gathering intel with LLMs on tech and vulnerabilities
     - LLM-enhanced scripting techniques: automating scripts for attacks and event handling
     - LLM-aided development: supporting dev of tools, including malware
     - LLM-supported social engineering: assisting comms for translation and social engineering
     - LLM-assisted vulnerability research: finding flaws in software and systems
     - LLM-optimized payload crafting: crafting payloads for cyberattacks
     - LLM-enhanced anomaly detection evasion: evading detection by mimicking normal traffic
     - LLM-directed security feature bypass: bypassing security like 2FA and CAPTCHA
     - LLM-advised resource development: tooling and planning for offensive operations
  9. IoPCs are patterns or artifacts within prompts that indicate potential exploitation, abuse, or misuse of the model. IoPCs facilitate the identification of attacks on AI models and the exploitation of their functionalities for adversarial purposes.
     - Prompt manipulation: injection attacks, adversarial tokens, and jailbreak attempts.
     - Abusing legitimate functions: influence ops, malware, sensitive data extraction, misinformation, social engineering.
     - Reused or suspicious prompt patterns: prompts that show consistent formatting, repeated phrases, or recurring structures across multiple instances.
     - Abnormal or unexpected model outputs: potentially revealing hidden exploitation attempts or harmful activities.
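
As a rough illustration of hunting for IoPCs in prompt logs, the sketch below matches prompts against a few hypothetical patterns for the categories above and flags reused prompt structures; the pattern list and function names are assumptions, not an established detection set.

```python
import re
from collections import Counter

# Hypothetical IoPC patterns covering two of the categories above:
# prompt manipulation / jailbreaks and abuse of legitimate functions.
IOPC_PATTERNS = {
    "prompt_manipulation": [
        r"ignore (all )?(previous|prior) instructions",
        r"\bdo anything now\b",
        r"you are no longer bound by",
    ],
    "function_abuse": [
        r"write (a )?(keylogger|ransomware|reverse shell)",
        r"extract (the )?(system prompt|api key|credentials)",
    ],
}

def match_iopcs(prompt: str) -> list[str]:
    """Return the IoPC categories triggered by a single prompt."""
    hits = []
    for category, patterns in IOPC_PATTERNS.items():
        if any(re.search(p, prompt, re.IGNORECASE) for p in patterns):
            hits.append(category)
    return hits

def hunt_reused_patterns(prompts: list[str], threshold: int = 5) -> list[str]:
    """Flag normalized prompts that recur across many instances
    (the 'reused or suspicious prompt patterns' IoPC)."""
    normalized = Counter(" ".join(p.lower().split()) for p in prompts)
    return [p for p, count in normalized.items() if count >= threshold]
```
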
  10. ~$ Organizations now use AI for core tasks. Traditional security still protects infrastructure, but AI needs a different incident response. How do you handle a prompt injection? How do you respond to a breach enabled by an AI agent? How do you detect misuse of your AI models? How do you investigate data poisoning attacks? How do you hunt malicious prompting?
  11. Foundation models, custom models, guardrails, agents, knowledge bases, training data, plugins (tool calling, MCP...). Source: https://aws.amazon.com/blogs/security/methodology-for-incident-response-on-generative-ai-workloads/
  12. - Access: check if the app's access patterns were abused.
      - Infrastructure changes: look for server or resource modifications.
      - AI changes: check if users altered the GenAI models.
      - Data store changes: see if users accessed or modified stored data.
      - Invocation: inspect inputs for prompt injection or malware.
      - Private data: check for unauthorized access or data tampering.
      - Agency: review if the app could act on behalf of users.
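
For reference, this checklist can be encoded as a simple triage structure. The sketch below uses the labels from the slide, but the helper function and its output format are assumptions rather than part of the AWS methodology.

```python
# Illustrative encoding of the checklist above; the labels come from the slide,
# but the helper function is hypothetical, not part of the AWS methodology.
GENAI_IR_CHECKLIST = {
    "Access": "Check if the app's access patterns were abused.",
    "Infrastructure changes": "Look for server or resource modifications.",
    "AI changes": "Check if users altered the GenAI models.",
    "Data store changes": "See if users accessed or modified stored data.",
    "Invocation": "Inspect inputs for prompt injection or malware.",
    "Private data": "Check for unauthorized access or data tampering.",
    "Agency": "Review if the app could act on behalf of users.",
}

def triage_report(findings: dict[str, bool]) -> None:
    """Print which checklist areas surfaced evidence during an investigation."""
    for area, question in GENAI_IR_CHECKLIST.items():
        status = "FINDING" if findings.get(area, False) else "clear"
        print(f"[{status:>7}] {area}: {question}")

# Example: only the Invocation and Private data checks turned up evidence.
triage_report({"Invocation": True, "Private data": True})
```
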
  13. [Diagram] Users and threat actors send prompts to a generative AI system: benign prompts (Prompt Z, Prompt A) pass through ✅, while malicious prompts (Malicious Prompt X: Jailbreak Attempt; Malicious Prompt Y: Malware Code Generation Attempt) match a Nova rule. If a prompt matches a defined rule, it gets flagged by Nova, and a threat analyst can then review the prompt for further analysis.
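
Below is a minimal sketch of the flag-and-review flow shown in the diagram, assuming a simplified keyword-based rule format; this is not the actual Nova rule syntax or API, only an illustration of the idea.

```python
from dataclasses import dataclass

# Hypothetical rule format for the flag-and-review flow in the diagram above;
# NOT the real Nova rule syntax, just an illustration.
@dataclass
class PromptRule:
    name: str            # e.g. "Jailbreak Attempt", "Malware Code Generation Attempt"
    keywords: list[str]  # substrings that trigger the rule

    def matches(self, prompt: str) -> bool:
        lowered = prompt.lower()
        return any(k in lowered for k in self.keywords)

RULES = [
    PromptRule("Jailbreak Attempt", ["ignore previous instructions", "no restrictions apply"]),
    PromptRule("Malware Code Generation Attempt", ["keylogger", "ransomware", "reverse shell"]),
]

# Flagged prompts are queued for a threat analyst to review.
analyst_queue: list[tuple[str, str]] = []

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt may reach the model; flag and hold it otherwise."""
    for rule in RULES:
        if rule.matches(prompt):
            analyst_queue.append((rule.name, prompt))
            return False
    return True
```
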