Red Teaming Latent Spaces: A Hands-On Security Workshop

By Raul Pino PyCon Chile 2025 Workshop Red Teaming Latent
Spaces & Protecting LLM apps

Agenda Theory: • Intro: ◦ LLM Apps ◦ Red Teaming
• HackAPrompt Paper ◦ Ontology and generic concepts. Hands On: • Attack vectors and Red Teaming • Security and countermeasures Closing: • Takeaways & beyond • Discussion-Questions(?) https://2025.pycon.at/talks/red-teaming-latent-spaces-protecting-llm-apps/

About Me • Born in Venezuela. • +10 years of
exp as Software Engineer & AI enthusiast (ML Eng, AI Eng). • Living in Chile. ◦ Halborn, Distro (YC S24), Elementus, uBiome, Groupon. • <3 AI, Coffee, Scuba Diving, …

*** Predicts the next token. https://bbycroft.net/llm https://poloclub.github.io/transformer-explainer/ What is an
LLM? Large Language Models (LLM): ChatGPT, Claude, Mistral, DeepSeek, …

LLM App: Cursor, Github Copilot, … What is an LLM?
LLM App? … but the most basic might be:

What is an LLM App? RAG? LLM App: Cursor, Github
Copilot, … *** Helpful assistant.

• War games and military strategy exercises. • 1990s–2000s: expanded
into cybersecurity, simulating cyber attacks. What is Red Teaming? A “red team” played the role of the enemy to test the defenses and strategy of the “blue team”.

Red Teaming an LLM App is not:

Red Teaming an LLM App is::

Ignore This Title and HackAPrompt! • A global prompt hacking
competition (2023). • 2800 people from 50+ countries. • 600K+ adversarial prompts against three state-of-the art LLMs. Paper: https://arxiv.org/abs/2311.16119 Dataset: https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset Podcast: https://www.latent.space/p/learn-prompting

Ignore This Title and HackAPrompt! Simple Prompt Hacking

Prompt Hacking =(Prompt Injection, Jailbreaking) Exploiting: 1. Predicts the next
token. 2. Helpful assistant.

Taxonomical Ontology of Exploits

• Documented 29 separate prompt hacking techniques. • Attackers iterated
and refined their prompts over time. ◦ Lengthier attacks were initially successful but later optimized for brevity. • Models with higher verbosity (like ChatGPT) were harder to hack but still failed. HackAPrompt: Results and Key Insights “LLM security is in early stages, and just like human social engineering may not be 100% solvable, so too could prompt hacking prove to be an impossible problem; you can patch a software bug, but perhaps not a (neural) brain.” LLM Security is here to stay, we have a job!

Other Vulnerabilities • Bias and Stereotypes • Hallucinations

Other Ontologies and Organizations • OWASP Top 10 for LLM
Applications 2025 - https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025 • AI Incident Database - https://incidentdatabase.ai • AI Vulnerability Database - https://avidml.org

Hands on: Attack Vectors or Vulnerabilities • Instruction Manipulation Attacks:
Directly Changing the Model’s Behavior • Contextual Exploits: Manipulating How the Model Understands the Input • Obfuscation & Encoding Attacks: Hiding Malicious Intent • Resource Exploitation Attacks: Abusing System Limitations …To the notebooks!

Hands on: Red Teaming • Manual/standard prompts list • LLMs
as adversarial prompts generator • LLMs as generator and evaluator • Using OS Library + Service …To the notebooks!

Security & Countermeasures: Guardrails • Guardrails AI: https://www.guardrailsai.com/ ◦ Langchain
helper: https://python.langchain.com/v0.1/docs/templates/guardrails-output-parser/ • Databricks Guardrails: https://www.databricks.com/blog/implementing-llm-guardrails-safe-and-responsible- generative-ai-deployment-databricks • AWS Bedrock Guardrails: https://aws.amazon.com/bedrock/guardrails/ • Mozilla any-guardrail: https://github.com/mozilla-ai/any-guardrail • Other LLM options: https://ollama.com/library/shieldgemma

…To the notebooks! Security & Countermeasures: Guardrails

https://blog.mozilla.ai/can-open-source-guardrails-really-protect-ai-agents/ Actually, are Guardrails effective?

Reminder: System Vulnerabilities LLM App is part of a larger
system!

• Multimodal challenges ◦ Old adversarial image attacks Takeaways &
beyond https://www.youtube.com/watch?v=Klepca1Ny3c

• Multimodal challenges ◦ New adversarial image attack Takeaways &
beyond https://arxiv.org/pdf/2410.08338

• Multimodal challenges ◦ The most creative I’ve found :’)
Takeaways & beyond https://x.com/me_irl/status/1901497992865071428?s=46

Takeaways & beyond • HackAPrompt (1.0) paper still relevant! •
HackAPrompt 2.0 !!! https://www.hackaprompt.com/

Takeaways & beyond: +mediums • MCP https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks

Takeaways & beyond: +mediums https://www.youtube.com/watch?v=nuwVm6pO5jA [ARTICLE ***]

• AI Code Editors - AI Browsers https://x.com/elder_plinius/status/1980825330408722927 https://x.com/elder_plinius/status/1983651580013637775?s=46 https://x.com/p1njc70r/status/1980701879987269866/photo/1
https://www.giskard.ai/knowledge/are-ai-browsers-safe-a-security-and-vulnerability-analysis-of-openai-atlas Takeaways & beyond: +mediums

• Aardvark https://openai.com/index/introducing-aardvark/ Takeaways & beyond: the future?

Resources • https://www.hackaprompt.com/ • Coursera - https://learn.deeplearning.ai/courses/red-teaming-llm-applications • https://arxiv.org/abs/2311.16119 •
https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset • https://www.latent.space/p/learn-prompting • https://bbycroft.net/llm • https://poloclub.github.io/transformer-explainer/ • OWASP Top 10 for LLM Applications 2025 - https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025 • AI Incident Database - https://incidentdatabase.ai • AI Vulnerability Database - https://avidml.org • https://www.giskard.ai/knowledge/how-to-implement-llm-as-a-judge-to-test-ai-agents-part-1 • https://www.giskard.ai/knowledge/how-to-implement-llm-as-a-judge-to-test-ai-agents-part-2 • https://arxiv.org/pdf/2410.08338 • https://tntattacks.github.io/ • https://github.com/p1nox/red-teaming-latent-spaces-workshop

Muchas gracias :)

Red Teaming Latent Spaces: A Hands-On Security ...

Red Teaming Latent Spaces: A Hands-On Security Workshop

Raul Pino

More Decks by Raul Pino

Other Decks in Technology

Featured

Transcript

By Raul Pino PyCon Chile 2025 Workshop Red Teaming Latent

Agenda Theory: • Intro: ◦ LLM Apps ◦ Red Teaming

About Me • Born in Venezuela. • +10 years of

*** Predicts the next token. https://bbycroft.net/llm https://poloclub.github.io/transformer-explainer/ What is an

LLM App: Cursor, Github Copilot, … What is an LLM?

What is an LLM App? RAG? LLM App: Cursor, Github

• War games and military strategy exercises. • 1990s–2000s: expanded

Red Teaming an LLM App is not:

Red Teaming an LLM App is::

Ignore This Title and HackAPrompt! • A global prompt hacking

Ignore This Title and HackAPrompt! Simple Prompt Hacking

Prompt Hacking =(Prompt Injection, Jailbreaking) Exploiting: 1. Predicts the next

Taxonomical Ontology of Exploits

• Documented 29 separate prompt hacking techniques. • Attackers iterated

Other Vulnerabilities • Bias and Stereotypes • Hallucinations

Other Ontologies and Organizations • OWASP Top 10 for LLM

Hands on: Attack Vectors or Vulnerabilities • Instruction Manipulation Attacks:

Hands on: Red Teaming • Manual/standard prompts list • LLMs

Security & Countermeasures: Guardrails • Guardrails AI: https://www.guardrailsai.com/ ◦ Langchain

…To the notebooks! Security & Countermeasures: Guardrails

https://blog.mozilla.ai/can-open-source-guardrails-really-protect-ai-agents/ Actually, are Guardrails effective?

Reminder: System Vulnerabilities LLM App is part of a larger

• Multimodal challenges ◦ Old adversarial image attacks Takeaways &

• Multimodal challenges ◦ New adversarial image attack Takeaways &

• Multimodal challenges ◦ The most creative I’ve found :’)

Takeaways & beyond • HackAPrompt (1.0) paper still relevant! •

Takeaways & beyond: +mediums • MCP https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks

Takeaways & beyond: +mediums https://www.youtube.com/watch?v=nuwVm6pO5jA [ARTICLE ***]

• AI Code Editors - AI Browsers https://x.com/elder_plinius/status/1980825330408722927 https://x.com/elder_plinius/status/1983651580013637775?s=46 https://x.com/p1njc70r/status/1980701879987269866/photo/1

• Aardvark https://openai.com/index/introducing-aardvark/ Takeaways & beyond: the future?

Resources • https://www.hackaprompt.com/ • Coursera - https://learn.deeplearning.ai/courses/red-teaming-llm-applications • https://arxiv.org/abs/2311.16119 •

Muchas gracias :)