By Raul Pino
PyCon Austria 2025
Red Teaming Latent Spaces
& Protecting LLM apps
Slide 2
Slide 2 text
Agenda
● Intro:
○ LLM Apps
○ Red Teaming
● HackAPrompt Paper
○ Ontology and generic concepts.
● Attack Vectors or Vulnerabilities
● Demos
● Security and countermeasures
● Takeaways & beyond
Slide 3
Slide 3 text
About Me
● Born in Venezuela.
● +10 years of exp as Software Engineer & AI
enthusiast (ML Eng recently).
● Living in Chile.
○ Halborn, Distro (YC S24), Elementus, uBiome,
Groupon.
● <3 AI, Coffee, Scuba Diving, …
Slide 4
Slide 4 text
*** Predicts the next token.
https://bbycroft.net/llm
https://poloclub.github.io/transformer-explainer/
What is an LLM?
Large Language Models (LLM): ChatGPT,
Claude, Mistral, DeepSeek, …
Slide 5
Slide 5 text
LLM App: Cursor, Github Copilot, …
What is an LLM? LLM App?
… but the most basic might be:
Slide 6
Slide 6 text
What is an LLM App? RAG?
LLM App: Cursor, Github Copilot, …
*** Helpful assistant.
Slide 7
Slide 7 text
● War games and military strategy exercises.
● 1990s–2000s: expanded into cybersecurity,
simulating cyber attacks.
What is Red Teaming?
A “red team” played the role of the
enemy to test the defenses and
strategy of the “blue team”.
Slide 8
Slide 8 text
Red Teaming an LLM App is not:
Slide 9
Slide 9 text
Red Teaming an LLM App is::
Slide 10
Slide 10 text
Ignore This Title and HackAPrompt!
● A global prompt hacking competition (2023).
● 2800 people from 50+ countries.
● 600K+ adversarial prompts against three state-of-the art LLMs.
Paper: https://arxiv.org/abs/2311.16119
Dataset: https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset
Podcast: https://www.latent.space/p/learn-prompting
Slide 11
Slide 11 text
Ignore This Title and HackAPrompt!
Simple Prompt Hacking
Slide 12
Slide 12 text
Prompt Hacking =(Prompt Injection,
Jailbreaking)
Exploiting:
1. Predicts the next token.
2. Helpful assistant.
Demos incoming, to the notebooks!
Slide 13
Slide 13 text
Taxonomical Ontology of Exploits
Slide 14
Slide 14 text
Demos: Attack Vectors or
Vulnerabilities
● Instruction Manipulation Attacks: Directly Changing the Model’s Behavior
● Contextual Exploits: Manipulating How the Model Understands the Input
● Obfuscation & Encoding Attacks: Hiding Malicious Intent
● Resource Exploitation Attacks: Abusing System Limitations
…To the notebooks!
Slide 15
Slide 15 text
● Documented 29 separate prompt hacking techniques.
● Attackers iterated and refined their prompts over time.
○ Lengthier attacks were initially successful but later optimized for
brevity.
● Models with higher verbosity (like ChatGPT) were harder to hack but still
failed.
HackAPrompt: Results and Key
Insights
“LLM security is in early stages, and just like human social engineering may not be
100% solvable, so too could prompt hacking prove to be an impossible problem; you
can patch a software bug, but perhaps not a (neural) brain.”
LLM Security is here to stay, we have a job!
Slide 16
Slide 16 text
Demos: Red Teaming
● Manual/standard prompts list
● LLMs as adversarial prompts generator
● LLMs as generator and evaluator
● Using OS Library + Service
…To the notebooks!
Slide 17
Slide 17 text
Other Vulnerabilities
● Bias and Stereotypes
● Hallucinations
Slide 18
Slide 18 text
Other Ontologies and Organizations
● OWASP Top 10 for LLM Applications 2025 -
https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025
● AI Incident Database - https://incidentdatabase.ai
● AI Vulnerability Database - https://avidml.org