Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Red Teaming Latent Spaces: A Hands-On Security ...

Avatar for Raul Pino Raul Pino
November 08, 2025

Red Teaming Latent Spaces: A Hands-On Security Workshop

En este taller interactivo, aprenderás a identificar y defenderte contra vulnerabilidades de seguridad en aplicaciones de Modelos de Lenguaje Grande (LLM). A través de ejercicios prácticos, trabajarás en equipos para descubrir y explotar vulnerabilidades comunes en LLM usando vectores de ataque reales. También implementarás medidas de seguridad y marcos protectores con herramientas y bibliotecas de Python. Se recomienda tener conocimientos básicos de Python y familiaridad con LLMs. Es necesario llevar un portátil para los ejercicios prácticos.

PyCon Chile 2025: https://www.pycon.cl/talks/
Repo: https://github.com/p1nox/red-teaming-latent-spaces-workshop

Avatar for Raul Pino

Raul Pino

November 08, 2025
Tweet

More Decks by Raul Pino

Other Decks in Technology

Transcript

  1. Agenda Theory: • Intro: ◦ LLM Apps ◦ Red Teaming

    • HackAPrompt Paper ◦ Ontology and generic concepts. Hands On: • Attack vectors and Red Teaming • Security and countermeasures Closing: • Takeaways & beyond • Discussion-Questions(?) https://2025.pycon.at/talks/red-teaming-latent-spaces-protecting-llm-apps/
  2. About Me • Born in Venezuela. • +10 years of

    exp as Software Engineer & AI enthusiast (ML Eng, AI Eng). • Living in Chile. ◦ Halborn, Distro (YC S24), Elementus, uBiome, Groupon. • <3 AI, Coffee, Scuba Diving, …
  3. *** Predicts the next token. https://bbycroft.net/llm https://poloclub.github.io/transformer-explainer/ What is an

    LLM? Large Language Models (LLM): ChatGPT, Claude, Mistral, DeepSeek, …
  4. LLM App: Cursor, Github Copilot, … What is an LLM?

    LLM App? … but the most basic might be:
  5. What is an LLM App? RAG? LLM App: Cursor, Github

    Copilot, … *** Helpful assistant.
  6. • War games and military strategy exercises. • 1990s–2000s: expanded

    into cybersecurity, simulating cyber attacks. What is Red Teaming? A “red team” played the role of the enemy to test the defenses and strategy of the “blue team”.
  7. Ignore This Title and HackAPrompt! • A global prompt hacking

    competition (2023). • 2800 people from 50+ countries. • 600K+ adversarial prompts against three state-of-the art LLMs. Paper: https://arxiv.org/abs/2311.16119 Dataset: https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset Podcast: https://www.latent.space/p/learn-prompting
  8. • Documented 29 separate prompt hacking techniques. • Attackers iterated

    and refined their prompts over time. ◦ Lengthier attacks were initially successful but later optimized for brevity. • Models with higher verbosity (like ChatGPT) were harder to hack but still failed. HackAPrompt: Results and Key Insights “LLM security is in early stages, and just like human social engineering may not be 100% solvable, so too could prompt hacking prove to be an impossible problem; you can patch a software bug, but perhaps not a (neural) brain.” LLM Security is here to stay, we have a job!
  9. Other Ontologies and Organizations • OWASP Top 10 for LLM

    Applications 2025 - https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025 • AI Incident Database - https://incidentdatabase.ai • AI Vulnerability Database - https://avidml.org
  10. Hands on: Attack Vectors or Vulnerabilities • Instruction Manipulation Attacks:

    Directly Changing the Model’s Behavior • Contextual Exploits: Manipulating How the Model Understands the Input • Obfuscation & Encoding Attacks: Hiding Malicious Intent • Resource Exploitation Attacks: Abusing System Limitations …To the notebooks!
  11. Hands on: Red Teaming • Manual/standard prompts list • LLMs

    as adversarial prompts generator • LLMs as generator and evaluator • Using OS Library + Service …To the notebooks!
  12. Security & Countermeasures: Guardrails • Guardrails AI: https://www.guardrailsai.com/ ◦ Langchain

    helper: https://python.langchain.com/v0.1/docs/templates/guardrails-output-parser/ • Databricks Guardrails: https://www.databricks.com/blog/implementing-llm-guardrails-safe-and-responsible- generative-ai-deployment-databricks • AWS Bedrock Guardrails: https://aws.amazon.com/bedrock/guardrails/ • Mozilla any-guardrail: https://github.com/mozilla-ai/any-guardrail • Other LLM options: https://ollama.com/library/shieldgemma
  13. • Multimodal challenges ◦ Old adversarial image attacks Takeaways &

    beyond https://www.youtube.com/watch?v=Klepca1Ny3c
  14. • Multimodal challenges ◦ The most creative I’ve found :’)

    Takeaways & beyond https://x.com/me_irl/status/1901497992865071428?s=46
  15. Takeaways & beyond • HackAPrompt (1.0) paper still relevant! •

    HackAPrompt 2.0 !!! https://www.hackaprompt.com/
  16. • AI Code Editors - AI Browsers https://x.com/elder_plinius/status/1980825330408722927 https://x.com/elder_plinius/status/1983651580013637775?s=46 https://x.com/p1njc70r/status/1980701879987269866/photo/1

    https://www.giskard.ai/knowledge/are-ai-browsers-safe-a-security-and-vulnerability-analysis-of-openai-atlas Takeaways & beyond: +mediums
  17. Resources • https://www.hackaprompt.com/ • Coursera - https://learn.deeplearning.ai/courses/red-teaming-llm-applications • https://arxiv.org/abs/2311.16119 •

    https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset • https://www.latent.space/p/learn-prompting • https://bbycroft.net/llm • https://poloclub.github.io/transformer-explainer/ • OWASP Top 10 for LLM Applications 2025 - https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025 • AI Incident Database - https://incidentdatabase.ai • AI Vulnerability Database - https://avidml.org • https://www.giskard.ai/knowledge/how-to-implement-llm-as-a-judge-to-test-ai-agents-part-1 • https://www.giskard.ai/knowledge/how-to-implement-llm-as-a-judge-to-test-ai-agents-part-2 • https://arxiv.org/pdf/2410.08338 • https://tntattacks.github.io/ • https://github.com/p1nox/red-teaming-latent-spaces-workshop