Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Mr. Brokebot: Lethal language attacks against A...

Mr. Brokebot: Lethal language attacks against AI agents

This talk is a fast, practical tour of real-world AI hacking: not sci-fi doom, but hands-on exploits against LLMs and agents that are already embedded in developer tools, browsers, chatbots, and cloud systems.

NOTE: The Firefox example is a PROTOTYPE.

I'm not sure how well it will come across as a slides-only deck.

Avatar for luke crouch

luke crouch

February 20, 2026
Tweet

More Decks by luke crouch

Other Decks in Technology

Transcript

  1. 1. large language models (3m) 2. LLM attacks (20m) 3.

    agents (3m) 4. agent attacks (3m)
  2. llm attack hallucination • generates incorrect or incomplete content •

    factual inaccuracies, fabricated information, contradictions, omissions, etc. • inherit characteristic of LLMs • larger models tend to hallucinate less https://cacm.acm.org/practice/the-price-of-intelligence/
  3. llm risk hallucination • medical misinformation [1] • mental health

    misinformation [2] • legal or policy citations [3] • software
  4. 1. retrieval-augmented generation 2. self-re fi nement 3. fi ne-tuning

    https://arxiv.org/abs/2406.10279 llm attack hallucination
  5. large language model behavior alignment • aka “alignment layer” or

    “safety layer” • trained into the model • by supervised fi ne-tuning • to shape an llm’s output - especially “safe” output … https://cacm.acm.org/practice/the-price-of-intelligence/
  6. llm attack jailbreak • manipulate an llm into violating its

    safety alignment • exploits LLMs’ lack of situational awareness or adversarial intention https://cacm.acm.org/practice/the-price-of-intelligence/
  7. 
 eps2.3_pr0mpt_inj3ction if you’re building AI, I feel bad for

    you son, you got 99 probLLMs and can’t patch this one
  8. 1. static regex 2. LLM-as-judge 3. custom classi fi er

    model llm attack prompt injection guardrails
  9. 1. static regex (Deterministic) 2. LLM-as-judge (LLM Judge) 3. custom

    classi fi er model (HF Semantic) llm attack prompt injection guardrails
  10. 1. sanitize inputs ✓ 2. harden prompts ✓ 3. add

    guardrails ✓ llm attack prompt injection
  11. 1. large language models ✓ 2. llm attacks ✓ 3.

    agents ✓ 4. agent attacks ✓
  12. 1. large language models ✓ 2. llm attacks ✓ 3.

    agents ✓ 4. agent attacks ✓ questions?