
Generative AI Red Teaming & Assessment Kit (garak)

Kennedy Torkura
September 05, 2025

😎 garak: ... into the AI Red Teaming Rabbit Hole 🤺
So, you want to go down the AI Red Teaming rabbit hole? 🐰

👉 It's no longer news that innovative approaches are required to secure the coming AI era. Consequently, AI Red Teaming has emerged as one of the most effective approaches for ensuring secure & responsible AI.

🎉 Luckily, several tools can be leveraged for AI Red Teaming. Let's look at one of these tools today --> garak: LLM vulnerability scanner

⚡ garak is short for Generative AI Red Teaming & Assessment Kit. It is open source, actionable, and well suited for probing models for vulnerabilities including prompt injection, toxicity, jailbreaks, and data leaks.
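If you want a quick taste, a minimal quick-start sketch looks roughly like this (flag names and probe modules can differ between garak versions, so treat it as illustrative rather than canonical):

```
# Install garak from PyPI
python -m pip install -U garak

# Point garak at a small Hugging Face model and run an encoding-injection probe;
# results are written to a report file at the end of the run
python -m garak --model_type huggingface --model_name gpt2 --probes encoding.InjectBase64
```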

garak consists of five modular components:

1️⃣ Generators: interfaces that interact with LLMs by sending prompts and retrieving outputs from various model sources, e.g. Hugging Face

2️⃣ Probes: test cases designed to provoke specific failure modes in LLMs e.g. adversarial prompts, encoding attacks, toxic content, or complex questions to induce hallucination or misinformation.

3️⃣ Detectors: analyze the outputs from LLMs to identify specific issues or failure modes triggered by the probes. Examples include detecting toxicity, data leakage, or unsafe behavior.

4️⃣ Evaluators: assess the results of the probing process, aggregating data from detectors to produce scores or reports.

5️⃣ Harness: orchestrates the overall testing workflow, coordinating the interaction between generators, probes, detectors, and evaluators. Reporting options map cleanly onto frameworks like the OWASP Top 10 for LLMs and can feed the AI Vulnerability Database. (See the CLI sketch below for how these components surface in practice.)
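Here's a rough sketch of how those components show up on the command line (the --list_* and --report_prefix options shown reflect common garak usage, but check --help on your installed version):

```
# Enumerate the probes and detectors bundled with your garak install
python -m garak --list_probes
python -m garak --list_detectors

# Run a single jailbreak probe against a Hugging Face model; the default
# probewise harness pairs the probe with its recommended detectors, and the
# evaluator summarises results in a JSONL report named after the prefix
python -m garak --model_type huggingface --model_name gpt2 \
  --probes dan.Dan_11_0 --report_prefix demo_run
```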

Visit the garak home page - https://garak.ai/

How are you evaluating and testing your AI workloads today?
What are your favourite AI Red Teaming tools?


Transcript

  1. @run2obtain garak: Introduction. garak (Generative AI Red-teaming and Assessment Kit) is a framework designed to discover and identify vulnerabilities in Large Language Models (LLMs) through a structured probing approach. Think of it as Metasploit for LLMs. garak is a command-line tool: it allows you to quickly point to a model, select a probe to test for a vulnerability, run the test, and get your results in various formats.
  2. @run2obtain garak: Components (1) • Generators: These are the interfaces that interact with LLMs by sending prompts and retrieving outputs from various model sources, such as Hugging Face. • Probes: These are the test cases or interactions designed to provoke specific failure modes in LLMs. Probes generate a wide range of inputs, such as adversarial prompts (e.g., DAN jailbreaks), encoding attacks, toxic content, or complex questions to induce hallucination or misinformation. • Detectors: These analyze the outputs from LLMs to identify specific issues or failure modes triggered by the probes. Examples include detecting toxicity, data leakage, or unsafe behavior.
  3. @run2obtain garak: Components (2) • Evaluators: Assess the results of the probing process, aggregating data from detectors to produce scores or reports. Evaluators quantify the severity and frequency of vulnerabilities, generating logs and structured outputs (e.g., JSONL files) that highlight failing prompts and responses (see the report-skimming sketch after this transcript). • Harnesses: Orchestrate the overall testing workflow, coordinating the interaction between generators, probes, detectors, and evaluators. The default "probewise" harness, for instance, runs each probe individually with its recommended detectors, ensuring a structured and repeatable testing process.
  4. @run2obtain garak: Use Cases • Red-Teaming: Leverage garak for testing LLMs against different vulnerabilities including prompt injection, toxicity, jailbreaks, data leaks, etc. • Auditing: Check how your AI systems align with best practices, e.g. the OWASP Top 10 for LLMs. • Research & development: Identify weaknesses in custom models before deployment.
  5. @run2obtain garak: More Resources • GitHub: https://github.com/NVIDIA/garak • Documentation: https://docs.garak.ai/garak • Website: https://garak.ai/ • Paper (slides): https://garak.ai/garak_aiv_slides.pdf • Paper (proper): https://arxiv.org/abs/2406.11036 Importantly, note that garak addresses security in the first phase of the AI Red Teaming process, the model itself. You need other tools to address subsequent phases, e.g. Mitigant.
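Since the evaluators write structured JSONL reports, a quick way to skim one is a jq one-liner like the sketch below. The file name and the field names (entry_type, probe, detector, passed, total) are assumptions based on a typical run, so inspect your own report before relying on them:

```
# Pull the per-probe evaluation summaries out of a garak report
# (field names are assumptions; adjust them to match your report's schema)
jq -c 'select(.entry_type == "eval") | {probe, detector, passed, total}' \
  garak.<run-id>.report.jsonl
```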