😎 garak: ... into the AI Red Teaming Rabbit Hole 🤺
So, you want to go down the AI Red Teaming rabbit hole? 🐰
👉 It's no longer news that securing the impending AI era calls for innovative approaches. AI Red Teaming has emerged as one of the most effective ways to ensure secure & responsible AI.
🎉 Luckily, several tools are available for AI Red Teaming. Let's look at one of them today --> garak, an LLM vulnerability scanner.
⚡ garak is short for Generative AI Red Teaming & Assessment Kit. It is open source, actionable, and well suited to probing models for vulnerabilities including prompt injection, toxicity, jailbreaks, and data leaks.
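💡 Want to kick the tires first? Below is a minimal sketch of a first scan, driven from Python via subprocess. It assumes garak is installed from PyPI (pip install garak) and that the --model_type, --model_name, and --probes flags behave as in the current docs; flag and probe names can change between releases, so check garak --help for your version.

```python
# Minimal sketch: a first garak scan against a small Hugging Face model.
# Assumes `pip install garak` has already been run; the flags below follow
# the public docs but may differ between garak versions.
import subprocess

result = subprocess.run(
    [
        "garak",
        "--model_type", "huggingface",  # use the Hugging Face generator
        "--model_name", "gpt2",         # any small model is fine for a smoke test
        "--probes", "encoding",         # encoding-based prompt injection probes
    ],
    check=False,  # findings land in garak's report, so don't raise on a non-zero exit
)
print("garak finished with exit code", result.returncode)
```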
garak consists of five modular components (see the sketch after this list for how they fit together on the command line):
1️⃣ Generators: interfaces that interact with LLMs, sending prompts and retrieving outputs from various model sources, e.g. Hugging Face
2️⃣ Probes: test cases designed to provoke specific failure modes in LLMs, e.g. adversarial prompts, encoding attacks, toxic content, or questions crafted to induce hallucination or misinformation.
3️⃣ Detectors: analyze the outputs from LLMs to identify specific issues or failure modes triggered by the probes. Examples include detecting toxicity, data leakage, or unsafe behavior.
4️⃣ Evaluators: assess the results of the probing process, aggregating data from detectors to produce scores or reports.
5️⃣ Harnesses: orchestrate the overall testing workflow, coordinating the interaction between generators, probes, detectors, and evaluators. Reporting maps onto frameworks like the OWASP LLM Top 10, and findings can be fed into the AI Vulnerability Database.
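🧪 To make those components concrete, here is a hedged sketch of a more targeted run: list the available probes and detectors, point a generator (the Hugging Face interface) at a jailbreak probe family with an explicit detector, and let the harness and evaluators summarise what gets flagged in the run's report. The probe and detector names below (dan, mitigation.MitigationBypass) and the --detectors override are assumptions based on recent releases, so verify them with --list_probes and --list_detectors before relying on them.

```python
# Sketch: mapping garak's components onto a targeted CLI run.
# Flag, probe, and detector names are assumptions based on recent releases;
# confirm them with `garak --help`, `--list_probes`, and `--list_detectors`.
import subprocess

# Probes & Detectors: see what ships with your version before choosing a test.
subprocess.run(["garak", "--list_probes"])
subprocess.run(["garak", "--list_detectors"])

# Generator + Probe + Detector: the harness wires these together, and the
# evaluators summarise what the detectors flag in the run's report.
subprocess.run(
    [
        "garak",
        "--model_type", "huggingface",                 # generator: Hugging Face interface
        "--model_name", "gpt2",                        # model under test
        "--probes", "dan",                             # probe family: DAN-style jailbreaks
        "--detectors", "mitigation.MitigationBypass",  # assumed detector name; verify first
    ]
)
```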
Visit the garak home page - https://garak.ai/
How are you evaluating and testing your AI workloads today?
What are your favourite AI Red Teaming tools?