😎 garak: ... into the AI Red Teaming Rabbit Hole 🤺
So, you want to go down the AI Red Teaming rabbit hole? 🐰
👉 It's no longer news that securing the impending AI era calls for innovative approaches. AI Red Teaming has emerged as one of the most effective ways to ensure secure & responsible AI.
🎉 Luckily, several tools are available for AI Red Teaming. Let's look at one of them today --> garak, an LLM vulnerability scanner.
⚡ garak is short for Generative AI Red Teaming & Assessment Kit. It is open source, actionable, and well suited to probing models for vulnerabilities including prompt injection, toxicity, jailbreaks, and data leaks.
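💡 Want to kick the tires first? Below is a minimal sketch of a first scan, driven from Python via subprocess. It assumes garak is installed from PyPI (pip install garak) and that the --model_type, --model_name, and --probes flags behave as in the current docs; flag and probe names can change between releases, so check garak --help for your version.

```python
# Minimal sketch: a first garak scan against a small Hugging Face model.
# Assumes `pip install garak` has already been run; the flags below follow
# the public docs but may differ between garak versions.
import subprocess

result = subprocess.run(
    [
        "garak",
        "--model_type", "huggingface",  # use the Hugging Face generator
        "--model_name", "gpt2",         # any small model is fine for a smoke test
        "--probes", "encoding",         # encoding-based prompt injection probes
    ],
    check=False,  # findings land in garak's report, so don't raise on a non-zero exit
)
print("garak finished with exit code", result.returncode)
```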
garak consists of five modular components (see the sketch after this list for how they fit together on the command line):
1️⃣ Generators: interfaces that interact with LLMs, sending prompts and retrieving outputs from various model sources, e.g. Hugging Face
2️⃣ Probes: test cases designed to provoke specific failure modes in LLMs, e.g. adversarial prompts, encoding attacks, toxic content, or questions crafted to induce hallucination or misinformation.
3️⃣ Detectors: analyze the outputs from LLMs to identify specific issues or failure modes triggered by the probes. Examples include detecting toxicity, data leakage, or unsafe behavior.
4️⃣ Evaluators: assess the results of the probing process, aggregating data from detectors to produce scores or reports.
5️⃣ Harnesses: orchestrate the overall testing workflow, coordinating the interaction between generators, probes, detectors, and evaluators. Reporting maps onto frameworks like the OWASP LLM Top 10, and findings can be fed into the AI Vulnerability Database.
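🧪 To make those components concrete, here is a hedged sketch of a more targeted run: list the available probes and detectors, point a generator (the Hugging Face interface) at a jailbreak probe family with an explicit detector, and let the harness and evaluators summarise what gets flagged in the run's report. The probe and detector names below (dan, mitigation.MitigationBypass) and the --detectors override are assumptions based on recent releases, so verify them with --list_probes and --list_detectors before relying on them.

```python
# Sketch: mapping garak's components onto a targeted CLI run.
# Flag, probe, and detector names are assumptions based on recent releases;
# confirm them with `garak --help`, `--list_probes`, and `--list_detectors`.
import subprocess

# Probes & Detectors: see what ships with your version before choosing a test.
subprocess.run(["garak", "--list_probes"])
subprocess.run(["garak", "--list_detectors"])

# Generator + Probe + Detector: the harness wires these together, and the
# evaluators summarise what the detectors flag in the run's report.
subprocess.run(
    [
        "garak",
        "--model_type", "huggingface",                 # generator: Hugging Face interface
        "--model_name", "gpt2",                        # model under test
        "--probes", "dan",                             # probe family: DAN-style jailbreaks
        "--detectors", "mitigation.MitigationBypass",  # assumed detector name; verify first
    ]
)
```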
Visit the garak home page - https://garak.ai/
How are you evaluating and testing your AI workloads today?
What are your favourite AI Red Teaming tools?