Upgrade to Pro — share decks privately, control downloads, hide ads and more …

GenAI Red Teaming Unique Challenges – Part I

GenAI Red Teaming Unique Challenges – Part I

GenAI systems pose unique challenges that require new security approaches, including security testing. Most of us know about Red Teaming, but applying It to GenAI systems requires additional "cherries on the pie."

What are these additional "cherries on the pie" 🥧 ?

🙌 Luckily, The OWASP Top 10 For Large Language Model Applications & Generative AI recently released the GenAI Red Teaming Guide to outline the critical aspects and intricacies of GenAI Red Teaming.

📖 The guide is loaded with insightful information! I am gradually reading through and enjoying every bit. You should do the same if possible, you wouldn't regret! Check it out -> https://genai.owasp.org/resource/genai-red-teaming-guide/

⭐ I'd like to share the unique challenges of GenAI Red teaming. In this slide deck, briefly discuss four of these unique challenges:

1️⃣ AI-Specific Threat Modeling

2️⃣ Model Reconnaissance

3️⃣ Adversarial scenario development

4️⃣ Prompt Injection attacks

Kennedy Torkura

February 08, 2025
Tweet

More Decks by Kennedy Torkura

Other Decks in Technology

Transcript

  1. @run2obtain Source: https://genai.owasp.org/resource/genai-red-teaming-guide/ GenAI Red Teaming: Unique Challenges GenAI systems

    pose unique challenges, which require novel approaches to security testing. Ultimately, these novel approaches should address the undermentioned aspects. We take a look at the first four (highlighted) in this brief. • AI-Specific Threat Modelling • Model Reconnaissance • Adversarial scenario development • Prompt Injection attacks • Guardrail Bypass & policy circumvention techniques • Domain-specific risk testing • Knowledge and models adaptation testing • Impact analysis • Comprehensive reporting 2
  2. @run2obtain Source: https://genai.owasp.org/resource/genai-red-teaming-guide/ AI-Specific Threat Modeling Threat modeling is the

    practice of systematically analyzing a systems' attack surface to identify potential attack possibilities. However, threat modeling for AI systems requires understanding socio-cultural, regulatory, and ethical contexts; additional to the technical attack surfaces. It is imperative to identify how attackers might manipulate model inputs, poison training data, or exploit biases etc. 3
  3. @run2obtain Source: https://aws.amazon.com/blogs/security/threat-modeling-your-generative-ai-workload-to-evaluate-security-risk/ AI-Specific Threat Modeling – Example GenAI App

    The image on the right is based on an AWS blog article that discussed threat modeling a GenAI application deployed on AWS infrastructure. I summarized the main points in a LinkedIn post. The blog post showed how to use the four-question framework to conduct threat modeling for GenAI applications. 4
  4. @run2obtain Source: https://dl.acm.org/doi/pdf/10.1145/2810103.2813677 Model Reconnaissance Understanding model architecture and training

    data is critical. This can be achieved by investigating model’s structure through APIs or interactive playgrounds, architecture, hyperparameters, number of transformer layers etc. Interesting techniques include: • Model inversion attacks to infer training data. • Membership inference to determine if specific data was used. • Supply chain vulnerabilities e.g., compromised open-source models. 5
  5. @run2obtain Source: https://dl.acm.org/doi/pdf/10.1145/2810103.2813677 Model Reconnaissance Example: Model Inversion Attack In

    a model inversion attack against health information system, adversarial access to an ML model is abused to learn sensitive genomic information about individuals. Defending against this kind of attack requires a thorough understanding of model architecture and internal workings. 6
  6. @run2obtain Source: https://genai.owasp.org/resource/genai-red-teaming-guide/ , https://www.researchgate.net/publication/358084769_Adversarial_Attacks_and_Defense_Technologies_on_Autonomous_Vehicles_A_Review Adversarial Scenario Development GenAI output

    are non-deterministic hence proper security testing requires leveraging adversarial perturbations which can are simulated to clearly observe the impact. There are different approaches for doing this including using targeted and non- targeted attacks simulation. 7
  7. @run2obtain Source: https://www.mitigant.io/en/blog/bedrock-or-bedsand-attacking-amazon-bedrocks-achilles-heel Adversarial Scenario Development: Example This adversarial scenario

    demonstrated how a data poisoning attack can be conducted against a GenAI application built atop Amazon Bedrock. The application used S3 buckets as a data source for the RAG strategy via Bedrock Knowledge Base. See detail at the Mitigant blog article . The scenario is executed via Mitigant Cloud Attack Emulation. 8
  8. @run2obtain Source: https://genai.owasp.org/resource/genai-red-teaming-guide/ Prompt Injection Attacks Prompt injection attack aim

    to manipulating or bypass model intent or constraints. There are two variants : • Direct Prompt Injection: Attackers directly send prompts to confuse victim models using crafty directives e,g, “Ignore previous instructions …” or “Disregard your training ….” • Indirect Prompt Injection: This happens due to weaponized input from external data sources, such as websites or files. The content may have data that when interpreted by models, alters the model behavior in unintended or unexpected ways. 9
  9. @run2obtain Source: https://aws.amazon.com/blogs/security/safeguard-your-generative-ai-workloads-from-prompt-injections/ Prompt Injection Attacks: Example Defending against prompt

    injection attacks requires a Defense-in-Depth approach. The image below depicts a prompt injection countermeasure for an Amazon Bedrock powered GenAI App, described in a recent blog post , which I also summarized in a LinkedIn post. Testing would require technical approaches as well as approaches that consider how these multiple layers could be compromised e.g. abuse cases. 10
  10. @run2obtain Watch out for Part II … meanwhile, checkout how

    Mitigant secures GenAI workloads https://www.mitigant.io/en/platform/security-for-genai 11