GenAI Red Teaming Unique Challenges – Part I

Slide 1

Slide 1 text

GenAI Red Teaming Unique Challenges – Part I @run2obtain Excerpts from the OWASP GenAI Red Teaming Guide 1

Slide 2

Slide 2 text

@run2obtain Source: https://genai.owasp.org/resource/genai-red-teaming-guide/ GenAI Red Teaming: Unique Challenges GenAI systems pose unique challenges, which require novel approaches to security testing. Ultimately, these novel approaches should address the undermentioned aspects. We take a look at the first four (highlighted) in this brief. • AI-Specific Threat Modelling • Model Reconnaissance • Adversarial scenario development • Prompt Injection attacks • Guardrail Bypass & policy circumvention techniques • Domain-specific risk testing • Knowledge and models adaptation testing • Impact analysis • Comprehensive reporting 2

Slide 3

Slide 3 text

@run2obtain Source: https://genai.owasp.org/resource/genai-red-teaming-guide/ AI-Specific Threat Modeling Threat modeling is the practice of systematically analyzing a systems' attack surface to identify potential attack possibilities. However, threat modeling for AI systems requires understanding socio-cultural, regulatory, and ethical contexts; additional to the technical attack surfaces. It is imperative to identify how attackers might manipulate model inputs, poison training data, or exploit biases etc. 3

Slide 4

Slide 4 text

@run2obtain Source: https://aws.amazon.com/blogs/security/threat-modeling-your-generative-ai-workload-to-evaluate-security-risk/ AI-Specific Threat Modeling – Example GenAI App The image on the right is based on an AWS blog article that discussed threat modeling a GenAI application deployed on AWS infrastructure. I summarized the main points in a LinkedIn post. The blog post showed how to use the four-question framework to conduct threat modeling for GenAI applications. 4

Slide 5

Slide 5 text

@run2obtain Source: https://dl.acm.org/doi/pdf/10.1145/2810103.2813677 Model Reconnaissance Understanding model architecture and training data is critical. This can be achieved by investigating model’s structure through APIs or interactive playgrounds, architecture, hyperparameters, number of transformer layers etc. Interesting techniques include: • Model inversion attacks to infer training data. • Membership inference to determine if specific data was used. • Supply chain vulnerabilities e.g., compromised open-source models. 5

Slide 6

Slide 6 text

@run2obtain Source: https://dl.acm.org/doi/pdf/10.1145/2810103.2813677 Model Reconnaissance Example: Model Inversion Attack In a model inversion attack against health information system, adversarial access to an ML model is abused to learn sensitive genomic information about individuals. Defending against this kind of attack requires a thorough understanding of model architecture and internal workings. 6

Slide 7

Slide 7 text

@run2obtain Source: https://genai.owasp.org/resource/genai-red-teaming-guide/ , https://www.researchgate.net/publication/358084769_Adversarial_Attacks_and_Defense_Technologies_on_Autonomous_Vehicles_A_Review Adversarial Scenario Development GenAI output are non-deterministic hence proper security testing requires leveraging adversarial perturbations which can are simulated to clearly observe the impact. There are different approaches for doing this including using targeted and non- targeted attacks simulation. 7

Slide 8

Slide 8 text

@run2obtain Source: https://www.mitigant.io/en/blog/bedrock-or-bedsand-attacking-amazon-bedrocks-achilles-heel Adversarial Scenario Development: Example This adversarial scenario demonstrated how a data poisoning attack can be conducted against a GenAI application built atop Amazon Bedrock. The application used S3 buckets as a data source for the RAG strategy via Bedrock Knowledge Base. See detail at the Mitigant blog article . The scenario is executed via Mitigant Cloud Attack Emulation. 8

Slide 9

Slide 9 text

@run2obtain Source: https://genai.owasp.org/resource/genai-red-teaming-guide/ Prompt Injection Attacks Prompt injection attack aim to manipulating or bypass model intent or constraints. There are two variants : • Direct Prompt Injection: Attackers directly send prompts to confuse victim models using crafty directives e,g, “Ignore previous instructions …” or “Disregard your training ….” • Indirect Prompt Injection: This happens due to weaponized input from external data sources, such as websites or files. The content may have data that when interpreted by models, alters the model behavior in unintended or unexpected ways. 9

Slide 10

Slide 10 text

@run2obtain Source: https://aws.amazon.com/blogs/security/safeguard-your-generative-ai-workloads-from-prompt-injections/ Prompt Injection Attacks: Example Defending against prompt injection attacks requires a Defense-in-Depth approach. The image below depicts a prompt injection countermeasure for an Amazon Bedrock powered GenAI App, described in a recent blog post , which I also summarized in a LinkedIn post. Testing would require technical approaches as well as approaches that consider how these multiple layers could be compromised e.g. abuse cases. 10

Slide 11

Slide 11 text

@run2obtain Watch out for Part II … meanwhile, checkout how Mitigant secures GenAI workloads https://www.mitigant.io/en/platform/security-for-genai 11