Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥

Security and auditing tools in Large Language M...

jmortegac
October 10, 2024

Security and auditing tools in Large Language Models (LLM)

LLM models are a subcategory of deep learning models based on neural networks and natural language processing(NLP). Security and auditing are critical issues when dealing with applications based on large language models, such as GPT (Generative Pre-trained Transformer) or LLM (Large Language Model) models. This talk aims to analyze the security of these language models from the developer’s point of view, analyzing the main vulnerabilities that can occur in the generation of these models. Among the main points to be discussed we can highlight:
-Introduction to LLM
-Introduction to OWASP LLM Top 10.
-Auditing tools in applications that handle LLM models.
-Use case with the textattack tool(https://textattack.readthedocs.io/en/master/)

jmortegac

October 10, 2024
Tweet

More Decks by jmortegac

Other Decks in Technology

Transcript

  1. Agenda • Introduction to LLM • Introduction to OWASP LLM

    Top 10 • Auditing tools • Use case with the textattack tool
  2. Introduction to LLM • Transformers • Attention is All You

    Need" by Vaswani et al. in 2017 • Self-attention mechanism • Encoder-Decoder Architecture
  3. Introduction to LLM • Language Models: Models like BERT, GPT,

    T5, and RoBERTa are based on transformer architecture. They are used for a wide range of NLP tasks such as text classification, question answering, and language translation. • Vision Transformers (ViT): Transformers have been adapted for computer vision tasks, where they have been applied to image classification, object detection, etc. • Speech Processing: In addition to text and vision, transformers have also been applied to tasks like speech recognition and synthesis.
  4. Introduction to OWASP LLM Top 10 • Data Poisoning •

    Malicious actors could poison the training data by injecting false, harmful, or biased information into datasets that train the LLM, which could degrade the model's performance. • Mitigation: Data source vetting, training data audits, and anomaly detection for suspicious patterns in training data.
  5. Introduction to OWASP LLM Top 10 • Model Inversion Attacks

    • Attackers could exploit the LLM to infer sensitive or private data that was used during training by repeatedly querying the model. This could expose personal, confidential, or proprietary information. • Mitigation: Rate-limiting sensitive queries and limiting the availability of models trained on private data.
  6. Introduction to OWASP LLM Top 10 • Unauthorized Code Execution

    • In some contexts, LLMs might be integrated into systems where they have access to execute code or trigger automated actions. Attackers could manipulate LLMs into running unintended code or actions, potentially compromising the system. • Mitigation: Limit the scope of actions that LLMs can execute, employ sandboxing, and use strict permission controls.
  7. Introduction to OWASP LLM Top 10 • Bias and Fairness

    • LLMs can generate biased outputs due to the biased nature of the data they are trained on, leading to unfair or discriminatory outcomes. This could impact decision-making processes, amplify harmful stereotypes, or introduce systemic biases. • Mitigation: Perform fairness audits, use bias detection tools, and diversify training datasets to reduce bias.
  8. Introduction to OWASP LLM Top 10 • Model Hallucination •

    LLMs can produce outputs that are plausible-sounding but factually incorrect or entirely fabricated. This is referred to as "hallucination," where the model generates false information without any grounding in its training data. • Mitigation: Post-response validation, fact-checking algorithms, and restricting LLMs to provide responses only within known knowledge domains.
  9. Introduction to OWASP LLM Top 10 • Insecure Model Deployment

    • LLMs that are deployed in unsecured environments could be vulnerable to attacks, including unauthorized access, model theft, or tampering. These risks are elevated when models are deployed in publicly accessible endpoints. • Mitigation: Use encrypted APIs, secure infrastructure, implement authentication and authorization controls, and monitor model access.
  10. Introduction to OWASP LLM Top 10 • Adversarial Attacks •

    Attackers might exploit weaknesses in the LLM by crafting adversarial examples. This could lead to undesirable outputs or security breaches. • Mitigation: Model robustness testing, adversarial training (training the model with adversarial examples), and implementing anomaly detection systems.
  11. Tools/frameworks to evaluate model robustness • PromptInject Framework • https://github.com/agencyenterprise/PromptInject

    • PAIR - Prompt Automatic Iterative Refinement • https://github.com/patrickrchao/JailbreakingLLMs • TAP - Tree of Attacks with Pruning • https://github.com/RICommunity/TAP
  12. Auditing tools • Prompt Guard refers to a set of

    strategies, tools, or techniques designed to safeguard the behavior of large language models (LLMs) from malicious or unintended input manipulations. • Prompt Guard uses an 86M parameter classifier model that has been trained on a large dataset of attacks and prompts found on the web. Prompt Guard can categorize a prompt into three different categories: "Jailbreak", "Injection" or "Benign".
  13. Auditing tools • Llama Guard 3 refers to a security

    tool or strategy designed for guarding large language models like Meta’s LLaMA against potential vulnerabilities and adversarial attacks. • Llama Guard 3 offers a robust and adaptable solution to protect LLMs against Prompt Injection and Jailbreak attacks. By combining advanced filtering, normalization, and monitoring techniques.
  14. Auditing tools • Dynamic Input Filtering • Prompt Normalization and

    Contextualization • Secure Response Policy • Active Monitoring and Automatic Response
  15. Auditing tools • S1: Violent Crimes • S2: Non-Violent Crimes

    • S3: Sex-Related Crimes • S4: Child Sexual Exploitation • +S5: Defamation (New) • S6: Specialized Advice • S7: Privacy • S8: Intellectual Property • S9: Indiscriminate Weapons • S10: Hate • S11: Suicide & Self-Harm • S12: Sexual Content • S13: Elections • S14: Code Interpreter Abuse Introducing v0.5 of the AI Safety Benchmark from MLCommons
  16. Text attack from textattack.models.wrappers import HuggingFaceModelWrapper from transformers import AutoModelForSequenceClassification,

    AutoTokenizer # Load pre-trained sentiment analysis model from Hugging Face model = AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-unc ased-imdb") tokenizer = AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb") # Wrap the model for TextAttack model_wrapper = HuggingFaceModelWrapper(model, tokenizer) https://github.com/QData/TextAttack
  17. Text attack from textattack.attack_recipes import TextFoolerJin2019 # Initialize the attack

    with the TextFooler recipe attack = TextFoolerJin2019.build(model_wrapper)
  18. Text attack # Example text for sentiment analysis (a positive

    review) text = "I absolutely loved this movie! The plot was thrilling, and the acting was top-notch." # Apply the attack adversarial_examples = attack.attack([text]) print(adversarial_examples)
  19. Text attack Original Text: "I absolutely loved this movie! The

    plot was thrilling, and the acting was top-notch." Adversarial Text: "I completely liked this film! The storyline was gripping, and the performance was outstanding."
  20. Text attack from textattack.augmentation import WordNetAugmenter # Use WordNet-based augmentation

    to create adversarial examples augmenter = WordNetAugmenter() # Augment the training data with adversarial examples augmented_texts = augmenter.augment(text) print(augmented_texts)
  21. Resources • github.com/greshake/llm-security • github.com/corca-ai/awesome-llm-security • github.com/facebookresearch/PurpleLlama • github.com/protectai/llm-guard •

    github.com/cckuailong/awesome-gpt-security • github.com/jedi4ever/learning-llms-and-genai-for-dev-sec-ops • github.com/Hannibal046/Awesome-LLM