
Security, Privacy, and Trust in Generative AI

Tutorial on ADC 2024 (https://adc-conference.github.io/2024/program/tutorials-JP)

[Abstract]
Generative AI systems possess vast capabilities that can significantly enhance creativity and streamline daily business processes. However, alongside these advantages, generative AIs raise critical concerns regarding the protection of individual rights and the potential for psychological or social risks. Issues related to security, privacy, and trust have become priorities for developers seeking to address these challenges. This tutorial provides an overview of potential risks associated with generative AI and explores effective countermeasures, covering topics such as adversarial examples, jailbreaking, machine unlearning, and watermarking techniques.

Tsubasa Takahashi

January 09, 2025

Transcript

  1. Security, Privacy, and Trust in Generative AI Tsubasa Takahashi Principal

    Researcher Turing Inc. 2024/12/16 Tutorial on ADC 2024, Tokyo
  2. Tsubasa TAKAHASHI, Ph.D. 2 Career ■ 2010-2018 NEC ■ 2015-2016

    CMU ■ 2018-2023 LINE ■ 2023-2024 LINE Yahoo / SB Intuitions ■ 2024.11~ Turing Selected Publications ■ CVPR24 / AAAI24 / ICCV23 / ICLR22 ■ VLDB22 / SIGMOD22 / ICDE21 / WWW17 Principal Researcher @ Turing Inc. R&D on AI Safety / Data Privacy / Gen AI Systems https://speakerdeck.com/lycorptech_jp/dasfaa2024_tutorial Differentially Private Federated Learning for LINE Sticker Recommendation (Differential Privacy)
  3. Today, No Federated Learning 3 Check out the above tutorial

    slides: https://speakerdeck.com/lycorptech_jp/dasfaa2024_tutorial
  4. DriveLM: Visual QA in Driving Scenes 4 [Sima and Renz+,

    ECCV2024] https://arxiv.org/abs/2312.14150
  5. Long-tail situations are challenging 5 In traffic environments, there are

    situations that are infrequent, diverse, and challenging, i.e., long-tail situations. It is impossible for humans to comprehensively define all traffic situations using rules, and thus with the currently mainstream rule-based approach, achieving “fully autonomous driving” is impossible. How do you define slight situational differences with rules? [Figure: frequency vs. difficulty of driving; frequent situations are easy, rare ones are difficult. Left: proceed according to the instructions of the traffic worker. Right: be cautious of the pedestrian, but there is no need to follow his instructions.]
  6. VLM “Heron” 6 By utilizing “Heron,” autonomous driving with human-like

    situational judgment becomes possible. Example of Inference by “Heron” Check out our tech blog for more info https://zenn.dev/turing_motors/articles/00df893a5e17b6
  7. World Model “Terra” 7 We have developed “Terra,” a video-generating

    AI that understands and predicts complex real-world situations. Check out our tech blog for more info: https://zenn.dev/turing_motors/articles/6c0ddc10aae542 Terra is specialized for driving environments and can generate driver-perspective videos from in-vehicle cameras, producing the continuation of a short video given as input. (Left) Generated video following the green trajectory. (Right) Generated video following the red trajectory. The model can also accept prompts and generate different situations.
  8. We need Responsible AI Technologies 8 Transparency, Robustness, Safety, Fairness,

    Confidentiality, Compliance, Expert Quality, and Explainability (the pillars of Responsible AI)
  9. Contents of this Tutorial 9 1. Ethical & Security Issues

    in Generative AIs 2. Adversarial Example - General concept of evading machine learning models - Application: Personal Watermark for Copyright Protection 3. Safety Alignment - Robustness and Ethical Consideration 4. Privacy and Confidentiality - Unlearning, Differential Privacy, and Confidential Computing 5. Conclusion
  10. Toxicity and Bias in Language Model 12 Toxic Bias Abid

    et al., Persistent Anti-Muslim Bias in Large Language Models. https://arxiv.org/abs/2101.05783
  11. Jailbreaking Language Model 13 Zou et al., Universal and Transferable

    Adversarial Attacks on Aligned Language Models. https://arxiv.org/abs/2307.15043
  12. Unintended Memorization 14 Carlini et al., Extracting Training Data from

    Diffusion Models. https://arxiv.org/abs/2301.13188
  13. Executive Order from White House 15 Executive Order on the

    Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (Oct 2023) https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
  14. Adversarial Example An input sample crafted to cause misclassification into

    a different class https://openai.com/blog/adversarial-example-research/ Frequency-aware GAN for Adversarial Manipulation Generation [Zhu+, ICCV2023] https://openaccess.thecvf.com/content/ICCV2023/papers/Zhu_Frequency-aware_GAN_for_Adversarial_Manipulation_Generation_ICCV_2023_paper.pdf against Image Classifier / against Object Detector 17
  15. Safety Issues in Real World 18 Impersonation attacks and adversarial

    traffic signs (speed-limit examples: 120 km/h, 60 km/h, 50 km/h). https://arxiv.org/abs/1907.00374 https://arxiv.org/abs/1801.00349
  16. Seeking Adversarial Examples 19 Seeking a perturbed input $x'$ such that

    the probability of the target class $t$ becomes larger than that of every other class, while the perturbation stays within a range imperceptible to humans (e.g., "panda" → "gibbon"). Carlini-Wagner attack: $\min_{x'} \|x' - x\|_p + c \cdot \max\big(\max_{i \neq t} F(x')_i - F(x')_t,\ 0\big)$, where $F(x')_i$ denotes the logit of class $i$.
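
To make the objective concrete, here is a minimal PyTorch-style sketch of a Carlini-Wagner-style targeted attack; `model`, the input tensor `x` (with a batch dimension of 1), and the target class index `target` are assumptions, and the box constraint and binary search over c used in the full attack are omitted.

    import torch

    def cw_targeted_attack(model, x, target, c=1.0, steps=100, lr=0.01):
        # Minimal sketch of a Carlini-Wagner-style targeted attack (L2 norm).
        x_adv = x.clone().detach().requires_grad_(True)
        opt = torch.optim.Adam([x_adv], lr=lr)
        for _ in range(steps):
            logits = model(x_adv)                 # F(x'): class logits, shape [1, C]
            target_logit = logits[0, target]
            others = logits[0].clone()
            others[target] = float("-inf")        # exclude the target class
            # max_{i != t} F(x')_i - F(x')_t, clamped at 0 once the target class wins
            margin = torch.clamp(others.max() - target_logit, min=0.0)
            loss = torch.norm(x_adv - x, p=2) + c * margin
            opt.zero_grad()
            loss.backward()
            opt.step()
        return x_adv.detach()
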
  17. Defense Strategy 20 Certified defense, adversarial training, and anomaly

    detection. In adversarial training, adversarial examples with their correct labels ("gibbon" → "panda") are added to the training samples before training the model.
  18. Certified Defense 21 $\epsilon$-robustness

    Let $B_\epsilon^p(x) = \{x + \delta \mid \|\delta\|_p \le \epsilon\}$ denote the $\ell_p$-ball of radius $\epsilon$ around a point $x \in \mathbb{R}^d$. A neural network $f$ is called $\epsilon$-robust around $x$ if $f$ assigns the same class to all points $\tilde{x} \in B_\epsilon^p(x)$, i.e., $\arg\max f(x) = \arg\max f(x + \delta)$ for all $\|\delta\|_p \le \epsilon$. Typical choices are $p = 2$ and $p = \infty$.
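
One practical route to such certificates is randomized smoothing; the sketch below (assuming a `model` that maps a batch of images to logits) estimates the smoothed classifier's top class under Gaussian noise and a corresponding l2 radius. The plug-in probability estimate stands in for the proper confidence bound of the full procedure, so treat this as an illustration only.

    import torch
    from scipy.stats import norm

    def smoothed_certify(model, x, sigma=0.25, n=1000):
        # Sketch of randomized smoothing: estimate an l2-certified radius around x.
        with torch.no_grad():
            noise = torch.randn(n, *x.shape) * sigma
            preds = model(x.unsqueeze(0) + noise).argmax(dim=1)   # class votes under noise
        top_class = preds.mode().values.item()
        p_top = (preds == top_class).float().mean().item()        # plug-in estimate of p_A
        p_top = min(p_top, 1.0 - 1e-6)                            # keep the quantile finite
        radius = sigma * norm.ppf(p_top)                          # certified l2 radius
        return top_class, max(radius, 0.0)
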
  19. Defense in Depth 22 Certified robustification gives a sufficient defense

    in the imperceptible range, but too much certification drops utility. Sample-based robustification is discussed using generated adversarial examples, but it is hard to cover all possible adversarial examples. Auditing (out-of-distribution detection, etc.) handles inputs far from the training samples.
  20. Adv. Watermark against Diffusion Model 23 ◼ Diffusion Models could

    be used to imitate creations without authorization and thus raise copyright issues. ◼ [Zhu+, CVPR2024] embeds personal watermarks into the generation of adversarial examples, which force DMs (img-to-img and textual-inversion pipelines) to produce a visible watermark, unlike previous methods whose adversarial examples leave no such mark. https://arxiv.org/abs/2404.09401
  21. Safety Alignment via RL from Human Feedback 26 Optimize LLMs

    using human feedback to better align with our preferences and value standards. Instruction Tuning • Follow the instructions provided by experts (e.g., labelers) • By supervised fine-tuning Preference Tuning • Align with human preferences using votes and ranks over generated contents • By reward modeling and reinforcement learning https://arxiv.org/abs/2203.02155
  22. DPO optimizes for human preferences w/o RL 27 DPO (Direct

    Preference Optimization) is a preference-tuning method that directly incorporates human preferences without reward modeling (a sketch of the loss follows below). https://arxiv.org/abs/2305.18290
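
The sketch below assumes the summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model are already available; all names are illustrative.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta=0.1):
        # DPO: push the policy's preference margin above the reference model's.
        chosen_ratio = policy_chosen_logp - ref_chosen_logp        # log pi/pi_ref (chosen)
        rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi/pi_ref (rejected)
        # -log sigmoid(beta * margin), averaged over the batch
        return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
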
  23. RLAIF for Safety Alignment 28 ◼ Cost of collecting human

    feedback is extremely high ◼ Constitutional AI is an approach in which LLMs refine their own outputs by referring to predefined "constitutions" set by stakeholders. Based on the constitutions, LLMs critique their own responses and revise them iteratively (critique request → revision request, turning a vanilla response into an aligned response). We can then fine-tune the model on the aligned responses. https://arxiv.org/abs/2212.08073
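
The critique-and-revision loop can be sketched as follows; `generate` stands for any chat-LLM call and the constitution strings are placeholders, so this is an assumption-laden illustration of the data-generation step rather than the exact recipe of the paper.

    CONSTITUTION = [
        "Identify ways the response is harmful, unethical, or inaccurate.",
        "Rewrite the response to remove those problems while staying helpful.",
    ]

    def constitutional_revision(generate, prompt, rounds=2):
        # Iteratively critique and revise a response against a constitution.
        response = generate(prompt)                               # vanilla response
        for _ in range(rounds):
            critique = generate(f"{prompt}\n\nResponse: {response}\n\n"
                                f"Critique request: {CONSTITUTION[0]}")
            response = generate(f"{prompt}\n\nResponse: {response}\n\n"
                                f"Critique: {critique}\n\n"
                                f"Revision request: {CONSTITUTION[1]}")
        return response   # aligned response, usable for supervised fine-tuning
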
  24. Self-Preference Bias in LLM-as-a-Judge 29 LLM-as-a-Judge enables us to assess

    and improve LLM generations, but judge models exhibit self-preference bias, favoring their own outputs. https://arxiv.org/abs/2410.21819
  25. Red-teaming (Vision) Language Model 30 Red teaming induces Gen AIs

    to produce harmful or inaccurate content by leveraging adversarial test cases to identify misalignments. https://arxiv.org/abs/2202.03286 https://arxiv.org/abs/2401.12915 Red teaming for LLMs Red teaming for VLMs
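
A hypothetical sketch of an automated red-teaming loop: an attacker model proposes test prompts, the target model answers, and a safety classifier flags failures. `attacker`, `target`, and `is_harmful` are assumed callables, not any specific library API.

    def red_team(attacker, target, is_harmful, seed_topics, n_cases=100):
        # Collect prompts that elicit harmful outputs from the target model.
        failures = []
        for i in range(n_cases):
            topic = seed_topics[i % len(seed_topics)]
            test_prompt = attacker(f"Write a tricky test prompt about: {topic}")
            answer = target(test_prompt)
            if is_harmful(test_prompt, answer):        # misalignment found
                failures.append({"prompt": test_prompt, "answer": answer})
        return failures   # feed back into safety alignment / guardrail rules
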
  26. Red-teaming Network (OpenAI) 31 OpenAI has collaborated with

    various experts to integrate their expertise. https://openai.com/index/red-teaming-network/
  27. Guardrail 32 Guardrails prevent models from generating harmful or inappropriate

    content and help them adhere to guidelines set by developers and stakeholders. https://developer.nvidia.com/ja-jp/blog/nemo-guardrails-prevents-llm-vulnerabilities-introduction/ NeMo Guardrails (NVIDIA) Safeguards on Llama 3 (Meta) https://ai.meta.com/blog/meta-llama-3/
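
Conceptually, a guardrail wraps the model with input and output checks. The sketch below uses an illustrative blocklist plus an assumed `moderation` classifier; production systems such as NeMo Guardrails or Llama Guard implement far richer policies.

    BLOCKED_TOPICS = ("explosives", "credit card numbers")   # illustrative policy only

    def guarded_chat(llm, moderation, user_input):
        # Minimal input/output guardrail around an LLM call.
        if any(topic in user_input.lower() for topic in BLOCKED_TOPICS):
            return "Sorry, I can't help with that request."     # input rail
        response = llm(user_input)
        if moderation(response):                                 # output rail
            return "Sorry, I can't share that content."
        return response
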
  28. Defense in Depth 33 Safety alignment for LLMs and VLMs

    needs defense in depth, that is, a combination of several safeguards inside and outside the model: certified alignment, sample-based alignment, and auditing (out-of-distribution detection, etc.), realized through preference tuning 1), red-teaming 2), and guardrails 3). 1) https://arxiv.org/abs/2305.18290 2) https://arxiv.org/abs/2202.03286 3) https://developer.nvidia.com/ja-jp/blog/nemo-guardrails-prevents-llm-vulnerabilities-introduction/
  29. Machine Unlearning 35 ◼ Forget / delete undesired concepts and

    data from foundation models ◼ Typically employs gradient ascent, which reverses gradient-descent optimization on the forget set (a sketch follows below). https://research.google/blog/announcing-the-first-machine-unlearning-challenge/ https://openreview.net/forum?id=Ox2A1WoKLm
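
The sketch below takes one gradient-ascent step on the forget set while still descending on a retain set to preserve utility; the batch names, loss function, and learning rate are assumptions.

    import torch

    def unlearn_step(model, forget_batch, retain_batch, loss_fn, lr=1e-5):
        # One unlearning step: ascend on the forget set, descend on the retain set.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        fx, fy = forget_batch
        rx, ry = retain_batch
        forget_loss = loss_fn(model(fx), fy)
        retain_loss = loss_fn(model(rx), ry)
        opt.zero_grad()
        # Negating the forget loss turns gradient descent into gradient ascent on that data
        (retain_loss - forget_loss).backward()
        opt.step()
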
  30. Unlearning Benchmark 36 ◼ Development of unlearning methods is hot

    for diffusion models, but existing methods are still limited. ◼ They need to be more robust and effective across a variety of target objects. https://arxiv.org/abs/2402.11846 Conceptual Shift by Unlearning / Benchmark Results for Various Unlearning Tasks
  31. Differential Privacy 37 Differential Privacy is a mathematically rigorous framework

    for releasing statistical outputs of datasets while preserving the privacy of individuals. $(\epsilon, \delta)$-Differential Privacy: a mechanism $\mathcal{M}$ satisfies $(\epsilon, \delta)$-DP if $\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(D') \in S] + \delta$ for every output set $S$ and for all adjacent databases $D$ and $D'$ that differ by one user; i.e., the output distribution of $\mathcal{M}(D)$ is nearly the same as that of $\mathcal{M}(D')$, and the difference is bounded.
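
For intuition, the classic Laplace mechanism below releases a counting query with pure epsilon-DP (the delta = 0 special case): the count has sensitivity 1, so Laplace noise of scale 1/epsilon bounds the ratio of output probabilities between adjacent databases by e^epsilon.

    import numpy as np

    def dp_count(records, predicate, epsilon=1.0):
        # Release a counting query under epsilon-differential privacy.
        true_count = sum(1 for r in records if predicate(r))
        sensitivity = 1.0                      # one user changes the count by at most 1
        noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
        return true_count + noise

    # e.g., dp_count(users, lambda u: u["age"] >= 65, epsilon=0.5)
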
  32. DP Inference with Sensitive Exemplars 38 ◼ LLM’s responses may

    leak sensitive private information contained in in-context exemplars ◼ Generate DP responses by partitioning and private aggregation https://arxiv.org/abs/2305.01639
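
A rough sketch of the partition-and-aggregate idea for a classification-style query: split the sensitive exemplars into disjoint subsets, prompt the LLM once per subset, and release only a noisy vote over the per-subset answers. The helper names and noise calibration are simplifying assumptions, not the exact algorithm of the paper.

    import numpy as np
    from collections import Counter

    def dp_icl_answer(llm, exemplars, query, labels, n_parts=10, epsilon=1.0):
        # Answer a query using sensitive in-context exemplars via a noisy majority vote.
        rng = np.random.default_rng()
        parts = np.array_split(np.array(exemplars, dtype=object), n_parts)
        votes = Counter()
        for part in parts:                     # each exemplar lands in exactly one partition
            prompt = "\n".join(part.tolist()) + f"\nQ: {query}\nA:"
            votes[llm(prompt).strip()] += 1
        # Report-noisy-max over per-label vote counts; one user affects at most one partition
        noisy = {lab: votes.get(lab, 0) + rng.laplace(0.0, 2.0 / epsilon) for lab in labels}
        return max(noisy, key=noisy.get)
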
  33. Confidential Computing for AI Inference 39 Process data that needs

    to stay private using confidential computing, e.g., a combination of end-to-end encryption and TEEs. A Trusted Execution Environment (TEE) is a trusted area of memory and CPU that is protected using encryption; data in the TEE cannot be read or tampered with by any code outside that environment. https://security.apple.com/blog/private-cloud-compute/ https://developers.googleblog.com/en/enabling-more-private-gen-ai/
  34. Conclusion 41 ◼ This tutorial mentioned • Ethical & Security

    Issues in Generative AIs • Adversarial Example • Safety Alignment • Privacy and Confidentiality ◼ Challenges and Opportunities • The cyber world and the physical world are ready to "QUERY." • Traditional and cutting-edge database concepts and techniques are helpful for making generative AI-based systems reliable.