Slide 1

Security, Privacy, and Trust in Generative AI
Tsubasa Takahashi, Principal Researcher, Turing Inc.
Tutorial at ADC 2024, Tokyo, 2024/12/16

Slide 2

Tsubasa TAKAHASHI, Ph.D.
Principal Researcher @ Turing Inc.
R&D on AI Safety / Data Privacy / Gen AI Systems

Career
■ 2010-2018 NEC
■ 2015-2016 CMU
■ 2018-2023 LINE
■ 2023-2024 LINE Yahoo / SB Intuitions
■ 2024.11~ Turing

Selected Publications
■ CVPR24 / AAAI24 / ICCV23 / ICLR22
■ VLDB22 / SIGMOD22 / ICDE21 / WWW17

Previous tutorial on Differential Privacy (incl. Differentially Private Federated Learning for LINE Sticker Recommendation):
https://speakerdeck.com/lycorptech_jp/dasfaa2024_tutorial

Slide 3

Today, No Federated Learning
Check out the tutorial slides above: https://speakerdeck.com/lycorptech_jp/dasfaa2024_tutorial

Slide 4

DriveLM: Visual QA in Driving Scenes
[Sima and Renz+, ECCV2024] https://arxiv.org/abs/2312.14150

Slide 5

Long-tail situations are challenging
In traffic environments, there are situations that are infrequent, diverse, and challenging, i.e., long-tail situations. Humans cannot comprehensively define all traffic situations with rules: how do you capture slight situational differences in a rule? With the currently mainstream rule-based approach, "fully autonomous driving" is therefore impossible.
[Figure: frequency vs. difficulty of driving. Many situations are frequent and easy; the few difficult ones form the long tail.]
Left: proceed according to the instructions of the traffic worker. Right: be cautious of the pedestrian, but there is no need to follow his instructions.

Slide 6

VLM "Heron"
By utilizing "Heron," autonomous driving with human-like situational judgment becomes possible.
[Figure: example of inference by "Heron"]
Check out our tech blog for more info: https://zenn.dev/turing_motors/articles/00df893a5e17b6

Slide 7

World Model "Terra"
We have developed "Terra," a video-generating AI that understands and predicts complex real-world situations. Terra is specialized for driving environments and can generate driver-perspective videos from in-vehicle cameras: given a short video as input, it generates its continuation, and it can also accept prompts to generate different situations.
[Figure: videos generated by Terra. Left: generated video following the green trajectory. Right: generated video following the red trajectory.]
Check out our tech blog for more info: https://zenn.dev/turing_motors/articles/6c0ddc10aae542

Slide 8

We need Responsible AI Technologies
Responsible AI spans: Transparency, Robustness, Safety, Fairness, Confidentiality, Compliance, Expert Quality, Explainability

Slide 9

Contents of this Tutorial
1. Ethical & Security Issues in Generative AIs
2. Adversarial Examples
- General concept of evading machine learning models
- Application: personal watermarks for copyright protection
3. Safety Alignment
- Robustness and ethical considerations
4. Privacy and Confidentiality
- Unlearning, differential privacy, and confidential computing
5. Conclusion

Slide 10

Ethical & Security Issues in Generative AIs

Slide 11

Ethical Issue in 2016
Microsoft's chatbot Tay was manipulated into producing racist outputs, revealing the dangers of learning from online conversation.
https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-revealed-the-dangers-of-online-conversation

Slide 12

Toxicity and Bias in Language Models
[Figure: examples of toxic and biased completions]
Abid et al., Persistent Anti-Muslim Bias in Large Language Models. https://arxiv.org/abs/2101.05783

Slide 13

Jailbreaking Language Models
Zou et al., Universal and Transferable Adversarial Attacks on Aligned Language Models. https://arxiv.org/abs/2307.15043

Slide 14

Unintended Memorization
Carlini et al., Extracting Training Data from Diffusion Models. https://arxiv.org/abs/2301.13188

Slide 15

Executive Order from the White House
Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (Oct 2023)
https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/

Slide 16

Adversarial Examples

Slide 17

Adversarial Example
An input sample crafted to cause misclassification into a different class.
Against image classifiers: https://openai.com/blog/adversarial-example-research/
Against object detectors: Frequency-aware GAN for Adversarial Manipulation Generation [Zhu+, ICCV2023] https://openaccess.thecvf.com/content/ICCV2023/papers/Zhu_Frequency-aware_GAN_for_Adversarial_Manipulation_Generation_ICCV_2023_paper.pdf
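To make the concept concrete, here is a minimal sketch of the classic fast gradient sign method (FGSM), not from the slides; `model` and `loss_fn` are hypothetical stand-ins for any differentiable PyTorch classifier and its training loss.

```python
# Minimal FGSM sketch: one signed-gradient step that increases the loss is
# often enough to flip a classifier's prediction. `model` and `loss_fn` are
# hypothetical stand-ins for a differentiable classifier and its loss.
import torch

def fgsm(model, loss_fn, x, y, eps=8 / 255):
    x_adv = x.clone().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # step in the direction that increases the loss, bounded in l_inf by eps
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()
```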

Slide 18

Safety Issues in the Real World
Impersonation; adversarial traffic signs.
[Figure: adversarial traffic signs showing 120km/h, 60km/h, 50km/h]
https://arxiv.org/abs/1907.00374
https://arxiv.org/abs/1801.00349

Slide 19

Seeking Adversarial Examples
Seek a perturbation of x such that the probability of target class t becomes larger than that of every other class, while the perturbation stays within the imperceptible range for a human.
[Figure: a panda image plus an imperceptible perturbation is classified as a gibbon]
Carlini-Wagner attack: minimize over x′
‖x′ − x‖_p + c · ( max_{i≠t} F(x′)_i − F(x′)_t )^+
where F(x′)_i is the logit of class i and (·)^+ = max(·, 0).
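A minimal sketch of the objective above, assuming a PyTorch classifier `model` that returns logits on batched images; hyperparameters are illustrative, and the original attack's refinements (binary search over c, a confidence margin κ) are omitted.

```python
# CW-style targeted attack sketch: minimize ||delta||_2 plus the hinge term
# (max_{i!=t} F(x')_i - F(x')_t)^+ so the target logit comes out on top.
import torch

def cw_targeted(model, x, target, c=1.0, lr=0.01, steps=200):
    delta = torch.zeros_like(x, requires_grad=True)   # perturbation to optimize
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x + delta).clamp(0, 1)               # keep a valid image
        logits = model(x_adv)
        mask = torch.ones_like(logits, dtype=torch.bool)
        mask[:, target] = False
        # largest logit among the non-target classes
        other = logits[mask].view(logits.size(0), -1).max(dim=1).values
        hinge = (other - logits[:, target]).clamp(min=0)  # zero once t wins
        loss = delta.flatten(1).norm(dim=1).sum() + c * hinge.sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).detach().clamp(0, 1)
```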

Slide 20

Defense Strategy
- Adversarial training: add adversarial examples with their correct labels to the training samples (e.g., the "gibbon" image relabeled as "panda") and retrain the model.
- Certified defense
- Anomaly detection

Slide 21

Certified Defense
ε-robustness: let B_ε^p(x) = { x + δ : ‖δ‖_p ≤ ε } denote the ℓp-ball of radius ε around a point x ∈ ℝ^d. A neural network f is called ε-robust around x if f assigns the same class to all points x̃ ∈ B_ε^p(x), i.e., argmax f(x) = argmax f(x + δ) for all ‖δ‖_p ≤ ε.
[Figure: the ball B_ε^p(x) for p = 2 and for p = ∞]
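One practical route to such certificates is randomized smoothing (Cohen et al., 2019); the sketch below is simplified (a point estimate of the top-class probability instead of the paper's lower confidence bound) and assumes a PyTorch classifier `model` returning logits.

```python
# Simplified randomized smoothing: classify many Gaussian-noised copies of x;
# if the top class wins with probability p > 1/2, certify the l2 radius
# sigma * Phi^{-1}(p). The paper replaces p with a lower confidence bound.
import torch
from scipy.stats import norm

def smoothed_certify(model, x, sigma=0.25, n=1000, num_classes=10):
    counts = torch.zeros(num_classes)
    with torch.no_grad():
        for _ in range(n):
            pred = model(x + sigma * torch.randn_like(x)).argmax(dim=1)
            counts[pred] += 1
    top = counts.argmax().item()
    p = (counts[top] / n).item()
    radius = sigma * norm.ppf(p) if p > 0.5 else 0.0  # 0.0 = cannot certify
    return top, radius
```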

Slide 22

Defense in Depth
- Certified robustification: sufficient defense in the imperceptible range, but too much certification drops utility.
- Sample-based robustification: discussion using generated adversarial examples, but hard to cover all possible adversarial examples.
- Auditing (out-of-distribution detection, etc.): for inputs far from the training samples.

Slide 23

Adv. Watermark against Diffusion Models
◼ Diffusion models can be used to imitate unauthorized creations, raising copyright issues.
◼ [Zhu+, CVPR24] embeds a personal watermark into the generation of adversarial examples, forcing DMs (img-to-img, textual inversion) to generate a visible watermark instead of a clean imitation.
[Figure: previous adversarial methods vs. the proposed generator-plus-optimization approach]
[Zhu+, CVPR2024] https://arxiv.org/abs/2404.09401

Slide 24

Safety Alignment

Slide 25

Generative AI is required to have the "3H": Helpful, Honest, Harmless

Slide 26

Safety Alignment via RL from Human Feedback
Optimize LLMs using human feedback to better align with our preferences and value standards.
Instruction Tuning
• Follow the instructions provided by experts (e.g., labelers)
• Via supervised fine-tuning
Preference Tuning
• Align with human preferences using votes and ranks over generated content
• Via reward modeling and reinforcement learning
https://arxiv.org/abs/2203.02155

Slide 27

DPO optimizes for human preferences w/o RL
DPO (Direct Preference Optimization) is a preference-tuning method that incorporates human preferences directly, without reward modeling or reinforcement learning.
https://arxiv.org/abs/2305.18290
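A minimal sketch of the DPO objective: given per-response log-probabilities under the policy and a frozen reference model, it applies a logistic loss to the implicit reward margin (variable names are mine, not from the paper).

```python
# DPO loss sketch: beta * log(pi / pi_ref) acts as an implicit reward; a
# logistic loss on the chosen-vs-rejected margin replaces reward modeling
# and RL rollouts entirely.
import torch.nn.functional as F

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # each input: sum of token log-probs of one response, shape (batch,)
    chosen = beta * (pi_chosen_logp - ref_chosen_logp)
    rejected = beta * (pi_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen - rejected).mean()
```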

Slide 28

RLAIF for Safety Alignment
◼ The cost of collecting human feedback is extremely high.
◼ Constitutional AI is an approach in which LLMs refine their own outputs by referring to pre-defined "constitutions" set by stakeholders.
Based on the constitutions, the LLM critiques its own response (critique request) and revises it iteratively (revision request), turning a vanilla response into an aligned response. We can then fine-tune the model on the aligned responses.
https://arxiv.org/abs/2212.08073
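An illustrative sketch of the critique-revision loop, with a hypothetical `generate` callable standing in for any chat LLM; the constitution text and prompt wording are placeholders, not the paper's.

```python
# Constitutional-AI-style self-refinement: critique the response against a
# constitution, then revise; repeat. Outputs can seed supervised fine-tuning.
CONSTITUTION = "Choose the response that is most harmless, helpful, and honest."

def align_response(generate, user_prompt, rounds=2):
    response = generate(user_prompt)                       # vanilla response
    for _ in range(rounds):
        critique = generate(                               # critique request
            f"Critique this response against the principle:\n{CONSTITUTION}\n\n"
            f"Prompt: {user_prompt}\nResponse: {response}")
        response = generate(                               # revision request
            f"Revise the response to address the critique.\n"
            f"Prompt: {user_prompt}\nResponse: {response}\nCritique: {critique}")
    return response                                        # aligned response
```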

Slide 29

Self-Preference Bias in LLM-as-a-Judge
LLM-as-a-Judge enables us to assess and improve LLM generations, but judges exhibit self-preference bias, favoring their own outputs.
https://arxiv.org/abs/2410.21819

Slide 30

Red-teaming (Vision) Language Models
Red teaming induces generative AIs to produce harmful or inaccurate content, using adversarial test cases to identify misalignments.
Red teaming for LLMs: https://arxiv.org/abs/2202.03286
Red teaming for VLMs: https://arxiv.org/abs/2401.12915

Slide 31

Red Teaming Network (OpenAI)
OpenAI has collaborated with various experts to integrate their expertise.
https://openai.com/index/red-teaming-network/

Slide 32

Guardrails
Guardrails prevent models from generating harmful or inappropriate content and enforce the guidelines set by developers and stakeholders.
NeMo Guardrails (NVIDIA): https://developer.nvidia.com/ja-jp/blog/nemo-guardrails-prevents-llm-vulnerabilities-introduction/
Safeguards on Llama 3 (Meta): https://ai.meta.com/blog/meta-llama-3/
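A minimal sketch of input/output rails, assuming a hypothetical `moderation_score` classifier; in practice a safeguard model such as Llama Guard, or a framework like NeMo Guardrails, plays this role.

```python
# Guardrail sketch: screen both the user prompt (input rail) and the model's
# response (output rail) before anything reaches the user.
REFUSAL = "Sorry, I can't help with that."

def guarded_reply(generate, moderation_score, prompt, threshold=0.5):
    if moderation_score(prompt) > threshold:     # block unsafe requests
        return REFUSAL
    response = generate(prompt)
    if moderation_score(response) > threshold:   # block unsafe generations
        return REFUSAL
    return response
```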

Slide 33

Defense in Depth
Safety alignment for LLMs and VLMs needs defense in depth, that is, a combination of several safeguards inside and outside the model.
[Figure: layers of certified alignment, sample-based alignment, and auditing (out-of-distribution detection, etc.)]
1) Preference tuning: https://arxiv.org/abs/2305.18290
2) Red-teaming: https://arxiv.org/abs/2202.03286
3) Guardrails: https://developer.nvidia.com/ja-jp/blog/nemo-guardrails-prevents-llm-vulnerabilities-introduction/

Slide 34

Privacy and Confidentiality

Slide 35

Machine Unlearning
◼ Forget / delete undesired concepts and data from foundation models.
◼ Methods typically employ gradient ascent, i.e., reversing the gradient-descent optimization on the forget set.
https://research.google/blog/announcing-the-first-machine-unlearning-challenge/
https://openreview.net/forum?id=Ox2A1WoKLm
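A minimal sketch of gradient-ascent unlearning in PyTorch, assuming a generic classifier and a forget-set loader; real recipes usually add a retain-set term to preserve utility, which is omitted here.

```python
# Gradient-ascent unlearning sketch: negate the loss on the forget set, so
# each optimizer step reverses what gradient descent learned from that data.
import itertools
import torch

def unlearn(model, loss_fn, forget_loader, lr=1e-5, steps=100):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in itertools.islice(itertools.cycle(forget_loader), steps):
        loss = -loss_fn(model(x), y)   # ascend on the forget set
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```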

Slide 36

Unlearning Benchmark
◼ Unlearning methods for diffusion models are an active research topic, but remain challenging.
◼ They need to be more robust and effective across various target objects.
[Figures: conceptual shift caused by unlearning; benchmark results for various unlearning tasks]
https://arxiv.org/abs/2402.11846

Slide 37

Differential Privacy
Differential privacy is a mathematically rigorous framework for releasing statistical outputs of datasets while preserving the privacy of individuals.
(ε, δ)-Differential Privacy: a randomized mechanism M satisfies (ε, δ)-DP if, for all adjacent databases D and D′ that differ by one user (e.g., with and without Alice) and for all output sets S,
Pr[M(D) ∈ S] ≤ exp(ε) · Pr[M(D′) ∈ S] + δ.
That is, the difference is bounded: the output distribution of M(D) is nearly the same as that of M(D′).
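As a worked example, the Laplace mechanism releases a counting query with ε-DP: adding or removing one user changes a count by at most 1 (sensitivity 1), so Laplace noise of scale 1/ε suffices.

```python
# Laplace mechanism sketch: epsilon-DP release of a counting query.
import numpy as np

def dp_count(records, predicate, epsilon=1.0):
    true_count = sum(1 for r in records if predicate(r))
    # sensitivity of a count is 1, so the noise scale is 1 / epsilon
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)
```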

Slide 38

DP Inference with Sensitive Exemplars
◼ An LLM's responses may leak sensitive private information contained in its in-context exemplars.
◼ Generate DP responses by partitioning the exemplars and privately aggregating the per-partition outputs.
https://arxiv.org/abs/2305.01639
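An illustrative sketch of the partition-and-aggregate idea, with a hypothetical `answer_with_exemplars` LLM call; the noisy-argmax aggregation and noise scale below are simplifications, not the paper's exact mechanism.

```python
# Partition the sensitive exemplars into disjoint shards, query the LLM once
# per shard, then release the answer chosen by a noisy-max vote: each exemplar
# influences only one shard, bounding any individual's effect on the votes.
import numpy as np
from collections import Counter

def dp_icl_answer(answer_with_exemplars, exemplars, question, k=10, epsilon=1.0):
    shards = [exemplars[i::k] for i in range(k)]          # disjoint partitions
    votes = Counter(answer_with_exemplars(s, question) for s in shards)
    candidates = list(votes)
    noisy = [votes[c] + np.random.laplace(scale=2.0 / epsilon)
             for c in candidates]                         # illustrative scale
    return candidates[int(np.argmax(noisy))]              # report noisy max
```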

Slide 39

Confidential Computing for AI Inference
Process data that needs to stay private with confidential computing, e.g., a combination of end-to-end encryption and TEEs.
A Trusted Execution Environment (TEE) is a trusted area of memory and CPU protected using encryption; data in the TEE cannot be read or tampered with by any code outside that environment.
https://security.apple.com/blog/private-cloud-compute/
https://developers.googleblog.com/en/enabling-more-private-gen-ai/

Slide 40

Conclusion

Slide 41

Conclusion
◼ This tutorial covered:
• Ethical & Security Issues in Generative AIs
• Adversarial Examples
• Safety Alignment
• Privacy and Confidentiality
◼ Challenges and Opportunities
• The cyber world and the physical world are ready to "QUERY."
• Traditional and cutting-edge database concepts and techniques can help make Generative AI-based systems reliable.