
Security, Privacy, and Trust in Generative AI

Tutorial on ADC 2024 (https://adc-conference.github.io/2024/program/tutorials-JP)

[Abstract]
Generative AI systems possess vast capabilities that can significantly enhance creativity and streamline daily business processes. However, alongside these advantages, generative AIs raise critical concerns regarding the protection of individual rights and the potential for psychological or social risks. Issues related to security, privacy, and trust have become priorities for developers seeking to address these challenges. This tutorial provides an overview of potential risks associated with generative AI and explores effective countermeasures, covering topics such as adversarial examples, jailbreaking, machine unlearning, and watermarking techniques.

Tsubasa Takahashi

January 09, 2025

Transcript

  1. Security, Privacy, and Trust in Generative AI Tsubasa Takahashi Principal

    Researcher Turing Inc. 2024/12/16 Tutorial on ADC 2024, Tokyo
  2. Tsubasa TAKAHASHI, Ph.D. 2 Career ■ 2010-2018 NEC ■ 2015-2016

    CMU ■ 2018-2023 LINE ■ 2023-2024 LINE Yahoo / SB Intuitions ■ 2024.11~ Turing Selected Publications ■ CVPR24 / AAAI24 / ICCV23 / ICLR22 ■ VLDB22 / SIGMOD22 / ICDE21 / WWW17 Principal Researcher @ Turing Inc. R&D on AI Safety / Data Privacy / Gen AI Systems https://speakerdeck.com/lycorptech_jp/dasfaa2024_tutorial Differentially Private Federated Learning for LINE Sticker Recommendation (Differential Privacy)
  3. Today, No Federated Learning 3 Check out the above tutorial

    slides: https://speakerdeck.com/lycorptech_jp/dasfaa2024_tutorial
  4. DriveLM: Visual QA in Driving Scenes 4 [Sima and Renz+,

    ECCV2024] https://arxiv.org/abs/2312.14150
  5. Long-tail situations are challenging 5 In traffic environments, there are

    situations that are infrequent, diverse, and challenging, i.e., long-tail situations. It is impossible for humans to comprehensively define all traffic situations using rules, and thus with the currently mainstream rule-based approach, achieving “fully autonomous driving” is impossible. How do you define slight situational differences with rules? [Figure: frequency vs. difficulty of driving; frequent situations are easy, rare ones are difficult. Left: proceed according to the instructions of the traffic worker. Right: be cautious of the pedestrian, but there is no need to follow his instructions.]
  6. VLM “Heron” 6 By utilizing “Heron,” autonomous driving with human-like

    situational judgment becomes possible. Example of Inference by “Heron” Check out our tech blog for more info https://zenn.dev/turing_motors/articles/00df893a5e17b6
  7. World Model “Terra” 7 We have developed “Terra,” a video-generating

    AI that understands and predicts complex real-world situations. Check out our tech blog for more info: https://zenn.dev/turing_motors/articles/6c0ddc10aae542 Terra is specialized for driving environments and can generate driver-perspective videos from in-vehicle cameras, producing the continuation of a short video given as input. (Left) Generated video following the green trajectory. (Right) Generated video following the red trajectory. The model can also accept prompts and generate different situations.
  8. We need Responsible AI Technologies 8 Transparency, Robustness, Safety, Fairness,

    Confidentiality, Compliance, Expert Quality, and Explainability (the pillars of Responsible AI)
  9. Contents of this Tutorial 9 1. Ethical & Security Issues

    in Generative AIs 2. Adversarial Example - General concept of evading machine learning models - Application: Personal Watermark for Copyright Protection 3. Safety Alignment - Robustness and Ethical Consideration 4. Privacy and Confidentiality - Unlearning, Differential Privacy, and Confidential Computing 5. Conclusion
  10. Toxicity and Bias in Language Model 12 Toxic Bias Abid

    et al., Persistent Anti-Muslim Bias in Large Language Models. https://arxiv.org/abs/2101.05783
  11. Jailbreaking Language Model 13 Zou et al., Universal and Transferable

    Adversarial Attacks on Aligned Language Models. https://arxiv.org/abs/2307.15043
  12. Unintended Memorization 14 Carlini et al., Extracting Training Data from

    Diffusion Models. https://arxiv.org/abs/2301.13188
  13. Executive Order from White House 15 Executive Order on the

    Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence (Oct 2023) https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
  14. Adversarial Example An input sample crafted to cause misclassification into

    a different class https://openai.com/blog/adversarial-example-research/ Frequency-aware GAN for Adversarial Manipulation Generation [Zhu+, ICCV2023] https://openaccess.thecvf.com/content/ICCV2023/papers/Zhu_Frequency-aware_GAN_for_Adversarial_Manipulation_Generation_ICCV_2023_paper.pdf against Image Classifier / against Object Detector 17
  15. Safety Issues in Real World 18 Impersonation attacks and adversarial

    traffic signs (speed-limit examples: 120 km/h, 60 km/h, 50 km/h). https://arxiv.org/abs/1907.00374 https://arxiv.org/abs/1801.00349
  16. Seeking Adversarial Examples 19 Seeking a perturbed input $x'$ such that

    the probability of the target class $t$ becomes larger than that of every other class, while the perturbation stays within a range imperceptible to humans (e.g., "panda" → "gibbon"). Carlini-Wagner attack: $\min_{x'} \|x' - x\|_p + c \cdot \max\big(\max_{i \neq t} F(x')_i - F(x')_t,\ 0\big)$, where $F(x')_i$ denotes the logit of class $i$.
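
To make the objective concrete, here is a minimal PyTorch-style sketch of a Carlini-Wagner-style targeted attack; `model`, the input tensor `x` (with a batch dimension of 1), and the target class index `target` are assumptions, and the box constraint and binary search over c used in the full attack are omitted.

    import torch

    def cw_targeted_attack(model, x, target, c=1.0, steps=100, lr=0.01):
        # Minimal sketch of a Carlini-Wagner-style targeted attack (L2 norm).
        x_adv = x.clone().detach().requires_grad_(True)
        opt = torch.optim.Adam([x_adv], lr=lr)
        for _ in range(steps):
            logits = model(x_adv)                 # F(x'): class logits, shape [1, C]
            target_logit = logits[0, target]
            others = logits[0].clone()
            others[target] = float("-inf")        # exclude the target class
            # max_{i != t} F(x')_i - F(x')_t, clamped at 0 once the target class wins
            margin = torch.clamp(others.max() - target_logit, min=0.0)
            loss = torch.norm(x_adv - x, p=2) + c * margin
            opt.zero_grad()
            loss.backward()
            opt.step()
        return x_adv.detach()
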
  17. Defense Strategy 20 Certified defense, adversarial training, and anomaly

    detection. In adversarial training, adversarial examples with their correct labels ("gibbon" → "panda") are added to the training samples before training the model.
  18. Certified Defense 21 $\epsilon$-robustness

    Let $B_\epsilon^p(x) = \{x + \delta \mid \|\delta\|_p \le \epsilon\}$ denote the $\ell_p$-ball of radius $\epsilon$ around a point $x \in \mathbb{R}^d$. A neural network $f$ is called $\epsilon$-robust around $x$ if $f$ assigns the same class to all points $\tilde{x} \in B_\epsilon^p(x)$, i.e., $\arg\max f(x) = \arg\max f(x + \delta)$ for all $\|\delta\|_p \le \epsilon$. Typical choices are $p = 2$ and $p = \infty$.
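
One practical route to such certificates is randomized smoothing; the sketch below (assuming a `model` that maps a batch of images to logits) estimates the smoothed classifier's top class under Gaussian noise and a corresponding l2 radius. The plug-in probability estimate stands in for the proper confidence bound of the full procedure, so treat this as an illustration only.

    import torch
    from scipy.stats import norm

    def smoothed_certify(model, x, sigma=0.25, n=1000):
        # Sketch of randomized smoothing: estimate an l2-certified radius around x.
        with torch.no_grad():
            noise = torch.randn(n, *x.shape) * sigma
            preds = model(x.unsqueeze(0) + noise).argmax(dim=1)   # class votes under noise
        top_class = preds.mode().values.item()
        p_top = (preds == top_class).float().mean().item()        # plug-in estimate of p_A
        p_top = min(p_top, 1.0 - 1e-6)                            # keep the quantile finite
        radius = sigma * norm.ppf(p_top)                          # certified l2 radius
        return top_class, max(radius, 0.0)
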
  19. Defense in Depth 22 Certified robustification gives a sufficient defense

    in the imperceptible range, but too much certification drops utility. Sample-based robustification is discussed using generated adversarial examples, but it is hard to cover all possible adversarial examples. Auditing (out-of-distribution detection, etc.) handles inputs far from the training samples.
  20. Adv. Watermark against Diffusion Model 23 ◼ Diffusion Models could

    be used to imitate creations without authorization and thus raise copyright issues. ◼ [Zhu+, CVPR2024] embeds personal watermarks into the generation of adversarial examples, which force DMs (img-to-img and textual-inversion pipelines) to produce a visible watermark, unlike previous methods whose adversarial examples leave no such mark. https://arxiv.org/abs/2404.09401
  21. Safety Alignment via RL from Human Feedback 26 Optimize LLMs

    using human feedback to better align with our preferences and value standards. Instruction Tuning • Follow the instructions provided by experts (e.g., labelers) • By supervised fine-tuning Preference Tuning • Align with human preferences using votes and ranks over generated contents • By reward modeling and reinforcement learning https://arxiv.org/abs/2203.02155
  22. DPO optimizes for human preferences w/o RL 27 DPO (Direct

    Preference Optimization) is a preference-tuning method that directly incorporates human preferences without reward modeling (a sketch of the loss follows below). https://arxiv.org/abs/2305.18290
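
The sketch below assumes the summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model are already available; all names are illustrative.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logp, policy_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta=0.1):
        # DPO: push the policy's preference margin above the reference model's.
        chosen_ratio = policy_chosen_logp - ref_chosen_logp        # log pi/pi_ref (chosen)
        rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi/pi_ref (rejected)
        # -log sigmoid(beta * margin), averaged over the batch
        return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
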
  23. RLAIF for Safety Alignment 28 ◼ Cost of collecting human

    feedback is extremely high ◼ Constitutional AI is an approach in which LLMs refine their own outputs by referring to predefined "constitutions" set by stakeholders. Based on the constitutions, LLMs critique their own responses and revise them iteratively (critique request → revision request, turning a vanilla response into an aligned response). We can then fine-tune the model on the aligned responses. https://arxiv.org/abs/2212.08073
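
The critique-and-revision loop can be sketched as follows; `generate` stands for any chat-LLM call and the constitution strings are placeholders, so this is an assumption-laden illustration of the data-generation step rather than the exact recipe of the paper.

    CONSTITUTION = [
        "Identify ways the response is harmful, unethical, or inaccurate.",
        "Rewrite the response to remove those problems while staying helpful.",
    ]

    def constitutional_revision(generate, prompt, rounds=2):
        # Iteratively critique and revise a response against a constitution.
        response = generate(prompt)                               # vanilla response
        for _ in range(rounds):
            critique = generate(f"{prompt}\n\nResponse: {response}\n\n"
                                f"Critique request: {CONSTITUTION[0]}")
            response = generate(f"{prompt}\n\nResponse: {response}\n\n"
                                f"Critique: {critique}\n\n"
                                f"Revision request: {CONSTITUTION[1]}")
        return response   # aligned response, usable for supervised fine-tuning
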
  24. Self-Preference Bias in LLM-as-a-Judge 29 LLM-as-a-Judge enables us to assess

    and improve LLM generations, but judge models exhibit self-preference bias, favoring their own outputs. https://arxiv.org/abs/2410.21819
  25. Red-teaming (Vision) Language Model 30 Red teaming induces Gen AIs

    to produce harmful or inaccurate content by leveraging adversarial test cases to identify misalignments. https://arxiv.org/abs/2202.03286 https://arxiv.org/abs/2401.12915 Red teaming for LLMs Red teaming for VLMs
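
A hypothetical sketch of an automated red-teaming loop: an attacker model proposes test prompts, the target model answers, and a safety classifier flags failures. `attacker`, `target`, and `is_harmful` are assumed callables, not any specific library API.

    def red_team(attacker, target, is_harmful, seed_topics, n_cases=100):
        # Collect prompts that elicit harmful outputs from the target model.
        failures = []
        for i in range(n_cases):
            topic = seed_topics[i % len(seed_topics)]
            test_prompt = attacker(f"Write a tricky test prompt about: {topic}")
            answer = target(test_prompt)
            if is_harmful(test_prompt, answer):        # misalignment found
                failures.append({"prompt": test_prompt, "answer": answer})
        return failures   # feed back into safety alignment / guardrail rules
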
  26. Red-teaming Network (OpenAI) 31 OpenAI has collaborated with

    various experts to integrate their expertise. https://openai.com/index/red-teaming-network/
  27. Guardrail 32 Guardrails prevent models from generating harmful or inappropriate

    content and help them adhere to guidelines set by developers and stakeholders. https://developer.nvidia.com/ja-jp/blog/nemo-guardrails-prevents-llm-vulnerabilities-introduction/ NeMo Guardrails (NVIDIA) Safeguards on Llama 3 (Meta) https://ai.meta.com/blog/meta-llama-3/
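
Conceptually, a guardrail wraps the model with input and output checks. The sketch below uses an illustrative blocklist plus an assumed `moderation` classifier; production systems such as NeMo Guardrails or Llama Guard implement far richer policies.

    BLOCKED_TOPICS = ("explosives", "credit card numbers")   # illustrative policy only

    def guarded_chat(llm, moderation, user_input):
        # Minimal input/output guardrail around an LLM call.
        if any(topic in user_input.lower() for topic in BLOCKED_TOPICS):
            return "Sorry, I can't help with that request."     # input rail
        response = llm(user_input)
        if moderation(response):                                 # output rail
            return "Sorry, I can't share that content."
        return response
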
  28. Defense in Depth 33 Safety alignment for LLMs and VLMs

    needs defense in depth, that is, a combination of several safeguards inside and outside the model: certified alignment, sample-based alignment, and auditing (out-of-distribution detection, etc.), realized through preference tuning 1), red-teaming 2), and guardrails 3). 1) https://arxiv.org/abs/2305.18290 2) https://arxiv.org/abs/2202.03286 3) https://developer.nvidia.com/ja-jp/blog/nemo-guardrails-prevents-llm-vulnerabilities-introduction/
  29. Machine Unlearning 35 ◼ Forget / delete undesired concepts and

    data from foundation models ◼ Typically employs gradient ascent, which reverses gradient-descent optimization on the forget set (a sketch follows below). https://research.google/blog/announcing-the-first-machine-unlearning-challenge/ https://openreview.net/forum?id=Ox2A1WoKLm
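
The sketch below takes one gradient-ascent step on the forget set while still descending on a retain set to preserve utility; the batch names, loss function, and learning rate are assumptions.

    import torch

    def unlearn_step(model, forget_batch, retain_batch, loss_fn, lr=1e-5):
        # One unlearning step: ascend on the forget set, descend on the retain set.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        fx, fy = forget_batch
        rx, ry = retain_batch
        forget_loss = loss_fn(model(fx), fy)
        retain_loss = loss_fn(model(rx), ry)
        opt.zero_grad()
        # Negating the forget loss turns gradient descent into gradient ascent on that data
        (retain_loss - forget_loss).backward()
        opt.step()
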
  30. Unlearning Benchmark 36 ◼ Development of unlearning methods is hot

    for diffusion models, but existing methods are still limited. ◼ They need to be more robust and effective across a variety of target objects. https://arxiv.org/abs/2402.11846 Conceptual Shift by Unlearning / Benchmark Results for Various Unlearning Tasks
  31. Differential Privacy 37 Differential Privacy is a mathematically rigorous framework

    for releasing statistical outputs of datasets while preserving the privacy of individuals. $(\epsilon, \delta)$-Differential Privacy: a mechanism $\mathcal{M}$ satisfies $(\epsilon, \delta)$-DP if $\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(D') \in S] + \delta$ for every output set $S$ and for all adjacent databases $D$ and $D'$ that differ by one user; i.e., the output distribution of $\mathcal{M}(D)$ is nearly the same as that of $\mathcal{M}(D')$, and the difference is bounded.
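
For intuition, the classic Laplace mechanism below releases a counting query with pure epsilon-DP (the delta = 0 special case): the count has sensitivity 1, so Laplace noise of scale 1/epsilon bounds the ratio of output probabilities between adjacent databases by e^epsilon.

    import numpy as np

    def dp_count(records, predicate, epsilon=1.0):
        # Release a counting query under epsilon-differential privacy.
        true_count = sum(1 for r in records if predicate(r))
        sensitivity = 1.0                      # one user changes the count by at most 1
        noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
        return true_count + noise

    # e.g., dp_count(users, lambda u: u["age"] >= 65, epsilon=0.5)
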
  32. DP Inference with Sensitive Exemplars 38 ◼ LLM’s responses may

    leak sensitive private information contained in in-context exemplars ◼ Generate DP responses by partitioning and private aggregation https://arxiv.org/abs/2305.01639
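
A rough sketch of the partition-and-aggregate idea for a classification-style query: split the sensitive exemplars into disjoint subsets, prompt the LLM once per subset, and release only a noisy vote over the per-subset answers. The helper names and noise calibration are simplifying assumptions, not the exact algorithm of the paper.

    import numpy as np
    from collections import Counter

    def dp_icl_answer(llm, exemplars, query, labels, n_parts=10, epsilon=1.0):
        # Answer a query using sensitive in-context exemplars via a noisy majority vote.
        rng = np.random.default_rng()
        parts = np.array_split(np.array(exemplars, dtype=object), n_parts)
        votes = Counter()
        for part in parts:                     # each exemplar lands in exactly one partition
            prompt = "\n".join(part.tolist()) + f"\nQ: {query}\nA:"
            votes[llm(prompt).strip()] += 1
        # Report-noisy-max over per-label vote counts; one user affects at most one partition
        noisy = {lab: votes.get(lab, 0) + rng.laplace(0.0, 2.0 / epsilon) for lab in labels}
        return max(noisy, key=noisy.get)
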
  33. Confidential Computing for AI Inference 39 Process data that needs

    to stay private using confidential computing, e.g., a combination of end-to-end encryption and TEEs. A Trusted Execution Environment (TEE) is a trusted area of memory and CPU that is protected using encryption; data in the TEE cannot be read or tampered with by any code outside that environment. https://security.apple.com/blog/private-cloud-compute/ https://developers.googleblog.com/en/enabling-more-private-gen-ai/
  34. Conclusion 41 ◼ This tutorial mentioned • Ethical & Security

    Issues in Generative AIs • Adversarial Example • Safety Alignment • Privacy and Confidentiality ◼ Challenges and Opportunities • The cyber world and the physical world are ready to "QUERY." • Traditional and cutting-edge database concepts and techniques are helpful for making generative AI-based systems reliable.