
Defending deep learning from adversarial attacks


Svetlana Levitan

August 21, 2019


Transcript

  1. Defending deep learning from adversarial attacks — Svetlana Levitan, PhD

    Developer Advocate, Center for Open-Source Data and AI Technologies, IBM Cloud and Cognitive Software. August 21, 2019
  2. Contents: Intro to trusted AI and NN; Adversarial attacks and defenses;

    Adversarial Robustness Toolbox; AIF360 and AIX360; MAX and DAX; Links and resources. Cloud and Cognitive Applications / August 21, 2019 / © 2019 IBM Corporation
  3. [Image slide] … and open standards. IBM Cloud and Cognitive Software / August 21, 2019 / © 2019 IBM Corporation
  4. Center for Open Source Data and AI Technologies (CODAIT) — codait.org

    codait (French) = coder/coded (https://m.interglot.com/fr/en/codait). CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise. It is a relaunch of the Spark Technology Center (STC), reflecting an expanded mission.
  5. © 2019 IBM Corporation — AI IS NOW USED IN

    MANY HIGH-STAKES DECISION-MAKING APPLICATIONS: Credit, Employment, Admission, Healthcare, Sentencing. WHAT DOES IT TAKE TO TRUST AI DECISIONS (BEYOND ACCURACY)? Is it fair? Is it easy to understand? Did anyone tamper with it? Is it accountable?
  6. Backpropagation: labeled training data (Coat, Sneaker, T-shirt, Pullover), network outputs, and

    errors (misclassified items marked ❌) drive the weight updates. Fashion-MNIST dataset by Zalando Research, on GitHub <https://github.com/zalandoresearch/fashion-mnist> (MIT License). Slide created by Bradley Holt
  7. Neural network inferencing: input image → output “Sneaker, 98%”. Fashion-MNIST dataset by

    Zalando Research, on GitHub <https://github.com/zalandoresearch/fashion-mnist> (MIT License). Slide created by Bradley Holt
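As a hedged illustration of the training-and-inference pipeline sketched on slides 6–7 (not part of the original deck), the following tf.keras snippet trains a small Fashion-MNIST classifier and runs inference on one test image; the architecture and hyperparameters are illustrative choices, and the later sketches below reuse this `model` and data.

```python
# Minimal Fashion-MNIST classifier (illustrative; not from the talk).
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0   # (60000, 28, 28, 1), pixels in [0, 1]
x_test = x_test[..., None].astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Backpropagation: weights are updated from classification errors on labeled data.
model.fit(x_train, y_train, epochs=1, batch_size=128)

# Inferencing: the trained network outputs class probabilities, e.g. "Sneaker, 98%".
probs = model.predict(x_test[:1])[0]
print("predicted class:", probs.argmax(), "confidence:", probs.max())
```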
  8. Convolutional Neural Networks IBM Cloud and Cognitive Software/August 21, 2019

    / © 2019 IBM Corporation 8 https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
  9. Deep learning and adversarial attacks: Deep learning models are

    now used in many areas. Can we trust them?
  10. Adversarial threats to AI (diagram: the attacker poisons the training data or perturbs inputs to the neural network for a $$$ benefit).

    Evasion attacks ▪ Performed at test time ▪ Perturb inputs with crafted noise ▪ Model fails to predict correctly ▪ Perturbations are imperceptible to humans. Poisoning attacks ▪ Performed at training time ▪ Insert poisoned samples into the training data ▪ Exploit the backdoor later
  11. Exposure to poisoning • Could the attacker have created backdoors

    via poisoning of the training data? Plausible deniability • How important is it for the adversary that adversarial samples strongly resemble the original inputs? Type I vs. type II errors • Is the attacker trying to bypass safeguards or aiming to cause false alarms? • What are the costs associated with such errors? Black box vs. white box • What knowledge does the attacker have about the AI model? • How does the attacker access the AI model? • Are there limits on the number of queries? Threat models
  12. Evasion attacks – an analysis 14 Why do adversarial examples

    exist? • Unless the test error is 0%, there is always room for adversarial samples. • Attacks push inputs across the decision boundary. • Surprising: how close the nearest decision boundary is! [Gilmer et al., 2018. Adversarial Spheres. https://arxiv.org/abs/1801.02774]
  13. Evasion attacks – an analysis 15 Why do adversarial examples

    exist? Linearity hypothesis: • Neural network outputs extrapolate linearly as a function of their inputs. • Adversarial examples quickly push DNNs outside their designated “operating range”. • Adversarial directions form a subspace. [Goodfellow et al., 2014. Explaining and Harnessing Adversarial Examples. https://arxiv.org/abs/1412.6572; Fawzi et al., 2016. Robustness of Classifiers: From Adversarial to Random Noise. NIPS 2016] (Plot: model logits for 10 classes along an adversarial direction.)
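As a minimal sketch of the linearity idea above (an illustration, not code from the talk or any library): the Fast Gradient Sign Method takes one step of size eps along the sign of the input gradient of the loss. It assumes the illustrative `model`, data, and [0, 1] pixel range from the earlier Fashion-MNIST sketch; eps is illustrative.

```python
# One-step FGSM perturbation (Goodfellow et al., 2014), sketched with tf.keras.
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def fgsm(model, x, y, eps=0.1):
    """Return x perturbed by eps * sign(grad_x loss), clipped to the valid pixel range."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    x_adv = x + eps * tf.sign(grad)          # step in the direction that increases the loss
    return tf.clip_by_value(x_adv, 0.0, 1.0)

x_adv = fgsm(model, x_test[:16], y_test[:16])
```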
  14. Evasion attacks – an analysis 16 Why do adversarial examples

    exist? Fooling images: • DNNs don’t actually learn to recognize, e.g., a school bus; they learn to discriminate it from every other object in the training set. [Nguyen et al., 2014. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. https://arxiv.org/abs/1412.1897]
  15. How to defend? A taxonomy of defenses:

    Model hardening (static or dynamic; via data preprocessing or model design) — adversarial training (attack-independent or attack-specific), Gaussian data augmentation, feature squeezing, label smoothing, dimensionality reduction, shattered gradients, stochastic gradients, saddle-point optimization, BReLUs. Robustness metrics — CLEVER, global Lipschitz bound, loss sensitivity, minimal perturbation, adversarial success rates. Detection — statistical tests (MMD, kernel density estimates, local intrinsic dimensionality), detector networks (MagNet, detectors on inputs, detectors on internal representations), Bayesian uncertainty (dropout uncertainty, Bayesian SVMs).
  16. How to defend? Adversarial training • Train DNNs solely

    on adversarial samples • Increase DNN capacity to maintain accuracy on clean data • Use a specific algorithm for crafting the adversarial samples [Madry et al., 2017. Towards Deep Learning Models Resistant to Adversarial Attacks. https://arxiv.org/abs/1706.06083] Performance on CIFAR-10 (data / model / accuracy): Original / A / 87.3%; PGD-20 / A / 45.8%; PGD-7 / A’ / 64.2%; FGSM / A_nat / 85.6%
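In the spirit of Madry et al.'s adversarial training (a hedged sketch under the assumptions of the earlier Fashion-MNIST example, not the authors' reference code): each training batch is replaced by PGD adversarial examples before the gradient update. The ball radius, step size, and step count are illustrative.

```python
# PGD-based adversarial training step, sketched with tf.keras.
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

def pgd(model, x, y, eps=0.1, alpha=0.02, steps=7):
    """Iterated signed-gradient ascent, projected into an L-inf ball of radius eps around x."""
    x_adv = tf.identity(x)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = loss_fn(y, model(x_adv))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + alpha * tf.sign(grad)              # ascent step on the loss
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)  # project back into the eps-ball
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)          # keep pixels in [0, 1]
    return x_adv

def adversarial_train_step(x, y):
    """Train on adversarial samples only, as in Madry-style adversarial training."""
    x = tf.convert_to_tensor(x)
    x_adv = pgd(model, x, y)
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x_adv, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```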
  17. How to defend? Preprocessing data • Process samples in

    order to remove adversarial noise • Feed the cleaned samples to the classifier • Somewhat effective, but easily defeated by an adaptive adversary. Feature squeezing [W. Xu, D. Evans, and Y. Qi. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. CoRR, abs/1704.01155, 2017.]
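A hedged, minimal feature-squeezing sketch in the spirit of Xu et al. (not the authors' implementation): reduce the input's bit depth, compare the model's predictions on the original and squeezed versions, and flag inputs whose predictions move too much. The `model` is the illustrative Keras classifier from the earlier sketch; the bit depth and threshold are illustrative.

```python
# Feature squeezing via bit-depth reduction, used as an adversarial-input detector.
import numpy as np

def reduce_bit_depth(x, bits=4):
    """Quantize pixels in [0, 1] down to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def detect_adversarial(model, x, threshold=0.5):
    """Flag inputs whose predictions change a lot after squeezing."""
    p_orig = model.predict(x)
    p_squeezed = model.predict(reduce_bit_depth(x))
    gap = np.abs(p_orig - p_squeezed).sum(axis=1)   # L1 gap between the two prediction vectors
    return gap > threshold
```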
  18. How to defend? CLEVER: estimate the local Lipschitz constant to construct an

    ε-ball within which all images are correctly classified. Demo: https://bigcheck.mybluemix.net. (Embedded poster excerpt, summarized: existing robustness measures are mostly attack-dependent — distortions found by FGSM, I-FGSM, DeepFool, C&W, etc. — while theoretical lower bounds are limited. CLEVER is an attack-agnostic robustness metric: the robustness of a network is related to its local cross-Lipschitz constant, estimated numerically via extreme value theory by fitting a reverse Weibull distribution to sampled gradient-norm maxima; Kolmogorov–Smirnov goodness-of-fit tests on MNIST, CIFAR, and ImageNet models support the fit, and the fitted location parameter serves as the Lipschitz estimate used to compute the CLEVER score.) [Weng et al., 2018. Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach. ICLR 2018]
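A rough sketch of the extreme-value-theory idea behind CLEVER (a simplified illustration using the earlier Fashion-MNIST `model`; ART ships the reference CLEVER metric, and this is not it): sample gradient norms of the class margin in a small ball around the input, keep per-batch maxima, fit a reverse Weibull distribution, and read off its location parameter as the local cross-Lipschitz estimate.

```python
# Simplified CLEVER-style local Lipschitz estimate via extreme value theory.
import numpy as np
import tensorflow as tf
from scipy.stats import weibull_max   # the "reverse Weibull" distribution

def clever_lipschitz_estimate(model, x0, true_class, other_class,
                              radius=0.3, nb_batches=50, batch_size=100):
    maxima = []
    for _ in range(nb_batches):
        # Sample points uniformly in an L-inf ball of the given radius around x0.
        noise = np.random.uniform(-radius, radius, size=(batch_size,) + x0.shape)
        xs = tf.convert_to_tensor(np.clip(x0[None] + noise, 0.0, 1.0), dtype=tf.float32)
        with tf.GradientTape() as tape:
            tape.watch(xs)
            out = model(xs)
            margin = out[:, true_class] - out[:, other_class]   # cross-class margin g(x)
        grads = tape.gradient(margin, xs)
        norms = tf.norm(tf.reshape(grads, (batch_size, -1)), axis=1).numpy()
        maxima.append(norms.max())                               # extreme value per batch
    # The fitted location parameter approximates the local cross-Lipschitz constant.
    shape, loc, scale = weibull_max.fit(maxima)
    return loc
```

The CLEVER-style score for that target class would then be the margin at x0 divided by this estimate (capped at the sampling radius); the smallest such value over target classes gives an attack-agnostic robustness estimate for x0.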
  19. How to defend? 21 Poisoning detection Poisoned MNIST sample (will

    be classified as ‘1’ by the poisoned model with high probability). Unsupervised clustering of the training data based on DNN internal activations discovers a partition of poisonous vs. normal training samples.
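A hedged stand-alone sketch of that activation-clustering idea using scikit-learn (ART includes a reference implementation; the layer choice, cluster count, and size threshold here are illustrative, and `model` is the earlier Fashion-MNIST classifier): cluster the penultimate-layer activations of one class's training samples into two groups and flag an unusually small cluster as potentially poisoned.

```python
# Poisoning detection by clustering internal activations of one class's training data.
import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans

def find_suspicious_samples(model, x_class, small_cluster_frac=0.15):
    # Activations from the layer just before the softmax output (illustrative choice).
    feature_extractor = tf.keras.Model(model.inputs, model.layers[-2].output)
    activations = feature_extractor.predict(x_class)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(activations)
    counts = np.bincount(labels, minlength=2)
    small = counts.argmin()
    # A backdoor tends to show up as a small, separate cluster within the target class.
    if counts[small] < small_cluster_frac * len(x_class):
        return np.where(labels == small)[0]   # indices of suspected poisoned samples
    return np.array([], dtype=int)
```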
  20. The Adversarial Robustness Toolbox (ART) 22 • Library for adversarial

    machine learning • Baseline implementations of attacks and defenses for classifiers • Dedicated to images • MIT license • Supported frameworks (logos shown on the slide)
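As a hedged sketch of typical ART usage (module paths have moved across ART releases, so treat the imports and wrapper choice as assumptions; with TF2/tf.keras models, ART's TensorFlowV2Classifier wrapper may be needed instead): wrap the framework model in an ART classifier, instantiate an attack, and generate adversarial test examples. The `model` and data are the illustrative Fashion-MNIST classifier from earlier.

```python
# Crafting FGSM adversarial examples with ART (imports assume a recent 1.x release).
import numpy as np
from art.estimators.classification import KerasClassifier
from art.attacks.evasion import FastGradientMethod

# Wrap the (illustrative) Keras model so ART can query predictions and gradients.
classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))

attack = FastGradientMethod(estimator=classifier, eps=0.1)   # eps is illustrative
x_adv = attack.generate(x=x_test)

preds_clean = classifier.predict(x_test).argmax(axis=1)
preds_adv = classifier.predict(x_adv).argmax(axis=1)
print("clean accuracy:      ", np.mean(preds_clean == y_test))
print("adversarial accuracy:", np.mean(preds_adv == y_test))
```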
  21. Are computer-generated results free of bias? IBM Cloud and Cognitive

    Software/August 21, 2019 / © 2019 IBM Corporation 27 https://www.propublica.org/article/machine-bias-risk-assessments-in- criminal-sentencing
  22. AI Fairness 360 IBM Cloud and Cognitive Software / August 21, 2019 /

    © 2019 IBM Corporation — open-source Python library: 70+ fairness metrics and explanations, 10 bias mitigation algorithms
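A small, hedged AIF360 sketch (the toy data, column names, and group encodings are illustrative, not from the talk): wrap a pandas DataFrame as a BinaryLabelDataset and compute two of the library's dataset-level fairness metrics.

```python
# Dataset-level fairness metrics with AIF360 on a toy DataFrame.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

df = pd.DataFrame({
    "sex":   [0, 0, 1, 1, 1, 0, 1, 0],                   # protected attribute (1 = privileged)
    "score": [0.2, 0.7, 0.9, 0.4, 0.8, 0.1, 0.6, 0.3],   # an ordinary feature
    "label": [0, 1, 1, 0, 1, 0, 1, 0],                   # favorable outcome = 1
})

dataset = BinaryLabelDataset(df=df, label_names=["label"],
                             protected_attribute_names=["sex"])
metric = BinaryLabelDatasetMetric(dataset,
                                  privileged_groups=[{"sex": 1}],
                                  unprivileged_groups=[{"sex": 0}])
print("statistical parity difference:", metric.statistical_parity_difference())
print("disparate impact:", metric.disparate_impact())
```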
  23. © 2018 IBM Corporation IBM Confidential Demo Application: AI Fairness

    360 Web Application http://aif360.mybluemix.net/
  24. © 2019 IBM Corporation AIX360: DIFFERENT WAYS TO EXPLAIN

    One explanation does not fit all Different stakeholders require explanations for different purposes and with different objectives, and explanations will have to be tailored to their needs. End users/customers (trust) Doctors: Why did you recommend this treatment? Customers: Why was my loan denied? Teachers: Why was my teaching evaluated in this way? Gov’t/regulators (compliance, safety) Prove to me that you didn't discriminate. Developers (quality, “debuggability”) Is our system performing well? How can we improve it?
  25. AIX360: choosing an explanation method (decision tree)

    Understand the data or the model? For data: explanations as samples or as distributions/features — ProtoDash (case-based reasoning) for samples, DIP-VAE (learning meaningful features) for features. For the model: explanations for individual samples (local) or for overall behavior (global)? A directly interpretable model or post-hoc explanations? Directly interpretable: BRCG or GLRM (easy-to-understand rules). Post-hoc global: a surrogate model (ProfWeight — learning an accurate interpretable model) or visualizing behavior. Local explanations, one-shot static or interactive, based on samples, features, or elicited explanations: ProtoDash (case-based reasoning), CEM or CEM-MAF (feature-based explanations), TED (persona-specific explanations). Applicable to tabular, image, and text data.
  26. MAX reduces “time to value” for developers: find a model

    asset → deploy the pre-trained model asset → use the model asset. ibm.biz/model-exchange (May 2019 / © 2019 IBM Corporation) • Audio classification • Image classification • Text classification • Object detection • Facial recognition • Image-to-image translation • Image-to-text translation • Named entity recognition • Text feature extraction • …
  27. 37 MAX Consumption scenarios - Model-serving microservice (Docker-based) - Internet

    of Things: Node-RED - JavaScript/Node.js packages May, 2019 / © 2019 IBM Corporation
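As a hedged example of the Docker-based model-serving scenario (the image name, port, endpoint, and request field are assumptions based on the MAX convention of exposing a REST /model/predict endpoint; adapt them to the specific model's API docs): run a MAX container locally, then call it from Python.

```python
# Calling a locally running MAX model-serving microservice.
# Assumes a container is already up, e.g. (illustrative image name):
#   docker run -it -p 5000:5000 codait/max-object-detector
import requests

with open("sneaker.jpg", "rb") as f:   # illustrative local image file
    resp = requests.post(
        "http://localhost:5000/model/predict",              # MAX convention
        files={"image": ("sneaker.jpg", f, "image/jpeg")},
    )
resp.raise_for_status()
print(resp.json())                                          # JSON with the model's predictions
```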
  28. IBM Data Asset eXchange (DAX) 38 • Curated free and

    open datasets under open data licenses • Standardized dataset formats and metadata • Ready for use in enterprise AI applications • Complement to the Model Asset eXchange (MAX) Data Asset eXchange ibm.biz/data-asset-exchange Model Asset eXchange ibm.biz/model-exchange
  29. Conclusions IBM Cloud and Cognitive Software/August 21, 2019 / ©

    2019 IBM Corporation 39 Deep neural networks are widely used. ART provides building blocks for evaluating and improving model robustness. AIF360 and AIX360 provide fairness and explainability tools, respectively. MAX models are ready to use in a variety of applications.
  30. Learn more and connect IBM Cloud and Cognitive Software/August 21,

    2019 / © 2019 IBM Corporation 40 Watson Studio — sign up for IBM Cloud: https://ibm.biz/BdzXfW • codait.org • @SvetaLevitan • ART: https://github.com/IBM/adversarial-robustness-toolbox • AI Fairness 360: https://github.com/IBM/AIF360 • AI Explainability 360: http://aix360.mybluemix.net • MAX: ibm.biz/model-exchange • DAX: ibm.biz/data-asset-exchange