
Defending deep learning from adversarial attacks - presented at Cream City Code 2019

Deep learning is widely used in critical applications. Can it be trusted? In this talk we describe adversarial attacks and defenses found by researchers over the last several years, along with the open source Python library Adversarial Robustness 360 Toolbox (ART). We walk through Jupyter notebooks illustrating possible uses of ART. In addition, we briefly discuss AI Fairness 360 and AI Explainability 360, two other open source libraries that have now been accepted into the Linux Foundation Trusted AI Committee.

Svetlana Levitan

October 05, 2019

Transcript

  1. Defending deep learning from adversarial attacks — Svetlana Levitan, PhD

    Developer Advocate in Chicago Center for Open-source Data and AI Technologies IBM Cloud and Cognitive Software October 5, 2019 @SvetaLevitan @ibmcodait
  2. 2 Who is Svetlana Levitan? Originally from Moscow, Russia, now

    in Chicago PhD in Applied Mathematics and MS in Computer Science from University of Maryland, College Park Software Engineer for SPSS Analytic components (2000-2018) Working on PMML since 2001, ONNX recently IBM acquired SPSS in 2009 Developer Advocate with IBM Center for Open Data and AI Technologies (since June 2018) Meetup organizer: Big Data Developers in Chicago, Open Source Analytics, working with Chicago ML Two daughters love programming: IIT and Niles North
  3. IBM and open source Intro to neural networks and deep

    learning Intro to adversarial attacks and defenses Adversarial Robustness Toolbox (ART) AI Fairness 360 and AI Explainability 360 Links and resources Cloud and Cognitive Applications/ October 5, 2019 / © 2019 IBM Corporation 3 Contents
  4. IBM Cloud and Cognitive Software/October 5, 2019 / © 2019

    IBM Corporation 4 and open standards
  5. Center for Open Source Data and AI Technologies (CODAIT) Code

    – Build and improve practical frameworks to enable more developers to realize immediate value. Content – Showcase solutions for complex and real-world AI problems. Community – Bring developers and data scientists to engage with IBM Improving Enterprise AI lifecycle in Open Source • Team contributes to over 10 open source projects • 17 committers and many contributors in Apache projects • Over 1100 JIRAs and 66,000 lines of code committed to Apache Spark itself; over 65,000 LoC into SystemML • Over 25 product lines within IBM leveraging Apache Spark • Speakers at over 100 conferences, meetups, unconferences and more CODAIT codait.org
  6. Deep Learning and AI history: 1997 IBM Deep Blue chess •

    2011 IBM Watson Jeopardy; Apple releases Siri • 2012 AlexNet introduces deep learning with GPUs • 2015 Facebook's face recognition • 2016 Siri gets deep learning • 2017 AlphaGo. IBM Cognitive Applications / © 2019 IBM Corporation 6
  7. Deep Learning = Training Artificial Neural Networks A human brain has:

    • 200 billion neurons • 32 trillion connections between them. By comparison, a large artificial neural network has: • 25 million “neurons” • 100 million connections (parameters) IBM Cognitive Applications / © 2019 IBM Corporation 7
  8. IBM Cognitive Applications / © 2019 IBM Corporation Some history

    8 Elementary Perceptron 1957 Frank Rosenblatt Multilayer Perceptron
  9. Backpropagation Labeled Training Data Coat Sneaker T-shirt Sneaker Pullover Output

    Errors Pullover Coat Coat Sneaker T-shirt ❌ ❌ ❌ Fashion-MNIST dataset by Zalando Research, on GitHub <https://github.com/zalandoresearch/fashion-mnist> (MIT License). Slide created by Bradley Holt
  10. Input Output Sneaker 98% Neural Network Inferencing Fashion-MNIST dataset by

    Zalando Research, on GitHub <https://github.com/zalandoresearch/fashion-mnist> (MIT License). Slide created by Bradley Holt
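To make slides 9–10 concrete, here is a minimal training-and-inference sketch on Fashion-MNIST using tf.keras. It is illustrative only and not taken from the deck's notebooks; the architecture and hyperparameters are arbitrary choices. The later sketches in this transcript reuse `model`, `x_train`/`y_train` and `x_test`/`y_test` from this example.

```python
# Minimal sketch (not from the deck's notebooks): train a small classifier on
# Fashion-MNIST via backpropagation, then run inference, as on slides 9-10.
import numpy as np
import tensorflow as tf

# Fashion-MNIST ships with tf.keras: 28x28 grayscale images, 10 clothing classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = (x_train / 255.0).astype("float32")
x_test = (x_test / 255.0).astype("float32")
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Backpropagation: the optimizer adjusts the weights to reduce the output errors
# (e.g. "Pullover" predicted where the label was "Coat").
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64)

# Inferencing: the trained network assigns class probabilities to a new image,
# e.g. "Sneaker 98%".
probs = model.predict(x_test[:1])
print("predicted class:", np.argmax(probs), "confidence:", float(probs.max()))
```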
  11. Convolutional Neural Networks IBM Cloud and Cognitive Software/September 27, 2019

    / © 2019 IBM Corporation 11 https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
  12. Convolutional layer in greater detail IBM Cloud and Cognitive Software/October

    5, 2019 / © 2019 IBM Corporation 12 http://cs231n.github.io/convolutional-networks/
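As a companion to the CNN slides, here is a minimal tf.keras sketch of the convolution and pooling building blocks described above; the layer sizes are illustrative and not taken from the deck.

```python
# Minimal sketch of the convolution + pooling building blocks from slides 11-12
# (illustrative layer sizes, not from the deck).
import tensorflow as tf

cnn = tf.keras.Sequential([
    # A convolutional layer slides 32 learned 3x3 filters over the image;
    # each filter produces one feature map ("same" padding keeps the spatial size).
    tf.keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu",
                           input_shape=(28, 28, 1)),
    # Pooling downsamples each feature map, keeping the strongest responses.
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    # Dense layers turn the extracted features into class scores.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
cnn.summary()
```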
  13. AIOps Prepared and Analyzed Data Trained Model Deployed Model Many

    tools to train machine learning and deep learning models Prepared Data Initial Model Deployed Model
  14. AIOps Trained Model Deployed Model And there are platforms to

    serve your models, create model catalogues etc. Prepared Data Initial Model Deployed Model FfDL kube-batch Jupyter Enterprise Gateway MAX Istio OpenWhisk
  15. AIOps Prepared and Analyzed Data Trained Model Deployed Model But

    what about trust in AI? Prepared Data Initial Model Deployed Model Can the trained model be trusted? Can the dataset be trusted? Is the deployed model robust enough? Is the model vulnerable to adversarial attacks?
  16. What does it take to trust a decision made by a machine

    (other than that it is 99% accurate)? Is it fair? Is it easy to understand? Did anyone tamper with it? Is it accountable? #21, #32, #93
  17. FAIRNESS EXPLAINABILITY ROBUSTNESS ASSURANCE Our vision for Trusted AI Pillars

    of trust, woven into the lifecycle of an AI application
  18. AIOps Prepared and Analyzed Data Trained Model Deployed Model Let's

    talk about Robustness Prepared Data Initial Model Deployed Model Is the model vulnerable to adversarial attacks? Is the dataset poisoned?
  19. Adversarial Threats to AI [Diagram: the attacker poisons the training data or

    perturbs the neural network's input; the altered output yields a $$$ benefit.] Evasion attacks ▪ Performed at test time ▪ Perturb inputs with crafted noise ▪ Model fails to predict correctly ▪ Undetectable by humans Poisoning attacks ▪ Performed at training time ▪ Insert poisoned sample in training data ▪ Use backdoor later
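To illustrate the evasion case, here is a hand-rolled Fast Gradient Sign Method (FGSM) sketch: it nudges each pixel in the direction that increases the model's loss, producing a perturbation that is hard for a human to notice but changes the prediction. It assumes the trained `model` and one-hot test labels from the Fashion-MNIST sketch above; `eps` is an arbitrary illustrative value.

```python
# Hand-rolled FGSM sketch of an evasion attack (assumes `model`, `x_test`,
# `y_test` from the earlier Fashion-MNIST sketch; eps chosen for illustration).
import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()

def fgsm(model, x, y, eps=0.1):
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    # Step in the direction that increases the loss; keep pixels in [0, 1].
    return tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0)

x_adv = fgsm(model, x_test[:100], y_test[:100])
```

Comparing `model.predict(x_adv)` against the true labels typically shows a sharp accuracy drop even though the perturbed images look unchanged to a human.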
  20. Exposure to poisoning • Could the attacker have created backdoors

    via poisoning of training data? Plausible deniability • How important is it for the adversary to use adversarial samples with strong resemblance to the original inputs? Type I vs type II errors • Is the attacker trying to bypass safeguards or aiming to cause false alarms? • What are the costs associated with such errors? Black vs white box • What knowledge does the attacker have about the AI model? • How does the attacker access the AI model? • Limitations to the number of queries? 25 Threat Models
  21. Evasion attacks – an analysis 26 Why do adversarial examples

    exist? • Unless test error is 0%, there is always room for adversarial samples. • Attacks push inputs across the decision boundary. • Surprising: proximity of the nearest decision boundary! [Gilmer et al., 2018. Adversarial Spheres. https://arxiv.org/abs/1801.02774]
  22. Evasion attacks – an analysis 28 Why do adversarial examples

    exist? Fooling images: • DNNs don't actually learn to recognize, e.g., a school bus; they learn to discriminate it from the other objects in the training set. [Nguyen et al., 2014. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. https://arxiv.org/abs/1412.1897]
  23. How to defend? Taxonomy of defenses • Model hardening: adversarial

    training (static or dynamic; attack independent or attack specific), data preprocessing (Gaussian data augmentation, feature squeezing, label smoothing, dimensionality reduction), model design (shattered gradients, stochastic gradients, saddlepoint optimization, BReLUs) • Detection: statistical tests (MMD, kernel density estimates, local intrinsic dimensionality), detector networks (MagNet, detectors on inputs, detectors on internal representations), Bayesian uncertainty (dropout uncertainty, Bayesian SVMs) • Robustness metrics: CLEVER, global Lipschitz bound, loss sensitivity, minimal perturbation, adversarial success rates
  24. How to defend? 30 Adversarial training • Train DNNs solely

    on adversarial samples • Increase DNN capacity to maintain accuracy on clean data • Use specific algorithm for crafting the adversarial samples [Madry et al., 2017. Towards Deep Learning Models Resistant to Adversarial Attacks. https://arxiv.org/abs/1706.06083] Performance on CIFAR-10 data: original data, model A: 87.3% accuracy; PGD-20 attack, model A: 45.8%; PGD-7 attack, model A': 64.2%; FGSM attack, model Anat: 85.6%
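A minimal sketch of this adversarial-training recipe using ART's AdversarialTrainer with a PGD attack. Module paths are for ART 1.x and may differ in other releases; with tf.keras on TensorFlow 2 you may need tf.compat.v1.disable_eager_execution() before building the model, or a different ART wrapper. `model`, `x_train` and `y_train` are assumed from the earlier Fashion-MNIST sketch, and the attack parameters and epoch count are illustrative.

```python
# Sketch of PGD adversarial training with ART (ART 1.x paths, hedged above).
from art.estimators.classification import KerasClassifier
from art.attacks.evasion import ProjectedGradientDescent
from art.defences.trainer import AdversarialTrainer

# Wrap the trained Keras model so ART can query its loss gradients.
classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))

# PGD attack used to craft the adversarial samples during training.
pgd = ProjectedGradientDescent(classifier, eps=0.1, eps_step=0.01, max_iter=7)

# ratio=1.0 trains on adversarial samples only, as in Madry et al.;
# a smaller ratio mixes in clean data to help preserve clean accuracy.
trainer = AdversarialTrainer(classifier, attacks=pgd, ratio=1.0)
trainer.fit(x_train, y_train, batch_size=128, nb_epochs=10)
```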
  25. How to defend? 31 Preprocessing data • Process samples in

    order to remove adversarial noise • Input the cleaned samples to the classifier • Somewhat effective; however, it can easily be defeated by an adaptive adversary. Feature squeezing [W. Xu, D. Evans, and Y. Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. CoRR, abs/1704.01155, 2017.]
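Feature squeezing is available in ART as a preprocessing defence; a minimal sketch follows, with ART 1.x module paths and an illustrative bit depth. `classifier` is the ART wrapper from the adversarial-training sketch above and `x_adv` the adversarial batch from the FGSM sketch.

```python
# Sketch of the feature-squeezing defence with ART's preprocessor (ART 1.x).
import numpy as np
from art.defences.preprocessor import FeatureSqueezing

# Reduce inputs to 4 bits per pixel, discarding much of the fine-grained
# adversarial noise before the samples reach the classifier.
squeezer = FeatureSqueezing(clip_values=(0.0, 1.0), bit_depth=4)
x_squeezed, _ = squeezer(np.asarray(x_adv))

preds_raw = np.argmax(classifier.predict(np.asarray(x_adv)), axis=1)
preds_squeezed = np.argmax(classifier.predict(x_squeezed), axis=1)
print("predictions changed by squeezing:", int(np.sum(preds_raw != preds_squeezed)))
```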
  26. How to defend? 32 Poisoning detection Poisoned MNIST sample (will

    be classified as ‘1’ by poisoned model with high probability). Unsupervised clustering of training data based on DNN internal activations: Discovers partition of poisonous vs normal training samples.
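ART implements this activation-clustering defence as ActivationDefence. A minimal sketch, assuming the `classifier` wrapper and (possibly poisoned) `x_train`/`y_train` from the earlier sketches; module paths and parameters are for ART 1.x and may differ in other versions.

```python
# Sketch of poisoning detection via clustering of internal activations (ART 1.x).
from art.defences.detector.poison import ActivationDefence

defence = ActivationDefence(classifier, x_train, y_train)

# Cluster each class's activations into two groups; the smaller, anomalous
# cluster is reported as likely poisonous.
report, is_clean = defence.detect_poison(nb_clusters=2, nb_dims=10, reduce="PCA")
print("suspected poisoned training samples:", sum(1 for c in is_clean if c == 0))
```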
  27. The Adversarial Robustness 360 Toolbox (ART) 34 • Library for

    adversarial machine learning • Baseline implementations of attacks and defenses for classifiers • Dedicated to images • Python 2 & 3 • MIT license • Supported frameworks: Keras, TensorFlow, PyTorch, etc. Typical workflow: load ART modules → load classifier model (Keras, TF, PyTorch, etc.) → perform attack → evaluate robustness
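The four workflow steps on this slide map onto a few lines of ART code. A minimal sketch, assuming ART 1.x module paths, the trained Keras `model` and Fashion-MNIST test data from the earlier sketches, and an illustrative eps value:

```python
# Load ART modules.
import numpy as np
from art.estimators.classification import KerasClassifier
from art.attacks.evasion import FastGradientMethod

# Load classifier model: wrap the trained Keras network for ART.
classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))

# Perform attack: craft adversarial test images with FGSM.
attack = FastGradientMethod(classifier, eps=0.1)
x_test_adv = attack.generate(x=x_test)

# Evaluate robustness: compare accuracy on clean vs adversarial inputs.
def accuracy(x):
    preds = np.argmax(classifier.predict(x), axis=1)
    return float(np.mean(preds == np.argmax(y_test, axis=1)))

print("clean accuracy:      ", accuracy(x_test))
print("adversarial accuracy:", accuracy(x_test_adv))
```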
  28. Adversarial Robustness 360 Toolbox (ART) Poisoning detection • Detection based

    on clustering activations • Proof of attack strategy Evasion detection • Detector based on inputs • Detector based on activations Robustness metrics • CLEVER • Empirical robustness • Loss sensitivity Unified model API • Training • Prediction • Access to loss and prediction gradients Evasion defenses • Feature squeezing • Spatial smoothing • Label smoothing • Adversarial training • Virtual adversarial training • Thermometer encoding • Gaussian augmentation • Total variance minimization Evasion attacks • FGSM • JSMA • BIM • PGD • Carlini & Wagner • DeepFool • NewtonFool • Elastic net attack • Universal perturbation • Spatial transformations 36
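As one example of the robustness metrics listed above, ART exposes an empirical-robustness measure: the average minimal perturbation a given attack needs in order to change the classifier's predictions. A minimal sketch, assuming this metric helper is available in your ART version and reusing `classifier` and `x_test` from the earlier sketches; parameters are illustrative.

```python
# Sketch of ART's empirical robustness metric with an FGSM attacker (ART 1.x).
from art.metrics import empirical_robustness

score = empirical_robustness(classifier, x_test[:100], attack_name="fgsm",
                             attack_params={"eps": 0.1})
print("empirical robustness (FGSM):", score)
```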
  29. Jupyter notebooks with examples of ART use: an attack and

    a simple defense on a model for clothing; attack and defensive model building for digit data; building a detector for adversarial inputs. See e.g. https://developer.ibm.com/patterns/integrate-adversarial-attacks-model-training-pipeline/ 39
  30.–39. (Image-only slides, pages 47–59; no transcript text.)
  40. Using the detector model Works well even on low-strength

    adversarial attacks. Apply the detector on new inputs; if an attack is detected, trace it back to its source. The original model does not need changes. 61
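The detector described above can be approximated with a second, binary classifier trained to separate clean from adversarial inputs; ART also ships detector classes for this, but the plain tf.keras sketch below shows the idea. It assumes `x_test` from the Fashion-MNIST sketch and `x_adv` from the FGSM sketch; the detector architecture is an arbitrary illustration.

```python
# Sketch of an adversarial-input detector: a binary classifier over clean (0)
# vs adversarial (1) samples; the original model stays unchanged.
import numpy as np
import tensorflow as tf

x_adv_np = np.asarray(x_adv)                 # adversarial batch from the FGSM sketch
x_clean = x_test[: len(x_adv_np)]            # matching number of clean samples
x_det = np.concatenate([x_clean, x_adv_np])
y_det = np.concatenate([np.zeros(len(x_clean)), np.ones(len(x_adv_np))])

detector = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
detector.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
detector.fit(x_det, y_det, epochs=3, batch_size=64)

# Apply the detector to new inputs; flagged samples can be traced back to
# their source instead of being fed to the original model.
x_new = x_adv_np[:16]                        # stand-in for newly arriving inputs
flagged = detector.predict(x_new)[:, 0] > 0.5
print("inputs flagged as adversarial:", int(flagged.sum()))
```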
  41. Conclusions Adversarial attacks pose a threat to the deployment of

    AI in security-critical applications. There is ongoing work on practical defenses with strong guarantees. Future work: analyzing the adversarial threat on other types of data (text, speech, video, time series…). Bigger picture: Trusted AI. Security ↔ Fairness ↔ Explainability ↔ Privacy https://www.research.ibm.com/artificial-intelligence/trusted-ai/ 62
  42. AIOps Prepared and Analyzed Data Trained Model Deployed Model Now

    how do we check for bias throughout AI lifecycle? Prepared Data Initial Model Deployed Model Are model weights biased? Are predictions biased? Is the dataset biased?
  43. © 2018 IBM Corporation IBM Confidential Demo Application: AI Fairness

    360 Web Application http://aif360.mybluemix.net/
  44. 67 © 2019 IBM Corporation AIX360: DIFFERENT WAYS TO EXPLAIN

    One explanation does not fit all. Different stakeholders require explanations for different purposes and with different objectives, and explanations will have to be tailored to their needs. End users/customers (trust): Doctors: Why did you recommend this treatment? Customers: Why was my loan denied? Teachers: Why was my teaching evaluated in this way? Gov’t/regulators (compliance, safety): Prove to me that you didn't discriminate. Developers (quality, “debuggability”): Is our system performing well? How can we improve it?
  45. [AIX360 algorithm-selection decision tree] Guiding questions: Understand data or

    model? Explanations as samples, distributions or features? Explanations for individual samples (local) or overall behavior (global)? A directly interpretable model or posthoc explanations? A surrogate model or visualize behavior? Explanations based on samples, features, or elicited explanations? One-shot static or interactive explanations? Data modalities: tabular, image, text. Algorithms: ProtoDash (case-based reasoning), DIP-VAE (learning meaningful features), BRCG or GLRM (easy to understand rules), ProfWeight (learning an accurate interpretable model), CEM or CEM-MAF (feature-based explanations), TED (persona-specific explanations).
  46. AIOps Trained Model Deployed Model AI Pipeline Prepared Data Initial

    Model Deployed Model FfDL kube-batch Jupyter Enterprise Gateway MAX AIF360 AIF360 Istio OpenWhisk ART AIX360 AIX360 PMML, PFA , ONNX
  47. Thank you! @SvetaLevitan CODAIT.org @ibmcodait Developer.ibm.com @IBMDeveloper ART Demo: https://art-demo.mybluemix.net/

    ART: https://github.com/IBM/adversarial-robustness-toolbox Sign up for free IBM Cloud account: https://ibm.biz/BdzA6i If you are in or near Chicago, join Meetup groups: Big Data Developers in Chicago, Chicago ML. Come to Chicago ML workshop at IBM office on October 28! 70
  48. AIOps Prepared and Analyzed Data Trained Model Deployed Model Enter:

    AI Fairness 360 AIF360 Prepared and Analyzed Data Initial Model Deployed Model