
Ensembles of Many Diverse Weak Defenses can be Strong

Pooyan Jamshidi
September 01, 2020

Despite achieving state-of-the-art performance across many domains, machine learning systems are highly vulnerable to subtle adversarial perturbations. Although defense approaches have been proposed in recent years, many have been bypassed by even weak adversarial attacks. Previous studies showed that ensembles created by combining multiple weak defenses (i.e., input data transformations) are still weak. In this talk, I will show that it is indeed possible to construct effective ensembles using weak defenses to block adversarial attacks. However, to do so requires a diverse set of such weak defenses. Based on this motivation, I will present Athena, an extensible framework for building effective defenses to adversarial attacks against machine learning systems. I will talk about the effectiveness of ensemble strategies with a diverse set of many weak defenses that comprise transforming the inputs (e.g., rotation, shifting, noising, denoising, and many more) before feeding them to target deep neural network classifiers. I will also discuss the effectiveness of the ensembles with adversarial examples generated by various adversaries in different threat models. In the second half of the talk, I will explain why building defenses based on the idea of many diverse weak defenses works, when it is most effective, and what its inherent limitations and overhead are. Finally, I will show our recent advancement toward synthesizing effective ensemble defenses automatically by identifying complementary weak defenses over the induced space of weak defenses using a combination of search and optimization.



Transcript

  1. Ensembles of Many Diverse
    Weak Defenses can be Strong
    Ying Meng, Jianhai Su, Forest Agostinelli, Pooyan Jamshidi, Jason O’Kane, Biplav Srivastava
    Invited Talk

  2. Artificial Intelligence and Systems Laboratory
    (AISys Lab)
    Machine Learning + Computer Systems + Autonomy
    → Learning-enabled Autonomous Systems
    https://pooyanjamshidi.github.io/AISys/

  3. Research Directions at AISys
    Theory:
    - Transfer Learning
    - Causal Invariances
    - Structure Learning
    - Concept Learning
    - Physics-Informed
    Applications:
    - Systems
    - Autonomy
    - Robotics
    [Figure: the research spans settings from well-known physics with big data to
    limited known physics with small data; Causal AI.]
    Thanks to NASA for supporting our research

  4. So what is this talk about?
    The Security of Deep Machine Learning

  5. Adversarial Examples
    [Engstrom, Tran, Tsipras, Schmidt, Madry 2018]:
    Rotation + Translation can fool classifiers
    [Athalye, Engstrom, Ilyas, Kwok 2017]:
    3D-printed model classified as rifle from most viewpoints
    [Goodfellow et al. 2014]: Imperceptible noise
    can fool DNN classifiers

  6. Adversarial Examples (Security)
    [Sharif et al. 2016]: Glasses that fool face classifiers
    [Carlini et al. 2016]: Voice commands that are imperceptible to humans

  7. Adversarial Examples (RL, NLP)
    [Huang et al. 2017]: Small input changes can degrade RL performance
    [Jia & Liang 2017]: Irrelevant sentences confuse reading comprehension systems

  8. Should we be worried?
    Probably not here! But we should be worried here!
    [Pei et al. 2017]: DeepXplore: Automated Whitebox
    Testing of Deep Learning Systems
    [Tian et al. 2017]: DeepTest: Automated Testing of
    Deep-Neural-Network-driven Autonomous Cars
    [Athalye, Engstrom, Ilyas, Kwok 2017]:
    3D-printed model classified as rifle from most viewpoints


  9. Where Do Adversarial Examples Come From?
    Goal of ML: given a data distribution D, find parameters θ* such that
    E_(x,y)∼D [ ℒ(θ*, x, y) ] is small
    [Figure: examples (x, y) drawn from D, such as images of an orange, a chimpanzee,
    and a palm tree, with predictions from classifiers fθ1 and fθ2, e.g.
    fθ1(x) = palm tree and fθ2(x) = orange for the same input.]

  10. Where Do Adversarial Examples Come From?
    Training (gradient descent to find good parameters θ):
    min_θ ℒ(θ, x, y)
    Attack (find a small perturbation δ that maximizes the loss):
    max_δ ℒ(θ, x + δ, y)  subject to  ||δ||_p ≤ ϵ
    [Ilyas et al. 2019]: Adversarial Examples Are Not Bugs, They Are Features
    “Adversarial vulnerability is a direct result of our models’ sensitivity to
    well-generalizing features in the data.”
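
    In practice, the inner maximization is approximated with gradient-based attacks such as
    FGSM (one signed-gradient step) or PGD (several projected steps). A minimal sketch,
    assuming a caller-supplied grad_loss_fn(x, y) that returns the gradient of the loss with
    respect to the input for the fixed, trained parameters θ:

```python
import numpy as np

def fgsm(x, y, grad_loss_fn, epsilon=0.03):
    """One-step approximation of max_delta L(theta, x + delta, y) with ||delta||_inf <= epsilon."""
    delta = epsilon * np.sign(grad_loss_fn(x, y))   # step in the direction that increases the loss
    return np.clip(x + delta, 0.0, 1.0)             # stay in the valid pixel range

def pgd(x, y, grad_loss_fn, epsilon=0.03, alpha=0.01, steps=10):
    """Iterated variant (PGD): small signed steps, projected back into the epsilon-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_loss_fn(x_adv, y))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)    # project onto ||delta||_inf <= epsilon
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```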

  11. Athena:
    A Framework for Defending
    Machine Learning Systems
    Against Adversarial Attacks

  12. Key idea behind our approach:
    Input transformation
    [Figure: a “7” is classified correctly with confidence 0.9; adding a perturbation δ
    flips the prediction to “9” (0.56); rotating the perturbed image by 180° restores the
    correct prediction “7” (0.4).]
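
    A minimal sketch of the transform-then-classify idea, assuming classify maps an image to
    class probabilities; the perturbation δ was optimized against the untransformed input, so
    a simple transformation such as a 180° rotation can disrupt it. (In Athena, the classifier
    paired with each transformation is additionally trained on transformed data; see slide 17.)

```python
import numpy as np

def rotate180(x):
    """Input transformation T: rotate the image by 180 degrees."""
    return np.rot90(x, k=2, axes=(0, 1))

def weak_defense_predict(classify, x):
    """A single weak defense: transform the input, then run the classifier on it."""
    return classify(rotate180(x))
```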

  13. [Figure: per-attack error rates of individual weak defenses, i.e. input transformations
    such as compression (horizontal & vertical) and denoising (nl_means), against FGSM, JSMA,
    BIM_l2, BIM_linf, PGD, DeepFool_l2, CW_l2, OnePixel, and MIM, compared to the original input.]
    Insights:
    - Effectiveness of WDs varies
    - WDs complement each other
    -> A defense based on an ensemble of WDs can be independent of the particular type of
    adversarial attack
    WD: Weak Defense

  14. Quality and quantity of weak defenses matter
    Adversarial Attack: DeepFool
    [Figure: test accuracy as a function of the number of weak defenses (10 to 70).]

  15. Diversity of weak defenses matters
    Adversarial Attack: One-Pixel
    Error Rate Undefended: 0.5588
    PGD-ADT: Adversarial Training
    [Figure: error rates of a diverse ensemble vs. a homogeneous ensemble vs. the baseline
    defense (PGD-ADT).]

  16. Diversity of weak defenses matters
    Adversarial Attack: BIM_l2 (Error Rate Undefended: 0.92)
    Adversarial Attack: MIM (Error Rate Undefended: 0.94)
    Adversarial Attack: PGD (Error Rate Undefended: 0.96)

  17. Each weak defense is essentially a model trained
    on a particular type of transformation
    [Figure: for every x in D, apply transformation Ti to obtain xti, then train a classifier
    fti on the transformed data; the pair (Ti, fti) is one weak defense.]
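
    A minimal sketch of this training step, where transform plays the role of Ti and the
    returned classifier the role of fti; train_model is a placeholder for any standard
    training routine (e.g., fitting a CNN or an SVM):

```python
def train_weak_defense(transform, train_model, dataset):
    """Build one weak defense: apply the transformation to every input, then fit a classifier.

    dataset is an iterable of (x, y) pairs; train_model(xs, ys) returns a fitted classifier.
    """
    xs, ys = zip(*[(transform(x), y) for x, y in dataset])
    f_ti = train_model(xs, ys)          # classifier trained on T_i-transformed data
    return transform, f_ti              # a weak defense is the (T_i, f_ti) pair
```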

  18. Athena produces the final output based on agreement
    between weak defenses at deployment time
    [Figure: ensemble of n weak defenses. The input x is transformed by T1, ..., Tn into
    xt1, ..., xtn; each weak defense fti predicts a label yti; an ensemble strategy combines
    the predictions (e.g., votes 7, 7, 9, 7) into the final output y (here, 7).]
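
    Putting slides 17 and 18 together, a minimal sketch of the deployment-time pipeline;
    weak_defenses is assumed to be a list of (transform, classifier) pairs as built above,
    with each classifier returning class probabilities, and strategy is any function that
    maps the stacked outputs to a final label (majority voting shown; more strategies
    appear on slide 21):

```python
import numpy as np

def athena_predict(weak_defenses, strategy, x):
    """Transform x with every T_i, classify with f_ti, then combine with an ensemble strategy."""
    probs = np.stack([f_ti(t_i(x)) for t_i, f_ti in weak_defenses])   # y_t1 ... y_tn
    return strategy(probs)

def majority_vote(probs):
    """The label most weak defenses agree on (e.g., votes 7, 7, 9, 7 -> 7)."""
    return int(np.bincount(probs.argmax(axis=1)).argmax())
```

    Usage: athena_predict(weak_defenses, majority_vote, x).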

  19. Evaluation of
    Athena

  20. Threat model: What we can assume about the
    knowledge of the adversary and its strength
    Threat models: zero-knowledge, blackbox, greybox, and whitebox, distinguished by
    whether the adversary knows the parameters of the target classifier, the weak
    defenses, the ensemble strategy, and the existence of a defense.

  21. Although the effectiveness of each weak defense varies,
    Athena is able to decrease the error rate effectively
    Adversarial Attack: FGSM
    Model: 28×10 Wide ResNet
    Dataset: CIFAR100
    Athena (ensemble strategies):
    - MV: Majority Voting
    - T2MV: Top-2 Majority Voting
    - AVEO: Average of Outputs
    - RD: Random Defense
    Baseline defenses:
    - PGD-ADT: Adversarial Training
    - RS: Randomized Smoothing
    [Figure: error rates of Athena’s ensemble strategies compared to the baseline defenses
    and the undefended model; note the tradeoff on benign samples.]
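
    Building on the athena_predict sketch above, plausible implementations of the remaining
    strategies over the same stacked probs array (n_weak_defenses × n_classes); the exact
    definitions, in particular of T2MV, are assumptions inferred from the names on this
    slide rather than taken from the paper:

```python
import numpy as np

def t2mv(probs):
    """Top-2 MV: assumed here to vote over each weak defense's two most likely labels."""
    top2 = np.argsort(probs, axis=1)[:, -2:].ravel()
    return int(np.bincount(top2).argmax())

def aveo(probs):
    """Average of Outputs: argmax of the mean probability vector across weak defenses."""
    return int(probs.mean(axis=0).argmax())

def rd(probs):
    """Random Defense: trust one randomly chosen weak defense's prediction."""
    i = np.random.default_rng().integers(len(probs))
    return int(probs[i].argmax())
```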

  22. Although the effectiveness of each weak defense varies,
    Athena is able to decrease the error rate effectively
    Model: 28×10 Wide ResNet
    Dataset: CIFAR100
    [Figure: results for six attacks: FGSM, BIM_l2, BIM_linf, CW_l2, JSMA, PGD.]

  23. Threat model
    [Threat model recap: zero-knowledge, blackbox, greybox, and whitebox adversaries,
    distinguished by whether they know the parameters of the target classifier, the weak
    defenses, the ensemble strategy, and the existence of the defense.]

  24. Blackbox attack: The transferability-based
    approach
    1. Collect a training data set Dbb = { x | x in D }
    2. Train a substitute classifier f_sub on Dbb
    3. Craft adversarial examples x' against the substitute classifier f_sub
    4. Attack the ensemble model f_ens with the crafted examples
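
    A minimal sketch of these four steps; train_model and craft_adversarial are placeholders
    for any standard training routine and any whitebox attack (e.g., the FGSM/PGD sketch
    earlier), and labeling the substitute data by querying the target ensemble is the usual
    transferability setup assumed here:

```python
def transferability_attack(query_ensemble, train_model, craft_adversarial, inputs):
    """Blackbox attack via a substitute model (steps 1-4 above)."""
    # 1. Collect a training set D_bb by labeling inputs with the target ensemble's predictions.
    d_bb = [(x, query_ensemble(x)) for x in inputs]
    # 2. Train a substitute classifier f_sub on D_bb.
    f_sub = train_model(d_bb)
    # 3. Craft adversarial examples x' against the substitute (whitebox access to f_sub).
    adv_examples = [craft_adversarial(f_sub, x, y) for x, y in d_bb]
    # 4. Attack the ensemble model by replaying the crafted examples against it.
    return [query_ensemble(x_adv) for x_adv in adv_examples]
```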

  25. Athena lowered the “transferability” of adversarial
    examples from the surrogate model to the target model
    Adversarial Attack: BIM_linf
    Model: 28×10 Wide ResNet
    Dataset: CIFAR100
    [Figure: transferability rate for the undefended model vs. Athena.]

  26. Athena lowered the transferability of adversarial
    examples from the surrogate model to the target model

  27. Athena forces the “optimization-based” blackbox attack
    to generate adversarial examples with larger perturbation
    Adversarial Attack: HopSkipJump
    Model: 28×10 Wide ResNet
    Dataset: CIFAR100
    [Figure: perturbation size required against the undefended model vs. Athena.]

  28. Threat model
    [Threat model recap: zero-knowledge, blackbox, greybox, and whitebox adversaries,
    distinguished by whether they know the parameters of the target classifier, the weak
    defenses, the ensemble strategy, and the existence of the defense.]

  29. A strong adaptive white-box adversary may be
    able to successfully bypass the defense
    [Figure: error rates under the adaptive whitebox attack for the undefended model, the
    individual weak defenses, and Athena.]

  30. However, it becomes very easy to “detect” such
    attacks, so a defense+detection would be robust
    [Figure: detected rate vs. max normalized dissimilarity under gray-box and white-box
    attacks, for the MV ensemble, detection + MV ensemble, and the detector alone.]
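
    A minimal sketch of such a detector, flagging an input when the weak defenses disagree
    too strongly; the disagreement measure used here (fraction of weak defenses whose label
    differs from the majority) is an illustrative stand-in for the normalized dissimilarity
    on the slide, not necessarily the exact metric used:

```python
import numpy as np

def disagreement_detector(probs, threshold=0.5):
    """Flag a likely adversarial input when weak defenses disagree strongly.

    probs: stacked weak-defense outputs, shape (n_weak_defenses, n_classes).
    Returns True if the input should be rejected (or sent for inspection).
    """
    votes = probs.argmax(axis=1)
    majority = np.bincount(votes).argmax()
    dissimilarity = float(np.mean(votes != majority))   # 0 = full agreement, ~1 = no agreement
    return dissimilarity > threshold
```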

  31. Also, it comes with a high cost
    [Figure: dissimilarity vs. time for generating one adversarial example (seconds);
    bypassing the defense substantially increases the attacker’s cost per example.]

  32. Is Athena
    a general defense?
    Will it work with different types
    of machine learning models?

  33. Athena performs similarly well with other types of
    machine learning models (DNNs, SVMs, RF)
    Adversarial Attack: FGSM
    Model: ResNet with Shake-Shake regularization
    Dataset: CIFAR100

  34. Athena is similarly effective with other types of models
    Model: ResNet with Shake-Shake regularization
    Dataset: CIFAR100
    Attacks: FGSM, BIM_l2, BIM_linf, CW_l2, JSMA, PGD

  35. However, the effectiveness of the defense may vary
    depending on the type of model
    Model: SVM
    Dataset: MNIST
    Adversarial Attacks: FGSM and CW_l2

  36. What is the
    overhead of Athena?
    - Memory

    - Inference Time

  37. The memory overhead of Athena is linear in the number of WDs;
    the inference time is on par with model inference
    [Figure: the ensemble pipeline from slide 18, annotated with where transformation time
    and inference time are spent.]

  38. Athena is:
    - Flexible

    - Extensible

    - General

    - Moderate overhead

  39. Athena is open source
    https://arxiv.org/abs/2001.00308