Slide 1

Slide 1 text

Ensembles of Many Diverse Weak Defenses Can Be Strong
Ying Meng, Jianhai Su, Forest Agostinelli, Pooyan Jamshidi, Jason O'Kane, Biplav Srivastava
Invited Talk

Slide 2

Slide 2 text

Artificial Intelligence and Systems Laboratory (AISys Lab)
Machine Learning | Computer Systems | Autonomy
Learning-enabled Autonomous Systems
https://pooyanjamshidi.github.io/AISys/

Slide 3

Slide 3 text

Research Directions at AISys

Theory:
 - Transfer Learning
 - Causal Invariances
 - Structure Learning
 - Concept Learning
 - Physics-Informed

Applications:
 - Systems
 - Autonomy
 - Robotics

[Figure: Causal AI positioned between two regimes, from well-known physics with big data to limited known physics with small data]

Thanks to NASA for supporting our research

Slide 4

Slide 4 text

So, what is this talk about? The Security of (Deep) Machine Learning

Slide 5

Slide 5 text

Adversarial Examples

- [Engstrom, Tran, Tsipras, Schmidt, Madry 2018]: Rotation + translation can fool classifiers
- [Athalye, Engstrom, Ilyas, Kwok 2017]: 3D-printed model classified as rifle from most viewpoints
- [Goodfellow et al. 2014]: Imperceptible noise can fool DNN classifiers

Slide 6

Slide 6 text

Adversarial Examples (Security)

- [Sharif et al. 2016]: Glasses that fool face classifiers
- [Carlini et al. 2016]: Voice commands that are imperceptible to humans

Slide 7

Slide 7 text

Adversarial Examples (RL, NLP)

- [Huang et al. 2017]: Small input changes can decrease RL performance
- [Jia & Liang 2017]: Irrelevant sentences confuse reading comprehension systems

Slide 8

Slide 8 text

Should we be worried? Probably not here! But we should be worried here!

- [Pei et al. 2017]: DeepXplore: Automated Whitebox Testing of Deep Learning Systems
- [Tian et al. 2017]: DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars
- [Athalye, Engstrom, Ilyas, Kwok 2017]: 3D-printed model classified as rifle from most viewpoints

Slide 9

Slide 9 text

Where Do Adversarial Examples Come From?

[Figure: labeled samples (x, y) such as orange, chimpanzee, and palm tree are drawn from a distribution D and classified by f_θ; different parameters give different answers, e.g. f_θ1(x, y) = palm tree and f_θ2(x, y) = orange]

Goal of ML: find θ* such that 𝔼_{(x,y)∼D} ℒ(θ*, x, y) is small.

Slide 10

Slide 10 text

Where Do Adversarial Examples Come From?

Training: gradient descent finds good parameters θ by solving min_θ ℒ(θ, x, y).
Attack: the adversary solves max_δ ℒ(θ, x+δ, y) subject to ||δ||_p ≤ ϵ.

[Ilyas et al. 2019]: Adversarial Examples Are Not Bugs, They Are Features. "Adversarial vulnerability is a direct result of our models' sensitivity to well-generalizing features in the data."
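To make the inner maximization concrete, here is a minimal sketch (not the authors' code) of the one-step FGSM approximation from [Goodfellow et al. 2014]: it nudges x in the direction that increases the loss, subject to an l_inf budget ϵ. The names `model`, `x`, `y`, and `eps` are assumed PyTorch conventions, not anything shown on the slides.

```python
# Minimal FGSM sketch: approximate max_{||delta||_inf <= eps} L(theta, x+delta, y).
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=0.03):
    """Craft a one-step adversarial example with an l_inf budget of eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # L(theta, x, y)
    loss.backward()                        # gradient of the loss w.r.t. the input
    # Move each pixel by eps in the direction that increases the loss.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes pixel values in [0, 1]
```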

Slide 11

Slide 11 text

Athena: A Framework for Defending Machine Learning Systems Against Adversarial Attacks

Slide 12

Slide 12 text

Key idea behind our approach: input transformation

[Figure: a digit is classified as (7, 0.9); after adding a perturbation δ it is classified as (9, 0.56); rotating the perturbed input by 180° restores the prediction to (7, 0.4)]
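A minimal sketch of the idea on this slide, assuming a PyTorch image classifier `model` over NCHW batches: classify a transformed copy of the input (here the 180° rotation from the figure) instead of the raw input. In Athena proper, each transformation gets its own trained model (next slides); this snippet only illustrates the transform-then-predict step.

```python
# Predict on a transformed copy of the input rather than the raw input.
import torch

def predict_with_transform(model, x):
    x_rot = torch.rot90(x, k=2, dims=(2, 3))      # rotate NCHW images by 180 degrees
    probs = torch.softmax(model(x_rot), dim=-1)   # class probabilities
    conf, label = probs.max(dim=-1)
    return label, conf
```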

Slide 13

Slide 13 text

[Figure: grid of adversarial attacks (FGSM, JSMA, BIM_l2, BIM_linf, PGD, DF_l2, CW_l2, OnePixel, MIM) versus input transformations such as compress (h & v) and denoise (nl_means), showing the original input, the perturbation, and each transformed result]

Insights:
- Effectiveness of WDs varies
- WDs complement each other -> a defense based on an ensemble of WDs can be independent of the particular type of adversarial attack

WD: Weak Defense

Slide 14

Slide 14 text

Quality and quantity of weak defenses matter

Adversarial Attack: DeepFool

[Figure: test accuracy (0.0 to 1.0) versus number of weak defenses (10 to 70)]

Slide 15

Slide 15 text

Diversity of weak defenses matters

Adversarial Attack: One-Pixel

[Figure: error rates of a diverse ensemble, a homogeneous ensemble, and the PGD-ADT (adversarial training) baseline defense; undefended error rate: 0.5588]

Slide 16

Slide 16 text

Diversity of weak defenses matters

[Figure: error rates under three attacks: BIM_l2 (undefended: 0.92), MIM (undefended: 0.94), and PGD (undefended: 0.96)]

Slide 17

Slide 17 text

Each weak defense is essentially a model trained on a particular type of transformation

[Figure: for every x in D, transform x with T_i to obtain x_ti, then train a classifier f_ti on the transformed data; this classifier is one weak defense]
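A minimal sketch, under assumed PyTorch conventions, of how one weak defense could be built as described above: every training sample is passed through the transformation T_i and a fresh classifier f_ti is fit on the transformed data. `make_model`, `transform`, and `loader` are hypothetical placeholders for the model factory, the transformation, and the data loader.

```python
# Train one weak defense f_ti on T_i-transformed data.
import torch
from torch import nn, optim

def train_weak_defense(make_model, transform, loader, epochs=10, lr=1e-3):
    model = make_model()                        # fresh classifier f_ti
    opt = optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:                     # (x, y) drawn from D
            x_t = transform(x)                  # x_ti = T_i(x)
            opt.zero_grad()
            loss = loss_fn(model(x_t), y)
            loss.backward()
            opt.step()
    return model                                # the trained weak defense
```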

Slide 18

Slide 18 text

Athena produces the final output based on agreement between weak defenses at deployment time

[Figure: ensemble of n weak defenses; the input x is transformed by T_1 ... T_n into x_t1 ... x_tn, each weak defense f_ti predicts y_ti (e.g., 7, 7, 9, ...), and an ensemble strategy combines the predictions into the final output y = 7]
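A minimal sketch of the deployment-time pipeline, assuming the weak defenses are PyTorch classifiers: each transformation T_i is applied to x, its weak defense predicts a label, and majority voting (the MV strategy named on a later slide) combines the votes. Other strategies average the outputs instead of voting.

```python
# Transform the input with every T_i, let each weak defense vote, take the majority.
import torch

def athena_predict(weak_defenses, transforms, x):
    votes = []
    for T, f in zip(transforms, weak_defenses):
        logits = f(T(x))                        # y_ti = f_ti(T_i(x))
        votes.append(logits.argmax(dim=-1))     # predicted label per sample
    votes = torch.stack(votes)                  # shape: (n_weak_defenses, batch)
    return votes.mode(dim=0).values             # majority vote per sample
```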

Slide 19

Slide 19 text

Evaluation of Athena

Slide 20

Slide 20 text

Threat model: what we can assume about the knowledge of the adversary and its strength

[Table: threat models (zero-knowledge, blackbox, greybox, whitebox) versus which parts the adversary knows the parameters of: the target classifier, the weak defenses, the ensemble strategy, and the existence of the defense]

Slide 21

Slide 21 text

Although the effectiveness of each weak defense varies, Athena is able to decrease the error rate effectively

Adversarial Attack: FGSM | Model: 28×10 Wide ResNet | Dataset: CIFAR100

Athena (ensemble strategies):
- MV: Majority Voting
- T2MV: Top-2 MV
- AVEO: Average of Output
- RD: Random Defense

Baseline defenses:
- PGD-ADT: Adversarial Training
- RS: Randomized Smoothing

[Figure: error rates of the Athena variants, the baseline defenses, and the undefended model; note the tradeoff on benign samples]

Slide 22

Slide 22 text

Although the effectiveness of each weak defense varies, Athena is able to decrease the error rate effectively

Model: 28×10 Wide ResNet | Dataset: CIFAR100

[Figure: results for FGSM, BIM_l2, BIM_linf, CW_l2, JSMA, and PGD attacks]

Slide 23

Slide 23 text

Threat model

[Table: threat models (zero-knowledge, blackbox, greybox, whitebox) versus which parts the adversary knows the parameters of: the target classifier, the weak defenses, the ensemble strategy, and the existence of the defense]

Slide 24

Slide 24 text

Blackbox attack: the transferability-based approach

1. Collect a training dataset D_bb = {x | x in D}
2. Train a substitute classifier f_sub
3. Craft adversarial examples x' for the substitute classifier
4. Attack the ensemble model f_ens with the crafted examples
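A heavily hedged sketch of the four steps above, using hypothetical helpers (`make_substitute`, `craft_adversarial`): the adversary labels D_bb by querying the ensemble, trains a substitute on those labels, crafts adversarial examples against the substitute, and transfers them to the ensemble. The query-for-labels step is the standard transferability recipe and is an assumption; the slide itself only shows D_bb = {x | x in D}.

```python
# Transferability-based blackbox attack, step by step (hypothetical helpers).
def transfer_attack(ensemble, make_substitute, craft_adversarial, data):
    # 1. Collect a training set D_bb, labeling each x by querying the ensemble.
    d_bb = [(x, ensemble(x)) for x in data]
    # 2. Train a substitute classifier f_sub on the collected labels.
    f_sub = make_substitute(d_bb)
    # 3. Craft adversarial examples x' against the substitute.
    adv = [craft_adversarial(f_sub, x, y) for x, y in d_bb]
    # 4. Attack the ensemble model with the transferred examples.
    return [ensemble(x_adv) for x_adv in adv]
```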

Slide 25

Slide 25 text

Athena lowered the "transferability" of adversarial examples from the surrogate model to the target model

Adversarial Attack: BIM_linf | Model: 28×10 Wide ResNet | Dataset: CIFAR100

[Figure: transferability rate of the undefended model versus Athena]

Slide 26

Slide 26 text

Athena lowered the transferability of adversarial examples from the surrogate model to the target model

Slide 27

Slide 27 text

Athena forces the "optimization-based" blackbox attack to generate adversarial examples with larger perturbation

Adversarial Attack: HopSkipJump | Model: 28×10 Wide ResNet | Dataset: CIFAR100

[Figure: perturbation sizes for the undefended model versus Athena]

Slide 28

Slide 28 text

Threat model

[Table: threat models (zero-knowledge, blackbox, greybox, whitebox) versus which parts the adversary knows the parameters of: the target classifier, the weak defenses, the ensemble strategy, and the existence of the defense]

Slide 29

Slide 29 text

A strong adaptive white-box adversary may be able to successfully bypass the defense

[Figure: results for Athena, the individual weak defenses, and the undefended model under the adaptive white-box attack]

Slide 30

Slide 30 text

However, it becomes very easy to "detect" such attacks, so a defense + detection combination would be robust

[Figure: detected rate (0.0 to 1.0) versus max normalized dissimilarity (0.1 to 1.0) for the detector, the MV ensemble, and detection + MV ensemble, each under gray-box and white-box attacks]
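One plausible reading of the detector, sketched under the assumption that "max normalized dissimilarity" measures how far the most deviant weak defense strays from the ensemble's average output; the exact statistic and threshold used in the paper may differ.

```python
# Flag an input as adversarial when the weak defenses disagree too much.
import torch

def disagreement_score(weak_defenses, transforms, x):
    probs = torch.stack([torch.softmax(f(T(x)), dim=-1)
                         for T, f in zip(transforms, weak_defenses)])
    mean = probs.mean(dim=0)                       # ensemble's average output
    # Total-variation distance of each weak defense from the mean, in [0, 1].
    dissim = 0.5 * (probs - mean).abs().sum(dim=-1)
    return dissim.max(dim=0).values                # max over weak defenses

def is_adversarial(score, threshold=0.5):
    return score > threshold                       # hypothetical threshold
```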

Slide 31

Slide 31 text

Also, it comes with a high cost

[Figure: dissimilarity versus the time for generating one adversarial example (seconds)]

Slide 32

Slide 32 text

Is Athena a general defense? Will it work with different types of machine learning models?

Slide 33

Slide 33 text

Athena performs similarly well with other types of machine learning models (DNNs, SVMs, RF)

Adversarial Attack: FGSM | Model: ResNet with Shake-Shake regularization | Dataset: CIFAR100

Slide 34

Slide 34 text

Athena is similarly effective with other types of models

Model: ResNet with Shake-Shake regularization | Dataset: CIFAR100

[Figure: results for FGSM, BIM_l2, BIM_linf, CW_l2, JSMA, and PGD attacks]

Slide 35

Slide 35 text

However, the effectiveness of the defense may vary depending on the type of model

[Figure: results for the FGSM and CW_l2 attacks | Model: SVM | Dataset: MNIST]

Slide 36

Slide 36 text

What is the overhead of Athena?
- Memory
- Inference time

Slide 37

Slide 37 text

The memory overhead of Athena is linear in the number of WDs; the inference time is on par with a single model inference

[Figure: the deployment pipeline (transformations T_1 ... T_n, weak defenses f_t1 ... f_tn, ensemble strategy) annotated with transformation time and inference time]
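A minimal sketch for measuring the two per-input costs named in the figure: transformation time and weak-defense inference time. The loop below runs the weak defenses sequentially for simplicity; the "on par with model inference" claim assumes the weak defenses can be evaluated in parallel in a real deployment.

```python
# Time the transformation stage and the weak-defense inference stage separately.
import time
import torch

def measure_overhead(weak_defenses, transforms, x, repeats=100):
    t0 = time.perf_counter()
    for _ in range(repeats):
        xs = [T(x) for T in transforms]            # transformation time
    t1 = time.perf_counter()
    for _ in range(repeats):
        with torch.no_grad():
            _ = [f(x_t) for f, x_t in zip(weak_defenses, xs)]  # inference time
    t2 = time.perf_counter()
    return (t1 - t0) / repeats, (t2 - t1) / repeats
```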

Slide 38

Slide 38 text

Athena is:
- Flexible
- Extensible
- General
- Moderate overhead

Slide 39

Slide 39 text

Athena is open source: https://arxiv.org/abs/2001.00308