Ensembles of Many Diverse Weak Defenses can be Strong

Pooyan Jamshidi
September 01, 2020

Despite achieving state-of-the-art performance across many domains, machine learning systems are highly vulnerable to subtle adversarial perturbations. Although defense approaches have been proposed in recent years, many have been bypassed by even weak adversarial attacks. Previous studies showed that ensembles created by combining multiple weak defenses (i.e., input data transformations) are still weak. In this talk, I will show that it is indeed possible to construct effective ensembles from weak defenses to block adversarial attacks; doing so, however, requires a diverse set of such weak defenses. Based on this motivation, I will present Athena, an extensible framework for building effective defenses against adversarial attacks on machine learning systems. I will talk about the effectiveness of ensemble strategies with a diverse set of many weak defenses, each of which transforms the inputs (e.g., rotation, shifting, noising, denoising, and many more) before feeding them to target deep neural network classifiers. I will also discuss the effectiveness of the ensembles against adversarial examples generated by various adversaries under different threat models. In the second half of the talk, I will explain why building defenses from many diverse weak defenses works, when it is most effective, and what its inherent limitations and overhead are. Finally, I will show our recent progress toward synthesizing effective ensemble defenses automatically by identifying complementary weak defenses over the induced space of weak defenses using a combination of search and optimization.

Transcript

  1. Ensembles of Many Diverse Weak Defenses can be Strong

    Ying Meng, Jianhai Su, Forest Agostinelli, Pooyan Jamshidi, Jason O’Kane, Biplav Srivastava. Invited Talk.
  2. Artificial Intelligence and Systems Laboratory (AISys Lab)

    Machine Learning, Computer Systems, Autonomy, Learning-enabled Autonomous Systems. https://pooyanjamshidi.github.io/AISys/
  3. Research Directions at AISys

    Theory:
    - Transfer Learning
    - Causal Invariances
    - Structure Learning
    - Concept Learning
    - Physics-Informed

    Applications:
    - Systems
    - Autonomy
    - Robotics

    Diagram labels: Well-known Physics, Big Data, Limited known Physics, Small Data, Causal AI.

    Thanks to NASA for supporting our research.
  4. Adversarial Examples

    - [Engstrom, Tran, Tsipras, Schmidt, Madry 2018]: Rotation + Translation can fool classifiers
    - [Athalye, Engstrom, Ilyas, Kwok 2017]: 3D-printed model classified as rifle from most viewpoints
    - [Goodfellow et al. 2014]: Imperceptible noise can fool DNN classifiers
  5. Adversarial Examples (Security)

    - [Sharif et al. 2016]: Glasses that fool face classifiers
    - [Carlini et al. 2016]: Voice commands that are imperceptible to humans
  6. Adversarial Examples (RL, NLP)

    - [Huang et al. 2017]: Small input changes can decrease RL performance
    - [Jia, Liang 2017]: Irrelevant sentences confuse reading comprehension systems
  7. Should we be worried?

    Probably not here! But we should be worried here!

    - [Pei et al. 2017]: DeepXplore: Automated Whitebox Testing of Deep Learning Systems
    - [Tian et al. 2017]: DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars
    - [Athalye, Engstrom, Ilyas, Kwok 2017]: 3D-printed model classified as rifle from most viewpoints
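  8. Where Do Adversarial Examples Come From?

    Goal of ML: find $\theta^*$ such that $\mathbb{E}_{(x,y)\sim D}\,\mathcal{L}(\theta^*, x, y)$ is small.

    Figure: samples $(x, y)$ drawn from distribution $D$, with labels Orange, Chimpanzee, and Palm tree. For an image labeled Orange, the model $f_{\theta_1}$ outputs palm tree and the model $f_{\theta_2}$ outputs orange.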
  9. Where Do Adversarial Examples Come From?

    Training: $\min_\theta \mathcal{L}(\theta, x, y)$, using gradient descent to find good parameters $\theta$.

    Attack: $\max_\delta \mathcal{L}(\theta, x+\delta, y)$ subject to $\|\delta\|_p \le \epsilon$.

    [Ilyas et al. 2019]: Adversarial Examples Are Not Bugs, They Are Features. “Adversarial vulnerability is a direct result of our models’ sensitivity to well-generalizing features in the data.”
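
The inner maximization over δ can be approximated in one step for the l_inf norm by moving each input coordinate by ϵ in the direction of the sign of the loss gradient, which is the FGSM attack of [Goodfellow et al. 2014]. The following is a minimal sketch on a toy logistic-regression model, not the setup used in the talk; the weights, input, and ϵ are made-up values for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic model with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    """Binary cross-entropy on a single example, with y in {0, 1}."""
    p = predict_proba(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x, which is (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One-step l_inf attack: x + eps * sign(gradient of the loss w.r.t. x)."""
    g = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

# Hypothetical model parameters and input, chosen only for illustration.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.3)

print("clean loss:", loss(w, b, x, y), "p(y=1):", predict_proba(w, b, x))
print("adv loss:  ", loss(w, b, x_adv, y), "p(y=1):", predict_proba(w, b, x_adv))
```

Every coordinate of `x_adv` stays within ϵ of `x`, so the constraint ||δ||_∞ ≤ ϵ holds by construction, while the loss on the true label goes up.
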
  10. Insights

    - Effectiveness of WDs varies
    - WDs complement each other -> A defense based on an ensemble of WDs can be independent of the particular type of adversarial attack

    WD: Weak Defense

    Figure labels: Original, BIM_linf, FGSM, JSMA, BIM_l2, PGD, DF_l2, CW_l2, OnePixel, MIM; input, perturbation, compress (h & v), denoise (nl_means).
  11. Quality and quantity of weak defenses matter

    Adversarial Attack: DeepFool.

    Figure: test accuracy (0.00 to 1.00) versus number of weak defenses (10 to 70).
  12. Diversity of weak defenses matters

    Adversarial Attack: One-Pixel. Error Rate; Undefended: 0.5588.

    Figure legend: Diverse Ensemble, Homogeneous Ensemble, Baseline Defense (PGD-ADT: Adversarial Training).
  13. Diversity of weak defenses matters

    - Adversarial Attack: BIM_l2. Error Rate; Undefended: 0.92
    - Adversarial Attack: MIM. Error Rate; Undefended: 0.94
    - Adversarial Attack: PGD. Error Rate; Undefended: 0.96
  14. Each weak defense is essentially a model trained on a particular type of transformation

    Train a weak defense: for all $x$ in $D$, transform $x$ with $T_i$ to obtain $x_{t_i}$, then train a classifier $f_{t_i}$ on the transformed data.
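
The training recipe on this slide can be sketched as follows. This is a toy illustration under stated assumptions: the nearest-centroid learner, the `flip` transformation, and the four-point dataset are hypothetical stand-ins, whereas the talk trains deep neural network classifiers on image transformations.

```python
def train_nearest_centroid(samples, labels):
    """Toy learner (a stand-in for a DNN): store one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def classify(x):
        # Predict the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return classify

def train_weak_defense(transform, dataset, labels):
    """Transform every x in D, train a classifier on the result, and return
    a weak defense that applies the same transform before classifying."""
    transformed = [transform(x) for x in dataset]
    classifier = train_nearest_centroid(transformed, labels)
    return lambda x: classifier(transform(x))

def flip(x):
    """Hypothetical transformation: reverse the feature order."""
    return x[::-1]

dataset = [[0, 0], [0, 1], [5, 5], [6, 5]]
labels = [0, 0, 1, 1]
wd_flip = train_weak_defense(flip, dataset, labels)

print(wd_flip([0.5, 0.2]))  # 0
print(wd_flip([5, 6]))      # 1
```

The point of the design is that the transformation is applied both at training time and at prediction time, so each weak defense only ever sees inputs in its own transformed space.
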
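  15. Athena produces the final output based on agreement between weak defenses at deployment time

    Ensemble of n Weak Defenses: the input $x$ is transformed by $T_1, \dots, T_i, \dots, T_n$ into $x_{t_1}, \dots, x_{t_i}, \dots, x_{t_n}$; the WDs $f_{t_1}, \dots, f_{t_i}, \dots, f_{t_n}$ predict $y_{t_1}, \dots, y_{t_i}, \dots, y_{t_n}$; an ensemble strategy combines these predictions into the final output $y$.

    Example in the figure: the WDs predict 7, 7, and 9, and the ensemble outputs 7.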
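
A minimal sketch of this deployment-time flow, using majority voting as the ensemble strategy. The three transformations and the shared `toy_classifier` are hypothetical placeholders chosen so that the predictions come out as 7, 7, 9; in Athena each weak defense has its own trained classifier.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def athena_predict(x, weak_defenses, strategy=majority_vote):
    """Run x through every (transform, classifier) pair, then combine the labels."""
    predictions = [classify(transform(x)) for transform, classify in weak_defenses]
    return strategy(predictions), predictions

# Hypothetical transformations on a toy list-valued input.
def identity(x):
    return list(x)

def flip(x):
    return x[::-1]

def shift(x):
    return [v + 1 for v in x]

def toy_classifier(x):
    """Placeholder for a trained model: maps an input to a digit label."""
    return sum(x) % 10

weak_defenses = [
    (identity, toy_classifier),
    (flip, toy_classifier),
    (shift, toy_classifier),
]

label, votes = athena_predict([3, 4], weak_defenses)
print(votes)  # [7, 7, 9]
print(label)  # 7
```

Passing the strategy as a parameter keeps the ensemble extensible: another combination rule can be swapped in without touching the weak defenses.
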
  16. Threat model: What we can assume about the knowledge of the adversary and its strength

    Threat models: Zero-knowledge, Blackbox, Greybox, Whitebox.

    Adversary's knowledge: Existence of Defense; knows the parameters of Target Classifier, Weak Defenses, Ensemble Strategy.
  17. Although the effectiveness of each weak defense varies, Athena is able to decrease the error rate effectively

    Adversarial Attack: FGSM. Model: 28×10 Wide ResNet. Dataset: CIFAR100.

    Athena (ensemble strategy):
    - MV: Majority Voting
    - T2MV: Top-2 MV
    - AVEO: Average of Output
    - RD: Random Defense

    Baseline Defense:
    - PGD-ADT: Adversarial Training
    - RS: Randomized Smoothing

    Figure labels: Athena, Baseline Defenses, Undefended Model, Tradeoff on benign samples.
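
The four ensemble strategies named on this slide can be sketched on per-WD probability vectors. The slide only gives the names, so the definitions below are one plausible reading and should be treated as assumptions: T2MV is taken to vote over each WD's two most probable labels, AVEO to average the output vectors before taking the argmax, and RD to use a single randomly chosen WD. The probability values are made up.

```python
import random
from collections import Counter

def argmax(p):
    return max(range(len(p)), key=p.__getitem__)

def mv(probs):
    """Majority Voting over each weak defense's top label."""
    return Counter(argmax(p) for p in probs).most_common(1)[0][0]

def t2mv(probs):
    """Top-2 Majority Voting: every WD votes for its two most probable labels."""
    votes = Counter()
    for p in probs:
        top2 = sorted(range(len(p)), key=p.__getitem__, reverse=True)[:2]
        votes.update(top2)
    return votes.most_common(1)[0][0]

def aveo(probs):
    """Average of Output: average the probability vectors, then take the argmax."""
    n = len(probs)
    mean = [sum(column) / n for column in zip(*probs)]
    return argmax(mean)

def rd(probs, rng=random):
    """Random Defense: use the prediction of one randomly chosen weak defense."""
    return argmax(rng.choice(probs))

# Made-up outputs of three weak defenses over three classes.
probs = [
    [0.6, 0.4, 0.0],
    [0.6, 0.4, 0.0],
    [0.0, 0.9, 0.1],
]

print("MV:  ", mv(probs))    # 0
print("T2MV:", t2mv(probs))  # 1
print("AVEO:", aveo(probs))  # 1
print("RD:  ", rd(probs, random.Random(0)))
```

On this input the strategies disagree: two WDs narrowly prefer class 0, so MV returns 0, while T2MV and AVEO take the second choices and confidence into account and return 1.
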
  18. Although the effectiveness of each weak defense varies, Athena is able to decrease the error rate effectively

    Model: 28×10 Wide ResNet. Dataset: CIFAR100.

    Figure panels: FGSM, BIM_l2, BIM_linf, CW_l2, JSMA, PGD.
  19. Threat model

    Threat models: Zero-knowledge, Blackbox, Greybox, Whitebox.

    Adversary's knowledge: Existence of Defense; knows the parameters of Target Classifier, Weak Defenses, Ensemble Strategy.
  20. Blackbox attack: The transferability-based approach

    - Step 1: Collect training dataset Dbb
    - Step 2: Train a substitute classifier fsub
    - Step 3: Craft adversarial examples x' for the substitute classifier, for {x | x in D}
    - Step 4: Attack the ensemble model fens
  21. Athena lowered the “transferability” of adversarial examples from the surrogate model to the target model

    Adversarial Attack: BIM_linf. Model: 28×10 Wide ResNet. Dataset: CIFAR100.

    Figure: Transferability Rate for the Undefended Model and Athena.
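
The deck does not spell out how the transferability rate is computed. A common definition, assumed here, is the fraction of adversarial examples that fool the substitute model and also fool the target model. The sketch below uses hypothetical threshold classifiers on scalar inputs as stand-ins for the substitute and target.

```python
def transferability_rate(adv_examples, true_labels, substitute, target):
    """Among adversarial examples that fool the substitute model,
    return the fraction that also fool the target model."""
    fooled_substitute = [
        (x, y) for x, y in zip(adv_examples, true_labels) if substitute(x) != y
    ]
    if not fooled_substitute:
        return 0.0
    transferred = sum(1 for x, y in fooled_substitute if target(x) != y)
    return transferred / len(fooled_substitute)

# Hypothetical stand-ins: threshold classifiers on scalar inputs.
def substitute(x):
    return 1 if x > 0 else 0

def target(x):
    return 1 if x > -0.5 else 0

adv_examples = [-0.2, -0.7, -0.1, 0.3]
true_labels = [1, 1, 1, 1]

rate = transferability_rate(adv_examples, true_labels, substitute, target)
print(rate)
```

Here three examples fool the substitute and only one of them also fools the target, so the rate is one third. A lower rate for the defended model than for the undefended one is what the slide reports for Athena.
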
  22. Athena forces the “optimization-based” blackbox attack to generate adversarial examples with larger perturbation

    Adversarial Attack: HopSkipJump. Model: 28×10 Wide ResNet. Dataset: CIFAR100.

    Figure legend: Undefended Model, Athena.
  23. Threat model

    Threat models: Zero-knowledge, Blackbox, Greybox, Whitebox.

    Adversary's knowledge: Existence of Defense; knows the parameters of Target Classifier, Weak Defenses, Ensemble Strategy.
  24. A strong adaptive white-box adversary may be able to successfully bypass the defense

    Figure legend: Athena, Weak Defenses, Undefended Model.
  25. However, it becomes very easy to “detect” such attacks, so a defense+detection would be robust

    Figure: Detected Rate (0.00 to 1.00) versus Max Normalized Dissimilarity (0.1 to 1.0) for Gray-box and White-box attacks. Panels: Detector, MV ens, Detection + MV ens.
  26. Is Athena a general defense? Will it work with different types of machine learning models?
  27. Athena performs similarly well with other types of machine learning models (DNNs, SVMs, RF)

    Adversarial Attack: FGSM. Model: ResNet Shake-Shake reg. Dataset: CIFAR100.
  28. Athena is similarly effective with other types of models

    Model: ResNet Shake-Shake reg. Dataset: CIFAR100.

    Figure panels: FGSM, BIM_l2, BIM_linf, CW_l2, JSMA, PGD.
  29. However, the effectiveness of the defense may vary depending on the type of model

    - Adversarial Attack: FGSM. Model: SVM. Dataset: MNIST
    - Adversarial Attack: CW_l2. Model: SVM. Dataset: MNIST
  30. The memory overhead of Athena is linear in the number of WDs; the inference time is on par with model inference

    Figure: the ensemble of n Weak Defenses ($x$, transformations $T_1, \dots, T_n$, WDs $f_{t_1}, \dots, f_{t_n}$, ensemble strategy, output $y$), annotated with Transformation Time and Inference Time.