Slide 1

Lessons from the Last 3000 Years of Adversarial Examples David Evans University of Virginia evadeML.org Huawei STW Shenzhen, China 15 May 2018

Slide 2

Machine Learning Does Amazing Things 1

Slide 3

… and can solve all Security Problems! Fake Spam IDS Malware Fake Accounts … “Fake News”

Slide 4

Training (supervised learning): labelled training data → feature extraction → vectors → ML algorithm → trained classifier. Deployment: operational data → feature extraction → trained classifier → malicious / benign.

Slide 5

Training (supervised learning): labelled training data → feature extraction → vectors → ML algorithm → trained classifier. Deployment: operational data → feature extraction → trained classifier → malicious / benign. Assumption: training data is representative.

Slide 6

Adversaries Don't Cooperate. Assumption: training data is representative. [Diagram: training vs. deployment]

Slide 7

Adversaries Don't Cooperate. Assumption: training data is representative. [Diagram: training vs. deployment, with poisoning at training time]

Slide 8

Adversaries Don't Cooperate. Assumption: training data is representative. [Diagram: training vs. deployment, with evading at deployment time]

Slide 9

Adversarial Examples: "panda" + 0.007 × sign(∇x J(θ, x, y)) = "gibbon". Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014 (in ICLR 2015).

Slide 10

Papers on "Adversarial Examples" (Google Scholar) [Bar chart of papers per year, 2013-2018; 654 so far in 2018 (as of 5/13)]

Slide 11

Adversarial Examples before Deep Learning 10

Slide 12

Evasive Malware Péter Ször (1970-2013)

Slide 13

History of the destruction of Troy, 1498. Homer, The Iliad (~1200 BCE): the Trojans haul the horse into the city, their hearts fearing some clever ambush of the Argives.

Slide 14

Adversarial Examples across Domains
Domain | Classifier Space | Oracle Space
Trojan Wars | Judgment of Trojans: f(x) = "gift" | Physical Reality: f*(x) = invading army
Malware | Malware Detector: f(x) = "benign" | Victim's Execution: f*(x) = malicious behavior
Image Classification | DNN Classifier: f(x) = y | Human Perception: f*(x) = z

Slide 15

Adversarial Examples across Domains
Domain | Classifier Space | Oracle Space
Trojan Wars | Judgment of Trojans: f(x) = "gift" | Physical Reality: f*(x) = invading army
Malware | Malware Detector: f(x) = "benign" | Victim's Execution: f*(x) = malicious behavior
Image Classification | DNN Classifier: f(x) = y | Human Perception: f*(x) = z
Today

Slide 16

Adversarial Examples across Domains
Domain | Classifier Space | Oracle Space
Trojan Wars | Judgment of Trojans: f(x) = "gift" | Physical Reality: f*(x) = invading army
Malware | Malware Detector: f(x) = "benign" | Victim's Execution: f*(x) = malicious behavior
Image Classification | DNN Classifier: f(x) = y | Human Perception: f*(x) = z
Today
Fixing (Breaking?) the Definition

Slide 17

Adversarial Examples across Domains
Trojan Wars | f(x) = "gift" | f*(x) = invading army
Malware | Malware Detector: f(x) = "benign" | Victim's Execution: f*(x) = malicious behavior
Image Classification | DNN Classifier: f(x) = y | Human Perception: f*(x) = z
Fixing (Breaking?) the Definition

Slide 18

Goal of Machine Learning Classifier 17 Classifier Space (DNN Model) “Oracle” Space (human perception) Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)

Slide 19

Well-Trained Classifier 18 Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop) Classifier Space (DNN Model) “Oracle” Space (human perception)

Slide 20

Adversarial Examples 19 Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop) Classifier Space (DNN Model) “Oracle” Space (human perception)

Slide 21

Misleading Visualization
Cartoon | Reality
2 dimensions | thousands of dimensions
few samples near boundaries | all samples near boundaries
every sample near 1-3 classes | every sample near all classes
Classifier Space (DNN Model)

Slide 22

Adversarial Examples 21 Adversary’s goal: find a small perturbation that changes class for classifier, but imperceptible to oracle. Classifier Space (DNN Model) “Oracle” Space (human perception)

Slide 23

Adversarial Examples Definition
Given a seed sample x, x′ is an adversarial example iff:
f(x′) = t (class is t, targeted)
Δ(x, x′) ≤ ε (difference below threshold)
Δ(x, x′) is defined in some (simple!) metric space: L0 norm (number of components that differ), L1, L2 norm ("Euclidean distance"), L∞.
Assumption (to map to the earlier definition): a small perturbation does not change the class in Oracle space.
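To make the definition concrete, here is a minimal sketch in Python/NumPy, assuming a Keras-style classifier whose predict method returns class scores (the model.predict interface and the helper name are illustrative, not part of the talk):

    import numpy as np

    def is_targeted_adversarial_example(model, x, x_adv, target, epsilon, norm=np.inf):
        # Condition 1: f(x') = t  (classifier assigns the target class)
        hits_target = int(np.argmax(model.predict(x_adv[None])[0])) == target
        # Condition 2: Delta(x, x') <= epsilon, measured in a simple Lp metric
        delta = np.linalg.norm((x_adv - x).ravel(), ord=norm)
        return hits_target and delta <= epsilon

Passing ord=0, 1, 2, or np.inf selects the L0, L1, L2, or L∞ metric from the slide.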

Slide 24

L∞ Adversary: Fast Gradient Sign
[Images: original digit and adversarial examples for adversary power ε = 0.1, 0.2, 0.3, 0.4, 0.5]
Adversary power ε: max(|x_adv − x|) < ε
x_adv = x − ε ⋅ sign(∇ loss_t(x))
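A minimal FGSM sketch matching the formula above, assuming grad_loss is a callable that returns the gradient of the targeted loss with respect to the input (how that gradient is computed depends on the framework and is not shown here); clipping to [0, 1] is an added assumption to keep pixel values valid:

    import numpy as np

    def fgsm_targeted(x, grad_loss, epsilon):
        # step against the targeted loss: x_adv = x - epsilon * sign(grad of loss_t at x)
        x_adv = x - epsilon * np.sign(grad_loss(x))
        # keep pixel values in the valid [0, 1] range (assumption, not on the slide)
        return np.clip(x_adv, 0.0, 1.0)

The untargeted variant adds the signed gradient instead of subtracting it.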

Slide 25

Many Other Adversarial Methods: BIM (L∞), JSMA (L0), CW2 (L2)
[Images: original digit "1" (100% confidence); perturbations and adversarial examples classified as "4" (100%), "2" (99.9%), and "2" (83.8%)]

Slide 26

Impact of Adversarial Perturbations
[Plot: 5th-95th percentile bands; CIFAR-10, DenseNet, FGSM attack, ε = 0.1]

Slide 27

Impact of Adversarial Perturbations 26 FGSM Attack Perturbation Random Perturbation

Slide 28

Defense Strategies 1. Hide the gradients 27

Slide 29

Defense Strategies 1. Hide the gradients − Transferability results 28

Slide 30

Defense Strategies 1. Hide the gradients − Transferability results − Clever adversaries can still find adversarial examples 29

Slide 31

Defense Strategies 1. Hide the gradients − Transferability results − Clever adversaries can still find adversarial examples 2. Build a robust classifier − Adversarial retraining, increasing model capacity, etc. 30

Slide 32

Defense Strategies 1. Hide the gradients − Transferability results − Clever adversaries can still find adversarial examples 2. Build a robust classifier − Adversarial retraining, increasing model capacity, etc. − If we could build a perfect model, we would! 31

Slide 33

Defense Strategies 1. Hide the gradients − Transferability results − Clever adversaries can still find adversarial examples 2. Build a robust classifier − Adversarial retraining, increasing model capacity, etc. − If we could build a perfect model, we would! 32 Our strategy: “Feature Squeezing” reduce search space available to the adversary Weilin Xu Yanjun Qi

Slide 34

Detection Framework
[Diagram: the input is classified by the model directly (prediction_0) and after each squeezer 1…k (prediction_1 … prediction_k); a comparison function d(prediction_0, prediction_1, …, prediction_k) labels the input Adversarial (yes) or Legitimate (no)]

Slide 35

Detection Framework
A feature squeezer coalesces similar inputs into one point:
• Little change for legitimate inputs.
• Destroys adversarial perturbations.

Slide 36

Bit Depth Reduction (signal quantization)
[Plot: quantization curves for 8-bit, 3-bit, and 1-bit depth over inputs in [0, 1]]
Reduce to 1-bit: x_i = round(x_i × (2^1 − 1)) / (2^1 − 1) = round(x_i)
Normal example: X = [0.012 0.571 …… 0.159 0.951] → [0. 1. …… 0. 1.]
Adversarial example: X* = [0.312 0.271 …… 0.159 0.651] → [0. 0. …… 0. 1.]
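The quantization above is a one-liner in NumPy; this sketch follows the general form round(x × (2^b − 1)) / (2^b − 1) for pixel values in [0, 1]:

    import numpy as np

    def reduce_bit_depth(x, bits):
        # squeeze pixel values in [0, 1] to the given bit depth
        levels = 2 ** bits - 1
        return np.round(x * levels) / levels

    x = np.array([0.012, 0.571, 0.159, 0.951])
    print(reduce_bit_depth(x, 1))   # 1-bit reduction binarizes the input: [0. 1. 0. 1.]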

Slide 37

Bit Depth Reduction
[Images: a seed digit and adversarial examples from CW2, CW∞, BIM, and FGSM, shown before and after squeezing, with predicted labels 1, 1, 4, 2, 2 / 1, 1, 1, 1, 1]

Slide 38

Spatial Smoothing: Median Filter
Replace a pixel with the median of its neighbors.
Effective in eliminating "salt-and-pepper" noise (L0 attacks).
[Image from https://sultanofswing90.wordpress.com/tag/image-processing/: effect of a 3×3 median filter]
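A median-filter squeezer is available off the shelf; here is a sketch using SciPy (the per-channel handling of color images is an assumption, and size=2 matches the 2×2 filter used later in the talk):

    import numpy as np
    from scipy.ndimage import median_filter

    def median_smooth(x, size=2):
        # grayscale image: filter directly
        if x.ndim == 2:
            return median_filter(x, size=size)
        # color image: filter each channel independently
        return np.stack([median_filter(x[..., c], size=size)
                         for c in range(x.shape[-1])], axis=-1)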

Slide 39

Spatial Smoothing: Non-local Means
Replace a patch p with a weighted mean of similar patches q_i in a search region:
p′ = Σ_i w(p, q_i) × q_i
Preserves edges, while removing noise.
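Non-local means is also available off the shelf; a sketch for a single-channel image using scikit-image (the parameter values are illustrative and do not correspond exactly to the 11-3-4 / 13-3-4 settings named on these slides):

    from skimage.restoration import denoise_nl_means

    def nl_means_smooth(x, patch_size=3, patch_distance=6, h=0.08):
        # replaces each patch with a weighted mean of similar patches in a search window
        return denoise_nl_means(x, patch_size=patch_size,
                                patch_distance=patch_distance, h=h)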

Slide 40

[Example predictions for an original image and its BIM (L∞) and JSMA (L0) adversarial versions, with and without squeezing]
Squeezer \ Input | Original | BIM (L∞) | JSMA (L0)
None | Airplane 94.4% | Truck 99.9% | Automobile 56.5%
Median Filter (2×2) | Airplane 98.4% | Airplane 99.9% | Ship 46.0%
Non-local Means (13-3-4) | Airplane 98.3% | Airplane 80.8% | Airplane 70.0%

Slide 41

Other Potential Squeezers
Thermometer encoding (learnable bit depth reduction); image denoising using autoencoder, wavelet, JPEG, etc.; image resizing; ...
C Xie, et al. Mitigating Adversarial Effects Through Randomization, ICLR 2018. J Buckman, et al. Thermometer Encoding: One Hot Way To Resist Adversarial Examples, ICLR 2018. D Meng and H Chen, MagNet: a Two-Pronged Defense against Adversarial Examples, CCS 2017. A Prakash, et al. Deflecting Adversarial Attacks with Pixel Deflection, CVPR 2018; ...

Slide 42

“Feature Squeezing” (Vacuous) Conjecture For any distance-limited adversarial method, there exists some feature squeezer that accurately detects its adversarial examples. 41 Intuition: if the perturbation is small (in some simple metric space), there is some squeezer that coalesces original and adversarial example into same sample.

Slide 43

Feature Squeezing Detection
[Diagram: the input is classified by the model (a 7-layer CNN) directly (prediction_0), after 1-bit depth reduction (prediction_1), and after a 2×2 median filter (prediction_2)]
Flag as adversarial if max( L1(prediction_0, prediction_1), L1(prediction_0, prediction_2) ) > threshold
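A sketch of this scoring rule, assuming a Keras-style model.predict that returns softmax vectors and squeezers given as callables (for example, the bit-depth and median-filter sketches above):

    import numpy as np

    def squeezing_score(model, x, squeezers):
        # max L1 distance between the prediction on the raw input
        # and the prediction on each squeezed version of it
        p0 = model.predict(x[None])[0]
        return max(np.abs(p0 - model.predict(s(x)[None])[0]).sum() for s in squeezers)

    def is_adversarial(model, x, squeezers, threshold):
        return squeezing_score(model, x, squeezers) > threshold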

Slide 44

Detecting Adversarial Examples
[Histogram: distance between the original input and its squeezed version, for legitimate inputs vs. adversarial inputs (CW attack)]

Slide 45

Training a Detector (MNIST)
[Histogram: number of examples vs. maximum L1 distance between predictions on the original and squeezed input, for legitimate and adversarial examples]
Set the detection threshold to keep the false positive rate below a target: threshold = 0.0029 gives 98.2% detection with FP < 4%.
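Choosing the threshold from legitimate data alone is simple; a sketch, assuming squeezing scores have already been computed on a held-out set of legitimate inputs:

    import numpy as np

    def pick_threshold(legit_scores, target_fp_rate=0.05):
        # the (1 - target FP rate) percentile of scores on legitimate inputs
        # keeps the false positive rate below the target on that set
        return float(np.percentile(legit_scores, 100.0 * (1.0 - target_fp_rate)))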

Slide 46

Aggregated Detection Results
Dataset | Squeezers | Threshold | False Positive Rate | Detection Rate (SAEs) | ROC-AUC (excluding FAEs)
MNIST | Bit Depth (1-bit), Median (2×2) | 0.0029 | 3.98% | 98.2% | 99.44%
CIFAR-10 | Bit Depth (5-bit), Median (2×2), Non-local Means (13-3-2) | 1.1402 | 4.93% | 84.5% | 95.74%
ImageNet | Bit Depth (5-bit), Median (2×2), Non-local Means (11-3-4) | 1.2128 | 8.33% | 85.9% | 94.24%

Slide 47

Threat Models Oblivious attack: The adversary has full knowledge of the target model, but is not aware of the detector. Adaptive attack: The adversary has full knowledge of the target model and the detector. 46

Slide 48

Adaptive Adversary
Adaptive CW2 attack, unbounded adversary: Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song. Adversarial Example Defense: Ensembles of Weak Defenses are not Strong. USENIX WOOT'17.
minimize over x′:  loss_misclassify(x′) + c ⋅ Δ(x, x′) + k ⋅ score_detect(x′)
(misclassification term + distance term + detection term)
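A sketch of the combined objective (the exact forms of the misclassification and detection terms, and the use of a squared L2 distance, are assumptions in the Carlini-Wagner style; in practice the adversary minimizes this with a gradient-based optimizer such as Adam):

    import numpy as np

    def adaptive_objective(x, x_adv, misclass_loss, detector_score, c, k):
        # misclassification term + weighted distance term + weighted detection term
        distance = np.sum((x_adv - x) ** 2)
        return misclass_loss(x_adv) + c * distance + k * detector_score(x_adv)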

Slide 49

Adaptive Adversarial Examples
Attack | Untargeted | Targeted (next) | Targeted (least likely)
Mean L2 | 2.80 | 4.14 | 4.67
No successful adversarial examples were found for images originally labeled as 3 or 8.

Slide 50

Adaptive Adversary Success Rates
[Plot: adversary's success rate vs. clipped ε for untargeted, targeted (next), and targeted (least likely) attacks, with markers for the unbounded attack and a typical ε; success rates range from 0.01 to 0.68 depending on the attack and the ε bound]

Slide 51

Security State-of-the-Art
Field | Attack success probability | Threat models | Proofs
Cryptography | 2^-128 | information theoretic, resource bounded | required
System Security | 2^-32 | capabilities, motivations, rationality | common
Adversarial Machine Learning | 2^0; 2^-11* | white-box, black-box | rare!

Slide 52

Revisiting Attacker’s Goal Find one adversarial example Find many adversarial examples Suya Yuan Tian

Slide 53

Attacker Visibility
"White-box attacker": knows the model architecture and all parameters.
"Black-box attacker": interacts with the model through an API, with a limited number of interactions; the output is either a score vector ("bird", 0.09; "horse", 0.84; ...) or, in the decision-based setting, just the class.

Slide 54

Black-Box Batch Attacker
[Plot: number of queries (up to ~8 × 10^5) per image, ZOO attack on MNIST]
The effort (number of model interactions) to find an adversarial example varies by seed: most require only a few thousand queries, but a few require more than 10× more effort.

Slide 55

Easy and Hard Examples
"Easy" images: the 5 requiring the fewest queries to find an adversarial example (1024, 1280, 1536, 2560, 2816 and 4608, 6912, 12,800, 13,568, 14,336 queries).
"Hard" images: the 5 requiring the most queries, or failing (71,424; 75,008; 97,792; 101,376; 138,240 queries; one seed resisted 768,000 query attempts without success).
[Example targets: digit 2 → 7; "bird" → "horse"]

Slide 56

Greedy Search Works
[Plots (MNIST and CIFAR-10): average number of queries (×10^4) vs. number of images selected, comparing greedy search, random search, and the retroactive optimal ordering]
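One plausible greedy scheme, sketched under assumptions (the slides do not specify the prioritization heuristic): attack seeds in order of an estimated per-seed cost, cheapest first, until the query budget runs out. run_attack and estimate_cost are hypothetical callables supplied by the attacker.

    def greedy_batch_attack(seeds, run_attack, estimate_cost, budget):
        # run_attack(seed) -> (queries_used, success); estimate_cost(seed) -> heuristic cost
        found, spent = [], 0
        for seed in sorted(seeds, key=estimate_cost):
            if spent >= budget:
                break
            queries, success = run_attack(seed)
            spent += queries
            if success:
                found.append(seed)
        return found, spent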

Slide 57

Conclusions
Domain expertise still matters: machine learning models designed without domain knowledge will not be robust against motivated adversaries.
Immature, but fun and active research area: we need to make progress toward meaningful threat models, robustness measures, and verifiable defenses.
Workshop to be held at DSN (Luxembourg, 25 June); workshop to be held at IEEE S&P (San Francisco, 24 May).

Slide 58

David Evans University of Virginia evans@virginia.edu EvadeML.org Weilin Xu Yanjun Qi Suya Yuan Tian Mainuddin Jonas Funding: NSF, Intel