Slide 1

Slide 1 text

Is "adversarial examples" an Adversarial Example? David Evans University of Virginia evadeML.org Deep Learning and Security Workshop 24 May 2018 San Francisco, CA

Slide 2

Slide 2 text

GDPR in effect May 25 (tomorrow)!

Slide 3

Slide 3 text

GDPR in effect now!

Slide 4

Slide 4 text

GDPR in Effect: 00:37:34. Response Due: 71:22:26. Maximum Fine (Google): $2,120,889,281 and counting. “Manager’s nightmare, but a researcher’s paradise!” – David Basin. GDPR in effect now!

Slide 5

Slide 5 text

Article 22

Slide 6

Slide 6 text

Is “adversarial examples” an Adversarial Example?

Slide 7

Slide 7 text

Papers on “Adversarial Examples” (Google Scholar): 675 so far in 2018 (as of 5/22); 1241.5 papers expected in 2018! [Bar chart: papers per year, 2013-2018.]

Slide 8

Slide 8 text

Adversarial Examples before Deep Learning 7

Slide 9

Slide 9 text

Adversarial Examples “before ML”: Péter Szőr (1970-2013)

Slide 10

Slide 10 text

Adversarial Examples before “Oakland” 9

Slide 11

Slide 11 text

Adversarial Examples before “Oakland”: The crowd, uncertain, was split by opposing opinions. Then Laocoön rushes down eagerly from the heights of the citadel, to confront them all, a large crowd with him, and shouts from far off: ‘O unhappy citizens, what madness? ... Do you think the enemy’s sailed away? Or do you think any Greek gift’s free of treachery? Is that Ulysses’s reputation? Either there are Greeks in hiding, concealed by the wood, or it’s been built as a machine to use against our walls, or spy on our homes, or fall on the city from above, or it hides some other trick: Trojans, don’t trust this horse. Whatever it is, I’m afraid of Greeks even those bearing gifts.’ Virgil, The Aeneid (Book II)

Slide 12

Slide 12 text

How should we define “adversarial example”?

Slide 13

Slide 13 text

How should we define “adversarial example”? 12 “Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake.” Ian Goodfellow, earlier today

Slide 14

Slide 14 text

Adversarial Examples across Domains
Domain | Classifier Space | “Reality” Space
Trojan Wars | Judgment of Trojans: f(x) = “gift” | Physical Reality: f*(x) = invading army (not DL)
Malware | Malware Detector: f(x) = “benign” | Victim’s Execution: f*(x) = malicious behavior (next)
Image Classification | DNN Classifier: f(x) = y | Human Perception: f*(x) = z (later)

Slide 15

Slide 15 text

Malware Adversarial Examples [Diagram: Classifier Space vs. Oracle Space; the oracle is actual program execution, using the Cuckoo sandbox (https://github.com/cuckoosandbox).]

Slide 16

Slide 16 text

“Oracle” Definition: given a seed sample x, x′ is an adversarial example iff f(x′) = t (class is t; for malware, t = “benign”) and ℬ(x′) = ℬ(x) (the behavior we care about is the same). Malware: an evasive variant preserves the malicious behavior of the seed, but is classified as benign. No requirement that x ~ x′ except through ℬ.

Slide 17

Slide 17 text

Definitions suggest Attacks: given a seed sample x, x′ is an adversarial example iff f(x′) = t (class is t; for malware, t = “benign”) and ℬ(x′) = ℬ(x) (the behavior we care about is the same). Generic attack: heuristically explore the input space for an x′ that satisfies the definition.
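
That generic attack fits in a few lines of code. The following is a minimal sketch of the idea, not the talk's implementation; classify, behavior, and mutate are hypothetical helpers supplied by the attacker.

def generic_evasion_search(seed, classify, behavior, mutate,
                           target_class="benign", max_tries=10000):
    """Heuristically explore the input space for a variant x' such that
    classify(x') == target_class while behavior(x') == behavior(seed),
    matching the "oracle" definition above."""
    wanted = behavior(seed)        # the behavior that must be preserved
    candidate = seed
    for _ in range(max_tries):
        candidate = mutate(candidate)        # small random change
        if behavior(candidate) != wanted:    # behavior lost: restart from the seed
            candidate = seed
            continue
        if classify(candidate) == target_class:
            return candidate                 # evasive variant found
    return None                              # search budget exhausted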

Slide 18

Slide 18 text

Evolutionary Search (Weilin Xu, Yanjun Qi) [Pipeline diagram: Malicious PDF → Clone → Mutation (drawing on Benign PDFs) → Variants → Select Variants (✓/✗) → Found Evasive?; an Oracle and the target classifier drive the Mutant Generation and Fitness Selection stages.]

Slide 19

Slide 19 text

Generating Variants [Same pipeline diagram, highlighting the Mutant Generation stage.]

Slide 20

Slide 20 text

Generating Variants [Mutant Generation: select a random node of the PDF object tree (e.g. /Root → /Catalog → /Pages, /JavaScript eval(‘…’)) and randomly transform it: delete, insert, or replace.]

Slide 21

Slide 21 text

Generating Variants [Mutant Generation: select a random node and randomly delete, insert, or replace it; inserted and replacement nodes are drawn from benign PDFs.]
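
A rough sketch of one such mutation step follows; the PDF-tree API used here (clone, all_nodes, delete, insert_under, replace) is a hypothetical stand-in for a real PDF parser, not EvadeML's actual code.

import random

def mutate_pdf_tree(pdf_tree, benign_node_pool):
    """Pick a random node in the PDF object tree and randomly delete it,
    insert a node harvested from a benign PDF under it, or replace it
    with such a node."""
    variant = pdf_tree.clone()
    node = random.choice(variant.all_nodes())
    operation = random.choice(["delete", "insert", "replace"])
    if operation == "delete":
        variant.delete(node)
    elif operation == "insert":
        variant.insert_under(node, random.choice(benign_node_pool))
    else:
        variant.replace(node, random.choice(benign_node_pool))
    return variant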

Slide 22

Slide 22 text

Selecting Promising Variants [Same pipeline diagram, highlighting the Fitness Selection stage.]

Slide 23

Slide 23 text

Selecting Promising Variants [Fitness Selection: each candidate variant is evaluated by the Oracle (is it malicious?) and by the Target Classifier (maliciousness score); a fitness function f(s_oracle, s_class) combines the two.]

Slide 24

Slide 24 text

Oracle: ℬ(x′) = ℬ(x)? Execute the candidate in a vulnerable Adobe Reader inside a virtual environment (Cuckoo sandbox, https://github.com/cuckoosandbox) with a simulated network (INetSim). Behavioral signature: HTTP_URL + HOST extracted from API traces; the candidate is malicious if the signature matches.

Slide 25

Slide 25 text

Fitness Function (assumes lost malicious behavior will not be recovered):
fitness(x′) = 1 − classifier_score(x′)   if ℬ(x′) = ℬ(x)
fitness(x′) = −∞                         otherwise
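
In code, this fitness function might look like the sketch below; the oracle and classifier_score callables are assumptions standing in for the Cuckoo-based oracle and the target classifier's maliciousness score.

def fitness(variant, seed, oracle, classifier_score):
    """Variants that lose the seed's malicious behavior get -inf (never
    selected); otherwise, the more benign-looking the variant appears to
    the classifier, the higher its fitness."""
    if oracle(variant) != oracle(seed):       # malicious behavior lost
        return float("-inf")
    return 1.0 - classifier_score(variant)    # classifier_score in [0, 1]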

Slide 26

Slide 26 text

[Chart: Seeds Evaded (out of 500) vs. Number of Mutations, for PDFRate and Hidost.]

Slide 27

Slide 27 text

[Chart: Seeds Evaded (out of 500) vs. Number of Mutations, for PDFRate and Hidost.] Simple transformations often worked.

Slide 28

Slide 28 text

[Chart: Seeds Evaded (out of 500) vs. Number of Mutations, for PDFRate and Hidost.] The single transformation (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/) works on 162/500 seeds.

Slide 29

Slide 29 text

[Chart: Seeds Evaded (out of 500) vs. Number of Mutations, for PDFRate and Hidost.] Some seeds required complex transformations.

Slide 30

Slide 30 text

Definitions suggest Attacks → Attacks suggest Defenses*

Slide 31

Slide 31 text

Definitions suggest Attacks → Attacks suggest Defenses* (*that only work against a very particular instantiation of that attack). [Images: the Maginot Line; an Enigma plugboard.]

Slide 32

Slide 32 text

Evading PDFrate [Chart: Classification Score vs. Malware Seed (sorted by original score), showing Original Malicious Seeds and Discovered Evasive Variants relative to the Malicious Label Threshold.]

Slide 33

Slide 33 text

Adjust threshold? [Same chart: Original Malicious Seeds, Discovered Evasive Variants, and the Malicious Label Threshold.] Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.

Slide 34

Slide 34 text

Adjust threshold? [Chart: Classification Score vs. Malware Seed (sorted by original score), showing variants found with threshold = 0.50 and variants found with threshold = 0.25.]

Slide 35

Slide 35 text

Hide the Classifier Score? [Same pipeline diagram: the fitness function f(s_oracle, s_class) uses the Target Classifier’s score.]

Slide 36

Slide 36 text

Binary Classifier Output is Enough (ACM CCS 2017) [Same pipeline diagram: the attack still succeeds when the Target Classifier reveals only its binary output rather than a score.]

Slide 37

Slide 37 text

Defenses should be designed around clear definitions of adversary goals and capabilities, not around thwarting particular attacks. (The second oldest principle in security.)

Slide 38

Slide 38 text

Adversarial Examples across Domains
Domain | Classifier Space | “Reality” Space
Trojan Wars | Judgment of Trojans: f(x) = “gift” | Physical Reality: f*(x) = invading army (not DL)
Malware | Malware Detector: f(x) = “benign” | Victim’s Execution: f*(x) = malicious behavior (done)
Image Classification | DNN Classifier: f(x) = y | Human Perception: f*(x) = z (next)

Slide 39

Slide 39 text

Adversarial Examples across Domains
Domain | Classifier Space | “Reality” Space
Trojan Wars | Judgment of Trojans: f(x) = “gift” | Physical Reality: f*(x) = invading army
Malware | Malware Detector: f(x) = “benign” | Victim’s Execution: f*(x) = malicious behavior
Image Classification | DNN Classifier: f(x) = y | Human Perception: f*(x) = z
Fixing (Breaking?) the Definition

Slide 40

Slide 40 text

Image Classification: DNN Classifier, f(x) = y; Human Perception, f*(x) = z. Fixing (Breaking?) the Definition

Slide 41

Slide 41 text

Well-Trained Classifier 40 Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop) Classifier Space (DNN Model) “Oracle” Space (human perception)

Slide 42

Slide 42 text

Adversarial Examples 41 Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop) Classifier Space (DNN Model) “Oracle” Space (human perception)

Slide 43

Slide 43 text

Misleading Visualization (Classifier Space, DNN Model)
Cartoon | Reality
2 dimensions | thousands of dimensions
few samples near boundaries | all samples near boundaries
every sample near 1-3 classes | every sample near all classes

Slide 44

Slide 44 text

Adversarial Examples. Adversary’s goal: find a small perturbation that changes the class for the classifier but is imperceptible to the oracle. Classifier Space (DNN Model); “Oracle” Space (human perception).

Slide 45

Slide 45 text

Battista Biggio, et al. ECML PKDD 2013

Slide 46

Slide 46 text

“Biggio” Definition. Assumption (to map to the earlier definition): a small perturbation does not change the class in “Reality Space”. Given a seed sample x, x′ is an adversarial example iff f(x′) = t (class is t, targeted) and Δ(x, x′) ≤ ε (difference below threshold). Δ(x, x′) is defined in some (simple!) metric space: L0 norm (number of components changed), L1, L2 norm (“Euclidean distance”), L∞.
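
The distance constraint is easy to state in code. A minimal NumPy sketch follows; the model callable returning a class label is an assumption, not part of any particular attack framework.

import numpy as np

def lp_distance(x, x_adv, p):
    """Perturbation size under the usual L_p metrics."""
    d = (np.asarray(x_adv, dtype=float) - np.asarray(x, dtype=float)).ravel()
    if p == 0:
        return float(np.count_nonzero(d))   # number of changed components
    if p == np.inf:
        return float(np.abs(d).max())       # largest single change
    return float(np.linalg.norm(d, ord=p))  # L1, L2, ...

def is_adversarial_biggio(x, x_adv, model, target, eps, p=np.inf):
    """x_adv is a (targeted) adversarial example for seed x iff it is
    classified as `target` and lies within eps of x under the L_p metric."""
    return model(x_adv) == target and lp_distance(x, x_adv, p) <= eps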

Slide 47

Slide 47 text

“Biggio” Definition. Given a seed sample x, x′ is an adversarial example iff f(x′) = t (class is t, targeted) and Δ(x, x′) ≤ ε (difference below threshold), where Δ(x, x′) is defined in some (simple!) metric space: L0 norm (number of components changed), L1, L2 norm (“Euclidean distance”), L∞. Problem #1: Every model with boundaries has adversarial examples. Problem #2: Very unnatural limit on adversary strength. Problem #3: Values all adversarial examples equally.

Slide 48

Slide 48 text

DSML Papers
Oracle Definition (3): KFS, YKLALYP, RG
Biggio Definition (6): AHHO, CW, GLSQ, HD, MW, SBC
Building Classifiers (5): AMNKV, CSS, DAF, SHWS, ZCPS
Software (2): BGS, XLZX
No Version On-Line (5)

Slide 49

Slide 49 text

Impact of Adversarial Perturbations [Plot: distance between each layer’s output and its output for the original seed, across the network; FGSM, ε = 0.0245, CIFAR-10 DenseNet; 5th and 95th percentile bands shown.]
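
For reference, FGSM is the attack used in this measurement. A hedged NumPy sketch, where grad_loss is an assumed helper returning the gradient of the model's loss with respect to the input:

import numpy as np

def fgsm(x, grad_loss, eps=0.0245):
    """Fast Gradient Sign Method: step the input by eps in the direction of
    the sign of the loss gradient, then clip back to the valid pixel range."""
    return np.clip(x + eps * np.sign(grad_loss(x)), 0.0, 1.0)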

Slide 50

Slide 50 text

Impact of Adversarial Perturbations (Mainuddin Jonas) [Plot: distance between each layer’s output and its output for the original seed; FGSM, ε = 0.0245, CIFAR-10 DenseNet; 5th and 95th percentile bands shown.]

Slide 51

Slide 51 text

Impact of Adversarial Perturbations [Plot: distance between each layer’s output and its output for the original seed; FGSM (ε = 0.0245) vs. random noise of the same amount; CIFAR-10 DenseNet.]

Slide 52

Slide 52 text

Impact of Adversarial Perturbations [Plot: distance between each layer’s output and its output for the original seed; Carlini-Wagner L2 vs. random noise of the same amount; CIFAR-10 DenseNet.]

Slide 53

Slide 53 text

Definitions Suggest Defenses. Given a seed sample x, x′ is an adversarial example iff f(x′) = t (class is t, targeted) and Δ(x, x′) ≤ ε (difference below threshold), where Δ(x, x′) is defined in some (simple!) metric space: L0 norm (number of components changed), L1, L2 norm (“Euclidean distance”), L∞. Suggested defense: given an input x*, see how the model behaves on S(x*), where S(·) reverses transformations in Δ-space.

Slide 54

Slide 54 text

Feature Squeezing Detection Framework (Weilin Xu, Yanjun Qi) [Diagram: the Input is fed to the Model directly (Prediction0) and through Squeezer 1 … Squeezer k to the Model / Model′ (Prediction1 … Predictionk); a comparison of (Prediction0, Prediction1, …, Predictionk) labels the Input as Adversarial or Legitimate.]

Slide 55

Slide 55 text

Feature Squeezing Detection Framework [Same diagram.] A feature squeezer coalesces similar inputs into one point: it barely changes legitimate inputs, but destroys adversarial perturbations.

Slide 56

Slide 56 text

Coalescing by Feature Squeezing. Metric Space 1: Target Classifier; Metric Space 2: “Oracle”. Before: find a small perturbation that changes the class for the classifier but is imperceptible to the oracle. Now: the perturbation must change the class for both the original and the squeezed classifier, while remaining imperceptible to the oracle.

Slide 57

Slide 57 text

Example Squeezer: Bit Depth Reduction (signal quantization). [Plot: output vs. input quantization curves for 8-bit, 3-bit, and 1-bit depth.]
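
Bit depth reduction is essentially a one-liner. A minimal sketch for inputs scaled to [0, 1]; the function and parameter names are mine, chosen to mirror the idea on the slide.

import numpy as np

def reduce_bit_depth(x, bits):
    """Quantize values in [0, 1] to 2**bits levels: bits=1 maps every pixel
    to 0 or 1, while bits=8 leaves ordinary images essentially unchanged."""
    levels = 2 ** bits - 1
    return np.round(np.asarray(x, dtype=float) * levels) / levels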

Slide 58

Slide 58 text

Example Squeezer: Bit Depth Reduction (signal quantization). [Plot as before, plus a grid of digit images: seeds and their adversarial versions under CW2, CW∞, BIM, and FGSM attacks.]

Slide 59

Slide 59 text

Other Potential Squeezers: thermometer encoding (learnable bit depth reduction); image denoising using autoencoder, wavelet, JPEG, etc.; image resizing; spatial smoothers (median filter, non-local means); ...
References: C Xie, et al. Mitigating Adversarial Effects Through Randomization, ICLR 2018. J Buckman, et al. Thermometer Encoding: One Hot Way To Resist Adversarial Examples, ICLR 2018. D Meng and H Chen. MagNet: a Two-Pronged Defense against Adversarial Examples, CCS 2017. A Prakash, et al. Deflecting Adversarial Attacks with Pixel Deflection, CVPR 2018; ...

Slide 60

Slide 60 text

“Feature Squeezing” (Vacuous) Conjecture For any distance-limited adversarial method, there exists some feature squeezer that accurately detects its adversarial examples. 59 Intuition: if the perturbation is small (in some simple metric space), there is some squeezer that coalesces original and adversarial example into same sample.

Slide 61

Slide 61 text

Feature Squeezing Detection (7-layer CNN model) [Diagram: Input → Model → Prediction0; Input → Bit Depth-1 → Model → Prediction1; Input → Median 2×2 → Model → Prediction2. Flag the input as Adversarial if max(‖Prediction0 − Prediction1‖1, ‖Prediction0 − Prediction2‖1) > T, otherwise Legitimate.]
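
A sketch of that detection rule, assuming a predict function that returns the model's probability vector and a grayscale input in [0, 1]; this is an illustration of the rule above, not the released Feature Squeezing code.

import numpy as np
from scipy.ndimage import median_filter

def is_adversarial(x, predict, threshold):
    """Flag x as adversarial if the model's output moves too far (max L1
    distance) under two squeezers: 1-bit depth reduction and a 2x2 median
    filter.  x is assumed to be an H x W array of pixel values in [0, 1]."""
    y0 = predict(x)
    y1 = predict(np.round(x))               # 1-bit depth reduction
    y2 = predict(median_filter(x, size=2))  # 2x2 median smoothing
    score = max(np.abs(y0 - y1).sum(), np.abs(y0 - y2).sum())
    return score > threshold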

Slide 62

Slide 62 text

Detecting Adversarial Examples [Histogram: distance between the original input and its squeezed version, for adversarial inputs (CW attack) vs. legitimate inputs.]

Slide 63

Slide 63 text

Training a detector (MNIST): set the detection threshold to keep the false positive rate below a target. [Histogram: number of examples vs. maximum L1 distance between original and squeezed input, legitimate vs. adversarial; threshold = 0.0029 gives 98.2% detection with FP < 4%.]
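
Picking that threshold can be sketched in a couple of lines; the scores array and the target false-positive rate are assumed inputs.

import numpy as np

def pick_threshold(legitimate_scores, target_fpr=0.05):
    """Set the detection threshold at the (1 - target_fpr) quantile of the
    squeezing scores observed on legitimate inputs, so that at most
    target_fpr of legitimate inputs are flagged as adversarial."""
    return float(np.quantile(legitimate_scores, 1.0 - target_fpr))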

Slide 64

Slide 64 text

ImageNet Configuration (MobileNet model) [Diagram: Input → Model → Prediction0; Input → Bit Depth-5 → Model → Prediction1; Input → Median 2×2 → Model → Prediction2; Input → Non-local Means → Model → Prediction3. Flag the input as Adversarial if the maximum L1 distance between Prediction0 and {Prediction1, Prediction2, Prediction3} exceeds the threshold T, otherwise Legitimate.]

Slide 65

Slide 65 text

Training a detector (ImageNet). [Histogram: maximum L1 distance between original and squeezed input, legitimate vs. adversarial; threshold = 1.24 gives 85% detection with FP < 5%.]

Slide 66

Slide 66 text

How should we evaluate defenses? 65

Slide 67

Slide 67 text

Threat Models Oblivious attack: The adversary has full knowledge of the target model, but is not aware of the detector. Adaptive attack: The adversary has full knowledge of the target model and the detector. 66

Slide 68

Slide 68 text

(Generic) Adaptive Adversary. Adaptive CW2 attack, unbounded adversary: minimize loss_misclassify(x′) + c · Δ(x, x′) + k · score_detector(x′), i.e., a misclassification term, a distance term, and a detection term. Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song. Adversarial Example Defense: Ensembles of Weak Defenses are not Strong. USENIX WOOT ’17.
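
The shape of that combined objective, as a sketch; the misclassification loss and detector score are hypothetical callables, and this is not He et al.'s exact code.

import numpy as np

def adaptive_objective(x, x_adv, misclassification_loss, detector_score, c, k):
    """Misclassification term + c * L2 distance term + k * detection term."""
    distance = float(np.linalg.norm((np.asarray(x_adv, dtype=float) -
                                     np.asarray(x, dtype=float)).ravel()))
    return misclassification_loss(x_adv) + c * distance + k * detector_score(x_adv)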

Slide 69

Slide 69 text

Adaptive Adversarial Examples. No successful adversarial examples were found for images originally labeled as 3 or 8.
Attack | Untargeted | Targeted (next) | Targeted (least likely)
Mean L2 | 2.80 | 4.14 | 4.67

Slide 70

Slide 70 text

Adaptive Adversary Success Rates [Chart: adversary’s success rate vs. clipped ε, for Untargeted, Targeted (Next), and Targeted (LL) attacks; marked values include 0.68, 0.44, 0.24 and 0.06, 0.01, 0.01; “Unbounded” and “Typical ε” are marked on the ε axis.]

Slide 71

Slide 71 text

Revisiting the Attacker’s Goal: find one adversarial example vs. find many adversarial examples. (Fnu Suya, Yuan Tian)

Slide 72

Slide 72 text

Attacker Visibility. “White-box attacker”: knows the model architecture and all parameters. “Black-box attacker”: interacts with the model through an API, with a limited number of interactions; the output is either a score vector (“bird”: 0.09, “horse”: 0.84, ...) or, in decision-based attacks, just the class.

Slide 73

Slide 73 text

Black-Box Cost Variance. ZOO black-box attack (Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, Cho-Jui Hsieh. ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models. AISec 2017); 1000 queries per iteration, 256 max iterations (256,000 queries); target: least-likely class, max L2 = 3. [Histograms: number of queries per adversarial example, MNIST and CIFAR-10. Annotations: fails for 14; average for MNIST: 117,820; average for lowest 20: 50,240; CIFAR-10 overall average: 60,378; CIFAR-10 lowest-cost 20: 15,795 (26%).]

Slide 74

Slide 74 text

Easy and Hard Examples. “Easy” images: the 5 seeds needing the fewest queries to find an adversarial example; “Hard” images: the 5 needing the most queries (failed). [MNIST, 0 → 1 (least-likely class): easy seeds took 14,592; 43,008; 43,776; 49,152; and 49,920 queries; hard seeds: 256,000 query attempts without success.]

Slide 75

Slide 75 text

Easy and Hard Examples. “Easy” images: the 5 seeds needing the fewest queries to find an adversarial example; “Hard” images: the 5 needing the most queries (failed). [MNIST, 0 → 1 (least-likely class): easy seeds took 14,592 to 49,920 queries; CIFAR-10, “airplane” → “frog”: easy seeds took 9,728; 10,496; 10,752; 12,288; and 13,824 queries; hard seeds in both cases: 256,000 query attempts without success.]

Slide 76

Slide 76 text

White-Box Cost Variance. Carlini-Wagner L2 attack; target: least-likely class; MNIST: max L2 = 3.0, CIFAR-10: max L2 = 1.0. [Histograms: number of iterations per adversarial example (up to 2000), MNIST and CIFAR-10. Annotations: average for MNIST: 566; average for lowest 20: 174; CIFAR-10 average: 82.]

Slide 77

Slide 77 text

White-Box Cost Variance. Carlini-Wagner L2 attack; target: least-likely class; MNIST: max L2 = 3.0, CIFAR-10: max L2 = 1.0. [Histograms: number of iterations per adversarial example (up to 2000), MNIST and CIFAR-10. Annotations: average for MNIST: 566; average for lowest 20: 174; CIFAR-10 average: 82; CIFAR-10 lowest 20 (average: 3.6).]

Slide 78

Slide 78 text

How does cost-variance impact attack cost? 77

Slide 79

Slide 79 text

Simple Greedy Search Works Well. ZOO black-box attack; target selection strategies: random target selection, greedy heuristic, oracle optimal. [Charts: average queries per AE found vs. number of adversarial examples, MNIST and CIFAR-10.]
Target: 20 AEs | Greedy/Optimal: MNIST 1.50, CIFAR 1.30 | Random/Optimal: MNIST 2.37, CIFAR 3.86
Target: 50 AEs | Greedy/Optimal: MNIST 1.46, CIFAR 1.21 | Random/Optimal: MNIST 1.96, CIFAR 2.45
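
A greedy batch attack of this flavor can be sketched as: spend the query budget on the seeds a cheap estimate says will be easiest. The names and the estimate/attack split below are my own illustration of the idea, not the exact method evaluated in the talk.

def greedy_batch_attack(seeds, estimate_cost, attack, query_budget):
    """Attack seeds in order of a cheap per-seed cost estimate, stopping when
    the total query budget runs out.  `attack` returns a pair
    (adversarial_example_or_None, queries_used)."""
    found = []
    for seed in sorted(seeds, key=estimate_cost):
        if query_budget <= 0:
            break
        example, used = attack(seed, query_budget)
        query_budget -= used
        if example is not None:
            found.append(example)
    return found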

Slide 80

Slide 80 text

White-Box Batch Attack Cost. CW L2 attack; target selection strategies: random target selection, greedy heuristic, oracle optimal. [Charts: average iterations per AE found vs. number of adversarial examples, MNIST and CIFAR-10.]
Target: 20 AEs | Greedy/Optimal: MNIST 2.01, CIFAR 1.22 | Random/Optimal: MNIST 3.20, CIFAR 20.05
Target: 50 AEs | Greedy/Optimal: MNIST 1.76, CIFAR 1.50 | Random/Optimal: MNIST 2.45, CIFAR 15.11

Slide 81

Slide 81 text

Madry Defense. Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. https://github.com/MadryLab/mnist_challenge [Charts: accuracy over a batch of 10 samples sorted by initial distance; MNIST classes “9”, “7”, “0”; CIFAR-10 classes airplane, cars, deer.]

Slide 82

Slide 82 text

History of the destruction of Troy, 1498 Conclusion

Slide 83

Slide 83 text

Security State-of-the-Art
Field | Attack success probability | Threat models | Proofs
Cryptography | 2^-128 | information theoretic, resource bounded | required
System Security | 2^-32 | capabilities, motivations, rationality | common
Adversarial Machine Learning | 2^0; 2^-10 | white-box, black-box | making progress?

Slide 84

Slide 84 text

Ali Rahimi, NIPS Test-of-Time Award Speech (Dec 2017): “If you're building photo-sharing systems alchemy is okay but we're beyond that; now we're building systems that govern healthcare and mediate our civic dialogue”

Slide 85

Slide 85 text

Ali Rahimi, NIPS Test-of-Time Award Speech (Dec 2017): “If you're building photo-sharing systems alchemy is okay but we're beyond that; now we're building systems that govern healthcare and mediate our civic dialogue”

Slide 86

Slide 86 text

Alchemy (~700 − 1660) Well-defined, testable goal (turn lead into gold) Established theory (four elements: earth, fire, water, air) Methodical experiments and lab techniques (Jabir ibn Hayyan in 8th century) Wrong and ultimately unsuccessful, but led to modern chemistry.

Slide 87

Slide 87 text

Domain | Classifier Space | “Reality” Space
Trojan Wars | Judgment of Trojans: f(x) = “gift” | Physical Reality: f*(x) = invading army
Malware | Malware Detector: f(x) = “benign” | Victim’s Execution: f*(x) = malicious behavior
Image Classification | DNN Classifier: f(x) = y | Human Perception: f*(x) = z

Slide 88

Slide 88 text

Domain | Classifier Space | “Reality” Space
Trojan Wars | Judgment of Trojans: f(x) = “gift” | Physical Reality: f*(x) = invading army
Malware | Malware Detector: f(x) = “benign” | Victim’s Execution: f*(x) = malicious behavior
Image Classification | DNN Classifier: f(x) = y | Human Perception: f*(x) = z
Academic Research | Conferences, Fun: f(AEs) = “awesome” | Systems, Society, Ideas: f*(AEs) = ?

Slide 89

Slide 89 text

David Evans University of Virginia evans@virginia.edu EvadeML.org Weilin Xu Yanjun Qi Fnu Suya Yuan Tian Mainuddin Jonas Funding: NSF, Intel

Slide 90

Slide 90 text

89

Slide 91

Slide 91 text

David Evans University of Virginia evans@virginia.edu EvadeML.org Weilin Xu Yanjun Qi Fnu Suya Yuan Tian Mainuddin Jonas Funding: NSF, Intel

Slide 92

Slide 92 text

@_youhadonejob1

Slide 93

Slide 93 text

92