Slide 1

Slide 1 text

Can Machine Learning Ever Be Trustworthy? David Evans University of Virginia evadeML.org 7 December 2018 University of Maryland

Slide 2

Slide 2 text

No!

Slide 3

Slide 3 text

It's too late!

Slide 4

Slide 4 text

“Unfortunately, our translation systems made an error last week that misinterpreted what this individual posted. Even though our translations are getting better each day, mistakes like these might happen from time to time and we’ve taken steps to address this particular issue. We apologize to him and his family for the mistake and the disruption this caused.”

Slide 5

Slide 5 text


Slide 6

Slide 6 text

Amazon Employment

Slide 7

Slide 7 text

Amazon Employment

Slide 8

Slide 8 text

Risks from Artificial Intelligence. Benign developers and operators: AI out of control; AI inadvertently causes harm. Malicious operators: build AI to do harm; malicious abuse of benign AI. “On Robots”, Joe Berger and Pascal Wyse (The Guardian, 21 July 2018).

Slide 9

Slide 9 text

Risks from Artificial Intelligence. Benign developers and operators: AI out of control; AI inadvertently causes harm. Malicious operators: build AI to do harm; malicious abuse of benign AI systems.

Slide 10

Slide 10 text

Crash Course in Artificial Intelligence and Machine Learning

Slide 11

Slide 11 text

Statistical Machine Learning [pipeline diagram]. Training (supervised learning): labelled training data → feature extraction → vectors → ML algorithm → trained classifier. Deployment: operational data → feature extraction → trained classifier → malicious / benign.
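To make the pipeline concrete, here is a minimal sketch using scikit-learn; extract_features is a stand-in for the application-specific feature extraction step in the diagram, and the toy byte-count features are purely illustrative assumptions, not the features used in the talk.

```python
from sklearn.ensemble import RandomForestClassifier

def extract_features(sample: bytes):
    """Stand-in for the application-specific feature extraction step."""
    return [len(sample), sample.count(b"/JavaScript")]   # toy features

def train(labelled_samples, labels):
    """Training (supervised learning): labelled data -> vectors -> classifier."""
    vectors = [extract_features(s) for s in labelled_samples]
    classifier = RandomForestClassifier(n_estimators=100)
    classifier.fit(vectors, labels)
    return classifier

def classify(classifier, operational_sample):
    """Deployment: operational data -> feature extraction -> malicious / benign."""
    return classifier.predict([extract_features(operational_sample)])[0]
```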

Slide 12

Slide 12 text

[Same pipeline diagram] Assumption: the training data is representative.

Slide 13

Slide 13 text

Adversaries Don’t Cooperate. Assumption: the training data is representative; broken at training time by poisoning.

Slide 14

Slide 14 text

Adversaries Don’t Cooperate. Assumption: the training data is representative; broken at deployment time by evading.

Slide 15

Slide 15 text


Slide 16

Slide 16 text

More Ambition: “The human race will have a new kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight.”

Slide 17

Slide 17 text

More Ambition: “The human race will have a new kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight.” Gottfried Wilhelm Leibniz (1679)

Slide 18

Slide 18 text

Gottfried Wilhelm Leibniz (Universität Altdorf, 1666) who advised: Jacob Bernoulli (Universität Basel, 1684) who advised: Johann Bernoulli (Universität Basel, 1694) who advised: Leonhard Euler (Universität Basel, 1726) who advised: Joseph Louis Lagrange who advised: Siméon Denis Poisson who advised: Michel Chasles (École Polytechnique, 1814) who advised: H. A. (Hubert Anson) Newton (Yale, 1850) who advised: E. H. Moore (Yale, 1885) who advised: Oswald Veblen (U. of Chicago, 1903) who advised: Philip Franklin (Princeton, 1921) who advised: Alan Perlis (MIT Math PhD 1950) who advised: Jerry Feldman (CMU Math 1966) who advised: Jim Horning (Stanford CS PhD 1969) who advised: John Guttag (U. of Toronto CS PhD 1975) who advised: David Evans (MIT CS PhD 2000). Leibniz is my academic great-great-great-great-great-great-great-great-great-great-great-great-great-great-great-grandparent!

Slide 19

Slide 19 text

More Precision: “The human race will have a new kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight.” Gottfried Wilhelm Leibniz (1679). Normal computing amplifies (quadrillions of times faster) and aggregates (enables millions of humans to work together) human cognitive abilities; AI goes beyond what humans can do.

Slide 20

Slide 20 text

Operational Definition: “Artificial Intelligence” means making computers do things their programmers don’t understand well enough to program explicitly. If it is explainable, it’s not ML!

Slide 21

Slide 21 text

Inherent Paradox of “Trustworthy” ML: “Artificial Intelligence” means making computers do things their programmers don’t understand well enough to program explicitly. If we could specify precisely what the model should do, we wouldn’t need ML to do it!

Slide 22

Slide 22 text

Inherent Paradox of “Trustworthy” ML: If we could specify precisely what the model should do, we wouldn’t need ML to do it! The best we can hope for is verifying certain properties. Model Similarity: ∀x: M₁(x) = M₂(x). DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017.

Slide 23

Slide 23 text

Inherent Paradox of “Trustworthy” ML: The best we can hope for is verifying certain properties. Model Similarity: ∀x ∈ S: M₁(x) ≈ M₂(x) (DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017). Model Robustness: ∀x ∈ S, ∀Δ ∈ D: M(x) ≈ M(x + Δ).

Slide 24

Slide 24 text

Third Strategy: Specify Containing System (Somesh Jha’s talk, Oct 26).

Slide 25

Slide 25 text

Adversarial Robustness: ∀x ∈ S, ∀Δ ∈ D: M(x) ≈ M(x + Δ). Adversary’s goal: find a “small” perturbation that changes the model output (targeted attack: in some desired way). Defender’s goals: Robust model: find a model where this is hard. Detection: detect inputs that are adversarial.

Slide 26

Slide 26 text

Not a new problem... “Or do you think any Greek gift’s free of treachery? Is that Ulysses’s reputation? Either there are Greeks in hiding, concealed by the wood, or it’s been built as a machine to use against our walls, or spy on our homes, or fall on the city from above, or it hides some other trick: Trojans, don’t trust this horse. Whatever it is, I’m afraid of Greeks even those bearing gifts.” Virgil, The Aeneid (Book II)

Slide 27

Slide 27 text

Adversarial Examples for DNNs: “panda” + 0.007 × [noise] = “gibbon”. Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014 (in ICLR 2015).
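The panda/gibbon image was produced with the fast gradient sign method (FGSM) from the cited paper. A minimal PyTorch sketch of that attack, assuming `model` returns class logits and inputs are scaled to [0, 1]; the default epsilon matches the 0.007 on the slide:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.007):
    """One-step FGSM: perturb x by epsilon * sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # keep pixels in the valid range
```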

Slide 28

Slide 28 text

Impact of Adversarial Perturbations [plot: distance between each layer’s output and its output for the original seed, 5th and 95th percentiles; FGSM ε = 0.0245, CIFAR-10 DenseNet].

Slide 29

Slide 29 text

Impact of Adversarial Perturbations [plot: distance between each layer’s output and its output for the original seed, comparing FGSM ε = 0.0245 with random noise of the same amount; CIFAR-10 DenseNet].

Slide 30

Slide 30 text

Impact of Adversarial Perturbations [plot: distance between each layer’s output and its output for the original seed, comparing the Carlini-Wagner L2 attack with random noise of the same amount; CIFAR-10 DenseNet].

Slide 31

Slide 31 text

[Chart] Papers on “Adversarial Examples” per year, 2013-2018 (Google Scholar): 1826.68 papers expected in 2018!

Slide 32

Slide 32 text

[Chart] Papers on “Adversarial Examples” per year, 2013-2018 (Google Scholar): 1826.68 papers expected in 2018!

Slide 33

Slide 33 text

[Chart] Emergence of “Theory”: ICML Workshop 2015; 15% of 2018 “adversarial examples” papers contain “theorem” and “proof”.

Slide 34

Slide 34 text

Adversarial Example. Prediction Change Definition: An input, x′ ∈ X, is an adversarial example for x ∈ X, iff ∃x′ ∈ Ball_ε(x) such that f(x) ≠ f(x′).

Slide 35

Slide 35 text

Adversarial Example. Ball_ε(x) is some space around x, typically defined in some (simple!) metric space: ℓ₀ norm (# different), ℓ₂ norm (“Euclidean distance”), ℓ∞. Without constraints on Ball_ε, every input has adversarial examples. Prediction Change Definition: An input, x′ ∈ X, is an adversarial example for x ∈ X, iff ∃x′ ∈ Ball_ε(x) such that f(x) ≠ f(x′).
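A small sketch of the prediction-change definition with the three norms listed on the slide; `f` is assumed to be any callable returning a predicted class for an input array:

```python
import numpy as np

def in_ball(x, x_adv, epsilon, norm="linf"):
    """Check whether x_adv lies in Ball_eps(x) for the norms on the slide."""
    delta = (np.asarray(x_adv) - np.asarray(x)).ravel()
    if norm == "l0":      # number of components that differ
        return np.count_nonzero(delta) <= epsilon
    if norm == "l2":      # Euclidean distance
        return np.linalg.norm(delta, 2) <= epsilon
    if norm == "linf":    # largest per-component change
        return np.abs(delta).max() <= epsilon
    raise ValueError(f"unknown norm: {norm}")

def is_adversarial(f, x, x_adv, epsilon, norm="linf"):
    """Prediction-change definition: x_adv is in the ball and changes f's output."""
    return in_ball(x, x_adv, epsilon, norm) and f(x) != f(x_adv)
```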

Slide 36

Slide 36 text

Adversarial Example. Any non-trivial model has adversarial examples: ∃x_a, x_b ∈ X. f(x_a) ≠ f(x_b). Prediction Change Definition: An input, x′ ∈ X, is an adversarial example for x ∈ X, iff ∃x′ ∈ Ball_ε(x) such that f(x) ≠ f(x′).

Slide 37

Slide 37 text

Prediction Error Robustness. Error Robustness: An input, x′ ∈ X, is an adversarial example for (correct) x ∈ X, iff ∃x′ ∈ Ball_ε(x) such that f(x′) ≠ true label for x′. A perfect classifier has no (error robustness) adversarial examples.

Slide 38

Slide 38 text

Prediction Error Robustness. Error Robustness: An input, x′ ∈ X, is an adversarial example for (correct) x ∈ X, iff ∃x′ ∈ Ball_ε(x) such that f(x′) ≠ true label for x′. A perfect classifier has no (error robustness) adversarial examples. If we have a way to know this, we don’t need an ML classifier.

Slide 39

Slide 39 text

Global Robustness Properties. Adversarial Risk: probability an input has an adversarial example: Pr_{x ← D}[∃ x′ ∈ Ball_ε(x). h(x′) ≠ class(x′)]. Dimitrios I. Diochnos, Saeed Mahloujifar, Mohammad Mahmoody, NeurIPS 2018.

Slide 40

Slide 40 text

Global Robustness Properties. Adversarial Risk: probability an input has an adversarial example: Pr_{x ← D}[∃ x′ ∈ Ball_ε(x). h(x′) ≠ class(x′)]. Error Region Robustness: expected distance to the closest adversarial example: E_{x ← D}[inf {δ : ∃ x′ ∈ Ball_δ(x). h(x′) ≠ class(x′)}]. Dimitrios I. Diochnos, Saeed Mahloujifar, Mohammad Mahmoody, NeurIPS 2018.
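Adversarial risk can only be estimated empirically by running an attack that searches Ball_ε(x). A Monte Carlo sketch under assumed callables (`predict` for the model h, `attack` for the search, `sample` for drawing from D, `true_class` for ground truth); since an attack may miss adversarial examples that exist, this gives a lower bound:

```python
def adversarial_risk_lower_bound(predict, true_class, attack, sample, n=1000):
    """Monte Carlo estimate of Pr_{x ~ D}[exists x' in Ball_eps(x): h(x') != class(x')].
    `attack(x)` searches Ball_eps(x) and returns a candidate or None."""
    hits = 0
    for _ in range(n):
        x = sample()                     # draw x from the data distribution D
        x_adv = attack(x)
        if x_adv is not None and predict(x_adv) != true_class(x_adv):
            hits += 1
    return hits / n
```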

Slide 41

Slide 41 text

Recent Global Robustness Results. Properties of any model for the input space: the distance to an adversarial example is small relative to the expected distance between two sampled points.
Adversarial Spheres [Gilmer et al., 2018]. Assumption: uniform distribution on two concentric n-spheres. Key result: expected safe distance (ℓ₂-norm) is relatively small.
Adversarial vulnerability for any classifier [Fawzi × 3, 2018]. Assumption: smooth generative model (Gaussian in latent space; generator is L-Lipschitz). Key result: adversarial risk → 1 for relatively small attack strength (ℓ₂-norm): P(r(x) ≤ η) ≥ 1 − √(π/2) · e^(−η²/2L²).
Curse of Concentration in Robust Learning [Mahloujifar et al., 2018]. Assumption: Normal Lévy families (unit sphere, uniform, ℓ₂ norm; Boolean hypercube, uniform, Hamming distance; ...). Key result: if the attack strength exceeds a relatively small threshold, adversarial risk > 1/2: b > √(log(k₁/ε)) / √(k₂·n) ⟹ Risk_b(h, c) ≥ 1/2.

Slide 42

Slide 42 text

Prediction Change Robustness. Prediction Change: An input, x′ ∈ X, is an adversarial example for x ∈ X, iff ∃x′ ∈ Ball_ε(x) such that f(x′) ≠ f(x). Any non-trivial model has adversarial examples: ∃x_a, x_b ∈ X. f(x_a) ≠ f(x_b). Solutions: only consider particular inputs (“good” seeds); output isn’t just the class (e.g., confidence); targeted adversarial examples → cost-sensitive adversarial robustness.

Slide 43

Slide 43 text

Local (Instance) Robustness. Robust Region: For an input x, the robust region is the maximum region with no adversarial example: sup {ε > 0 | ∀x′ ∈ Ball_ε(x), f(x′) = f(x)}.

Slide 44

Slide 44 text

Local (Instance) Robustness. Robust Region: For an input x, the robust region is the maximum region with no adversarial example: sup {ε > 0 | ∀x′ ∈ Ball_ε(x), f(x′) = f(x)}. Robust Error: For a test set, T, and bound, ε*: |{x ∈ T : RobustRegion(x) < ε*}| / |T|.
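Once a verifier (or attack) gives a per-input estimate of the robust region, the robust-error metric is a one-liner; a sketch assuming a hypothetical `robust_region` callable:

```python
def robust_error(test_set, robust_region, eps):
    """Robust error from the slide: fraction of test inputs whose robust region
    is smaller than eps. `robust_region(x)` is assumed to return (a lower bound
    on) the robust region, e.g., as certified by a verifier."""
    return sum(1 for x in test_set if robust_region(x) < eps) / len(test_set)
```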

Slide 45

Slide 45 text

[Chart: scalability vs. precision of the evaluation metric]
Heuristic defenses: distillation (Papernot et al., 2016), gradient obfuscation, adversarial retraining (Madry et al., 2017); metric: attack success rate (for a set of attacks).
Certified robustness: CNN-Cert (Boopathy et al., 2018), Dual-LP (Kolter & Wong, 2018), Dual-SDP (Raghunathan et al., 2018); metric: bound.
Formal verification: MILP solver (MIPVerify), SMT solver (Reluplex), interval analysis (ReluVal); metric: robust error (precise).
Feature squeezing is also placed on the chart.

Slide 46

Slide 46 text

Theory vs. “Practice” vs. Reality [table; the Reality column is stamped “Fake”]:
Classification problems: distributional assumptions (theory); toy, arbitrary datasets (“practice”); malware, fake news, ... (reality).
Adversarial strength: ℓp norm bound (theory); ℓ∞ bound (“practice”); application specific (reality).

Slide 47

Slide 47 text

Example: PDF Malware

Slide 48

Slide 48 text

Finding Evasive Malware: Given a seed sample, x, with desired malicious behavior, find an adversarial example x′ that satisfies: f(x′) = “benign” (model misclassifies) and ℬ(x′) = ℬ(x) (malicious behavior preserved). Generic attack: heuristically explore the input space for an x′ that satisfies the definition. No requirement that x ~ x′ except through ℬ.

Slide 49

Slide 49 text

PDF Malware Classifiers [table]:
PDFrate [ACSAC 2012]: Random Forest; manual features (object counts, lengths, positions, …).
Hidost13 [NDSS 2013]: Support Vector Machine; automated features (object structural paths).
Hidost16 [JIS 2016]: Random Forest; automated features (object structural paths).
Very robust against the “strongest conceivable mimicry attack”.

Slide 50

Slide 50 text

Evolutionary Search [diagram]: malicious PDF → clone → mutation (using benign PDFs) → variants → select variants (fitness selection using a benign oracle and the target classifier) → found evasive? Mutant generation and fitness selection repeat until evasive variants are found. (Weilin Xu, Yanjun Qi)

Slide 51

Slide 51 text

Generating Variants [diagram]: the same pipeline, highlighting mutant generation (clone the malicious PDF and mutate it using benign PDFs).

Slide 52

Slide 52 text

PDF Structure

Slide 53

Slide 53 text

Generating Variants [diagram]: the same pipeline, highlighting mutant generation (clone the malicious PDF and mutate it using benign PDFs).

Slide 54

Slide 54 text

Generating Variants [diagram]: mutation operates on the PDF object tree (e.g., /Root, /Catalog, /Pages, /JavaScript eval(‘…’)): select a random node, then randomly transform it: delete, insert, or replace.

Slide 55

Slide 55 text

Generating Variants [diagram]: select a random node in the PDF object tree (e.g., /Root, /Catalog, /Pages, /JavaScript eval(‘…’)) and randomly transform it: delete it, or insert/replace using nodes drawn from benign PDFs.
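A sketch of this mutation step; the PDF-tree interface (`nodes`, `delete`, `insert_child`, `replace`) is hypothetical, standing in for EvadeML's actual PDF object-tree manipulation:

```python
import random

def mutate(pdf_tree, benign_nodes, rng=random):
    """One mutation step: pick a random node in the PDF object tree and either
    delete it, insert a child drawn from benign PDFs, or replace it with one."""
    node = rng.choice(pdf_tree.nodes())
    operation = rng.choice(["delete", "insert", "replace"])
    if operation == "delete":
        pdf_tree.delete(node)
    elif operation == "insert":
        pdf_tree.insert_child(node, rng.choice(benign_nodes))
    else:
        pdf_tree.replace(node, rng.choice(benign_nodes))
    return pdf_tree
```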

Slide 56

Slide 56 text

Selecting Promising Variants [diagram]: the same pipeline, highlighting fitness selection.

Slide 57

Slide 57 text

Selecting Promising Variants [diagram]: each candidate variant is scored by a fitness function f(score_oracle, score_class): the oracle reports whether the variant is still malicious, and the target classifier reports its score.

Slide 58

Slide 58 text

Oracle: ℬ(x′) = ℬ(x)? Execute the candidate in a vulnerable Adobe Reader in a virtual environment (Cuckoo sandbox, https://github.com/cuckoosandbox) with a simulated network (INetSim). Behavioral signature: malicious if the signature matches (HTTP_URL + HOST extracted from API traces).

Slide 59

Slide 59 text

Fitness Function (assumes lost malicious behavior will not be recovered): fitness(x′) = 1 − classifier_score(x′) if ℬ(x′) = ℬ(x), and −∞ otherwise.
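A direct transcription of this fitness function into Python; `oracle_signature` and `classifier_score` are assumed wrappers around the Cuckoo oracle and the target classifier, not EvadeML's actual API:

```python
def fitness(variant, seed_signature, oracle_signature, classifier_score):
    """Fitness from the slide: 1 - classifier score if the oracle's behavioral
    signature is preserved, -infinity otherwise."""
    if oracle_signature(variant) != seed_signature:  # malicious behavior lost
        return float("-inf")
    return 1.0 - classifier_score(variant)
```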

Slide 60

Slide 60 text

[Chart] Seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost.

Slide 61

Slide 61 text

[Chart] Seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost: simple transformations often worked.

Slide 62

Slide 62 text

[Chart] Seeds evaded (out of 500) vs. number of mutations: a single transformation, (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/), works on 162/500 seeds.

Slide 63

Slide 63 text

[Chart] Seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost: some seeds required complex transformations.

Slide 64

Slide 64 text

Evading PDFrate [chart: classification score for each malware seed (sorted by original score), showing the original malicious seeds, the discovered evasive variants, and the malicious-label threshold].

Slide 65

Slide 65 text

Adjust the threshold? [Chart: classification score per malware seed (sorted by original score), with original malicious seeds, discovered evasive variants, and the malicious-label threshold.] Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.

Slide 66

Slide 66 text

Adjust the threshold? [Chart: classification score per malware seed (sorted by original score), comparing variants found with threshold = 0.25 and variants found with threshold = 0.50.]

Slide 67

Slide 67 text

Hide the Classifier Score? [Diagram: the same evolutionary search pipeline, whose fitness function f(score_oracle, score_class) uses the target classifier’s score.]

Slide 68

Slide 68 text

Binary Classifier Output is Enough [diagram: the same pipeline driven only by the classifier’s binary output]. ACM CCS 2017.

Slide 69

Slide 69 text

Retrain the Classifier [pipeline diagram]: training (supervised learning) with labelled training data → feature extraction → vectors → ML algorithm → trained classifier; deployment: operational data → trained classifier → malicious / benign.

Slide 70

Slide 70 text

[Pipeline diagram] EvadeML in the loop: evasive variants generated by EvadeML against the deployed classifier are cloned back into the labelled training data for retraining.

Slide 71

Slide 71 text

[Chart] Seeds evaded (out of 500) vs. generations, Hidost16. Original classifier: takes 614 generations to evade all seeds.

Slide 72

Slide 72 text

[Chart] Seeds evaded (out of 500) vs. generations: Hidost16 and the retrained HidostR1.

Slide 73

Slide 73 text

[Chart] Seeds evaded (out of 500) vs. generations: Hidost16 and the retrained HidostR1.

Slide 74

Slide 74 text

[Chart] Seeds evaded (out of 500) vs. generations: Hidost16, HidostR1, and HidostR2.

Slide 75

Slide 75 text

[Chart] Seeds evaded (out of 500) vs. generations: Hidost16, HidostR1, and HidostR2.

Slide 76

Slide 76 text

[Chart] Seeds evaded (out of 500) vs. generations: Hidost16, HidostR1, HidostR2. False positive rates (Genome | Contagio Benign): Hidost16: 0.00 | 0.00; HidostR1: 0.78 | 0.30; HidostR2: 0.85 | 0.53.

Slide 77

Slide 77 text

Only 8/6987 robust features (Hidost): /Names, /Names/JavaScript, /Names/JavaScript/Names, /Names/JavaScript/JS, /OpenAction, /OpenAction/JS, /OpenAction/S, /Pages. A robust classifier built on these has high false positives.

Slide 78

Slide 78 text

Malware Classification Moral: To build robust, effective malware classifiers, we need robust features that are strong signals for malware. If you have features like this, you don’t need ML!

Slide 79

Slide 79 text

Theory vs. “Practice” vs. “Reality” [table; the “Reality” column is stamped “Fake”]:
Classification problems: distributional assumptions (theory); toy, arbitrary datasets (“practice”); malware, fake news, ... (“reality”).
Adversarial strength: ℓp norm bound (theory); ℓ∞ bound (“practice”); application specific (“reality”).

Slide 80

Slide 80 text

Adversarial Examples across Domains [table]:
Trojan Wars: classifier space: judgment of the Trojans, f(x) = “gift”; “reality” space: physical reality, f*(x) = invading army.
Malware: classifier space: malware detector, f(x) = “benign”; “reality” space: victim’s execution, f*(x) = malicious behavior.
Image classification: classifier space: DNN classifier, f(x) = y; “reality” space: human perception, f*(x) = y′.

Slide 81

Slide 81 text

Adversarial Example. Prediction Change Definition: An input, x′ ∈ X, is an adversarial example for x ∈ X, iff ∃x′ ∈ Ball_ε(x) such that f(x) ≠ f(x′). Suggested Defense: given an input x*, see how the model behaves on s(x*), where s(·) reverses transformations in Δ-space.

Slide 82

Slide 82 text

[Recap chart: scalability vs. precision of the evaluation metric, placing heuristic defenses (attack success rate), certified robustness (bound), formal verification (robust error), and feature squeezing.]

Slide 83

Slide 83 text

Feature Squeezing Detection Framework [diagram]: the input is fed to the model directly (prediction₀) and through squeezers 1…k (prediction₁ … prediction_k); a comparison d(pred₀, pred₁, …, pred_k) decides adversarial vs. legitimate. (Weilin Xu, Yanjun Qi)

Slide 84

Slide 84 text

Feature Squeezing Detection Framework [diagram, as above]. A feature squeezer coalesces similar inputs into one point: it barely changes legitimate inputs, but destroys adversarial perturbations.

Slide 85

Slide 85 text

Coalescing by Feature Squeezing. Metric space 1: target classifier; metric space 2: “oracle”. Before: the attacker must find a small perturbation that changes the class for the classifier but is imperceptible to the oracle. Now: the attacker must change the class for both the original and the squeezed classifier, while remaining imperceptible to the oracle.

Slide 86

Slide 86 text

Example Squeezer: Bit Depth Reduction (signal quantization) [plot: output vs. input in [0, 1] for 8-bit, 3-bit, and 1-bit quantization].
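The bit-depth-reduction squeezer shown here, and the median-filter squeezer used on later slides, are each a few lines; a sketch assuming images with pixel values scaled to [0, 1]:

```python
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits):
    """Bit depth reduction: quantize pixel values in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(np.asarray(x) * levels) / levels

def median_smooth(x, size=2):
    """Median smoothing squeezer (a 2x2 filter is used on later slides)."""
    return median_filter(np.asarray(x), size=size)
```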

Slide 87

Slide 87 text

Example Squeezer: Bit Depth Reduction (signal quantization) [plot as above, plus example digits: a seed and adversarial examples from CW₂, CW∞, BIM, and FGSM attacks, with their predicted labels before and after squeezing].

Slide 88

Slide 88 text

Other Potential Squeezers: spatial smoothers (median filter, non-local means); thermometer encoding (learnable bit depth reduction); image denoising using autoencoder, wavelet, JPEG, etc.; image resizing; ... References: C. Xie et al., Mitigating Adversarial Effects Through Randomization, ICLR 2018; J. Buckman et al., Thermometer Encoding: One Hot Way To Resist Adversarial Examples, ICLR 2018; D. Meng and H. Chen, MagNet: a Two-Pronged Defense against Adversarial Examples, CCS 2017; A. Prakash et al., Deflecting Adversarial Attacks with Pixel Deflection, CVPR 2018; ...

Slide 89

Slide 89 text

Other Potential Squeezers: spatial smoothers (median filter, non-local means); thermometer encoding (learnable bit depth reduction); image denoising using autoencoder, wavelet, JPEG, etc.; image resizing; ... References: C. Xie et al., Mitigating Adversarial Effects Through Randomization, ICLR 2018; J. Buckman et al., Thermometer Encoding: One Hot Way To Resist Adversarial Examples, ICLR 2018; D. Meng and H. Chen, MagNet: a Two-Pronged Defense against Adversarial Examples, CCS 2017; A. Prakash et al., Deflecting Adversarial Attacks with Pixel Deflection, CVPR 2018; ... See also: Anish Athalye, Nicholas Carlini, David Wagner. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.

Slide 90

Slide 90 text

“Feature Squeezing” (Vacuous) Conjecture: For any distance-limited adversarial method, there exists some feature squeezer that accurately detects its adversarial examples. Intuition: if the perturbation is small (in some simple metric space), there is some squeezer that coalesces the original and adversarial example into the same sample.

Slide 91

Slide 91 text

Feature Squeezing Detection [diagram]: the input goes to the model (7-layer CNN) directly (prediction₀), through a bit-depth-1 squeezer (prediction₁), and through a 2×2 median filter (prediction₂); flag as adversarial if max(ℓ₁(p₀, p₁), ℓ₁(p₀, p₂)) > T.
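A sketch of this detection rule, taking the model's softmax outputs as NumPy arrays; the function names are illustrative, not the paper's implementation:

```python
import numpy as np

def detection_score(p_original, squeezed_predictions):
    """Maximum L1 distance between the model's softmax output on the original
    input and on each squeezed copy (the quantity thresholded on the slide)."""
    p0 = np.asarray(p_original)
    return max(np.abs(p0 - np.asarray(p)).sum() for p in squeezed_predictions)

def is_adversarial(p_original, squeezed_predictions, threshold):
    return detection_score(p_original, squeezed_predictions) > threshold
```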

Slide 92

Slide 92 text

Detecting Adversarial Examples [histogram: distance between the original input’s prediction and its squeezed version’s, for adversarial inputs (CW attack) vs. legitimate inputs].

Slide 93

Slide 93 text

Training a detector (MNIST) [histogram: number of examples vs. maximum ℓ₁ distance between original and squeezed input, legitimate vs. adversarial]. Set the detection threshold to keep the false positive rate below a target: threshold = 0.0029 gives detection 98.2% with FP < 4%.
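Choosing the threshold to keep the false-positive rate below a target amounts to taking a quantile of the detection scores on legitimate inputs; a minimal sketch:

```python
import numpy as np

def pick_threshold(legitimate_scores, target_fp_rate=0.05):
    """Set the detection threshold at the (1 - target FP rate) quantile of
    detection scores on legitimate inputs, so at most roughly that fraction
    of legitimate inputs get flagged as adversarial."""
    return float(np.quantile(np.asarray(legitimate_scores), 1.0 - target_fp_rate))
```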

Slide 94

Slide 94 text

ImageNet Configuration [diagram]: MobileNet model with three squeezers: bit depth-5 (prediction₁), 2×2 median filter (prediction₂), and non-local means (prediction₃); flag as adversarial if the maximum ℓ₁ distance from prediction₀ to {prediction₁, prediction₂, prediction₃} exceeds T.

Slide 95

Slide 95 text

Training a detector (ImageNet) [histogram: number of examples vs. maximum ℓ₁ distance between original and squeezed input, legitimate vs. adversarial]: threshold = 1.24 gives detection 85% with FP < 5%.

Slide 96

Slide 96 text

What about better adversaries?

Slide 97

Slide 97 text

Instance Defense-Robustness: For an input x, the robust-defended region is the maximum region with no undetected adversarial example: sup {ε > 0 | ∀x′ ∈ Ball_ε(x), f(x′) = f(x) ∨ detected(x′)}. Defense Failure: For a test set, T, and bound, ε*: |{x ∈ T : RobustDefendedRegion(x) < ε*}| / |T|. Can we verify a defense?

Slide 98

Slide 98 text

Formal Verification of a Defense Instance: exhaustively test all inputs x′ ∈ Ball_ε(x) for correctness or detection. Need to transform the model into a function amenable to verification.

Slide 99

Slide 99 text

Linear Programming: find values of x that minimize a linear function c₁x₁ + c₂x₂ + c₃x₃ + … under constraints: a₁₁x₁ + a₁₂x₂ + ⋯ ≤ b₁; a₂₁x₁ + a₂₂x₂ + ⋯ ≤ b₂; xᵢ ≤ 0; ...

Slide 100

Slide 100 text

Encoding a Neural Network. Linear components (y = Wx + b): convolutional layer, fully-connected layer, batch normalization (in test mode). Non-linear: activation (ReLU, Sigmoid, Softmax), pooling layer (max, avg).

Slide 101

Slide 101 text

Encode ReLU: Mixed Integer Linear Programming adds discrete variables to LP. ReLU (Rectified Linear Unit) is piecewise linear: y = max(0, x). With pre-activation bounds l ≤ x ≤ u and a binary variable a ∈ {0, 1}: y ≥ x, y ≥ 0, y ≤ x − l(1 − a), y ≤ u·a.
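A toy check of this big-M ReLU encoding using the PuLP modeling library (not the MIPVerify implementation); the bounds l and u are assumed to be known pre-activation bounds with l < 0 < u, e.g., from interval analysis:

```python
import pulp

l, u = -1.0, 1.0   # assumed pre-activation bounds on x

prob = pulp.LpProblem("relu_encoding", pulp.LpMinimize)
x = pulp.LpVariable("x", lowBound=l, upBound=u)
y = pulp.LpVariable("y", lowBound=0)          # y >= 0
a = pulp.LpVariable("a", cat="Binary")        # a in {0, 1}

prob += 1.0 * y                 # any linear objective; the constraints fix y
prob += y >= x                  # y >= x
prob += y <= x - l * (1 - a)    # y <= x - l(1 - a)
prob += y <= u * a              # y <= u * a
prob += x == 0.3                # fix the input to test the encoding

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(y), pulp.value(a))   # expect y = 0.3 = max(0, 0.3), a = 1
```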

Slide 102

Slide 102 text

Mixed Integer Linear Programming (MILP): intractable in theory (NP-complete), efficient in practice (e.g., the Gurobi solver). MIPVerify (Vincent Tjeng, Kai Xiao, Russ Tedrake): verify neural networks using MILP.

Slide 103

Slide 103 text

Encode Feature Squeezers: the binary filter (threshold at 0.5) is a lower semi-continuous step from 0 to 1. The actual input is uint8 [0, 1, 2, … 254, 255]: 127/255 = 0.498 and 128/255 = 0.502, so [0.499, 0.501] is an infeasible gap.

Slide 104

Slide 104 text

Verified ℓ∞ Robustness (ε = 0.1) [table: Model | Test Accuracy | Robust Error | Robust Error with Binary Filter]:
Raghunathan et al.: 95.82% | 14.36%-30.81% | 7.37%
Wong & Kolter: 98.11% | 4.38% | 4.25%
Ours with binary filter: 98.94% | 2.66-6.63% | -
Even without detection, this helps!

Slide 105

Slide 105 text

Encode Detection Mechanism. Original version: score(x) = ‖f(x) − f(squeeze(x))‖₁, where f(x) is the softmax output. Simplify for verification: ℓ₁ ⟶ maximum difference; softmax ⟶ multiple piecewise-linear approximate sigmoids.

Slide 106

Slide 106 text

Preliminary Experiments [diagram: a 4-layer CNN with a bit-depth-1 squeezer; input x′ is flagged adversarial if max_diff(y₁, y₂) > T, otherwise y₁ is valid]. Verification: for a seed x, there is no adversarial input x′ ∈ Ball_ε(x) whose prediction differs from f(x) and is not detected. Adversarially robust retrained [Wong & Kolter] model, 1000 test MNIST seeds, ε = 0.1 (ℓ∞): 970 infeasible (verified no adversarial example), 13 misclassified (original seed), 17 vulnerable. Robust error: 0.3%. Verification time ~0.2 s (compared to 0.8 s without binarization).

Slide 107

Slide 107 text

[Recap chart: scalability vs. precision of the evaluation metric, placing heuristic defenses (attack success rate), certified robustness (bound), formal verification (robust error), and feature squeezing.]

Slide 108

Slide 108 text

Original Model (no robustness training) [heatmap: seed class vs. target class]. MNIST model: 2 convolutional layers, 2 fully-connected layers (100, 10 units); ε = 0.2, ℓ∞.

Slide 109

Slide 109 text

Original Model (no robustness training) [heatmap: seed class vs. target class]. MNIST model: 2 convolutional layers, 2 fully-connected layers (100, 10 units); ε = 0.2, ℓ∞.

Slide 110

Slide 110 text

Training a Robust Network. Eric Wong and J. Zico Kolter. Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope. ICML 2018. Replace the loss with a differentiable function based on an outer bound computed using a dual network; ReLU (Rectified Linear Unit) is replaced by a linear approximation [plot].

Slide 111

Slide 111 text

Standard Robustness Training (overall robustness goal) [heatmap: seed class vs. target class]. MNIST model: 2 convolutional layers, 2 fully-connected layers (100, 10 units); ε = 0.2, ℓ∞.

Slide 112

Slide 112 text

Cost-Sensitive Robustness Training (Xiao Zhang). Cost matrix: the cost of different adversarial transformations; for benign/malware, C = [[-, 0], [1, -]] (rows: seed class benign, malware; columns: target class benign, malware), so only malware → benign evasions are costly. Incorporate the cost matrix into robustness training.
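A sketch of this cost matrix and how it might weight per-pair robustness losses; the `pairwise_robust_loss` input and the weighting function are illustrative assumptions (the actual method incorporates the costs into a certified training bound such as Wong & Kolter's), not the paper's implementation:

```python
import numpy as np

# Rows: seed class; columns: target class of an adversarial transformation.
# Only a malware seed pushed to "benign" (evasion) is costly.
classes = ["benign", "malware"]
cost = np.array([
    [0.0, 0.0],   # benign seed: no cost for either target
    [1.0, 0.0],   # malware seed -> benign costs 1
])

def cost_weighted_loss(pairwise_robust_loss):
    """Weight a matrix of per-(seed, target) robustness losses by the cost
    matrix, as in cost-sensitive robust training."""
    return float((cost * np.asarray(pairwise_robust_loss)).sum())
```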

Slide 113

Slide 113 text

Standard Robustness Training (overall robustness goal) [heatmap: seed class vs. target class]. MNIST model: 2 convolutional layers, 2 fully-connected layers (100, 10 units); ε = 0.2, ℓ∞.

Slide 114

Slide 114 text

Cost-Sensitive Robustness Training [heatmap: seed class vs. target class]: protect odd classes from evasion.

Slide 115

Slide 115 text

Cost-Sensitive Robustness Training [heatmap: seed class vs. target class]: protect even classes from evasion.

Slide 116

Slide 116 text

Conclusion [image: History of the destruction of Troy, 1498].

Slide 117

Slide 117 text

Security State-of-the-Art [table comparing attack success probability, threat models, and proofs]:
Cryptography: information theoretic, resource bounded threat models; proofs required.
System Security: threat models based on capabilities, motivations, rationality; proofs common.
Adversarial Machine Learning: artificially limited adversary; proofs: making progress!

Slide 118

Slide 118 text

Security State-of-the-Art [table comparing attack success probability, threat models, and proofs]:
Cryptography: information theoretic, resource bounded threat models; proofs required.
System Security: threat models based on capabilities, motivations, rationality; proofs common.
Adversarial Machine Learning: artificially limited adversary; proofs: making progress!
Huge gaps to close: threat models are unrealistic (but real threats are unclear); verification techniques only work for tiny models; experimental defenses are often (quickly) broken.

Slide 119

Slide 119 text

David Evans University of Virginia evans@virginia.edu EvadeML.org Weilin Xu Yanjun Qi Funding: NSF, Intel, Baidu Xiao Zhang Center for Trustworthy Machine Learning

Slide 120

Slide 120 text

David Evans University of Virginia evans@virginia.edu EvadeML.org