“Unfortunately, our translation systems made an error last week that misinterpreted what this individual posted. Even though our translations are getting better each day, mistakes like these might happen from time to time and we’ve taken steps to address this particular issue. We apologize to him and his family for the mistake and the disruption this caused.”
Risks from Artificial Intelligence
Benign developers and operators: AI out of control; AI inadvertently causes harm.
Malicious operators: build AI to do harm; malicious abuse of benign AI systems.
[Cartoon: “On Robots”, Joe Berger and Pascal Wyse, The Guardian, 21 July 2018]
Statistical Machine Learning
[Pipeline diagram: labelled training data → feature extraction → vectors → ML algorithm → trained classifier (training, supervised learning); operational data → feature extraction → trained classifier → malicious / benign (deployment)]
Assumption: the training data is representative.
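For concreteness, a minimal sketch of the pipeline in the diagram using scikit-learn; the `extract_features` function, the toy samples, and the labels are placeholders, not part of the slides:

```python
# Sketch of the supervised-learning pipeline above (hypothetical data and features).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def extract_features(sample):
    # Placeholder feature extraction: map a raw sample (e.g., a file) to a vector.
    return np.asarray(sample, dtype=float)

# Training (supervised learning): labelled data -> feature vectors -> trained classifier.
train_samples = [[0, 1, 3], [5, 2, 0], [1, 1, 1], [4, 0, 2]]
train_labels = ["benign", "malicious", "benign", "malicious"]
X_train = np.stack([extract_features(s) for s in train_samples])
clf = RandomForestClassifier(n_estimators=100).fit(X_train, train_labels)

# Deployment: operational data goes through the same feature extraction.
print(clf.predict([extract_features([5, 1, 0])]))
```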
More Ambition
“The human race will have a new kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight.”
Gottfried Wilhelm Leibniz (1679)
Academic genealogy:
Gottfried Wilhelm Leibniz (Universität Altdorf, 1666), who advised
Jacob Bernoulli (Universität Basel, 1684), who advised
Johann Bernoulli (Universität Basel, 1694), who advised
Leonhard Euler (Universität Basel, 1726), who advised
Joseph Louis Lagrange, who advised
Siméon Denis Poisson, who advised
Michel Chasles (École Polytechnique, 1814), who advised
H. A. (Hubert Anson) Newton (Yale, 1850), who advised
E. H. Moore (Yale, 1885), who advised
Oswald Veblen (U. of Chicago, 1903), who advised
Philip Franklin (Princeton, 1921), who advised
Alan Perlis (MIT Math PhD, 1950), who advised
Jerry Feldman (CMU Math, 1966), who advised
Jim Horning (Stanford CS PhD, 1969), who advised
John Guttag (U. of Toronto CS PhD, 1975), who advised
David Evans (MIT CS PhD, 2000)
— making Leibniz my academic great-great-great-great-great-great-great-great-great-great-great-great-great-great-great-grandparent!
More Precision
Normal computing amplifies (quadrillions of times faster) and aggregates (enables millions of humans to work together) human cognitive abilities; AI goes beyond what humans can do.
Operational Definition
“Artificial Intelligence” means making computers do things their programmers don’t understand well enough to program explicitly.
If it is explainable, it’s not ML!
Inherent Paradox of “Trustworthy” ML
“Artificial Intelligence” means making computers do things their programmers don’t understand well enough to program explicitly. If we could specify precisely what the model should do, we wouldn’t need ML to do it!
Inherent Paradox of “Trustworthy” ML
If we could specify precisely what the model should do, we wouldn’t need ML to do it! The best we can hope for is verifying certain properties.
Model Similarity (two models M1 and M2): ∀x ∈ S: f_M1(x) ≈ f_M2(x)
(DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017)
Model Robustness (one model M, given an input x and a perturbed input x + Δ): ∀x ∈ S, ∀Δ ∈ A: f(x) ≈ f(x + Δ)
Adversarial Robustness
∀x ∈ S, ∀Δ ∈ A: f(x) ≈ f(x + Δ)   (the model M sees both the original input x and the perturbed input x + Δ)
Adversary’s goal: find a “small” perturbation that changes the model’s output (targeted attack: changes it in some desired way).
Defender’s goals: Robust model: find a model for which this is hard. Detection: detect inputs that are adversarial.
Not a new problem...
“Or do you think any Greek gift’s free of treachery? Is that Ulysses’s reputation? Either there are Greeks in hiding, concealed by the wood, or it’s been built as a machine to use against our walls, or spy on our homes, or fall on the city from above, or it hides some other trick: Trojans, don’t trust this horse. Whatever it is, I’m afraid of Greeks even those bearing gifts.”
Virgil, The Aeneid (Book II)
Adversarial Examples for DNNs
[Figure: image classified as “panda” + 0.007 × (adversarial noise, the sign of the loss gradient) = image classified as “gibbon”]
Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014 (ICLR 2015).
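A hedged sketch of the fast gradient sign method (FGSM) behind this example, assuming a differentiable PyTorch classifier `model` and a correctly labelled input; the ε = 0.007 step size mirrors the figure, everything else is illustrative:

```python
# FGSM: take one signed-gradient step of size epsilon to increase the loss.
import torch
import torch.nn.functional as F

def fgsm(model, x, label, epsilon=0.007):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Move each pixel by epsilon in the direction that increases the loss.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```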
Impact of Adversarial Perturbations
[Figure: distance between each layer’s output and its output for the original seed, across the layers of a CIFAR-10 DenseNet (5th–95th percentile bands), comparing adversarial perturbations (FGSM with ε = 0.0245; Carlini-Wagner L2) against random noise of the same magnitude]
Adversarial Example
Ball_ε(x) is some space around x, typically defined in some (simple!) metric space: L0 norm (number of differing components), L2 norm (“Euclidean distance”), L∞ norm.
Prediction Change Definition: an input x′ ∈ X is an adversarial example for x ∈ X iff x′ ∈ Ball_ε(x) and f(x) ≠ f(x′).
Without constraints on Ball_ε, every input has adversarial examples: any non-trivial model has some x0, x1 ∈ X with f(x0) ≠ f(x1).
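A direct, illustrative encoding of the prediction-change definition, assuming an L∞ ball and a classifier exposed as a Python callable `f` (both choices are assumptions for the sketch, not part of the definition):

```python
import numpy as np

def is_adversarial_example(f, x, x_prime, epsilon):
    """Prediction-change test: x' lies in Ball_eps(x) (L-infinity) and f changes class."""
    in_ball = np.max(np.abs(np.asarray(x_prime) - np.asarray(x))) <= epsilon
    return in_ball and f(x) != f(x_prime)
```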
Prediction Error Robustness
Error Robustness Definition: an input x′ ∈ X is an adversarial example for a (correctly classified) x ∈ X iff x′ ∈ Ball_ε(x) and f(x′) ≠ the true label of x′.
A perfect classifier has no (error-robustness) adversarial examples. But if we had a way to know the true label, we wouldn’t need an ML classifier.
Global Robustness Properties
Adversarial Risk: the probability that an input has an adversarial example:
Pr_{x ← D} [ ∃ x′ ∈ Ball_ε(x). f(x′) ≠ class(x′) ]
Dimitrios I. Diochnos, Saeed Mahloujifar, Mohammad Mahmoody. NeurIPS 2018.
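Adversarial risk over a distribution can only be estimated empirically, and using a concrete attack only gives a lower bound; a sketch where `sample_input`, `attack`, and `true_class` are hypothetical stand-ins for the distribution D, the search over Ball_ε(x), and the ground-truth labelling:

```python
def empirical_adversarial_risk(f, sample_input, true_class, attack, n=1000):
    """Estimate Pr_{x ~ D}[ exists x' in Ball_eps(x) with f(x') != class(x') ].
    With a concrete attack this is only a lower bound on the true risk."""
    hits = 0
    for _ in range(n):
        x = sample_input()                 # draw x from D
        x_prime = attack(f, x)             # search Ball_eps(x) for a candidate
        if x_prime is not None and f(x_prime) != true_class(x_prime):
            hits += 1
    return hits / n
```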
Recent Global Robustness Results
Properties of any model for the input space: the distance to an adversarial example is small relative to the expected distance between two sampled points.

Adversarial Spheres [Gilmer et al., 2018]
  Assumption: uniform distribution on two concentric n-spheres.
  Key result: expected safe distance (L2 norm) is relatively small.

Adversarial vulnerability for any classifier [Fawzi, Fawzi & Fawzi, 2018]
  Assumption: smooth generative model — (1) Gaussian in latent space; (2) generator is L-Lipschitz.
  Key result: adversarial risk → 1 for relatively small attack strength (L2 norm): P(r(x) ≤ η) ≥ 1 − √(π/2) · e^(−η²/(2L²)).

Curse of Concentration in Robust Learning [Mahloujifar et al., 2018]
  Assumption: Normal Lévy families (unit sphere, uniform, L2 norm; Boolean hypercube, uniform, Hamming distance; ...).
  Key result: if the attack strength b exceeds a relatively small threshold, adversarial risk ≥ 1/2: b ≥ √(log(k1/ε)) / √(k2 · n) ⟹ Risk_b(h, c) ≥ 1/2.
Local (Instance) Robustness
Robust Region: for an input x, the robust region is the maximum region with no adversarial example:
RobustRegion(x) = sup{ ε > 0 | ∀x′ ∈ Ball_ε(x), f(x′) = f(x) }
Robust Error: for a test set S and bound ε*:
|{ x ∈ S : RobustRegion(x) < ε* }| / |S|
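Given the robust region for each test input (e.g., from a certification procedure), robust error is just a counting exercise; a small sketch:

```python
def robust_error(robust_regions, eps_star):
    """robust_regions: RobustRegion(x) values for each x in the test set S.
    Returns |{x in S : RobustRegion(x) < eps*}| / |S|."""
    violations = sum(1 for r in robust_regions if r < eps_star)
    return violations / len(robust_regions)
```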
Finding Evasive Malware
Given a seed sample x with the desired malicious behavior, find an adversarial example x′ that satisfies:
  f(x′) = “benign”   (the model misclassifies x′)
  B(x′) = B(x)       (the malicious behavior is preserved)
Generic attack: heuristically explore the input space for an x′ that satisfies this definition. There is no requirement that x′ be close to x except through B.
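A hedged sketch of such a heuristic search: mutate the seed, keep only variants whose malicious behavior the oracle B confirms, and stop when the classifier reports “benign”. The `mutate`, `classifier`, and `behavior_oracle` callables are hypothetical stand-ins; the actual EvadeML search is more structured (e.g., genetic search over document structure).

```python
import random

def find_evasive_variant(seed, classifier, behavior_oracle, mutate, max_tries=10000):
    """Heuristic search for x' with classifier(x') == "benign" and B(x') == B(seed)."""
    target_behavior = behavior_oracle(seed)
    candidates = [seed]
    for _ in range(max_tries):
        variant = mutate(random.choice(candidates))
        if behavior_oracle(variant) != target_behavior:
            continue                       # malicious behavior not preserved; discard
        if classifier(variant) == "benign":
            return variant                 # evasive variant found
        candidates.append(variant)         # keep as a starting point for more mutations
    return None
```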
PDF Malware Classifiers
PDFrate [ACSAC 2012]: Random Forest; manual features (object counts, lengths, positions, ...).
Hidost13 [NDSS 2013]: Support Vector Machine; automated features (object structural paths).
Hidost16 [JIS 2016]: Random Forest; automated features (object structural paths).
Claimed to be very robust against the “strongest conceivable mimicry attack”.
Discovered Evasive Variants
[Figure: classification score of discovered evasive variants for each malware seed (sorted by original score), compared with the original malicious seeds and the malicious-label threshold. Adjust the threshold? Variants are still found with the threshold at 0.25 and at 0.50.]
Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.
[Pipeline diagram, revisited: labelled training data → feature extraction → vectors → ML algorithm → trained classifier (training, supervised learning); operational data → trained classifier → malicious / benign (deployment); with an added loop: Retrain Classifier]
Malware Classification Moral
To build robust, effective malware classifiers, we need robust features that are strong signals for malware. But if you have features like this, you don’t need ML!
Adversarial Example
Prediction Change Definition: an input x′ ∈ X is an adversarial example for x ∈ X iff x′ ∈ Ball_ε(x) and f(x) ≠ f(x′).
Suggested defense: given an input x*, see how the model behaves on r(x*), where r(·) reverses transformations in Δ-space.
Coalescing by Feature Squeezing
Metric space 1: target classifier. Metric space 2: “oracle”.
Before: the adversary finds a small perturbation that changes the class for the classifier but is imperceptible to the oracle.
Now: the adversary must change the class for both the original and the squeezed classifier, while remaining imperceptible to the oracle.
Other Potential Squeezers
Thermometer encoding (learnable bit-depth reduction); image denoising using autoencoders, wavelets, JPEG, etc.; image resizing; spatial smoothers (median filter, non-local means); ...
J. Buckman, et al. Thermometer Encoding: One Hot Way To Resist Adversarial Examples. ICLR 2018.
C. Xie, et al. Mitigating Adversarial Effects Through Randomization. ICLR 2018.
D. Meng and H. Chen. MagNet: a Two-Pronged Defense against Adversarial Examples. CCS 2017.
A. Prakash, et al. Deflecting Adversarial Attacks with Pixel Deflection. CVPR 2018.
But see: Anish Athalye, Nicholas Carlini, David Wagner. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018.
“Feature Squeezing” (Vacuous) Conjecture
For any distance-limited adversarial method, there exists some feature squeezer that accurately detects its adversarial examples.
Intuition: if the perturbation is small (in some simple metric space), there is some squeezer that coalesces the original and adversarial example into the same sample.
Feature Squeezing Detection
[Diagram: the input is fed to the model (a 7-layer CNN) directly and after each squeezer (bit depth reduction to 1 bit; 2×2 median filter), producing prediction0, prediction1, prediction2. If max( L1(prediction0, prediction1), L1(prediction0, prediction2) ) > T, the input is flagged as adversarial; otherwise it is treated as legitimate.]
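A minimal NumPy/SciPy sketch of this detection rule; the `predict` callable (assumed to return a softmax vector) and the squeezer parameters follow the diagram, everything else is illustrative:

```python
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits=1):
    # Quantize pixel values in [0, 1] to 2**bits levels (bits=1 binarizes).
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def median_smooth(x, size=2):
    # Simple median smoothing; a faithful implementation filters spatially per channel.
    return median_filter(x, size=size)

def squeezing_score(predict, x):
    """Max over squeezers of the L1 distance between softmax outputs."""
    y0 = predict(x)
    y1 = predict(reduce_bit_depth(x, bits=1))
    y2 = predict(median_smooth(x, size=2))
    return max(np.sum(np.abs(y0 - y1)), np.sum(np.abs(y0 - y2)))

def is_adversarial(predict, x, threshold):
    return squeezing_score(predict, x) > threshold
```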
Training a Detector (MNIST)
[Histogram: maximum L1 distance between the model’s output on the original and squeezed input, for legitimate vs. adversarial examples. The detection threshold is set to keep the false positive rate below a target: threshold = 0.0029 gives 98.2% detection with FP < 4%.]
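One simple way to implement “set the detection threshold to keep the false positive rate below a target”, assuming a held-out set of squeezing scores for legitimate examples (the 5% target below is illustrative, not the slide’s value):

```python
import numpy as np

def choose_threshold(legitimate_scores, target_fpr=0.05):
    """Pick a threshold such that at most target_fpr of legitimate examples
    score above it (i.e., would be flagged as adversarial)."""
    return float(np.percentile(legitimate_scores, 100 * (1 - target_fpr)))
```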
ImageNet Configuration
[Diagram: the input is fed to the model (MobileNet) directly and after three squeezers (bit depth reduction to 5 bits; 2×2 median filter; non-local means denoising), producing prediction0 through prediction3. If the maximum L1 distance between prediction0 and any squeezed prediction exceeds T, the input is flagged as adversarial; otherwise it is treated as legitimate.]
Instance Defense-Robustness
For an input x, the robust-defended region is the maximum region with no undetected adversarial example:
RobustDefendedRegion(x) = sup{ ε > 0 | ∀x′ ∈ Ball_ε(x), f(x′) = f(x) ∨ detected(x′) }
Defense Failure: for a test set S and bound ε*:
|{ x ∈ S : RobustDefendedRegion(x) < ε* }| / |S|
Can we verify a defense?
Formal Verification of a Defense Instance
Exhaustively check all inputs x′ ∈ Ball_ε(x) for correctness or detection. This requires transforming the model into a function amenable to verification.
Mixed Integer Linear Programming (MILP)
Intractable in theory (NP-complete), but efficient in practice (e.g., the Gurobi solver).
MIPVerify (Vincent Tjeng, Kai Xiao, Russ Tedrake): verify neural networks using MILP.
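For intuition, a standard big-M style MILP encoding of a single ReLU y = max(0, x) with known pre-activation bounds l ≤ x ≤ u, l < 0 < u (a common formulation in this line of work; not necessarily verbatim what MIPVerify generates):

```latex
% Big-M encoding of y = max(0, x), given bounds l <= x <= u with l < 0 < u:
y \ge 0, \qquad y \ge x, \qquad
y \le u\,a, \qquad y \le x - l\,(1 - a), \qquad a \in \{0, 1\}
```

Setting a = 1 forces y = x (the active case) and a = 0 forces y = 0 (the inactive case); a network is verified by asserting such constraints layer by layer and asking the solver whether a misclassifying (or, here, misclassifying and undetected) input exists in Ball_ε(x).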
Verified L∞ Robustness
Model                      Test Accuracy   Robust Error (ε = 0.1)   Robust Error with Binary Filter
Raghunathan et al.         95.82%          14.36%–30.81%            7.37%
Wong & Kolter              98.11%          4.38%                    4.25%
Ours with binary filter    98.94%          2.66%–6.63%              —
Even without detection, this helps!
Preliminary Experiments
[Setup: a 4-layer CNN model with a bit-depth-1 squeezer; the input x′ is flagged as adversarial if max_diff(y0, y1) > T, and otherwise y1 is taken as a valid prediction.]
Verification: for a seed x, show there is no adversarial input x′ ∈ Ball_ε(x) whose prediction differs from f(x) and is not detected.
Adversarially robust retrained [Wong & Kolter] model, 1000 MNIST test seeds, ε = 0.1 (L∞):
  970 infeasible (verified: no adversarial example)
  13 misclassified (original seed)
  17 vulnerable
Robust error: 0.3%. Verification time ~0.2s per seed (compared to 0.8s without binarization).
[Figure: grid of adversarial examples for the original model (no robustness training), arranged by seed class vs. target class. MNIST model: 2 convolutional layers, 2 fully-connected layers (100, 10 units); ε = 0.2, L∞.]
Training a Robust Network
Eric Wong and J. Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. ICML 2018.
Replace the loss with a differentiable function based on an outer bound, computed using a dual network; each ReLU (rectified linear unit) is handled with a linear approximation.
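The ReLU linear approximation is the triangle relaxation: for pre-activation bounds l < 0 < u, y = max(0, z) is replaced by the convex region

```latex
% Triangle relaxation of y = max(0, z) for bounds l < 0 < u:
y \ge 0, \qquad y \ge z, \qquad y \le \frac{u\,(z - l)}{u - l}
```

Applying this at every unstable ReLU defines the convex outer adversarial polytope; its dual network yields the differentiable robust bound used as the training loss.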
Cost-Sensitive Robustness Training (Xiao Zhang)
Incorporate a cost matrix into robustness training, where the cost matrix gives the cost of different adversarial transformations (rows = seed class, columns = target class, for benign/malware):
  C = [ −  0 ]
      [ 1  − ]
i.e., a malware sample transformed to be classified as benign has cost 1, while a benign sample transformed to malware has cost 0.
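A hedged sketch of how a cost matrix might be folded into robustness training: weight each example’s certified robust loss by the cost of adversarial transformations out of its class. The `certified_robust_loss` callable and the weighting scheme are placeholders for illustration, not the paper’s implementation.

```python
import torch

# Hypothetical cost matrix: cost[i][j] is the cost of an adversarial transformation
# from class i to class j (classes: 0 = benign, 1 = malware).
cost = torch.tensor([[0.0, 0.0],   # benign  -> benign, benign  -> malware
                     [1.0, 0.0]])  # malware -> benign, malware -> malware

def cost_sensitive_robust_loss(certified_robust_loss, x, y):
    """Weight each example's per-example robust loss by the total cost of
    adversarial transformations out of its true class y."""
    weights = cost[y].sum(dim=1)           # per-example weight, shape: (batch,)
    return (weights * certified_robust_loss(x, y)).mean()
```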
Security State-of-the-Art
Field                          Attack success probability     Threat models                             Proofs
Cryptography                   negligible (e.g., 2^-128)      information theoretic, resource bounded   required
System Security                low                            capabilities, motivations, rationality    common
Adversarial Machine Learning   high                           artificially limited adversary            making progress!
Huge gaps to close: threat models are unrealistic (but the real threats are unclear); verification techniques only work for tiny models; experimental defenses are often (quickly) broken.
David Evans, University of Virginia — [email protected] — EvadeML.org
With: Weilin Xu, Yanjun Qi, Xiao Zhang
Funding: NSF, Intel, Baidu; Center for Trustworthy Machine Learning