19th International School on Foundations of Security Analysis and Design
Mini-course on "Trustworthy Machine Learning"
David Evans, University of Virginia
https://jeffersonswheel.org/fosad2019
Bertinoro, Italy, 26 August 2019

Lecture 1: Introduction/Attacks
Plan for the Course

Monday (Today): Introduction, ML Background, Attacks
Tuesday (Tomorrow): Defenses
Wednesday: Privacy, Fairness, Abuse

Overall Goals:
- broad and whirlwind survey* of an exciting emerging research area
- explain a few of my favorite research results in enough detail to understand them at a high level
- introduce some open problems that I hope you will work on and solve

* but highly biased by my own interests
“Unfortunately, our translation systems made an error last week that misinterpreted what this individual posted. Even though our translations are getting better each day, mistakes like these might happen from time to time and we’ve taken steps to address this particular issue. We apologize to him and his family for the mistake and the disruption this caused.”
Risks from Artificial Intelligence

Benign developers and operators:
- AI out of control
- AI inadvertently causes harm
Malicious operators:
- Build AI to do harm
- Malicious abuse of benign AI

(Cartoon: "On Robots" by Joe Berger and Pascal Wyse, The Guardian, 21 July 2018)
Harmful AI

Benign developers:
- AI out of control
- AI causes harm (without creators objecting)
Malicious developers:
- Using AI to do harm

Malice is (often) in the eye of the beholder (e.g., mass surveillance, pop-up ads, etc.)
Automated Spear Phishing

"It's slightly less effective [than manually generated] but it's dramatically more efficient." (John Seymour)

More on malicious uses of AI in the third lecture.
Risks from Artificial Intelligence

Benign developers and operators:
- AI out of control
- AI inadvertently causes harm
Malicious operators:
- Build AI to do harm
- Malicious abuse of benign AI systems (focus of the rest of today and tomorrow)
More Ambition

"The human race will have a new kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight."
Gottfried Wilhelm Leibniz (Universität Altdorf, 1666), who advised
Jacob Bernoulli (Universität Basel, 1684), who advised
Johann Bernoulli (Universität Basel, 1694), who advised
Leonhard Euler (Universität Basel, 1726), who advised
Joseph Louis Lagrange, who advised
Siméon Denis Poisson, who advised
Michel Chasles (École Polytechnique, 1814), who advised
H. A. (Hubert Anson) Newton (Yale, 1850), who advised
E. H. Moore (Yale, 1885), who advised
Oswald Veblen (U. of Chicago, 1903), who advised
Philip Franklin (Princeton, 1921), who advised
Alan Perlis (MIT Math PhD, 1950), who advised
Jerry Feldman (CMU Math, 1966), who advised
Jim Horning (Stanford CS PhD, 1969), who advised
John Guttag (U. of Toronto CS PhD, 1975), who advised
David Evans (MIT CS PhD, 2000)

Leibniz: my academic great-great-…-great-grandparent!
More Precision

"The human race will have a new kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight." — Gottfried Wilhelm Leibniz (1679)

Normal computing amplifies (quadrillions of times faster) and aggregates (enables millions of humans to work together) human cognitive abilities; AI goes beyond what humans can do.
Operational Definition

"Artificial Intelligence" means making computers do things their programmers don't understand well enough to program explicitly.

If it is explainable, it's not ML!
Inherent Paradox of "Trustworthy" ML

"Artificial Intelligence" means making computers do things their programmers don't understand well enough to program explicitly.

If we could specify precisely what the model should do, we wouldn't need ML to do it!
Inherent Paradox of "Trustworthy" ML

If we could specify precisely what the model should do, we wouldn't need ML to do it! The best we can hope for is verifying certain properties.

Model Similarity: given models M₁ and M₂,

    ∀x: M₁(x) = M₂(x)

DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017.
Inherent Paradox of "Trustworthy" ML

The best we can hope for is verifying certain properties.

Model Similarity: given models M₁ and M₂,

    ∀x ∈ X: M₁(x) ≈ M₂(x)

Model Robustness: given a model M and a set of allowed perturbations D,

    ∀x ∈ X, ∀Δ ∈ D: M(x) ≈ M(x + Δ)

DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017.
Adversarial Robustness

    ∀x ∈ X, ∀Δ ∈ D: M(x) ≈ M(x + Δ)

Adversary's Goal: find a "small" perturbation Δ that changes the model's output (targeted attack: changes it in some desired way).

Defender's Goals:
- Robust Model: find a model where this is hard
- Detection: detect inputs that are adversarial
Not a new problem...

"Or do you think any Greek gift's free of treachery? Is that Ulysses's reputation? Either there are Greeks in hiding, concealed by the wood, or it's been built as a machine to use against our walls, or spy on our homes, or fall on the city from above, or it hides some other trick: Trojans, don't trust this horse. Whatever it is, I'm afraid of Greeks even those bearing gifts."

Virgil, The Aeneid (Book II)
Progress in MNIST

Year   Result
1998   [Yann LeCun, et al.]: 5% error rate (12.1% rejection needed for 1% error rate)
2013   [..., Yann LeCun, ...]: 0.21% error rate (21 errors out of 10,000 tests)
[Figure: training and test error curves, accuracy on CIFAR-10. Image from Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015.]
Inception

[Figure: model scaling comparison. Image from Mingxing Tan, Quoc V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML 2019. https://arxiv.org/pdf/1905.11946.pdf]
Training a Network

select a network architecture, f
θ ← initialize with random parameters
while (still improving):
    θ ← adjust_parameters(f, θ, X, Y)
Training a Network

select a network architecture, f
θ ← initialize with random parameters
while (loss(f_θ, X, Y) > goal and funding > 0):
    θ ← adjust_parameters(f, θ, X, Y)
Finding a Good Architecture

while (available_students > 0 and funding > 0):
    select a network architecture, f
    θ ← initialize with random parameters
    while (loss(f_θ, X, Y) > goal and funding > 0):
        θ ← adjust_parameters(f, θ, X, Y)
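To make the loop concrete, here is a minimal runnable sketch in Python (not from the slides): linear regression on synthetic data stands in for a real network, and `loss` and `adjust_parameters` mirror the pseudocode above.

```python
import numpy as np

# Minimal runnable instance of the training loop above, with linear
# regression on synthetic data standing in for a real network.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # training inputs
theta_true = rng.normal(size=5)
Y = X @ theta_true + 0.1 * rng.normal(size=200)    # noisy labels

def loss(theta, X, Y):
    return np.mean((X @ theta - Y) ** 2)           # mean squared error

def adjust_parameters(theta, X, Y, lr=0.05):
    grad = 2 * X.T @ (X @ theta - Y) / len(Y)      # gradient of the loss
    return theta - lr * grad                       # step downhill

theta = rng.normal(size=5)        # initialize with random parameters
goal, funding = 0.02, 1000        # "funding" caps the number of iterations
while loss(theta, X, Y) > goal and funding > 0:
    theta = adjust_parameters(theta, X, Y)
    funding -= 1
```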
Gradient Descent: Non-Convex Loss

Pick a random starting point, then follow the gradient (first derivative) of the loss ℒ_{X,Y}(θ); to minimize, step in the negative direction:

    θ_t = θ_{t−1} − α · ∇ℒ_{X,Y}(θ_{t−1})

Repeat many times, hopefully finding the global minimum.
Mini-Batch Stochastic Gradient Descent

Pick a random starting point, then follow the gradient of the loss:

    θ_t = θ_{t−1} − α · ∇ℒ_{X,Y}(θ_{t−1})

Repeat many times, hopefully finding the global minimum.

To reduce computation, evaluate the gradient of the loss on a randomly selected subset of the training data (a "mini-batch").
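A correspondingly minimal sketch of the mini-batch variant (again illustrative, with synthetic data): the only change is that each update estimates the gradient on a random subset of the training set.

```python
import numpy as np

# Mini-batch SGD: identical update rule, but the gradient is estimated
# on a randomly sampled subset ("mini-batch") of the training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
theta_true = rng.normal(size=5)
Y = X @ theta_true + 0.1 * rng.normal(size=1000)

theta = rng.normal(size=5)         # random starting point
alpha, batch_size = 0.05, 32
for step in range(2000):
    idx = rng.choice(len(Y), size=batch_size, replace=False)
    Xb, Yb = X[idx], Y[idx]
    grad = 2 * Xb.T @ (Xb @ theta - Yb) / batch_size  # gradient on the batch
    theta = theta - alpha * grad   # theta_t = theta_{t-1} - alpha * grad
```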
Statistical Machine Learning

Training (supervised learning): Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier

Deployment: Operational Data → Feature Extraction → Vectors → Trained Classifier → Malicious / Benign
The same training and deployment pipeline, with its key assumption made explicit:

Assumption: Training Data is Representative
Adversarial Examples for DNNs

"panda" + 0.007 × [noise] = "gibbon"

Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014 (in ICLR 2015).
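The perturbation in this example comes from the fast gradient sign method (FGSM) introduced in the cited paper. A minimal PyTorch sketch, assuming a differentiable classifier `model`, a batched input tensor `x` with values in [0, 1], and integer labels `y` (all hypothetical names):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.007):
    """Fast gradient sign method: move each input dimension by epsilon
    in the direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()   # keep pixels in the valid range
```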
Defining Adversarial Example

Assumption: a small perturbation does not change the class in "Reality Space" (human perception).

Given a seed sample x, x′ is an adversarial example iff:
- M(x′) = t (targeted: class is t), or M(x′) ≠ M(x) (untargeted: class is different), and
- Δ(x, x′) ≤ ε (similar to seed x: difference below threshold)
Example attack domains:
- Weilin Xu et al. "Magic Tricks for Self-driving Cars", Defcon-CAAD, 2018.
- Samuel G. Finlayson et al. "Adversarial attacks on medical machine learning" (flipping a melanoma diagnosis from benign to malignant), Science, 2019.
- Mahmood Sharif et al. "Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition", ACM CCS, 2016.
Natural Language (examples by Hannah Chen)

IMDB Movie Review Dataset. Prediction: Positive (Confidence = 99.22)

"Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of tough-guy roles, epic-sword films, and romances. It was fun to see him with an international cast and some real lousy looking pair of gloves. If I remember it was also dubbed in English which made things even more funnier. (kinda like seeing John Wayne speak Turkish)."
The same review with a single word replaced by "movies": Prediction: Positive (Confidence = 91.06); Target: Negative (Confidence = 8.94)

The same review with a single word replaced by "researching": Prediction: Positive (Confidence = 92.28); Target: Negative (Confidence = 7.72)

The same review with both substitutions ("movies" and "researching"): Prediction: Negative (Confidence = 73.33) — the targeted misclassification succeeds.
Defining Adversarial Example

Given a seed sample x, x′ is an adversarial example iff:
- M(x′) = t (targeted: class is t), or M(x′) ≠ M(x) (untargeted: class is different), and
- Δ(x, x′) ≤ ε (similar to seed x: difference below threshold)

where Δ(x, x′) is defined in some (simple!) metric space.
Other Distance Metrics

- Set of transformations: rotate, scale, "fog", color, etc.
- NLP: word substitutions (synonym constraints)
- Semantic distance: ℬ(x′) = ℬ(x), the behavior we care about is the same
  - Malware: it still behaves maliciously
  - Vision: it still looks like a "cat" to most humans

We'll get back to these... for now, let's assume L_p norms (like most research does), despite their flaws. A quick sketch of the common L_p distances follows.
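For concreteness, a tiny NumPy sketch of the three L_p distances most attack papers use (`x` and `x_adv` are arrays of the same shape):

```python
import numpy as np

def lp_distances(x, x_adv):
    """The L_p metrics most attack papers use to bound the perturbation."""
    delta = (x_adv - x).ravel()
    return {
        "L0": int(np.count_nonzero(delta)),    # number of features changed
        "L2": float(np.linalg.norm(delta)),    # Euclidean size of the change
        "Linf": float(np.max(np.abs(delta))),  # largest single-feature change
    }
```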
Impact of Adversarial Perturbations

[Figure: distance between each layer's output and that layer's output for the original seed, for FGSM with ε = 0.0245 on CIFAR-10 (DenseNet); bands show the 5th to 95th percentiles.]
Impact of Adversarial Perturbations

[Figure: the same layer-wise distances for FGSM (ε = 0.0245) compared against random noise of the same magnitude, CIFAR-10 (DenseNet).]
Carlini/Wagner
Nicholas Carlini, David Wagner. IEEE S&P 2017.

Formulate an optimization problem:

    min_δ ( Δ(x, x + δ) + c · f(x + δ) )   such that (x + δ) ∈ [0, 1]ⁿ

where f is a defined objective function with f(x′) ≤ 0 iff M(x′) = t (the model output matches the target).

This is an optimization problem that can be solved by standard optimizers, e.g., Adam (SGD + momentum) [Kingma, Ba 2015].
Carlini/Wagner
Nicholas Carlini, David Wagner. IEEE S&P 2017.

    min_δ ( Δ(x, x + δ) + c · f(x + δ) )   such that (x + δ) ∈ [0, 1]ⁿ

with the objective function defined over the pre-softmax logits Z(x), where the model computes M(x) = softmax(Z(x)):

    f(x′) = max_{i ≠ t} Z(x′)_i − Z(x′)_t

so f(x′) ≤ 0 iff M(x′) = t (the model output matches the target). A simplified sketch follows.
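A simplified PyTorch sketch of the L2 formulation (the paper additionally uses a tanh change of variables for the box constraint and binary-searches over c; `logits_fn`, mapping inputs to the pre-softmax logits Z, is a hypothetical stand-in for the model):

```python
import torch

def cw_l2_attack(logits_fn, x, t, c=1.0, steps=1000, lr=0.01):
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)   # Adam, as in the paper
    for _ in range(steps):
        x_adv = (x + delta).clamp(0, 1)      # keep (x + delta) in [0,1]^n
        Z = logits_fn(x_adv)
        # f(x') = max(max_{i != t} Z(x')_i - Z(x')_t, 0): reaches zero
        # once the target class t has the highest logit
        other = Z.clone()
        other[:, t] = float("-inf")
        f = (other.max(dim=1).values - Z[:, t]).clamp(min=0)
        loss = (delta ** 2).sum() + c * f.sum()  # ||delta||_2^2 + c * f
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x + delta).clamp(0, 1).detach()
```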
Impact of Adversarial Perturbations

[Figure: the same layer-wise distances for the Carlini-Wagner L2 attack compared against random noise of the same magnitude, CIFAR-10 (DenseNet).]
Finding Evasive Malware

Given a seed sample x with the desired malicious behavior, find an adversarial example x′ that satisfies:
- M(x′) = "benign" (the model misclassifies)
- ℬ(x′) = ℬ(x) (the malicious behavior is preserved)

Generic attack: heuristically explore the input space for an x′ that satisfies the definition. There is no requirement that x′ be close to x except through ℬ. A sketch of such a search follows.
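A hedged sketch of such a heuristic search in Python, in the spirit of genetic-search attacks like EvadeML; `mutate`, `classifier_score`, and `behavior_preserved` are hypothetical stand-ins for domain-specific mutation operators, the target classifier, and a behavioral oracle (e.g., a sandbox):

```python
import random

def find_evasive_variant(seed, mutate, classifier_score, behavior_preserved,
                         population=48, generations=100, threshold=0.5):
    """Heuristically explore the input space for a variant scored as
    benign while preserving malicious behavior. All callables are
    hypothetical, domain-specific components."""
    pool = [seed]
    for _ in range(generations):
        variants = [mutate(random.choice(pool)) for _ in range(population)]
        # keep only variants whose malicious behavior survives, as
        # checked by the oracle (e.g., running the sample in a sandbox)
        variants = [v for v in variants if behavior_preserved(v)]
        if not variants:
            continue
        variants.sort(key=classifier_score)     # most benign-looking first
        if classifier_score(variants[0]) < threshold:
            return variants[0]                  # classified as benign
        pool = variants[: max(1, population // 4)] + [seed]
    return None
```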
PDF Malware Classifiers

PDFrate [ACSAC 2012]: Random Forest, manual features (object counts, lengths, positions, ...)
Hidost13 [NDSS 2013]: Support Vector Machine, automated features (object structural paths)
Hidost16 [JIS 2016]: Random Forest, automated features (object structural paths)

Claimed to be very robust, even against the "strongest conceivable mimicry attack".
Discovered Evasive Variants

[Figure: classification score per malware seed (sorted by original score); the original malicious seeds sit above the malicious-label threshold, while discovered evasive variants fall below it.]

Adjust threshold?

Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.
Adjust threshold?

[Figure: classification scores per malware seed (sorted by original score) for variants found with threshold = 0.25 and for variants found with threshold = 0.50.]
Training (supervised learning): Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier

Deployment: Operational Data → Feature Extraction → Vectors → Trained Classifier → Malicious / Benign

Retrain Classifier: feed results from deployment back into the training data.
Malware Classification Moral

To build robust, effective malware classifiers, we need robust features that are strong signals for malware. But if you have features like this, you don't need ML!

There are scenarios where adversarial training "works" [more tomorrow].