Slide 1

Trustworthy Machine Learning
Lecture 1: Introduction/Attacks
David Evans, University of Virginia (jeffersonswheel.org)
19th International School on Foundations of Security Analysis and Design
Bertinoro, Italy, 26 August 2019

Slide 2

Plan for the Course
Monday (Today): Introduction, ML Background, Attacks
Tuesday (Tomorrow): Defenses
Wednesday: Privacy, Fairness, Abuse
Overall Goals:
- broad and whirlwind survey* of an exciting emerging research area
- explain a few of my favorite research results in enough detail to understand them at a high level
- introduce some open problems that I hope you will work on and solve
*but highly biased by my own interests

Slide 3

Why should we care about Trustworthy Machine Learning?

Slide 4

"Unfortunately, our translation systems made an error last week that misinterpreted what this individual posted. Even though our translations are getting better each day, mistakes like these might happen from time to time and we've taken steps to address this particular issue. We apologize to him and his family for the mistake and the disruption this caused."

Slide 5


Slide 6

Amazon Employment

Slide 7

Risks from Artificial Intelligence
- Benign developers and operators: AI out of control; AI inadvertently causes harm
- Malicious operators: build AI to do harm; malicious abuse of benign AI
Cartoon: "On Robots", Joe Berger and Pascal Wyse (The Guardian, 21 July 2018)

Slide 8

Harmful AI
- Benign developers and operators: AI out of control; AI causes harm (without creators objecting)
- Malicious operators: build AI to do harm

Slide 9

Out-of-Control AI
HAL (2001: A Space Odyssey); SkyNet (The Terminator)

Slide 10

Alignment Problem
Bostrom's Paperclip Maximizer

Slide 11

Harmful AI
- Benign developers and operators: AI out of control; AI inadvertently causes harm to humanity
- Malicious operators: build AI to do harm

Slide 12

Lost Jobs and Dignity

Slide 13

Human Jobs of the Future
Cartoon: "On Robots", Joe Berger and Pascal Wyse (The Guardian, 21 July 2018)

Slide 14

Inadvertent Bias and Discrimination (3rd lecture)

Slide 15

Harmful AI
- Benign developers: AI out of control; AI causes harm (without creators objecting)
- Malicious developers: using AI to do harm
Malice is (often) in the eye of the beholder (e.g., mass surveillance, pop-up ads, etc.)

Slide 16

Automated Spear Phishing
"It's slightly less effective [than manually generated] but it's dramatically more efficient" (John Seymour)
More malicious use of AI in the 3rd lecture?

Slide 17

Risks from Artificial Intelligence
- Benign developers and operators: AI out of control; AI inadvertently causes harm
- Malicious operators: build AI to do harm; malicious abuse of benign AI systems (rest of today and tomorrow)

Slide 18

Crash Course in Machine Learning

Slide 19


Slide 20

More Ambition
"The human race will have a new kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight."

Slide 21

More Ambition
Gottfried Wilhelm Leibniz (1679)

Slide 22

Gottfried Wilhelm Leibniz (Universität Altdorf, 1666), who advised
Jacob Bernoulli (Universität Basel, 1684), who advised
Johann Bernoulli (Universität Basel, 1694), who advised
Leonhard Euler (Universität Basel, 1726), who advised
Joseph-Louis Lagrange, who advised
Siméon Denis Poisson, who advised
Michel Chasles (École Polytechnique, 1814), who advised
H. A. (Hubert Anson) Newton (Yale, 1850), who advised
E. H. Moore (Yale, 1885), who advised
Oswald Veblen (U. of Chicago, 1903), who advised
Philip Franklin (Princeton, 1921), who advised
Alan Perlis (MIT Math PhD, 1950), who advised
Jerry Feldman (CMU Math, 1966), who advised
Jim Horning (Stanford CS PhD, 1969), who advised
John Guttag (U. of Toronto CS PhD, 1975), who advised
David Evans (MIT CS PhD, 2000)
Leibniz: my academic great-great-great-great-great-great-great-great-great-great-great-great-great-great-great-grandparent!

Slide 23

More Precision
"The human race will have a new kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight." Gottfried Wilhelm Leibniz (1679)
Normal computing amplifies (quadrillions of times faster) and aggregates (enables millions of humans to work together) human cognitive abilities; AI goes beyond what humans can do.

Slide 24

Operational Definition
If it is explainable, it's not ML!
"Artificial Intelligence" means making computers do things their programmers don't understand well enough to program explicitly.

Slide 25

Inherent Paradox of "Trustworthy" ML
If we could specify precisely what the model should do, we wouldn't need ML to do it!
"Artificial Intelligence" means making computers do things their programmers don't understand well enough to program explicitly.

Slide 26

Inherent Paradox of "Trustworthy" ML
If we could specify precisely what the model should do, we wouldn't need ML to do it!
Best we hope for is verifying certain properties.
Model Similarity: given two models M₁ and M₂, ∀x: M₁(x) = M₂(x)
DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017

Slide 27

Inherent Paradox of "Trustworthy" ML
Best we hope for is verifying certain properties.
Model Similarity: given two models M₁ and M₂, ∀x ∈ D: M₁(x) ≈ M₂(x)
(DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017)
Model Robustness: for a model M, ∀x ∈ D, ∀Δ ∈ A: M(x) ≈ M(x + Δ)

Slide 28

Adversarial Robustness
∀x ∈ D, ∀Δ ∈ A: M(x) ≈ M(x + Δ)
Adversary's Goal: find a "small" perturbation Δ that changes the model output (targeted attack: in some desired way).
Defender's Goal: Robust model: find a model where this is hard. Detection: detect inputs that are adversarial.

Slide 29

Not a new problem...
"Or do you think any Greek gift's free of treachery? Is that Ulysses's reputation? Either there are Greeks in hiding, concealed by the wood, or it's been built as a machine to use against our walls, or spy on our homes, or fall on the city from above, or it hides some other trick: Trojans, don't trust this horse. Whatever it is, I'm afraid of Greeks even those bearing gifts."
Virgil, The Aeneid (Book II)

Slide 30

Introduction to Deep Learning

Slide 31

Generic Classifier
f: X → Y
Input: x ∈ ℝᵈ
Output (label): y ∈ {1, ..., K}
Natural distribution: D of (x, y) pairs

Slide 32

Neural Network
f(x) = f_L(f_{L−1}(... f_2(f_1(x)) ...))
Each "layer" f_t is (mostly) a function from ℝᵐ → ℝⁿ.

Slide 33

Activation Layer
Each unit j in layer t computes a weighted combination of the outputs of layer t − 1:
z_j^(t) = σ(Σᵢ w_{i,j}^(t−1) · z_i^(t−1))

Slide 34

Activation Layer
z_j^(t) = σ(Σᵢ w_{i,j}^(t−1) · z_i^(t−1)), where σ is the activation function.
ReLU (Rectified Linear Unit): σ(z) = 0 if z < 0; z if z ≥ 0
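To make the layer equation concrete, here is a minimal NumPy sketch (not from the slides) of a fully connected layer followed by ReLU; the layer sizes, weight values, and the bias term b are illustrative assumptions.

```python
import numpy as np

def relu(z):
    # ReLU activation: sigma(z) = max(0, z), applied elementwise
    return np.maximum(0, z)

def dense_layer(z_prev, W, b):
    # z_j = sigma(sum_i w_ij * z_i + b_j), vectorized as a matrix product
    return relu(W @ z_prev + b)

# Illustrative sizes: layer t-1 has 4 units, layer t has 3 units
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # weights w_ij
b = np.zeros(3)               # bias (commonly included, omitted on the slide)
z_prev = rng.normal(size=4)   # outputs of layer t-1
print(dense_layer(z_prev, W, b))
```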

Slide 35

"Fancy" Layers: Convolution
Each output unit applies a small k×k weight matrix (kernel) W = [w₁₁ ... w₁ₖ; ...; wₖ₁ ... wₖₖ] to a window of layer t − 1's output, sliding the window across the input: z^(t) = σ(W ∗ z^(t−1)).
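A minimal sketch of the sliding-window computation behind a convolutional layer (implemented, as in most frameworks, as cross-correlation); the input and 2×2 kernel are illustrative assumptions, and the activation σ is omitted.

```python
import numpy as np

def conv2d(x, kernel):
    # Slide a k x k kernel over a 2D input; each output element is the
    # weighted sum of the window under the kernel (no padding, stride 1).
    k = kernel.shape[0]
    h, w = x.shape
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+k, j:j+k] * kernel)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0.], [0., -1.]])  # illustrative 2x2 weights
print(conv2d(x, kernel))
```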

Slide 36

"Fancy" Layers: Max Pooling
[Diagram: the same sliding-window picture as the convolution slide]

Slide 37

"Fancy" Layers: Max Pooling
Each output is the maximum over a window of the previous layer: max(z₁₁, z₁₂, z₂₁, z₂₂), max(z₃₁, z₃₂, z₄₁, z₄₂), max(z₅₁, z₅₂, z₆₁, z₆₂), ...
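The same sliding-window idea, with a maximum in place of the weighted sum; a minimal sketch of non-overlapping 2×2 max pooling, with an illustrative input.

```python
import numpy as np

def max_pool_2x2(x):
    # Take the maximum over each non-overlapping 2x2 window
    h, w = x.shape
    out = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return out.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))  # 2x2 output of window maxima
```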

Slide 38

Final Layer: SoftMax
z^(L) = s(z^(L−1)), where the SoftMax function is s(z)_j = e^(z_j) / Σₖ e^(z_k), for j = 1, ..., K.
Example output: [0.03, 0.32, 0.01, 0.63, 0.00, 0.01] → it's a "cat" (0.63 confidence).
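A minimal sketch of the SoftMax computation; the logits are illustrative, chosen to roughly reproduce the slide's example output vector.

```python
import numpy as np

def softmax(z):
    # s(z)_j = exp(z_j) / sum_k exp(z_k); subtract max for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-1.2, 1.1, -2.0, 1.8, -4.0, -2.1])  # illustrative logits
p = softmax(z)
print(p, p.sum())  # probabilities summing to 1; argmax gives the label
```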

Slide 39

DNNs in 1989
Backpropagation Applied to Handwritten Zip Code Recognition. Yann LeCun, et al., 1989.

Slide 40

Turing Award in 2018
Yann LeCun (AT&T → Facebook/NYU), Geoffrey Hinton (Google/U. Toronto), Yoshua Bengio (U. Montreal)

Slide 41

DNNs in 1989
Backpropagation Applied to Handwritten Zip Code Recognition. Yann LeCun, et al., 1989.

Slide 42

MNIST Dataset
James Mickens' USENIX Security Symposium 2018 Keynote: https://www.usenix.org/conference/usenixsecurity18/presentation/mickens

Slide 43

MNIST Dataset
70,000 images (60,000 training, 10,000 testing)
28×28 pixels, 8-bit grayscale
Scanned hand-written digits, labeled by humans.
LeCun, Cortes, Burges [1998]


Slide 45

Progress in MNIST
- 1998 [Yann LeCun, et al.]: 5% error rate (12.1% rejection for 1% error rate)
- 2013 [..., Yann LeCun, ...]: 0.21% error rate (21 errors out of 10,000 tests)

Slide 46

CIFAR-10 (and CIFAR-100)
Classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
60,000 images, 32×32 pixels, 24-bit color
Human-labeled subset of images in 10 classes from the Tiny Images Dataset.
Alex Krizhevsky [2009]

Slide 47

ImageNet: 14M high-resolution, full-color images, manually annotated with WordNet: ~20,000 synonym sets (~1,000 images in each)

Slide 48

Example CNN Architectures
Image from Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015

Slide 49

Accuracy on CIFAR-10
[Plot: training error and test error]
Image from Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015

Slide 50

Inception
Image from Mingxing Tan, Quoc V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML 2019. (https://arxiv.org/pdf/1905.11946.pdf)

Slide 51

Training a DNN

Slide 52

https://youtu.be/TVmjjfTvnFs

Slide 53

Training a Network
select a network architecture, f
θ ← initialize with random parameters
while (still improving):
    θ ← adjust_parameters(f, θ, X, y)

Slide 54

Goal of Training: Minimize Loss
Define a loss function:
- Mean Square Error: MSE = (1/n) Σᵢ₌₁ⁿ (ŷᵢ − yᵢ)², where ŷᵢ = f(xᵢ)
- (Maximize) Likelihood Estimation: ℒ = Πᵢ₌₁ⁿ p(yᵢ | xᵢ)
- (Maximize) Log-Likelihood Estimation: log ℒ = Σᵢ₌₁ⁿ log p(yᵢ | xᵢ)
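A minimal sketch of the two loss functions; the predictions, probabilities, and labels are illustrative.

```python
import numpy as np

def mse(y_pred, y_true):
    # Mean Square Error: (1/n) * sum_i (yhat_i - y_i)^2
    return np.mean((y_pred - y_true) ** 2)

def log_likelihood(probs, y_true):
    # log L = sum_i log p(y_i | x_i); probs[i, c] = model's p(class c | x_i)
    n = len(y_true)
    return np.sum(np.log(probs[np.arange(n), y_true]))

y_pred = np.array([0.9, 0.2, 0.7])
y_true = np.array([1.0, 0.0, 1.0])
print(mse(y_pred, y_true))

probs = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = np.array([1, 0, 1])
print(log_likelihood(probs, labels))  # maximized when probs match labels
```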

Slide 55

Training a Network
select a network architecture, f
θ ← initialize with random parameters
while (still improving):
    θ ← adjust_parameters(f, θ, X, y)

Slide 56

Training a Network
select a network architecture, f
θ ← initialize with random parameters
while (loss(f_θ, X, y) > goal and funding > 0):
    θ ← adjust_parameters(f, θ, X, y)

Slide 57

Finding a Good Architecture
while (available_students > 0 and funding > 0):
    select a network architecture, f
    θ ← initialize with random parameters
    while (loss(f_θ, X, y) > goal and funding > 0):
        θ ← adjust_parameters(f, θ, X, y)

Slide 58

Gradient Descent
Goal: find θ that minimizes ℒ_{X,y}(θ).

Slide 59

Gradient Descent
Pick a random starting point θ₀. Follow the gradient (first derivative) ℒ′_{X,y}(θ); to minimize, move in the negative direction:
θₜ = θₜ₋₁ − α · ∇ℒ_{X,y}(θₜ₋₁)

Slide 60

Gradient Descent: Non-Convex Loss
Pick a random starting point and follow the gradient: θₜ = θₜ₋₁ − α · ∇ℒ_{X,y}(θₜ₋₁).
Repeat many times, hopefully finding the global minimum.

Slide 61

Mini-Batch Stochastic Gradient Descent
θₜ = θₜ₋₁ − α · ∇ℒ_{X,y}(θₜ₋₁)
To reduce computation, evaluate the gradient of the loss on a randomly selected subset ("mini-batch").
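Putting the pieces together, a minimal sketch of mini-batch SGD training a logistic-regression model on toy data; the data, learning rate, and batch size are illustrative assumptions, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (illustrative)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

theta = rng.normal(size=2)        # initialize with random parameters
alpha, batch_size = 0.5, 32       # learning rate and mini-batch size

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for step in range(100):
    idx = rng.choice(len(X), size=batch_size, replace=False)  # mini-batch
    Xb, yb = X[idx], y[idx]
    # Gradient of the logistic (negative log-likelihood) loss on the batch
    grad = Xb.T @ (sigmoid(Xb @ theta) - yb) / batch_size
    theta = theta - alpha * grad   # theta_t = theta_{t-1} - alpha * grad

acc = np.mean((sigmoid(X @ theta) > 0.5) == y)
print("training accuracy:", acc)
```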

Slide 62

Cost of Training
https://openai.com/blog/ai-and-compute/


Slide 64


Slide 65

Adversarial Machine Learning

Slide 66

Statistical Machine Learning
Training (supervised learning): labelled training data → feature extraction → vectors → ML algorithm → trained classifier
Deployment: operational data → trained classifier → malicious / benign

Slide 67

Assumption: Training Data is Representative
Training (supervised learning): labelled training data → feature extraction → vectors → ML algorithm → trained classifier
Deployment: operational data → trained classifier → malicious / benign

Slide 68

Adversaries Don't Cooperate
Assumption: Training Data is Representative
Poisoning: the adversary attacks at training time.

Slide 69

Adversaries Don't Cooperate
Assumption: Training Data is Representative
Evading: the adversary attacks at deployment time.

Slide 70

Adversarial Examples for DNNs
"panda" + 0.007 × [noise] = "gibbon"
Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014 (in ICLR 2015)

Slide 71

Papers on "Adversarial Examples" (Google Scholar)
[Bar chart: papers per year, 2013-2018]
1826.68 papers expected in 2018!

Slide 72

Papers on "Adversarial Examples" (Google Scholar)
[Bar chart: papers per year, 2013-2019]
2901.67 papers expected in 2019!

Slide 73

Dash of "Theory"
[Bar chart: papers per year, 2013-2019; "ICML Workshop 2015" marked]
15% of 2018 and 2019 "adversarial examples" papers contain "theorem" and "proof".

Slide 74

Battista Biggio, et al. ECML-KDD 2013

Slide 75

Defining Adversarial Example
Assumption: a small perturbation does not change the class in "Reality Space" (human perception).
Given seed sample x, x′ is an adversarial example iff:
- f(x′) = t (targeted: class is t), or f(x′) ≠ f(x) (untargeted: class is different)
- Δ(x, x′) ≤ ε (similar to seed x: difference below threshold)
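The definition translates directly into a predicate; a minimal sketch assuming an ℓ∞ distance for Δ and a hypothetical classifier f.

```python
import numpy as np

def is_adversarial(f, x, x_adv, eps, target=None):
    # Delta(x, x') <= eps: similar to the seed (here, l-infinity distance)
    if np.max(np.abs(x_adv - x)) > eps:
        return False
    if target is not None:
        return f(x_adv) == target      # targeted: f(x') = t
    return f(x_adv) != f(x)            # untargeted: f(x') != f(x)

# Illustrative "classifier": thresholds the mean pixel value
f = lambda x: int(x.mean() > 0.5)
x = np.full(4, 0.48)
print(is_adversarial(f, x, x + 0.03, eps=0.05))  # True: label flips within eps
```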

Slide 76

[Visualization: region classified "Dog" around an image, along two random directions. Slide by Nicholas Carlini]

Slide 77

[Visualization: regions classified "Dog" and "Truck", along two random directions. Slide by Nicholas Carlini]

Slide 78

[Visualization: regions classified "Dog", "Truck", and "Airplane"; axes: one adversarial direction and one random direction. Slide by Nicholas Carlini]

Slide 79

- Weilin Xu et al. "Magic Tricks for Self-driving Cars", Defcon-CAAD, 2018.
- Benign vs. malignant melanoma diagnosis: Samuel G. Finlayson et al. "Adversarial attacks on medical machine learning", Science, 2019.
- Mahmood Sharif et al. "Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition", ACM CCS, 2016.

Slide 80

Natural Language (examples by Hannah Chen)
IMDB Movie Review Dataset
Prediction: Positive (Confidence = 99.22)
"Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of tough-guy roles, epic-sword films, and romances. It was fun to see him with an international cast and some real lousy looking pair of gloves. If I remember it was also dubbed in English which made things even more funnier. (kinda like seeing John Wayne speak Turkish)."

Slide 81

Natural Language (examples by Hannah Chen)
Same review with one word substituted ("movies").
Target: Negative (Confidence = 8.94); Prediction: Positive (Confidence = 91.06)

Slide 82

Natural Language (examples by Hannah Chen)
Same review with one word substituted ("researching").
Target: Negative (Confidence = 7.72); Prediction: Positive (Confidence = 92.28)

Slide 83

Natural Language (examples by Hannah Chen)
Same review with both substitutions ("movies" and "researching").
Target: Negative; Prediction: Negative (Confidence = 73.33)

Slide 84

Defining Adversarial Example
Given seed sample x, x′ is an adversarial example iff:
- f(x′) = t (targeted: class is t), or f(x′) ≠ f(x) (untargeted: class is different)
- Δ(x, x′) ≤ ε (similar to seed x: difference below threshold)
Δ(x, x′) is defined in some (simple!) metric space.

Slide 85

Distance Metrics
ℓp norms: ℓp(x, x′) = (Σᵢ |xᵢ − x′ᵢ|^p)^(1/p)
- ℓ0 "norm" (# different): #{i : xᵢ ≠ x′ᵢ}
- ℓ1 norm: Σᵢ |xᵢ − x′ᵢ|
- ℓ2 norm ("Euclidean"): √(Σᵢ (xᵢ − x′ᵢ)²)
- ℓ∞ norm: maxᵢ |xᵢ − x′ᵢ|
Useful for theory and experiments, but not realistic!
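A minimal sketch computing the four distances for a pair of inputs; the example vectors are illustrative.

```python
import numpy as np

def lp_distances(x, x_adv):
    d = x_adv - x
    return {
        "l0":   int(np.sum(d != 0)),           # number of changed features
        "l1":   float(np.sum(np.abs(d))),
        "l2":   float(np.sqrt(np.sum(d ** 2))),
        "linf": float(np.max(np.abs(d))),
    }

x = np.array([0.1, 0.5, 0.9, 0.3])
x_adv = np.array([0.1, 0.55, 0.8, 0.3])
print(lp_distances(x, x_adv))
```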

Slide 86

Original Image (x). Images by Nicholas Carlini.

Slide 87

Original Image (x), and an Adversarial Image at a small ℓ₂(x, x′) distance. Images by Nicholas Carlini.


Slide 89

Other Distance Metrics
- Set of transformations: rotate, scale, "fog", color, etc.
- NLP: word substitutions (synonym constraints)
- Semantic distance: ℬ(x′) = ℬ(x), the behavior we care about is the same. Malware: it still behaves maliciously. Vision: still looks like a "cat" to most humans.
We'll get back to these... for now, let's assume ℓp norms (like most research) despite flaws.

Slide 90

How can we find a nearby adversarial example?
[Visualization: regions "Dog", "Truck", "Airplane"; adversarial direction vs. random direction. Slide by Nicholas Carlini]

Slide 91

Slide by Nicholas Carlini

Slide 92

Visualization by Nicholas Carlini

Slide 93

Fast Gradient Sign
[Images: original, and perturbations at adversary power ε = 0.1, 0.2, 0.3, 0.4, 0.5]
ℓ∞-bounded adversary: max(|x′ᵢ − xᵢ|) ≤ ε
x′ = x − ε · sign(∇ₓ ℒ(x, y))
Goodfellow, Shlens, Szegedy 2014
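FGSM needs the gradient of the loss with respect to the input. A minimal sketch on a softmax-regression model, where that gradient has the closed form Wᵀ(p − e_y); the weights and input are illustrative assumptions. The x + ε·sign(...) form ascends the loss on the true label (untargeted); the minus-sign form on the slide corresponds to descending the loss on a chosen label.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))        # illustrative softmax-regression weights
x = rng.uniform(size=5)            # input in [0, 1]^5
y = 2                              # true label

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def input_grad(W, x, label):
    # For loss L = -log softmax(Wx)_y, the input gradient is W^T (p - e_y)
    p = softmax(W @ x)
    p[label] -= 1.0
    return W.T @ p

eps = 0.1
# Untargeted FGSM: increase the loss on the true label
x_adv = np.clip(x + eps * np.sign(input_grad(W, x, y)), 0, 1)
# Targeted variant (the minus-sign form): decrease the loss on target t
t = 0
x_tgt = np.clip(x - eps * np.sign(input_grad(W, x, t)), 0, 1)
print(np.argmax(W @ x), np.argmax(W @ x_adv), np.argmax(W @ x_tgt))
```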

Slide 94

Impact of Adversarial Perturbations
[Plot: distance between each layer's output and its output for the original seed; FGSM ε = 0.0245; CIFAR-10 DenseNet; 5th to 95th percentile band]

Slide 95

Impact of Adversarial Perturbations
[Plot: distance between each layer's output and its output for the original seed; FGSM ε = 0.0245 vs. random noise (same amount); CIFAR-10 DenseNet]

Slide 96

Basic Iterative Method (BIM)
x′₀ = x
for N iterations: x′ᵢ₊₁ = clip_{x,ε}(x′ᵢ − α · sign(∇ℒ(x′ᵢ, y)))
x′ = x′_N
A. Kurakin, I. Goodfellow, and S. Bengio 2016

Slide 97

Projected Gradient Descent (PGD)
x′₀ = x
for N iterations: x′ᵢ₊₁ = project_{x,ε}(x′ᵢ − α · sign(∇ℒ(x′ᵢ, y)))
x′ = x′_N
A. Kurakin, I. Goodfellow, and S. Bengio 2016
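A minimal sketch of the iterative attack, written here in the untargeted, loss-ascending form, with projection back into the ℓ∞ ball implemented by clipping; the model, step size α, and iteration count are illustrative assumptions.

```python
import numpy as np

def pgd_attack(W, x, y, eps=0.1, alpha=0.02, iters=20):
    # Projected gradient method in the l-infinity ball around x, for a
    # softmax-regression model (same input gradient as the FGSM sketch).
    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()
    def input_grad(x_cur):
        p = softmax(W @ x_cur)
        p[y] -= 1.0
        return W.T @ p

    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + alpha * np.sign(input_grad(x_adv))  # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0, 1)              # stay in the valid input range
    return x_adv

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 5))
x = rng.uniform(size=5)
x_adv = pgd_attack(W, x, y=int(np.argmax(W @ x)))
print(np.argmax(W @ x), np.argmax(W @ x_adv))
```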

Slide 98

Carlini/Wagner
minimize_δ Δ(x, x + δ) + c · f(x + δ), such that (x + δ) ∈ [0, 1]ⁿ
Formulate an optimization problem where f is a defined objective function: f(x′) ≤ 0 iff C(x′) = t (model output matches the target).
An optimization problem that can be solved by standard optimizers: Adam (SGD + momentum) [Kingma, Ba 2015].
Nicholas Carlini, David Wagner. IEEE S&P 2017

Slide 99

Carlini/Wagner
minimize_δ Δ(x, x + δ) + c · f(x + δ), such that (x + δ) ∈ [0, 1]ⁿ
Objective function: f(x′) ≤ 0 iff C(x′) = t (model output matches the target)
f(x′) = max_{i≠t} Z(x′)ᵢ − Z(x′)ₜ
where Z(x) is the vector of logits, the network's output before the final softmax layer: F(x) = softmax(Z(x)).
Nicholas Carlini, David Wagner. IEEE S&P 2017

Slide 100

Carlini/Wagner: ℓ₂ Attack
f(x′) = max_{i≠t} Z(x′)ᵢ − Z(x′)ₜ
minimize_δ Δ(x, x + δ) + c · f(x + δ), such that (x + δ) ∈ [0, 1]ⁿ
Change of variables eliminates the box constraint: δᵢ = ½(tanh(wᵢ) + 1) − xᵢ
Nicholas Carlini, David Wagner. IEEE S&P 2017

Slide 101

Carlini/Wagner: ℓ₂ Attack
minimize_w ‖½(tanh(w) + 1) − x‖₂² + c · f(½(tanh(w) + 1))
f(x′) = max(max_{i≠t} Z(x′)ᵢ − Z(x′)ₜ, −κ), where κ is a confidence parameter.
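A minimal sketch that just evaluates the C&W ℓ₂ objective with the tanh change of variables, on an illustrative linear logits model Z(x) = Wx; in the actual attack this objective is minimized over w with Adam.

```python
import numpy as np

def cw_l2_objective(w, x, W_model, target, c=1.0, kappa=0.0):
    # Change of variables: x' = 0.5 * (tanh(w) + 1) is always in [0, 1]^n
    x_adv = 0.5 * (np.tanh(w) + 1)
    Z = W_model @ x_adv                      # logits Z(x')
    other = np.max(np.delete(Z, target))     # max_{i != t} Z(x')_i
    f = max(other - Z[target], -kappa)       # objective with confidence kappa
    return np.sum((x_adv - x) ** 2) + c * f  # ||x' - x||_2^2 + c * f(x')

rng = np.random.default_rng(0)
W_model = rng.normal(size=(3, 5))            # illustrative "logits" model
x = rng.uniform(size=5)
w = np.arctanh(2 * np.clip(x, 1e-6, 1 - 1e-6) - 1)  # start at x' = x
print(cw_l2_objective(w, x, W_model, target=0))
```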

Slide 102


Slide 103

Impact of Adversarial Perturbations
[Plot: distance between each layer's output and its output for the original seed; Carlini-Wagner ℓ₂ vs. random noise (same amount); CIFAR-10 DenseNet]

Slide 104

Content-Space Attacks
What if there is no gradient to follow?

Slide 105

Example: PDF Malware

Slide 106

Finding Evasive Malware
Given a seed sample x with desired malicious behavior, find an adversarial example x′ that satisfies:
- f(x′) = "benign" (model misclassifies)
- ℬ(x′) = ℬ(x) (malicious behavior preserved)
Generic attack: heuristically explore the input space for an x′ that satisfies the definition. No requirement that x ~ x′ except through ℬ.

Slide 107

PDF Malware Classifiers
- PDFrate [ACSAC 2012]: Random Forest; manual features (object counts, lengths, positions, ...)
- Hidost13 [NDSS 2013]: Support Vector Machine; automated features (object structural paths)
- Hidost16 [JIS 2016]: Random Forest; automated features (object structural paths)
Very robust against "strongest conceivable mimicry attack".

Slide 108

Evolutionary Search (Weilin Xu, Yanjun Qi)
[Diagram: clone a malicious PDF → mutate (drawing material from benign PDFs) → variants → score with benign oracle and target classifier → fitness selection → select variants → repeat until a variant is found evasive]

Slide 109

Generating Variants
[Diagram: the same evolutionary-search loop, highlighting the mutant-generation stage]

Slide 110

PDF Structure

Slide 111

Generating Variants
[Same diagram as above]

Slide 112

Generating Variants
Select a random node in the PDF tree (e.g., under /Root/Catalog/Pages, or a /JavaScript eval('...') node) and randomly transform it: delete, insert, replace.

Slide 113

Generating Variants
Select a random node and randomly transform it (delete, insert, replace), drawing inserted nodes from benign PDFs.

Slide 114

Selecting Promising Variants
[Diagram: the same evolutionary-search loop, highlighting the fitness-selection stage]

Slide 115

Selecting Promising Variants
Each candidate variant is scored by a fitness function that combines the oracle's verdict (is it still malicious?) with the target classifier's score.

Slide 116

Oracle: ℬ(x′) = ℬ(x)?
Execute the candidate in a vulnerable Adobe Reader in a virtual environment (Cuckoo sandbox, https://github.com/cuckoosandbox; simulated network: INetSim).
Behavioral signature: malicious if the signature matches (HTTP_URL + HOST extracted from API traces).

Slide 117

Fitness Function
Assumes lost malicious behavior will not be recovered.
fitness(x′) = 1 − classifier_score(x′) if ℬ(x′) = ℬ(x); −∞ otherwise
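A minimal sketch of the fitness function and selection loop; classifier_score, behavior_oracle, and mutate are hypothetical stand-ins for the target classifier, the Cuckoo-based oracle, and the PDF node transformations (EvadeML's actual implementation differs in its details).

```python
import random

def fitness(variant, seed, classifier_score, behavior_oracle):
    # fitness(x') = 1 - classifier_score(x') if B(x') = B(x), else -infinity
    if behavior_oracle(variant) != behavior_oracle(seed):
        return float("-inf")  # lost malicious behavior; assumed unrecoverable
    return 1.0 - classifier_score(variant)

def evolve(seed, mutate, classifier_score, behavior_oracle,
           pop_size=20, generations=100, threshold=0.5):
    population = [mutate(seed) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda v: fitness(v, seed, classifier_score,
                                              behavior_oracle),
                        reverse=True)
        best = scored[0]
        if (classifier_score(best) < threshold
                and behavior_oracle(best) == behavior_oracle(seed)):
            return best  # evasive variant found
        # Keep the most promising variants; mutate them to refill the population
        survivors = scored[:pop_size // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return None
```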

Slide 118

[Plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]

Slide 119

[Plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
Simple transformations often worked.

Slide 120

[Plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
A single transformation, (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/), works on 162/500 seeds.

Slide 121

[Plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
Some seeds required complex transformations.

Slide 122

Evading PDFrate
[Plot: classification score for each malware seed (sorted by original score): original malicious seeds vs. discovered evasive variants, against the malicious-label threshold]

Slide 123

Adjust threshold?
[Same plot: original malicious seeds, discovered evasive variants, malicious-label threshold]
Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.

Slide 124

Adjust threshold?
[Plot: classification score for each malware seed: variants found with threshold = 0.25 vs. variants found with threshold = 0.50]

Slide 125

Hide the Classifier Score?
[Diagram: the evolutionary-search loop; the fitness function uses the target classifier's score]

Slide 126

Binary Classifier Output is Enough
[Same diagram, with only the classifier's binary output available]
ACM CCS 2017

Slide 127

Retrain Classifier
[Pipeline diagram: training (labelled training data → feature extraction → vectors → ML algorithm) → trained classifier → deployment (operational data → malicious / benign), with a retraining loop back into training]

Slide 128

[Pipeline diagram: EvadeML-generated evasive variants are added to the labelled training data for retraining]

Slide 129

[Plot: seeds evaded (out of 500) vs. generations, Hidost16]
Original classifier: takes 614 generations to evade all seeds.

Slide 130

[Plot: seeds evaded (out of 500) vs. generations, Hidost16 and retrained HidostR1]


Slide 132

[Plot: seeds evaded (out of 500) vs. generations, Hidost16, HidostR1, and HidostR2]


Slide 134

[Plot: seeds evaded (out of 500) vs. generations, for Hidost16, HidostR1, and HidostR2]
False Positive Rates (Genome / Contagio Benign):
- Hidost16: 0.00 / 0.00
- HidostR1: 0.78 / 0.30
- HidostR2: 0.85 / 0.53


Slide 136

Only 8 of 6,987 features are robust (Hidost): /Names, /Names/JavaScript, /Names/JavaScript/Names, /Names/JavaScript/JS, /OpenAction, /OpenAction/JS, /OpenAction/S, /Pages
Robust classifier → high false positives.
USENIX Security 2019

Slide 137

Malware Classification Moral
To build robust, effective malware classifiers, we need robust features that are strong signals for malware. If you have features like this, you don't need ML!
There are scenarios where adversarial training "works" [more tomorrow].

Slide 138

Recap: Adversarial Examples across Domains
- Trojan Wars: classifier space: judgment of the Trojans, f(x) = "gift"; "reality" space: physical reality, f*(x) = invading army
- Malware: classifier space: malware detector, f(x) = "benign"; "reality" space: victim's execution, f*(x) = malicious behavior
- Image Classification: classifier space: DNN classifier, f(x) = y; "reality" space: human perception, f*(x) = y′

Slide 139

Tomorrow: Defenses
David Evans, University of Virginia
[email protected]
https://www.cs.virginia.edu/evans