FOSAD Trustworthy Machine Learning: Class 1

Trustworthy Machine Learning
David Evans, University of Virginia, jeffersonswheel.org
Bertinoro, Italy, 26 August 2019
19th International School on Foundations of Security Analysis and Design
1: Introduction/Attacks

Plan for the Course
Monday (Today): Introduction, ML Background, Attacks
Tuesday (Tomorrow): Defenses
Wednesday: Privacy, Fairness, Abuse

Overall Goals:
- broad and whirlwind survey* of an exciting emerging research area
- explain a few of my favorite research results in enough detail to understand them at a high level
- introduce some open problems that I hope you will work on and solve
* but highly biased by my own interests

Why should we care about Trustworthy Machine Learning?

“Unfortunately, our translation systems made an error last week that misinterpreted what this individual posted. Even though our translations are getting better each day, mistakes like these might happen from time to time and we’ve taken steps to address this particular issue. We apologize to him and his family for the mistake and the disruption this caused.”

Amazon Employment

Risks from Artificial Intelligence
Benign developers and operators: AI out of control; AI inadvertently causes harm
Malicious operators: Build AI to do harm; Malicious abuse of benign AI
(On Robots, Joe Berger and Pascal Wyse, The Guardian, 21 July 2018)

Harmful AI
Benign developers and operators: AI out of control; AI causes harm (without creators objecting)
Malicious operators: Build AI to do harm

Out-of-Control AI: HAL (2001: A Space Odyssey), SkyNet (The Terminator)

Alignment Problem: Bostrom’s Paperclip Maximizer

Harmful AI
Benign developers and operators: AI out of control; AI inadvertently causes harm to humanity
Malicious operators: Build AI to do harm

Lost Jobs and Dignity

Human Jobs of the Future
(On Robots, Joe Berger and Pascal Wyse, The Guardian, 21 July 2018)

Inadvertent Bias and Discrimination (3rd lecture)

Harmful AI
Benign developers: AI out of control; AI causes harm (without creators objecting)
Malicious developers: Using AI to do harm
Malice is (often) in the eye of the beholder (e.g., mass surveillance, pop-up ads, etc.)

Automated Spear Phishing
“It’s slightly less effective [than manually generated] but it’s dramatically more efficient” (John Seymour)
More malicious use of AI in 3rd lecture?

Risks from Artificial Intelligence
Benign developers and operators: AI out of control; AI inadvertently causes harm
Malicious operators: Build AI to do harm; Malicious abuse of benign AI systems (rest of today and tomorrow)

Crash Course in Machine Learning

More Ambition
“The human race will have a new kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight.” (Gottfried Wilhelm Leibniz, 1679)

Gottfried Wilhelm Leibniz (Universität Altdorf, 1666), who advised:
Jacob Bernoulli (Universität Basel, 1684), who advised:
Johann Bernoulli (Universität Basel, 1694), who advised:
Leonhard Euler (Universität Basel, 1726), who advised:
Joseph Louis Lagrange, who advised:
Siméon Denis Poisson, who advised:
Michel Chasles (École Polytechnique, 1814), who advised:
H. A. (Hubert Anson) Newton (Yale, 1850), who advised:
E. H. Moore (Yale, 1885), who advised:
Oswald Veblen (U. of Chicago, 1903), who advised:
Philip Franklin (Princeton, 1921), who advised:
Alan Perlis (MIT Math PhD, 1950), who advised:
Jerry Feldman (CMU Math, 1966), who advised:
Jim Horning (Stanford CS PhD, 1969), who advised:
John Guttag (U. of Toronto CS PhD, 1975), who advised:
David Evans (MIT CS PhD, 2000)
Leibniz is my academic great-great-great-great-great-great-great-great-great-great-great-great-great-grandparent!

More Precision
Normal computing amplifies (quadrillions of times faster) and aggregates (enables millions of humans to work together) human cognitive abilities; AI goes beyond what humans can do.

Operational Definition
If it is explainable, it’s not ML!
“Artificial Intelligence” means making computers do things their programmers don’t understand well enough to program explicitly.

Inherent Paradox of “Trustworthy” ML
If we could specify precisely what the model should do, we wouldn’t need ML to do it!
Best we hope for is verifying certain properties:
Model Similarity (models $M_1$, $M_2$): $\forall x: M_1(x) = M_2(x)$, or approximately, $\forall x \in X: M_1(x) \approx M_2(x)$
Model Robustness (model $M$): $\forall x \in X, \forall \Delta \in D: M(x) \approx M(x + \Delta)$
DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017

Adversarial Robustness
$\forall x \in X, \forall \Delta \in D: M(x) \approx M(x + \Delta)$, where $D$ is the set of allowed perturbations
Adversary’s Goal: find a “small” perturbation $\Delta$ that changes the model output (targeted attack: changes it in some desired way)
Defender’s Goal:
- Robust Model: find a model where this is hard
- Detection: detect inputs that are adversarial

Not a new problem...
“Or do you think any Greek gift’s free of treachery? Is that Ulysses’s reputation? Either there are Greeks in hiding, concealed by the wood, or it’s been built as a machine to use against our walls, or spy on our homes, or fall on the city from above, or it hides some other trick: Trojans, don’t trust this horse. Whatever it is, I’m afraid of Greeks even those bearing gifts.”
Virgil, The Aeneid (Book II)

Introduction to Deep Learning

Generic Classifier
$F: X \to Y$
Input: $x \in \mathbb{R}^n$
Output (label): $y \in \{1, \ldots, K\}$
Natural distribution: $\mathcal{D}$ over $(x, y)$ pairs

Neural Network
$F(x) = f_n(f_{n-1}(\ldots f_2(f_1(x))))$
“layer” $f_i$: mostly from $\mathbb{R}^{d_{i-1}} \to \mathbb{R}^{d_i}$

Activation Layer
Each unit $j$ of layer $t$ takes a weighted sum of the outputs of layer $t-1$ and applies an activation function $\sigma$:
$x_j^{(t)} = \sigma\left(\sum_{i=1}^{d_{t-1}} w_{i,j}^{(t-1)} x_i^{(t-1)}\right)$
ReLU (Rectified Linear Unit): $\sigma(x) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases}$
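
A minimal pure-Python sketch of the activation-layer equation and of composing layers into $F(x)$. It assumes weights are stored as `W[i][j]` (from unit `i` of layer $t-1$ to unit `j` of layer $t$) and, like the equation, has no bias term; all names are illustrative rather than taken from a library.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]  # W[i][j]: weight from unit i of layer t-1 to unit j of layer t


def relu(v: float) -> float:
    """ReLU: 0 for negative inputs, the input itself otherwise."""
    return v if v >= 0 else 0.0


def dense_layer(W: Matrix, x: Vector) -> Vector:
    """x_j^(t) = relu(sum_i W[i][j] * x_i^(t-1)); no bias, matching the slide."""
    n_out = len(W[0])
    return [relu(sum(W[i][j] * x[i] for i in range(len(x)))) for j in range(n_out)]


def network(layers: List[Callable[[Vector], Vector]], x: Vector) -> Vector:
    """F(x) = f_n(...f_2(f_1(x))): apply each layer to the previous layer's output."""
    for f in layers:
        x = f(x)
    return x


# Tiny two-layer example: 2 inputs -> 2 hidden units -> 1 output.
W1 = [[1.0, -1.0], [2.0, 0.5]]
W2 = [[1.0], [1.0]]
layers = [lambda v: dense_layer(W1, v), lambda v: dense_layer(W2, v)]
print(network(layers, [1.0, 1.0]))  # hidden = [3.0, 0.0] (ReLU zeroes -0.5), output = [3.0]
```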

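“Fancy” Layers: Convolution
$x^{(t)} = \sigma(W * x^{(t-1)})$, where the kernel $W$ is a small weight matrix slid across the input:
$W = \begin{pmatrix} w_{11} & \cdots & w_{k1} \\ \vdots & \ddots & \vdots \\ w_{1k} & \cdots & w_{kk} \end{pmatrix}$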
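
A minimal sketch of the convolution step, assuming a single-channel input, stride 1 and no padding. Like most deep learning libraries, it actually computes cross-correlation (the kernel is not flipped); the activation $\sigma$ would then be applied elementwise to the result. The function name is illustrative.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D convolution (cross-correlation): stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the kh x kw window whose top-left corner is (r, c).
            row.append(sum(kernel[a][b] * image[r + a][c + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]  # top-left minus bottom-right of each 2x2 window
print(conv2d(image, kernel))  # [[-4, -4], [-4, -4]]
```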

“Fancy” Layers: Max Pooling
Each output is the maximum over a small (here 2×2) window of the previous layer’s output:
$\max(x_{11}, x_{12}, x_{21}, x_{22})$, $\max(x_{31}, x_{32}, x_{41}, x_{42})$, $\max(x_{51}, x_{52}, x_{61}, x_{62})$, …
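
The pooling operation can be sketched in a few lines of Python, assuming non-overlapping 2×2 windows (stride 2) and an input whose height and width are even.

```python
from typing import List

Matrix = List[List[float]]


def max_pool_2x2(x: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; assumes even height and width."""
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, len(x[0]), 2)]
        for r in range(0, len(x), 2)
    ]


x = [[1, 2, 5, 6],
     [3, 4, 7, 8],
     [0, 1, 2, 3],
     [1, 0, 3, 9]]
print(max_pool_2x2(x))  # [[4, 8], [1, 9]]
```

Pooling halves each spatial dimension, so later layers see a coarser summary of the input.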

Final Layer: SoftMax
$x^{(n)} = \sigma(z^{(n-1)})$
SoftMax function: $\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$ for $j = 1, \ldots, K$
Example output: $[0.03, 0.32, 0.01, \mathbf{0.63}, 0.00, 0.01]$: It’s a “cat” (0.63 confidence).
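
A small sketch of the SoftMax layer and of reading off the prediction. Subtracting the largest logit before exponentiating is a standard numerical-stability trick that does not change the result. The class names are hypothetical placeholders, ordered so that index 3 is “cat” as in the example.

```python
import math
from typing import List, Sequence, Tuple


def softmax(z: Sequence[float]) -> List[float]:
    """sigma(z)_j = exp(z_j) / sum_k exp(z_k), computed stably."""
    m = max(z)  # shifting every logit by a constant leaves the ratios unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(probs: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return the most probable label and its probability (the 'confidence')."""
    j = max(range(len(probs)), key=lambda i: probs[i])
    return labels[j], probs[j]


labels = ["airplane", "automobile", "bird", "cat", "deer", "dog"]  # hypothetical
print(predict([0.03, 0.32, 0.01, 0.63, 0.00, 0.01], labels))  # ('cat', 0.63)
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5], no overflow
```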

DNNs in 1989
Backpropagation Applied to Handwritten Zip Code Recognition. Yann LeCun, et al., 1989.

Turing Award in 2018
Yann LeCun (AT&T → Facebook/NYU), Geoffrey Hinton (Google/U. Toronto), Yoshua Bengio (U. Montreal)

MNIST
James Mickens’ USENIX Security Symposium 2018 (Keynote): https://www.usenix.org/conference/usenixsecurity18/presentation/mickens

MNIST Dataset
70 000 images (60 000 training, 10 000 testing)
28×28 pixels, 8-bit grayscale
scanned hand-written digits labeled by humans
LeCun, Cortes, Burges [1998]

Progress in MNIST (Year: Error Rate)
1998 [Yann LeCun, et al.]: 5% error rate (12.1% rejection for 1% error rate)
2013 [..., Yann LeCun, ...]: 0.21% (21 out of 10,000 tests)

CIFAR-10 (and CIFAR-100)
Classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
60 000 images, 32×32 pixels, 24-bit color
human-labeled subset of images in 10 classes from Tiny Images Dataset
Alex Krizhevsky [2009]

ImageNet
14M high-resolution full color images
Manually annotated in WordNet: ~20,000 synonym sets (~1000 images in each)

Example CNN Architectures
(Image from Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015)

[Figure: Accuracy on CIFAR-10 (training error and test error). Image from Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015]

Inception
(Image from Mingxing Tan, Quoc V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML 2019. https://arxiv.org/pdf/1905.11946.pdf)

Training a DNN

https://youtu.be/TVmjjfTvnFs

Training a Network
select a network architecture, f
θ ← initialize with random parameters
while (still improving):
    θ ← adjust parameters(f, θ, X, Y)

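Goal of Training: Minimize Loss
Define a Loss Function:
Mean Square Error: $MSE = \frac{1}{n} \sum_{i=1}^{n} (f_\theta(x_i) - y_i)^2$
(Maximize) Likelihood Estimation: $\mathcal{L} = \prod_{i=1}^{n} p_\theta(y_i \mid x_i)$
(Maximize) Log-Likelihood Estimation: $\log \mathcal{L} = \sum_{i=1}^{n} \log p_\theta(y_i \mid x_i)$
Maximizing the (log-)likelihood is the same as minimizing the negative log-likelihood $-\log \mathcal{L}$, so it fits the “minimize loss” framing.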
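
As a sketch (pure Python, names illustrative), the three quantities can be computed directly from model outputs. `p_true` is assumed to hold, for each training example, the probability the model assigns to that example’s correct label.

```python
import math
from typing import Sequence


def mse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean square error: (1/n) * sum_i (f(x_i) - y_i)^2."""
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


def likelihood(p_true: Sequence[float]) -> float:
    """Product over examples of the probability given to the correct label."""
    return math.prod(p_true)


def log_likelihood(p_true: Sequence[float]) -> float:
    """Sum of log-probabilities: same maximizer as the likelihood, and
    numerically safer because it avoids multiplying many small numbers."""
    return sum(math.log(p) for p in p_true)


p_true = [0.9, 0.5, 0.8]
print(mse([1.0, 2.0], [1.0, 4.0]))  # 2.0
print(likelihood(p_true))           # approximately 0.36
print(-log_likelihood(p_true))      # negative log-likelihood: the loss to minimize
```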

Finding a Good Architecture
while (available_students > 0 and funding > 0):
    select a network architecture, f
    θ ← initialize with random parameters
    while (Loss(f_θ, X, Y) > goal and funding > 0):
        θ ← adjust parameters(f, θ, X, Y)

Gradient Descent
Goal: find $\theta$ that minimizes $\mathcal{L}_{X,Y}(\theta)$.
Pick a random starting point.
Follow the gradient (first derivative); to minimize, move in the negative direction:
$\theta_i = \theta_{i-1} - \alpha \cdot \nabla \mathcal{L}_{X,Y}(\theta_{i-1})$, where $\alpha$ is the step size (learning rate)
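
A minimal sketch of the update rule on a one-dimensional toy loss, $\mathcal{L}(\theta) = (\theta - 3)^2$ with gradient $2(\theta - 3)$. The loss, step size $\alpha = 0.1$ and step count are assumptions chosen for illustration.

```python
import random
from typing import Callable


def gradient_descent(grad: Callable[[float], float], theta0: float,
                     alpha: float = 0.1, steps: int = 100) -> float:
    """Repeat theta_i = theta_{i-1} - alpha * grad(theta_{i-1})."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * grad(theta)  # move against the gradient
    return theta


random.seed(0)
start = random.uniform(-10, 10)  # pick a random starting point
theta_star = gradient_descent(lambda t: 2 * (t - 3), start)
print(start, theta_star)  # theta_star ends very close to the minimizer 3
```

On this convex loss each step shrinks the distance to the minimizer by a factor of $1 - 2\alpha = 0.8$, so 100 steps are plenty.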

Gradient Descent: Non-Convex Loss
Same update rule: pick a random starting point and follow the negative gradient. Repeat many times, hopefully finding the global minimum (on a non-convex loss, gradient descent may instead settle in a local minimum).

Mini-Batch Stochastic Gradient Descent
Same update rule, but to reduce computation, evaluate the gradient of the loss on a randomly selected subset of the training data (“mini-batch”) at each step instead of on the whole training set.
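
A sketch of mini-batch SGD fitting a line $y \approx w x + b$ under mean square error. The toy data (noise-free points on $y = 2x + 1$), batch size, step size and epoch count are assumptions chosen so the example converges quickly; each update uses only the gradient computed on its mini-batch.

```python
import random
from typing import List, Tuple


def minibatch_sgd(data: List[Tuple[float, float]], alpha: float = 0.05,
                  batch_size: int = 4, epochs: int = 200,
                  seed: int = 0) -> Tuple[float, float]:
    """Fit y ~ w*x + b by minimizing MSE with mini-batch stochastic gradient descent."""
    rng = random.Random(seed)
    data = list(data)                              # don't reorder the caller's list
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random starting point
    for _ in range(epochs):
        rng.shuffle(data)                          # new random mini-batches each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            m = len(batch)
            residuals = [(w * x + b - y, x) for x, y in batch]
            # Gradient of (1/m) * sum (w*x + b - y)^2 on this mini-batch only.
            grad_w = 2 / m * sum(r * x for r, x in residuals)
            grad_b = 2 / m * sum(r for r, _ in residuals)
            w -= alpha * grad_w
            b -= alpha * grad_b
    return w, b


points = [(i / 10, 2 * (i / 10) + 1) for i in range(-10, 11)]
w, b = minibatch_sgd(points)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```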

Cost of Training
https://openai.com/blog/ai-and-compute/

Adversarial Machine Learning

Statistical Machine Learning
Training (supervised learning): Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier
Deployment: Operational Data → Trained Classifier → Malicious / Benign

Assumption: Training Data is Representative

Adversaries Don’t Cooperate
Assumption: Training Data is Representative
Training: Poisoning
Deployment: Evading

Adversarial Examples for DNNs
“panda” + 0.007 × [perturbation] = “gibbon”
Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014 (in ICLR 2015)

[Chart: Papers on “Adversarial Examples” (Google Scholar) per year, 2013 to 2018: 1826.68 papers expected in 2018!]

[Chart: Papers on “Adversarial Examples” (Google Scholar) per year, 2013 to 2019: 2901.67 papers expected in 2019!]

[Chart: Papers on “Adversarial Examples” per year, 2013 to 2019; annotations: Dash of “Theory”, ICML Workshop 2015]
15% of 2018 and 2019 “adversarial examples” papers contain “theorem” and “proof”.

Battista Biggio, et al. ECML PKDD 2013

Defining Adversarial Example
Assumption: a small perturbation does not change the class in “Reality Space” (human perception).
Given seed sample $x$, $x'$ is an adversarial example iff:
$F(x') = t$ (class is $t$: targeted) or $F(x') \neq F(x)$ (class is different: untargeted)
and $\Delta(x, x') \le \epsilon$ (similar to seed $x$: difference below threshold)
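
The definition translates directly into a predicate. In this sketch `F` is any function returning a class index and `dist` is whichever distance $\Delta$ is chosen; the toy classifier and $L_\infty$ distance in the usage lines are hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence

Vec = Sequence[float]


def is_adversarial(F: Callable[[Vec], int], x: Vec, x_adv: Vec,
                   dist: Callable[[Vec, Vec], float], eps: float,
                   target: Optional[int] = None) -> bool:
    """True iff x_adv is within eps of the seed x and is misclassified."""
    if dist(x, x_adv) > eps:
        return False                 # too far from the seed
    if target is not None:
        return F(x_adv) == target    # targeted: class is t
    return F(x_adv) != F(x)          # untargeted: class differs from the seed's


def F(v: Vec) -> int:
    """Hypothetical toy classifier: class 1 iff the features sum to more than 1."""
    return int(sum(v) > 1.0)


def linf(a: Vec, b: Vec) -> float:
    return max(abs(p - q) for p, q in zip(a, b))


x, x_adv = [0.5, 0.4], [0.55, 0.5]
print(is_adversarial(F, x, x_adv, linf, eps=0.15))  # True: class 0 -> 1, distance about 0.1
print(is_adversarial(F, x, x_adv, linf, eps=0.05))  # False: perturbation too large
```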

[Figure (three builds, slides by Nicholas Carlini): regions classified as Dog, Truck, and Airplane around a seed image, plotted first along two random directions, then along one adversarial and one random direction]

Weilin Xu et al. “Magic Tricks for Self-driving Cars”, Defcon-CAAD, 2018.
Melanoma Diagnosis (Benign / Malignant): Samuel G Finlayson et al. “Adversarial attacks on medical machine learning”, Science, 2019.
Mahmood Sharif et al. “Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition”, ACM CCS, 2016.

Natural Language (Examples by Hannah Chen)
IMDB Movie Review Dataset
Prediction: Positive (Confidence = 99.22)
Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of tough-guy roles, epic-sword films, and romances. It was fun to see him with an international cast and some real lousy looking pair of gloves. If I remember it was also dubbed in English which made things even more funnier. (kinda like seeing John Wayne speak Turkish).

Same review with one word changed (“movies”): Prediction: Positive (Confidence = 91.06); Target: Negative (Confidence = 8.94)

Same review with one word changed (“researching”): Prediction: Positive (Confidence = 92.28); Target: Negative (Confidence = 7.72)

Natural Language (Examples by Hannah Chen)
Same review with both changes (“researching” and “movies”): Prediction: Negative (Confidence = 73.33); Target: Negative

Defining Adversarial Example (continued)
$\Delta(x, x')$ is defined in some (simple!) metric space.

Distance Metrics
$L_p$ norms: $L_p(x, x') = \left( \sum_i |x_i - x'_i|^p \right)^{1/p}$
$L_0$ “norm” (# different): $\#\{i : x_i \neq x'_i\}$
$L_1$ norm: $\sum_i |x_i - x'_i|$
$L_2$ norm (“Euclidean”): $\sqrt{\sum_i (x_i - x'_i)^2}$
$L_\infty$ norm: $\max_i |x_i - x'_i|$
Useful for theory and experiments, but not realistic!
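
These distances are straightforward to compute; a pure-Python sketch (function names are illustrative):

```python
import math
from typing import Sequence

Vec = Sequence[float]


def l0(x: Vec, y: Vec) -> int:
    """Number of coordinates that differ (not a true norm)."""
    return sum(1 for a, b in zip(x, y) if a != b)


def l1(x: Vec, y: Vec) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def l2(x: Vec, y: Vec) -> float:
    """Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def linf(x: Vec, y: Vec) -> float:
    """Largest change to any single coordinate."""
    return max(abs(a - b) for a, b in zip(x, y))


x, x_adv = [0, 0, 0, 0], [3, 4, 0, 0]
print(l0(x, x_adv), l1(x, x_adv), l2(x, x_adv), linf(x, x_adv))  # 2 7 5.0 4
```

For images, $L_0$ counts changed pixels, while $L_\infty$ bounds how much any single pixel may change.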

[Images by Nicholas Carlini: Original Image ($x$) and Adversarial Images ($x'$), each labeled with its $L_p$ distance from $x$]

Other Distance Metrics
Set of transformations: rotate, scale, “fog”, color, etc.
NLP: word substitutions (synonym constraints)
Semantic distance: $\mathcal{B}(x') = \mathcal{B}(x)$: the behavior we care about is the same
- Malware: it still behaves maliciously
- Vision: it still looks like a “cat” to most humans
We’ll get back to these... for now, let’s assume $L_p$ norms (like most research), despite their flaws.

[Figure: Dog, Truck, and Airplane regions along an adversarial direction and a random direction. Slide by Nicholas Carlini]
How can we find a nearby adversarial example?

[Slide by Nicholas Carlini]

[Visualization by Nicholas Carlini]

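Fast Gradient Sign
[Images: original, and adversarial versions with $\epsilon$ = 0.1, 0.2, 0.3, 0.4, 0.5 (Adversary Power: $\epsilon$)]
$L_\infty$-bounded adversary: $\max_i |x_i - x'_i| \le \epsilon$
$x' = x - \epsilon \cdot \mathrm{sign}(\nabla_x J(x, t))$, where $J$ is the model’s loss
This form steps toward a target class $t$ by decreasing the loss for $t$; the untargeted form instead increases the loss for the true label $y$: $x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(x, y))$.
Goodfellow, Shlens, Szegedy 2014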
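
A minimal sketch of one targeted FGSM step. To stay self-contained it attacks a binary logistic-regression model (an assumption; the slide’s examples are DNNs), for which the gradient of the cross-entropy loss with respect to the input has the closed form $(p - t)\,w$. Clipping to $[0, 1]$ assumes pixel-like inputs; the weights and input are made up.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def prob_class1(w: Vec, b: float, x: Vec) -> float:
    """Logistic regression: P(class 1 | x) = sigmoid(w . x + b)."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))


def predict(w: Vec, b: float, x: Vec) -> int:
    return int(prob_class1(w, b, x) >= 0.5)


def input_gradient(w: Vec, b: float, x: Vec, t: int) -> List[float]:
    """Gradient w.r.t. x of the cross-entropy loss J(x, t): (p - t) * w."""
    p = prob_class1(w, b, x)
    return [(p - t) * wi for wi in w]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm_targeted(w: Vec, b: float, x: Vec, t: int, eps: float) -> List[float]:
    """x' = x - eps * sign(grad_x J(x, t)), clipped to the valid range [0, 1]."""
    g = input_gradient(w, b, x, t)
    return [min(1.0, max(0.0, xi - eps * sign(gi))) for xi, gi in zip(x, g)]


w, b = [1.0, -2.0, 0.5], 0.0  # made-up model
x = [0.2, 0.6, 0.4]           # seed, classified as 0
x_adv = fgsm_targeted(w, b, x, t=1, eps=0.3)
print(predict(w, b, x), predict(w, b, x_adv))  # 0 1
```

For the untargeted variant, pass the true label instead of `t` and add the signed gradient rather than subtracting it.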

Impact of Adversarial Perturbations
[Chart: distance between each layer’s output and its output for the original seed, for FGSM ($\epsilon = 0.0245$) and for random noise of the same amount; CIFAR-10 DenseNet; 5th and 95th percentiles shown]

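Basic Iterative Method (BIM)
$x'_0 = x$
for $k$ iterations: $x'_{i+1} = \mathrm{clip}_{x,\epsilon}\left(x'_i - \alpha \cdot \mathrm{sign}(\nabla_x J(x'_i, t))\right)$
$x' = x'_k$
where $\mathrm{clip}_{x,\epsilon}$ keeps each coordinate within $\epsilon$ of the seed $x$, and $\alpha$ is a (smaller) per-step size.
A. Kurakin, I. Goodfellow, and S. Bengio 2016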
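
BIM can be sketched by repeating the FGSM step with a smaller step size $\alpha$ and clipping after every step. As before, the model is a made-up logistic regression (an assumption so the input gradient has a closed form), and the clip also keeps values inside $[0, 1]$.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def prob_class1(w: Vec, b: float, x: Vec) -> float:
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def clip(x: Vec, x_adv: Vec, eps: float) -> List[float]:
    """clip_{x,eps}: keep each coordinate within eps of the seed x, and within [0, 1]."""
    return [min(min(1.0, xi + eps), max(max(0.0, xi - eps), a))
            for xi, a in zip(x, x_adv)]


def bim_targeted(w: Vec, b: float, x: Vec, t: int,
                 eps: float, alpha: float, iters: int) -> List[float]:
    x_adv = list(x)                                    # x'_0 = x
    for _ in range(iters):
        p = prob_class1(w, b, x_adv)
        grad = [(p - t) * wi for wi in w]              # grad_x J(x'_i, t)
        stepped = [a - alpha * sign(g) for a, g in zip(x_adv, grad)]
        x_adv = clip(x, stepped, eps)                  # x'_{i+1}
    return x_adv


w, b = [1.0, -2.0, 0.5], 0.0
x = [0.2, 0.6, 0.4]
x_adv = bim_targeted(w, b, x, t=1, eps=0.3, alpha=0.05, iters=10)
print(prob_class1(w, b, x) < 0.5, prob_class1(w, b, x_adv) >= 0.5)  # True True
```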

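Projected Gradient Descent (PGD)
$x'_0 = x$
for $k$ iterations: $x'_{i+1} = \mathrm{project}_{x,\epsilon}\left(x'_i - \alpha \cdot \mathrm{sign}(\nabla_x J(x'_i, t))\right)$
$x' = x'_k$
where $\mathrm{project}_{x,\epsilon}$ maps the point back onto the $\epsilon$-ball around $x$ (for $L_\infty$, this is the same as clipping).
A. Kurakin, I. Goodfellow, and S. Bengio 2016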
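
A sketch of the PGD loop for the $L_\infty$ ball, where projection reduces to coordinate-wise clipping (so each step matches BIM). The random start inside the $\epsilon$-ball is an added assumption, a common choice in PGD attacks but not shown on the slide; the logistic-regression model is again a made-up stand-in. For an $L_2$ ball, the projection would instead rescale the perturbation $x' - x$ to have norm at most $\epsilon$.

```python
import math
import random
from typing import List, Sequence

Vec = Sequence[float]


def prob_class1(w: Vec, b: float, x: Vec) -> float:
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def project_linf(x: Vec, x_adv: Vec, eps: float) -> List[float]:
    """Closest point to x_adv in the L-infinity eps-ball around x (intersected with [0, 1])."""
    return [min(min(1.0, xi + eps), max(max(0.0, xi - eps), a))
            for xi, a in zip(x, x_adv)]


def pgd_targeted(w: Vec, b: float, x: Vec, t: int, eps: float,
                 alpha: float, iters: int, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    # Assumed random start: a uniformly random point in the eps-ball around x.
    x_adv = project_linf(x, [xi + rng.uniform(-eps, eps) for xi in x], eps)
    for _ in range(iters):
        p = prob_class1(w, b, x_adv)
        grad = [(p - t) * wi for wi in w]              # grad_x J(x'_i, t)
        stepped = [a - alpha * sign(g) for a, g in zip(x_adv, grad)]
        x_adv = project_linf(x, stepped, eps)          # x'_{i+1}
    return x_adv


w, b = [1.0, -2.0, 0.5], 0.0
x = [0.2, 0.6, 0.4]
x_adv = pgd_targeted(w, b, x, t=1, eps=0.3, alpha=0.05, iters=20)
print(prob_class1(w, b, x_adv) >= 0.5)  # True
```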

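Carlini/Wagner
Formulate an optimization problem:
$\min_\delta \left( \Delta(x, x + \delta) + c \cdot f(x + \delta) \right)$ such that $(x + \delta) \in [0, 1]^n$
where the constant $c > 0$ trades off distance against attack success, and $f$ is an objective function defined so that $f(x') \le 0$ iff $F(x') = t$ (model output matches target).
This optimization problem can be solved by standard optimizers: Adam (SGD with momentum and per-parameter adaptive step sizes) [Kingma, Ba 2015].
Nicholas Carlini, David Wagner IEEE S&P 2017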

Carlini/Wagner
One choice of objective, defined on the logits $Z(x)$ (the output of the layers before the final softmax, so $F(x) = \mathrm{softmax}(Z(x))$):
$f(x') = \max_{i \neq t} Z(x')_i - Z(x')_t$
so $f(x') \le 0$ exactly when the target class $t$ has the largest logit.
Nicholas Carlini, David Wagner IEEE S&P 2017

Carlini/Wagner: $L_2$ Attack
$f(x') = \max_{i \neq t} Z(x')_i - Z(x')_t$
To remove the box constraint $(x + \delta) \in [0, 1]^n$, change variables: $\delta_i = \frac{1}{2}(\tanh(w_i) + 1) - x_i$, so $x_i + \delta_i \in [0, 1]$ for any real $w_i$.
Nicholas Carlini, David Wagner IEEE S&P 2017

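Carlini/Wagner: $L_2$ Attack
$\min_w \left\| \tfrac{1}{2}(\tanh(w) + 1) - x \right\|_2^2 + c \cdot f\left(\tfrac{1}{2}(\tanh(w) + 1)\right)$
$f(x') = \max\left(\max_{i \neq t} Z(x')_i - Z(x')_t, -\kappa\right)$, where $\kappa$ is a confidence parameter: a larger $\kappa$ requires the target logit to lead all others by a wider margin.
Nicholas Carlini, David Wagner IEEE S&P 2017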
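
A sketch of the two ingredients of the $L_2$ attack: the objective $f$ with confidence parameter $\kappa$, and the tanh change of variables that keeps $x'$ inside $[0, 1]$. `logits_fn` stands for any function returning the logits $Z(x')$ (hypothetical here); actually minimizing the objective over $w$ would be done with a gradient-based optimizer such as Adam, which this sketch omits.

```python
import math
from typing import Callable, List, Sequence

Vec = Sequence[float]


def cw_f(logits: Vec, t: int, kappa: float = 0.0) -> float:
    """f(x') = max(max_{i != t} Z_i - Z_t, -kappa).

    Positive while some other class beats the target t; once Z_t leads every
    other logit by at least kappa, f bottoms out at -kappa.
    """
    best_other = max(z for i, z in enumerate(logits) if i != t)
    return max(best_other - logits[t], -kappa)


def tanh_to_box(w: Vec) -> List[float]:
    """x' = (tanh(w) + 1) / 2 lies in [0, 1] for every real w."""
    return [0.5 * (math.tanh(wi) + 1.0) for wi in w]


def cw_l2_objective(w: Vec, x: Vec, logits_fn: Callable[[Vec], Vec],
                    t: int, c: float, kappa: float = 0.0) -> float:
    """||x' - x||_2^2 + c * f(x'), with x' = (tanh(w) + 1) / 2."""
    x_adv = tanh_to_box(w)
    dist_sq = sum((a - b) ** 2 for a, b in zip(x_adv, x))
    return dist_sq + c * cw_f(logits_fn(x_adv), t, kappa)


print(cw_f([1.0, 3.0, 2.0], t=0))             # 2.0: class 1 beats target 0 by 2
print(cw_f([1.0, 3.0, 2.0], t=1, kappa=5.0))  # -1.0: target leads by 1 < kappa
print(tanh_to_box([-50.0, 0.0, 50.0]))        # [0.0, 0.5, 1.0]
```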

[Chart: distance between each layer’s output and its output for the original seed, for Carlini-Wagner $L_2$ and for random noise of the same amount; CIFAR-10 DenseNet]

Content-Space Attacks
What if there is no gradient to follow?

Example: PDF Malware

Finding Evasive Malware
Given seed sample $x$ with desired malicious behavior, find an adversarial example $x'$ that satisfies:
$F(x') =$ “benign” (model misclassifies)
$\mathcal{B}(x') = \mathcal{B}(x)$ (malicious behavior preserved)
Generic attack: heuristically explore the input space for an $x'$ that satisfies the definition.
No requirement that $x \sim x'$ except through $\mathcal{B}$.

PDF Malware Classifiers
PDFrate [ACSAC 2012]: Random Forest; Manual Features: object counts, lengths, positions, …
Hidost13 [NDSS 2013]: Support Vector Machine; Automated Features: object structural paths; “Very robust against ‘strongest conceivable mimicry attack’”
Hidost16 [JIS 2016]: Random Forest; Automated Features: object structural paths

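Evolutionary Search (Weilin Xu, Yanjun Qi)
Mutant Generation: mutate the Malicious PDF, using material cloned from Benign PDFs, into a population of variants
Fitness Selection: score the variants (Oracle and target classifier) and select the most promising ones
Found Evasive? If a variant is classified benign while still behaving maliciously, stop; otherwise repeat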

Generating Variants
(Mutant Generation step: the Malicious PDF is mutated, using nodes cloned from Benign PDFs, into new variants)

PDF Structure

Mutation: select a random node of the PDF object tree (e.g., /Root, /Catalog, /Pages, or a /JavaScript node containing eval(‘…’);) and randomly transform it: delete, insert, or replace. Inserted and replacement nodes are taken from benign PDFs.

Selecting Promising Variants
Fitness Function: each candidate variant is checked by the Oracle (is it still malicious?) and scored by the Target Classifier; the fitness function combines the two results into a single score.

Oracle: $\mathcal{B}(x') = \mathcal{B}(x)$?
Execute the candidate in a vulnerable Adobe Reader in a virtual environment (Cuckoo: https://github.com/cuckoosandbox)
Simulated network: INetSim
Behavioral signature: malicious if the signature matches (HTTP_URL + HOST extracted from Cuckoo API traces)

Fitness Function
Assumes lost malicious behavior will not be recovered:
$\mathrm{fitness}(x') = \begin{cases} 1 - \mathrm{classifier\_score}(x') & \text{if } \mathcal{B}(x') = \mathcal{B}(x) \\ -\infty & \text{otherwise} \end{cases}$
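
A heavily simplified sketch of the fitness function and the mutate/select loop. Everything concrete here is hypothetical: a “PDF” is just a list of bits, `behavior` stands in for the sandboxed oracle $\mathcal{B}$, and `score` stands in for the target classifier’s maliciousness score. Mutation flips a single bit rather than inserting, deleting or replacing PDF tree nodes.

```python
import math
import random
from typing import Callable, List, Optional

Variant = List[int]  # hypothetical stand-in for a PDF: a list of feature bits


def fitness(variant: Variant, seed: Variant,
            behavior: Callable[[Variant], object],
            classifier_score: Callable[[Variant], float]) -> float:
    """1 - classifier_score(x') if the behavior is preserved, -inf otherwise."""
    if behavior(variant) != behavior(seed):
        return -math.inf  # lost malicious behavior is assumed not to come back
    return 1.0 - classifier_score(variant)


def evolve(seed: Variant, behavior: Callable[[Variant], object],
           classifier_score: Callable[[Variant], float], threshold: float = 0.5,
           pop_size: int = 20, generations: int = 50,
           rng: Optional[random.Random] = None) -> Optional[Variant]:
    if rng is None:
        rng = random.Random(0)
    population = [list(seed) for _ in range(pop_size)]
    for _ in range(generations):
        # Mutant generation: flip one randomly chosen bit of each variant.
        mutants = []
        for v in population:
            m = list(v)
            m[rng.randrange(len(m))] ^= 1
            mutants.append(m)
        # Found evasive? Behavior preserved and scored below the threshold.
        for m in mutants:
            if behavior(m) == behavior(seed) and classifier_score(m) < threshold:
                return m
        # Fitness selection: keep the fittest variants for the next generation.
        pool = population + mutants
        pool.sort(key=lambda v: fitness(v, seed, behavior, classifier_score),
                  reverse=True)
        population = pool[:pop_size]
    return None


def behavior(v: Variant) -> int:
    """Hypothetical oracle stand-in: bit 0 plays the role of the malicious payload."""
    return v[0]


def score(v: Variant) -> float:
    """Hypothetical classifier stand-in: fraction of set bits (higher = more malicious)."""
    return sum(v) / len(v)


seed = [1, 1, 1, 1, 1, 1, 0, 0]
evasive = evolve(seed, behavior, score)
print(evasive)
```

Because variants that lose the payload get fitness $-\infty$, selection never keeps them, mirroring the slide’s assumption.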

[Chart: Seeds Evaded (out of 500) vs. Number of Mutations, for PDFrate and Hidost]
Simple transformations often worked: (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/) works on 162/500 seeds.
Some seeds required complex transformations.

Evading PDFrate
[Chart: Classification Score per Malware Seed (sorted by original score), showing Original Malicious Seeds, Discovered Evasive Variants, and the Malicious Label Threshold; variants found with threshold = 0.25 and with threshold = 0.50]
Adjust threshold? Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.

Hide the Classifier Score?
(In the evolutionary search, the fitness function relies on the Target Classifier’s score for each candidate variant.)

Binary Classifier Output is Enough (ACM CCS 2017)

Retrain Classifier

[Diagram: Training (supervised learning): Labelled Training Data → Feature Extraction → Vectors → ML Algorithm, with EvadeML-generated variants added to the training data before Deployment]

[Chart: Seeds Evaded (out of 500) vs. Generations, for Hidost16, HidostR1, and HidostR2]
Original classifier (Hidost16): takes 614 generations to evade all seeds.

False Positive Rates:
Classifier | Genome | Contagio Benign
Hidost16   | 0.00   | 0.00
HidostR1   | 0.78   | 0.30
HidostR2   | 0.85   | 0.53

Only 8/6987 robust features (Hidost):
/Names, /Names/JavaScript, /Names/JavaScript/Names, /Names/JavaScript/JS, /OpenAction, /OpenAction/JS, /OpenAction/S, /Pages
A classifier using only these features is robust, but has high false positives.
USENIX Security 2019

Malware Classification Moral
To build robust, effective malware classifiers, we need robust features that are strong signals for malware.
If you have features like this, you don’t need ML!
There are scenarios where adversarial training “works” [more tomorrow].

Recap: Adversarial Examples across Domains
Domain               | Classifier Space                     | “Reality” Space
Trojan Wars          | Judgment of Trojans: $F(x) =$ “gift” | Physical Reality: $F^*(x) =$ invading army
Malware              | Malware Detector: $F(x) =$ “benign”  | Victim’s Execution: $F^*(x) =$ malicious behavior
Image Classification | DNN Classifier: $F(x) = y$           | Human Perception: $F^*(x) = y^*$

Tomorrow: Defenses
David Evans, University of Virginia
https://www.cs.virginia.edu/evans
