FOSAD Trustworthy Machine Learning: Class 1

David Evans
August 26, 2019

19th International School on Foundations of Security Analysis and Design
Mini-course on "Trustworthy Machine Learning"
https://jeffersonswheel.org/fosad2019
David Evans

Class 1: Introduction/Attacks


Transcript

  1. Trustworthy Machine Learning David Evans University of Virginia jeffersonswheel.org Bertinoro,

    Italy 26 August 2019 19th International School on Foundations of Security Analysis and Design 1: Introduction/Attacks
  2. Plan for the Course Monday (Today) Introduction ML Background Attacks

    Tuesday (Tomorrow) Defenses Wednesday Privacy, Fairness, Abuse 1 Overall Goals: broad and whirlwind survey* of an exciting emerging research area explain a few of my favorite research results in enough detail to understand them at a high-level introduce some open problems that I hope you will work on and solve * but highly biased by my own interests
  3. 2 Why should we care about Trustworthy Machine Learning?

  4. 3 “Unfortunately, our translation systems made an error last week

    that misinterpreted what this individual posted. Even though our translations are getting better each day, mistakes like these might happen from time to time and we’ve taken steps to address this particular issue. We apologize to him and his family for the mistake and the disruption this caused.”
  5. 4

  6. Amazon Employment 5

  7. Risks from Artificial Intelligence 6 Benign developers and operators AI

    out of control AI inadvertently causes harm Malicious operators Build AI to do harm Malicious abuse of benign AI On Robots Joe Berger and Pascal Wyse (The Guardian, 21 July 2018)
  8. Harmful AI Benign developers and operators AI out of control

    AI causes harm (without creators objecting) Malicious operators Build AI to do harm 7
  9. Out-of-Control AI 8 HAL, 2001: A Space Odyssey SkyNet, The

    Terminator
  10. Alignment Problem 9 Bostrom’s Paperclip Maximizer

  11. Harmful AI Benign developers and operators AI out of control

    AI inadvertently causes harm to humanity Malicious operators Build AI to do harm 10
  12. Lost Jobs and Dignity 11

  13. 12 On Robots Joe Berger and Pascal Wyse (The Guardian,

    21 July 2018) Human Jobs of the Future
  14. Inadvertent Bias and Discrimination 13 3rd lecture

  15. Harmful AI Benign developers AI out of control AI causes

    harm (without creators objecting) Malicious developers Using AI to do harm 14 Malice is (often) in the eye of the beholder (e.g., mass surveillance, pop-up ads, etc.)
  16. Automated Spear Phishing 15 “It’s slightly less effective [than manually

    generated] but it’s dramatically more efficient” (John Seymour) More malicious use of AI in 3rd lecture?
  17. Risks from Artificial Intelligence Benign developers and operators AI out

    of control AI inadvertently causes harm Malicious operators Build AI to do harm Malicious abuse of benign AI systems 16 rest of today and tomorrow
  18. Crash Course in Machine Learning 17

  19. 18

  20. More Ambition 19 “The human race will have a new

    kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight.”
  21. More Ambition 20 Gottfried Wilhelm Leibniz (1679)

  22. 21 Gottfried Wilhelm Leibniz (Universität Altdorf, 1666) who advised: Jacob Bernoulli (Universität Basel, 1684) who advised: Johann Bernoulli (Universität Basel, 1694) who advised: Leonhard Euler (Universität Basel, 1726) who advised: Joseph Louis Lagrange who advised: Siméon Denis Poisson who advised: Michel Chasles (École Polytechnique, 1814) who advised: H. A. (Hubert Anson) Newton (Yale, 1850) who advised: E. H. Moore (Yale, 1885) who advised: Oswald Veblen (U. of Chicago, 1903) who advised: Philip Franklin (Princeton 1921) who advised: Alan Perlis (MIT Math PhD 1950) who advised: Jerry Feldman (CMU Math 1966) who advised: Jim Horning (Stanford CS PhD 1969) who advised: John Guttag (U. of Toronto CS PhD 1975) who advised: David Evans (MIT CS PhD 2000) — my academic great-great-great-great-great-great-great-great-great-great-great-great-great-great-great-grandparent!
  23. More Precision 22 “The human race will have a new

    kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight.” Gottfried Wilhelm Leibniz (1679) Normal computing amplifies (quadrillions of times faster) and aggregates (enables millions of humans to work together) human cognitive abilities; AI goes beyond what humans can do.
  24. Operational Definition 23 If it is explainable, it’s not ML! “Artificial Intelligence” means making computers do things their programmers don’t understand well enough to program explicitly.
  25. Inherent Paradox of “Trustworthy” ML 24 If we could specify

    precisely what the model should do, we wouldn’t need ML to do it! “Artificial Intelligence” means making computers do things their programmers don’t understand well enough to program explicitly.
  26. Inherent Paradox of “Trustworthy” ML 25 If we could specify precisely what the model should do, we wouldn’t need ML to do it! Best we hope for is verifying certain properties. Model Similarity: for models M1 and M2, ∀x: M1(x) = M2(x). (DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017)
  27. Inherent Paradox of “Trustworthy” ML 26 Best we hope for is verifying certain properties. Model Similarity: for models M1 and M2, ∀x ∈ D: M1(x) ≈ M2(x). (DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017). Model Robustness: for a model M, ∀x ∈ D, ∀∆ ∈ S: M(x) ≈ M(x + ∆).
  28. Adversarial Robustness 27 For a model M: ∀x ∈ D, ∀∆ ∈ S: M(x) ≈ M(x + ∆). Adversary’s Goal: find a “small” perturbation ∆ that changes the model output (targeted attack: in some desired way). Defender’s Goal: Robust Model: find a model where this is hard; Detection: detect inputs that are adversarial.
  29. Not a new problem... 28 ‘Or do you think any Greek gift’s free of treachery? Is that Ulysses’s reputation? Either there are Greeks in hiding, concealed by the wood, or it’s been built as a machine to use against our walls, or spy on our homes, or fall on the city from above, or it hides some other trick: Trojans, don’t trust this horse. Whatever it is, I’m afraid of Greeks even those bearing gifts.’ Virgil, The Aeneid (Book II)
  30. Introduction to Deep Learning 29

  31. Generic Classifier 30 F: X → Y. Input: x ∈ ℝⁿ. Output (label): y ∈ {1, …, K}. Natural distribution: D of (x, y) pairs.
  32. Neural Network 31 F(x) = f_n(f_{n−1}(… f_2(f_1(x)))). “layer”: each f_i is mostly a function from ℝᵐ → ℝᵏ.
  33. Activation Layer 32 [diagram: units in layer t − 1 feeding unit j in layer t] z_j^(t) = g(∑_i w_{i,j}^(t−1) · z_i^(t−1))
  34. Activation Layer 33 [diagram: units in layer t − 1 feeding unit j in layer t] z_j^(t) = g(∑_i w_{i,j}^(t−1) · z_i^(t−1)). Activation function g. ReLU (Rectified Linear Unit): g(z) = 0 if z < 0, z if z ≥ 0.
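    A minimal NumPy sketch of one fully-connected activation layer with ReLU, as defined above; the layer sizes and random weights are illustrative assumptions, not from the slides:

        import numpy as np

        def relu(z):
            # ReLU: g(z) = 0 for z < 0, z for z >= 0
            return np.maximum(0.0, z)

        def activation_layer(z_prev, W):
            # z_j^(t) = g( sum_i w_{i,j}^(t-1) * z_i^(t-1) )  -- no bias term, as on the slide
            return relu(z_prev @ W)

        # Illustrative sizes: layer t-1 has 4 units, layer t has 3 units
        rng = np.random.default_rng(0)
        z_prev = rng.normal(size=4)
        W = rng.normal(size=(4, 3))
        print(activation_layer(z_prev, W))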
  35. “Fancy” Layers: Convolution 34 [diagram: a small kernel of weights (w_11 … w_kk) slides over the outputs of layer t − 1] z^(t) = g(W ∗ z^(t−1))
  36. “Fancy” Layers: Max Pooling 35 [diagram: the outputs of layer t − 1 arranged in a grid, as in the convolution slide, now partitioned into small pooling regions]
  37. “Fancy” Layers: Max Pooling 36 Each pooling unit outputs the maximum of its region: max(z_11, z_12, z_21, z_22), max(z_31, z_32, z_41, z_42), max(z_51, z_52, z_61, z_62), …
  38. Final Layer: SoftMax 37 The final layer applies SoftMax to the previous layer’s outputs: z^(n) = softmax(z^(n−1)). SoftMax function: softmax(z)_j = exp(z_j) / ∑_k exp(z_k), for j = 1, …, K. Example output: [0.03, 0.32, 0.01, 0.63, 0.00, 0.01] → It’s a “cat” (0.63 confidence).
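    A small sketch of the SoftMax function above in NumPy (the max-subtraction is a standard numerical-stability trick not shown on the slide, and the logits are illustrative):

        import numpy as np

        def softmax(z):
            # softmax(z)_j = exp(z_j) / sum_k exp(z_k)
            z = z - np.max(z)        # stability shift; does not change the output
            e = np.exp(z)
            return e / e.sum()

        logits = np.array([-1.2, 1.1, -2.0, 1.8, -5.0, -2.1])   # illustrative values
        probs = softmax(logits)
        print(probs, "predicted class:", probs.argmax())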
  39. DNNs in 1989 38 Backpropagation Applied to Handwritten Zip Code

    Recognition. Yann LeCun, et al., 1989.
  40. Turing Award in 2018 39 Yann Lecun Geoffrey Hinton Yoshua

    Bengio AT&T → Facebook/NYU Google/U. Toronto U. Montreal
  41. DNNs in 1989 40 Backpropagation Applied to Handwritten Zip Code

    Recognition. Yann LeCun, et al., 1989.
  42. MNIST 41 https://www.usenix.org/conference/usenixsecurity18/presentation/mickens James Mickens’ USENIX Security Symposium 2018 (Keynote)

    MNIST Dataset
  43. MNIST Dataset 42 2 8 7 6 8 6 5

    9 70 000 images (60 000 training, 10 000 testing) 28×28 pixels, 8-bit grayscale scanned hand-written digits labeled by humans LeCun, Cortes, Burges [1998]
  44. MNIST Dataset 43 2 8 7 6 8 6 5

    9 70 000 images (60 000 training, 10 000 testing) 28×28 pixels, 8-bit grayscale scanned hand-written digits labeled by humans LeCun, Cortes, Burges [1998]
  45. Progress in MNIST 44 Year / Error Rate: 1998 [Yann LeCun, et al.]: 5% error rate (12.1% rejection for 1% error rate); 2013 [..., Yann LeCun, ...]: 0.21% (21 errors out of 10,000 tests)
  46. CIFAR-10 (and CIFAR-100) 45 truck ship horse frog dog deer

    cat bird automobile airplane 60 000 images 32×32 pixels, 24-bit color human-labeled subset of images in 10 classes from Tiny Images Dataset Alex Krizhevsky [2009]
  47. 46 ImageNet: 14M high-resolution, full-color images, manually annotated with WordNet synonym sets (~20,000 synsets, ~1000 images in each)
  48. Example CNN Architectures 47 Image from Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015
  49. 48 Image from Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015. [plot: test error and training error; accuracy on CIFAR-10]
  50. Inception 49 https://arxiv.org/pdf/1905.11946.pdf Image from Mingxing Tan, Quoc V. Le.

    EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML 2019.
  51. Training a DNN 50

  52. 51 https://youtu.be/TVmjjfTvnFs

  53. Training a Network 52
    select a network architecture, M
    θ ← initialize with random parameters
    while (still improving):
        θ ← adjust_parameters(M, θ, X, Y)
  54. Goal of Training: Minimize Loss 53 Define a Loss Function:
    Mean Square Error: MSE = (1/n) ∑_{i=1}^{n} (ŷ_i − y_i)²
    (Maximize) Likelihood Estimation: ℒ = ∏_{i=1}^{n} p(y_i | x_i)
    (Maximize) Log-Likelihood Estimation: log ℒ = ∑_{i=1}^{n} log p(y_i | x_i)
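    The same loss functions written out as a NumPy sketch (variable names are illustrative; y_true holds integer class labels and probs is an n × K array of predicted class probabilities):

        import numpy as np

        def mean_square_error(y_pred, y_true):
            # MSE = (1/n) * sum_i (y_pred_i - y_true_i)^2
            return np.mean((y_pred - y_true) ** 2)

        def log_likelihood(probs, y_true):
            # log L = sum_i log p(y_i | x_i), where probs[i, c] is the model's
            # predicted probability of class c for example i
            return np.sum(np.log(probs[np.arange(len(y_true)), y_true]))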
  55. Training a Network 54
    select a network architecture, M
    θ ← initialize with random parameters
    while (still improving):
        θ ← adjust_parameters(M, θ, X, Y)
  56. Training a Network 55
    select a network architecture, M
    θ ← initialize with random parameters
    while (loss(M_θ, X, Y) > goal and funding > 0):
        θ ← adjust_parameters(M, θ, X, Y)
  57. Finding a Good Architecture 56
    while (available_students > 0 and funding > 0):
        select a network architecture, M
        θ ← initialize with random parameters
        while (loss(M_θ, X, Y) > goal and funding > 0):
            θ ← adjust_parameters(M, θ, X, Y)
  58. Gradient Descent 57 [plot: loss ℒ_{X,Y}(θ) as a function of θ] Goal: find θ that minimizes ℒ_{X,Y}(θ).
  59. Gradient Descent 58 [plot: convex loss ℒ_{X,Y}(θ)] Pick a random starting point. Follow the gradient (first derivative) ℒ′_{X,Y}(θ); to minimize, move in the negative direction: θ_t = θ_{t−1} − α · ∇ℒ_{X,Y}(θ_{t−1})
  60. Gradient Descent: Non-Convex Loss 59 [plot: non-convex loss ℒ_{X,Y}(θ)] Pick a random starting point. Follow the gradient (first derivative) ℒ′_{X,Y}(θ); to minimize, move in the negative direction: θ_t = θ_{t−1} − α · ∇ℒ_{X,Y}(θ_{t−1}). Repeat many times, hopefully find global minimum.
  61. Mini-Batch Stochastic Gradient Descent 60 [plot: non-convex loss ℒ_{X,Y}(θ)] Pick a random starting point. Follow the gradient (first derivative) ℒ′_{X,Y}(θ); to minimize, move in the negative direction: θ_t = θ_{t−1} − α · ∇ℒ_{X,Y}(θ_{t−1}). Repeat many times, hopefully find global minimum. To reduce computation, evaluate the gradient of the loss on a randomly selected subset (“mini-batch”).
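    A minimal sketch of the mini-batch SGD update θ_t = θ_{t−1} − α · ∇ℒ(θ_{t−1}); the gradient function, learning rate, and batch size are assumed placeholders:

        import numpy as np

        def minibatch_sgd(theta, grad_loss, X, Y, lr=0.01, batch_size=32, steps=1000, seed=0):
            # grad_loss(theta, X_batch, Y_batch) returns the gradient of the loss w.r.t. theta
            rng = np.random.default_rng(seed)
            for _ in range(steps):
                idx = rng.choice(len(X), size=batch_size, replace=False)   # sample a mini-batch
                theta = theta - lr * grad_loss(theta, X[idx], Y[idx])      # theta_t = theta_{t-1} - lr * grad
            return theta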
  62. Cost of Training 61 https://openai.com/blog/ai-and-compute/

  63. Cost of Training 62 https://openai.com/blog/ai-and-compute/

  64. 63

  65. Adversarial Machine Learning 64

  66. Labelled Training Data ML Algorithm Feature Extraction Vectors Deployment Malicious

    / Benign Operational Data Trained Classifier Training (supervised learning) Statistical Machine Learning
  67. Labelled Training Data ML Algorithm Feature Extraction Vectors Deployment Malicious

    / Benign Operational Data Trained Classifier Training (supervised learning) Assumption: Training Data is Representative
  68. Deployment Adversaries Don’t Cooperate Assumption: Training Data is Representative Training

    Poisoning
  69. Adversaries Don’t Cooperate Assumption: Training Data is Representative Evading Deployment

    Training
  70. Adversarial Examples for DNNs 69 [image: “panda” + 0.007 × (adversarial perturbation) = “gibbon”] Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014 (in ICLR 2015)
  71. Papers on “Adversarial Examples” (Google Scholar) 70 [bar chart: papers per year, 2013–2018] 1826.68 papers expected in 2018!
  72. Papers on “Adversarial Examples” (Google Scholar) 71 [bar chart: papers per year, 2013–2019] 2901.67 papers expected in 2019!
  73. Dash of “Theory” 72 [bar chart: papers per year, 2013–2019] ICML Workshop 2015. 15% of 2018 and 2019 “adversarial examples” papers contain “theorem” and “proof”.
  74. 73 Battista Biggio, et al. ECML PKDD 2013

  75. Defining Adversarial Example 74 Assumption: a small perturbation does not change the class in “Reality Space” (human perception). Given seed sample x, x′ is an adversarial example iff: F(x′) = t (class is t, targeted) or F(x′) ≠ F(x) (class is different, untargeted); and ∆(x, x′) ≤ ε (similar to seed x: difference below threshold).
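    The definition above, transcribed as a small helper; `model` (returns a class label) and `distance` (implements ∆) are assumed callables:

        def is_adversarial(model, distance, x, x_prime, epsilon, target=None):
            # Delta(x, x') <= epsilon: difference below threshold
            if distance(x, x_prime) > epsilon:
                return False
            if target is not None:
                return model(x_prime) == target          # targeted: F(x') = t
            return model(x_prime) != model(x)            # untargeted: F(x') != F(x)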
  76. 75 Dog Random Direction Random Direction Slide by Nicholas Carlini

  77. 76 Dog Random Direction Random Direction Slide by Nicholas Carlini

    Truck
  78. 77 Dog Truck Adversarial Direction Random Direction Slide by Nicholas

    Carlini Airplane
  79. 78 Weilin Xu et al. “Magic Tricks for Self-driving Cars”, Defcon-CAAD, 2018. Benign / Malignant Melanoma Diagnosis: Samuel G Finlayson et al. “Adversarial attacks on medical machine learning”, Science, 2019. Mahmood Sharif et al. “Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition”, ACM CCS, 2016.
  80. Natural Language 79 Examples by Hannah Chen Prediction: Positive (Confidence

    = 99.22) IMDB Movie Review Dataset Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of tough-guy roles, epic-sword films, and romances. It was fun to see him with an international cast and some real lousy looking pair of gloves. If I remember it was also dubbed in English which made things even more funnier. (kinda like seeing John Wayne speak Turkish).
  81. Natural Language 80 Examples by Hannah Chen Prediction: Positive (Confidence

    = 91.06) IMDB Movie Review Dataset Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of tough-guy roles, epic-sword films, and romances. It was fun to see him with an international cast and some real lousy looking pair of gloves. If I remember it was also dubbed in English which made things even more funnier. (kinda like seeing John Wayne speak Turkish). movies Target: Negative (Confidence = 8.94)
  82. Natural Language 81 Examples by Hannah Chen Prediction: Positive (Confidence

    = 92.28) IMDB Movie Review Dataset Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of tough-guy roles, epic-sword films, and romances. It was fun to see him with an international cast and some real lousy looking pair of gloves. If I remember it was also dubbed in English which made things even more funnier. (kinda like seeing John Wayne speak Turkish). researching Target: Negative (Confidence = 7.72)
  83. Natural Language 82 Examples by Hannah Chen Prediction: Negative (Confidence

    = 73.33) IMDB Movie Review Dataset Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of tough-guy roles, epic-sword films, and romances. It was fun to see him with an international cast and some real lousy looking pair of gloves. If I remember it was also dubbed in English which made things even more funnier. (kinda like seeing John Wayne speak Turkish). researching Target: Negative movies
  84. Defining Adversarial Example 83 Given seed sample x, x′ is an adversarial example iff: F(x′) = t (class is t, targeted) or F(x′) ≠ F(x) (class is different, untargeted); and ∆(x, x′) ≤ ε (similar to seed x: difference below threshold). ∆(x, x′) is defined in some (simple!) metric space.
  85. Distance Metrics 84 L_p norms: L_p(x, x′) = (∑_i |x_i − x_i′|^p)^(1/p)
    L_0 “norm” (# different): #{i : x_i ≠ x_i′}
    L_1 norm: ∑_i |x_i − x_i′|
    L_2 norm (“Euclidean”): sqrt(∑_i (x_i − x_i′)²)
    L_∞ norm: max_i |x_i − x_i′|
    Useful for theory and experiments, but not realistic!
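    The L_p distances above as a NumPy sketch (inputs are assumed to be flattened feature vectors):

        import numpy as np

        def l0(x, x_prime):
            return np.count_nonzero(x != x_prime)        # number of features that differ

        def l1(x, x_prime):
            return np.sum(np.abs(x - x_prime))

        def l2(x, x_prime):
            return np.sqrt(np.sum((x - x_prime) ** 2))   # Euclidean distance

        def linf(x, x_prime):
            return np.max(np.abs(x - x_prime))           # largest change to any single feature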
  86. 85 Images by Nicholas Carlini. Original Image (x)

  87. 86 Images by Nicholas Carlini. Original Image (x). Adversarial Image: ∆(x, x′) = [distance shown on slide]
  88. 87 Images by Nicholas Carlini. Original Image (x). Adversarial Image: ∆(x, x′) = [distance shown on slide]
  89. Other Distance Metrics 88 Set of transformations: rotate, scale, “fog”, color, etc. NLP: word substitutions (synonym constraints). Semantic distance: ℬ(x′) = ℬ(x), the behavior we care about is the same. Malware: it still behaves maliciously. Vision: it still looks like a “cat” to most humans. We’ll get back to these... for now, let’s assume L_p norms (like most research) despite flaws.
  90. 89 Dog Truck Adversarial Direction Random Direction Slide by Nicholas

    Carlini Airplane How can we find nearby adversarial example?
  91. 90 Slide by Nicholas Carlini

  92. 91 Visualization by Nicholas Carlini

  93. Fast Gradient Sign 92 [images: original and perturbations at ε = 0.1, 0.2, 0.3, 0.4, 0.5] Adversary Power: ε. L_∞-bounded adversary: max(abs(x_i − x_i′)) ≤ ε. x′ = x − ε · sign(∇_x J(x, y)). Goodfellow, Shlens, Szegedy 2014
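    A minimal sketch of FGSM following the slide’s formula x′ = x − ε · sign(∇_x J(x, y)); `grad_loss_fn` (gradient of the loss with respect to the input) is an assumed helper, and clipping to [0, 1] is added to keep pixels valid:

        import numpy as np

        def fgsm(x, y, grad_loss_fn, epsilon, targeted=True):
            # grad_loss_fn(x, y): gradient of the model's loss J(x, y) with respect to the input x
            g = np.sign(grad_loss_fn(x, y))
            if targeted:
                x_adv = x - epsilon * g   # subtract, as in the slide's formula: move toward label y
            else:
                x_adv = x + epsilon * g   # add to move away from the true label y (untargeted)
            return np.clip(x_adv, 0.0, 1.0)   # keep pixel values in a valid range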
  94. Impact of Adversarial Perturbations 93 [plot: distance between each layer’s output and its output for the original seed, 5th and 95th percentile bands] FGSM ε = 0.0245, CIFAR-10 DenseNet
  95. Impact of Adversarial Perturbations 94 [plot: distance between each layer’s output and its output for the original seed] Random noise (same amount) vs. FGSM ε = 0.0245, CIFAR-10 DenseNet
  96. Basic Iterative Method (BIM) 95
    x′_0 = x
    for N iterations:
        x′_{i+1} = clip_{x,ε}(x′_i − α · sign(∇J(x′_i, y)))
    x′ = x′_N
    A. Kurakin, I. Goodfellow, and S. Bengio 2016
  97. Projected Gradient Descent (PGD) 96
    x′_0 = x
    for N iterations:
        x′_{i+1} = project_{x,ε}(x′_i − α · sign(∇J(x′_i, y)))
    x′ = x′_N
    A. Kurakin, I. Goodfellow, and S. Bengio 2016
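    A sketch of the BIM/PGD iteration above, using the same assumed `grad_loss_fn`; the projection step keeps each pixel within ε of the seed (an L_∞ ball) and inside [0, 1]:

        import numpy as np

        def pgd(x, y, grad_loss_fn, epsilon, alpha, iterations, targeted=True):
            x_adv = x.copy()                                       # x'_0 = x
            step = -alpha if targeted else alpha                   # negative step follows the slide's formula
            for _ in range(iterations):
                x_adv = x_adv + step * np.sign(grad_loss_fn(x_adv, y))   # one FGSM-style step
                x_adv = np.clip(x_adv, x - epsilon, x + epsilon)   # project back into the L-inf ball around x
                x_adv = np.clip(x_adv, 0.0, 1.0)                   # keep pixel values valid
            return x_adv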
  98. Carlini/Wagner 97 Formulate an optimization problem: min_δ (∆(x, x + δ) + c · f(x + δ)) such that (x + δ) ∈ [0, 1]ⁿ, where f is a defined objective function: f(x′) ≤ 0 iff F(x′) = t (model output matches target). An optimization problem that can be solved by standard optimizers, e.g. Adam (SGD + momentum) [Kingma, Ba 2015]. Nicholas Carlini, David Wagner. IEEE S&P 2017.
  99. Carlini/Wagner 98 Formulate an optimization problem: min_δ (∆(x, x + δ) + c · f(x + δ)) such that (x + δ) ∈ [0, 1]ⁿ, where f is a defined objective function: f(x′) ≤ 0 iff F(x′) = t (model output matches target): f(x′) = max_{i≠t} Z(x′)_i − Z(x′)_t, where Z(x) are the logits before the final softmax: F(x) = softmax(Z(x)), Z(x) = f_{n−1}(… f_2(f_1(x))). Nicholas Carlini, David Wagner. IEEE S&P 2017.
  100. Carlini/Wagner: L_2 Attack 99 f(x′) = max_{i≠t} Z(x′)_i − Z(x′)_t. min_δ (∆(x, x + δ) + c · f(x + δ)) such that (x + δ) ∈ [0, 1]ⁿ. Change of variables: δ_i = ½(tanh(w_i) + 1) − x_i. Nicholas Carlini, David Wagner. IEEE S&P 2017.
  101. Carlini/Wagner: L_2 Attack 100 min_w (‖½(tanh(w) + 1) − x‖²₂ + c · f(½(tanh(w) + 1))), with f(x′) = max(max_{i≠t} Z(x′)_i − Z(x′)_t, −κ), where κ is a confidence parameter. Nicholas Carlini, David Wagner. IEEE S&P 2017.
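    A sketch of the Carlini/Wagner L_2 objective with the tanh change of variables; `logits_fn` is an assumed helper returning Z(x′), and an off-the-shelf optimizer (e.g., Adam) would minimize this over w:

        import numpy as np

        def cw_objective(w, x, target, logits_fn, c, kappa=0.0):
            # Change of variables: x' = 0.5 * (tanh(w) + 1) always lies in [0, 1]
            x_adv = 0.5 * (np.tanh(w) + 1.0)
            dist = np.sum((x_adv - x) ** 2)                  # squared L2 distance to the seed
            Z = logits_fn(x_adv)                             # pre-softmax outputs Z(x')
            other = np.max(np.delete(Z, target))             # max_{i != t} Z(x')_i
            f = max(other - Z[target], -kappa)               # <= 0 once the target class wins by kappa
            return dist + c * f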
  102. 101

  103. Impact of Adversarial Perturbations 102 [plot: distance between each layer’s output and its output for the original seed] Random noise (same amount) vs. Carlini-Wagner L2, CIFAR-10 DenseNet
  104. Content-Space Attacks 103 What if there is no gradient to follow?
  105. Example: PDF Malware

  106. Finding Evasive Malware 105 Given seed sample x with desired malicious behavior, find an adversarial example x′ that satisfies: F(x′) = “benign” (model misclassifies) and ℬ(x′) = ℬ(x) (malicious behavior preserved). Generic attack: heuristically explore the input space for x′ that satisfies the definition. No requirement that x ~ x′ except through ℬ.
  107. PDF Malware Classifiers: PDFrate [ACSAC 2012]: Random Forest, manual features (object counts, lengths, positions, …); Hidost13 [NDSS 2013]: Support Vector Machine, automated features (object structural paths); Hidost16 [JIS 2016]: Random Forest, automated features (object structural paths). Very robust against “strongest conceivable mimicry attack”.
  108. Variants Evolutionary Search Clone Benign PDFs Malicious PDF Mutation 01011001101

    Variants Variants Select Variants ✓ ✓ ✗ ✓ Found Evasive? Benign Oracle Weilin Xu Yanjun Qi Fitness Selection Mutant Generation
  109. Variants Generating Variants Clone Benign PDFs Malicious PDF Mutation 01011001101

    Variants Variants Select Variants ✓ ✓ ✗ ✓ Found Evasive? Fitness Selection Mutant Generation
  110. PDF Structure

  111. Variants Generating Variants Clone Benign PDFs Malicious PDF Mutation 01011001101

    Variants Variants Select Variants ✓ ✓ ✗ ✓ Found Evasive? Fitness Selection Mutant Generation
  112. Variants Generating Variants Clone Benign PDFs Malicious PDF Mutation 01011001101

    Variants Variants Select Variants ✓ ✓ ✗ ✓ Found Evasive? Found Evasive ? 0 /JavaScript eval(‘…’); /Root /Catalog /Pages Select random node Randomly transform: delete, insert, replace
  113. Variants Generating Variants Clone Benign PDFs Malicious PDF Mutation 01011001101

    Variants Variants Select Variants Found Evasive? Found Evasive ? Select random node Randomly transform: delete, insert, replace Nodes from Benign PDFs 0 /JavaScript eval(‘…’); /Root /Catalog /Pages 128 546 7 63 128
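    A schematic sketch of the mutation step described above, not the actual EvadeML implementation; `pdf_tree` and its methods, and the `benign_nodes` pool, are hypothetical stand-ins for the PDF structural tree:

        import random

        def mutate(pdf_tree, benign_nodes, rng=random):
            # pdf_tree is assumed to expose nodes(), delete(), insert(), and replace();
            # benign_nodes is a pool of subtrees harvested from benign PDFs.
            node = rng.choice(list(pdf_tree.nodes()))            # select a random node
            operation = rng.choice(["delete", "insert", "replace"])
            if operation == "delete":
                pdf_tree.delete(node)
            elif operation == "insert":
                pdf_tree.insert(node, rng.choice(benign_nodes))  # graft a benign subtree under node
            else:
                pdf_tree.replace(node, rng.choice(benign_nodes)) # swap node for a benign subtree
            return pdf_tree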
  114. Variants Selecting Promising Variants Clone Benign PDFs Malicious PDF Mutation

    01011001101 Variants Variants Select Variants ✓ ✓ ✗ ✓ Found Evasive? Fitness Selection Mutant Generation
  115. Variants Selecting Promising Variants Clone Benign PDFs Malicious PDF Mutation

    01011001101 Variants Variants Select Variants ✓ ✓ ✗ ✓ Found Evasive? Fitness Function Candidate Variant !(#$%&'() , #'(&++ ) Score Malicious 0 /JavaScript eval(‘…’); /Root /Catalog /Pages 128 Oracle Target Classifier
  116. Oracle: ℬ(x′) = ℬ(x)? Execute the candidate in a vulnerable Adobe Reader inside a virtual environment (Cuckoo sandbox, https://github.com/cuckoosandbox; simulated network: INetSim). Behavioral signature: malicious if the signature matches (HTTP_URL + HOST extracted from API traces).
  117. Fitness Function (assumes lost malicious behavior will not be recovered):
    fitness(x′) = 1 − classifier_score(x′)   if ℬ(x′) = ℬ(x)
    fitness(x′) = −∞                         otherwise
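    The fitness function above as code, sketched assuming `classifier_score` returns the target classifier’s maliciousness score and `oracle_behavior` returns the Cuckoo behavioral signature:

        def fitness(variant, seed, classifier_score, oracle_behavior):
            # A variant only gets a real score if the oracle confirms malicious behavior is preserved
            if oracle_behavior(variant) != oracle_behavior(seed):
                return float("-inf")
            return 1.0 - classifier_score(variant)   # lower maliciousness score => fitter variant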
  118. [chart: Seeds Evaded (out of 500) vs. Number of Mutations, for PDFrate and Hidost]
  119. [chart: Seeds Evaded (out of 500) vs. Number of Mutations, for PDFrate and Hidost] Simple transformations often worked
  120. [chart: Seeds Evaded (out of 500) vs. Number of Mutations, for PDFrate and Hidost] (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/) works on 162/500 seeds
  121. [chart: Seeds Evaded (out of 500) vs. Number of Mutations, for PDFrate and Hidost] Some seeds required complex transformations
  122. Malicious Label Threshold Original Malicious Seeds Evading PDFrate Classification Score

    Malware Seed (sorted by original score) Discovered Evasive Variants
  123. Discovered Evasive Variants Malicious Label Threshold Original Malicious Seeds Adjust

    threshold? Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016. Classification Score Malware Seed (sorted by original score)
  124. Variants found with threshold = 0.25 Variants found with threshold

    = 0.50 Adjust threshold? Classification Score Malware Seed (sorted by original score)
  125. Variants Hide the Classifier Score? Clone Benign PDFs Malicious PDF

    Mutation 01011001101 Variants Variants Select Variants ✓ ✓ ✗ ✓ Found Evasive? Fitness Function Candidate Variant !(#$%&'() , #'(&++ ) Score Malicious 0 /JavaScript eval(‘…’); /Root /Catalog /Pages 128 Oracle Target Classifier
  126. Variants Binary Classifier Output is Enough Clone Benign PDFs Malicious

    PDF Mutation 01011001101 Variants Variants Select Variants ✓ ✓ ✗ ✓ Found Evasive? Fitness Function Candidate Variant !(#$%&'() , #'(&++ ) Score Malicious 0 /JavaScript eval(‘…’); /Root /Catalog /Pages 128 Oracle Target Classifier ACM CCS 2017
  127. Labelled Training Data ML Algorithm Feature Extraction Vectors Deployment Malicious

    / Benign Operational Data Trained Classifier Training (supervised learning) Retrain Classifier
  128. Labelled Training Data ML Algorithm Feature Extraction Vectors Training (supervised

    learning) Clone 01011001 101 EvadeML Deployment
  129. [chart: Seeds Evaded (out of 500) vs. Generations, Hidost16] Original classifier: takes 614 generations to evade all seeds
  130. [chart: Seeds Evaded (out of 500) vs. Generations, Hidost16 and HidostR1]
  131. [chart: Seeds Evaded (out of 500) vs. Generations, Hidost16 and HidostR1]
  132. [chart: Seeds Evaded (out of 500) vs. Generations, Hidost16, HidostR1, and HidostR2]
  133. [chart: Seeds Evaded (out of 500) vs. Generations, Hidost16, HidostR1, and HidostR2]
  134. [chart: Seeds Evaded (out of 500) vs. Generations, Hidost16, HidostR1, and HidostR2] False Positive Rates: Hidost16: Genome 0.00, Contagio Benign 0.00; HidostR1: Genome 0.78, Contagio Benign 0.30; HidostR2: Genome 0.85, Contagio Benign 0.53
  135. [chart: Seeds Evaded (out of 500) vs. Generations, Hidost16, HidostR1, and HidostR2] False Positive Rates: Hidost16: Genome 0.00, Contagio Benign 0.00; HidostR1: Genome 0.78, Contagio Benign 0.30; HidostR2: Genome 0.85, Contagio Benign 0.53
  136. 135 Only 8/6987 robust features (Hidost) Robust classifier High false

    positives /Names /Names /JavaScript /Names /JavaScript /Names /Names /JavaScript /JS /OpenAction /OpenAction /JS /OpenAction /S /Pages USENIX Security 2019
  137. Malware Classification Moral: To build robust, effective malware classifiers, we need robust features that are strong signals for malware. 136 If you have features like this, you don’t need ML! There are scenarios where adversarial training “works” [more tomorrow].
  138. Recap: Adversarial Examples across Domains 137
    Trojan Wars: Classifier Space: Judgment of Trojans, F(x) = “gift”; “Reality” Space: Physical Reality, F∗(x) = invading army.
    Malware: Classifier Space: Malware Detector, F(x) = “benign”; “Reality” Space: Victim’s Execution, F∗(x) = malicious behavior.
    Image Classification: Classifier Space: DNN Classifier, F(x) = y; “Reality” Space: Human Perception, F∗(x) = z.
  139. Tomorrow: Defenses 138 David Evans University of Virginia evans@virginia.edu https://www.cs.virginia.edu/evans