FOSAD Trustworthy Machine Learning: Class 1

19th International School on Foundations of Security Analysis and Design
Mini-course on "Trustworthy Machine Learning"
https://jeffersonswheel.org/fosad2019
David Evans

Class 1: Introduction/Attacks

August 26, 2019

Transcript

  1. Trustworthy Machine Learning David Evans University of Virginia jeffersonswheel.org Bertinoro,

    Italy 26 August 2019 19th International School on Foundations of Security Analysis and Design 1: Introduction/Attacks
  2. Plan for the Course

    Monday (Today): Introduction, ML Background, Attacks
    Tuesday (Tomorrow): Defenses
    Wednesday: Privacy, Fairness, Abuse
    Overall Goals: a broad and whirlwind survey* of an exciting emerging research area; explain a few of my favorite research results in enough detail to understand them at a high level; introduce some open problems that I hope you will work on and solve. (* but highly biased by my own interests)
  3. 3 “Unfortunately, our translation systems made an error last week

    that misinterpreted what this individual posted. Even though our translations are getting better each day, mistakes like these might happen from time to time and we’ve taken steps to address this particular issue. We apologize to him and his family for the mistake and the disruption this caused.”
  4. [Image slide]

  5. Risks from Artificial Intelligence

    Benign developers and operators: AI out of control; AI inadvertently causes harm. Malicious operators: build AI to do harm; malicious abuse of benign AI. On Robots, Joe Berger and Pascal Wyse (The Guardian, 21 July 2018)
  6. Harmful AI

    Benign developers and operators: AI out of control; AI causes harm (without creators objecting). Malicious operators: build AI to do harm.
  7. Harmful AI

    Benign developers and operators: AI out of control; AI inadvertently causes harm to humanity. Malicious operators: build AI to do harm.
  8. 12 On Robots Joe Berger and Pascal Wyse (The Guardian,

    21 July 2018) Human Jobs of the Future
  9. Harmful AI

    Benign developers: AI out of control; AI causes harm (without creators objecting). Malicious developers: using AI to do harm. Malice is (often) in the eye of the beholder (e.g., mass surveillance, pop-up ads, etc.)
  10. Automated Spear Phishing 15 “It’s slightly less effective [than manually

    generated] but it’s dramatically more efficient” (John Seymour) More malicious use of AI in 3rd lecture?
  11. Risks from Artificial Intelligence

    Benign developers and operators: AI out of control; AI inadvertently causes harm. Malicious operators: build AI to do harm; malicious abuse of benign AI systems (rest of today and tomorrow).
  12. [Image slide]

  13. More Ambition 19 “The human race will have a new

    kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight.”
  14. Gottfried Wilhelm Leibniz (Universität Altdorf, 1666) who advised: Jacob

    Bernoulli (Universität Basel, 1684) who advised: Johann Bernoulli (Universität Basel, 1694) who advised: Leonhard Euler (Universität Basel, 1726) who advised: Joseph Louis Lagrange who advised: Siméon Denis Poisson who advised: Michel Chasles (École Polytechnique, 1814) who advised: H. A. (Hubert Anson) Newton (Yale, 1850) who advised: E. H. Moore (Yale, 1885) who advised: Oswald Veblen (U. of Chicago, 1903) who advised: Philip Franklin (Princeton, 1921) who advised: Alan Perlis (MIT Math PhD 1950) who advised: Jerry Feldman (CMU Math 1966) who advised: Jim Horning (Stanford CS PhD 1969) who advised: John Guttag (U. of Toronto CS PhD 1975) who advised: David Evans (MIT CS PhD 2000), my academic great-great-great-great-great-great-great-great-great-great-great-great-great-great-great-grandparent!
  15. More Precision 22 “The human race will have a new

    kind of instrument which will increase the power of the mind much more than optical lenses strengthen the eyes and which will be as far superior to microscopes or telescopes as reason is superior to sight.” Gottfried Wilhelm Leibniz (1679) Normal computing amplifies (quadrillions of times faster) and aggregates (enables millions of humans to work together) human cognitive abilities; AI goes beyond what humans can do.
  16. Operational Definition

    If it is explainable, it's not ML! "Artificial Intelligence" means making computers do things their programmers don't understand well enough to program explicitly.
  17. Inherent Paradox of "Trustworthy" ML

    If we could specify precisely what the model should do, we wouldn't need ML to do it! "Artificial Intelligence" means making computers do things their programmers don't understand well enough to program explicitly.
  18. Inherent Paradox of "Trustworthy" ML

    If we could specify precisely what the model should do, we wouldn't need ML to do it! The best we can hope for is verifying certain properties.
    Model Similarity: ∀x: M₁(x) = M₂(x)
    DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017
  19. Inherent Paradox of "Trustworthy" ML

    The best we can hope for is verifying certain properties.
    Model Similarity: ∀x ∈ D: M₁(x) ≈ M₂(x) (DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017)
    Model Robustness: ∀x ∈ D, ∀Δ ∈ P: M(x) ≈ M(x + Δ)
  20. Adversarial Robustness

    ∀x ∈ D, ∀Δ ∈ P: M(x) ≈ M(x + Δ)
    Adversary's Goal: find a "small" perturbation Δ that changes the model output (targeted attack: in some desired way).
    Defender's Goals: Robust Model: find a model where this is hard. Detection: detect inputs that are adversarial.
  21. Not a new problem... 'Or do you think any

    Greek gift's free of treachery? Is that Ulysses's reputation? Either there are Greeks in hiding, concealed by the wood, or it's been built as a machine to use against our walls, or spy on our homes, or fall on the city from above, or it hides some other trick: Trojans, don't trust this horse. Whatever it is, I'm afraid of Greeks even those bearing gifts.' Virgil, The Aeneid (Book II)
  22. Generic Classifier

    f: X → Y
    Input: x ∈ ℝⁿ
    Output (label): y ∈ {1, …, K}
    Natural distribution: D ⊆ X × Y of (x, y) pairs
  23. Neural Network

    F(x) = f_d(f_{d−1}(⋯ f_2(f_1(x)) ⋯))
    "layer": f_t, mostly a function from ℝᵐ → ℝⁿ
  24. Activation Layer

    z_j^(t) = g(∑_{i=1}^{n(t−1)} w_{i,j}^(t−1) · z_i^(t−1))
    [Figure: unit j in layer t computes a weighted sum over the outputs z_i^(t−1) of layer t − 1]
  25. Activation Layer

    z_j^(t) = g(∑_{i=1}^{n(t−1)} w_{i,j}^(t−1) · z_i^(t−1)), where g is the activation function.
    ReLU (Rectified Linear Unit): g(z) = 0 if z < 0; z if z ≥ 0 (sketch below)
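The two preceding slides in code: a minimal NumPy sketch (function names are mine, and biases are omitted) of a layer as a weighted sum followed by an activation, and a network as a composition of layers.

```python
import numpy as np

def relu(z):
    # ReLU: g(z) = 0 for z < 0, z for z >= 0
    return np.maximum(0.0, z)

def dense_layer(z_prev, W, g=relu):
    # One activation layer: each unit j computes
    # g(sum_i w[j, i] * z_prev[i]), i.e., g(W @ z_prev).
    return g(W @ z_prev)

def network(x, weights):
    # A network is a composition of layers: F(x) = f_d(... f_2(f_1(x)) ...)
    z = x
    for W in weights:
        z = dense_layer(z, W)
    return z

# Toy example: 4-dimensional input, one hidden layer of 8 units, 3 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(3, 8))]
print(network(rng.normal(size=4), weights))
```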
  26. "Fancy" Layers: Convolution

    z^(t) = g(W ∗ z^(t−1))
    [Figure: a weight matrix (kernel) W slides over layer t − 1; each output is the dot product of the kernel with a window of the previous layer]
  27. "Fancy" Layers: Max Pooling

    [Figure: windows of layer t − 1, as in the convolution diagram]
  28. "Fancy" Layers: Max Pooling

    Each output is the maximum over a window of the previous layer (sketch below):
    max(z₁₁, z₁₂, z₂₁, z₂₂), max(z₃₁, z₃₂, z₄₁, z₄₂), max(z₅₁, z₅₂, z₆₁, z₆₂), …
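A minimal NumPy sketch of 2×2 max pooling as described above (the function name and window size are my choices):

```python
import numpy as np

def max_pool_2x2(z):
    # Non-overlapping 2x2 max pooling: each output is
    # max(z[2i, 2j], z[2i, 2j+1], z[2i+1, 2j], z[2i+1, 2j+1]).
    h, w = z.shape
    return z[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

z = np.arange(16).reshape(4, 4)
print(max_pool_2x2(z))  # [[ 5  7]
                        #  [13 15]]
```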
  29. Final Layer: SoftMax

    F(x) = s(z^(d−1)), where the SoftMax function is
    s(z)_j = e^{z_j} / ∑_{k=1}^{K} e^{z_k}, for j = 1, …, K
    Output example: [0.03, 0.32, 0.01, 0.63, 0.00, 0.01] → it's a "cat" (0.63 confidence). (sketch below)
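The SoftMax function in NumPy; the logits here are made up to roughly reproduce the slide's example output:

```python
import numpy as np

def softmax(z):
    # s(z)_j = exp(z_j) / sum_k exp(z_k); subtracting max(z)
    # avoids overflow without changing the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([-1.0, 1.33, -2.1, 2.0, -4.0, -2.2])
print(np.round(softmax(logits), 2))  # [0.03 0.32 0.01 0.63 0.   0.01]
```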
  30. Turing Award in 2018

    Yann LeCun (AT&T → Facebook/NYU), Geoffrey Hinton (Google/U. Toronto), Yoshua Bengio (U. Montreal)
  31. MNIST Dataset 42 2 8 7 6 8 6 5

    9 70 000 images (60 000 training, 10 000 testing) 28×28 pixels, 8-bit grayscale scanned hand-written digits labeled by humans LeCun, Cortes, Burges [1998]
  32. MNIST Dataset 43 2 8 7 6 8 6 5

    9 70 000 images (60 000 training, 10 000 testing) 28×28 pixels, 8-bit grayscale scanned hand-written digits labeled by humans LeCun, Cortes, Burges [1998]
  33. Progress in MNIST

    1998 [Yann LeCun, et al.]: 5% error rate (12.1% rejection for 1% error rate)
    2013 [..., Yann LeCun, ...]: 0.21% error rate (21 errors out of 10,000 tests)
  34. CIFAR-10 (and CIFAR-100) 45 truck ship horse frog dog deer

    cat bird automobile airplane 60 000 images 32×32 pixels, 24-bit color human-labeled subset of images in 10 classes from Tiny Images Dataset Alex Krizhevsky [2009]
  35. Example CNN Architectures

    Image from Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015
  36. Image from Deep Residual Learning for Image Recognition, Kaiming He,

    Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015. [Figure: training error and test error curves; accuracy on CIFAR-10]
  37. Inception 49 https://arxiv.org/pdf/1905.11946.pdf Image from Mingxing Tan, Quoc V. Le.

    EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. ICML 2019.
  38. Training a Network

    select a network architecture, M
    θ ← initialize with random parameters
    while (still improving):
        θ ← adjust_parameters(M, θ, X, Y)
  39. Goal of Training: Minimize Loss

    Define a Loss Function (sketch below):
    Mean Square Error: MSE = (1/n) ∑_{i=1}^{n} (ŷᵢ − yᵢ)²
    (Maximize) Likelihood Estimation: ℒ = ∏_{i=1}^{n} p(yᵢ | xᵢ)
    (Maximize) Log-Likelihood Estimation: log ℒ = ∑_{i=1}^{n} log p(yᵢ | xᵢ)
  40. Training a Network

    select a network architecture, M
    θ ← initialize with random parameters
    while (still improving):
        θ ← adjust_parameters(M, θ, X, Y)
  41. Training a Network

    select a network architecture, M
    θ ← initialize with random parameters
    while (Loss(M_θ, X, Y) > goal and funding > 0):
        θ ← adjust_parameters(M, θ, X, Y)
  42. Finding a Good Architecture

    while (available_students > 0 and funding > 0):
        select a network architecture, M
        θ ← initialize with random parameters
        while (Loss(M_θ, X, Y) > goal and funding > 0):
            θ ← adjust_parameters(M, θ, X, Y)
  43. Gradient Descent

    Loss surface ℒ_{X,Y}(θ). Pick a random starting point; follow the gradient (first derivative) ℒ′_{X,Y}(θ) in the negative direction to minimize:
    θ_t = θ_{t−1} − α · ∇ℒ_{X,Y}(θ_{t−1})
  44. Gradient Descent: Non-Convex Loss

    Pick a random starting point; follow the gradient in the negative direction:
    θ_t = θ_{t−1} − α · ∇ℒ_{X,Y}(θ_{t−1})
    Repeat many times, hopefully finding the global minimum.
  45. Mini-Batch Stochastic Gradient Descent

    θ_t = θ_{t−1} − α · ∇ℒ_{X,Y}(θ_{t−1})
    Repeat many times, hopefully finding the global minimum. To reduce computation, evaluate the gradient of the loss on a randomly selected subset ("mini-batch"). (sketch below)
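A minimal mini-batch SGD sketch in NumPy, applied to a toy least-squares problem (all names and hyperparameters are illustrative):

```python
import numpy as np

def sgd(theta, grad_loss, X, Y, alpha=0.01, batch_size=32, steps=1000):
    # theta_t = theta_{t-1} - alpha * grad L(theta_{t-1}), with the
    # gradient estimated on a random "mini-batch" of the training data.
    rng = np.random.default_rng(0)
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        theta = theta - alpha * grad_loss(theta, X[idx], Y[idx])
    return theta

# Toy problem: least-squares regression, where the gradient of the MSE
# loss with respect to theta is (2/n) * X^T (X @ theta - Y).
X = np.random.default_rng(1).normal(size=(256, 3))
Y = X @ np.array([1.0, -2.0, 0.5])
grad = lambda th, Xb, Yb: 2 * Xb.T @ (Xb @ th - Yb) / len(Xb)
print(sgd(np.zeros(3), grad, X, Y))  # approaches [1.0, -2.0, 0.5]
```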
  46. [Image slide]

  47. Statistical Machine Learning

    [Pipeline: labelled training data → feature extraction → vectors → ML algorithm → trained classifier (training: supervised learning); in deployment: operational data → trained classifier → malicious / benign]
  48. Assumption: Training Data is Representative

    [Same pipeline: labelled training data → feature extraction → vectors → ML algorithm → trained classifier; in deployment: operational data → trained classifier → malicious / benign]
  49. Adversarial Examples for DNNs

    "panda" + 0.007 × [noise] = "gibbon"
    Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014 (in ICLR 2015)
  50. Papers on "Adversarial Examples" (Google Scholar)

    [Chart: papers per year, 2013–2018] 1826.68 papers expected in 2018!
  51. Papers on "Adversarial Examples" (Google Scholar)

    [Chart: papers per year, 2013–2019] 2901.67 papers expected in 2019!
  52. Dash of "Theory"

    [Chart: papers per year, 2013–2019] ICML Workshop 2015. 15% of 2018 and 2019 "adversarial examples" papers contain "theorem" and "proof".
  53. Defining Adversarial Example

    Assumption: a small perturbation does not change the class in "Reality Space" (human perception).
    Given seed sample x, x′ is an adversarial example iff:
    f(x′) = t (targeted: class is t), or f(x′) ≠ f(x) (untargeted: class is different)
    and Δ(x, x′) ≤ ε (similar to seed x: difference below threshold)
  54. Weilin Xu et al. "Magic Tricks for Self-driving

    Cars", Defcon-CAAD, 2018. Samuel G Finlayson et al. "Adversarial attacks on medical machine learning", Science, 2019 (benign / malignant melanoma diagnosis). Mahmood Sharif et al. "Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition", ACM CCS, 2016.
  55. Natural Language 79 Examples by Hannah Chen Prediction: Positive (Confidence

    = 99.22) IMDB Movie Review Dataset Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of tough-guy roles, epic-sword films, and romances. It was fun to see him with an international cast and some real lousy looking pair of gloves. If I remember it was also dubbed in English which made things even more funnier. (kinda like seeing John Wayne speak Turkish).
  56. Natural Language 80 Examples by Hannah Chen Prediction: Positive (Confidence

    = 91.06) IMDB Movie Review Dataset Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of tough-guy roles, epic-sword films, and romances. It was fun to see him with an international cast and some real lousy looking pair of gloves. If I remember it was also dubbed in English which made things even more funnier. (kinda like seeing John Wayne speak Turkish). movies Target: Negative (Confidence = 8.94)
  57. Natural Language 81 Examples by Hannah Chen Prediction: Positive (Confidence

    = 92.28) IMDB Movie Review Dataset Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of tough-guy roles, epic-sword films, and romances. It was fun to see him with an international cast and some real lousy looking pair of gloves. If I remember it was also dubbed in English which made things even more funnier. (kinda like seeing John Wayne speak Turkish). researching Target: Negative (Confidence = 7.72)
  58. Natural Language 82 Examples by Hannah Chen Prediction: Negative (Confidence

    = 73.33) IMDB Movie Review Dataset Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of tough-guy roles, epic-sword films, and romances. It was fun to see him with an international cast and some real lousy looking pair of gloves. If I remember it was also dubbed in English which made things even more funnier. (kinda like seeing John Wayne speak Turkish). researching Target: Negative movies
  59. Defining Adversarial Example

    Given seed sample x, x′ is an adversarial example iff:
    f(x′) = t (targeted: class is t), or f(x′) ≠ f(x) (untargeted: class is different)
    and Δ(x, x′) ≤ ε (similar to seed x: difference below threshold)
    Δ(x, x′) is defined in some (simple!) metric space.
  60. Distance Metrics

    L_p norms: L_p(x, x′) = (∑ᵢ |xᵢ − x′ᵢ|^p)^(1/p)
    L₀ "norm" (# different): #{i : xᵢ ≠ x′ᵢ}
    L₁ norm: ∑ᵢ |xᵢ − x′ᵢ|
    L₂ norm ("Euclidean"): √(∑ᵢ (xᵢ − x′ᵢ)²)
    L∞ norm: maxᵢ |xᵢ − x′ᵢ|
    Useful for theory and experiments, but not realistic! (sketch below)
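The four norms in NumPy (the function name is mine):

```python
import numpy as np

def lp_distances(x, x_adv):
    d = (x_adv - x).ravel()
    return {
        "L0":   int(np.count_nonzero(d)),        # number of changed components
        "L1":   float(np.abs(d).sum()),
        "L2":   float(np.sqrt((d ** 2).sum())),  # Euclidean distance
        "Linf": float(np.abs(d).max()),          # largest single change
    }

x = np.array([0.1, 0.5, 0.9])
print(lp_distances(x, x + np.array([0.0, 0.02, -0.01])))
# {'L0': 2, 'L1': 0.03..., 'L2': 0.022..., 'Linf': 0.02}
```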
  61. Other Distance Metrics

    Set of transformations: rotate, scale, "fog", color, etc. NLP: word substitutions (synonym constraints). Semantic distance: ℬ(x′) = ℬ(x), the behavior we care about is the same. Malware: it still behaves maliciously. Vision: it still looks like a "cat" to most humans. We'll get back to these... for now, let's assume L_p norms (like most research) despite their flaws.
  62. How can we find a nearby adversarial example?

    [Figure: decision regions (Airplane, Dog, Truck) along an adversarial direction vs. a random direction] Slide by Nicholas Carlini
  63. Fast Gradient Sign

    [Figure: original image and perturbed versions at ε = 0.1, 0.2, 0.3, 0.4, 0.5]
    Adversary Power: ε. L∞-bounded adversary: max(abs(xᵢ − x′ᵢ)) ≤ ε
    x′ = x − ε · sign(∇ₓ ℒ(x, y))  (sketch below)
    Goodfellow, Shlens, Szegedy 2014
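A PyTorch sketch of FGSM following the slide's sign convention (minus, i.e., stepping down the loss on label y, as in a targeted attack; the common untargeted variant instead adds the signed gradient of the loss on the true label). The clamp to [0, 1] assumes image inputs:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # x' = x - eps * sign(grad_x L(x, y)), per the slide.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x - eps * x.grad.sign()
    # The perturbation is L-infinity bounded by construction:
    # max |x'_i - x_i| <= eps. Clamp keeps pixels in a valid range.
    return x_adv.clamp(0, 1).detach()
```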
  64. Impact of Adversarial Perturbations

    [Chart: distance between each layer's output and its output for the original seed; FGSM ε = 0.0245; CIFAR-10 DenseNet; 5th and 95th percentile bands]
  65. Impact of Adversarial Perturbations

    [Chart: distance between each layer's output and its output for the original seed; FGSM ε = 0.0245 vs. random noise (same amount); CIFAR-10 DenseNet]
  66. Basic Iterative Method (BIM)

    x′₀ = x
    for N iterations: x′ₜ₊₁ = clip_{x,ε}(x′ₜ − α · sign(∇ℒ(x′ₜ, y)))
    x′ = x′_N
    A. Kurakin, I. Goodfellow, and S. Bengio 2016
  67. Projected Gradient Descent (PGD)

    x′₀ = x
    for N iterations: x′ₜ₊₁ = project_{x,ε}(x′ₜ − α · sign(∇ℒ(x′ₜ, y)))
    x′ = x′_N  (sketch below)
    A. Kurakin, I. Goodfellow, and S. Bengio 2016
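A PyTorch sketch of the iterative attack; for an L∞ ball the projection step is just clipping, which makes BIM and this PGD variant nearly identical. Names and defaults are mine:

```python
import torch
import torch.nn.functional as F

def pgd(model, x, y, eps, alpha, iters):
    # Repeatedly take an FGSM-style step of size alpha, then project
    # back into the L-infinity ball of radius eps around x.
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv - alpha * x_adv.grad.sign()  # step on label y
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project onto eps-ball
            x_adv = x_adv.clamp(0, 1)                  # keep a valid image
    return x_adv.detach()
```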
  68. Carlini/Wagner

    Formulate an optimization problem:
    min_δ (Δ(x, x + δ) + c · ℓ(x + δ)) such that (x + δ) ∈ [0, 1]ⁿ
    where ℓ is a defined objective function: ℓ(x′) ≤ 0 iff f(x′) = t (model output matches target).
    An optimization problem that can be solved by standard optimizers: Adam (SGD + momentum) [Kingma, Ba 2015]
    Nicholas Carlini, David Wagner. IEEE S&P 2017
  69. Carlini/Wagner

    min_δ (Δ(x, x + δ) + c · ℓ(x + δ)) such that (x + δ) ∈ [0, 1]ⁿ
    where ℓ is a defined objective function: ℓ(x′) ≤ 0 iff f(x′) = t (model output matches target):
    ℓ(x′) = max_{i≠t}(Z(x′)ᵢ) − Z(x′)ₜ
    Here Z(x) is the logits layer: F(x) = softmax(Z(x)), Z(x) = f_d(⋯ f_2(f_1(x)) ⋯)
    Nicholas Carlini, David Wagner. IEEE S&P 2017
  70. Carlini/Wagner: L₂ Attack

    min_δ (Δ(x, x + δ) + c · ℓ(x + δ)) such that (x + δ) ∈ [0, 1]ⁿ
    ℓ(x′) = max_{i≠t}(Z(x′)ᵢ) − Z(x′)ₜ
    Change of variables: δᵢ = ½(tanh(wᵢ) + 1) − xᵢ
    Nicholas Carlini, David Wagner. IEEE S&P 2017
  71. Carlini/Wagner: L₂ Attack

    min_w ‖½(tanh(w) + 1) − x‖₂² + c · ℓ(½(tanh(w) + 1))
    ℓ(x′) = max(max_{i≠t}(Z(x′)ᵢ) − Z(x′)ₜ, −κ), where κ is a confidence parameter. (sketch below)
    Nicholas Carlini, David Wagner. IEEE S&P 2017
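A simplified PyTorch sketch of the CW L₂ attack with the tanh change of variables (a single fixed c rather than the paper's binary search over c; names are mine):

```python
import torch

def cw_l2(model, x, target, c=1.0, kappa=0.0, steps=500, lr=0.01):
    # Change of variables: x' = 0.5 * (tanh(w) + 1), so x' is always
    # in [0, 1] and the box constraint disappears from the problem.
    w = torch.atanh((2 * x - 1).clamp(-0.9999, 0.9999)).detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    num_classes = model(x).shape[1]
    not_target = torch.ones(num_classes, dtype=torch.bool)
    not_target[target] = False
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        Z = model(x_adv)  # logits (pre-softmax outputs)
        # objective: max(max_{i != t} Z_i - Z_t, -kappa)
        z_other = Z[:, not_target].max(dim=1).values
        obj = (z_other - Z[:, target]).clamp(min=-kappa)
        # minimize ||x' - x||_2^2 + c * objective
        loss = ((x_adv - x) ** 2).sum() + c * obj.sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```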
  72. [Image slide]

  73. Impact of Adversarial Perturbations

    [Chart: distance between each layer's output and its output for the original seed; Carlini-Wagner L₂ vs. random noise (same amount); CIFAR-10 DenseNet]
  74. Finding Evasive Malware

    Given seed sample x with desired malicious behavior, find an adversarial example x′ that satisfies:
    f(x′) = "benign" (model misclassifies)
    ℬ(x′) = ℬ(x) (malicious behavior preserved)
    Generic attack: heuristically explore the input space for x′ that satisfies the definition. No requirement that x ~ x′ except through ℬ.
  75. PDF Malware Classifiers

    PDFrate [ACSAC 2012]: Random Forest; manual features (object counts, lengths, positions, …)
    Hidost13 [NDSS 2013]: Support Vector Machine; automated features (object structural paths); "very robust against strongest conceivable mimicry attack"
    Hidost16 [JIS 2016]: Random Forest; automated features (object structural paths)
  76. Evolutionary Search (Weilin Xu, Yanjun Qi)

    [Diagram: malicious PDF → clone → mutation (01011001101) → variants → select variants (✓ ✓ ✗ ✓) → found evasive? Components: mutant generation, fitness selection, benign PDFs, benign oracle.]
  77. Generating Variants

    [Same diagram, highlighting mutant generation: malicious PDF → clone → mutation → variants]
  78. Generating Variants

    [Same diagram, highlighting mutant generation]
  79. Generating Variants

    Select a random node; randomly transform it: delete, insert, replace.
    [Example PDF tree: /Root → /Catalog → /Pages; /JavaScript eval('…')]
  80. Generating Variants

    Select a random node; randomly transform it: delete, insert, replace, using nodes from benign PDFs.
    [Example PDF tree: /Root → /Catalog → /Pages; /JavaScript eval('…')]
  81. Selecting Promising Variants

    [Same diagram, highlighting fitness selection: variants → select variants (✓ ✓ ✗ ✓) → found evasive?]
  82. Selecting Promising Variants

    [Diagram: candidate variant → oracle (malicious?) and target classifier (score) → fitness function → score]
  83. Oracle: ℬ(x′) = ℬ(x)?

    Execute the candidate in a vulnerable Adobe Reader in a virtual environment (Cuckoo sandbox, https://github.com/cuckoosandbox), with a simulated network (INetSim).
    Behavioral signature: malicious if the signature (HTTP_URL + HOST extracted from API traces) matches.
  84. Fitness Function

    Assumes lost malicious behavior will not be recovered. (sketch below)
    fitness(x′) = 1 − classifier_score(x′) if ℬ(x′) = ℬ(x); −∞ otherwise
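A sketch of the evolutionary search built around this fitness function. mutate, behavior, and classifier_score are hypothetical hooks standing in for the PDF mutation operator, the Cuckoo-based oracle, and the target classifier; population sizes are illustrative:

```python
import random

# Hypothetical hooks (names are mine): mutate(pdf) applies a random
# delete/insert/replace to a PDF node; behavior(pdf) is the sandbox
# oracle's signature; classifier_score(pdf) is in [0, 1], with scores
# above `threshold` labeled malicious.

def fitness(variant, seed):
    # Assumes lost malicious behavior will not be recovered.
    if behavior(variant) == behavior(seed):
        return 1.0 - classifier_score(variant)
    return float("-inf")

def evolve(seed, population=48, generations=100, threshold=0.5):
    variants = [mutate(seed) for _ in range(population)]
    for _ in range(generations):
        scored = sorted(((fitness(v, seed), v) for v in variants),
                        key=lambda fv: fv[0], reverse=True)
        best_fit, best = scored[0]
        # Evasive: behavior preserved, but classified below the threshold.
        if best_fit > float("-inf") and classifier_score(best) < threshold:
            return best
        parents = [v for f, v in scored[:population // 4] if f > float("-inf")]
        variants = [mutate(random.choice(parents or [seed]))
                    for _ in range(population)]
    return None
```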
  85. [Chart: seeds evaded (out of 500) vs. number of mutations, for PDFRate and Hidost]
  86. [Chart: seeds evaded (out of 500) vs. number of mutations, for PDFRate and Hidost] Simple transformations often worked.
  87. [Chart: seeds evaded (out of 500) vs. number of mutations] (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/) works on 162/500 seeds.
  88. [Chart: seeds evaded (out of 500) vs. number of mutations] Some seeds required complex transformations.
  89. Evading PDFrate

    [Chart: classification score for each malware seed (sorted by original score): original malicious seeds vs. discovered evasive variants, with the malicious label threshold marked]
  90. Adjust threshold?

    [Same chart: original malicious seeds vs. discovered evasive variants, with the malicious label threshold] Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.
  91. Adjust threshold?

    [Chart: classification score per malware seed; variants found with threshold = 0.25 and with threshold = 0.50]
  92. Hide the Classifier Score?

    [Diagram: candidate variant → oracle (malicious?) and target classifier (score) → fitness function → score]
  93. Binary Classifier Output is Enough

    [Same diagram: candidate variant → oracle and target classifier → fitness function → score] ACM CCS 2017
  94. Retrain Classifier

    [Pipeline: labelled training data → feature extraction → vectors → ML algorithm → trained classifier; in deployment: operational data → trained classifier → malicious / benign; retrain the classifier]
  95. [Chart: seeds evaded (out of 500) vs. generations, Hidost16] Original classifier: takes 614 generations to evade all seeds.
  96. [Chart: seeds evaded (out of 500) vs. generations, Hidost16 and HidostR1]
  97. [Chart: seeds evaded (out of 500) vs. generations, Hidost16 and HidostR1]
  98. [Chart: seeds evaded (out of 500) vs. generations, Hidost16, HidostR1, and HidostR2]
  99. [Chart: seeds evaded (out of 500) vs. generations, Hidost16, HidostR1, and HidostR2]
  100. [Chart: seeds evaded (out of 500) vs. generations, Hidost16, HidostR1, and HidostR2]

    False Positive Rates (Genome / Contagio Benign): Hidost16: 0.00 / 0.00; HidostR1: 0.78 / 0.30; HidostR2: 0.85 / 0.53
  101. [Chart: seeds evaded (out of 500) vs. generations, Hidost16, HidostR1, and HidostR2]

    False Positive Rates (Genome / Contagio Benign): Hidost16: 0.00 / 0.00; HidostR1: 0.78 / 0.30; HidostR2: 0.85 / 0.53
  102. Only 8/6987 robust features (Hidost): /Names, /Names/JavaScript,

    /Names/JavaScript/Names, /Names/JavaScript/JS, /OpenAction, /OpenAction/JS, /OpenAction/S, /Pages. Robust classifier → high false positives. USENIX Security 2019
  103. Malware Classification Moral

    To build robust, effective malware classifiers, we need robust features that are strong signals for malware. But if you have features like this, you don't need ML! There are scenarios where adversarial training "works" [more tomorrow].
  104. Recap: Adversarial Examples across Domains

    Domain: Trojan Wars. Classifier Space: Judgment of Trojans, f(x) = "gift". "Reality" Space: Physical Reality, f*(x) = invading army.
    Domain: Malware. Classifier Space: Malware Detector, f(x) = "benign". "Reality" Space: Victim's Execution, f*(x) = malicious behavior.
    Domain: Image Classification. Classifier Space: DNN Classifier, f(x) = y. "Reality" Space: Human Perception, f*(x) = y*.