
Adversarial Machine Learning: Are We Playing the Wrong Game?

David Evans

CISPA Distinguished Lecture
Center for IT-Security, Privacy and Accountability
Universität des Saarlandes
10 July 2017

https://privacy-sfb.cispa.saarland/blog/distinguished-lecture-adversarial-machine-learning-are-we-playing-the-wrong-game/


Transcript

  1. Adversarial Machine Learning: Are We Playing the Wrong Game?
    David Evans, University of Virginia (work mostly with Weilin Xu and Yanjun Qi), evadeML.org
    Center for IT-Security, Privacy and Accountability, Universität des Saarlandes, 10 July 2017
  2. 2

  3. … and can solve all Security Problems! Fake Spam IDS

    Malware Fake Accounts … “Fake News”
  4. Labelled Training Data ML Algorithm Feature Extraction Vectors Deployment Malicious

    / Benign Operational Data Trained Classifier Training (supervised learning)
  5. Labelled Training Data ML Algorithm Feature Extraction Vectors Deployment Malicious

    / Benign Operational Data Trained Classifier Training (supervised learning) Assumption: Training Data is Representative
  6. Adversarial Examples
    [figure: “panda” image + 0.007 × (adversarial noise) is classified as “gibbon”]
    Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. ICLR 2015.
  7. Goal of Machine Learning Classifier 11 Metric Space 1: Target

    Classifier Metric Space 2: “Oracle” Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
  8. Well-Trained Classifier 12 Metric Space 1: Target Classifier Metric Space

    2: “Oracle” Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
  9. Adversarial Examples 13 Metric Space 1: Target Classifier Metric Space

    2: “Oracle” Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
  10. Adversarial Examples
    Metric Space 1: Target Classifier; Metric Space 2: “Oracle”
    Adversary’s goal: find a small perturbation that changes the class assigned by the classifier but is imperceptible to the oracle.
  11. Misleading Visualization
    Metric Space 1: Target Classifier. Cartoon vs. Reality:
    Cartoon: 2 dimensions; few samples near boundaries; every sample near 1-3 classes.
    Reality: thousands of dimensions; all samples near boundaries; every sample near all classes.
  12. Formalizing Adversarial Examples Game
    Given seed sample x, find x′ where:
    f(x′) ≠ f(x)   (class is different)
    Δ(x, x′) ≤ ε   (difference below threshold)
  13. Formalizing Adversarial Examples Game
    Given seed sample x, find x′ where:
    f(x′) ≠ f(x)   (class is different)
    Δ(x, x′) ≤ ε   (difference below threshold)
    Δ is defined in some metric space:
    L₀ “norm” (# different): #{i : xᵢ ≠ x′ᵢ}
    L₁ norm: Σᵢ |xᵢ − x′ᵢ|
    L₂ norm (“Euclidean”): √(Σᵢ (xᵢ − x′ᵢ)²)
    L∞ norm: maxᵢ |xᵢ − x′ᵢ|
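    A minimal NumPy sketch (illustrative, not from the slides) of the four perturbation measures above, computed between a seed x and a candidate x_adv:

        import numpy as np

        def perturbation_norms(x, x_adv):
            """Distances between a seed x and a candidate x_adv under the four
            metrics commonly used to bound adversarial perturbations."""
            d = (np.asarray(x_adv) - np.asarray(x)).ravel()
            return {
                "L0":   int(np.count_nonzero(d)),       # number of changed features
                "L1":   float(np.sum(np.abs(d))),       # total absolute change
                "L2":   float(np.sqrt(np.sum(d ** 2))), # Euclidean distance
                "Linf": float(np.max(np.abs(d))),       # largest single change
            }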
  14. Targeted Attacks
    Untargeted attack: given seed sample x, find x′ where:
    f(x′) ≠ f(x)   (class is different)
    Δ(x, x′) ≤ ε   (difference below threshold)
    Targeted attack: given seed sample x and target class t, find x′ where:
    f(x′) = t   (class is t)
    Δ(x, x′) ≤ ε   (difference below threshold)
  15. Datasets
    MNIST: 70 000 images, 28×28 pixels, 8-bit grayscale; scanned hand-written digits labeled by humans. LeCun, Cortes, Burges [1998]
  16. Datasets
    MNIST: 70 000 images, 28×28 pixels, 8-bit grayscale; scanned hand-written digits labeled by humans. LeCun, Cortes, Burges [1998]
    CIFAR-10: 60 000 images, 32×32 pixels, 24-bit color; human-labeled subset of the Tiny Images Dataset in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). Alex Krizhevsky [2009]
  17. ImageNet
    14 million high-resolution, full-color images, manually annotated into ~20,000 WordNet synonym sets (~1000 images in each).
    Models: MobileNet (Top-1 accuracy: .684 / Top-5: .882), Inception v3 (Top-1: .763 / Top-5: .930)
  18. 22

  19. L∞ Adversary (Fast Gradient Sign)
    [images: original and adversarial examples for adversary power ε = 0.1, 0.2, 0.3, 0.4, 0.5]
    L∞-norm adversary: maxᵢ |xᵢ − x′ᵢ| ≤ ε
    Fast gradient sign: x′ = x − ε · sign(∇ₓ loss_F(x))
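    A hedged, framework-agnostic sketch of the fast gradient sign method. The slide's formula subtracts the signed gradient (the targeted-style form); the common untargeted form from Goodfellow et al. adds it. loss_gradient is a hypothetical helper standing in for whatever computes d(loss)/dx:

        import numpy as np

        def fgsm(x, loss_gradient, epsilon=0.1):
            """Perturb every pixel by epsilon in the direction given by the sign
            of the loss gradient, then clip back to the valid [0, 1] range.
            The resulting L-infinity distance from x is at most epsilon."""
            x_adv = x - epsilon * np.sign(loss_gradient(x))   # slide's sign convention
            return np.clip(x_adv, 0.0, 1.0)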
  20. L∞ Adversary: Binary Filter
    [images: original and 1-bit filtered examples for adversary power ε = 0.1, 0.2, 0.3, 0.4, 0.5]
  21. AdversarialDNN Playground (Andrew Norton and Yanjun Qi)
    Live demo: https://evadeML.org/playground
    Will be integrated with EvadeML-Zoo models and attacks soon!
  22. Given seed sample x, find x′ where:
    f(x′) ≠ f(x) (class is different) or f(x′) = t (class is target class)
    Δ(x, x′) ≤ ε   (difference below threshold)
    Is this the right game?
  23. Arms Race
    [timeline of attack and defense papers: ICLR 2014, NDSS 2013, ICLR 2015, S&P 2016, S&P 2017, NDSS 2016, NDSS 2016, This Talk (Feb 2017)]
  24. New Idea: Detect Adversarial Examples
    Given seed sample x, find x′ where:
    f(x′) ≠ f(x)   (class is different)
    Δ(x, x′) ≤ ε   (difference below threshold)
    Deployed classifier only sees x′: can we search for “x”?
  25. [diagram: the input is fed to the model directly and through Filter 1 and Filter 2; the three predictions are compared, and if the difference exceeds a threshold the prediction is rejected, otherwise it is accepted]
    Need filters that do not affect predictions on normal inputs, but that reverse malicious perturbations.
  26. “Feature Squeezing”
    x  = [0.054, 0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074, …]
    x′ = [0.0491, 0.4903, 0.9292, 0.009, 0.2942, 0.5243, 0.5078, …]
    f(x′) ≠ f(x)
  27. “Feature Squeezing”
    x  = [0.054, 0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074, …]
    x′ = [0.0491, 0.4903, 0.9292, 0.009, 0.2942, 0.5243, 0.5078, …]
    Squeeze: squeeze(x)ᵢ = round(xᵢ × 4) / 4
    Both squeeze to [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, …]
    squeeze(x′) ≈ squeeze(x) ⟹ f(squeeze(x′)) ≈ f(squeeze(x))
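    A quick NumPy check of the squeezer on the two feature vectors shown on the slide (the vectors are the slide's own numbers; the code is a sketch, not the paper's implementation):

        import numpy as np

        def squeeze(x, levels=4):
            """Coalesce feature values onto a coarse grid: round(x * 4) / 4.
            Small perturbations usually round back to the same grid point."""
            return np.round(np.asarray(x) * levels) / levels

        x     = np.array([0.054, 0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074])
        x_adv = np.array([0.0491, 0.4903, 0.9292, 0.009, 0.2942, 0.5243, 0.5078])
        assert np.allclose(squeeze(x), squeeze(x_adv))  # both map to [0, .5, 1, 0, .25, .5, .5]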
  28. Squeezing Images
    Reduce color depth: 8-bit grayscale down to 1-bit monochrome.
    Median smoothing: 3×3 smoothing replaces each pixel with the median of the pixel and its neighbors.
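    A minimal sketch of the two image squeezers, assuming images are float arrays scaled to [0, 1] and using scipy.ndimage.median_filter for the smoothing step:

        import numpy as np
        from scipy.ndimage import median_filter

        def squeeze_colors(img, bits=1):
            """Reduce bit depth: keep only 2**bits gray levels (bits=1 gives monochrome)."""
            levels = 2 ** bits - 1
            return np.round(img * levels) / levels

        def squeeze_smooth(img, window=3):
            """Median smoothing: replace each pixel with the median of the
            window centred on it (3x3 by default)."""
            return median_filter(img, size=window)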
  29. MNIST Results: Accuracy
    Bit depth:  8 (original)  7      6      5      4      3      2      1
    Accuracy:   .9930         .9930  .9930  .9930  .9930  .9928  .9926  .9924
    Reducing bit depth (all the way to 1) barely reduces model accuracy!
    Out of 10 000 MNIST test images: 19 are correct on the original image but wrong on the 1-bit filtered image, 13 are wrong on the original image but correct on the 1-bit filtered image, and some are wrong on both, but differently.
  30. Robustness Results (MNIST)
    [plots: accuracy vs. bit depth (8 down to 1) for non-adversarial inputs (ε=0.0) and adversaries with ε = 0.1, 0.2, 0.3; and accuracy vs. adversary strength ε for 8-bit (unfiltered) vs. 1-bit filtered inputs]
    Even for strong adversaries, the 1-bit filter effectively removes adversarial perturbations.
  31. L₀ Adversary (Jacobian-based Saliency Map)
    [images: original vs. JSMA adversarial examples]
    L₀ “norm” (# different): #{i : xᵢ ≠ x′ᵢ}
    Adversary strength ε = 0.1 (can modify up to 10% of pixels)
  32. Smoothing Results (MNIST): accuracy vs. median-smoothing window (n×n)
    Window:              1     2     3     4     5     6     7     8
    Original:           .993  .988  .991  .980  .943  .845  .650  .479
    Adversarial (JSMA): .014  .700  .976  .953  .906  .791  .616  .454
    No smoothing: adversary succeeds 98.6% of the time.
  33. Smoothing Results
    Window:              1     2     3     4     5     6     7     8
    Original:           .993  .988  .991  .980  .943  .845  .650  .479
    Adversarial (JSMA): .014  .700  .976  .953  .906  .791  .616  .454
    [second plot: MNIST and CIFAR-10 accuracy for smoothing windows 1-4; values shown include .9257, .8592, .7812, .0100, .8400, .7500]
    2×2 smoothing defeats the adversary, but reduces accuracy.
  34. Carlini/Wagner Untargeted Attacks
    Nicholas Carlini, David Wagner. Oakland 2017 (Best Student Paper)
    Accuracy on adversarial examples:
    MNIST:    L₂ 0.0   L∞ 0.0   L₀ 0.0
    CIFAR-10: L₂ 0.0   L∞ 0.0   L₀ 0.0
    The adversary succeeds 100% of the time with very small perturbations.
    “Our L∞ attacks on ImageNet are so successful that we can change the classification of an image to any desired label by only flipping the lowest bit of each pixel, a change that would be impossible to detect visually.”
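    To put that quote in perspective, a quick back-of-the-envelope check (mine, not from the slide): for 8-bit pixels normalized to [0, 1], flipping the lowest bit changes each value by 1/255, so the whole perturbation has an L∞ norm of about 0.004, far below the ε = 0.1-0.5 budgets shown earlier.

        # L-infinity norm of a lowest-bit flip on 8-bit pixels normalized to [0, 1]
        delta = 1 / 255
        print(f"L_inf = {delta:.4f}")   # 0.0039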
  35. Squeezing Results (2×2 Median Smoothing)
    Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/1705.10686
    Accuracy on adversarial examples (original → squeezed):
    MNIST:    L₂ 0.0 → 0.904   L∞ 0.0 → 0.942   L₀ 0.0 → 0.817
    CIFAR-10: L₂ 0.0 → 0.682   L∞ 0.0 → 0.661   L₀ 0.0 → 0.706
  36. Results on Carlini/Wagner Untargeted Attacks
    Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/1705.10686
    Accuracy on adversarial examples (original → squeezed):
    MNIST:    L₂ 0.0 → 0.904   L∞ 0.0 → 0.942   L₀ 0.0 → 0.817
    CIFAR-10: L₂ 0.0 → 0.682   L∞ 0.0 → 0.661   L₀ 0.0 → 0.706
    Accuracy on legitimate examples: 0.783
  37. Results on Carlini/Wagner Targeted Attacks
    Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/1705.10686
    Adversary success rate (original → squeezed):
    MNIST:    L₂ 0.999 → 0.022   L∞ 1.0 → 0.011   L₀ 1.0 → 0.057
    CIFAR-10: L₂ 1.0 → 0.033     L∞ 1.0 → 0.037   L₀ 1.0 → 0.037
  38. Detecting Adversarial Examples
    [diagram: the input goes to the model directly and through two squeezers (bit depth reduction and median smoothing); the three predictions are compared using L₁ distance; if the difference exceeds a threshold the input is flagged as adversarial, otherwise it is treated as normal]
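    A minimal sketch of that detector, assuming model(x) returns a probability vector and squeezers is a list of functions like the bit-depth and smoothing squeezers above; the 0.362 default is the MNIST threshold reported two slides later:

        import numpy as np

        def detect_adversarial(model, x, squeezers, threshold=0.362):
            """Compare the prediction on the raw input against predictions on each
            squeezed version; a large L1 difference suggests an adversarial input."""
            p = model(x)
            scores = [np.sum(np.abs(p - model(sq(x)))) for sq in squeezers]
            return max(scores) > threshold   # True => flag as adversarial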
  39. Detection Results
    [histogram: L₁ distance between normal and squeezed predictions for non-adversarial, FGSM, and JSMA inputs; MNIST, binning-histogram over 1000 samples]
  40. Detection Results
    [same histogram as the previous slide]
    With a max L₁ threshold of 0.362, detect 99.5% of adversarial examples with a false positive rate of 1.5%.
  41. Detection Results on C/W Attacks
    Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/1705.10686
    Dataset   | ROC-AUC | L₁ Distance Threshold | Accuracy | True Positive Rate | False Positive Rate
    MNIST     | 0.9950  | 0.1147                | 0.9880   | 0.9933             | 0.0173
    CIFAR-10  | 0.8711  | 0.7423                | 0.8750   | 0.9527             | 0.2027
    (Validated results: use half of the samples to determine the threshold, test with the other half.)
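    A hedged sketch of that validation step: fit the L₁-score threshold on one half of the samples and evaluate on the other half. scikit-learn's roc_curve is one convenient way to scan candidate thresholds; the target false positive rate here is illustrative:

        import numpy as np
        from sklearn.metrics import roc_curve

        def pick_threshold(scores, labels, target_fpr=0.05):
            """Choose the largest-recall threshold whose false positive rate on
            this (held-out) half of the data stays within target_fpr.
            labels: 1 = adversarial, 0 = legitimate; scores: L1 detection scores."""
            fpr, tpr, thresholds = roc_curve(labels, scores)
            within_budget = np.where(fpr <= target_fpr)[0]
            return thresholds[within_budget[-1]]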
  42. [histogram: L₁ distance between normal and squeezed predictions for non-adversarial and FGSM inputs; ImageNet with MobileNet, histogram over 68 seeds]
  43. [histogram: L₁ distance between normal and squeezed predictions for non-adversarial, FGSM, DeepFool, and Carlini/Wagner (L₂) inputs; ImageNet with MobileNet, histogram over 68 seeds; annotated adversarial success rates: 100% and 47%]
  44. Arms Race
    [timeline: NDSS 2013, ICLR 2014, ICLR 2015, S&P 2016, NDSS 2016, NDSS 2016, S&P 2017, Feature Squeezing (15 June 2017, arXiv), and anticipated future entries: “Quick Hack” (not yet published, Weilin Xu and others), “A new tweak” (Authors TBD), “Delta, my Epsilon!” (Authors TBD)]
  45. Raising the Bar or Changing the Game?
    Metric Space 1: Target Classifier; Metric Space 2: “Oracle”
    Before: find a small perturbation that changes the class assigned by the classifier but is imperceptible to the oracle.
  46. Raising the Bar or Changing the Game?
    Metric Space 1: Target Classifier; Metric Space 2: “Oracle”
    Before: find a small perturbation that changes the class assigned by the classifier but is imperceptible to the oracle.
    Now: change the class for both the original and the squeezed classifier, while remaining imperceptible to the oracle.
  47. “Feature Squeezing” Conjecture
    For any distance-limited adversarial method, there exists some feature squeezer that accurately detects its adversarial examples.
    Intuition: if the perturbation is small (in some simple metric space), there is some squeezer that coalesces the original and adversarial examples into the same sample.
  48. Entropy Advantage
    [diagram: the input goes to the model directly and through Randomized Squeezer #1 and Randomized Squeezer #2; predictions are compared using L₁ distance; if the difference exceeds the threshold the input is flagged as adversarial, otherwise normal]
    Squeezers can be selected randomly, and can behave differently at random for each feature.
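    One way to read the entropy advantage, as a minimal sketch: the defender can draw fresh squeezer parameters per query, so a fixed perturbation cannot be tuned against a single known squeezer (the parameter ranges below are illustrative):

        import random
        import numpy as np
        from scipy.ndimage import median_filter

        def random_squeezers():
            """Sample a fresh pair of squeezer configurations for each input."""
            bits = random.choice([1, 2, 3])      # random color-depth reduction
            window = random.choice([2, 3])       # random median-smoothing window
            levels = 2 ** bits - 1
            return (lambda img: np.round(img * levels) / levels,
                    lambda img: median_filter(img, size=window))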
  49. Changing the Game
    Option 1: Find distance-limited adversarial methods for which it is intractable to find an effective feature squeezer.
    Option 2: Redefine adversarial examples so that distance is not limited (in a simple metric space). (focus of the rest of the talk)
  50. Faraway Adversarial Examples 58 Metric Space 1: Target Classifier Metric

    Space 2: “Oracle” Need a domain where we know Metric Space 2: “Oracle”
  51. [chart: vulnerabilities reported in Adobe Acrobat Reader per year, 2006-2017]
    Source: http://www.cvedetails.com/vulnerability-list.php?vendor_id=53&product_id=921
  52. PDF Malware Classifiers
    PDFrate [ACSAC 2012]: Random Forest; manual features (object counts, lengths, positions, …)
    Hidost13 [NDSS 2013]: Support Vector Machine; automated features (object structural paths)
    Hidost16 [JIS 2016]: Random Forest; automated features (object structural paths)
    Very robust against the “strongest conceivable mimicry attack”.
  53. Automated Classifier Evasion Using Genetic Programming
    [pipeline diagram: Malicious PDF → Clone → Mutation (drawing on Benign PDFs) → Variants → Select Variants (scored by the Benign Oracle) → Found Evasive?]
  54. Generating Variants
    [pipeline diagram: Malicious PDF → Clone → Mutation (drawing on Benign PDFs) → Variants → Select Variants → Found Evasive?]
  55. Generating Variants
    [as above, zooming in on a PDF structural tree: /Root → /Catalog → /Pages, with a /JavaScript node containing eval(‘…’);]
  56. Generating Variants
    Select a random node in the PDF tree.
  57. Generating Variants
    Select a random node, then randomly transform it: delete, insert, or replace.
  58. Generating Variants
    Select a random node, then randomly transform it (delete, insert, or replace), drawing inserted and replacement nodes from benign PDFs.
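    A rough sketch of one such mutation step. The pdf_tree API here is a hypothetical stand-in for whatever parsed-PDF structure the tool manipulates; it only illustrates the delete/insert/replace choice described above:

        import random

        def mutate(pdf_tree, benign_nodes):
            """Apply one random transformation to a cloned PDF: delete, insert,
            or replace a randomly chosen node, drawing new material from nodes
            harvested out of benign PDFs."""
            node = random.choice(pdf_tree.nodes())               # select random node
            op = random.choice(["delete", "insert", "replace"])  # random transform
            if op == "delete":
                pdf_tree.delete(node)
            elif op == "insert":
                pdf_tree.insert_under(node, random.choice(benign_nodes))
            else:
                pdf_tree.replace(node, random.choice(benign_nodes))
            return pdf_tree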
  59. Selecting Promising Variants
    [pipeline diagram: Malicious PDF → Clone → Mutation (drawing on Benign PDFs) → Variants → Select Variants → Found Evasive?]
  60. Selecting Promising Variants
    [diagram: each candidate variant is sent to both the Oracle and the Target Classifier; their outputs (oracle verdict, classifier score) feed a Fitness Function that produces the variant’s score]
  61. Oracle
    Execute the candidate in a vulnerable Adobe Reader inside a virtual environment (Cuckoo sandbox, https://github.com/cuckoosandbox; simulated network: INetSim).
    Behavioral signature: malicious if the signature matches (HTTP_URL + HOST extracted from API traces).
    Advantage: we know the target malware behavior.
  62. Selecting Promising Variants
    [diagram: each candidate variant is sent to both the Oracle and the Target Classifier; their outputs feed the Fitness Function that produces the variant’s score]
  63. Fitness Function (assumes lost malicious behavior will not be recovered)
    fitness(x) = 0.5 − classifier_score(x)   if oracle(x) = “malicious”
               = −∞                          otherwise
    (classifier_score ≥ 0.5 means labeled malicious)
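    A direct transcription of that fitness function into Python (oracle and classifier_score are placeholders for the sandbox oracle and the target classifier's maliciousness score):

        def fitness(candidate, oracle, classifier_score):
            """Reward variants that keep their malicious behavior (per the oracle)
            while pushing the classifier score below the 0.5 'malicious' threshold;
            variants that lose the behavior get fitness -infinity and are discarded."""
            if oracle(candidate) == "malicious":
                return 0.5 - classifier_score(candidate)
            return float("-inf")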
  64. Classifier Performance
                                             PDFrate   Hidost
    Accuracy                                 0.9976    0.9996
    False Negative Rate                      0.0000    0.0056
    False Negative Rate against Adversary    1.0000    1.0000
  65. [plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
  66. [plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
    Simple transformations often worked.
  67. [plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
    Example: (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/) works on 162/500 seeds.
  68. [plot: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
    The simple insertion above works on 162/500 seeds, but some seeds required complex transformations.
  69. 85-step mutation trace evading Hidost (effective for 198/500 seeds):
    Insert: Threads, ViewerPreferences/Direction, Metadata, Metadata/Length, Metadata/Subtype, Metadata/Type, OpenAction/Contents, OpenAction/Contents/Filter, OpenAction/Contents/Length, Pages/MediaBox
    Delete: AcroForm, Names/JavaSCript/Names/S, AcroForm/DR/Encoding/PDFDocEncoding, AcroForm/DR/Encoding/PDFDocEncoding/Differences, AcroForm/DR/Encoding/PDFDocEncoding/Type, Pages/Rotate, AcroForm/Fields, AcroForm/DA, Outlines/Type, Outlines, Outlines/Count, Pages/Resources/ProcSet, Pages/Resources
  70. Oracle Execution Cost
    [chart: hours to find all 500 variants on one desktop PC, for Hidost and PDFrate, broken down into oracle, mutation, and classifier time]
  71. Possible Defense: Adjust Threshold Charles Smutz, Angelos Stavrou. When a

    Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.
  72. Labelled Training Data ML Algorithm Feature Extraction Vectors Deployment Malicious

    / Benign Operational Data Trained Classifier Training (supervised learning) Retrain Classifier
  73. [plot: seeds evaded (out of 500) vs. generations, for Hidost16]
    Original classifier: takes 614 generations to evade all seeds.
  74. [plot: seeds evaded (out of 500) vs. generations, for Hidost16 and the retrained HidostR1]
  75. [same plot as the previous slide]
  76. [plot: seeds evaded (out of 500) vs. generations, for Hidost16, HidostR1, and HidostR2]
  77. [same plot as the previous slide]
  78. [plot: seeds evaded (out of 500) vs. generations, for Hidost16, HidostR1, and HidostR2]
    False Positive Rates (Genome / Contagio Benign):
    Hidost16: 0.00 / 0.00
    HidostR1: 0.78 / 0.30
    HidostR2: 0.85 / 0.53
  79. Hiding the Classifier
    [diagram: the same variant-selection pipeline as above, with the Oracle and Target Classifier feeding the Fitness Function]
  80. Cross-Evasion Effects
    [diagram: PDF malware seeds → automated evasion → evasive PDF malware (against PDFrate); against Hidost 13: 2/500 evasive (0.4% success)]
    Potentially good news?
  81. Cross-Evasion Effects
    [diagram: evasive PDF malware (against PDFrate) evades Hidost 13 for 2/500 seeds (0.4% success); evasive PDF malware (against Hidost) evades PDFrate for 387/500 seeds (77.4% success)]
  82. Cross-Evasion Effects
    [diagram: PDF malware seeds → automated evasion → evasive PDF malware (against Hidost); Hidost 13: 6/500 evasive (1.2% success)]
  83. Evading Gmail’s Classifier
    Evasion rate on Gmail: 179/380 (47.1%)
        for javascript in pdf.all_js:
            javascript.append_code("var ucb=1;")
        if pdf.get_size() < 7050000:
            pdf.add_padding(7050000 - pdf.get_size())
  84. Hopeful Conclusions
    Domain knowledge is not dead: classifiers trained without understanding are vulnerable, and adversaries can exploit unnecessary features.
    Trust requires understanding: good results against test data do not carry over to adaptive adversaries.
    But there is hope for building robust ML models!
  85. Credits
    Funding: National Science Foundation, Air Force Office of Scientific Research, Google, Microsoft, Amazon
    Weilin Xu, Yanjun Qi, and the Security Research Group