
Adversarial Machine Learning: Are We Playing the Wrong Game?

https://www.icsi.berkeley.edu/icsi/events/2017/06/adversarial-machine-learning

Presented by David Evans

Thursday, June 8, 2017
10:30 a.m.
ICSI Lecture Hall

Abstract:

Machine learning classifiers are increasingly popular for security applications, and often achieve outstanding performance in testing. When deployed, however, classifiers can be thwarted by motivated adversaries who adaptively construct adversarial examples that exploit flaws in the classifier's model. Much work on adversarial examples, including Carlini and Wagner's attacks, which are the best results to date, has focused on finding small distortions to inputs that fool a classifier. Previous defenses have been both ineffective and very expensive in practice. In this talk, I'll describe a new, very simple strategy, feature squeezing, that can be used to harden classifiers by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different inputs in the original space into a single sample. Adversarial examples can be detected by comparing the model's predictions on the original and squeezed samples. In practice, of course, adversaries are not limited to small distortions in a particular metric space. Indeed, it may be possible to make large changes to an input without losing its intended malicious behavior. We have developed an evolutionary framework to search for such adversarial examples, and demonstrated that it can automatically find evasive variants against state-of-the-art classifiers. This suggests that work on adversarial machine learning needs a better definition of adversarial examples, and must make progress towards understanding how classifiers and oracles perceive samples differently.

Speaker Bio:

David Evans (https://www.cs.virginia.edu/evans/) is a Professor of Computer Science at the University of Virginia and leader of the Security Research Group. He is the author of an open computer science textbook and a children's book on combinatorics and computability. He is Program Co-Chair for the ACM Conference on Computer and Communications Security (CCS) 2017, and previously was Program Co-Chair for the 30th (2009) and 31st (2010) IEEE Symposia on Security and Privacy (where he initiated the SoK papers). He has SB, SM, and PhD degrees in Computer Science from MIT and has been a faculty member at the University of Virginia since 1999.

Project site: https://evademl.org
Demo: https://github.com/QData/AdversarialDNN-Playground

Transcript

  1. Adversarial Machine Learning: Are We Playing the Wrong Game? David

    Evans University of Virginia work with Weilin Xu and Yanjun Qi evadeML.org ICSI, Berkeley CA 8 June 2017
  2. … and can solve all Security Problems!
     Spam, IDS, Malware, Fake Accounts, … "Fake News"
  3. [Diagram] Training (supervised learning): Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier.
     Deployment: Operational Data → Trained Classifier → Malicious / Benign
  4. [Same training/deployment diagram as slide 3]
     Assumption: Training Data is Representative
  5. Adversarial Examples 7
     "panda" + 0.007 × [noise] = "gibbon"
     Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and Harnessing Adversarial Examples. ICLR 2015.
  6. Goal of Machine Learning Classifier 8 Metric Space 1: Target

    Classifier Metric Space 2: “Oracle” Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
  7. Well-Trained Classifier 9 Metric Space 1: Target Classifier Metric Space

    2: “Oracle” Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
  8. Adversarial Examples 10 Metric Space 1: Target Classifier Metric Space

    2: “Oracle” Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
  9. Adversarial Examples 11 Metric Space 1: Target Classifier Metric Space

    2: “Oracle” Adversary’s goal: find a small perturbation that changes class for classifier, but imperceptible to oracle.
  10. Formalizing Adversarial Examples Game 12
      Given seed sample x, find x′ where:
        f(x′) ≠ f(x)       (class is different)
        Δ(x, x′) ≤ ε       (difference below threshold)
  11. Formalizing Adversarial Examples Game 13
      Given seed sample x, find x′ where:
        f(x′) ≠ f(x)       (class is different)
        Δ(x, x′) ≤ ε       (difference below threshold)
      Δ is defined in some metric space:
        L0 "norm" (# different): #{i : x_i ≠ x′_i}
        L1 norm: Σ_i |x_i − x′_i|
        L2 norm ("Euclidean"): √(Σ_i (x_i − x′_i)²)
        L∞ norm: max_i |x_i − x′_i|
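
A minimal NumPy sketch (my illustration, not from the slides) of how these four perturbation metrics can be computed for a pair of flattened inputs x and x_adv:

import numpy as np

def perturbation_metrics(x, x_adv):
    """Compute the L0, L1, L2, and L-infinity size of the perturbation x_adv - x."""
    d = (np.asarray(x_adv, dtype=float) - np.asarray(x, dtype=float)).ravel()
    return {
        "L0": int(np.count_nonzero(d)),        # number of features that changed
        "L1": float(np.abs(d).sum()),          # total absolute change
        "L2": float(np.sqrt((d ** 2).sum())),  # Euclidean distance
        "Linf": float(np.abs(d).max()),        # largest change to any single feature
    }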
  12. Targeted Attacks 14
      Untargeted attack: given seed sample x, find x′ where:
        f(x′) ≠ f(x)       (class is different)
        Δ(x, x′) ≤ ε       (difference below threshold)
      Targeted attack: given seed sample x and target class t, find x′ where:
        f(x′) = t          (class is t)
        Δ(x, x′) ≤ ε       (difference below threshold)
  13. Datasets 15
      MNIST: 70,000 images, 28×28 pixels, 8-bit grayscale; scanned hand-written digits labeled by humans. LeCun, Cortes, Burges [1998]
  14. Datasets 16
      MNIST: 70,000 images, 28×28 pixels, 8-bit grayscale; scanned hand-written digits labeled by humans. LeCun, Cortes, Burges [1998]
      CIFAR-10: 60,000 images, 32×32 pixels, 24-bit color; human-labeled subset of the Tiny Images Dataset in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). Alex Krizhevsky [2009]
  15. L∞ Adversary (Fast Gradient Sign) 17
      [Images: original and adversarial versions at adversary power ε = 0.1, 0.2, 0.3, 0.4, 0.5]
      L∞ adversary: max_i |x_i − x′_i| ≤ ε
      Fast gradient sign: x′ = x − ε · sign(∇_x loss_F(x))
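
A minimal FGSM-style sketch (my illustration, not the authors' code). It uses the common untargeted form that steps by ε in the sign direction of the loss gradient; the exact sign convention depends on how the loss is defined. The helper grad_loss(model, x, y) is a hypothetical function returning ∂loss/∂x.

import numpy as np

def fgsm(x, y, model, grad_loss, eps=0.1):
    """One fast-gradient-sign step with an L-infinity budget of eps.

    x         : input image as a float array scaled to [0, 1]
    y         : true label of x
    grad_loss : hypothetical helper returning d(loss)/d(x) for (model, x, y)
    """
    g = grad_loss(model, x, y)        # gradient of the training loss w.r.t. the input
    x_adv = x + eps * np.sign(g)      # step that increases the loss (sign convention may differ)
    return np.clip(x_adv, 0.0, 1.0)   # keep pixels in the valid range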
  16. L∞ Adversary: Binary Filter 18
      [Images: original and ε = 0.1, 0.2, 0.3, 0.4, 0.5, each after a 1-bit filter]  Adversary power: ε
  17. Is this the right game? 20
      Given seed sample x, find x′ where:
        f(x′) ≠ f(x)       (class is different)
        Δ(x, x′) ≤ ε       (difference below threshold)
  18. Arms Race 22 ICLR 2014 ICLR 2015 S&P 2016 S&P

    2017 NDSS 2013 NDSS 2016 NDSS 2016 This Talk
  19. New Idea: Detect Adversarial Examples 23
      Given seed sample x, find x′ where:
        f(x′) ≠ f(x)       (class is different)
        Δ(x, x′) ≤ ε       (difference below threshold)
      Deployed classifier only sees x′ - can we search for x?
  20. [Detection pipeline diagram] 24
      Input → Model → Prediction; Input → Filter 1 → Model → Prediction′; Input → Filter 2 → Model → Prediction′′.
      Compare predictions: if the difference exceeds a threshold, reject the prediction; otherwise accept it.
      Need filters that do not affect predictions on normal inputs, but that reverse malicious perturbations.
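
A minimal sketch of this comparison step (my illustration under assumptions, not the authors' implementation). Here model is assumed to return a probability vector, and filters is a list of squeezing functions such as the bit-depth and smoothing filters described later.

import numpy as np

def detect_adversarial(x, model, filters, threshold):
    """Reject x if any filtered copy changes the model's prediction by more than threshold."""
    p_original = np.asarray(model(x), dtype=float)           # prediction on the raw input
    for squeeze in filters:
        p_squeezed = np.asarray(model(squeeze(x)), dtype=float)  # prediction on the squeezed input
        gap = float(np.abs(p_original - p_squeezed).sum())       # L1 distance between predictions
        if gap > threshold:
            return "reject"                                  # predictions disagree: likely adversarial
    return "ok"                                              # predictions agree: accept model(x)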
  21. “Feature Squeezing” 25
      Adversarial example x′ with f(x′) ≠ f(x):
        [0.054, 0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074, …]
        [0.0491, 0.4903, 0.9292, 0.009, 0.2942, 0.5243, 0.5078, …]
  22. “Feature Squeezing” 26
      [0.054, 0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074, …]  → squeeze →  [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, …]
      [0.0491, 0.4903, 0.9292, 0.009, 0.2942, 0.5243, 0.5078, …]  → squeeze →  [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, …]
      Squeeze: x_i = round(x_i × 4)/4
      squeeze(x′) ≈ squeeze(x) ⟹ f(squeeze(x′)) ≈ f(squeeze(x))
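
A minimal NumPy sketch of this squeezer, plus a more general bit-depth reducer (a common formulation given here as an assumption, not the authors' exact code); both assume features scaled to [0, 1].

import numpy as np

def squeeze_quarter(x):
    """The squeezer shown on the slide: round each feature to the nearest multiple of 1/4."""
    return np.round(np.asarray(x, dtype=float) * 4) / 4

def reduce_bit_depth(x, bits):
    """Keep only `bits` bits of depth by rounding to 2**bits - 1 evenly spaced levels."""
    levels = 2 ** bits - 1
    return np.round(np.asarray(x, dtype=float) * levels) / levels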
  23. Squeezing Images 28
      Reduce color depth: 8-bit greyscale → 1-bit monochrome
      Median smoothing: 3×3 smoothing replaces each pixel with the median of that pixel and its neighbors
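
A minimal sketch of these two image squeezers (one reasonable implementation using SciPy; the authors' code may differ). binarize assumes an 8-bit greyscale image; median_smooth works on any 2-D array.

import numpy as np
from scipy.ndimage import median_filter

def median_smooth(image, window=3):
    """Replace each pixel with the median of the window x window patch centered on it."""
    return median_filter(image, size=window)

def binarize(image):
    """1-bit monochrome squeezer: threshold an 8-bit greyscale image at mid-range."""
    return (np.asarray(image) > 127).astype(np.uint8) * 255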
  24. MNIST Results: Accuracy 29
      Bit depth:  8 (original)  7      6      5      4      3      2      1
      Accuracy:   .9930         .9930  .9930  .9930  .9930  .9928  .9926  .9924
      Reducing bit depth (all the way to 1) barely reduces model accuracy!
      Out of 10,000 MNIST test images: 19 correct on the original image but wrong on the 1-bit filtered image; 13 wrong on the original image but correct on the 1-bit filtered image; a few both wrong, but differently.
  25. Robustness Results (MNIST) 30
      [Chart: accuracy vs. bit depth (8 down to 1) for adversary strength ε = 0.0 (non-adversarial), 0.1, 0.2, 0.3, and accuracy vs. ε for 8-bit (unfiltered) and 1-bit filtered inputs; unfiltered accuracies shown: .987, .944, .640, .107]
      Even for strong adversaries, the 1-bit filter effectively removes adversarial perturbations.
  26. L0 Adversary (Jacobian-based Saliency Map) 31
      [Images: original vs. JSMA]
      L0 "norm" (# different): #{i : x_i ≠ x′_i}
      Adversary strength ε = 0.1 (can modify up to 10% of pixels)
  27. Smoothing Results (MNIST) 33
      Smoothing window (n×n):  1     2     3     4     5     6     7     8
      Original accuracy:       .993  .988  .991  .980  .943  .845  .650  .479
      Adversarial (JSMA):      .014  .700  .976  .953  .906  .791  .616  .454
      No smoothing: adversary succeeds 98.6% of the time
  28. Smoothing Results 34
      [Same MNIST chart as slide 27, plus CIFAR-10 results for smoothing windows 1-4; values shown include .9257, .8592, .7812 and .0100, .8400, .7500]
      2×2 smoothing defeats the adversary, but reduces accuracy
  29. Carlini/Wagner Untargeted Attacks 35
      Nicholas Carlini, David Wagner. Oakland 2017 (Best Student Paper)
      Accuracy on adversarial examples:
        MNIST:     L2 0.0   L∞ 0.0   L0 0.0
        CIFAR-10:  L2 0.0   L∞ 0.0   L0 0.0
      Adversary succeeds 100% of the time with very small perturbations.
      “Our L∞ attacks on ImageNet are so successful that we can change the classification of an image to any desired label by only flipping the lowest bit of each pixel, a change that would be impossible to detect visually.”
  30. Squeezing Results (2×2 Median Smoothing) 36
      Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/abs/1705.10686
      Accuracy on adversarial examples (Original → Squeezed):
        MNIST:     L2 0.0 → 0.904   L∞ 0.0 → 0.942   L0 0.0 → 0.817
        CIFAR-10:  L2 0.0 → 0.682   L∞ 0.0 → 0.661   L0 0.0 → 0.706
  31. Results on Carlini/Wagner Untargeted Attacks 37
      Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/abs/1705.10686
      Accuracy on adversarial examples (Original → Squeezed):
        MNIST:     L2 0.0 → 0.904   L∞ 0.0 → 0.942   L0 0.0 → 0.817
        CIFAR-10:  L2 0.0 → 0.682   L∞ 0.0 → 0.661   L0 0.0 → 0.706
      Accuracy on legitimate examples: 0.783
  32. Results on Carlini/Wagner Targeted Attacks 38
      Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/abs/1705.10686
      Adversary success rate (Original → Squeezed):
        MNIST:     L2 0.999 → 0.022   L∞ 1.0 → 0.011   L0 1.0 → 0.057
        CIFAR-10:  L2 1.0 → 0.033     L∞ 1.0 → 0.037   L0 1.0 → 0.037
  33. Detecting Adversarial Examples 39
      Input → Model → Prediction; Input → Bit Depth squeeze → Model → Prediction′; Input → Median Smoothing squeeze → Model → Prediction′′.
      Compare predictions (L1 distance): if the difference exceeds a threshold, flag the input as adversarial; otherwise treat it as normal.
  34. Detection Results 40
      [Histogram over 1000 MNIST samples: L1 distance between normal and squeezed predictions for non-adversarial, FGSM, and JSMA inputs]
  35. Detection Results 41
      [Same histogram as slide 34: L1 distance between normal and squeezed predictions, MNIST, 1000 samples]
      With a max L1 threshold of 0.362, detect 99.5% of adversarial examples with a false positive rate of 1.5%.
  36. Detection Results on C/W Attacks
      Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/abs/1705.10686
      Dataset    ROC-AUC  L1 Distance Threshold  Accuracy  True Positive Rate  False Positive Rate
      MNIST      0.9950   0.1147                 0.9880    0.9933              0.0173
      CIFAR-10   0.8711   0.7423                 0.8750    0.9527              0.2027
      (Validated results: half the samples are used to determine the threshold, the other half for testing)
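
A minimal sketch of this validation protocol (my illustration; the function names are assumptions): choose the L1-distance threshold on one half of the samples, then evaluate detection on the held-out half.

import numpy as np

def pick_threshold(distances, labels):
    """Pick the threshold maximizing detection accuracy on the fitting half.

    distances : L1 gaps between original and squeezed predictions (array)
    labels    : 1 for adversarial samples, 0 for legitimate samples (array)
    """
    best_t, best_acc = 0.0, 0.0
    for t in np.unique(distances):
        acc = float(np.mean((distances > t) == labels))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def validate(distances, labels, seed=0):
    """Split the samples in half: fit the threshold on one half, test on the other."""
    distances, labels = np.asarray(distances), np.asarray(labels)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(distances))
    fit, test = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    t = pick_threshold(distances[fit], labels[fit])
    accuracy = float(np.mean((distances[test] > t) == labels[test]))
    return t, accuracy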
  37. Arms Race 43 ICLR 2014 ICLR 2015 S&P 2016 S&P

    2017 NDSS 2013 NDSS 2016 NDSS 2016 Feature Squeezing Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song (upcoming paper) Authors TBD (not yet started paper)
  38. Raising the Bar or Changing the Game? 44 Metric Space

    1: Target Classifier Metric Space 2: “Oracle” Before: find a small perturbation that changes class for classifier, but imperceptible to oracle.
  39. Raising the Bar or Changing the Game? 45 Metric Space

    1: Target Classifier Metric Space 2: “Oracle” Before: find a small perturbation that changes class for classifier, but imperceptible to oracle. Now: change class for both original and squeezed classifier, but imperceptible to oracle.
  40. “Feature Squeezing” Conjecture For any distance-limited adversarial method, there exists

    some feature squeezer that accurately detects its adversarial examples. 46 Intuition: if the perturbation is small (in some simple metric space), there is some squeezer that coalesces original and adversarial example into same sample.
  41. Changing the Game 47
      Option 1: Find distance-limited adversarial methods for which it is intractable to find an effective feature squeezer.
      Option 2: Redefine adversarial examples so distance is not limited (in a simple metric space). (focus of rest of the talk)
  42. Faraway Adversarial Examples 49 Metric Space 1: Target Classifier Metric

    Space 2: “Oracle” Need a domain where we know Metric Space 2: “Oracle”
  43. [Chart: vulnerabilities reported in Adobe Acrobat Reader per year, 2006-2017]
      Source: http://www.cvedetails.com/vulnerability-list.php?vendor_id=53&product_id=921
  44. PDF Malware Classifiers
      PDFrate [ACSAC 2012]: Random Forest; manual features (object counts, lengths, positions, …)
      Hidost13 [NDSS 2013]: Support Vector Machine; automated features (object structural paths)
      Hidost16 [JIS 2016]: Random Forest; automated features (object structural paths)
      Very robust against “strongest conceivable mimicry attack”.
  45. Automated Classifier Evasion Using Genetic Programming
      [Diagram: Malicious PDF → Clone → Mutation (using Benign PDFs) → Variants → evaluated (✓/✗) against the Oracle → Select Variants → Found Evasive?]
  46. Generating Variants
      [Same genetic search diagram as slide 45]
  47. Generating Variants
      [Same genetic search diagram as slide 45]
      Select a random node in the PDF structure (e.g. /Root → /Catalog → /Pages → … → /JavaScript eval(‘…’);)
  48. Generating Variants
      [Same genetic search diagram as slide 45]
      Select a random node, then randomly transform it: delete, insert, or replace
  49. Generating Variants
      [Same genetic search diagram as slide 45]
      Select a random node, then randomly transform it: delete, insert, or replace, drawing inserted or replacement nodes from benign PDFs
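
A minimal sketch of the mutation step (my illustration; EvadeML itself operates on parsed PDF object trees, represented here simply as a list of (path, object) pairs):

import random

def mutate(pdf_nodes, benign_nodes):
    """Apply one random delete/insert/replace transformation to a cloned variant.

    pdf_nodes    : list of (path, object) pairs from the malicious seed PDF
    benign_nodes : pool of (path, object) pairs harvested from benign PDFs
    """
    variant = list(pdf_nodes)                           # clone before mutating
    i = random.randrange(len(variant))                  # select a random node
    op = random.choice(["delete", "insert", "replace"])
    if op == "delete":
        del variant[i]
    elif op == "insert":
        variant.insert(i, random.choice(benign_nodes))  # splice in a benign node
    else:
        variant[i] = random.choice(benign_nodes)        # replace with a benign node
    return variant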
  50. Selecting Promising Variants
      [Same genetic search diagram as slide 45]
  51. Selecting Promising Variants
      [Same genetic search diagram as slide 45]
      Fitness function: each candidate variant is sent to the Oracle and the Target Classifier, and the (oracle verdict, classifier score) pair is combined into a score.
  52. Oracle
      Execute each candidate in a vulnerable Adobe Reader in a virtual environment (Cuckoo sandbox, https://github.com/cuckoosandbox; simulated network: INetSim)
      Behavioral signature: malicious if the signature (HTTP_URL + HOST extracted from API traces) matches
      Advantage: we know the target malware behavior
  53. Fitness Function
      Assumes lost malicious behavior will not be recovered.
      fitness(x) = 0.5 − classifier_score(x)   if oracle(x) = “malicious”
                 = −∞                          otherwise
      classifier_score ≥ 0.5: labeled malicious
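
A minimal sketch of this fitness function (my illustration; oracle and classifier_score stand for the Cuckoo-based oracle and the target classifier, both assumed to be callables):

def fitness(variant, oracle, classifier_score):
    """Score a candidate variant: reward low classifier scores, but only when the
    oracle confirms that the malicious behavior survived the mutations."""
    if oracle(variant) != "malicious":
        return float("-inf")                  # behavior lost and assumed unrecoverable: discard
    return 0.5 - classifier_score(variant)    # positive once the classifier labels it benign (< 0.5)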
  54. Classifier Performance
      Metric                                   PDFrate   Hidost
      Accuracy                                 0.9976    0.9996
      False Negative Rate                      0.0000    0.0056
      False Negative Rate against Adversary    1.0000    1.0000
  55. [Chart: Seeds Evaded (out of 500) vs. Number of Mutations, for PDFrate and Hidost]
  56. [Same chart as slide 55: Seeds Evaded (out of 500) vs. Number of Mutations]
      Simple transformations often worked
  57. [Same chart as slide 55]
      (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/) works on 162/500 seeds
  58. [Same chart as slide 55]
      Works on 162/500 seeds; some seeds required complex transformations
  59. 85-step mutation trace evading Hidost; effective for 198/500 seeds
      Insert: Threads, ViewerPreferences/Direction, Metadata, Metadata/Length, Metadata/Subtype, Metadata/Type, OpenAction/Contents, OpenAction/Contents/Filter, OpenAction/Contents/Length, Pages/MediaBox
      Delete: AcroForm, Names/JavaSCript/Names/S, AcroForm/DR/Encoding/PDFDocEncoding, AcroForm/DR/Encoding/PDFDocEncoding/Differences, AcroForm/DR/Encoding/PDFDocEncoding/Type, Pages/Rotate, AcroForm/Fields, AcroForm/DA, Outlines/Type, Outlines, Outlines/Count, Pages/Resources/ProcSet, Pages/Resources
  60. Oracle Execution Cost
      [Chart: hours to find all 500 variants on one desktop PC, broken down into Oracle, Mutation, and Classifier time, for Hidost and PDFrate]
  61. Possible Defense: Adjust Threshold Charles Smutz, Angelos Stavrou. When a

    Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.
  62. [Same training/deployment diagram as slide 3, with an added feedback step: Retrain Classifier]
  63. [Chart: Seeds Evaded (out of 500) vs. Generations, Hidost16]
      Original classifier: takes 614 generations to evade all seeds
  64. [Chart: Seeds Evaded (out of 500) vs. Generations, for Hidost16 and the retrained HidostR1]
  65. [Same chart as slide 64]
  66. [Chart: Seeds Evaded (out of 500) vs. Generations, for Hidost16, HidostR1, and HidostR2]
  67. [Same chart as slide 66]
  68. [Same chart as slide 66]
      False Positive Rates (Genome / Contagio Benign):
        Hidost16: 0.00 / 0.00
        HidostR1: 0.78 / 0.30
        HidostR2: 0.85 / 0.53
  69. Hiding the Classifier
      [Same genetic search diagram as slide 51: candidate variants are scored by the fitness function from the (oracle verdict, classifier score) pair]
  70. Cross-Evasion Effects PDF Malware Seeds Hidost 13 Evasive PDF Malware

    (against PDFrate) Automated Evasion PDFrate 2/500 Evasive (0.4% Success) Potentially Good News?
  71. Evasive PDF Malware (against PDFrate) Cross-Evasion Effects PDF Malware Seeds

    Hidost 13 Automated Evasion PDFrate 2/500 Evasive (0.4% Success) Evasive PDF Malware (against Hidost) 387/500 Evasive (77.4% Success)
  72. Cross-Evasion Effects PDF Malware Seeds Automated Evasion 6/500 Evasive (0.6%

    Success) Hidost 13 Evasive PDF Malware (against Hidost)
  73. Evading Gmail’s Classifier
      Evasion rate on Gmail: 179/380 (47.1%)
      for javascript in pdf.all_js:
          javascript.append_code("var ucb=1;")
      if pdf.get_size() < 7050000:
          pdf.add_padding(7050000 - pdf.get_size())
  74. Conclusions
      Domain Knowledge is not Dead
      • Classifiers trained without understanding are vulnerable
      • Adversaries can exploit unnecessary features
      Trust Requires Understanding
      • Good results against test data do not apply to adaptive adversaries
      … but there is hope for building robust ML models!
  75. Credits
      Weilin Xu, Yanjun Qi, and the Security Research Group
      Funding: National Science Foundation, Air Force Office of Scientific Research, Google, Microsoft, Amazon