Shrinking and Exploring Adversarial Search Spaces

David Evans
ARO Workshop on Adversarial Learning
Stanford, 14 Sept 2017
https://evadeML.org

Transcript

  1. Shrinking and Exploring Adversarial Search Spaces. David Evans, University of Virginia.
     ARO Workshop on Adversarial Learning, Stanford, 14 Sept 2017. With Weilin Xu and Yanjun Qi. evadeML.org
  2. Security State-of-the-Art: comparing fields on random-guessing attack success
     probability, threat models, and proofs.
     Cryptography: information-theoretic or resource-bounded threat models; proofs required.
     System Security: threat models about capabilities, motivations, rationality; proofs common.
     Adversarial Machine Learning: white-box and black-box threat models; proofs rare!
  3. Adversarial Examples: “panda” + 0.007 × [adversarial perturbation] = “gibbon”.
     Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and
     Harnessing Adversarial Examples. 2014.
  4. Adversarial Examples Game: given a seed sample x, find x' where:
     f(x') ≠ f(x)   (class is different: untargeted), or
     f(x') = t      (class is t: targeted), and
     Δ(x, x') ≤ ε   (difference below threshold).
     Δ(x, x') is defined in some (simple!) metric space: the L0 norm (number of components
     that differ), L1 norm, L2 norm (“Euclidean”), or L∞ norm.
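
     These distance metrics are the standard input-space norms. A minimal numpy sketch of
     how they are computed between a seed x and a candidate x' (illustrative only; the
     function name is not from the talk):

        import numpy as np

        def distances(x, x_adv):
            """Common adversarial-example distance metrics between two inputs."""
            diff = (x_adv - x).ravel()
            return {
                "L0":   int(np.count_nonzero(diff)),        # number of components changed
                "L1":   float(np.sum(np.abs(diff))),
                "L2":   float(np.sqrt(np.sum(diff ** 2))),   # "Euclidean"
                "Linf": float(np.max(np.abs(diff))),         # largest single change
            }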
  5. Detecting Adversarial Examples. [Framework diagram] The input is fed to the model
     directly (Prediction0) and through each of Squeezer1 ... Squeezerk before the model
     (Prediction1 ... Predictionk). If the predictions disagree by more than a threshold,
     the input is flagged as adversarial; otherwise it is treated as legitimate.
  6. “Feature Squeezing”: two nearby inputs, e.g. [0.0540, 0.4894, 0.9258, 0.0116, 0.2898,
     0.5222, 0.5074, ...] and [0.0491, 0.4903, 0.9292, 0.0090, 0.2942, 0.5243, 0.5078, ...],
     both squeeze to [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, ...] under the squeezer
     x_i = round(x_i × 4)/4.
     squeeze(x') ≈ squeeze(x) ⟹ f(squeeze(x')) ≈ f(squeeze(x))
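
     A minimal numpy sketch of the squeezer on this slide, quantizing each feature to
     multiples of 0.25 (the function name is mine, not from the talk):

        import numpy as np

        def squeeze_quantize(x, levels=4):
            """Round each feature to the nearest multiple of 1/levels (here 0.25)."""
            return np.round(x * levels) / levels

        x     = np.array([0.0540, 0.4894, 0.9258, 0.0116])
        x_adv = np.array([0.0491, 0.4903, 0.9292, 0.0090])
        # Both vectors squeeze to [0.0, 0.5, 1.0, 0.0], so the model sees (nearly) the same input.
        assert np.allclose(squeeze_quantize(x), squeeze_quantize(x_adv))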
  7. Example Squeezers: reduce color depth (e.g. 8-bit greyscale → 1-bit monochrome) and
     median smoothing (e.g. 3×3 smoothing replaces each pixel with the median of the pixel
     and its neighbors).
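
     Hedged sketches of these two squeezers, assuming inputs are float arrays scaled to [0, 1]
     (scipy's median_filter is one way to implement the smoothing; the talk does not prescribe
     a particular library):

        import numpy as np
        from scipy.ndimage import median_filter

        def reduce_bit_depth(x, bits):
            """Quantize pixel values in [0, 1] to 2**bits levels (bits=1 gives monochrome)."""
            levels = 2 ** bits - 1
            return np.round(x * levels) / levels

        def median_smooth(x, size=3):
            """Replace each pixel with the median of its size x size neighborhood."""
            return median_filter(x, size=size)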
  8. Simple Instantiation. Model: 7-layer CNN. Squeezers: Bit Depth-1 and Median 2×2, each
     feeding a copy of the model. Flag the input as adversarial if
     max( ||Prediction0 − Prediction1||_1 , ||Prediction0 − Prediction2||_1 ) > threshold;
     otherwise treat it as legitimate.
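
     The detection rule on this slide, sketched in Python. Here `model` is any function
     returning a probability vector and the squeezers could be the sketches above; the names
     and structure are mine, not the authors' code:

        import numpy as np

        def is_adversarial(x, model, squeezers, threshold):
            """Flag x if any squeezed prediction moves too far (in L1) from the original one."""
            p0 = model(x)
            scores = [np.sum(np.abs(p0 - model(squeeze(x)))) for squeeze in squeezers]
            return max(scores) > threshold

        # e.g. is_adversarial(x, model,
        #                     [lambda v: reduce_bit_depth(v, 1), lambda v: median_smooth(v, 2)],
        #                     threshold=0.0029)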
  9. [Histogram: number of examples vs. maximum L1 distance between original and squeezed
     input, for legitimate and adversarial examples]
     threshold = 0.0029; detection: 98.2%, FP < 4%
  10. ImageNet Configuration. Model: MobileNet. Squeezers: Bit Depth-5, Median 2×2, and
     Non-local Mean, each feeding a copy of the model (Prediction1, Prediction2, Prediction3).
     Flag the input as adversarial if max_i ||Prediction0 − Prediction_i||_1 > threshold;
     otherwise treat it as legitimate.
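
     OpenCV's non-local means denoiser is one way to implement the third squeezer; mapping the
     "11-3-4" style parameters onto OpenCV's arguments as below is my assumption, not stated in
     the talk:

        import cv2
        import numpy as np

        def non_local_means(x, search_window=11, patch_size=3, strength=4):
            """Non-local means denoising of a [0,1] float RGB image, used as an extra squeezer."""
            img = (np.clip(x, 0.0, 1.0) * 255).astype(np.uint8)
            out = cv2.fastNlMeansDenoisingColored(img, None, strength, strength,
                                                  patch_size, search_window)
            return out.astype(np.float32) / 255.0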
  11. [Histogram: number of examples vs. maximum L1 distance between original and squeezed
     input, for legitimate and adversarial examples, ImageNet]
     threshold = 1.24; detection: 85%, FP < 5%
  12. Detection Performance Summary
     MNIST: squeezers Bit Depth (1-bit), Median (2x2); threshold 0.0029; detection rate 98.2%
       (successful AEs) / 20.0% (failed AEs); false positive rate 3.98%; ROC-AUC 94.5%
       (including FAEs) / 99.6% (excluding FAEs).
     CIFAR-10: squeezers Bit Depth (5-bit), Median (2x2), Non-local Mean (13-3-2); threshold
       1.1402; detection rate 85.0% / 9.1%; false positive rate 4.93%; ROC-AUC 95.7% / 95.9%.
     ImageNet: squeezers Bit Depth (5-bit), Median (2x2), Non-local Mean (11-3-4); threshold
       1.2476; detection rate 85.2% / 25.0%; false positive rate 4.70%; ROC-AUC 94.0% / 94.5%.
  13. [Bar chart: detection rate (0-100%) per attack: FGSM, BIM, DeepFool, JSMA (LL / Next),
     CW0 (LL / Next), CW2 (LL / Next), CW∞ (LL / Next), on MNIST, CIFAR-10, and ImageNet]
  14. Arms Race? Adaptive attack (WOOT, August 2017): incorporate the L1 squeezed distance
     into the attacker's loss function. Adversary success rate on MNIST: 64% untargeted,
     41% targeted (next), 21% targeted (least likely).
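
     A rough sketch of such an adaptive objective: the attacker minimizes the usual
     misclassification loss plus a penalty on the detector's own L1 score (framework-agnostic;
     the weighting and loss form are my assumptions, not the WOOT authors' code):

        import numpy as np

        def adaptive_loss(x_adv, model, squeezers, target, c=1.0):
            """Reach the target class while keeping the squeezing detector's score small."""
            p0 = model(x_adv)
            target_loss = -np.log(p0[target] + 1e-12)   # cross-entropy toward the target class
            detector = max(np.sum(np.abs(p0 - model(s(x_adv)))) for s in squeezers)
            return target_loss + c * detector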
  15. Raising the Bar or Changing the Game?
     Metric Space 1: the target classifier. Metric Space 2: the “oracle”.
     Before: find a small perturbation that changes the class for the classifier but is
     imperceptible to the oracle.
     Now: the perturbation must change the class for both the original and the squeezed
     classifier, while remaining imperceptible to the oracle.
  16. “Feature Squeezing” Conjecture: for any distance-limited adversarial method, there exists
     some feature squeezer that accurately detects its adversarial examples.
     Intuition: if the perturbation is small (in some simple metric space), there is some
     squeezer that coalesces the original and adversarial example into the same sample.
  17. Defender’s Entropy Advantage. [Same detection framework as before: the input passes
     through Squeezer1 ... Squeezerk and the model, and the predictions are compared.] The
     defender can pick the squeezers and their parameters from a random seed unknown to the
     attacker.
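
     One reading of the entropy advantage: the defender samples which squeezers (and which
     parameters) to apply per input, so a white-box attacker cannot bake a fixed set of
     squeezers into an adaptive attack loss. A hypothetical sketch:

        import random
        import numpy as np

        def randomized_detector(x, model, squeezer_pool, threshold, k=2, rng=random):
            """Sample k squeezers per input so the attacker cannot anticipate them."""
            p0 = model(x)
            chosen = rng.sample(squeezer_pool, k)
            return max(np.sum(np.abs(p0 - model(s(x)))) for s in chosen) > threshold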
  18. Changing the Game.
     Option 1: find distance-limited adversarial methods for which it is intractable to find
     effective feature squeezers.
     Option 2: redefine adversarial examples so distance is not limited in a simple metric
     space (the focus of the rest of the talk).
  19. Do Humans Matter?
     Metric Space 1: Machine; Metric Space 2: Human.
     Or: Metric Space 1: Machine 1; Metric Space 2: Machine 2.
  20. Automated Classifier Evasion Using Genetic Programming. [Pipeline diagram] Starting from
     a malicious PDF seed and cloned benign PDFs, mutation produces variants; each variant is
     evaluated (✓/✗) with the help of a benign oracle; promising variants are selected for the
     next round of mutation, until an evasive variant is found.
  21. Generating Variants. [Pipeline diagram, mutation step highlighted] The malicious PDF and
     cloned benign PDFs feed the mutation step, which produces the candidate variants that are
     then evaluated and selected.
  22. Generating Variants: mutation selects a random node of the PDF object tree (e.g. under
     /Root /Catalog /Pages, or a /JavaScript eval(‘…’) node) and randomly transforms it:
     delete, insert, or replace.
  23. Generating Variants: the same mutation step (select a random node; delete, insert, or
     replace it), with the inserted or replacing nodes drawn from benign PDFs (a sketch of the
     operator follows).
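
     A schematic sketch of this mutation step, assuming the PDF is held as a tree object with
     the listed operations (the tree API and helper names are hypothetical; EvadeML performs
     these edits through a PDF parser):

        import random

        def mutate(pdf_tree, benign_nodes, rng=random):
            """Pick a random node; delete it, replace it, or insert a benign subtree under it."""
            node = rng.choice(pdf_tree.all_nodes())            # hypothetical tree API
            op = rng.choice(["delete", "insert", "replace"])
            if op == "delete":
                pdf_tree.delete(node)
            elif op == "insert":
                pdf_tree.insert_child(node, rng.choice(benign_nodes))   # subtree from a benign PDF
            else:
                pdf_tree.replace(node, rng.choice(benign_nodes))
            return pdf_tree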
  24. Selecting Promising Variants. [Pipeline diagram, selection step highlighted] After
     mutation, variants are evaluated and the most promising ones are kept for the next
     generation, until an evasive variant is found.
  25. Selecting Promising Variants: each candidate variant is scored by a fitness function
     that combines the oracle's verdict (is it still malicious?) with the target classifier's
     score.
  26. Oracle: execute the candidate in a vulnerable Adobe Reader inside a virtual environment
     (Cuckoo sandbox, https://github.com/cuckoosandbox) with a simulated network (INetSim).
     Behavioral signature: the variant is malicious if the HTTP URL + host extracted from the
     API traces match the signature.
     Advantage: we know the target malware behavior.
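
     After the sandbox run, the signature check reduces to comparing the network indicators
     extracted from the API traces against those recorded for the original malware seed. A
     simplified sketch of that comparison (the trace extraction itself is not shown, and the
     example indicator is hypothetical):

        def oracle_is_malicious(candidate_indicators, malware_signature):
            """Malicious behavior is preserved if the variant still produces the seed's
            (host, URL) indicators."""
            return malware_signature <= set(candidate_indicators)

        # e.g. oracle_is_malicious(traced_indicators, {("evil.example.com", "/payload.exe")})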
  27. Fitness Function (assumes lost malicious behavior will not be recovered):
     fitness = 0.5 − classifier_score   if oracle says "malicious"
             = −∞                       otherwise
     (classifier_score ≥ 0.5 means the variant is labeled malicious)
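
     A direct transcription of this fitness function into Python (variable names are mine):

        def fitness(oracle_label, classifier_score):
            """Higher is better: evasive variants drive the malicious-class score below 0.5;
            variants that lose the malicious behavior are discarded outright."""
            if oracle_label == "malicious":
                return 0.5 - classifier_score   # positive once the classifier labels it benign
            return float("-inf")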
  28. [Plot: seeds evaded (out of 500) vs. number of mutations, for the PDFRate and Hidost
     classifiers]
  29. [Same plot: seeds evaded vs. number of mutations, PDFRate and Hidost]
     Simple transformations often worked.
  30. [Same plot] A single transformation, (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/),
     works on 162/500 seeds.
  31. [Same plot] Some seeds required complex transformations.
  32. Possible Defense: Adjust the Threshold. Charles Smutz, Angelos Stavrou. When a Tree
     Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors.
     NDSS 2016.
  33. Hide the Classifier Score? [Same GP pipeline diagram] What if the fitness function can
     only see the classifier's label rather than its score?
  34. Binary Classifier Output is Enough. [Same GP pipeline diagram] The attack still succeeds
     when the fitness function uses only the classifier's binary malicious/benign label
     (ACM CCS 2017).
  35. [Training pipeline diagram] Labelled training data passes through feature extraction to
     produce vectors; a supervised ML algorithm trains the classifier; at deployment,
     operational data is classified as malicious or benign; the classifier can then be
     retrained.
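
     The retraining experiments on the next slides iterate this pipeline. A hedged sketch of
     the loop (train_classifier and evade are placeholders for the classifier training and the
     genetic-programming attack, not actual EvadeML code):

        def adversarial_retraining(train_data, seeds, rounds=2):
            """Train, generate evasive variants with the GP attack, fold them back in, repeat."""
            clf = train_classifier(train_data)                    # e.g. the original Hidost16
            for _ in range(rounds):                               # yields HidostR1, HidostR2, ...
                variants = [evade(clf, seed) for seed in seeds]   # GP search from earlier slides
                train_data += [(v, "malicious") for v in variants if v is not None]
                clf = train_classifier(train_data)
            return clf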
  36. [Plot: seeds evaded (out of 500) vs. generations, Hidost16] Original classifier: takes
     614 generations to evade all seeds.
  37. [Same plot, adding the retrained classifier HidostR1 alongside Hidost16]
  38. [Same plot, adding a second retrained classifier, HidostR2]
  39. [Same plot: seeds evaded (out of 500) vs. generations for Hidost16, HidostR1, HidostR2]
     False positive rates:            Genome    Contagio Benign
       Hidost16                        0.00      0.00
       HidostR1                        0.78      0.30
       HidostR2                        0.85      0.53
  40. Only 8 of 6,987 Hidost features are robust; a classifier restricted to them has high
     false positives. Robust feature paths: /Names, /Names/JavaScript, /Names/JavaScript/Names,
     /Names/JavaScript/JS, /OpenAction, /OpenAction/JS, /OpenAction/S, /Pages.
  41. EvadeML-Zoo: an AML Toolbox. Attacks: FGSM, BIM, JSMA, DeepFool, CW2, CW∞, CW0.
     Datasets: MNIST, CIFAR-10, ImageNet. Models: CNN, DenseNet, MobileNets. Includes Feature
     Squeezing and visualization. Weilin Xu, Andrew Norton, Noah Kim, Yanjun Qi. evademl.org/zoo
  42. Open Questions
     Can we close the gap between experimental techniques (that work on complex models) and
     formal methods (that work on small models)?
     Reducing the adversarial search space.
     Will classifiers ever be good enough to apply “crypto” standards to adversarial examples?
     Is PDF malware the MNIST of malware classification?
     EvadeML.org