Shrinking and Exploring Adversarial Search Spaces

David Evans
ARO Workshop on Adversarial Learning
Stanford, 14 Sept 2017
https://evadeML.org

Transcript

  1. Shrinking and Exploring Adversarial Search Spaces. David Evans, University of Virginia.
     ARO Workshop on Adversarial Learning, Stanford, 14 Sept 2017. With Weilin Xu and Yanjun Qi. evadeML.org
  2. Security State-of-the-Art: comparing fields on random-guessing attack success
     probability, threat models, and proofs.
     Cryptography: information-theoretic or resource-bounded threat models; proofs required.
     System Security: threat models about capabilities, motivations, rationality; proofs common.
     Adversarial Machine Learning: white-box and black-box threat models; proofs rare!
  3. Adversarial Examples: “panda” + 0.007 × [adversarial perturbation] = “gibbon”.
     Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and
     Harnessing Adversarial Examples. 2014.
  4. Adversarial Examples Game: given a seed sample x, find x' where:
     f(x') ≠ f(x)   (class is different: untargeted), or
     f(x') = t      (class is t: targeted), and
     Δ(x, x') ≤ ε   (difference below threshold).
     Δ(x, x') is defined in some (simple!) metric space: the L0 norm (number of components
     that differ), L1 norm, L2 norm (“Euclidean”), or L∞ norm.
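
     These distance metrics are the standard input-space norms. A minimal numpy sketch of
     how they are computed between a seed x and a candidate x' (illustrative only; the
     function name is not from the talk):

        import numpy as np

        def distances(x, x_adv):
            """Common adversarial-example distance metrics between two inputs."""
            diff = (x_adv - x).ravel()
            return {
                "L0":   int(np.count_nonzero(diff)),        # number of components changed
                "L1":   float(np.sum(np.abs(diff))),
                "L2":   float(np.sqrt(np.sum(diff ** 2))),   # "Euclidean"
                "Linf": float(np.max(np.abs(diff))),         # largest single change
            }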
  5. Detecting Adversarial Examples. [Framework diagram] The input is fed to the model
     directly (Prediction0) and through each of Squeezer1 ... Squeezerk before the model
     (Prediction1 ... Predictionk). If the predictions disagree by more than a threshold,
     the input is flagged as adversarial; otherwise it is treated as legitimate.
  6. “Feature Squeezing”: two nearby inputs, e.g. [0.0540, 0.4894, 0.9258, 0.0116, 0.2898,
     0.5222, 0.5074, ...] and [0.0491, 0.4903, 0.9292, 0.0090, 0.2942, 0.5243, 0.5078, ...],
     both squeeze to [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, ...] under the squeezer
     x_i = round(x_i × 4)/4.
     squeeze(x') ≈ squeeze(x) ⟹ f(squeeze(x')) ≈ f(squeeze(x))
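
     A minimal numpy sketch of the squeezer on this slide, quantizing each feature to
     multiples of 0.25 (the function name is mine, not from the talk):

        import numpy as np

        def squeeze_quantize(x, levels=4):
            """Round each feature to the nearest multiple of 1/levels (here 0.25)."""
            return np.round(x * levels) / levels

        x     = np.array([0.0540, 0.4894, 0.9258, 0.0116])
        x_adv = np.array([0.0491, 0.4903, 0.9292, 0.0090])
        # Both vectors squeeze to [0.0, 0.5, 1.0, 0.0], so the model sees (nearly) the same input.
        assert np.allclose(squeeze_quantize(x), squeeze_quantize(x_adv))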
  7. Example Squeezers: reduce color depth (e.g. 8-bit greyscale → 1-bit monochrome) and
     median smoothing (e.g. 3×3 smoothing replaces each pixel with the median of the pixel
     and its neighbors).
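
     Hedged sketches of these two squeezers, assuming inputs are float arrays scaled to [0, 1]
     (scipy's median_filter is one way to implement the smoothing; the talk does not prescribe
     a particular library):

        import numpy as np
        from scipy.ndimage import median_filter

        def reduce_bit_depth(x, bits):
            """Quantize pixel values in [0, 1] to 2**bits levels (bits=1 gives monochrome)."""
            levels = 2 ** bits - 1
            return np.round(x * levels) / levels

        def median_smooth(x, size=3):
            """Replace each pixel with the median of its size x size neighborhood."""
            return median_filter(x, size=size)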
  8. Simple Instantiation. Model: 7-layer CNN. Squeezers: Bit Depth-1 and Median 2×2, each
     feeding a copy of the model. Flag the input as adversarial if
     max( ||Prediction0 − Prediction1||_1 , ||Prediction0 − Prediction2||_1 ) > threshold;
     otherwise treat it as legitimate.
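
     The detection rule on this slide, sketched in Python. Here `model` is any function
     returning a probability vector and the squeezers could be the sketches above; the names
     and structure are mine, not the authors' code:

        import numpy as np

        def is_adversarial(x, model, squeezers, threshold):
            """Flag x if any squeezed prediction moves too far (in L1) from the original one."""
            p0 = model(x)
            scores = [np.sum(np.abs(p0 - model(squeeze(x)))) for squeeze in squeezers]
            return max(scores) > threshold

        # e.g. is_adversarial(x, model,
        #                     [lambda v: reduce_bit_depth(v, 1), lambda v: median_smooth(v, 2)],
        #                     threshold=0.0029)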
  9. [Histogram: number of examples vs. maximum L1 distance between original and squeezed
     input, for legitimate and adversarial examples]
     threshold = 0.0029; detection: 98.2%, FP < 4%
  10. ImageNet Configuration. Model: MobileNet. Squeezers: Bit Depth-5, Median 2×2, and
     Non-local Mean, each feeding a copy of the model (Prediction1, Prediction2, Prediction3).
     Flag the input as adversarial if max_i ||Prediction0 − Prediction_i||_1 > threshold;
     otherwise treat it as legitimate.
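
     OpenCV's non-local means denoiser is one way to implement the third squeezer; mapping the
     "11-3-4" style parameters onto OpenCV's arguments as below is my assumption, not stated in
     the talk:

        import cv2
        import numpy as np

        def non_local_means(x, search_window=11, patch_size=3, strength=4):
            """Non-local means denoising of a [0,1] float RGB image, used as an extra squeezer."""
            img = (np.clip(x, 0.0, 1.0) * 255).astype(np.uint8)
            out = cv2.fastNlMeansDenoisingColored(img, None, strength, strength,
                                                  patch_size, search_window)
            return out.astype(np.float32) / 255.0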
  11. [Histogram: number of examples vs. maximum L1 distance between original and squeezed
     input, for legitimate and adversarial examples, ImageNet]
     threshold = 1.24; detection: 85%, FP < 5%
  12. Detection Performance Summary
     MNIST: squeezers Bit Depth (1-bit), Median (2x2); threshold 0.0029; detection rate 98.2%
       (successful AEs) / 20.0% (failed AEs); false positive rate 3.98%; ROC-AUC 94.5%
       (including FAEs) / 99.6% (excluding FAEs).
     CIFAR-10: squeezers Bit Depth (5-bit), Median (2x2), Non-local Mean (13-3-2); threshold
       1.1402; detection rate 85.0% / 9.1%; false positive rate 4.93%; ROC-AUC 95.7% / 95.9%.
     ImageNet: squeezers Bit Depth (5-bit), Median (2x2), Non-local Mean (11-3-4); threshold
       1.2476; detection rate 85.2% / 25.0%; false positive rate 4.70%; ROC-AUC 94.0% / 94.5%.
  13. [Bar chart: detection rate (0-100%) per attack: FGSM, BIM, DeepFool, JSMA (LL / Next),
     CW0 (LL / Next), CW2 (LL / Next), CW∞ (LL / Next), on MNIST, CIFAR-10, and ImageNet]
  14. Arms Race? Adaptive attack (WOOT, August 2017): incorporate the L1 squeezed distance
     into the attacker's loss function. Adversary success rate on MNIST: 64% untargeted,
     41% targeted (next), 21% targeted (least likely).
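
     A rough sketch of such an adaptive objective: the attacker minimizes the usual
     misclassification loss plus a penalty on the detector's own L1 score (framework-agnostic;
     the weighting and loss form are my assumptions, not the WOOT authors' code):

        import numpy as np

        def adaptive_loss(x_adv, model, squeezers, target, c=1.0):
            """Reach the target class while keeping the squeezing detector's score small."""
            p0 = model(x_adv)
            target_loss = -np.log(p0[target] + 1e-12)   # cross-entropy toward the target class
            detector = max(np.sum(np.abs(p0 - model(s(x_adv)))) for s in squeezers)
            return target_loss + c * detector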
  15. Raising the Bar or Changing the Game?
     Metric Space 1: the target classifier. Metric Space 2: the “oracle”.
     Before: find a small perturbation that changes the class for the classifier but is
     imperceptible to the oracle.
     Now: the perturbation must change the class for both the original and the squeezed
     classifier, while remaining imperceptible to the oracle.
  16. “Feature Squeezing” Conjecture: for any distance-limited adversarial method, there exists
     some feature squeezer that accurately detects its adversarial examples.
     Intuition: if the perturbation is small (in some simple metric space), there is some
     squeezer that coalesces the original and adversarial example into the same sample.
  17. Defender’s Entropy Advantage. [Same detection framework as before: the input passes
     through Squeezer1 ... Squeezerk and the model, and the predictions are compared.] The
     defender can pick the squeezers and their parameters from a random seed unknown to the
     attacker.
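
     One reading of the entropy advantage: the defender samples which squeezers (and which
     parameters) to apply per input, so a white-box attacker cannot bake a fixed set of
     squeezers into an adaptive attack loss. A hypothetical sketch:

        import random
        import numpy as np

        def randomized_detector(x, model, squeezer_pool, threshold, k=2, rng=random):
            """Sample k squeezers per input so the attacker cannot anticipate them."""
            p0 = model(x)
            chosen = rng.sample(squeezer_pool, k)
            return max(np.sum(np.abs(p0 - model(s(x)))) for s in chosen) > threshold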
  18. Changing the Game.
     Option 1: find distance-limited adversarial methods for which it is intractable to find
     effective feature squeezers.
     Option 2: redefine adversarial examples so distance is not limited in a simple metric
     space (the focus of the rest of the talk).
  19. Do Humans Matter?
     Metric Space 1: Machine; Metric Space 2: Human.
     Or: Metric Space 1: Machine 1; Metric Space 2: Machine 2.
  20. Automated Classifier Evasion Using Genetic Programming. [Pipeline diagram] Starting from
     a malicious PDF seed and cloned benign PDFs, mutation produces variants; each variant is
     evaluated (✓/✗) with the help of a benign oracle; promising variants are selected for the
     next round of mutation, until an evasive variant is found.
  21. Generating Variants. [Pipeline diagram, mutation step highlighted] The malicious PDF and
     cloned benign PDFs feed the mutation step, which produces the candidate variants that are
     then evaluated and selected.
  22. Generating Variants: mutation selects a random node of the PDF object tree (e.g. under
     /Root /Catalog /Pages, or a /JavaScript eval(‘…’) node) and randomly transforms it:
     delete, insert, or replace.
  23. Generating Variants: the same mutation step (select a random node; delete, insert, or
     replace it), with the inserted or replacing nodes drawn from benign PDFs (a sketch of the
     operator follows).
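
     A schematic sketch of this mutation step, assuming the PDF is held as a tree object with
     the listed operations (the tree API and helper names are hypothetical; EvadeML performs
     these edits through a PDF parser):

        import random

        def mutate(pdf_tree, benign_nodes, rng=random):
            """Pick a random node; delete it, replace it, or insert a benign subtree under it."""
            node = rng.choice(pdf_tree.all_nodes())            # hypothetical tree API
            op = rng.choice(["delete", "insert", "replace"])
            if op == "delete":
                pdf_tree.delete(node)
            elif op == "insert":
                pdf_tree.insert_child(node, rng.choice(benign_nodes))   # subtree from a benign PDF
            else:
                pdf_tree.replace(node, rng.choice(benign_nodes))
            return pdf_tree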
  24. Selecting Promising Variants. [Pipeline diagram, selection step highlighted] After
     mutation, variants are evaluated and the most promising ones are kept for the next
     generation, until an evasive variant is found.
  25. Selecting Promising Variants: each candidate variant is scored by a fitness function
     that combines the oracle's verdict (is it still malicious?) with the target classifier's
     score.
  26. Oracle: execute the candidate in a vulnerable Adobe Reader inside a virtual environment
     (Cuckoo sandbox, https://github.com/cuckoosandbox) with a simulated network (INetSim).
     Behavioral signature: the variant is malicious if the HTTP URL + host extracted from the
     API traces match the signature.
     Advantage: we know the target malware behavior.
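
     After the sandbox run, the signature check reduces to comparing the network indicators
     extracted from the API traces against those recorded for the original malware seed. A
     simplified sketch of that comparison (the trace extraction itself is not shown, and the
     example indicator is hypothetical):

        def oracle_is_malicious(candidate_indicators, malware_signature):
            """Malicious behavior is preserved if the variant still produces the seed's
            (host, URL) indicators."""
            return malware_signature <= set(candidate_indicators)

        # e.g. oracle_is_malicious(traced_indicators, {("evil.example.com", "/payload.exe")})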
  27. Fitness Function (assumes lost malicious behavior will not be recovered):
     fitness = 0.5 − classifier_score   if oracle says "malicious"
             = −∞                       otherwise
     (classifier_score ≥ 0.5 means the variant is labeled malicious)
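
     A direct transcription of this fitness function into Python (variable names are mine):

        def fitness(oracle_label, classifier_score):
            """Higher is better: evasive variants drive the malicious-class score below 0.5;
            variants that lose the malicious behavior are discarded outright."""
            if oracle_label == "malicious":
                return 0.5 - classifier_score   # positive once the classifier labels it benign
            return float("-inf")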
  28. [Plot: seeds evaded (out of 500) vs. number of mutations, for the PDFRate and Hidost
     classifiers]
  29. [Same plot: seeds evaded vs. number of mutations, PDFRate and Hidost]
     Simple transformations often worked.
  30. [Same plot] A single transformation, (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/),
     works on 162/500 seeds.
  31. [Same plot] Some seeds required complex transformations.
  32. Possible Defense: Adjust the Threshold. Charles Smutz, Angelos Stavrou. When a Tree
     Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors.
     NDSS 2016.
  33. Hide the Classifier Score? [Same GP pipeline diagram] What if the fitness function can
     only see the classifier's label rather than its score?
  34. Binary Classifier Output is Enough. [Same GP pipeline diagram] The attack still succeeds
     when the fitness function uses only the classifier's binary malicious/benign label
     (ACM CCS 2017).
  35. [Training pipeline diagram] Labelled training data passes through feature extraction to
     produce vectors; a supervised ML algorithm trains the classifier; at deployment,
     operational data is classified as malicious or benign; the classifier can then be
     retrained.
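
     The retraining experiments on the next slides iterate this pipeline. A hedged sketch of
     the loop (train_classifier and evade are placeholders for the classifier training and the
     genetic-programming attack, not actual EvadeML code):

        def adversarial_retraining(train_data, seeds, rounds=2):
            """Train, generate evasive variants with the GP attack, fold them back in, repeat."""
            clf = train_classifier(train_data)                    # e.g. the original Hidost16
            for _ in range(rounds):                               # yields HidostR1, HidostR2, ...
                variants = [evade(clf, seed) for seed in seeds]   # GP search from earlier slides
                train_data += [(v, "malicious") for v in variants if v is not None]
                clf = train_classifier(train_data)
            return clf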
  36. [Plot: seeds evaded (out of 500) vs. generations, Hidost16] Original classifier: takes
     614 generations to evade all seeds.
  37. [Same plot, adding the retrained classifier HidostR1 alongside Hidost16]
  38. [Same plot, adding a second retrained classifier, HidostR2]
  39. [Same plot: seeds evaded (out of 500) vs. generations for Hidost16, HidostR1, HidostR2]
     False positive rates:            Genome    Contagio Benign
       Hidost16                        0.00      0.00
       HidostR1                        0.78      0.30
       HidostR2                        0.85      0.53
  40. Only 8 of 6,987 Hidost features are robust; a classifier restricted to them has high
     false positives. Robust feature paths: /Names, /Names/JavaScript, /Names/JavaScript/Names,
     /Names/JavaScript/JS, /OpenAction, /OpenAction/JS, /OpenAction/S, /Pages.
  41. EvadeML-Zoo: an AML Toolbox. Attacks: FGSM, BIM, JSMA, DeepFool, CW2, CW∞, CW0.
     Datasets: MNIST, CIFAR-10, ImageNet. Models: CNN, DenseNet, MobileNets. Includes Feature
     Squeezing and visualization. Weilin Xu, Andrew Norton, Noah Kim, Yanjun Qi. evademl.org/zoo
  42. Open Questions
     Can we close the gap between experimental techniques (that work on complex models) and
     formal methods (that work on small models)?
     Reducing the adversarial search space.
     Will classifiers ever be good enough to apply “crypto” standards to adversarial examples?
     Is PDF malware the MNIST of malware classification?
     EvadeML.org