Slide 1

Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers
Weilin Xu, David Evans, Yanjun Qi
University of Virginia

Slide 2

Machine Learning Is Solving Our Problems
Fake, Spam, IDS, Malware, Fake Accounts, …

Slide 3


Slide 4


Slide 5

Machine Learning Is Eating the World
Data Scientist
Security Expert: ?

Slide 6

Machine Learning Is Eating the World
Data Scientist
Security Expert: No! Security is different.

Slide 7

Security Tasks Are Different: The Adversary Adapts
Goal: Understand classifiers under attack.
Results: Classifiers are vulnerable to automated evasion.

Slide 8

Building Machine Learning Classifiers
Training (supervised learning): Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier
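
The training pipeline on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the feature set (counts of a few PDF name tokens), the nearest-centroid learner, the toy byte strings, and the names `extract_features`, `train_centroids`, and `classify` are all hypothetical stand-ins, not the features or algorithms used by PDFrate or Hidost.

```python
from collections import Counter

# Hypothetical features: counts of a few PDF name tokens. These are NOT the
# feature sets of PDFrate or Hidost, just a stand-in for "feature extraction".
FEATURE_TOKENS = [b"/JavaScript", b"/OpenAction", b"/Pages", b"/Font"]


def extract_features(pdf_bytes: bytes) -> list[float]:
    """Feature extraction: map raw PDF bytes to a fixed-length vector."""
    return [float(pdf_bytes.count(token)) for token in FEATURE_TOKENS]


def train_centroids(vectors, labels):
    """ML algorithm (stand-in): average the vectors that share a label."""
    counts = Counter(labels)
    sums = {}
    for vector, label in zip(vectors, labels):
        total = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            total[i] += value
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def classify(centroids, vector):
    """Trained classifier: return the label of the nearest centroid."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(vector, centroids[label]))
    return min(centroids, key=squared_distance)


# Labelled training data (toy byte strings, not real PDFs).
training_data = [
    (b"%PDF /Pages /Font /Font", "benign"),
    (b"%PDF /Pages /Font", "benign"),
    (b"%PDF /Pages /JavaScript /OpenAction", "malicious"),
    (b"%PDF /JavaScript /JavaScript /OpenAction", "malicious"),
]
vectors = [extract_features(pdf) for pdf, _ in training_data]
labels = [label for _, label in training_data]
model = train_centroids(vectors, labels)
print(classify(model, extract_features(b"%PDF /JavaScript /OpenAction")))  # malicious
```

Any learner could replace the centroid rule; the point is only that every stage of the slide (extraction, vectors, learning, the trained classifier) is a separate, swappable step.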

Slide 9

Assumption: Training Data Is Representative
Training (supervised learning): Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier
Deployment: Operational Data → Feature Extraction → Vectors → Trained Classifier → Malicious / Benign

Slide 10

Results: Evaded PDF Malware Classifiers

                                     PDFrate* [ACSAC’12]   Hidost [NDSS’13]
Accuracy                             0.9976                0.9996
False Negative Rate                  0.0000                0.0056
False Negative Rate with Adversary   1.0000                1.0000

* Mimicus [Oakland ’14], an open-source reimplementation of PDFrate.
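
The metrics in this table can be computed as below. This is a minimal sketch, assuming labels are the strings "malicious" and "benign"; the helper names and the toy labels are hypothetical. "False Negative Rate with Adversary" is read here as the same false negative rate measured on the adversary's evasive variants instead of the original malware, so 1.0 means every variant was labelled benign.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def false_negative_rate(y_true, y_pred, positive="malicious"):
    """Fraction of truly malicious samples that the classifier labels benign."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_on_positives:
        raise ValueError("no positive samples")
    return sum(p != positive for p in preds_on_positives) / len(preds_on_positives)


# Toy labels (made up for illustration): one benign file is misflagged.
y_true = ["malicious"] * 4 + ["benign"] * 4
y_pred = ["malicious"] * 4 + ["benign"] * 3 + ["malicious"]
print(accuracy(y_true, y_pred))             # 0.875
print(false_negative_rate(y_true, y_pred))  # 0.0

# Evasive variants are still malicious, but all are classified benign.
print(false_negative_rate(["malicious"] * 5, ["benign"] * 5))  # 1.0
```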

Slide 11

Results: Evaded PDF Malware Classifiers

                                     PDFrate* [ACSAC’12]   Hidost [NDSS’13]
Accuracy                             0.9976                0.9996
False Negative Rate                  0.0000                0.0056
False Negative Rate with Adversary   1.0000                1.0000

Very robust against “strongest conceivable mimicry attack”.

* Mimicus [Oakland ’14], an open-source reimplementation of PDFrate.

Slide 12

Automated Evasion Approach (Based on Genetic Programming)
Malicious PDF → Clone → Variants → Mutation (using Benign PDFs) → Variants → Select Variants (✓ ✓ ✗ ✓) → Variants
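
The loop on this slide (clone the seed, mutate, score, select) has the shape of a standard genetic-programming search. Below is a minimal sketch under loud assumptions: variants are toy tuples of PDF name tokens rather than real PDF object trees, the classifier score and the "still malicious" check are hypothetical stand-ins, mutation only inserts benign tokens, and selection simply keeps the fitter half. None of the names (`evolve`, `toy_*`) come from EvadeML.

```python
import random


def evolve(seed, mutate, fitness, pop_size=20, generations=50, rng=None):
    """Genetic-programming style search for a variant with fitness > 0.

    Returns the first such variant, or None if the budget runs out.
    """
    rng = rng or random.Random(0)
    population = [seed] * pop_size  # clone the malicious seed
    for _ in range(generations):
        # Mutation: every member of the population produces one variant.
        variants = [mutate(v, rng) for v in population]
        ranked = sorted(variants, key=fitness, reverse=True)
        if fitness(ranked[0]) > 0:
            return ranked[0]  # evasive variant found
        # Selection: keep the fitter half, resample to refill the population.
        survivors = ranked[: pop_size // 2]
        population = [rng.choice(survivors) for _ in range(pop_size)]
    return None


# --- Toy stand-ins (hypothetical, for illustration only) ---
SUSPICIOUS = {"/JavaScript", "/OpenAction"}
BENIGN_TOKENS = ["/Font", "/Pages", "/XObject", "/Metadata"]


def toy_score(variant):
    """Toy 'classifier': fraction of suspicious tokens (>= 0.5 means malicious)."""
    return sum(tok in SUSPICIOUS for tok in variant) / len(variant)


def toy_fitness(variant):
    """Reject variants that lost the payload; otherwise reward low scores."""
    if "/JavaScript" not in variant:
        return -1.0
    return 0.5 - toy_score(variant)


def toy_mutate(variant, rng):
    """Insert one token taken from 'benign' material at a random position."""
    pos = rng.randrange(len(variant) + 1)
    return variant[:pos] + (rng.choice(BENIGN_TOKENS),) + variant[pos:]


seed = ("/JavaScript", "/JavaScript", "/OpenAction")
evasive = evolve(seed, toy_mutate, toy_fitness)
print(evasive, toy_score(evasive))
```

With insertion-only mutation the toy search succeeds in the fourth generation, once benign tokens outnumber the three suspicious ones. A realistic search also deletes and replaces objects, so it can lose the payload and needs the "still malicious" check to discard such variants.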

Slide 13

Automated Evasion Approach (Based on Genetic Programming)
Malicious PDF structure: /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);
Parsed with a Modified Parser; see “Extract Me If You Can: Abusing PDF Parsers in Malware Detectors,” Curtis Carmony et al.
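
After parsing, a PDF can be treated as a tree of named objects in which every node is a candidate mutation target. The sketch below assumes a hypothetical nested-dict stand-in for a parsed file (it is not the output of any real parser, and its keys only loosely mirror the names on the slide) and enumerates the path to each node; `iter_paths` and `toy_pdf` are illustrative names.

```python
def iter_paths(node, prefix=()):
    """Yield (path, value) for every node of a nested dict/list object tree."""
    yield prefix, node
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_paths(child, prefix + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_paths(child, prefix + (index,))


# Hypothetical, simplified parse of a malicious PDF.
toy_pdf = {
    "/Root": {
        "/Type": "/Catalog",
        "/Pages": {"/Type": "/Pages", "/Count": 0, "/Kids": []},
        "/JavaScript": {"/S": "/JavaScript", "/JS": "eval('...');"},
    }
}

# Print only the leaves, e.g. "/Root -> /Type = /Catalog".
for path, value in iter_paths(toy_pdf):
    if not isinstance(value, (dict, list)):
        print(" -> ".join(str(p) for p in path), "=", value)
```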

Slide 14

Automated Evasion Approach (Based on Genetic Programming)
Malicious PDF structure: /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);
Mutation: Insert / Replace / Delete, with variants drawn from benign PDFs
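
The three mutation operations can be sketched on the same kind of toy tree. This is a minimal illustration under assumptions: trees are nested dicts, only dict keys are traversed, and "insert" is interpreted as adding a donor subtree from a benign file next to the target, which may differ from EvadeML's exact semantics. The names `mutate`, `node_paths`, `get_node`, and the toy trees are hypothetical.

```python
import copy
import random


def node_paths(tree, prefix=()):
    """Yield the key path of every node below the top-level dict."""
    if isinstance(tree, dict):
        for key, child in tree.items():
            yield prefix + (key,)
            yield from node_paths(child, prefix + (key,))


def get_node(tree, path):
    """Follow a key path from the top of the tree."""
    for key in path:
        tree = tree[key]
    return tree


def mutate(malicious, benign, rng, op=None):
    """Return a mutated deep copy of `malicious`; the input is never modified."""
    variant = copy.deepcopy(malicious)
    targets = list(node_paths(variant))
    if not targets:
        return variant
    op = op or rng.choice(["insert", "replace", "delete"])
    target = rng.choice(targets)
    parent = get_node(variant, target[:-1])
    if op == "delete":
        del parent[target[-1]]
        return variant
    donor_path = rng.choice(list(node_paths(benign)))
    donor = copy.deepcopy(get_node(benign, donor_path))  # subtree from a benign PDF
    if op == "replace":
        parent[target[-1]] = donor
    else:  # "insert": add the donor beside the target (overwrites a same-named key)
        parent[donor_path[-1]] = donor
    return variant


toy_malicious = {"/Root": {"/Pages": {"/Count": 0},
                           "/JavaScript": {"/JS": "eval('...');"}}}
toy_benign = {"/Root": {"/Pages": {"/Count": 2},
                        "/Metadata": {"/Length": 1024}}}

rng = random.Random(0)
for _ in range(3):
    print(mutate(toy_malicious, toy_benign, rng))
```

Deep-copying both the variant and the donor keeps the seed and the benign file untouched, so the same seed can be cloned again in every generation.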

Slide 15

Automated Evasion Approach (Based on Genetic Programming)
Malicious PDF structure: /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);
Mutation: Insert / Replace / Delete, with variants drawn from benign PDFs
Figure values: 128, 546, 0, 0

Slide 17

Automated Evasion Approach (Based on Genetic Programming)
Malicious PDF structure: /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);
Mutation: Insert / Replace / Delete, with variants drawn from benign PDFs
Figure values: 128, 546, 0, 0, 128, 0

Slide 18

Automated Evasion Approach (Based on Genetic Programming)
Malicious PDF structure: /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);
Mutation: Insert / Replace / Delete, with variants drawn from benign PDFs
Figure values: 128, 0

Slide 21

Automated Evasion Approach (Based on Genetic Programming)
Fitness Function: Variants → Oracle (Malicious?) and Target Classifier f(x) (Score) → Fitness Score
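
The slide combines two signals: the oracle's yes/no answer on whether a variant still behaves maliciously, and the target classifier's score f(x). One plausible way to fold them into a single fitness score is sketched below; the formula (threshold minus score, negative infinity when the oracle rejects the variant), the default threshold of 0.5, and the name `make_fitness` are assumptions for illustration, not the exact function used in the study.

```python
import math
from typing import Callable, TypeVar

V = TypeVar("V")


def make_fitness(oracle: Callable[[V], bool],
                 classifier_score: Callable[[V], float],
                 threshold: float = 0.5) -> Callable[[V], float]:
    """Build a fitness function from an oracle and a target classifier."""
    def fitness(variant: V) -> float:
        if not oracle(variant):
            return -math.inf  # malicious behaviour lost: never select
        # Positive once the classifier scores the variant below its threshold,
        # i.e. labels it benign; larger means further on the benign side.
        return threshold - classifier_score(variant)
    return fitness


# Toy variants: dicts carrying a precomputed oracle verdict and score.
fitness = make_fitness(lambda v: v["still_malicious"], lambda v: v["score"])
print(fitness({"still_malicious": True, "score": 0.25}))   # 0.25
print(fitness({"still_malicious": True, "score": 0.75}))   # -0.25
print(fitness({"still_malicious": False, "score": 0.0}))   # -inf
```

Returning negative infinity for rejected variants guarantees that selection never prefers a variant that evades the classifier only because it stopped being malicious.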

Slide 22

Automated Evasion Approach (Based on Genetic Programming)
Fitness Function: Variants → Oracle (Malicious?) and Target Classifier f(x) (Score) → Fitness Score
Target classifier labels: Malicious / Benign

Slide 24

Results: Evaded PDFrate (100%)
Figure legend: Original Malware Seeds

Slide 25

Results: Evaded PDFrate (100%)
Figure legend: Original Malware Seeds, Evasive Variants

Slide 26

Evaded PDFrate with Adjusted Threshold
Figure legend: Original Malware Seeds, Evasive Variants, Evasive Variants with lower threshold
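
The effect of moving the decision threshold can be illustrated with a little arithmetic. This sketch assumes a classifier that labels a file malicious when its score is at or above the threshold; the scores are made-up numbers, not data from these experiments, and the helper names are hypothetical. It shows the defender's trade-off for a fixed set of variants: lowering the threshold catches more of them but also flags more benign files. The slide's third series, evasive variants found with a lower threshold, suggests the adversary can simply re-run the search against the new threshold.

```python
def evasion_rate(variant_scores, threshold):
    """Fraction of malicious variants scored below the threshold (labelled benign)."""
    return sum(s < threshold for s in variant_scores) / len(variant_scores)


def false_positive_rate(benign_scores, threshold):
    """Fraction of benign files scored at or above the threshold (flagged malicious)."""
    return sum(s >= threshold for s in benign_scores) / len(benign_scores)


variant_scores = [0.42, 0.38, 0.45, 0.20, 0.31]  # made-up scores
benign_scores = [0.05, 0.12, 0.33, 0.08, 0.41]   # made-up scores

for t in (0.5, 0.3):
    print(f"threshold={t}: evasion={evasion_rate(variant_scores, t)}, "
          f"false positives={false_positive_rate(benign_scores, t)}")
```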

Slide 27

Results: Evaded Hidost (100%)
Figure legend: Original Malware Seeds

Slide 28

Results: Evaded Hidost (100%)
Figure legend: Original Malware Seeds, Evasive Variants

Slide 29

Results: Accumulated Evasion Rate
Difficulty varies by seed: simple mutations often work, but complex mutations are sometimes needed.
Difficulty varies by target: evading all seeds took 6 days for PDFrate and 2 days for Hidost.
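
An accumulated evasion rate curve plots, over search time, the fraction of seeds for which an evasive variant has been found. Below is a minimal sketch of that computation, using made-up per-seed times (in hours) and a hypothetical helper name; `None` marks a seed that was never evaded within the budget.

```python
def accumulated_evasion_rate(times_to_evade, checkpoints):
    """For each checkpoint t, the fraction of all seeds evaded at or before t.

    times_to_evade: one entry per seed, the elapsed time until its first
    evasive variant, or None if the seed was never evaded.
    """
    n = len(times_to_evade)
    evaded = [t for t in times_to_evade if t is not None]
    return [sum(t <= c for t in evaded) / n for c in checkpoints]


times = [0.5, 2.0, 2.0, 30.0, None]  # made-up times, in hours
print(accumulated_evasion_rate(times, [1, 24, 48]))  # [0.2, 0.6, 0.8]
```

Dividing by the total number of seeds, rather than only the evaded ones, keeps never-evaded seeds in the denominator, so the curve reaches 1.0 only when every seed has been evaded.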

Slide 30

Cross-Evasion Effects
PDF Malware Seeds → Automated Evasion (against Hidost) → Evasive PDF Malware (against Hidost)
Tested against PDFrate: 387/500 evasive (77.4%)
Tested against Gmail’s classifier: 3/500 evasive (0.6%)
Gmail’s classifier is secure?
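
Measuring cross-evasion amounts to re-classifying the variants that evaded one classifier with a second classifier and counting how many still pass as benign. In the sketch below the second classifier is a hypothetical callable that returns True for "malicious", and the helper names are illustrative. The `rate` helper reproduces this slide's percentages from their counts.

```python
def cross_evasion(variants, is_malicious):
    """Count variants (evasive against one classifier) that another also misses."""
    evaded = sum(1 for v in variants if not is_malicious(v))
    return evaded, len(variants)


def rate(evaded, total):
    """Format a count as 'evaded/total (percent%)' with one decimal place."""
    return f"{evaded}/{total} ({100 * evaded / total:.1f}%)"


# Toy check with a hypothetical second classifier.
toy_variants = [{"score": s} for s in (0.1, 0.7, 0.2, 0.4)]
evaded, total = cross_evasion(toy_variants, lambda v: v["score"] >= 0.5)
print(rate(evaded, total))  # 3/4 (75.0%)

# Counts reported on this slide.
print(rate(387, 500))  # 387/500 (77.4%)
print(rate(3, 500))    # 3/500 (0.6%)
```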

Slide 31

Cross-Evasion Effects
PDF Malware Seeds → Automated Evasion (against Hidost) → Evasive PDF Malware (against Hidost)
Tested against PDFrate: 387/500 evasive (77.4%)
Tested against Gmail’s classifier: 3/500 evasive (0.6%)
Gmail’s classifier is secure?
different.

Slide 32

Evading Gmail’s Classifier
Evasion rate on : 135/380 (35.5%)

Slide 33

Evading Gmail’s Classifier
Evasion rate on : 179/380 (47.1%)

Slide 34

Conclusion
Source code: http://EvadeML.org
Vs.
Who will win this arms race?