Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers

David Evans
February 24, 2016

Weilin Xu's talk at the
Network and Distributed System Security Symposium (NDSS) 2016
24 February 2016
San Diego, California

For more details see: http://evademl.org/
and the full paper: http://evademl.org/docs/evademl.pdf

Transcript

  1. Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers

    Weilin Xu, David Evans, Yanjun Qi (University of Virginia)
  4. Building Machine Learning Classifiers

    Training (Supervised Learning): Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier
  5. Assumption: Training Data is Representative

    Training (Supervised Learning): Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier
    Deployment: Operational Data → Trained Classifier → Malicious / Benign
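The training and deployment pipeline on the two slides above can be sketched in a few lines. This is only an illustration under stated assumptions: the two features (`num_objects`, `has_javascript`), the toy documents, and the nearest-centroid "ML algorithm" are all hypothetical and are not the features or models used by PDFrate or Hidost.

```python
from statistics import mean

def extract_features(doc):
    """Hypothetical feature extraction: parsed document -> numeric vector."""
    return [float(doc["num_objects"]), float(doc["has_javascript"])]

def train(labelled_docs):
    """Toy 'ML algorithm' (nearest centroid): one mean vector per label."""
    by_label = {}
    for doc, label in labelled_docs:
        by_label.setdefault(label, []).append(extract_features(doc))
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()}

def classify(model, doc):
    """Deployment: label operational data by the closest class centroid."""
    vec = extract_features(doc)
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

# Hypothetical labelled training data.
training_data = [
    ({"num_objects": 12, "has_javascript": 1}, "malicious"),
    ({"num_objects": 15, "has_javascript": 1}, "malicious"),
    ({"num_objects": 80, "has_javascript": 0}, "benign"),
    ({"num_objects": 95, "has_javascript": 0}, "benign"),
]
model = train(training_data)

# Operational data that resembles the training data is labelled as expected.
print(classify(model, {"num_objects": 14, "has_javascript": 1}))
# A document that keeps its JavaScript but is padded with extra objects
# no longer resembles the malicious training samples.
print(classify(model, {"num_objects": 90, "has_javascript": 1}))
```

The second call shows why the representativeness assumption matters: once operational data stops looking like the training data, the learned decision rule says little about it.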
  6. Results: Evaded PDF Malware Classifiers

                                          PDFrate* [ACSAC’12]   Hidost [NDSS’13]
    Accuracy                              0.9976                0.9996
    False Negative Rate                   0.0000                0.0056
    False Negative Rate with Adversary    1.0000                1.0000

    * Mimicus [Oakland ’14], an open source reimplementation of PDFrate.
  7. Results: Evaded PDF Malware Classifiers

                                          PDFrate* [ACSAC’12]   Hidost [NDSS’13]
    Accuracy                              0.9976                0.9996
    False Negative Rate                   0.0000                0.0056
    False Negative Rate with Adversary    1.0000                1.0000

    Callout: Very robust against “strongest conceivable mimicry attack”.
    * Mimicus [Oakland ’14], an open source reimplementation of PDFrate.
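For readers unfamiliar with the metrics in the results table, the sketch below shows how accuracy and false negative rate are conventionally computed from labels and predictions. The label lists are made-up toy data, not the evaluation data behind the slide; the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def false_negative_rate(y_true, y_pred, positive="malicious"):
    """Fraction of truly malicious samples that were NOT flagged as malicious."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    missed = sum(p != positive for p in preds_for_positives)
    return missed / len(preds_for_positives)

# Toy data: four malicious and four benign samples.
y_true = ["malicious"] * 4 + ["benign"] * 4

# Without an adversary: every sample is classified correctly.
y_pred_clean = list(y_true)

# With an adversary: every malicious sample has been modified so that
# the classifier labels it benign.
y_pred_evaded = ["benign"] * 8

print(accuracy(y_true, y_pred_clean), false_negative_rate(y_true, y_pred_clean))
print(false_negative_rate(y_true, y_pred_evaded))
```

A false negative rate of 1.0000 with an adversary, as in the table, means every malicious sample in the evaluation ended up classified as benign.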
  8. Automated Evasion Approach (Based on Genetic Programming)

    [Diagram: Malicious PDF (01011001101) → Clone → Variants → Mutation (with Benign PDFs) → Variants → Select Variants (✓ ✓ ✗ ✓)]
  9. Automated Evasion Approach (Based on Genetic Programming)

    [Same diagram as slide 8]
    PDF structure shown: /Root, /Catalog, /Pages, 0, /JavaScript eval(‘…’);
    Modified Parser (see Curtis Carmony et al., "Extract Me If You Can: Abusing PDF Parsers in Malware Detectors")
  10. Automated Evasion Approach (Based on Genetic Programming)

    [Same diagram as slide 8]
    PDF structure shown: /Root, /Catalog, /Pages, 0, /JavaScript eval(‘…’);
    Mutation: Insert / Replace / Delete (From Benign) → Variants
  11. Automated Evasion Approach (Based on Genetic Programming)

    [Same diagram as slide 8]
    PDF structure shown: /Root, /Catalog, /Pages, 0, /JavaScript eval(‘…’);
    Mutation: Insert / Replace / Delete (From Benign) → Variants
    Values shown: 128 546 0 0
  12. Automated Evasion Approach (Based on Genetic Programming)

    [Same diagram as slide 8]
    PDF structure shown: /Root, /Catalog, /Pages, 0, /JavaScript eval(‘…’);
    Mutation: Insert / Replace / Delete (From Benign) → Variants
    Values shown: 128 546 0 0
  13. Automated Evasion Approach (Based on Genetic Programming)

    [Same diagram as slide 8]
    PDF structure shown: /Root, /Catalog, /Pages, 0, /JavaScript eval(‘…’);
    Mutation: Insert / Replace / Delete (From Benign) → Variants
    Values shown: 128 546 0 0 128 0
  14. Automated Evasion Approach (Based on Genetic Programming)

    [Same diagram as slide 8]
    PDF structure shown: /Root, /Catalog, /Pages, 0, /JavaScript eval(‘…’);
    Mutation: Insert / Replace / Delete (From Benign) → Variants
    Values shown: 128 0
  15. Automated Evasion Approach (Based on Genetic Programming)

    [Same diagram as slide 8]
    PDF structure shown: /Root, /Catalog, /Pages, 0, /JavaScript eval(‘…’);
    Mutation: Insert / Replace / Delete (From Benign) → Variants
    Values shown: 128 0
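The Insert / Replace / Delete mutation step on the slides above can be illustrated on a PDF modelled as a nested dictionary. This is a minimal sketch under assumptions: the dictionary representation, the helper names (`all_paths`, `get_node`, `mutate`), and the two toy trees (including the `/Count` and `/Metadata` keys) are hypothetical and do not reproduce the authors' modified parser or tooling.

```python
import copy
import random

def all_paths(tree, prefix=()):
    """Return the path (tuple of keys) to every node below the root."""
    paths = []
    for key, child in tree.items():
        path = prefix + (key,)
        paths.append(path)
        if isinstance(child, dict):
            paths.extend(all_paths(child, path))
    return paths

def get_node(tree, path):
    """Follow a path of keys down from the root."""
    for key in path:
        tree = tree[key]
    return tree

def mutate(variant, benign, rng):
    """Apply one random insert, replace, or delete; inputs are not modified."""
    mutant = copy.deepcopy(variant)
    target = rng.choice(all_paths(mutant))
    parent = get_node(mutant, target[:-1])
    op = rng.choice(["insert", "replace", "delete"])
    if op == "delete":
        del parent[target[-1]]
    else:
        # Insert and replace both take their new content from the benign file.
        donor_path = rng.choice(all_paths(benign))
        donor = copy.deepcopy(get_node(benign, donor_path))
        if op == "replace":
            parent[target[-1]] = donor
        else:
            # Insert next to the target under the donor's own key
            # (this overwrites a sibling that already uses that key).
            parent[donor_path[-1]] = donor
    return mutant, op

# Hypothetical toy trees; the elided script body is left elided.
malicious = {"/Root": {"/Catalog": {"/Pages": {"/Count": 0},
                                    "/JavaScript": "eval('...');"}}}
benign = {"/Root": {"/Catalog": {"/Pages": {"/Count": 128},
                                 "/Metadata": "..."}}}

rng = random.Random(0)
mutant, op = mutate(malicious, benign, rng)
print(op, mutant)
```

Nothing in this step checks whether the mutant still behaves maliciously; a mutation may well delete the payload, which is why selection needs an oracle.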
  16. Automated Evasion Approach (Based on Genetic Programming)

    [Diagram: Malicious PDF (01011001101) → Clone → Variants → Mutation (with Benign PDFs) → Variants → Select Variants (✓ ✓ ✗ ✓)]
  17. Automated Evasion Approach (Based on Genetic Programming)

    [Same diagram as slide 16]
    Fitness Function: Variants → Oracle (Malicious?) and Target Classifier f(x) (Score) → Fitness Score
  18. Automated Evasion Approach (Based on Genetic Programming)

    [Same diagram as slide 16]
    Fitness Function: Variants → Oracle (Malicious?) and Target Classifier f(x) (Score) → Fitness Score
    Labels: Malicious, Benign
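One way to combine the oracle and the target classifier into a single fitness score is sketched below. The slides only name the components, so the specifics here are assumptions: the 0.5 decision threshold, the fixed low score for variants that lose their malicious behaviour, and the toy `oracle` and `classifier_score` stand-ins are all hypothetical.

```python
def fitness(variant, oracle, classifier_score, threshold=0.5, low_score=-1.0):
    """Score a variant; a positive value means it is evasive.

    oracle(variant)           -> True if the variant still behaves maliciously
    classifier_score(variant) -> maliciousness score f(x) in [0, 1]
    """
    if not oracle(variant):
        # The mutation destroyed the malicious behaviour: useless variant.
        return low_score
    # Still malicious: the further below the threshold, the better.
    return threshold - classifier_score(variant)

# Hypothetical stand-ins for a sandbox oracle and a target classifier.
def oracle(variant):
    return variant["has_payload"]

def classifier_score(variant):
    return 1.0 / (1 + variant["padding"])

seed = {"has_payload": True, "padding": 0}
padded = {"has_payload": True, "padding": 3}
broken = {"has_payload": False, "padding": 3}

for v in (seed, padded, broken):
    print(v, fitness(v, oracle, classifier_score))
```

In this toy setup only `padded` scores above zero: it keeps its payload while the classifier's score falls below the threshold.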
  19. Automated Evasion Approach (Based on Genetic Programming)

    [Diagram: Malicious PDF (01011001101) → Clone → Variants → Mutation (with Benign PDFs) → Variants → Select Variants (✓ ✓ ✗ ✓)]
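The clone, mutate, and select stages named on these slides fit together as a loop. The sketch below is a toy rendering under assumptions: variants are plain token lists rather than PDFs, and the token vocabulary, oracle, classifier, population size, and generation limit are all hypothetical choices made for illustration.

```python
import random

# Hypothetical vocabulary of benign-looking content.
BENIGN_TOKENS = ["font", "image", "text", "metadata"]

def mutate(variant, rng):
    """Insert, replace, or delete one token, drawing new tokens from benign content."""
    variant = list(variant)
    op = rng.choice(["insert", "replace", "delete"])
    if op == "insert" or not variant:
        variant.insert(rng.randrange(len(variant) + 1), rng.choice(BENIGN_TOKENS))
    elif op == "replace":
        variant[rng.randrange(len(variant))] = rng.choice(BENIGN_TOKENS)
    else:
        del variant[rng.randrange(len(variant))]
    return variant

def oracle(variant):
    """Toy oracle: the variant is still malicious if the payload survives."""
    return "payload" in variant

def classifier_score(variant):
    """Toy target classifier: share of tokens that do not look benign."""
    if not variant:
        return 0.0
    return sum(t not in BENIGN_TOKENS for t in variant) / len(variant)

def fitness(variant):
    """Positive only for variants that are still malicious and score below 0.5."""
    return 0.5 - classifier_score(variant) if oracle(variant) else -1.0

def evolve(seed, pop_size=20, generations=50, rng=None):
    """Return (evasive_variant, generation), or (None, generations) on failure."""
    rng = rng or random.Random(0)
    population = [list(seed) for _ in range(pop_size)]        # clone the seed
    for generation in range(generations):
        population = [mutate(v, rng) for v in population]     # mutation
        ranked = sorted(population, key=fitness, reverse=True)
        if fitness(ranked[0]) > 0:
            return ranked[0], generation                      # evasive variant found
        survivors = ranked[: pop_size // 2]                   # select variants
        population = [list(rng.choice(survivors)) for _ in range(pop_size)]
    return None, generations

best, generation = evolve(["payload", "exploit", "shellcode"])
print(best, generation)
```

The search uses the classifier only through its score, so it needs no knowledge of the model's internals, just repeated queries.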
  20. Evaded PDFrate with Adjusted Threshold

    [Plot legend: Original Malware Seeds; Evasive Variants; Evasive Variants with lower threshold]
  21. Results: Accumulated Evasion Rate

    Difficulty varies by seed:
    - Simple mutations often work.
    - Complex mutations are sometimes needed.
    Difficulty varies by target:
    - PDFrate: 6 days to evade all
    - Hidost: 2 days to evade all
  22. Cross-Evasion Effects

    PDF Malware Seeds → Automated Evasion (against Hidost) → Evasive PDF Malware (against Hidost)
    Tested against PDFrate: 387/500 Evasive (77.4%)
    Tested against Gmail’s classifier: 3/500 Evasive (0.6%)
    Gmail’s classifier is secure?
  23. Cross-Evasion Effects

    PDF Malware Seeds → Automated Evasion (against Hidost) → Evasive PDF Malware (against Hidost)
    Tested against PDFrate: 387/500 Evasive (77.4%)
    Tested against Gmail’s classifier: 3/500 Evasive (0.6%)
    Gmail’s classifier is secure? different.
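The cross-evasion percentages follow directly from the counts on the slide. The short check below recomputes them; attaching the 3/500 figure to Gmail's classifier is a reading of the slide layout, not something the transcript states outright.

```python
def evasion_rate(evasive, total):
    """Share of tested variants that the second classifier also failed to flag."""
    return evasive / total

# Counts from the slide: variants evolved against Hidost, then tested elsewhere.
results = {"PDFrate": 387, "Gmail's classifier": 3}
total = 500

for target, evasive in results.items():
    print(f"{target}: {evasive}/{total} evasive ({evasion_rate(evasive, total):.1%})")
```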