Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers

David Evans
February 24, 2016

Weilin Xu's talk at
Network and Distributed System Security Symposium (NDSS) 2016
24 February 2016
San Diego, California

For more details see: http://evademl.org/
and the full paper: http://evademl.org/docs/evademl.pdf

Transcript

  1. Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers

    Weilin Xu, David Evans, Yanjun Qi (University of Virginia)
  2. Machine Learning is Solving Our Problems

    Fake, Spam, IDS, Malware, Fake Accounts, …
  3. 3

  4. 4

  5. Machine Learning is Eating the World

    Data Scientist; Security Expert; ?
  6. Machine Learning is Eating the World

    Data Scientist; Security Expert; No! Security is different.
  7. Goal: Understand classifiers under attack. Results: Vulnerable to automated evasion.

    Security Tasks are Different: Adversary Adapts
  8. Building Machine Learning Classifiers

    Training (Supervised Learning): Labelled Training Data; Feature Extraction; Vectors; ML Algorithm; Trained Classifier
  9. Assumption: Training Data is Representative

    Training (Supervised Learning): Labelled Training Data; Feature Extraction; Vectors; ML Algorithm; Trained Classifier
    Deployment: Operational Data; Trained Classifier; Malicious / Benign
  10. Results: Evaded PDF Malware Classifiers

                                           PDFrate* [ACSAC’12]   Hidost [NDSS’13]
    Accuracy                               0.9976                0.9996
    False Negative Rate                    0.0000                0.0056
    False Negative Rate with Adversary     1.0000                1.0000

    * Mimicus [Oakland ’14], an open source reimplementation of PDFrate.
  11. Results: Evaded PDF Malware Classifiers

                                           PDFrate* [ACSAC’12]   Hidost [NDSS’13]
    Accuracy                               0.9976                0.9996
    False Negative Rate                    0.0000                0.0056
    False Negative Rate with Adversary     1.0000                1.0000

    Callout: Very robust against “strongest conceivable mimicry attack”.

    * Mimicus [Oakland ’14], an open source reimplementation of PDFrate.
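
The three rows of the results table follow from standard confusion-matrix definitions. As a minimal sketch (the labels and predictions below are made-up toy values, not the paper's data, and the function names are illustrative), accuracy and false negative rate can be computed like this, assuming label 1 means malicious and 0 means benign:

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def false_negative_rate(y_true, y_pred):
    """Fraction of truly malicious samples (label 1) predicted benign (label 0)."""
    preds_for_malicious = [p for t, p in zip(y_true, y_pred) if t == 1]
    return preds_for_malicious.count(0) / len(preds_for_malicious)


# Toy data (hypothetical): four malicious and four benign files.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
clean_pred = [1, 1, 1, 1, 0, 0, 0, 0]   # predictions on unmodified samples
evaded_pred = [0, 0, 0, 0, 0, 0, 0, 0]  # each malicious file swapped for an evasive variant

acc_clean = accuracy(y_true, clean_pred)
fnr_clean = false_negative_rate(y_true, clean_pred)
fnr_evaded = false_negative_rate(y_true, evaded_pred)
```

In this toy setup the false negative rate moves from 0.0 to 1.0 once every malicious sample is replaced by a variant the classifier labels benign, which is the kind of shift the last table row reports.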
  12. Automated Evasion Approach (Based on Genetic Programming)

    Diagram: Malicious PDF; Clone; Variants; Benign PDFs; Mutation; Variants; Select Variants (✓ ✓ ✗ ✓)
  13. Automated Evasion Approach (Based on Genetic Programming)

    Diagram as on slide 12, with an object tree for the PDF: /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);
    Modified Parser: "Extract Me If You Can: Abusing PDF Parsers in Malware Detectors", Curtis Carmony, et al.
  14. Automated Evasion Approach (Based on Genetic Programming)

    Diagram as on slide 12. Mutation detail: object tree (/Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);); From Benign; Insert / Replace / Delete; Variants
  15. Automated Evasion Approach (Based on Genetic Programming)

    Mutation detail as on slide 14, with values 128, 546, 0, 0
  16. Automated Evasion Approach (Based on Genetic Programming)

    Mutation detail as on slide 14, with values 128, 546, 0, 0
  17. Automated Evasion Approach (Based on Genetic Programming)

    Mutation detail as on slide 14, with values 128, 546, 0, 0, 128, 0
  18. Automated Evasion Approach (Based on Genetic Programming)

    Mutation detail as on slide 14, with values 128, 0
  19. Automated Evasion Approach (Based on Genetic Programming)

    Mutation detail as on slide 14, with values 128, 0
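
The mutation slides name three operations (insert, replace, delete) applied to the PDF's object tree, with new material taken from benign files. The following is only a rough sketch of that idea, not the authors' implementation: it models a PDF as a nested Python dict, and the function names, the `/Count`, `/Info` and `/Title` keys, and the choice of where an inserted node lands are all assumptions made for illustration.

```python
import copy
import random


def list_paths(tree, prefix=()):
    """Return the path (tuple of keys) to every node below the root."""
    paths = []
    for key, child in tree.items():
        path = prefix + (key,)
        paths.append(path)
        if isinstance(child, dict):
            paths.extend(list_paths(child, path))
    return paths


def get_node(tree, path):
    """Follow a path of keys down from the root and return the node found."""
    node = tree
    for key in path:
        node = node[key]
    return node


def mutate(variant, benign, rng):
    """Apply one random insert / replace / delete; the input is not modified."""
    child = copy.deepcopy(variant)
    paths = list_paths(child)
    if not paths:
        return child  # nothing left to mutate
    target = rng.choice(paths)
    parent = get_node(child, target[:-1])
    op = rng.choice(["insert", "replace", "delete"])
    if op == "delete":
        del parent[target[-1]]
        return child
    # Insert and replace both take a subtree from the benign file (assumed non-empty).
    donor_path = rng.choice(list_paths(benign))
    donor = copy.deepcopy(get_node(benign, donor_path))
    if op == "replace":
        parent[target[-1]] = donor
    else:
        # Insert as a sibling of the target, under the donor's own key
        # (this overwrites an existing sibling with the same key).
        parent[donor_path[-1]] = donor
    return child


# Hypothetical toy trees; real PDFs would come from a parser.
malicious = {"/Root": {"/Catalog": {"/Pages": {"/Count": 0},
                                    "/JavaScript": "eval('…');"}}}
benign = {"/Root": {"/Catalog": {"/Pages": {"/Count": 128}},
                    "/Info": {"/Title": "report"}}}

rng = random.Random(0)
variant = mutate(malicious, benign, rng)
```

A random mutation can remove the `/JavaScript` node and with it the malicious behavior, which is why the approach also needs a check that a variant is still malicious before it counts as evasive.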
  20. Automated Evasion Approach (Based on Genetic Programming)

    Diagram: Malicious PDF; Clone; Variants; Benign PDFs; Mutation; Variants; Select Variants (✓ ✓ ✗ ✓)
  21. Automated Evasion Approach (Based on Genetic Programming)

    Diagram as on slide 12. Selection detail: Variants; Oracle (Malicious?); Target Classifier f(x) (Score); Fitness Function; Fitness Score
  22. Automated Evasion Approach (Based on Genetic Programming)

    Selection detail as on slide 21, with the labels Malicious and Benign
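
The selection slides show a fitness function fed by two signals: an oracle that answers "Malicious?" and the target classifier's score f(x). One plausible way to combine them is sketched below. It is an illustration under stated assumptions, not the formula from the talk: it assumes the classifier returns a maliciousness score in [0, 1] with a decision threshold of 0.5, and `toy_oracle` and `toy_score` are hypothetical stand-ins for a real sandbox and a real classifier.

```python
LOW_FITNESS = float("-inf")  # assumption: penalty for variants that are no longer malicious


def make_fitness(oracle, classifier_score, threshold=0.5):
    """Build a fitness function from an oracle and a target classifier.

    oracle(variant) -> bool: True if the variant still behaves maliciously.
    classifier_score(variant) -> float: maliciousness score in [0, 1].
    """
    def fitness(variant):
        if not oracle(variant):
            return LOW_FITNESS
        # Positive fitness means the score fell below the threshold,
        # so the classifier would label the variant benign.
        return threshold - classifier_score(variant)
    return fitness


def is_evasive(fitness_value):
    """A variant is evasive if it is still malicious and scores below the threshold."""
    return fitness_value > 0


# Hypothetical stand-ins: a variant is a flat dict of PDF keys.
def toy_oracle(variant):
    return "/JavaScript" in variant


def toy_score(variant):
    return 1.0 if len(variant) < 3 else 0.2


fitness = make_fitness(toy_oracle, toy_score)
detected = fitness({"/JavaScript": "payload"})
evading = fitness({"/JavaScript": "payload", "/Font": 1, "/Pages": 2})
broken = fitness({"/Font": 1})
```

Giving the lowest possible fitness to variants the oracle rejects keeps the search from rewarding files that look benign only because the mutation destroyed the payload.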
  23. Automated Evasion Approach (Based on Genetic Programming)

    Diagram: Malicious PDF; Clone; Variants; Benign PDFs; Mutation; Variants; Select Variants (✓ ✓ ✗ ✓)
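
The overall loop in the diagram (clone the malicious seed, mutate, score, select, repeat) can be sketched as a small generic search. This is a simplified illustration, not the released EvadeML code: the population size, the generation limit, and the keep-the-fitter-half selection rule are assumptions, and the toy problem at the bottom (variants as sets of tokens, a score of 1 / size) is invented purely so the sketch runs on its own.

```python
import random


def evolve(seed, mutate, fitness, pop_size=20, generations=50, rng=None):
    """Search for a variant with positive fitness; return it, or None if none is found."""
    rng = rng or random.Random()
    population = [seed] * pop_size  # clone the seed
    for _ in range(generations):
        population = [mutate(v, rng) for v in population]
        ranked = sorted(population, key=fitness, reverse=True)
        if fitness(ranked[0]) > 0:
            return ranked[0]  # still malicious and classified as benign
        # Keep the fitter half, dropping variants the oracle rejected.
        survivors = [v for v in ranked[: pop_size // 2]
                     if fitness(v) != float("-inf")]
        if not survivors:
            survivors = [seed]  # restart from the original seed
        population = [survivors[i % len(survivors)] for i in range(pop_size)]
    return None


# Toy problem (hypothetical): a variant is a frozenset of tokens.
BENIGN_TOKENS = ["font", "page", "image", "metadata"]


def toy_mutate(variant, rng):
    if rng.random() < 0.5:
        return variant | {rng.choice(BENIGN_TOKENS)}   # insert a benign token
    if variant:
        return variant - {rng.choice(sorted(variant))}  # delete a random token
    return variant


def toy_oracle(variant):
    return "js_payload" in variant


def toy_score(variant):
    return 1.0 / len(variant)  # fewer benign tokens means a more suspicious file


def toy_fitness(variant):
    if not toy_oracle(variant):
        return float("-inf")
    return 0.5 - toy_score(variant)


found = evolve(frozenset({"js_payload"}), toy_mutate, toy_fitness,
               rng=random.Random(0))
```

In this toy problem a variant becomes evasive once it keeps the payload and carries at least two benign tokens. Against real classifiers the search is far more expensive, which is consistent with the later slide reporting days of search per target.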
  24. Results: Evaded PDFrate 100%

    Plot series: Original Malware Seeds
  25. Results: Evaded PDFrate 100%

    Plot series: Original Malware Seeds; Evasive Variants
  26. Evaded PDFrate with Adjusted Threshold

    Plot series: Original Malware Seeds; Evasive Variants; Evasive Variants with lower threshold
  27. Results: Evaded Hidost 100%

    Plot series: Original Malware Seeds
  28. Results: Evaded Hidost 100%

    Plot series: Original Malware Seeds; Evasive Variants
  29. Results: Accumulated Evasion Rate

    Difficulty varies by seed: simple mutations often work; complex mutations sometimes needed.
    Difficulty varies by target: PDFrate: 6 days to evade all; Hidost: 2 days to evade all.
  30. Cross-Evasion Effects

    PDF Malware Seeds; Automated Evasion (against Hidost); Hidost-Evasive PDF Malware
    PDFrate: 387/500 Evasive (77.4%)
    3/500 Evasive (0.6%): Gmail’s classifier is secure?
  31. Cross-Evasion Effects

    PDF Malware Seeds; Automated Evasion (against Hidost); Hidost-Evasive PDF Malware
    PDFrate: 387/500 Evasive (77.4%)
    3/500 Evasive (0.6%): Gmail’s classifier is secure? Different.
  32. Evading Gmail’s Classifier

    Evasion rate: 135/380 (35.5%)
  33. Evading Gmail’s Classifier

    Evasion rate: 179/380 (47.1%)

  34. Conclusion

    Who will win this arms race?
    Source Code: http://EvadeML.org