
Classifiers Under Attack

David Evans
February 01, 2017

Talk at USENIX Enigma 2017
1 February 2017
Oakland, CA

https://evadeML.org

Transcript

  1. Classifiers Under Attack. David Evans, with Weilin Xu and Yanjun Qi, University of Virginia. evadeML.org, 1 February 2017.
  2. Training (supervised learning): Labelled Training Data (Malicious / Benign) → Feature Extraction → Vectors → ML Algorithm → Trained Classifier. Deployment: Operational Data → Trained Classifier → Malicious / Benign. Assumption: training data is representative.
  3. Focus: Evasion Attacks. Goal: automatically simulate an adaptive adversary against a generic classifier. Purpose: understand classifier robustness and build better classifiers (or give up).
  4. [Chart: vulnerabilities reported in Adobe Acrobat Reader per year, 2006 to 2017.] 33 already in Jan 2017! Source: http://www.cvedetails.com/vulnerability-list.php?vendor_id=53&product_id=921
  5. PDF Malware Classifiers

        Classifier              Algorithm                Accuracy
        PDFrate* [ACSA 2012]    Random Forest            0.9976
        Hidost16 [JIS 2016]     Random Forest            0.9996
        Hidost13 [NDSS 2013]    Support Vector Machine   0.9996

    * Mimicus [Oakland 2014], an open source reimplementation of PDFrate.
  6. Random Forest. [Diagram: several decision trees splitting on features x, y, z, w, r, q.] Generate many random decision trees, train them independently, select the best trees, and vote on the result.
  7. PDF Malware Classifiers
        PDFrate [ACSA 2012]: Random Forest; manual features (object counts, lengths, positions, …)
        Hidost16 [JIS 2016]: Random Forest; automated features (object structural paths)
        Hidost13 [NDSS 2013]: Support Vector Machine; automated features (object structural paths)
    Very robust against the "strongest conceivable mimicry attack".
  8. Automated Classifier Evasion Using Genetic Programming. [Diagram: Malicious PDF → Mutation (using material cloned from benign PDFs) → Variants → Select Variants → Found Evasive? (✓ ✓ ✗ ✓)]
  9. Goal: Find Evasive Variant. The simulated attacker's goal is to find a sample that is classified as benign but still exhibits malicious behavior.
  10. Start with Malicious Seed: the malicious PDF is the starting point of the evasion pipeline.
  11. Modified Parser: a "robust" version of pdfrw parses the malicious PDF into an object tree (/Root, /Catalog, /Pages, and a /JavaScript node containing eval('…');).
  12. Generating Variants.
  13. Generating Variants: select a random node in the parsed PDF tree.
  14. Generating Variants: select a random node, then apply a random transform: delete, insert, or replace.
  15. Generating Variants: select a random node and apply a random transform (delete, insert, or replace); inserted and replacement nodes are taken from benign PDFs.
  16. Selecting Promising Variants.
  17. Selecting Promising Variants: each candidate variant is evaluated by the oracle (does it still behave maliciously?) and by the target classifier (malicious/benign score); a fitness function combines the two results into a score.
  18. Oracle: execute the candidate in a vulnerable Adobe Reader inside a virtual environment (Cuckoo, https://github.com/cuckoosandbox) with a simulated network (INetSim). Behavioral signature: HTTP_URL + HOST extracted from API traces; the candidate is malicious if the signature matches. Advantage: we know the target malware behavior.
  19. Selecting Promising Variants (the diagram from slide 17, shown again).
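  20. Fitness Function

        $$
        \mathrm{fitness}(v) =
        \begin{cases}
          0.5 - \mathrm{classifier\_score}(v) & \text{if } \mathrm{oracle}(v) = \texttt{malicious} \\
          -\infty & \text{otherwise}
        \end{cases}
        $$

    A classifier_score ≥ 0.5 is labeled malicious, so a positive fitness means the variant keeps its malicious behavior while being classified as benign. The function assumes that lost malicious behavior will not be recovered.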
  21. Classifier Performance

                                                  PDFrate   Hidost
        Accuracy                                  0.9976    0.9996
        False Negative Rate                       0.0000    0.0056
        False Negative Rate against Adversary     1.0000    1.0000
  22. [Chart: seeds evaded (out of 500) vs. number of mutations, PDFrate and Hidost.]
  23. [Chart: seeds evaded (out of 500) vs. number of mutations, PDFrate and Hidost.] Simple transformations often worked.
  24. [Chart: seeds evaded (out of 500) vs. number of mutations, PDFrate.] The mutation (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/) works on 162/500 seeds.
  25. [Chart: seeds evaded (out of 500) vs. number of mutations, PDFrate and Hidost.] Some seeds required complex transformations.
  26. 85-step mutation trace evading Hidost, effective for 198/500 seeds:
        Insert: Threads, ViewerPreferences/Direction, Metadata, Metadata/Length, Metadata/Subtype, Metadata/Type, OpenAction/Contents, OpenAction/Contents/Filter, OpenAction/Contents/Length, Pages/MediaBox
        Delete: AcroForm, Names/JavaScript/Names/S, AcroForm/DR/Encoding/PDFDocEncoding, AcroForm/DR/Encoding/PDFDocEncoding/Differences, AcroForm/DR/Encoding/PDFDocEncoding/Type, Pages/Rotate, AcroForm/Fields, AcroForm/DA, Outlines/Type, Outlines, Outlines/Count, Pages/Resources/ProcSet, Pages/Resources
  27. Execution Cost. [Chart: hours to find all 500 variants on one desktop PC, for Hidost and PDFrate, broken down into Oracle, Mutation, and Classifier time.]
  28. Possible Defense: Adjust Threshold. Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.
  29. Retraining Classifier. Training (supervised learning): Labelled Training Data (Malicious / Benign) → Feature Extraction → Vectors → ML Algorithm → Trained Classifier. Deployment: Operational Data → Trained Classifier. The classifier is then retrained.
  30. [Chart: seeds evaded (out of 500) vs. generations, Hidost16.] Original classifier: takes 614 generations to evade all seeds.
  31. [Chart: seeds evaded (out of 500) vs. generations, Hidost16 and HidostR1.]
  32. [Chart: seeds evaded (out of 500) vs. generations, Hidost16, HidostR1, and HidostR2.]
  33. [Chart: seeds evaded (out of 500) vs. generations, Hidost16, HidostR1, and HidostR2 (same as slide 32).]
  34. [Chart: seeds evaded (out of 500) vs. generations, Hidost16, HidostR1, and HidostR2.] False Positive Rates:

                      Genome   Contagio Benign
        Hidost16      0.00     0.00
        HidostR1      0.78     0.30
        HidostR2      0.85     0.53
  35. [Same chart and false positive rates as slide 34.]
  36. [Chart: seeds evaded (out of 500) vs. generations.] Retrained using evasive variants and all benign samples available to the adversary: .11 evasion rate, .07 false positive rate.
  37. Hide Classifier: "Security Through Obscurity". [The evasion pipeline diagram from slide 17, with the target classifier hidden.]
  38. Cross-Evasion Effects: PDF malware seeds → automated evasion against PDFrate → evasive PDF malware (against PDFrate), tested on Hidost13: 2/500 evasive (0.4% success). Potentially good news?
  39. Cross-Evasion Effects: evasive PDF malware (against PDFrate) tested on Hidost13: 2/500 evasive (0.4% success). Evasive PDF malware (against Hidost) tested on PDFrate: 387/500 evasive (77.4% success).
  40. Cross-Evasion Effects: PDF malware seeds → automated evasion against Hidost13 → evasive PDF malware (against Hidost), tested on PDFrate: 387/500 evasive (77.4% success).
  41. Cross-Evasion Effects: PDF malware seeds → automated evasion against Hidost13 → evasive PDF malware (against Hidost): 6/500 evasive (0.6% success).
  42. Evading Gmail's Classifier. Evasion rate on Gmail: 179/380 (47.1%) for:

        for javascript in pdf.all_js:
            javascript.append_code("var enigma=1;")
        if pdf.get_size() < 7050000:
            pdf.add_padding(7050000 - pdf.get_size())
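Slide 2's pipeline (feature extraction, supervised training, deployment) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the keyword-count features and the nearest-centroid learner are hypothetical stand-ins chosen for brevity, not the features or algorithms of PDFrate or Hidost.

```python
from collections import Counter

# Hypothetical features: counts of a few PDF keywords. This is an illustrative
# assumption, not the feature set of any real PDF malware classifier.
KEYWORDS = ["/JavaScript", "/OpenAction", "/Page", "/Font"]


def extract_features(doc: str) -> list:
    """Feature extraction: turn a raw sample into a numeric vector."""
    return [float(doc.count(k)) for k in KEYWORDS]


def train(samples):
    """Supervised learning: average the vectors of each label (nearest centroid)."""
    sums, counts = {}, Counter()
    for doc, label in samples:
        vec = extract_features(doc)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(model, doc: str) -> str:
    """Deployment: label operational data by its closest class centroid."""
    vec = extract_features(doc)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(model, key=lambda label: sq_dist(model[label]))
```

The trained model only knows what its labelled samples show, which is why the slide's "training data is representative" assumption matters: an adversary who produces samples unlike anything in the training set is outside what the model has learned.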
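Slide 6's random forest idea can be sketched as a toy forest whose trees are single-split stumps: each tree is trained independently on a bootstrap sample using one randomly chosen feature, and the trees vote. This is a simplification for illustration; real random forests grow much deeper trees, and this sketch does not implement the "select best trees" step on the slide.

```python
import random
from collections import Counter


def train_stump(data, feature):
    """Fit a one-split decision tree ("stump") on a single feature, labels 0/1."""
    best = None
    for x, _ in data:
        t = x[feature]
        for polarity in (0, 1):
            # Predict `polarity` when x[feature] >= t, otherwise the other label.
            errors = sum((polarity if row[feature] >= t else 1 - polarity) != y
                         for row, y in data)
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    _, t, polarity = best
    return lambda x: polarity if x[feature] >= t else 1 - polarity


def train_forest(data, n_trees=25, seed=0):
    """Train each tree independently on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]   # bootstrap sample
        feature = rng.randrange(n_features)         # random feature choice
        forest.append(train_stump(sample, feature))
    return forest


def predict(forest, x):
    """Majority vote over all trees."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]
```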
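The "object structural paths" features on slide 7 can be illustrated by walking a parsed PDF, represented here as a hypothetical nested Python dict, and recording every root-to-node name path. Treating each path as a binary present/absent feature is an assumption made for this sketch, not a description of Hidost's exact feature extraction.

```python
def structural_paths(node, prefix=""):
    """Yield the name path of every dictionary entry in a nested PDF-like tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            path = f"{prefix}/{key}"
            yield path
            yield from structural_paths(child, path)
    elif isinstance(node, list):
        # Array elements share their parent's path (indices are dropped).
        for child in node:
            yield from structural_paths(child, prefix)


def path_vector(doc, vocabulary):
    """Binary feature vector: 1 if a vocabulary path occurs in the document."""
    present = set(structural_paths(doc))
    return [1 if path in present else 0 for path in vocabulary]
```

Unlike hand-picked counts and lengths, these features come straight from the document structure, which is what the slide means by automated features.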
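The genetic programming loop of slides 8 to 17 (mutate, score, select, check for an evasive variant) can be sketched generically. The population size, generation budget, and "keep the better half" selection rule are illustrative assumptions; `mutate` and `fitness` are hypothetical caller-supplied functions, and positive fitness is taken to mean "evasive", matching the fitness function on slide 20.

```python
import random


def evolve(seed, mutate, fitness, population_size=48, generations=20, rng=None):
    """Evolve variants of `seed` until one has positive fitness (evasive).

    `mutate(variant, rng)` and `fitness(variant)` are supplied by the caller.
    Returns the first evasive variant found, or None if the budget runs out.
    """
    rng = rng or random.Random(0)
    population = [seed] * population_size
    for _ in range(generations):
        variants = [mutate(p, rng) for p in population]       # mutation
        ranked = sorted(variants, key=fitness, reverse=True)   # score
        if fitness(ranked[0]) > 0:                             # found evasive?
            return ranked[0]
        survivors = ranked[: population_size // 2]             # select variants
        population = survivors * 2
    return None
```

A toy run, where a "variant" is an integer and anything above 5 counts as evasive: `evolve(0, lambda x, r: x + r.choice([-1, 1]), lambda x: x - 5)`.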
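Slides 13 to 15 describe variant generation: select a random node, then delete it, insert a node, or replace it with a node taken from a benign PDF. The sketch below works on PDFs represented as nested dicts, which is an assumption (the talk uses a modified pdfrw parser). The "insert" semantics are also assumed here: the donor node is added beside the selected node under the donor's own key.

```python
import copy
import random


def node_paths(tree, prefix=()):
    """Return key paths to every entry of a nested dict (the mutable nodes)."""
    paths = []
    if isinstance(tree, dict):
        for key, child in tree.items():
            path = prefix + (key,)
            paths.append(path)
            paths.extend(node_paths(child, path))
    return paths


def parent_of(tree, path):
    """Follow all but the last key of `path` and return that container."""
    for key in path[:-1]:
        tree = tree[key]
    return tree


def mutate(pdf, benign_pdfs, rng):
    """Return (variant, (operation, path)); `pdf` itself is left untouched."""
    variant = copy.deepcopy(pdf)
    path = rng.choice(node_paths(variant))             # select random node
    parent, key = parent_of(variant, path), path[-1]
    op = rng.choice(["delete", "insert", "replace"])   # random transform
    if op == "delete":
        del parent[key]
    else:
        donor = rng.choice(benign_pdfs)                # node from a benign PDF
        donor_path = rng.choice(node_paths(donor))
        node = copy.deepcopy(parent_of(donor, donor_path)[donor_path[-1]])
        if op == "replace":
            parent[key] = node
        else:
            # Assumed "insert": add the donor node next to the selected one.
            parent[donor_path[-1]] = node
    return variant, (op, path)
```

Returning the `(operation, path)` pair alongside the variant makes it possible to record mutation traces like the ones on slides 24 and 26.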
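The oracle on slide 18 labels a variant malicious if its behavioral signature (HTTP_URL + HOST extracted from API traces) matches the seed's. A sketch of just the matching step follows, assuming traces are lists of dicts with `HOST` and `HTTP_URL` keys; that format is hypothetical and is not Cuckoo's report schema, and actually running the sample in Cuckoo with INetSim is out of scope.

```python
def network_signature(api_trace):
    """Set of (HOST, HTTP_URL) pairs seen in a (hypothetical) sandbox API trace."""
    return {(e["HOST"], e["HTTP_URL"]) for e in api_trace
            if "HOST" in e and "HTTP_URL" in e}


def oracle(seed_trace, variant_trace):
    """'malicious' iff the variant reproduces all of the seed's network requests."""
    expected = network_signature(seed_trace)
    observed = network_signature(variant_trace)
    return "malicious" if expected and expected <= observed else "benign"
```

Because the seed's behavior is known in advance, the oracle only needs to check that the same requests still happen, which is the advantage the slide points out.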
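Slide 20's fitness function translates directly into Python. This sketch assumes the oracle verdict arrives as a string and the classifier score is a float in [0, 1]:

```python
import math

THRESHOLD = 0.5  # classifier_score >= 0.5 is labelled malicious


def fitness(oracle_verdict: str, classifier_score: float) -> float:
    """Positive result: malicious behaviour kept, yet classified as benign."""
    if oracle_verdict != "malicious":
        # Lost malicious behaviour is assumed never to be recovered.
        return -math.inf
    return THRESHOLD - classifier_score
```

A score of exactly 0.5 gives fitness 0 and is still labelled malicious, so only strictly positive values count as evasive.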
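The defense cited on slide 28 uses disagreement among the members of an ensemble to identify likely evasion. A sketch of that idea: label a sample by the fraction of trees voting malicious, and flag split votes as uncertain instead of forcing a label. The 0.25 and 0.75 cut-offs are hypothetical values chosen for illustration, not taken from the cited paper.

```python
def classify_with_agreement(votes, low=0.25, high=0.75):
    """Label from per-tree votes (1 = malicious); split votes become 'uncertain'.

    The `low`/`high` cut-offs are illustrative assumptions.
    """
    fraction = sum(votes) / len(votes)
    if fraction >= high:
        return "malicious"
    if fraction <= low:
        return "benign"
    return "uncertain"
```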
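Slides 29 to 36 retrain the classifier on evasive variants and rerun the attack. The loop can be sketched as below; `train` and `evade` are hypothetical caller-supplied functions, and reading HidostR1 and HidostR2 as the first and second retraining rounds is an interpretation of the slides rather than something they state.

```python
def retrain_loop(train, evade, training_data, seeds, rounds=2):
    """Adversarial retraining: add evasive variants (labelled malicious), retrain.

    `train(data)` returns a classifier; `evade(classifier, seed)` returns an
    evasive variant or None. Returns the classifier from every round.
    """
    classifier = train(training_data)
    history = [classifier]
    for _ in range(rounds):
        evasive = []
        for seed in seeds:
            variant = evade(classifier, seed)
            if variant is not None:
                evasive.append(variant)
        training_data = training_data + [(v, "malicious") for v in evasive]
        classifier = train(training_data)
        history.append(classifier)
    return history
```

Each round makes the attacker's job harder, but as slide 34 shows, the retrained classifiers can pay for it with much higher false positive rates.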
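The Gmail transformation on slide 42 calls methods on a `pdf` object whose library the slide does not name. To make it runnable, the sketch below defines hypothetical `JavaScript` and `ToyPDF` stand-ins whose size is simply the JavaScript length plus padding. Only the two transformations come from the slide; the unit of 7050000 is whatever `get_size()` returns.

```python
from dataclasses import dataclass

TARGET_SIZE = 7050000  # threshold from the slide; unit depends on get_size()


@dataclass
class JavaScript:
    """Hypothetical stand-in for one embedded script."""
    code: str

    def append_code(self, extra: str) -> None:
        self.code += extra


@dataclass
class ToyPDF:
    """Hypothetical stand-in for the slide's `pdf` object."""
    all_js: list
    padding: int = 0

    def get_size(self) -> int:
        return sum(len(js.code) for js in self.all_js) + self.padding

    def add_padding(self, n: int) -> None:
        self.padding += n


def apply_gmail_evasion(pdf):
    # Append a harmless statement to every script.
    for javascript in pdf.all_js:
        javascript.append_code("var enigma=1;")
    # Pad the file up to the target size.
    if pdf.get_size() < TARGET_SIZE:
        pdf.add_padding(TARGET_SIZE - pdf.get_size())
    return pdf
```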