Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers

David Evans
February 24, 2016

Weilin Xu's talk at the Network and Distributed System Security Symposium (NDSS) 2016
24 February 2016
San Diego, California

For more details see: http://evademl.org/
and the full paper: http://evademl.org/docs/evademl.pdf

Transcript

  1. Automatically Evading Classifiers
    A Case Study on PDF Malware Classifiers
    Weilin Xu, David Evans, Yanjun Qi
    University of Virginia

  2. Machine Learning is Solving Our Problems
    Spam, IDS, Malware, Fake Accounts

  3. Machine Learning is Eating the World
    Data Scientist
    Security Expert
    ?

  4. Machine Learning is Eating the World
    Data Scientist
    Security Expert
    No! Security is different.

  5. Security Tasks are Different: Adversary Adapts
    Goal: Understand classifiers under attack.
    Results: Vulnerable to automated evasion.

  6. Building Machine Learning Classifiers
    Training (Supervised Learning):
    Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier

  7. Assumption: Training Data is Representative
    Training (Supervised Learning):
    Labelled Training Data → Feature Extraction → Vectors → ML Algorithm → Trained Classifier
    Deployment:
    Operational Data → Trained Classifier → Malicious / Benign
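
The training and deployment pipeline on this slide can be sketched in a few lines. This is a toy illustration only: the keyword list, the sample documents, and the nearest-centroid learner are assumptions made for the sketch, not the features or algorithms used by PDFrate or Hidost.

```python
from statistics import mean

# Hypothetical structural keywords; real classifiers use far richer features.
KEYWORDS = ["/JavaScript", "/OpenAction", "/Pages", "/Font"]

def extract_features(document):
    """Feature extraction: map a raw document to a numeric vector."""
    return [float(document.count(k)) for k in KEYWORDS]

def train(samples):
    """Supervised training: compute one centroid vector per label."""
    by_label = {}
    for document, label in samples:
        by_label.setdefault(label, []).append(extract_features(document))
    return {label: [mean(column) for column in zip(*vectors)]
            for label, vectors in by_label.items()}

def classify(model, document):
    """Deployment: label operational data by its nearest centroid."""
    x = extract_features(document)
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

# Made-up labelled training data.
training_data = [
    ("/Pages /Font /Font", "benign"),
    ("/Pages /Font", "benign"),
    ("/JavaScript /OpenAction /Pages", "malicious"),
    ("/JavaScript /JavaScript /OpenAction", "malicious"),
]
model = train(training_data)
print(classify(model, "/OpenAction /JavaScript"))
```

The model only works as intended while operational inputs resemble the training data, which is the assumption the slide calls out.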

  8. Results: Evaded PDF Malware Classifiers
                                        PDFrate* [ACSAC’12]   Hidost [NDSS’13]
    Accuracy                            0.9976                0.9996
    False Negative Rate                 0.0000                0.0056
    False Negative Rate with Adversary  1.0000                1.0000
    * Mimicus [Oakland ’14], an open source reimplementation of PDFrate.
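
For reference, the two metrics in the table follow from confusion-matrix counts as sketched below. The counts used in the example are invented for illustration and are not the paper's data.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def false_negative_rate(tp, fn):
    """Fraction of malicious samples that are labelled benign."""
    return fn / (tp + fn)

# Illustrative counts only: if every malicious sample is missed,
# the false negative rate is 1.0 regardless of accuracy on benign files.
print(false_negative_rate(tp=0, fn=500))
print(accuracy(tp=90, tn=95, fp=5, fn=10))
```

A false negative rate of 1.0000 "with adversary" therefore means that every malicious sample in the experiment was classified as benign.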

  9. Results: Evaded PDF Malware Classifiers
                                        PDFrate* [ACSAC’12]   Hidost [NDSS’13]
    Accuracy                            0.9976                0.9996
    False Negative Rate                 0.0000                0.0056
    False Negative Rate with Adversary  1.0000                1.0000
    Very robust against “strongest conceivable mimicry attack”.
    * Mimicus [Oakland ’14], an open source reimplementation of PDFrate.

  10. Automated Evasion Approach
    Based on Genetic Programming
    Malicious PDF → Clone → Variants → Mutation (with Benign PDFs) → Variants → Select → Variants
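
The clone, mutate, select loop on this slide can be sketched as a small evolutionary search. Everything concrete here is an assumption for illustration: samples are modelled as token lists, `BENIGN_TOKENS` stands in for content taken from benign PDFs, and `fitness` is a toy stand-in for the oracle and target classifier. It is not the EvadeML implementation.

```python
import random

BENIGN_TOKENS = ["font", "page", "image"]  # hypothetical benign content

def mutate(variant, rng):
    """Apply one random insert, replace, or delete to a copy of the variant."""
    v = list(variant)
    op = rng.choice(["insert", "replace", "delete"])
    if op == "insert" or not v:
        v.insert(rng.randrange(len(v) + 1), rng.choice(BENIGN_TOKENS))
    elif op == "replace":
        v[rng.randrange(len(v))] = rng.choice(BENIGN_TOKENS)
    else:
        del v[rng.randrange(len(v))]
    return v

def fitness(variant):
    """Positive only if the payload survives and the toy detection score is low."""
    if "payload" not in variant:
        return -1.0  # malicious behavior lost: useless to the attacker
    return 0.25 - variant.count("js") / len(variant)

def evolve(seed, pop_size=20, generations=50, rng=None):
    """Return the first variant with positive fitness, or None if none is found."""
    rng = rng or random.Random(0)
    population = [list(seed) for _ in range(pop_size)]          # Clone
    for _ in range(generations):
        variants = [mutate(v, rng) for v in population]          # Mutation
        variants.sort(key=fitness, reverse=True)
        if fitness(variants[0]) > 0:
            return variants[0]
        survivors = variants[: pop_size // 2]                    # Select
        population = [list(rng.choice(survivors)) for _ in range(pop_size)]
    return None

evasive = evolve(["payload", "js", "js", "js"])
print(evasive)
```

The search needs no knowledge of the classifier internals; it only needs a score for each variant.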

  11. Automated Evasion Approach
    Based on Genetic Programming
    [Evasion loop diagram, as on slide 10]
    PDF structure: /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);
    Modified Parser
    Extract Me If You Can: Abusing PDF Parsers in Malware Detectors
    Curtis Carmony, et al.

  12. Automated Evasion Approach
    Based on Genetic Programming
    [Evasion loop diagram, as on slide 10]
    PDF structure: /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);
    Mutation: Insert / Replace / Delete (From Benign)
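
The three mutation operators named on this slide can be sketched on a tree of PDF objects. The nested-dictionary representation, the example trees, and the function names are assumptions made for this sketch; the actual tool works on parsed PDF files.

```python
import copy
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of keys) to every node below the root."""
    for key, child in tree.items():
        yield prefix + (key,)
        if isinstance(child, dict):
            yield from paths(child, prefix + (key,))

def get(tree, path):
    """Follow a path of keys down from the root."""
    for key in path:
        tree = tree[key]
    return tree

def mutate(variant, benign, rng, op=None):
    """Return a mutated deep copy of `variant`; the input is left untouched."""
    out = copy.deepcopy(variant)
    target = rng.choice(list(paths(out)))          # node to act on
    parent = get(out, target[:-1])
    donor_path = rng.choice(list(paths(benign)))   # subtree taken from benign
    donor = copy.deepcopy(get(benign, donor_path))
    op = op or rng.choice(["insert", "replace", "delete"])
    if op == "delete":
        del parent[target[-1]]
    elif op == "replace":
        parent[target[-1]] = donor
    else:
        # Insert the donor next to the target under the donor's own name
        # (this overwrites a sibling that already has that name).
        parent[donor_path[-1]] = donor
    return out

# Hypothetical object trees; the script body is elided as on the slide.
malicious = {"/Root": {"/Catalog": {"/Pages": {}, "/JavaScript": "eval(...);"}}}
benign = {"/Root": {"/Catalog": {"/Pages": {"/Count": 1}, "/Metadata": "xml"}}}

rng = random.Random(1)
print(mutate(malicious, benign, rng))
```

Working on a deep copy keeps the seed intact, so many variants can be derived from the same parent.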

  13. Automated Evasion Approach
    Based on Genetic Programming
    [Evasion loop diagram, as on slide 10]
    PDF structure: /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);
    Mutation: Insert / Replace / Delete (From Benign)
    Values shown in the animation: 128, 546, 0, 0

  14. Automated Evasion Approach
    Based on Genetic Programming
    [Animation step; content as on slide 13]

  15. Automated Evasion Approach
    Based on Genetic Programming
    [Evasion loop diagram, as on slide 10]
    PDF structure: /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);
    Mutation: Insert / Replace / Delete (From Benign)
    Values shown in the animation: 128, 546, 0, 0, 128, 0

  16. Automated Evasion Approach
    Based on Genetic Programming
    [Evasion loop diagram, as on slide 10]
    PDF structure: /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);
    Mutation: Insert / Replace / Delete (From Benign)
    Values shown in the animation: 128, 0

  17. Automated Evasion Approach
    Based on Genetic Programming
    [Animation step; content as on slide 16]

  18. Automated Evasion Approach
    Based on Genetic Programming
    [Evasion loop diagram, as on slide 10]

  19. Automated Evasion Approach
    Based on Genetic Programming
    [Evasion loop diagram, as on slide 10]
    Select: Variants → Oracle (Malicious?) + Target Classifier (Score) → Fitness Function f(x) → Fitness Score

  20. Automated Evasion Approach
    Based on Genetic Programming
    [Evasion loop diagram, as on slide 10]
    Select: Variants → Oracle (Malicious?) + Target Classifier (Score) → Fitness Function f(x) → Fitness Score
    Labels added: Malicious, Benign
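
One plausible way to combine the oracle verdict and the target classifier's score into a fitness score is sketched below. The threshold of 0.5, the `LOW_SCORE` penalty, and the toy oracle and classifier are assumptions for illustration, not values taken from the slides.

```python
LOW_SCORE = -1.0  # assumed penalty for variants that lose malicious behavior

def make_fitness(oracle, classifier, threshold=0.5):
    """Build f(x) from an oracle (still malicious?) and a classifier score."""
    def fitness(variant):
        if not oracle(variant):
            return LOW_SCORE                    # no longer malicious
        return threshold - classifier(variant)  # > 0 means classified benign
    return fitness

# Toy stand-ins: the oracle looks for a payload token, and the classifier
# scores the fraction of suspicious "js" tokens in the sample.
def toy_oracle(variant):
    return "payload" in variant

def toy_classifier(variant):
    return variant.count("js") / max(len(variant), 1)

fitness = make_fitness(toy_oracle, toy_classifier)
print(fitness(["payload", "font", "font", "font"]))
print(fitness(["font", "font"]))
```

Under these assumptions a variant counts as evasive when its fitness is positive: the oracle still observes malicious behavior while the classifier score falls below the threshold.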

  21. Automated Evasion Approach
    Based on Genetic Programming
    [Evasion loop diagram, as on slide 10]

  22. Results: Evaded PDFrate 100%
    Shown: Original Malware Seeds

  23. Results: Evaded PDFrate 100%
    Shown: Original Malware Seeds; Evasive Variants

  24. Evaded PDFrate with Adjusted Threshold
    Shown: Original Malware Seeds; Evasive Variants; Evasive Variants with lower threshold

  25. Results: Evaded Hidost 100%
    Shown: Original Malware Seeds

  26. Results: Evaded Hidost 100%
    Shown: Original Malware Seeds; Evasive Variants

  27. Results: Accumulated Evasion Rate
    Difficulty varies by seed:
    Simple mutations often work.
    Complex mutations sometimes needed.
    Difficulty varies by target:
    PDFrate: 6 days to evade all
    Hidost: 2 days to evade all

  28. Cross-Evasion Effects
    PDF Malware Seeds → Automated Evasion (Hidost) → Evasive PDF Malware (against Hidost)
    PDFrate: 387/500 Evasive (77.4%)
    3/500 Evasive (0.6%)
    Gmail’s classifier is secure?

  29. Cross-Evasion Effects
    PDF Malware Seeds → Automated Evasion (Hidost) → Evasive PDF Malware (against Hidost)
    PDFrate: 387/500 Evasive (77.4%)
    3/500 Evasive (0.6%)
    Gmail’s classifier is different.
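
Measuring cross-evasion amounts to replaying variants found against one classifier on a second classifier and counting how many it fails to flag. The sketch below assumes each classifier is a function returning True when it flags a sample; the sample set and the second classifier are placeholders, chosen only so that the count reproduces the 387/500 figure on the slide.

```python
def cross_evasion(variants, is_flagged):
    """Count variants that a second classifier fails to flag.

    Returns (evasive_count, total, rate).
    """
    evasive = sum(1 for v in variants if not is_flagged(v))
    return evasive, len(variants), evasive / len(variants)

# Placeholder data: 500 numbered variants and a second classifier that
# happens to flag only the last 113 of them.
variants = list(range(500))
second_classifier = lambda v: v >= 387

count, total, rate = cross_evasion(variants, second_classifier)
print(f"{count}/{total} Evasive ({rate:.1%})")
```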

  30. Evading Gmail’s Classifier
    Evasion rate on Gmail: 135/380 (35.5%)

  31. Evading Gmail’s Classifier
    Evasion rate on Gmail: 179/380 (47.1%)

  32. Conclusion
    Who will win this arms race?
    Source Code: http://EvadeML.org