Adversarial Machine Learning: Are We Playing the Wrong Game?

David Evans
CISPA Distinguished Lecture
Center for IT-Security, Privacy and Accountability, Universität des Saarlandes
10 July 2017

https://privacy-sfb.cispa.saarland/blog/distinguished-lecture-adversarial-machine-learning-are-we-playing-the-wrong-game/

Transcript

  1. Adversarial
    Machine Learning:
    Are We Playing
    the Wrong Game?
    David Evans
    University of Virginia
    work mostly with
    Weilin Xu and
    Yanjun Qi
    evadeML.org
    Center for IT-Security,
    Privacy and Accountability,
    Universität des Saarlandes
    10 July 2017


  2. Machine Learning Does Amazing Things
    1


  3. 2


  4. … and can solve all Security
    Problems!
    Fake
    Spam
    IDS
    Malware
    Fake Accounts

    “Fake News”


  5. Training (supervised learning)
    [Diagram: labelled training data → feature extraction → vectors → ML algorithm → trained classifier. Deployment: operational data → feature extraction → trained classifier → malicious / benign]


  6. Training (supervised learning)
    [Diagram: labelled training data → feature extraction → vectors → ML algorithm → trained classifier. Deployment: operational data → feature extraction → trained classifier → malicious / benign]
    Assumption: Training Data is Representative
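    A minimal sketch of this training/deployment pipeline, assuming scikit-learn and pre-extracted feature vectors (the model choice and parameters here are illustrative, not the classifiers discussed later in the talk):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def train_classifier(feature_vectors, labels):
        # Training phase: labelled vectors -> ML algorithm -> trained classifier
        X_train, X_test, y_train, y_test = train_test_split(
            feature_vectors, labels, test_size=0.2, random_state=0)
        model = RandomForestClassifier(n_estimators=100)
        model.fit(X_train, y_train)
        print("held-out accuracy:", model.score(X_test, y_test))
        return model

    def deploy(model, operational_vector):
        # Deployment phase: operational data -> features -> malicious / benign
        return "malicious" if model.predict([operational_vector])[0] == 1 else "benign"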


  7. Adversaries Don’t Cooperate
    Assumption: Training Data is Representative
    Deployment
    Training


  8. Deployment
    Adversaries Don’t Cooperate
    Assumption: Training Data is Representative
    Training
    Poisoning


  9. Adversaries Don’t Cooperate
    Assumption: Training Data is Representative
    Evading
    Deployment
    Training


  10. Focus: Evasion Attacks
    Goals: Understand classifier robustness
    Build better classifiers (or give up)


  11. Adversarial Examples
    10
    [Image: panda + 0.007 × perturbation = image classified as “gibbon”]
    Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy.
    Explaining and Harnessing Adversarial Examples. ICLR 2015.


  12. Goal of Machine Learning Classifier
    11
    Metric Space 1: Target Classifier Metric Space 2: “Oracle”
    Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)


  13. Well-Trained Classifier
    12
    Metric Space 1: Target Classifier Metric Space 2: “Oracle”
    Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)


  14. Adversarial Examples
    13
    Metric Space 1: Target Classifier Metric Space 2: “Oracle”
    Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)


  15. Adversarial Examples
    14
    Metric Space 1: Target Classifier Metric Space 2: “Oracle”
    Adversary’s goal: find a small perturbation that changes class for classifier, but imperceptible to oracle.


  16. Misleading Visualization
    15
    Metric Space 1: Target Classifier
    Cartoon: 2 dimensions; few samples near boundaries; every sample near 1-3 classes
    Reality: thousands of dimensions; all samples near boundaries; every sample near all classes


  17. Formalizing Adversarial Examples Game
    16
    Given seed sample x, find x′ where:
    f(x′) ≠ f(x)      class is different
    Δ(x, x′) ≤ ε      difference below threshold


  18. Formalizing Adversarial Examples Game
    17
    Given seed sample x, find x′ where:
    f(x′) ≠ f(x)      class is different
    Δ(x, x′) ≤ ε      difference below threshold
    Δ is defined in some metric space:
    L₀ “norm” (# different): #{i : xᵢ ≠ x′ᵢ}
    L₁ norm: Σᵢ |xᵢ − x′ᵢ|
    L₂ norm (“Euclidean”): √(Σᵢ (xᵢ − x′ᵢ)²)
    L∞ norm: maxᵢ |xᵢ − x′ᵢ|
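    A small NumPy sketch of these four distance measures between a seed x and a candidate x′ (function names are mine):

    import numpy as np

    def l0_distance(x, x_adv):
        return int(np.sum(x != x_adv))                     # number of features that differ

    def l1_distance(x, x_adv):
        return float(np.sum(np.abs(x - x_adv)))

    def l2_distance(x, x_adv):
        return float(np.sqrt(np.sum((x - x_adv) ** 2)))    # Euclidean

    def linf_distance(x, x_adv):
        return float(np.max(np.abs(x - x_adv)))            # largest single-feature change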


  19. Targeted Attacks
    18
    Untargeted attack: given seed sample x, find x′ such that
    f(x′) ≠ f(x)      class is different
    Δ(x, x′) ≤ ε      difference below threshold
    Targeted attack: given seed sample x and target class t, find x′ such that
    f(x′) = t         class is t
    Δ(x, x′) ≤ ε      difference below threshold


  20. Datasets
    MNIST
    19
    2 8 7 6 8 6 5 9
    70 000 images
    28×28 pixels, 8-bit grayscale
    scanned hand-written digits
    labeled by humans
    LeCun, Cortes, Burges [1998]


  21. Datasets
    MNIST CIFAR-10
    20
    2 8 7 6 8 6 5 9
    70 000 images
    28×28 pixels, 8-bit grayscale
    scanned hand-written digits
    labeled by humans
    truck
    ship
    horse
    frog
    dog
    deer
    cat
    bird
    automobile
    airplane
    60 000 images
    32×32 pixels, 24-bit color
    human-labeled subset of
    images in 10 classes from
    Tiny Images Dataset
    Alex Krizhevsky [2009]
    LeCun, Cortes, Burges [1998]


  22. ImageNet
    21
    14 Million high-resolution, full color images
    Manually annotated in WordNet
    ~20,000 synonym sets (~1000 images in each)
    Models: MobileNet (Top-1 accuracy: .684 / Top-5: .882)
    Inception v3 (Top-1: .763 / Top-5: .930)


  23. 22


  24. L∞ Adversary (Fast Gradient Sign)
    23
    [Images: original and adversarial examples for ε = 0.1, 0.2, 0.3, 0.4, 0.5]
    Adversary power: ε
    L∞-norm adversary: max|xᵢ − x′ᵢ| ≤ ε
    x′ = x − ε · sign(∇ₓ loss_f(x))
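    A sketch of the fast gradient sign attack in NumPy, assuming a grad_loss_fn callback that returns ∂loss/∂x (supplied by your ML framework); this uses the standard Goodfellow et al. formulation that adds ε · sign(∇ₓ loss), and the sign convention may differ from the slide depending on how the loss is defined:

    import numpy as np

    def fgsm(x, grad_loss_fn, epsilon):
        # One-step L-infinity attack: move every pixel by +/- epsilon
        perturbation = epsilon * np.sign(grad_loss_fn(x))
        x_adv = np.clip(x + perturbation, 0.0, 1.0)   # keep pixels in [0, 1]
        return x_adv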


  25. L∞ Adversary: Binary Filter
    24
    [Images: original and ε = 0.1, 0.2, 0.3, 0.4, 0.5 adversarial examples after a 1-bit filter]
    Adversary power: ε

  26. AdversarialDNN Playground
    25
    Andrew Norton and Yanjun Qi
    Live demo: https://evadeML.org/playground
    Will be integrated with EvadeML-Zoo models and attacks soon!


  27. 26
    Given seed sample x, find x′ where:
    f(x′) ≠ f(x)      class is different
    or f(x′) = t      class is target class
    Δ(x, x′) ≤ ε      difference below threshold
    Is this the
    right game?


  28. Is this the
    right game?
    27


  29. Arms Race
    28
    ICLR 2014
    NDSS 2013
    ICLR 2015
    S&P 2016
    S&P 2017
    NDSS 2016
    NDSS 2016
    This Talk
    Feb 2017


  30. New Idea: Detect Adversarial Examples
    29
    Given seed sample x, find x′ where:
    f(x′) ≠ f(x)      class is different
    Δ(x, x′) ≤ ε      difference below threshold
    Deployed classifier only sees x′: can we search for “x”?


  31. 30
    [Diagram: input → model → prediction; input → filter 1 → model → prediction′; input → filter 2 → model → prediction′′. Compare the predictions: if the difference exceeds a threshold, reject; otherwise, output the prediction.]
    Need filters that do not affect predictions on normal inputs, but that reverse malicious perturbations.


  32. “Feature Squeezing”
    31
    x  = [0.054, 0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074, …]
    x′ = [0.0491, 0.4903, 0.9292, 0.009, 0.2942, 0.5243, 0.5078, …]
    f(x′) ≠ f(x)


  33. “Feature Squeezing”
    32
    x  = [0.054, 0.4894, 0.9258, 0.0116, 0.2898, 0.5222, 0.5074, …]
    x′ = [0.0491, 0.4903, 0.9292, 0.009, 0.2942, 0.5243, 0.5078, …]
    Squeeze: xᵢ → round(xᵢ × 4)/4
    squeeze(x)  = [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, …]
    squeeze(x′) = [0.0, 0.5, 1.0, 0.0, 0.25, 0.5, 0.5, …]
    squeeze(x′) ≈ squeeze(x) ⟹ f(squeeze(x′)) ≈ f(squeeze(x))
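    In code, this squeezer is just a rounding filter; a NumPy sketch (the general bit-depth version, rounding to 2^i − 1 levels, is my generalization of the slide's ×4 example):

    import numpy as np

    def squeeze_quarters(x):
        # The squeezer shown above: round each feature to the nearest 1/4
        return np.round(np.asarray(x) * 4) / 4

    def squeeze_bit_depth(x, bits):
        # Reduce features in [0, 1] to the given bit depth (bits=1 is the binary filter)
        levels = 2 ** bits - 1
        return np.round(np.asarray(x) * levels) / levels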


  34. Squeezing Images
    33
    Reduce Color Depth
    8-bit greyscale
    1-bit monochrome


  35. Squeezing Images
    34
    Reduce Color Depth Median Smoothing
    8-bit greyscale
    1-bit monochrome
    3×3 smoothing: replace each pixel with the median of the pixel and its neighbors
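    A median smoothing squeezer can be written with SciPy (using scipy.ndimage here is my choice; any median filter implementation works):

    from scipy.ndimage import median_filter

    def squeeze_median(image, window=3):
        # Replace each pixel with the median of its window x window neighborhood
        return median_filter(image, size=window)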


  36. MNIST Results: Accuracy
    35
    Bit depth:  8 (original)  7      6      5      4      3      2      1
    Accuracy:   .9930         .9930  .9930  .9930  .9930  .9928  .9926  .9924
    Reducing bit depth (all the way to 1) barely reduces model accuracy!
    Out of 10,000 MNIST test images: 19 are correct on the original image but wrong on the 1-bit filtered image; 13 are wrong on the original image but correct on the 1-bit filtered image; some are wrong on both, but differently.


  37. Robustness Results (MNIST)
    36
    [Charts: accuracy vs. bit depth (8 down to 1) for adversary strength ε = 0.0 (non-adversarial), 0.1, 0.2, 0.3, and accuracy vs. adversary strength ε (0.0 to 0.6) for 8-bit (unfiltered) vs. 1-bit filtered inputs]
    Even for strong adversaries, the 1-bit filter effectively removes adversarial perturbations.


  38. 9
    Adversary
    (Jacobian-based Saliency Map)
    37
    original
    JSMA
    9 “norm” (# different): ⋕ <
    ≠ <
    0)
    Adversary strength = 0.1 (can modify up to 10% of pixels)


  39. 9
    Adversary
    (Jacobian-based Saliency Map)
    38
    original
    JSMA
    smoothed
    (3x3)


  40. Smoothing Results (MNIST)
    39
    [Chart: accuracy vs. median smoothing window (1×1 to 8×8) on MNIST, for original and JSMA-adversarial inputs]
    No smoothing: adversary succeeds 98.6% of the time


  41. Smoothing Results
    40
    [Charts: accuracy vs. median smoothing window, for original and JSMA-adversarial inputs, on MNIST and CIFAR-10]
    2×2 smoothing defeats the adversary, but reduces accuracy


  42. Carlini/Wagner Untargeted Attacks
    41
    Data Set    Attack   Accuracy on Adversarial Examples
    MNIST       L₂       0.0
                L∞       0.0
                L₀       0.0
    CIFAR-10    L₂       0.0
                L∞       0.0
                L₀       0.0
    Nicholas Carlini, David Wagner. Oakland 2017 (Best Student Paper)
    Adversary succeeds 100% of the time with very small perturbations.
    “Our L∞ attacks on ImageNet are so successful that we can change the classification of an image to any desired label by only flipping the lowest bit of each pixel, a change that would be impossible to detect visually.”


  43. Squeezing Results (2x2 Median Smoothing)
    42
    Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/abs/1705.10686
    Data Set    Attack   Accuracy on Adversarial Examples
                         Original   Squeezed
    MNIST       L₂       0.0        0.904
                L∞       0.0        0.942
                L₀       0.0        0.817
    CIFAR-10    L₂       0.0        0.682
                L∞       0.0        0.661
                L₀       0.0        0.706


  44. Results on Carlini/Wagner Untargeted Attacks
    43
    Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/abs/1705.10686
    Data Set    Attack   Accuracy on Adversarial Examples
                         Original   Squeezed
    MNIST       L₂       0.0        0.904
                L∞       0.0        0.942
                L₀       0.0        0.817
    CIFAR-10    L₂       0.0        0.682
                L∞       0.0        0.661
                L₀       0.0        0.706
    Accuracy on legitimate examples: 0.783


  45. Results on Carlini/Wagner Targeted Attacks
    44
    Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/abs/1705.10686
    Data Set    Attack   Adversary Success Rate
                         Original   Squeezed
    MNIST       L₂       0.999      0.022
                L∞       1.0        0.011
                L₀       1.0        0.057
    CIFAR-10    L₂       1.0        0.033
                L∞       1.0        0.037
                L₀       1.0        0.037


  46. Detecting Adversarial Examples
    45
    [Diagram: input → model → prediction; input → bit-depth squeeze → model → prediction′; input → median smoothing → model → prediction′′. Compare the predictions (L₁ distance): if the difference exceeds a threshold, flag the input as adversarial; otherwise, it is treated as normal.]
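    A sketch of this joint detection framework, assuming a model_predict function that returns a probability vector and squeezers like the ones above (taking the maximum L₁ distance across squeezers is my simplification; the paper's exact combination rule may differ):

    import numpy as np

    def detect_adversarial(model_predict, x, squeezers, threshold):
        # Flag the input if any squeezed prediction moves too far from the original one
        p_original = model_predict(x)
        distances = [np.sum(np.abs(p_original - model_predict(s(x)))) for s in squeezers]
        return max(distances) > threshold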


  47. Detection Results
    46
    [Histogram: L₁ distance between normal and squeezed predictions on MNIST, binned over 1000 samples, for non-adversarial, FGSM, and JSMA inputs]


  48. Detection Results
    47
    [Histogram: L₁ distance between normal and squeezed predictions on MNIST, binned over 1000 samples, for non-adversarial, FGSM, and JSMA inputs]
    With a maximum L₁ threshold of 0.362, we detect 99.5% of adversarial examples with a false positive rate of 1.5%.


  49. Detection Results on C/W Attacks
    Dataset    ROC-AUC   L₁ Distance Threshold   Accuracy   True Positive Rate   False Positive Rate
    MNIST      0.9950    0.1147                  0.9880     0.9933               0.0173
    CIFAR-10   0.8711    0.7423                  0.8750     0.9527               0.2027
    Weilin Xu, David Evans, Yanjun Qi. https://arxiv.org/abs/1705.10686
    (Validated results: half of the samples are used to determine the threshold, the other half to test)
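    The validation procedure can be sketched as follows (picking the L₁ threshold that maximizes balanced accuracy on the held-out half is my simplification; the paper's exact selection rule may differ):

    import numpy as np

    def pick_threshold(legit_distances, adv_distances):
        # Choose a threshold on one half of the data; evaluate on the other half
        candidates = np.sort(np.concatenate([legit_distances, adv_distances]))
        best_threshold, best_score = None, -1.0
        for t in candidates:
            tpr = np.mean(adv_distances > t)       # detected adversarial examples
            tnr = np.mean(legit_distances <= t)    # legitimate examples passed through
            score = (tpr + tnr) / 2
            if score > best_score:
                best_threshold, best_score = t, score
        return best_threshold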


  50. [Histogram: L₁ distance between normal and squeezed predictions, ImageNet with MobileNet (68 seeds), for non-adversarial and FGSM inputs]


  51. [Histogram: L₁ distance between normal and squeezed predictions, ImageNet with MobileNet (68 seeds), for non-adversarial, FGSM, DeepFool, and Carlini/Wagner (L₂) inputs]
    Adversarial success rate: 100%
    Adversarial success rate: 47%


  52. Arms Race
    51
    ICLR 2014
    ICLR 2015
    S&P 2016
    S&P 2017
    NDSS 2013
    NDSS 2016
    NDSS 2016
    Feature Squeezing
    15 June 2017 (arXiv)
    Quick Hack (not yet published)
    Weilin Xu, and others
    A new tweak
    Authors TBD
    Delta, my Epsilon!
    Authors TBD


  53. Raising the Bar or Changing the Game?
    52
    Metric Space 1: Target Classifier Metric Space 2: “Oracle”
    Before: find a small perturbation that changes class for classifier, but imperceptible to oracle.


  54. Raising the Bar or Changing the Game?
    53
    Metric Space 1: Target Classifier Metric Space 2: “Oracle”
    Before: find a small perturbation that changes class for classifier, but imperceptible to oracle.
    Now: change class for both original and squeezed classifier, but imperceptible to oracle.


  55. “Feature Squeezing” Conjecture
    For any distance-limited adversarial
    method, there exists some feature squeezer
    that accurately detects its adversarial
    examples.
    54
    Intuition: if the perturbation is small (in some simple
    metric space), there is some squeezer that coalesces
    original and adversarial example into same sample.


  56. Entropy Advantage
    [Diagram: input → model → prediction; input → randomized squeezer #1 → model → prediction′; input → randomized squeezer #2 → model → prediction′′. Compare the predictions (L₁ distance): if the difference exceeds a threshold, adversarial; otherwise, normal.]
    Squeezers can be selected randomly, and can behave differently at random for each feature.


  57. Changing the Game
    Option 1:
    Find distance-limited adversarial methods for which it is intractable to find an effective feature squeezer.
    Option 2:
    Redefine adversarial examples so distance is not
    limited (in simple metric space).
    56
    focus of rest of the talk


  58. Evolutionary Search for
    Faraway Adversarial Examples
    57


  59. Faraway Adversarial Examples
    58
    Metric Space 1: Target Classifier Metric Space 2: “Oracle”
    Need a domain where we know Metric Space 2: “Oracle”


  60. Domain: PDF Malware Classifiers


  61. [Chart: vulnerabilities reported in Adobe Acrobat Reader per year, 2006-2017]
    Source: http://www.cvedetails.com/vulnerability-list.php?vendor_id=53&product_id=921


  62. PDF Malware Classifiers
    PDFrate [ACSAC 2012]: Random Forest, manual features (object counts, lengths, positions, …)
    Hidost13 [NDSS 2013]: Support Vector Machine, automated features (object structural paths)
    Hidost16 [JIS 2016]: Random Forest, automated features (object structural paths)
    Claimed very robust against the “strongest conceivable mimicry attack”.


  63. Automated Classifier Evasion Using Genetic Programming
    [Diagram of the evolutionary loop: malicious PDF → clone → mutation (drawing from benign PDFs) → variants → select variants → repeat until an evasive variant is found; an oracle checks whether variants remain malicious or have become benign]


  64. Generating Variants
    [Diagram of the evolutionary loop, as before]


  65. Generating Variants
    [Diagram of the evolutionary loop, as before, plus the structural tree of a malicious PDF: /Root → /Catalog, /Pages; a /JavaScript node containing eval(‘…’)]


  66. Generating Variants
    [Diagram and PDF structural tree, as before]
    Select random node


  67. Generating Variants
    [Diagram and PDF structural tree, as before]
    Select random node
    Randomly transform: delete, insert, replace


  68. Generating Variants
    [Diagram and PDF structural tree, as before]
    Select random node
    Randomly transform: delete, insert, replace
    Inserted and replacement nodes come from benign PDFs


  69. Selecting Promising Variants
    [Diagram of the evolutionary loop, as before]


  70. Selecting Promising Variants
    [Diagram, as before, with detail: each candidate variant is sent to the oracle (still malicious?) and to the target classifier (score); a fitness function combines the two into a score used for selection]


  71. Oracle
    Execute the candidate in a vulnerable Adobe Reader inside a virtual environment (Cuckoo sandbox: https://github.com/cuckoosandbox; simulated network: INetSim)
    Behavioral signature: malicious if the signature matches (HTTP_URL + HOST extracted from API traces)
    Advantage: we know the target malware behavior


  72. Selecting Promising Variants
    [Diagram, as before: candidate variant → oracle and target classifier → fitness function → score]


  73. Fitness Function
    Assumes lost malicious behavior will not be recovered.
    fitness(variant) = 0.5 − classifier_score   if oracle says “malicious”
                       −∞                       otherwise
    (classifier_score ≥ 0.5 means the variant is labeled malicious)
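    A sketch of this fitness function and the surrounding evolutionary loop, assuming mutate, oracle, and classifier callbacks (population size, generation limit, and selection rule here are illustrative, not EvadeML's actual parameters):

    def fitness(oracle_verdict, classifier_score):
        # Reward variants that stay malicious while looking more benign to the classifier
        return 0.5 - classifier_score if oracle_verdict == "malicious" else float("-inf")

    def evolve(seed_pdf, benign_nodes, mutate, oracle, classifier,
               population_size=48, max_generations=20):
        population = [seed_pdf] * population_size
        for _ in range(max_generations):
            variants = [mutate(v, benign_nodes) for v in population]  # delete/insert/replace a random node
            scored = sorted(((fitness(oracle(v), classifier(v)), v) for v in variants),
                            key=lambda pair: pair[0], reverse=True)
            best_score, best_variant = scored[0]
            if best_score > 0:   # still malicious and classifier_score < 0.5: evasive variant found
                return best_variant
            population = [v for _, v in scored[:population_size]]     # keep the most promising variants
        return None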


  74. Experimental Results


  75. Classifier Performance
    PDFrate Hidost
    Accuracy 0.9976 0.9996
    False Negative Rate 0.0000 0.0056
    Results on non-adversarial samples


  76. Classifier Performance
    PDFrate Hidost
    Accuracy 0.9976 0.9996
    False Negative Rate 0.0000 0.0056
    False Negative Rate
    against Adversary
    1.0000 1.0000


  77. [Chart: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]


  78. [Chart: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
    Simple transformations often worked


  79. [Chart: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
    (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/)
    Works on 162/500 seeds


  80. [Chart: seeds evaded (out of 500) vs. number of mutations, for PDFrate and Hidost]
    Works on 162/500 seeds
    Some seeds required complex transformations


  81. Insert: Threads, ViewerPreferences/Direction, Metadata,
    Metadata/Length, Metadata/Subtype, Metadata/Type,
    OpenAction/Contents, OpenAction/Contents/Filter,
    OpenAction/Contents/Length, Pages/MediaBox
    Delete: AcroForm, Names/JavaSCript/Names/S,
    AcroForm/DR/Encoding/PDFDocEncoding,
    AcroForm/DR/Encoding/PDFDocEncoding/Differences,
    AcroForm/DR/Encoding/PDFDocEncoding/Type, Pages/Rotate,
    AcroForm/Fields, AcroForm/DA, Outlines/Type, Outlines,
    Outlines/Count, Pages/Resources/ProcSet, Pages/Resources
    85-step mutation trace evading Hidost
    Effective for 198/500 seeds


  82. Execution Cost
    [Chart: hours to find all 500 variants on one desktop PC, for PDFrate and Hidost, broken down into oracle, mutation, and classifier time]


  83. Possible Defenses


  84. Possible Defense:
    Adjust Threshold
    Charles Smutz, Angelos Stavrou. When a Tree Falls:
    Using Diversity in Ensemble Classifiers to Identify
    Evasion in Malware Detectors. NDSS 2016.


  85. Evading PDFrate
    [Chart: classifier scores for the original malicious seeds, shown against the malicious label threshold]


  86. Discovered Evasive Variants
    Adjust threshold?


  87. Adjust threshold?
    Variants found with threshold = 0.25
    Variants found with threshold = 0.50


  88. Possible Defense:
    Retrain Classifier


  89. Retrain Classifier
    [Diagram: supervised learning pipeline, as before: labelled training data → feature extraction → vectors → ML algorithm → trained classifier; deployment: operational data → feature extraction → trained classifier → malicious / benign]


  90. [Diagram: training pipeline, as before, with EvadeML run against a clone of the trained classifier]


  91. [Diagram: training pipeline, as before, with EvadeML run against a clone of the trained classifier; the retrained classifier is then deployed]


  92. [Chart: seeds evaded (out of 500) vs. generations, for Hidost16]
    Original classifier: takes 614 generations to evade all seeds


  93. [Chart: seeds evaded (out of 500) vs. generations, for Hidost16 and retrained HidostR1]


  94. [Chart: seeds evaded (out of 500) vs. generations, for Hidost16 and HidostR1]


  95. [Chart: seeds evaded (out of 500) vs. generations, for Hidost16, HidostR1, and HidostR2]


  96. [Chart: seeds evaded (out of 500) vs. generations, for Hidost16, HidostR1, and HidostR2]


  97. [Chart: seeds evaded (out of 500) vs. generations, for Hidost16, HidostR1, and HidostR2]
    False Positive Rates
    Classifier   Genome   Contagio Benign
    Hidost16     0.00     0.00
    HidostR1     0.78     0.30
    HidostR2     0.85     0.53


  98. Possible Defense:
    Hide Classifier


  99. Hiding the Classifier
    [Diagram, as before: candidate variant → oracle and target classifier → fitness function → score]


  100. Evading Classifiers in the Dark
    99
    arXiv, May 2017


  101. Cross-Evasion Effects
    PDF Malware
    Seeds
    Hidost 13
    Evasive
    PDF Malware
    (against PDFrate)
    Automated Evasion
    PDFrate
    2/500 Evasive
    (0.4% Success)
    Potentially Good News?


  102. Evasive
    PDF Malware
    (against PDFrate)
    Cross-Evasion Effects
    PDF Malware
    Seeds
    Hidost 13
    Automated Evasion
    PDFrate
    2/500 Evasive
    (0.4% Success)
    Evasive
    PDF Malware
    (against Hidost)
    387/500 Evasive
    (77.4% Success)


  103. Cross-Evasion Effects
    PDF Malware
    Seeds
    Automated Evasion
    6/500 Evasive
    (1.2% Success)
    Hidost 13
    Evasive
    PDF Malware
    (against Hidost)


  104. Evading Gmail’s Classifier
    Evasion rate on Gmail: 179/380 (47.1%)
    for javascript in pdf.all_js:
        javascript.append_code("var ucb=1;")          # add an innocuous statement to each JavaScript object
    if pdf.get_size() < 7050000:
        pdf.add_padding(7050000 - pdf.get_size())     # pad the file to a fixed target size


  105. Conclusion


  106. Hopeful Conclusions
    Domain Knowledge is not Dead
    • Classifiers trained without understanding are vulnerable
    • Adversaries can exploit unnecessary features
    Trust Requires Understanding
    • Good results against test data do not carry over to adaptive adversaries
    but there is hope for building robust ML models!


  107. Credits
    Funding: National Science Foundation, Air Force Office of Scientific Research, Google, Microsoft, Amazon
    Weilin Xu Security Research Group
    Yanjun Qi


  108. David Evans
    University of Virginia
    [email protected]
    EvadeML.org
    source code, papers
