Can Machine Learning Ever Be Trustworthy?

David Evans

University of Maryland
Booz Allen Hamilton Distinguished Colloquium
7 December 2018

https://evademl.org

Transcript

  1. Can Machine
    Learning
    Ever Be
    Trustworthy?
    David Evans
    University of Virginia
    evadeML.org
    7 December 2018
    University of Maryland

  2. 1
    No!

  3. 2
    It’s too late!

  4. 3
    “Unfortunately, our translation systems made an error last week
    that misinterpreted what this individual posted. Even though
    our translations are getting better each day, mistakes like these
    might happen from time to time and we’ve taken steps to
    address this particular issue. We apologize to him and his family
    for the mistake and the disruption this caused.”

  5. 4

  6. Amazon Employment
    5

  7. Amazon Employment
    6

  8. Risks from Artificial Intelligence
    7
    Benign developers and operators
    AI out of control
    AI inadvertently causes harm
    Malicious operators
    Build AI to do harm
    Malicious abuse of benign AI
    On Robots
    Joe Berger and Pascal Wyse
    (The Guardian, 21 July 2018)

  9. Risks from Artificial Intelligence
    Benign developers and operators
    AI out of control
    AI inadvertently causes harm
    Malicious operators
    Build AI to do harm
    Malicious abuse of benign AI systems
    8

  10. Crash Course in
    Artificial Intelligence
    and Machine Learning
    9

  11. Labelled
    Training Data
    ML
    Algorithm
    Feature
    Extraction
    Vectors
    Deployment
    Malicious / Benign
    Operational Data
    Trained Classifier
    Training
    (supervised learning)
    Statistical Machine Learning

  12. Labelled
    Training Data
    ML
    Algorithm
    Feature
    Extraction
    Vectors
    Deployment
    Malicious / Benign
    Operational Data
    Trained Classifier
    Training
    (supervised learning)
    Assumption: Training Data is Representative
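
    A minimal sketch of this training-and-deployment pipeline, assuming scikit-learn and a
    hypothetical extract_features function standing in for the feature extraction step:

    # Sketch of the supervised-learning pipeline above (assumes scikit-learn;
    # extract_features is a hypothetical stand-in for feature extraction).
    from sklearn.ensemble import RandomForestClassifier

    def extract_features(sample):
        # Hypothetical: map a raw sample (e.g., PDF bytes) to a numeric vector.
        return [len(sample), sample.count(b"/JavaScript")]

    def train(labelled_samples):
        # labelled_samples: list of (raw_sample, label), label in {"malicious", "benign"}
        X = [extract_features(s) for s, _ in labelled_samples]
        y = [label for _, label in labelled_samples]
        clf = RandomForestClassifier(n_estimators=100)
        clf.fit(X, y)   # training (supervised learning)
        return clf

    def deploy(clf, operational_sample):
        # returns "malicious" or "benign" for operational data
        return clf.predict([extract_features(operational_sample)])[0]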

  13. Deployment
    Adversaries Don’t Cooperate
    Assumption: Training Data is Representative
    Training
    Poisoning

  14. Adversaries Don’t Cooperate
    Assumption: Training Data is Representative
    Evading
    Deployment
    Training

  15. 14

  16. More Ambition
    15
    “The human race will have a new
    kind of instrument which will
    increase the power of the mind
    much more than optical lenses
    strengthen the eyes and which
    will be as far superior to
    microscopes or telescopes as
    reason is superior to sight.”

  17. More Ambition
    16
    “The human race will have a new
    kind of instrument which will
    increase the power of the mind
    much more than optical lenses
    strengthen the eyes and which
    will be as far superior to
    microscopes or telescopes as
    reason is superior to sight.”
    Gottfried Wilhelm Leibniz (1679)

  18. 17
    Gottfried Wilhelm Leibniz (Universität Altdorf, 1666) who advised:
    Jacob Bernoulli (Universität Basel, 1684) who advised:
    Johann Bernoulli (Universität Basel, 1694) who advised:
    Leonhard Euler (Universität Basel, 1726) who advised:
    Joseph Louis Lagrange who advised:
    Siméon Denis Poisson who advised:
    Michel Chasles (École Polytechnique, 1814) who advised:
    H. A. (Hubert Anson) Newton (Yale, 1850) who advised:
    E. H. Moore (Yale, 1885) who advised:
    Oswald Veblen (U. of Chicago, 1903) who advised:
    Philip Franklin (Princeton 1921) who advised:
    Alan Perlis (MIT Math PhD 1950) who advised:
    Jerry Feldman (CMU Math 1966) who advised:
    Jim Horning (Stanford CS PhD 1969) who advised:
    John Guttag (U. of Toronto CS PhD 1975) who advised:
    David Evans (MIT CS PhD 2000)
    my academic great-
    great-great-great-
    great-great-great-
    great-great-great-
    great-great-great-
    great-great-
    grandparent!

  19. More Precision
    18
    “The human race will have a new
    kind of instrument which will
    increase the power of the mind
    much more than optical lenses
    strengthen the eyes and which
    will be as far superior to
    microscopes or telescopes as
    reason is superior to sight.”
    Gottfried Wilhelm Leibniz (1679)
    Normal computing amplifies
    (quadrillions of times faster)
    and aggregates (enables
    millions of humans to work
    together) human cognitive
    abilities; AI goes beyond
    what humans can do.

  20. Operational Definition
    “Artificial Intelligence”
    means making
    computers do things
    their programmers
    don’t understand well
    enough to program
    explicitly.
    19
    If it is
    explainable,
    it’s not ML!

  21. Inherent Paradox of “Trustworthy” ML
    20
    “Artificial Intelligence”
    means making
    computers do things
    their programmers
    don’t understand well
    enough to program
    explicitly.
    If we could specify
    precisely what the model
    should do, we wouldn’t
    need ML to do it!

  22. Inherent Paradox of “Trustworthy” ML
    21
    If we could specify
    precisely what the model
    should do, we wouldn’t
    need ML to do it!
    Best we hope for is verifying certain properties
    Model Similarity (M1, M2): ∀x: f1(x) = f2(x)
    DeepXplore: Automated Whitebox Testing of Deep Learning Systems.
    Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017

  23. Inherent Paradox of “Trustworthy” ML
    22
    Best we hope for is verifying certain properties
    Model Similarity (M1, M2): ∀x ∈ X: f1(x) ≈ f2(x)
    DeepXplore: Automated Whitebox Testing of Deep Learning Systems.
    Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017
    Model Robustness (M): ∀x ∈ X, ∀Δ ∈ D: f(x) ≈ f(x + Δ)

  24. Third Strategy: Specify Containing System
    23
    Somesh Jha’s
    talk (Oct 26)

  25. Adversarial Robustness
    24
    Model Robustness (M): ∀x ∈ X, ∀Δ ∈ D: f(x) ≈ f(x + Δ)
    Adversary’s Goal:
    find a “small” perturbation that changes
    model output
    targeted attack: in some desired way
    Defender’s Goal:
    Robust Model: find model where this is hard
    Detection: detect inputs that are adversarial

  26. Not a new problem...
    25
    Or do you think any Greek
    gift’s free of treachery? Is that
    Ulysses’s reputation? Either
    there are Greeks in hiding,
    concealed by the wood, or it’s
    been built as a machine to use
    against our walls, or spy on
    our homes, or fall on the city
    from above, or it hides some
    other trick: Trojans, don’t trust
    this horse. Whatever it is, I’m
    afraid of Greeks even those
    bearing gifts.’
    Virgil, The Aeneid (Book II)

  27. Adversarial Examples for DNNs
    26
    “panda” + 0.007 × [noise] = “gibbon”
    Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy.
    Explaining and Harnessing Adversarial Examples. 2014 (in ICLR 2015)
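
    A minimal sketch of the fast gradient sign method (FGSM) behind this example, assuming
    PyTorch and an already-trained classifier model; epsilon plays the role of the 0.007 factor:

    # Sketch of FGSM (assumes PyTorch, a trained classifier `model`, and a label
    # tensor of true class indices); epsilon corresponds to the 0.007 factor above.
    import torch
    import torch.nn.functional as F

    def fgsm(model, x, label, epsilon=0.007):
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), label)
        loss.backward()
        # Move each pixel by epsilon in the direction that increases the loss.
        x_adv = x + epsilon * x.grad.sign()
        return x_adv.clamp(0.0, 1.0).detach()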

  28. Impact of Adversarial Perturbations
    27
    Distance between layer output and its output for original seed
    FGSM
    ε = 0.0245
    CIFAR-10
    DenseNet
    95th percentile
    5th percentile

  29. Impact of Adversarial Perturbations
    28
    Distance between layer output and its output for original seed
    Random noise
    (same amount)
    FGSM
    ε = 0.0245
    CIFAR-10
    DenseNet

  30. Impact of Adversarial Perturbations
    29
    Distance between layer output and its output for original seed
    Random noise
    (same amount)
    Carlini-
    Wagner L2
    CIFAR-10
    DenseNet

  31. 30
    Papers on “Adversarial Examples” (Google Scholar)
    [Bar chart: papers per year, 2013–2018]
    1826.68 papers expected in 2018!

  32. 31
    Papers on “Adversarial Examples” (Google Scholar)
    [Bar chart: papers per year, 2013–2018]
    1826.68 papers expected in 2018!

  33. 32
    Emergence of “Theory”
    [Bar chart: papers per year, 2013–2018]
    ICML Workshop 2015
    15% of 2018 “adversarial examples” papers contain “theorem” and “proof”

  34. Adversarial Example
    33
    Prediction Change Definition:
    An input, x′ ∈ X, is an adversarial example for x ∈ X, iff
    ∃x′ ∈ Ballε(x) such that f(x) ≠ f(x′).

  35. Adversarial Example
    34
    Ballε(x) is some space around x, typically defined in
    some (simple!) metric space:
    L0 norm (# different), L2 norm (“Euclidean distance”), L∞
    Without constraints on Ballε, every input has
    adversarial examples.
    Prediction Change Definition:
    An input, x′ ∈ X, is an adversarial example for x ∈ X, iff
    ∃x′ ∈ Ballε(x) such that f(x) ≠ f(x′).
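
    A small sketch, assuming NumPy and a classifier function f, of measuring a perturbation in
    the L0, L2, and L∞ norms and checking the prediction-change condition:

    # Sketch of the prediction-change condition under the norms above (assumes
    # NumPy and a classifier function f mapping an input vector to a class label).
    import numpy as np

    def perturbation_size(x, x_prime, norm):
        delta = np.asarray(x_prime, dtype=float) - np.asarray(x, dtype=float)
        if norm == "L0":    # number of components changed
            return np.count_nonzero(delta)
        if norm == "L2":    # Euclidean distance
            return np.linalg.norm(delta)
        if norm == "Linf":  # largest single change
            return np.max(np.abs(delta))
        raise ValueError(norm)

    def is_adversarial(f, x, x_prime, epsilon, norm="Linf"):
        # x_prime is adversarial for x iff it stays inside the epsilon-ball
        # and the model's prediction changes.
        return perturbation_size(x, x_prime, norm) <= epsilon and f(x_prime) != f(x)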

  36. Adversarial Example
    35
    Any non-trivial model has
    adversarial examples:
    ∃x1, x2 ∈ X. f(x1) ≠ f(x2)
    Prediction Change Definition:
    An input, x′ ∈ X, is an adversarial example for x ∈ X, iff
    ∃x′ ∈ Ballε(x) such that f(x) ≠ f(x′).

  37. Prediction Error Robustness
    36
    Error Robustness:
    An input, x′ ∈ X, is an adversarial example for (correct) x ∈ X, iff
    ∃x′ ∈ Ballε(x) such that f(x′) ≠ true label for x′.
    Perfect classifier has no
    (error robustness)
    adversarial examples.

  38. Prediction Error Robustness
    37
    Error Robustness:
    An input, x′ ∈ X, is an adversarial example for (correct) x ∈ X, iff
    ∃x′ ∈ Ballε(x) such that f(x′) ≠ true label for x′.
    Perfect classifier has no
    (error robustness)
    adversarial examples.
    If we have a way to know
    this, don’t need an ML
    classifier.

  39. Global Robustness Properties
    38
    Adversarial Risk: probability an input has an adversarial example
    Pr_{x ← D} [ ∃ x′ ∈ Ballε(x). h(x′) ≠ class(x′) ]
    Dimitrios I. Diochnos, Saeed Mahloujifar,
    Mohammad Mahmoody, NeurIPS 2018

  40. Global Robustness Properties
    39
    Dimitrios I. Diochnos, Saeed Mahloujifar,
    Mohammad Mahmoody, NeurIPS 2018
    Adversarial Risk: probability an input has an adversarial example
    Pr_{x ← D} [ ∃ x′ ∈ Ballε(x). h(x′) ≠ class(x′) ]
    Error Region Robustness: expected distance to closest AE:
    E_{x ← D} [ inf { δ : ∃ x′ ∈ Ballδ(x). h(x′) ≠ class(x′) } ]

  41. Recent Global Robustness Results
    Properties of any model for input space:
    distance to AE is small relative to expected distance between two sampled points.
    Adversarial Spheres [Gilmer et al., 2018]
    Assumption: uniform distribution on two concentric n-spheres.
    Key Result: expected safe distance (L2 norm) is relatively small.
    Adversarial vulnerability for any classifier [Fawzi × 3, 2018]
    Assumption: smooth generative model: (1) Gaussian in latent space; (2) generator is L-Lipschitz.
    Key Result: adversarial risk ⟶ 1 for relatively small attack strength (L2 norm):
    P(r(x) ≤ η) ≥ 1 − √(π/2) · e^(−η²/2L²)
    Curse of Concentration in Robust Learning [Mahloujifar et al., 2018]
    Assumption: Normal Lévy families (unit sphere, uniform, L2 norm;
    Boolean hypercube, uniform, Hamming distance; ...).
    Key Result: if attack strength exceeds a relatively small threshold, adversarial risk > 1/2:
    b > √(log(k1/ε) / (k2 · n)) ⟹ Risk_b(h, c) ≥ 1/2

  42. Prediction Change Robustness
    41
    Prediction Change:
    An input, x′ ∈ X, is an adversarial example for x ∈ X, iff
    ∃x′ ∈ Ballε(x) such that f(x′) ≠ f(x).
    Any non-trivial model has
    adversarial examples:
    ∃x1, x2 ∈ X. f(x1) ≠ f(x2)
    Solutions:
    - only consider particular inputs (“good” seeds)
    - output isn’t just class (e.g., confidence)
    - targeted adversarial examples
      cost-sensitive adversarial robustness

  43. Local (Instance) Robustness
    42
    Robust Region: For an input x, the robust region is the
    maximum region with no adversarial example:
    sup { ε > 0 : ∀x′ ∈ Ballε(x), f(x′) = f(x) }

  44. Local (Instance) Robustness
    43
    Robust Region: For an input x, the robust region is the
    maximum region with no adversarial example:
    sup { ε > 0 : ∀x′ ∈ Ballε(x), f(x′) = f(x) }
    Robust Error: For a test set X and bound ε*:
    |{ x ∈ X : RobustRegion(x) < ε* }| / |X|
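
    A short sketch of computing robust error from per-input robust radii, assuming a
    robust_region function (for example, radii returned by a verifier) for each test input:

    # Sketch: robust error over a test set, assuming robust_region(x) returns the
    # verified robust radius for input x (e.g., from a formal verification tool).
    def robust_error(test_set, robust_region, epsilon_star):
        vulnerable = sum(1 for x in test_set if robust_region(x) < epsilon_star)
        return vulnerable / len(test_set)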

  45. 44
    [Diagram: defense approaches arranged by scalability vs. precision of evaluation metric]
    Formal Verification (MILP solver MIPVerify; SMT solver Reluplex; interval analysis ReluVal);
    evaluation metric: robust error (precise)
    Certified Robustness (CNN-Cert, Boopathy et al., 2018; Dual-LP, Kolter & Wong 2018;
    Dual-SDP, Raghunathan et al., 2018); evaluation metric: bound
    Heuristic Defenses (distillation, Papernot et al., 2016; gradient obfuscation;
    feature squeezing; adversarial retraining, Madry et al., 2017);
    evaluation metric: attack success rate (set of attacks)

  46. 45
    Theory | “Practice” | Reality
    Classification Problems: distributional assumptions | Toy, arbitrary datasets | Malware, Fake News, ...
    Adversarial Strength: ℓp norm bound | ℓ∞ bound | application specific
    Fake

  47. Example: PDF Malware

  48. Finding Evasive Malware
    47
    Given seed sample, x, with desired malicious behavior,
    find an adversarial example x′ that satisfies:
    f(x′) = “benign”        Model misclassifies
    B(x′) = B(x)            Malicious behavior preserved
    Generic attack: heuristically explore input
    space for x′ that satisfies definition.
    No requirement that x ~ x′ except through B.

  49. PDF Malware Classifiers
    PDFrate [ACSAC 2012]: Random Forest; manual features
    (object counts, lengths, positions, …)
    Hidost13 [NDSS 2013]: Support Vector Machine; automated features
    (object structural paths); “very robust against strongest conceivable mimicry attack”
    Hidost16 [JIS 2016]: Random Forest; automated features (object structural paths)

  50. Evolutionary Search
    [Diagram: clone a malicious PDF seed, generate mutants using benign PDFs,
    select promising variants by fitness against a benign oracle and the target
    classifier, and repeat until evasive variants are found]
    Weilin Xu, Yanjun Qi

  51. Generating Variants
    [Diagram: evolutionary search loop, highlighting the mutant generation step]

  52. PDF Structure

  53. Generating Variants
    [Diagram: evolutionary search loop, highlighting the mutant generation step]

  54. Generating Variants
    [Diagram: mutant generation, zoomed in on the PDF object tree
    (/Root, /Catalog, /Pages, /JavaScript eval(‘…’), …)]
    Select random node
    Randomly transform: delete, insert, replace

  55. Generating Variants
    [Diagram: mutant generation, zoomed in on the PDF object tree
    (/Root, /Catalog, /Pages, /JavaScript eval(‘…’), …)]
    Select random node
    Randomly transform: delete, insert, replace
    Inserted and replacement nodes are drawn from benign PDFs
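
    An illustrative sketch of this mutation operator, assuming PDF objects are held in a node
    tree; the tree methods (all_nodes, delete, insert_child, replace) are hypothetical stand-ins,
    not the EvadeML implementation:

    # Illustrative sketch of the mutation operator described above (not the actual
    # EvadeML code): pick a random node in the PDF object tree and delete it,
    # replace it, or insert a node drawn from benign PDFs.
    import copy
    import random

    def mutate(pdf_tree, benign_node_pool):
        variant = copy.deepcopy(pdf_tree)
        node = random.choice(list(variant.all_nodes()))   # e.g., /Root/Pages, /JavaScript, ...
        operation = random.choice(["delete", "insert", "replace"])
        if operation == "delete":
            variant.delete(node)
        elif operation == "insert":
            variant.insert_child(node, random.choice(benign_node_pool))
        else:  # replace
            variant.replace(node, random.choice(benign_node_pool))
        return variant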

  56. Selecting Promising Variants
    [Diagram: evolutionary search loop, highlighting the fitness selection step]

  57. Selecting Promising Variants
    [Diagram: each candidate variant is sent to the oracle (malicious / benign
    behavior check) and to the target classifier (score); a fitness function of
    the oracle result and the classifier score assigns the variant’s fitness]

  58. Oracle: B(x′) = B(x) ?
    Execute candidate in
    vulnerable Adobe Reader in
    virtual environment
    Behavioral signature:
    malicious if signature matches
    https://github.com/cuckoosandbox
    Simulated network: INetSim
    Cuckoo
    HTTP_URL + HOST
    extracted from API traces

  59. Fitness Function
    Assumes lost malicious behavior will not be
    recovered
    fitness(x′) = 1 − classifier_score(x′)   if B(x′) = B(x)
                  −∞                         otherwise
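
    The fitness function written out as a sketch, assuming oracle_behavior returns the
    behavioral signature from the oracle and classifier_score returns the target classifier’s
    maliciousness score:

    # Sketch of the fitness function above: a variant is only rewarded if the oracle
    # confirms the malicious behavior is preserved; lower classifier scores are fitter.
    def fitness(variant, seed, oracle_behavior, classifier_score):
        if oracle_behavior(variant) == oracle_behavior(seed):
            return 1.0 - classifier_score(variant)
        return float("-inf")   # lost malicious behavior: assume it will not be recovered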

  60. [Chart: seeds evaded (out of 500) vs. number of mutations, for PDFRate and Hidost]

  61. [Chart: seeds evaded (out of 500) vs. number of mutations, for PDFRate and Hidost]
    Simple transformations often worked

  62. [Chart: seeds evaded (out of 500) vs. number of mutations, for PDFRate and Hidost]
    (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/)
    Works on 162/500 seeds

  63. [Chart: seeds evaded (out of 500) vs. number of mutations, for PDFRate and Hidost]
    Some seeds required complex transformations

  64. Evading PDFrate
    [Chart: classification score for each malware seed (sorted by original score),
    showing original malicious seeds, discovered evasive variants, and the
    malicious label threshold]

  65. Adjust threshold?
    [Chart: classification scores of original malicious seeds and discovered
    evasive variants, with the malicious label threshold]
    Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in
    Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.

  66. Adjust threshold?
    [Chart: classification scores of variants found with threshold = 0.25 and
    variants found with threshold = 0.50, by malware seed (sorted by original score)]

  67. Hide the Classifier Score?
    [Diagram: evolutionary search loop; the fitness function depends on the
    oracle result and the target classifier’s score]

  68. Binary Classifier Output is Enough
    [Diagram: evolutionary search loop driven only by the target classifier’s
    binary (malicious / benign) output]
    ACM CCS 2017

  69. Labelled
    Training Data
    ML
    Algorithm
    Feature
    Extraction
    Vectors
    Deployment
    Malicious / Benign
    Operational Data
    Trained Classifier
    Training
    (supervised learning)
    Retrain Classifier

  70. Labelled
    Training Data
    ML
    Algorithm
    Feature
    Extraction
    Vectors
    Training
    (supervised learning)
    Clone
    01011001
    101
    EvadeML
    Deployment

  71. [Chart: seeds evaded (out of 500) vs. generations, Hidost16]
    Original classifier: takes 614 generations to evade all seeds

  72. [Chart: seeds evaded (out of 500) vs. generations, Hidost16 and retrained HidostR1]

  73. [Chart: seeds evaded (out of 500) vs. generations, Hidost16 and retrained HidostR1]

  74. [Chart: seeds evaded (out of 500) vs. generations, Hidost16, HidostR1, and HidostR2]

  75. [Chart: seeds evaded (out of 500) vs. generations, Hidost16, HidostR1, and HidostR2]

  76. [Chart: seeds evaded (out of 500) vs. generations, Hidost16, HidostR1, and HidostR2]
    False Positive Rates:
    Genome     Contagio   Benign
    Hidost16   0.00       0.00
    HidostR1   0.78       0.30
    HidostR2   0.85       0.53

  77. 76
    Only 8/6987 robust features (Hidost)
    Robust classifier
    High false positives
    /Names
    /Names /JavaScript
    /Names /JavaScript /Names
    /Names /JavaScript /JS
    /OpenAction
    /OpenAction /JS
    /OpenAction /S
    /Pages

  78. Malware Classification Moral
    To build robust, effective malware
    classifiers, we need robust features that are
    strong signals for malware.
    77
    If you have features like this – don’t need ML!

  79. 78
    Theory | “Practice” | “Reality”
    Classification Problems: distributional assumptions | Toy, arbitrary datasets | Malware, Fake News, ...
    Adversarial Strength: ℓp norm bound | ℓ∞ bound | application specific
    Fake

  80. Adversarial Examples across Domains
    79
    Domain | Classifier Space | “Reality” Space
    Trojan Wars: Judgment of Trojans, f(x) = “gift” | Physical Reality, f*(x) = invading army
    Malware: Malware Detector, f(x) = “benign” | Victim’s Execution, f*(x) = malicious behavior
    Image Classification: DNN Classifier, f(x) = y | Human Perception, f*(x) = y*

  81. Adversarial Example
    80
    Prediction Change Definition:
    An input, x′ ∈ X, is an adversarial example for x ∈ X, iff
    ∃x′ ∈ Ballε(x) such that f(x) ≠ f(x′).
    Suggested Defense: given an
    input x*, see how the model
    behaves on S(x*) where S(·)
    reverses transformations in
    Δ-space.

  82. 81
    [Diagram: defense approaches arranged by scalability vs. precision of evaluation metric]
    Formal Verification (MILP solver MIPVerify; SMT solver Reluplex; interval analysis ReluVal);
    evaluation metric: robust error (precise)
    Certified Robustness (CNN-Cert, Boopathy et al., 2018; Dual-LP, Kolter & Wong 2018;
    Dual-SDP, Raghunathan et al., 2018); evaluation metric: bound
    Heuristic Defenses (distillation, Papernot et al., 2016; gradient obfuscation;
    feature squeezing; adversarial retraining, Madry et al., 2017);
    evaluation metric: attack success rate (set of attacks)

  83. Feature Squeezing Detection Framework
    [Diagram: the input goes to the model and to copies of the model preceded by
    Squeezer 1 … Squeezer k, producing Prediction0, Prediction1, …, Predictionk;
    a scoring function over the predictions labels the input adversarial or legitimate]
    Weilin Xu, Yanjun Qi

  84. Feature Squeezing Detection Framework
    [Diagram: input, model, squeezed models, prediction comparison, adversarial / legitimate]
    Feature Squeezer coalesces similar inputs into one point:
    • Barely changes legitimate inputs.
    • Destroys adversarial perturbations.

  85. Coalescing by Feature Squeezing
    84
    Metric Space 1: Target Classifier Metric Space 2: “Oracle”
    Before: find a small perturbation that changes class for classifier, but imperceptible to oracle.
    Now: change class for both original and squeezed classifier, but imperceptible to oracle.

  86. Example Squeezer: Bit Depth Reduction
    85
    Signal Quantization
    [Plot: output vs. input for 8-bit, 3-bit, and 1-bit quantization]
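
    A minimal sketch of the bit depth reduction squeezer, assuming inputs are NumPy arrays
    with values in [0, 1]:

    # Sketch of bit depth reduction (signal quantization) for inputs in [0, 1],
    # assuming NumPy arrays; bits=1 gives the binary filter used for MNIST.
    import numpy as np

    def reduce_bit_depth(x, bits):
        levels = 2 ** bits - 1
        return np.round(np.asarray(x, dtype=float) * levels) / levels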

  87. Example Squeezer: Bit Depth Reduction
    86
    Signal Quantization
    [Plot: output vs. input for 8-bit, 3-bit, and 1-bit quantization]
    [Image grid: a seed image and adversarial versions generated by CW2, CW∞,
    BIM, and FGSM, with the model’s predictions before and after squeezing]

  88. Other Potential Squeezers
    87
    C Xie, et al. Mitigating Adversarial Effects Through Randomization, ICLR 2018.
    J Buckman, et al. Thermometer Encoding: One Hot Way To Resist Adversarial
    Examples, ICLR 2018.
    D Meng and H Chen, MagNet: a Two-Pronged Defense against Adversarial
    Examples, CCS 2017; A Prakash, et al., Deflecting Adversarial Attacks with Pixel
    Deflection, CVPR 2018;...
    Thermometer Encoding (learnable bit depth reduction)
    Image denoising using autoencoder, wavelet, JPEG, etc.
    Image resizing
    ...
    Spatial Smoothers: median filter, non-local means

  89. Other Potential Squeezers
    88
    C Xie, et al. Mitigating Adversarial Effects Through Randomization, ICLR 2018.
    J Buckman, et al. Thermometer Encoding: One Hot Way To Resist Adversarial
    Examples, ICLR 2018.
    D Meng and H Chen, MagNet: a Two-Pronged Defense against Adversarial
    Examples, CCS 2017; A Prakash, et al., Deflecting Adversarial Attacks with Pixel
    Deflection, CVPR 2018;...
    Thermometer Encoding (learnable bit depth reduction)
    Image denoising using autoencoder, wavelet, JPEG, etc.
    Image resizing
    ...
    Spatial Smoothers: median filter, non-local means
    Anish Athalye, Nicholas Carlini, David Wagner.
    Obfuscated Gradients Give a False Sense of
    Security: Circumventing Defenses to
    Adversarial Examples. ICML 2018.

  90. “Feature Squeezing” (Vacuous) Conjecture
    For any distance-limited adversarial method,
    there exists some feature squeezer that
    accurately detects its adversarial examples.
    89
    Intuition: if the perturbation is small (in some simple
    metric space), there is some squeezer that coalesces
    original and adversarial example into same sample.

  91. Feature Squeezing Detection
    [Diagram: the input goes to the model (7-layer CNN) and to two squeezed
    copies (Bit Depth-1, Median 2×2), producing Prediction0, Prediction1,
    Prediction2; the input is flagged adversarial if
    max( ‖Prediction0 − Prediction1‖1 , ‖Prediction0 − Prediction2‖1 ) > T,
    otherwise legitimate]

  92. Detecting Adversarial Examples
    Distance between original input and its squeezed version
    Adversarial
    inputs
    (CW attack)
    Legitimate
    inputs

  93. 92
    Training a detector (MNIST)
    [Histogram: number of examples vs. maximum L1 distance between original and
    squeezed input, for legitimate and adversarial inputs]
    threshold = 0.0029
    detection: 98.2%, FP < 4%
    set the detection threshold to keep
    false positive rate below target
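
    A sketch of choosing the threshold from legitimate inputs only, assuming NumPy and a
    detection_score function such as the maximum L1 difference above:

    # Sketch: choose the detection threshold from legitimate inputs so that the
    # false positive rate stays below a target (e.g., 5%).
    import numpy as np

    def choose_threshold(legitimate_inputs, detection_score, target_fp_rate=0.05):
        scores = np.array([detection_score(x) for x in legitimate_inputs])
        # The (1 - target_fp_rate) quantile of legitimate scores becomes the threshold.
        return float(np.quantile(scores, 1.0 - target_fp_rate))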

  94. ImageNet Configuration
    [Diagram: the input goes to the model (MobileNet) and to three squeezed
    copies (Bit Depth-5, Median 2×2, Non-local Mean), producing
    Prediction0 … Prediction3; the input is flagged adversarial if
    max over i ∈ {1, 2, 3} of ‖Prediction0 − Predictioni‖1 > T,
    otherwise legitimate]

  95. 94
    Training a detector (ImageNet)
    [Histogram: number of examples vs. maximum L1 distance between original and
    squeezed input, for legitimate and adversarial inputs]
    threshold = 1.24
    detection: 85%, FP < 5%

  96. What about better
    adversaries?
    95

  97. Instance Defense-Robustness
    96
    For an input x, the robust-defended region is the maximum
    region with no undetected adversarial example:
    sup { ε > 0 : ∀x′ ∈ Ballε(x), f(x′) = f(x) ∨ detected(x′) }
    Defense Failure: For a test set X and bound ε*:
    |{ x ∈ X : RobustDefendedRegion(x) < ε* }| / |X|
    Can we verify a defense?

  98. Formal Verification of Defense Instance
    Exhaustively test all inputs x′ ∈ Ballε(x)
    for correctness or detection.
    Need to transform the model into a
    function amenable to verification.

  99. Linear Programming
    Find values of x that minimize a linear
    objective function under linear constraints:
    minimize  c1·x1 + c2·x2 + c3·x3 + …
    subject to:
    a11·x1 + a12·x2 + ⋯ ≤ b1
    a21·x1 + a22·x2 + ⋯ ≤ b2
    xi ≤ 0
    ...
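
    A tiny worked example of this formulation, assuming SciPy; it maximizes x1 + 2·x2 (by
    minimizing its negation) subject to two linear constraints:

    # Tiny linear-programming example (assumes SciPy): minimize c·x subject to A x <= b.
    from scipy.optimize import linprog

    c = [-1.0, -2.0]                   # minimize -(x1 + 2*x2), i.e., maximize x1 + 2*x2
    A = [[1.0, 1.0], [-1.0, 2.0]]      # constraint rows a11, a12, ...
    b = [4.0, 2.0]                     # right-hand sides b1, b2
    result = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
    print(result.x, result.fun)        # optimal x and objective value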

  100. Encoding a Neural Network
    Linear Components (z = Wx + b)
    Convolutional Layer
    Fully-connected Layer
    Batch Normalization (in test mode)
    Non-linear
    Activation (ReLU, Sigmoid, Softmax)
    Pooling Layer (max, avg)
    99

  101. Encode ReLU
    Mixed Integer Linear Programming
    adds discrete values to LP
    ReLU
    (Rectified Linear Unit)
    y = max(0, x)
    Piecewise Linear
    Encoding with binary variable a ∈ {0, 1} and bounds l ≤ x ≤ u:
    y ≥ x
    y ≥ 0
    y ≤ x − l·(1 − a)
    y ≤ u·a

  102. Mixed Integer Linear Programming (MILP)
    Intractable in theory (NP-Complete)
    Efficient in practice
    (e.g., Gurobi solver)
    MIPVerify
    Vincent Tjeng, Kai Xiao, Russ Tedrake
    Verify NNs using MILP

  103. Encode Feature Squeezers
    Binary Filter
    [Plot: step function with threshold 0.5, output 0 below and 1 above]
    Actual Input: uint8 [0, 1, 2, … 254, 255]
    127 / 255 = 0.498
    128 / 255 = 0.502
    An infeasible gap [0.499, 0.501]
    Lower semi-continuous

  104. Verified L∞ Robustness
    Model | Test Accuracy | Robust Error (ε = 0.1) | Robust Error with Binary Filter
    Raghunathan et al. | 95.82% | 14.36%–30.81% | 7.37%
    Wong & Kolter | 98.11% | 4.38% | 4.25%
    Ours with binary filter | 98.94% | 2.66%–6.63% | -
    Even without detection, this helps!

  105. Encode Detection Mechanism
    Original version:
    score(x) = ‖f(x) − f(squeeze(x))‖1
    where f(x) is the softmax output
    Simplify for verification:
    L1 ⟶ maximum difference
    softmax ⟶ multiple piecewise-linear
    approximate sigmoids

  106. Preliminary Experiments
    105
    [Diagram: input x′ goes to the model (4-layer CNN) and to a Bit Depth-1
    squeezed copy; if max_diff(y0, y1) > T the input is flagged adversarial,
    otherwise y1 is accepted as valid]
    Verification: for a seed x, there is no
    adversarial input x′ ∈ Ballε(x) for
    which y1 ≠ f(x) and not detected
    Adversarially robust retrained [Wong & Kolter] model
    1000 test MNIST seeds, ε = 0.1 (L∞)
    970 infeasible (verified no adversarial example)
    13 misclassified (original seed)
    17 vulnerable
    Robust error: 0.3%
    Verification time ~0.2s
    (compared to 0.8s without binarization)

  107. 106
    [Diagram: defense approaches arranged by scalability vs. precision of evaluation metric]
    Formal Verification (MILP solver MIPVerify; SMT solver Reluplex; interval analysis ReluVal);
    evaluation metric: robust error (precise)
    Certified Robustness (CNN-Cert, Boopathy et al., 2018; Dual-LP, Kolter & Wong 2018;
    Dual-SDP, Raghunathan et al., 2018); evaluation metric: bound
    Heuristic Defenses (distillation, Papernot et al., 2016; gradient obfuscation;
    feature squeezing; adversarial retraining, Madry et al., 2017);
    evaluation metric: attack success rate (set of attacks)

  108. 107
    [Heatmap: certified robustness by seed class vs. target class]
    Original Model (no robustness training)
    MNIST Model:
    2 convolutional layers,
    2 fully-connected layers (100, 10 units)
    ε = 0.2, L∞

  109. 108
    [Heatmap: certified robustness by seed class vs. target class]
    Original Model (no robustness training)
    MNIST Model:
    2 convolutional layers,
    2 fully-connected layers (100, 10 units)
    ε = 0.2, L∞

  110. Training a Robust Network
    Eric Wong and J. Zico Kolter. Provable defenses against adversarial
    examples via the convex outer adversarial polytope. ICML 2018.
    replace loss with
    differentiable function
    based on outer bound
    using dual network
    ReLU (Rectified Linear Unit) ⟶ linear approximation
    [Plot: convex relaxation of ReLU between pre-activation bounds l and u]

  111. 110
    [Heatmap: certified robustness by seed class vs. target class]
    Standard Robustness Training (overall robustness goal)
    MNIST Model:
    2 convolutional layers,
    2 fully-connected layers (100, 10 units)
    ε = 0.2, L∞

  112. Cost-Sensitive Robustness Training
    111
    Xiao Zhang
    Cost-matrix: cost of different adversarial transformations
    (rows: seed class; columns: target class)
              benign   malware
    benign      −         0
    malware     1         −
    Incorporate a cost-matrix into robustness training
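
    An illustrative sketch of such a cost matrix and a cost-weighted objective, assuming NumPy;
    the weighting here is a simplification, not the exact formulation used in the cost-sensitive
    robustness training:

    # Illustrative sketch (not the exact formulation from the paper): a cost matrix
    # over (seed class, target class) pairs and a cost-weighted objective, where
    # robust_loss[i][j] measures how easily class-i inputs are pushed to class j.
    import numpy as np

    classes = ["benign", "malware"]
    # Only the malware -> benign transformation (evasion) carries a cost.
    cost = np.array([[0.0, 0.0],
                     [1.0, 0.0]])

    def cost_sensitive_objective(robust_loss, cost):
        # Weight each seed -> target robustness term by how much we care about it.
        return float(np.sum(cost * np.asarray(robust_loss)))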

  113. 112
    [Heatmap: certified robustness by seed class vs. target class]
    Standard Robustness Training (overall robustness goal)
    MNIST Model:
    2 convolutional layers,
    2 fully-connected layers (100, 10 units)
    ε = 0.2, L∞

  114. 113
    [Heatmap: certified robustness by seed class vs. target class]
    Cost-Sensitive Robustness Training
    Protect odd classes from evasion

  115. 114
    [Heatmap: certified robustness by seed class vs. target class]
    Cost-Sensitive Robustness Training
    Protect even classes from evasion

  116. History of the
    destruction
    of Troy, 1498
    Conclusion

  117. Security State-of-the-Art
    116
    Attack success probability | Threat models | Proofs
    Cryptography: 2^-128 | information theoretic, resource bounded | required
    System Security: 2^-32 | capabilities, motivations, rationality | common
    Adversarial Machine Learning: ? | artificially limited adversary | making progress!

  118. Security State-of-the-Art
    117
    Attack success probability | Threat models | Proofs
    Cryptography: 2^-128 | information theoretic, resource bounded | required
    System Security: 2^-32 | capabilities, motivations, rationality | common
    Adversarial Machine Learning: ? | artificially limited adversary | making progress!
    Huge gaps to close:
    threat models are unrealistic (but real threats unclear)
    verification techniques only work for tiny models
    experimental defenses often (quickly) broken

  119. David Evans
    University of Virginia
    [email protected]
    EvadeML.org
    Weilin Xu Yanjun Qi
    Funding: NSF, Intel, Baidu
    Xiao Zhang Center for Trustworthy Machine Learning

  120. David Evans
    University of Virginia
    [email protected]
    EvadeML.org
