FOSAD Trustworthy Machine Learning: Class 1

19th International School on Foundations of Security Analysis and Design
Mini-course on "Trustworthy Machine Learning"
https://jeffersonswheel.org/fosad2019

David Evans
Class 1: Introduction/Attacks
August 26, 2019

Transcript

  1. Trustworthy
    Machine
    Learning
    David Evans
    University of Virginia
    jeffersonswheel.org
    Bertinoro, Italy
    26 August 2019
    19th International School on Foundations of Security Analysis and Design
    1: Introduction/Attacks


  2. Plan for the Course
     Monday (Today): Introduction, ML Background, Attacks
     Tuesday (Tomorrow): Defenses
     Wednesday: Privacy, Fairness, Abuse
     Overall Goals:
     a broad and whirlwind survey* of an exciting emerging research area;
     explain a few of my favorite research results in enough detail to understand them at a high level;
     introduce some open problems that I hope you will work on and solve.
     * but highly biased by my own interests

  3. Why should we care about Trustworthy Machine Learning?

  4. "Unfortunately, our translation systems made an error last week that misinterpreted what
     this individual posted. Even though our translations are getting better each day, mistakes
     like these might happen from time to time and we've taken steps to address this particular
     issue. We apologize to him and his family for the mistake and the disruption this caused."

  5. Amazon Employment

  6. Risks from Artificial Intelligence
     Benign developers and operators:
     AI out of control
     AI inadvertently causes harm
     Malicious operators:
     Build AI to do harm
     Malicious abuse of benign AI
     Cartoon: "On Robots", Joe Berger and Pascal Wyse (The Guardian, 21 July 2018)

  7. Harmful AI
     Benign developers and operators:
     AI out of control
     AI causes harm (without creators objecting)
     Malicious operators:
     Build AI to do harm

  8. Out-of-Control AI
     HAL (2001: A Space Odyssey); SkyNet (The Terminator)

  9. Alignment Problem
     Bostrom's Paperclip Maximizer

  10. Harmful AI
      Benign developers and operators:
      AI out of control
      AI inadvertently causes harm to humanity
      Malicious operators:
      Build AI to do harm

  11. Lost Jobs and Dignity

  12. Human Jobs of the Future
      Cartoon: "On Robots", Joe Berger and Pascal Wyse (The Guardian, 21 July 2018)

  13. Inadvertent Bias and Discrimination
      (3rd lecture)

  14. Harmful AI
      Benign developers:
      AI out of control
      AI causes harm (without creators objecting)
      Malicious developers:
      Using AI to do harm
      Malice is (often) in the eye of the beholder (e.g., mass surveillance, pop-up ads, etc.)

  15. Automated Spear Phishing
      "It's slightly less effective [than manually generated] but it's dramatically more
      efficient" (John Seymour)
      More malicious use of AI in 3rd lecture?

  16. Risks from Artificial Intelligence
      Benign developers and operators:
      AI out of control
      AI inadvertently causes harm
      Malicious operators:
      Build AI to do harm
      Malicious abuse of benign AI systems (rest of today and tomorrow)

  17. Crash Course in Machine Learning

  18. More Ambition
      "The human race will have a new kind of instrument which will increase the power of the
      mind much more than optical lenses strengthen the eyes and which will be as far superior
      to microscopes or telescopes as reason is superior to sight."

  19. More Ambition
      Gottfried Wilhelm Leibniz (1679)

  20. Gottfried Wilhelm Leibniz (Universität Altdorf, 1666) who advised:
      Jacob Bernoulli (Universität Basel, 1684) who advised:
      Johann Bernoulli (Universität Basel, 1694) who advised:
      Leonhard Euler (Universität Basel, 1726) who advised:
      Joseph Louis Lagrange who advised:
      Siméon Denis Poisson who advised:
      Michel Chasles (École Polytechnique, 1814) who advised:
      H. A. (Hubert Anson) Newton (Yale, 1850) who advised:
      E. H. Moore (Yale, 1885) who advised:
      Oswald Veblen (U. of Chicago, 1903) who advised:
      Philip Franklin (Princeton, 1921) who advised:
      Alan Perlis (MIT Math PhD, 1950) who advised:
      Jerry Feldman (CMU Math, 1966) who advised:
      Jim Horning (Stanford CS PhD, 1969) who advised:
      John Guttag (U. of Toronto CS PhD, 1975) who advised:
      David Evans (MIT CS PhD, 2000)
      my academic great-great-great-great-great-great-great-great-great-great-great-great-great-great-great-grandparent!

  21. More Precision
      "The human race will have a new kind of instrument which will increase the power of the
      mind much more than optical lenses strengthen the eyes and which will be as far superior
      to microscopes or telescopes as reason is superior to sight."
      Gottfried Wilhelm Leibniz (1679)
      Normal computing amplifies (quadrillions of times faster) and aggregates (enables
      millions of humans to work together) human cognitive abilities; AI goes beyond what
      humans can do.

  22. Operational Definition
      "Artificial Intelligence" means making computers do things their programmers don't
      understand well enough to program explicitly.
      If it is explainable, it's not ML!

  23. Inherent Paradox of "Trustworthy" ML
      "Artificial Intelligence" means making computers do things their programmers don't
      understand well enough to program explicitly.
      If we could specify precisely what the model should do, we wouldn't need ML to do it!

  24. Inherent Paradox of "Trustworthy" ML
      If we could specify precisely what the model should do, we wouldn't need ML to do it!
      The best we can hope for is verifying certain properties.
      Model Similarity: ∀x: M₁(x) = M₂(x)
      (DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao,
      Junfeng Yang, Suman Jana. SOSP 2017)

  25. Inherent Paradox of "Trustworthy" ML
      The best we can hope for is verifying certain properties.
      Model Similarity: ∀x ∈ D: M₁(x) ≈ M₂(x)
      (DeepXplore: Automated Whitebox Testing of Deep Learning Systems. Kexin Pei, Yinzhi Cao,
      Junfeng Yang, Suman Jana. SOSP 2017)
      Model Robustness: ∀x ∈ D, ∀Δ ∈ A: M(x) ≈ M(x + Δ)

  26. Adversarial Robustness
      ∀x ∈ D, ∀Δ ∈ A: M(x) ≈ M(x + Δ)
      Adversary's Goal: find a "small" perturbation Δ that changes the model output
      (targeted attack: changes it in some desired way).
      Defender's Goal:
      Robust Model: find a model where this is hard.
      Detection: detect inputs that are adversarial.

  27. Not a new problem...
      "Or do you think any Greek gift's free of treachery? Is that Ulysses's reputation? Either
      there are Greeks in hiding, concealed by the wood, or it's been built as a machine to use
      against our walls, or spy on our homes, or fall on the city from above, or it hides some
      other trick: Trojans, don't trust this horse. Whatever it is, I'm afraid of Greeks even
      those bearing gifts."
      Virgil, The Aeneid (Book II)

  28. Introduction to Deep Learning

  29. Generic Classifier
      F: X → Y
      Input: x ∈ ℝⁿ
      Output (label): y ∈ {1, …, K}
      Natural distribution: D ⊆ (x, y) pairs

  30. Neural Network
      F(x) = f⁽ᴸ⁾(f⁽ᴸ⁻¹⁾(… f⁽²⁾(f⁽¹⁾(x))))
      "layer": each f⁽ᵗ⁾ is mostly a map from ℝᵐ → ℝⁿ
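
     A minimal numpy sketch of this composition (illustrative only; the dense-layer form and
     shapes are assumptions, not from the lecture):

         import numpy as np

         def dense_layer(W, b, activation):
             # one layer f(t): z -> activation(W @ z + b), a map from R^m to R^n
             return lambda z: activation(W @ z + b)

         def network(layers):
             # F(x) = f(L)(f(L-1)(... f(1)(x))): apply the layers in order
             def f(x):
                 for layer in layers:
                     x = layer(x)
                 return x
             return f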

  31. Activation Layer
      z_j^(t) = g(Σ_i w_{i,j}^(t−1) · z_i^(t−1))
      Each unit j in layer t computes a weighted sum of the outputs z_i^(t−1) of layer t−1.

  32. Activation Layer
      z_j^(t) = g(Σ_i w_{i,j}^(t−1) · z_i^(t−1))
      Activation function g, e.g. ReLU (Rectified Linear Unit):
      g(z) = 0 if z < 0; z if z ≥ 0
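
     The same unit computation with ReLU, as a short numpy sketch (the weight-matrix shape is
     an assumption for illustration):

         import numpy as np

         def relu(z):
             # g(z) = 0 for z < 0, z for z >= 0
             return np.maximum(0.0, z)

         def activation_layer(W, z_prev):
             # z_j = g(sum_i w_ij * z_i): W has shape (n_prev, n_units),
             # z_prev holds the outputs of layer t-1
             return relu(z_prev @ W)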

  33. "Fancy" Layers: Convolution
      z_j^(t) = g(w · z^(t−1))
      The same k×k kernel of weights [w₁₁ … w₁ₖ; … ; wₖ₁ … wₖₖ] is applied to each local
      window of layer t−1.

  34. "Fancy" Layers: Max Pooling

  35. "Fancy" Layers: Max Pooling
      Each pooling unit takes the maximum over its window of layer t−1 outputs:
      max(z₁₁, z₁₂, z₂₁, z₂₂), max(z₃₁, z₃₂, z₄₁, z₄₂), max(z₅₁, z₅₂, z₆₁, z₆₂)

  36. Final Layer: SoftMax
      z^(L) = g(z^(L−1)), with the SoftMax function
      g(z)_j = e^{z_j} / Σ_{k=1..K} e^{z_k},  j = 1, …, K
      Example output: [0.03, 0.32, 0.01, 0.63, 0.00, 0.01]
      It's a "cat" (0.63 confidence).
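
     A numerically stable softmax in numpy; the logits below are hypothetical values chosen to
     roughly reproduce the slide's output vector:

         import numpy as np

         def softmax(z):
             # g(z)_j = exp(z_j) / sum_k exp(z_k); subtracting max(z) avoids overflow
             e = np.exp(z - np.max(z))
             return e / e.sum()

         logits = np.array([-1.2, 1.1, -2.3, 1.8, -4.0, -2.1])
         probs = softmax(logits)
         # probs ~ [0.03, 0.31, 0.01, 0.63, 0.00, 0.01]: class 3, ~0.63 "confidence"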

  37. DNNs in 1989
      Backpropagation Applied to Handwritten Zip Code Recognition. Yann LeCun, et al., 1989.

  38. Turing Award in 2018
      Yann LeCun (AT&T → Facebook/NYU), Geoffrey Hinton (Google/U. Toronto),
      Yoshua Bengio (U. Montreal)

  39. DNNs in 1989
      Backpropagation Applied to Handwritten Zip Code Recognition. Yann LeCun, et al., 1989.

  40. MNIST Dataset
      James Mickens' USENIX Security Symposium 2018 keynote:
      https://www.usenix.org/conference/usenixsecurity18/presentation/mickens

  41. MNIST Dataset
      Example digits: 2 8 7 6 8 6 5 9
      70,000 images (60,000 training, 10,000 testing)
      28×28 pixels, 8-bit grayscale
      scanned hand-written digits, labeled by humans
      LeCun, Cortes, Burges [1998]

  42. MNIST Dataset
      Example digits: 2 8 7 6 8 6 5 9
      70,000 images (60,000 training, 10,000 testing)
      28×28 pixels, 8-bit grayscale
      scanned hand-written digits, labeled by humans
      LeCun, Cortes, Burges [1998]

  43. Progress in MNIST
      Year | Error Rate
      1998 [Yann LeCun, et al.]: 5% error rate (12.1% rejection for 1% error rate)
      2013 [..., Yann LeCun, ...]: 0.21% (21 out of 10,000 tests)

  44. CIFAR-10 (and CIFAR-100)
      Classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
      60,000 images, 32×32 pixels, 24-bit color
      human-labeled subset of images in 10 classes from the Tiny Images Dataset
      Alex Krizhevsky [2009]

  45. ImageNet
      14M high-resolution full-color images
      manually annotated with WordNet: ~20,000 synonym sets (~1,000 images in each)

  46. Example CNN Architectures
      Image from Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang,
      Shaoqing Ren, Jian Sun, 2015

  47. Accuracy on CIFAR-10
      Plot: training error and test error curves.
      Image from Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang,
      Shaoqing Ren, Jian Sun, 2015

  48. Inception
      Image from Mingxing Tan, Quoc V. Le. EfficientNet: Rethinking Model Scaling for
      Convolutional Neural Networks. ICML 2019. https://arxiv.org/pdf/1905.11946.pdf

  49. Training a DNN

  50. https://youtu.be/TVmjjfTvnFs

  51. Training a Network
      select a network architecture, A
      θ ← initialize with random parameters
      while (still improving):
          θ ← adjust_parameters(A, θ, X, Y)

  52. Goal of Training: Minimize Loss
      Define a Loss Function:
      Mean Square Error: MSE = (1/n) · Σᵢ₌₁..ₙ (ŷᵢ − yᵢ)²
      (Maximize) Likelihood Estimation: ℒ = Πᵢ₌₁..ₙ p(y | xᵢ)
      (Maximize) Log-Likelihood Estimation: log ℒ = Σᵢ₌₁..ₙ log p(y | xᵢ)
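
     Both losses in a few lines of numpy (a sketch; probs is assumed to be an n-by-K array of
     predicted class probabilities):

         import numpy as np

         def mse(y_pred, y_true):
             # (1/n) * sum_i (yhat_i - y_i)^2
             return np.mean((y_pred - y_true) ** 2)

         def log_likelihood(probs, labels):
             # log L = sum_i log p(y_i | x_i): pick each example's predicted
             # probability for its true label
             return np.sum(np.log(probs[np.arange(len(labels)), labels]))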

  53. Training a Network
      select a network architecture, A
      θ ← initialize with random parameters
      while (still improving):
          θ ← adjust_parameters(A, θ, X, Y)

  54. Training a Network
      select a network architecture, A
      θ ← initialize with random parameters
      while (Loss(A_θ, X, Y) > goal and funding > 0):
          θ ← adjust_parameters(A, θ, X, Y)

  55. Finding a Good Architecture
      while (available_students > 0 and funding > 0):
          select a network architecture, A
          θ ← initialize with random parameters
          while (Loss(A_θ, X, Y) > goal and funding > 0):
              θ ← adjust_parameters(A, θ, X, Y)

  56. Gradient Descent
      Goal: find θ that minimizes ℒ_{X,Y}(θ).

  57. Gradient Descent
      Pick a random starting point θ₀.
      Follow the gradient (first derivative) ℒ′_{X,Y}(θ); to minimize, move in the negative
      direction:
      θₜ = θₜ₋₁ − α · ∇ℒ_{X,Y}(θₜ₋₁)

  58. Gradient Descent: Non-Convex Loss
      Pick a random starting point θ₀; follow the gradient:
      θₜ = θₜ₋₁ − α · ∇ℒ_{X,Y}(θₜ₋₁)
      Repeat many times, hopefully finding the global minimum.

  59. Mini-Batch Stochastic Gradient Descent
      Pick a random starting point θ₀; follow the gradient:
      θₜ = θₜ₋₁ − α · ∇ℒ_{X,Y}(θₜ₋₁)
      Repeat many times, hopefully finding the global minimum.
      To reduce computation, evaluate the gradient of the loss on a randomly selected subset
      ("mini-batch") of the training data.
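
     A mini-batch SGD loop as a sketch (grad_loss is an assumed callable returning the gradient
     of the loss on a batch; the hyperparameters are illustrative):

         import numpy as np

         def sgd(theta, grad_loss, X, Y, alpha=0.01, batch_size=32, steps=1000):
             # theta_t = theta_{t-1} - alpha * grad loss(theta_{t-1}),
             # with the gradient estimated on a random mini-batch
             n = len(X)
             for _ in range(steps):
                 idx = np.random.choice(n, batch_size, replace=False)
                 theta = theta - alpha * grad_loss(theta, X[idx], Y[idx])
             return theta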

  60. Cost of Training
      https://openai.com/blog/ai-and-compute/

  61. Cost of Training
      https://openai.com/blog/ai-and-compute/

  62. Adversarial Machine Learning

  63. Statistical Machine Learning
      Training (supervised learning): Labelled Training Data → Feature Extraction → Vectors →
      ML Algorithm → Trained Classifier
      Deployment: Operational Data → Trained Classifier → Malicious / Benign

  64. Assumption: Training Data is Representative
      Training (supervised learning): Labelled Training Data → Feature Extraction → Vectors →
      ML Algorithm → Trained Classifier
      Deployment: Operational Data → Trained Classifier → Malicious / Benign

  65. Adversaries Don't Cooperate
      Assumption: Training Data is Representative
      Poisoning: the adversary corrupts the training data (attack at training time).

  66. Adversaries Don't Cooperate
      Assumption: Training Data is Representative
      Evading: the adversary perturbs inputs at deployment.

  67. Adversarial Examples for DNNs
      "panda" + 0.007 × [noise] = "gibbon"
      Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy. Explaining and
      Harnessing Adversarial Examples. 2014 (in ICLR 2015)

  68. Papers on "Adversarial Examples" (Google Scholar)
      Bar chart by year, 2013-2018: 1826.68 papers expected in 2018!

  69. Papers on "Adversarial Examples" (Google Scholar)
      Bar chart by year, 2013-2019: 2901.67 papers expected in 2019!

  70. Dash of "Theory"
      ICML Workshop 2015
      15% of 2018 and 2019 "adversarial examples" papers contain "theorem" and "proof".

  71. Battista Biggio, et al. ECML-PKDD 2013

  72. Defining Adversarial Example
      Assumption: a small perturbation does not change the class in "Reality Space"
      (human perception).
      Given seed sample x, x′ is an adversarial example iff:
      F(x′) = t (class is t: targeted), or F(x′) ≠ F(x) (class is different: untargeted),
      and Δ(x, x′) ≤ ε (similar to seed x: difference below threshold).
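
     The definition as a predicate (a sketch; model and distance are assumed callables standing
     in for F and Δ):

         def is_adversarial(model, distance, x, x_adv, epsilon, target=None):
             # Delta(x, x') <= epsilon: must stay close to the seed
             if distance(x, x_adv) > epsilon:
                 return False
             if target is not None:
                 return model(x_adv) == target   # targeted: F(x') = t
             return model(x_adv) != model(x)     # untargeted: F(x') != F(x)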

  73. Plot: decision regions around a "Dog" image along two random directions
      (slide by Nicholas Carlini).

  74. Same plot, zoomed out: a "Truck" region appears along the random directions
      (slide by Nicholas Carlini).

  75. Same plot with one random direction replaced by an adversarial direction: "Truck" and
      "Airplane" regions lie much closer to the "Dog" seed (slide by Nicholas Carlini).

  76. Weilin Xu et al. "Magic Tricks for Self-driving Cars", Defcon-CAAD, 2018.
      Melanoma diagnosis (benign → malignant): Samuel G. Finlayson et al. "Adversarial attacks
      on medical machine learning", Science, 2019.
      Mahmood Sharif et al. "Accessorize to a Crime: Real and Stealthy Attacks on
      State-of-the-Art Face Recognition", ACM CCS, 2016.

  77. Natural Language
      IMDB Movie Review Dataset (examples by Hannah Chen)
      Prediction: Positive (Confidence = 99.22)
      "Hilarious film, I had a great time watching it. The star (Cuneyt Arkin, sometimes
      credited as Steve Arkin) is a popular actor from Turkey. He has played in lots of
      tough-guy roles, epic-sword films, and romances. It was fun to see him with an
      international cast and some real lousy looking pair of gloves. If I remember it was also
      dubbed in English which made things even more funnier. (kinda like seeing John Wayne
      speak Turkish)."

  78. Natural Language
      IMDB Movie Review Dataset (examples by Hannah Chen)
      Same review, with one word substituted: "movies"
      Prediction: Positive (Confidence = 91.06)
      Target: Negative (Confidence = 8.94)

  79. Natural Language
      IMDB Movie Review Dataset (examples by Hannah Chen)
      Same review, with one word substituted: "researching"
      Prediction: Positive (Confidence = 92.28)
      Target: Negative (Confidence = 7.72)

  80. Natural Language
      IMDB Movie Review Dataset (examples by Hannah Chen)
      Same review, with both substitutions ("movies" and "researching"):
      Prediction: Negative (Confidence = 73.33)
      Target: Negative

  81. Defining Adversarial Example
      Given seed sample x, x′ is an adversarial example iff:
      F(x′) = t (class is t: targeted), or F(x′) ≠ F(x) (class is different: untargeted),
      and Δ(x, x′) ≤ ε (similar to seed x: difference below threshold).
      Δ(x, x′) is defined in some (simple!) metric space.

  82. Distance Metrics
      ℓp norms: ℓp(x, x′) = (Σᵢ |xᵢ − x′ᵢ|ᵖ)^(1/p)
      ℓ₀ "norm" (# different): #{i : xᵢ ≠ x′ᵢ}
      ℓ₁ norm: Σᵢ |xᵢ − x′ᵢ|
      ℓ₂ norm ("Euclidean"): √(Σᵢ (xᵢ − x′ᵢ)²)
      ℓ∞ norm: maxᵢ |xᵢ − x′ᵢ|
      Useful for theory and experiments, but not realistic!
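
     All four distances in numpy (a sketch over flattened inputs):

         import numpy as np

         def lp_distances(x, x_adv):
             d = (x_adv - x).ravel()
             return {
                 "L0":   int(np.count_nonzero(d)),        # number of changed components
                 "L1":   float(np.sum(np.abs(d))),
                 "L2":   float(np.sqrt(np.sum(d ** 2))),  # Euclidean
                 "Linf": float(np.max(np.abs(d))),
             }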

  83. Original Image (x)
      (images by Nicholas Carlini)

  84. Original Image (x) vs. Adversarial Image (x′) at a fixed distance Δ(x, x′)
      (images by Nicholas Carlini)

  85. Original Image (x) vs. Adversarial Image (x′) at a fixed distance Δ(x, x′)
      (images by Nicholas Carlini)

  86. Other Distance Metrics
      Set of transformations: rotate, scale, "fog", color, etc.
      NLP: word substitutions (synonym constraints)
      Semantic distance: ℬ(x′) = ℬ(x), the behavior we care about is the same.
      Malware: it still behaves maliciously.
      Vision: it still looks like a "cat" to most humans.
      We'll get back to these... for now, let's assume ℓp norms (like most research)
      despite their flaws.

  87. Same decision-region plot ("Dog" seed, with "Truck" and "Airplane" along the adversarial
      direction; slide by Nicholas Carlini).
      How can we find a nearby adversarial example?

  88. (Slide by Nicholas Carlini)

  89. (Visualization by Nicholas Carlini)

  90. Fast Gradient Sign
      Images: original, then ε = 0.1, 0.2, 0.3, 0.4, 0.5.
      Adversary Power: ε
      ℓ∞-bounded adversary: max |xᵢ − x′ᵢ| ≤ ε
      x′ = x − ε · sign(∇ₓ J(x, y))
      Goodfellow, Shlens, Szegedy 2014
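
     FGSM is a single step in numpy (a sketch; grad_x_loss stands for a precomputed gradient of
     the loss with respect to the input, and the sign choice follows the convention that an
     untargeted attack increases loss on the true label while a targeted attack decreases loss
     on the target label):

         import numpy as np

         def fgsm(x, grad_x_loss, epsilon, targeted=False):
             step = epsilon * np.sign(grad_x_loss)
             x_adv = x - step if targeted else x + step
             return np.clip(x_adv, 0.0, 1.0)   # keep pixels in the valid range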

  91. Impact of Adversarial Perturbations
      Plot: distance between each layer's output and its output for the original seed
      (5th to 95th percentile), FGSM with ε = 0.0245, CIFAR-10, DenseNet.

  92. Impact of Adversarial Perturbations
      Same plot, adding random noise of the same amount for comparison with FGSM
      (ε = 0.0245, CIFAR-10, DenseNet).

  93. Basic Iterative Method (BIM)
      x′₀ = x
      for k iterations:
          x′ᵢ₊₁ = clip_{x,ε}(x′ᵢ − α · sign(∇J(x′ᵢ, y)))
      x′ = x′ₖ
      A. Kurakin, I. Goodfellow, and S. Bengio 2016

  94. Projected Gradient Descent (PGD)
      x′₀ = x
      for k iterations:
          x′ᵢ₊₁ = project_{x,ε}(x′ᵢ − α · sign(∇J(x′ᵢ, y)))
      x′ = x′ₖ
      A. Kurakin, I. Goodfellow, and S. Bengio 2016
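
     The iterative attack as a sketch (grad_fn is an assumed callable returning the loss
     gradient at the current candidate; the projection is onto the ℓ∞ ball of radius ε around
     the seed, and the sign convention matches the FGSM sketch above):

         import numpy as np

         def pgd(x, grad_fn, epsilon, alpha, iters):
             x_adv = x.copy()
             for _ in range(iters):
                 x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))   # small FGSM step
                 x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # project into the ball
                 x_adv = np.clip(x_adv, 0.0, 1.0)                  # stay in valid input range
             return x_adv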

  95. Carlini/Wagner
      Formulate an optimization problem:
      min_δ (Δ(x, x + δ) + c · f(x + δ))  such that  (x + δ) ∈ [0, 1]ⁿ
      where f is a defined objective function: f(x′) ≤ 0 iff C(x′) = t
      (model output matches target).
      An optimization problem that can be solved by standard optimizers:
      Adam (SGD + momentum) [Kingma, Ba 2015]
      Nicholas Carlini, David Wagner. IEEE S&P 2017

  96. Carlini/Wagner
      min_δ (Δ(x, x + δ) + c · f(x + δ))  such that  (x + δ) ∈ [0, 1]ⁿ
      f(x′) ≤ 0 iff C(x′) = t (model output matches target):
      f(x′) = max_{i≠t} Z(x′)ᵢ − Z(x′)_t
      where Z(x) is the pre-softmax output:
      F(x) = softmax(Z(x)),  Z(x) = f⁽ᴸ⁻¹⁾(… f⁽²⁾(f⁽¹⁾(x)))
      Nicholas Carlini, David Wagner. IEEE S&P 2017

  97. Carlini/Wagner: ℓ₂ Attack
      f(x′) = max_{i≠t} Z(x′)ᵢ − Z(x′)_t
      min_δ (Δ(x, x + δ) + c · f(x + δ))  such that  (x + δ) ∈ [0, 1]ⁿ
      Change of variables so the box constraint always holds:
      xᵢ + δᵢ = ½(tanh wᵢ + 1)
      Nicholas Carlini, David Wagner. IEEE S&P 2017

  98. Carlini/Wagner: ℓ₂ Attack
      min_w ( ‖½(tanh w + 1) − x‖₂² + c · f(½(tanh w + 1)) )
      f(x′) = max(max_{i≠t} Z(x′)ᵢ − Z(x′)_t, −κ)
      κ: confidence parameter
      Nicholas Carlini, David Wagner. IEEE S&P 2017
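
     The objective term f in numpy (a sketch; Z is the logit vector for a candidate x′, t the
     target class):

         import numpy as np

         def cw_objective(Z, t, kappa=0.0):
             # f(x') = max(max_{i != t} Z_i - Z_t, -kappa): bottoms out at -kappa
             # once the target logit beats every other logit by at least kappa
             others = np.delete(Z, t)
             return max(np.max(others) - Z[t], -kappa)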

  99. Impact of Adversarial Perturbations
      Same per-layer distance plot for the Carlini-Wagner L2 attack vs. random noise of the
      same amount (CIFAR-10, DenseNet).

  100. Content-Space Attacks
       What if there is no gradient to follow?

  101. Example: PDF Malware


  102. Finding Evasive Malware
       Given seed sample x with desired malicious behavior, find an adversarial example x′
       that satisfies:
       F(x′) = "benign"  (model misclassifies)
       ℬ(x′) = ℬ(x)  (malicious behavior preserved)
       Generic attack: heuristically explore the input space for an x′ that satisfies the
       definition. No requirement that x ~ x′ except through ℬ.

  103. PDF Malware Classifiers
       PDFrate [ACSAC 2012]: Random Forest, manual features (object counts, lengths,
       positions, ...)
       Hidost13 [NDSS 2013]: Random Forest, automated features (object structural paths)
       Hidost16 [JIS 2016]: Support Vector Machine, automated features (object structural
       paths)
       "Very robust against 'strongest conceivable mimicry attack'."

  104. Evolutionary Search (Weilin Xu, Yanjun Qi)
       Malicious PDF → Clone → Mutation (Mutant Generation, borrowing content from Benign
       PDFs) → Variants → Select Variants (Fitness Selection, using a Benign Oracle) →
       Found Evasive?

  105. Generating Variants
       (Same evolutionary loop, highlighting the Mutant Generation stage.)

  106. PDF Structure


  107. Generating Variants
       (Same evolutionary loop, highlighting the Mutant Generation stage.)

  108. Generating Variants
       Select a random node in the PDF object tree
       (e.g., /Root → /Catalog → /Pages; /JavaScript → eval('…')).
       Randomly transform it: delete, insert, replace.
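
     A sketch of the mutation operator (the pdf_tree methods nodes(), delete(), insert_under(),
     and replace() are hypothetical stand-ins for EvadeML's PDF object-tree manipulation):

         import random

         def mutate(pdf_tree, benign_nodes):
             # pick a random node, then delete it, insert a node drawn from
             # benign PDFs beneath it, or replace it with one
             node = random.choice(pdf_tree.nodes())
             op = random.choice(["delete", "insert", "replace"])
             if op == "delete":
                 pdf_tree.delete(node)
             elif op == "insert":
                 pdf_tree.insert_under(node, random.choice(benign_nodes))
             else:
                 pdf_tree.replace(node, random.choice(benign_nodes))
             return pdf_tree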

  109. Generating Variants
       Select a random node; randomly transform it: delete, insert, or replace, drawing
       replacement nodes from Benign PDFs.

  110. Selecting Promising Variants
       (Same evolutionary loop, highlighting the Fitness Selection stage.)

  111. Selecting Promising Variants
       Each candidate variant is scored by a fitness function that combines the Target
       Classifier's score with the Oracle's malicious/benign verdict.

  112. Oracle: ℬ(x′) = ℬ(x)?
       Execute the candidate in a vulnerable Adobe Reader in a virtual environment
       (Cuckoo sandbox: https://github.com/cuckoosandbox; simulated network: INetSim).
       Behavioral signature: malicious if the signature (HTTP_URL + HOST extracted from API
       traces) matches.

  113. Fitness Function
       fitness(x′) = 1 − classifier_score(x′)  if ℬ(x′) = ℬ(x)
                     −∞                        otherwise
       Assumes lost malicious behavior will not be recovered.
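
     The same function directly in Python (a sketch; classifier_score and oracle_is_malicious
     are assumed wrappers around the target classifier and the Cuckoo-based oracle):

         def fitness(variant, classifier_score, oracle_is_malicious):
             if not oracle_is_malicious(variant):
                 # lost malicious behavior: assume it will not be recovered
                 return float("-inf")
             return 1.0 - classifier_score(variant)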

  114. Plot: Seeds Evaded (out of 500) vs. Number of Mutations, for PDFRate and Hidost.

  115. Same plot: simple transformations often worked.

  116. Same plot: (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/) works on
       162/500 seeds.

  117. Same plot: some seeds required complex transformations.

  118. Evading PDFrate
       Plot: classification score for each malware seed (sorted by original score), showing
       original malicious seeds, discovered evasive variants, and the malicious label
       threshold.

  119. Adjust threshold?
       Same plot: discovered evasive variants against the malicious label threshold.
       Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in Ensemble
       Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.

  120. Adjust threshold?
       Same plot: variants found with threshold = 0.25 vs. variants found with
       threshold = 0.50.

  121. Hide the Classifier Score?
       (Same evolutionary loop; the fitness function uses the Target Classifier's score.)

  122. Binary Classifier Output is Enough
       (Same evolutionary loop, using only the classifier's malicious/benign output.)
       ACM CCS 2017

  123. Retrain Classifier
       Training (supervised learning): Labelled Training Data → Feature Extraction → Vectors →
       ML Algorithm → Trained Classifier
       Deployment: Operational Data → Trained Classifier → Malicious / Benign

  124. Training (supervised learning) with EvadeML in the loop: evasive variants found by
       EvadeML are cloned back into the labelled training data.

  125. Plot: Seeds Evaded (out of 500) vs. Generations, for Hidost16.
       Original classifier: takes 614 generations to evade all seeds.

  126. Same plot, adding the retrained classifier HidostR1.

  127. Same plot (Hidost16, HidostR1).

  128. Same plot, adding the second-round retrained classifier HidostR2.

  129. Same plot (Hidost16, HidostR1, HidostR2).

  130. Same plot, with false positive rates:
       Genome     Contagio   Benign
       Hidost16   0.00       0.00
       HidostR1   0.78       0.30
       HidostR2   0.85       0.53

  131. (Same plot and false positive rates as the previous slide.)

  132. Only 8/6987 robust features (Hidost):
       /Names
       /Names /JavaScript
       /Names /JavaScript /Names
       /Names /JavaScript /JS
       /OpenAction
       /OpenAction /JS
       /OpenAction /S
       /Pages
       A robust classifier has high false positives.
       USENIX Security 2019

  133. Malware Classification Moral
       To build robust, effective malware classifiers, we need robust features that are strong
       signals for malware.
       If you have features like this, you don't need ML!
       There are scenarios where adversarial training "works" [more tomorrow].

  134. Recap: Adversarial Examples across Domains
       Domain: Trojan Wars
         Classifier Space: Judgment of Trojans, F(x) = "gift"
         "Reality" Space: Physical Reality, F*(x) = invading army
       Domain: Malware
         Classifier Space: Malware Detector, F(x) = "benign"
         "Reality" Space: Victim's Execution, F*(x) = malicious behavior
       Domain: Image Classification
         Classifier Space: DNN Classifier, F(x) = t
         "Reality" Space: Human Perception, F*(x) = y

  135. Tomorrow: Defenses
       David Evans
       University of Virginia
       [email protected]
       https://www.cs.virginia.edu/evans