
FOSAD Trustworthy Machine Learning: Class 1

19th International School on Foundations of Security Analysis and Design
Mini-course on "Trustworthy Machine Learning"
https://jeffersonswheel.org/fosad2019

David Evans
August 26, 2019

Class 1: Introduction/Attacks

Transcript

  1. Trustworthy
    Machine
    Learning
    David Evans
    University of Virginia
    jeffersonswheel.org
    Bertinoro, Italy
    26 August 2019
    19th International School on Foundations of Security Analysis and Design
    1: Introduction/Attacks


  2. Plan for the Course
    Monday (Today)
    Introduction
    ML Background
    Attacks
    Tuesday (Tomorrow)
    Defenses
    Wednesday
    Privacy, Fairness, Abuse
    Overall Goals:
    broad and whirlwind survey* of an
    exciting emerging research area
    explain a few of my favorite research
    results in enough detail to understand
    them at a high-level
    introduce some open problems that I
    hope you will work on and solve
    * but highly biased by my own interests


  3. Why should
    we care
    about
    Trustworthy
    Machine
    Learning?


  4. “Unfortunately, our translation systems made an error last week
    that misinterpreted what this individual posted. Even though
    our translations are getting better each day, mistakes like these
    might happen from time to time and we’ve taken steps to
    address this particular issue. We apologize to him and his family
    for the mistake and the disruption this caused.”


  5. [image slide]

  6. Amazon Employment

  7. Risks from Artificial Intelligence
    Benign developers and operators
    AI out of control
    AI inadvertently causes harm
    Malicious operators
    Build AI to do harm
    Malicious abuse of benign AI
    On Robots
    Joe Berger and Pascal Wyse
    (The Guardian, 21 July 2018)


  8. Harmful AI
    Benign developers and operators
    AI out of control
    AI causes harm (without creators objecting)
    Malicious operators
    Build AI to do harm

  9. Out-of-Control AI
     HAL (2001: A Space Odyssey)    SkyNet (The Terminator)


  10. Alignment Problem
     Bostrom’s Paperclip Maximizer

  11. Harmful AI
    Benign developers and operators
    AI out of control
    AI inadvertently causes harm to humanity
    Malicious operators
    Build AI to do harm

  12. Lost Jobs and Dignity

  13. On Robots
    Joe Berger and Pascal Wyse
    (The Guardian, 21 July 2018)
    Human Jobs
    of the Future


  14. Inadvertent Bias and Discrimination
     3rd lecture

  15. Harmful AI
    Benign developers
    AI out of control
    AI causes harm (without creators objecting)
    Malicious developers
    Using AI to do harm
     Malice is (often) in the eye of the beholder
     (e.g., mass surveillance, pop-up ads, etc.)

  16. Automated Spear Phishing
     “It’s slightly less effective [than manually generated] but it’s
     dramatically more efficient” (John Seymour)
     More malicious use of AI in 3rd lecture?

  17. Risks from Artificial Intelligence
    Benign developers and operators
    AI out of control
    AI inadvertently causes harm
    Malicious operators
    Build AI to do harm
    Malicious abuse of benign AI systems
     rest of today and tomorrow

  18. Crash Course in
    Machine Learning

  19. [image slide]

  20. More Ambition
    “The human race will have a
    new kind of instrument which
    will increase the power of the
    mind much more than optical
    lenses strengthen the eyes
    and which will be as far
    superior to microscopes or
    telescopes as reason is
    superior to sight.”


  21. More Ambition
     Gottfried Wilhelm Leibniz (1679)

  22. Gottfried Wilhelm Leibniz (Universität Altdorf, 1666) who advised:
     Jacob Bernoulli (Universität Basel, 1684) who advised:
     Johann Bernoulli (Universität Basel, 1694) who advised:
     Leonhard Euler (Universität Basel, 1726) who advised:
    Joseph Louis Lagrange who advised:
    Simeon Denis Poisson who advised:
    Michel Chasles (Ecole Polytechnique, 1814) who advised:
    H. A. (Hubert Anson) Newton (Yale, 1850) who advised:
    E. H. Moore (Yale, 1885) who advised:
    Oswald Veblen (U. of Chicago, 1903) who advised:
    Philip Franklin (Princeton 1921) who advised:
    Alan Perlis (MIT Math PhD 1950) who advised:
    Jerry Feldman (CMU Math 1966) who advised:
    Jim Horning (Stanford CS PhD 1969) who advised:
    John Guttag (U. of Toronto CS PhD 1975) who advised:
    David Evans (MIT CS PhD 2000)
    my academic great-
    great-great-great-
    great-great-great-
    great-great-great-
    great-great-great-
    great-great-
    grandparent!


  23. More Precision
    “The human race will have a
    new kind of instrument which
    will increase the power of the
    mind much more than optical
    lenses strengthen the eyes
    and which will be as far
    superior to microscopes or
    telescopes as reason is
    superior to sight.”
    Gottfried Wilhelm Leibniz (1679)
    Normal computing amplifies
    (quadrillions of times faster)
    and aggregates (enables
    millions of humans to work
    together) human cognitive
    abilities; AI goes beyond
    what humans can do.


  24. Operational Definition
     If it is explainable, it’s not ML!
    “Artificial Intelligence”
    means making
    computers do things
    their programmers
    don’t understand well
    enough to program
    explicitly.


  25. Inherent Paradox of “Trustworthy” ML
    If we could specify
    precisely what the model
    should do, we wouldn’t
    need ML to do it!
    “Artificial Intelligence”
    means making
    computers do things
    their programmers
    don’t understand well
    enough to program
    explicitly.


  26. Inherent Paradox of “Trustworthy” ML
     If we could specify
     precisely what the model
     should do, we wouldn’t
     need ML to do it!
     Best we hope for is verifying certain properties
     Model Similarity: ∀x: M₁(x) = M₂(x)
     DeepXplore: Automated Whitebox Testing of Deep Learning Systems.
     Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017

  27. Inherent Paradox of “Trustworthy” ML
     Best we hope for is verifying certain properties
     Model Similarity: ∀x ∈ D: M₁(x) ≈ M₂(x)
     DeepXplore: Automated Whitebox Testing of Deep Learning Systems.
     Kexin Pei, Yinzhi Cao, Junfeng Yang, Suman Jana. SOSP 2017
     Model Robustness: ∀x ∈ D, ∀∆ ∈ P: M(x) ≈ M(x + ∆)

  28. Adversarial Robustness
     Model Robustness: ∀x ∈ D, ∀∆ ∈ P: M(x) ≈ M(x + ∆)
    Adversary’s Goal:
    find a “small” perturbation that changes
    model output
    targeted attack: in some desired way
    Defender’s Goal:
    Robust Model: find model where this is hard
    Detection: detect inputs that are adversarial


  29. Not a new problem...
    Or do you think any Greek
    gift’s free of treachery? Is that
    Ulysses’s reputation? Either
    there are Greeks in hiding,
    concealed by the wood, or it’s
    been built as a machine to use
    against our walls, or spy on
    our homes, or fall on the city
    from above, or it hides some
    other trick: Trojans, don’t trust
    this horse. Whatever it is, I’m
    afraid of Greeks even those
    bearing gifts.’
     Virgil, The Aeneid (Book II)

  30. Introduction to
    Deep Learning

  31. Generic Classifier
     F: X → Y
     Input: x ∈ ℝᵈ
     Output (label): y ∈ {1, …, K}
     Natural distribution: D of (x, y) pairs

  32. Neural Network
     F(x) = f⁽ⁿ⁾(f⁽ⁿ⁻¹⁾(⋯ f⁽²⁾(f⁽¹⁾(x))))
     “layer”: each f⁽ᵗ⁾ is mostly a function from ℝᵐ → ℝᵏ


  33. Activation Layer
     Layer t − 1 → Layer t:
     z_j⁽ᵗ⁾ = g( ∑_{i=1..d⁽ᵗ⁻¹⁾} w_{i,j}⁽ᵗ⁻¹⁾ · z_i⁽ᵗ⁻¹⁾ )

  34. Activation Layer
     Layer t − 1 → Layer t:
     z_j⁽ᵗ⁾ = g( ∑_{i=1..d⁽ᵗ⁻¹⁾} w_{i,j}⁽ᵗ⁻¹⁾ · z_i⁽ᵗ⁻¹⁾ )
     Activation function
     ReLU: Rectified Linear Unit
     g(z) = 0 if z < 0;  z if z ≥ 0
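     A minimal NumPy sketch of one fully connected ReLU layer (the random
     weights here are illustrative stand-ins):

     import numpy as np

     def relu(z):
         # ReLU activation: g(z) = max(0, z), applied elementwise
         return np.maximum(0.0, z)

     def dense_layer(W, b):
         # One layer: z^(t) = g(W z^(t-1) + b)
         return lambda z_prev: relu(W @ z_prev + b)

     # Toy layer mapping R^4 -> R^3
     rng = np.random.default_rng(0)
     layer = dense_layer(rng.normal(size=(3, 4)), np.zeros(3))
     print(layer(rng.normal(size=4)))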


  35. “Fancy” Layers: Convolution
     Layer t − 1 → Layer t:
     z⁽ᵗ⁾ = g(W ⊛ z⁽ᵗ⁻¹⁾), where W is a small kernel matrix
     (w₁₁ ⋯ w₁ₖ; ⋮ ⋱ ⋮; wₖ₁ ⋯ wₖₖ) slid across the previous layer

  36. “Fancy” Layers: Max Pooling
     [figure: the same sliding-kernel picture as the previous slide,
     transitioning to pooling]

  37. “Fancy” Layers: Max Pooling
     Each output is the maximum of a pooling window:
     max(z₁₁, z₁₂, z₂₁, z₂₂)
     max(z₃₁, z₃₂, z₄₁, z₄₂)
     max(z₅₁, z₅₂, z₆₁, z₆₂)


  38. Final Layer: SoftMax
     Layer n − 1 → output: z⁽ⁿ⁾ = σ(z⁽ⁿ⁻¹⁾)
     SoftMax function:
     σ(z)_j = e^{z_j} / ∑_{k=1..K} e^{z_k},   j = 1, …, K
     [0.03, 0.32, 0.01, 0.63, 0.00, 0.01]
     It’s a “cat” (0.63 confidence).
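     A numerically stable softmax sketch in NumPy (the example vector is
     made up, not the one behind the slide’s numbers):

     import numpy as np

     def softmax(z):
         # sigma(z)_j = exp(z_j) / sum_k exp(z_k); subtract max(z) for stability
         e = np.exp(z - np.max(z))
         return e / e.sum()

     probs = softmax(np.array([-1.0, 1.3, -2.4, 2.0, -5.0, -2.2]))
     print(probs.argmax(), probs.max())  # predicted class and its confidence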


  39. DNNs in 1989
    Backpropagation Applied to Handwritten Zip
    Code Recognition. Yann LeCun, et al., 1989.


  40. Turing Award in 2018
     Yann LeCun: AT&T → Facebook/NYU
     Geoffrey Hinton: Google/U. Toronto
     Yoshua Bengio: U. Montreal

  41. DNNs in 1989
    Backpropagation Applied to Handwritten Zip
    Code Recognition. Yann LeCun, et al., 1989.


  42. MNIST
    https://www.usenix.org/conference/usenixsecurity18/presentation/mickens
    James Mickens’ USENIX Security Symposium 2018 (Keynote)
    MNIST
    Dataset


  43. MNIST Dataset
    2 8 7 6 8 6 5 9
    70 000 images
    (60 000 training, 10 000 testing)
    28×28 pixels, 8-bit grayscale
    scanned hand-written digits
    labeled by humans
    LeCun, Cortes, Burges [1998]


  44. MNIST Dataset
    2 8 7 6 8 6 5 9
    70 000 images
    (60 000 training, 10 000 testing)
    28×28 pixels, 8-bit grayscale
    scanned hand-written digits
    labeled by humans
    LeCun, Cortes, Burges [1998]


  45. Progress in MNIST
     Year                          Error Rate
     1998 [Yann LeCun, et al.]     5% (12.1% rejection for 1% error rate)
     2013 [..., Yann LeCun, ...]   0.21% (21 out of 10,000 tests)

  46. CIFAR-10 (and CIFAR-100)
    truck
    ship
    horse
    frog
    dog
    deer
    cat
    bird
    automobile
    airplane
    60 000 images
    32×32 pixels, 24-bit color
    human-labeled subset of
    images in 10 classes from
    Tiny Images Dataset
    Alex Krizhevsky [2009]


  47. ImageNet
     14M high-resolution
     full color images
     Manually annotated in
     WordNet
     ~20,000 synonym sets (synsets)
     (~1000 images in each)

  48. Example CNN Architectures
     Image from Deep Residual Learning for Image Recognition,
     Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015

  49. Image from Deep Residual Learning for Image Recognition,
     Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015
     [figure: training and test error curves on CIFAR-10]

  50. Inception
    https://arxiv.org/pdf/1905.11946.pdf
    Image from Mingxing Tan, Quoc V.
    Le. EfficientNet: Rethinking
    Model Scaling for Convolutional
    Neural Networks. ICML 2019.


  51. Training a DNN

  52. https://youtu.be/TVmjjfTvnFs

  53. Training a Network
     select a network architecture, F
     θ ← initialize with random parameters
     while (still improving):
         θ ← adjust_parameters(F, θ, X, Y)

  54. Goal of Training: Minimize Loss
     Define a Loss Function:
     Mean Square Error:
     MSE = (1/n) ∑_{i=1..n} (yᵢ − F(xᵢ))²
     (Maximize) Likelihood Estimation:
     ℒ = ∏_{i=1..n} p(y | xᵢ)
     (Maximize) Log-Likelihood Estimation:
     log ℒ = ∑_{i=1..n} log p(y | xᵢ)
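     Both losses in a few lines of NumPy (an illustrative sketch):

     import numpy as np

     def mse(y_true, y_pred):
         # Mean square error: (1/n) * sum_i (y_i - F(x_i))^2
         return np.mean((y_true - y_pred) ** 2)

     def log_likelihood(probs, y_true):
         # sum_i log p(y_i | x_i), where probs[i] is the predicted
         # class distribution for example i
         return np.sum(np.log(probs[np.arange(len(y_true)), y_true]))

     probs = np.array([[0.8, 0.2], [0.3, 0.7]])
     print(log_likelihood(probs, np.array([0, 1])))  # log 0.8 + log 0.7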


  55. Training a Network
     select a network architecture, F
     θ ← initialize with random parameters
     while (still improving):
         θ ← adjust_parameters(F, θ, X, Y)

  56. Training a Network
     select a network architecture, F
     θ ← initialize with random parameters
     while (Loss(F_θ, X, Y) > goal and funding > 0):
         θ ← adjust_parameters(F, θ, X, Y)

  57. Finding a Good Architecture
     while (available_students > 0 and funding > 0):
         select a network architecture, F
         θ ← initialize with random parameters
         while (Loss(F_θ, X, Y) > goal and funding > 0):
             θ ← adjust_parameters(F, θ, X, Y)

  58. Gradient Descent
     [figure: loss surface ℒ_{X,Y}(θ) over θ]
     Goal: find θ that minimizes ℒ_{X,Y}(θ).

  59. Gradient Descent
     [figure: loss curve ℒ_{X,Y}(θ)]
     Pick a random starting point
     Follow the gradient (first derivative) ℒ′_{X,Y}(θ):
     to minimize, move in the negative direction
     θᵢ = θᵢ₋₁ − α · ∇ℒ_{X,Y}(θᵢ₋₁)

  60. Gradient Descent: Non-Convex Loss
     [figure: non-convex loss curve ℒ_{X,Y}(θ)]
     Pick a random starting point
     Follow the gradient (first derivative) ℒ′_{X,Y}(θ):
     to minimize, move in the negative direction
     θᵢ = θᵢ₋₁ − α · ∇ℒ_{X,Y}(θᵢ₋₁)
     Repeat many times, hopefully find global minimum

  61. Mini-Batch Stochastic Gradient Descent
     [figure: non-convex loss curve ℒ_{X,Y}(θ)]
     Pick a random starting point
     Follow the gradient (first derivative) ℒ′_{X,Y}(θ):
     to minimize, move in the negative direction
     θᵢ = θᵢ₋₁ − α · ∇ℒ_{X,Y}(θᵢ₋₁)
     Repeat many times, hopefully find global minimum
     To reduce computation, evaluate the gradient of the loss on a
     randomly selected subset (“mini-batch”)
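     A minimal mini-batch SGD sketch in NumPy (grad_loss is a stand-in for
     the gradient of whatever loss you chose; all parameters illustrative):

     import numpy as np

     def sgd(grad_loss, theta, X, Y, alpha=0.01, batch=32, steps=1000):
         # theta_i = theta_{i-1} - alpha * gradient of the loss,
         # estimated on a random mini-batch instead of all of X, Y
         rng = np.random.default_rng(0)
         for _ in range(steps):
             idx = rng.choice(len(X), size=batch, replace=False)
             theta = theta - alpha * grad_loss(theta, X[idx], Y[idx])
         return theta

     # Example: gradient of MSE for a linear model F(x) = x . theta
     grad_mse = lambda th, X, y: 2 * X.T @ (X @ th - y) / len(y)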


  62. Cost of Training
    https://openai.com/blog/ai-and-compute/


  63. Cost of Training
    https://openai.com/blog/ai-and-compute/


  64. [image slide]

  65. Adversarial
    Machine Learning

  66. Statistical Machine Learning
     [pipeline diagram. Training (supervised learning): labelled training
     data → feature extraction → vectors → ML algorithm → trained classifier.
     Deployment: operational data → trained classifier → malicious / benign]

  67. Assumption: Training Data is Representative
     [same supervised-learning pipeline diagram]

  68. Adversaries Don’t Cooperate
     Assumption: Training Data is Representative
     Poisoning: the adversary corrupts the training data

  69. Adversaries Don’t Cooperate
     Assumption: Training Data is Representative
     Evading: the adversary perturbs inputs at deployment

  70. Adversarial Examples for DNNs
     “panda” + 0.007 × [noise] = “gibbon”
     Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy.
     Explaining and Harnessing Adversarial Examples. 2014 (in ICLR 2015)

  71. Papers on “Adversarial Examples” (Google Scholar)
     [bar chart, 2013–2018: 1826.68 papers expected in 2018!]

  72. Papers on “Adversarial Examples” (Google Scholar)
     [bar chart, 2013–2019: 2901.67 papers expected in 2019!]

  73. Dash of “Theory”
     [bar chart, 2013–2019]
     ICML Workshop 2015
     15% of 2018 and 2019 “adversarial examples” papers contain
     “theorem” and “proof”

  74. Battista Biggio, et al. ECML-KDD 2013

  75. Defining Adversarial Example
     Assumption: a small perturbation does not change the class
     in “Reality Space” (human perception)
     Given seed sample x, x′ is an adversarial example iff:
       f(x′) = t          Class is t (targeted)
       or f(x′) ≠ f(x)    Class is different (untargeted)
       ∆(x, x′) ≤ ε       Similar to seed x: difference below threshold
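     The definition as a predicate (a sketch; the L-infinity default is
     one choice of ∆, and f is any classifier function):

     import numpy as np

     def is_adversarial(f, x, x_adv, eps, target=None):
         # Similar to the seed: Delta(x, x') <= eps (here: L-infinity)
         if np.max(np.abs(x - x_adv)) > eps:
             return False
         # Targeted: class is t; untargeted: class differs from f(x)
         return f(x_adv) == target if target is not None else f(x_adv) != f(x)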


  76. [visualization: the “Dog” class region around a seed image,
     plotted along two random directions]
     Slide by Nicholas Carlini

  77. [same visualization, now showing a “Truck” region as well]
     Slide by Nicholas Carlini

  78. [visualization: “Dog”, “Truck”, and “Airplane” regions along an
     adversarial direction and a random direction]
     Slide by Nicholas Carlini

  79. Weilin Xu et al. “Magic Tricks for Self-driving Cars”,
     Defcon-CAAD, 2018.
     Melanoma Diagnosis (Benign → Malignant):
     Samuel G Finlayson et al. “Adversarial attacks on medical machine
     learning”, Science, 2019.
     Mahmood Sharif et al. “Accessorize to a Crime: Real and Stealthy
     Attacks on State-of-the-Art Face Recognition”, ACM CCS, 2016.

  80. Natural Language
    Examples by Hannah Chen
    Prediction: Positive (Confidence = 99.22)
    IMDB Movie Review Dataset
    Hilarious film, I had a great time watching it. The
    star (Cuneyt Arkin, sometimes credited as Steve
    Arkin) is a popular actor from Turkey. He has
    played in lots of tough-guy roles, epic-sword
    films, and romances. It was fun to see him with an
    international cast and some real lousy looking
    pair of gloves. If I remember it was also dubbed in
    English which made things even more funnier.
    (kinda like seeing John Wayne speak Turkish).


  81. Natural Language
    Examples by Hannah Chen
    Prediction: Positive (Confidence = 91.06)
    IMDB Movie Review Dataset
    Hilarious film, I had a great time watching it. The
    star (Cuneyt Arkin, sometimes credited as Steve
    Arkin) is a popular actor from Turkey. He has
    played in lots of tough-guy roles, epic-sword
    films, and romances. It was fun to see him with an
    international cast and some real lousy looking
    pair of gloves. If I remember it was also dubbed in
    English which made things even more funnier.
    (kinda like seeing John Wayne speak Turkish).
    movies
    Target: Negative (Confidence = 8.94)


  82. Natural Language
    Examples by Hannah Chen
    Prediction: Positive (Confidence = 92.28)
    IMDB Movie Review Dataset
    Hilarious film, I had a great time watching it. The
    star (Cuneyt Arkin, sometimes credited as Steve
    Arkin) is a popular actor from Turkey. He has
    played in lots of tough-guy roles, epic-sword
    films, and romances. It was fun to see him with an
    international cast and some real lousy looking
    pair of gloves. If I remember it was also dubbed in
    English which made things even more funnier.
    (kinda like seeing John Wayne speak Turkish).
    researching
    Target: Negative (Confidence = 7.72)


  83. Natural Language
    Examples by Hannah Chen
    Prediction: Negative (Confidence = 73.33)
    IMDB Movie Review Dataset
    Hilarious film, I had a great time watching it. The
    star (Cuneyt Arkin, sometimes credited as Steve
    Arkin) is a popular actor from Turkey. He has
    played in lots of tough-guy roles, epic-sword
    films, and romances. It was fun to see him with an
    international cast and some real lousy looking
    pair of gloves. If I remember it was also dubbed in
    English which made things even more funnier.
    (kinda like seeing John Wayne speak Turkish).
    researching
    Target: Negative
    movies


  84. Defining Adversarial Example
     Given seed sample x, x′ is an adversarial example iff:
       f(x′) = t          Class is t (targeted)
       or f(x′) ≠ f(x)    Class is different (untargeted)
       ∆(x, x′) ≤ ε       Similar to seed x: difference below threshold
     ∆(x, x′) is defined in some (simple!) metric space.

  85. Distance Metrics
     L_p norms: L_p(x, x′) = (∑ᵢ |xᵢ − xᵢ′|ᵖ)^(1/p)
     L₀ “norm” (# different): #{i : xᵢ ≠ xᵢ′}
     L₁ norm: ∑ᵢ |xᵢ − xᵢ′|
     L₂ norm (“Euclidean”): √(∑ᵢ (xᵢ − xᵢ′)²)
     L∞ norm: maxᵢ |xᵢ − xᵢ′|
     Useful for theory and experiments, but not realistic!
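     The four norms in NumPy (a direct transcription of the definitions):

     import numpy as np

     def lp_distances(x, x_adv):
         d = (x - x_adv).ravel()
         return {
             'L0': int(np.count_nonzero(d)),   # number of changed components
             'L1': np.abs(d).sum(),
             'L2': np.sqrt((d ** 2).sum()),    # Euclidean distance
             'Linf': np.abs(d).max(),          # largest single change
         }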


  86. Images by Nicholas Carlini
     Original Image (x)

  87. Images by Nicholas Carlini
     Original Image (x)    Adversarial Image (x′), with ∆(x, x′) measured in L₂

  88. Images by Nicholas Carlini
     Original Image (x)    Adversarial Image (x′), with ∆(x, x′) measured in L₂

  89. Other Distance Metrics
     Set of transformations: rotate, scale, “fog”, color, etc.
     NLP: word substitutions (synonym constraints)
     Semantic distance: ℬ(x′) = ℬ(x): the behavior we care about is the same
       Malware: it still behaves maliciously
       Vision: still looks like a “cat” to most humans
     We’ll get back to these... for now, let’s assume L_p norms
     (like most research) despite their flaws.

  90. [visualization: “Dog”, “Truck”, and “Airplane” regions along an
     adversarial direction and a random direction]
     Slide by Nicholas Carlini
     How can we find a nearby adversarial example?

  91. Slide by Nicholas Carlini

  92. Visualization by Nicholas Carlini

  93. Fast Gradient Sign
     [images: original, then perturbations with ε = 0.1, 0.2, 0.3, 0.4, 0.5]
     Adversary Power: ε
     L∞-bounded adversary: max |xᵢ − xᵢ′| ≤ ε
     x′ = x + ε · sign(∇ₓ J(x, y))
     Goodfellow, Shlens, Szegedy 2014
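     A minimal FGSM sketch in PyTorch (assuming inputs in [0, 1] and a
     cross-entropy loss; model is any differentiable classifier):

     import torch
     import torch.nn.functional as F

     def fgsm(model, x, y, eps):
         # One step of size eps in the direction of the sign of the
         # loss gradient with respect to the input
         x = x.clone().detach().requires_grad_(True)
         loss = F.cross_entropy(model(x), y)
         loss.backward()
         return (x + eps * x.grad.sign()).clamp(0, 1).detach()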


  94. Impact of Adversarial Perturbations
     [plot: distance between each layer’s output and its output for the
     original seed; 5th and 95th percentile bands]
     FGSM, ε = 0.0245; CIFAR-10, DenseNet

  95. Impact of Adversarial Perturbations
     [plot: distance between each layer’s output and its output for the
     original seed]
     Random noise (same amount) vs. FGSM, ε = 0.0245; CIFAR-10, DenseNet

  96. Basic Iterative Method (BIM)
     x′₀ = x
     for N iterations:
         x′ᵢ₊₁ = clip_{x,ε}(x′ᵢ + α · sign(∇ J(x′ᵢ, y)))
     x′ = x′_N
     A. Kurakin, I. Goodfellow, and S. Bengio 2016

  97. Projected Gradient Descent (PGD)
     x′₀ = x
     for N iterations:
         x′ᵢ₊₁ = project_{x,ε}(x′ᵢ + α · sign(∇ J(x′ᵢ, y)))
     x′ = x′_N
     A. Kurakin, I. Goodfellow, and S. Bengio 2016
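     A PGD sketch in PyTorch (iterated FGSM steps projected back into the
     ε-ball; the step size alpha and inputs in [0, 1] are assumptions):

     import torch
     import torch.nn.functional as F

     def pgd(model, x, y, eps, alpha, steps):
         x_adv = x.clone().detach()
         for _ in range(steps):
             x_adv.requires_grad_(True)
             loss = F.cross_entropy(model(x_adv), y)
             grad, = torch.autograd.grad(loss, x_adv)
             x_adv = x_adv.detach() + alpha * grad.sign()
             # Project back into the L-infinity ball around x, keep pixels valid
             x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
         return x_adv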


  98. Carlini/Wagner
     Formulate an optimization problem with a defined
     objective function f:  f(x′) ≤ 0 iff C(x′) = t
     (model output matches target)
     min_δ  ∆(x, x + δ) + c · f(x + δ)
     such that (x + δ) ∈ [0, 1]ⁿ
     Nicholas Carlini, David Wagner IEEE S&P 2017
     An optimization problem that can be solved by standard optimizers:
     Adam (SGD + momentum) [Kingma, Ba 2015]

  99. Carlini/Wagner
     Formulate an optimization problem with a defined
     objective function f:  f(x′) ≤ 0 iff C(x′) = t
     (model output matches target)
     f(x′) = max_{i ≠ t} Z(x′)ᵢ − Z(x′)ₜ
     where C(x) = softmax(Z(x)), and Z(x) = f⁽ⁿ⁾(f⁽ⁿ⁻¹⁾(⋯ f⁽¹⁾(x)))
     is the pre-softmax (logit) output
     min_δ  ∆(x, x + δ) + c · f(x + δ)
     such that (x + δ) ∈ [0, 1]ⁿ
     Nicholas Carlini, David Wagner IEEE S&P 2017

  100. Carlini/Wagner: L₂ Attack
     f(x′) = max_{i ≠ t} Z(x′)ᵢ − Z(x′)ₜ
     min_δ  ∆(x, x + δ) + c · f(x + δ)
     such that (x + δ) ∈ [0, 1]ⁿ
     A change of variables removes the box constraint:
     δᵢ = ½(tanh(wᵢ) + 1) − xᵢ
     Nicholas Carlini, David Wagner IEEE S&P 2017

  101. Carlini/Wagner: L₂ Attack
     min_w  ‖½(tanh(w) + 1) − x‖₂² + c · f(½(tanh(w) + 1))
     f(x′) = max(max_{i ≠ t} Z(x′)ᵢ − Z(x′)ₜ, −κ)
     κ: confidence parameter
     Nicholas Carlini, David Wagner IEEE S&P 2017
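     A sketch of the resulting objective in PyTorch (minimize over w with
     a standard optimizer such as Adam; model_logits returning Z(x) and
     the constant c are assumptions):

     import torch

     def cw_l2_objective(model_logits, w, x, target, c, kappa=0.0):
         # Change of variables keeps x_adv = 0.5*(tanh(w)+1) inside [0,1]^n
         x_adv = 0.5 * (torch.tanh(w) + 1)
         Z = model_logits(x_adv)
         z_t = Z[:, target]
         z_other = Z.clone()
         z_other[:, target] = float('-inf')
         # f(x') = max(max_{i != t} Z_i - Z_t, -kappa)
         f = torch.clamp(z_other.max(dim=1).values - z_t, min=-kappa)
         dist = ((x_adv - x) ** 2).flatten(1).sum(dim=1)
         return (dist + c * f).sum()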


  102. [image slide]

  103. Impact of Adversarial Perturbations
     [plot: distance between each layer’s output and its output for the
     original seed]
     Random noise (same amount) vs. Carlini-Wagner L₂; CIFAR-10, DenseNet

  104. Content-Space Attacks
     What if there is no gradient to follow?

  105. Example: PDF Malware


  106. Finding Evasive Malware
     Given a seed sample x with desired malicious behavior,
     find an adversarial example x′ that satisfies:
       f(x′) = “benign”     Model misclassifies
       ℬ(x′) = ℬ(x)         Malicious behavior preserved
     Generic attack: heuristically explore the input
     space for x′ that satisfies the definition.
     No requirement that x ~ x′ except through ℬ.

  107. PDF Malware Classifiers
     PDFrate [ACSAC 2012]: Random Forest, manual features
       (object counts, lengths, positions, …)
     Hidost13 [NDSS 2013]: Support Vector Machine, automated features
       (object structural paths)
     Hidost16 [JIS 2016]: Random Forest, automated features
       (object structural paths)
     Very robust against “strongest conceivable mimicry attack”.

  108. Evolutionary Search
     [diagram: malicious PDF → clone → mutant generation (mutating with
     material from benign PDFs) → variants → fitness selection (using a
     benign oracle) → found evasive?]
     Weilin Xu, Yanjun Qi

  109. Generating Variants
     [same evolutionary-search diagram, highlighting mutant generation]

  110. PDF Structure


  111. Generating Variants
     [same evolutionary-search diagram, highlighting mutant generation]

  112. Generating Variants
     [PDF structure tree: /Root → /Catalog → /Pages, with a
     /JavaScript eval(‘…’) node]
     Select random node
     Randomly transform: delete, insert, replace

  113. Generating Variants
     [PDF structure tree: /Root → /Catalog → /Pages, with a
     /JavaScript eval(‘…’) node]
     Select random node
     Randomly transform: delete, insert, replace
     Replacement nodes are drawn from benign PDFs

  114. Selecting Promising Variants
     [same evolutionary-search diagram, highlighting fitness selection]

  115. Selecting Promising Variants
     [diagram: each candidate variant goes to the oracle (is it still
     malicious?) and to the target classifier (score); a fitness function
     combines the oracle result and the classifier score]

  116. Oracle: ℬ(x′) = ℬ(x)?
     Execute the candidate in a vulnerable Adobe Reader in a virtual
     environment (Cuckoo sandbox: https://github.com/cuckoosandbox)
     Simulated network: INetSim
     Behavioral signature: malicious if signature matches
     (HTTP_URL + HOST extracted from API traces)

  117. Fitness Function
     Assumes lost malicious behavior will not be recovered:
     fitness(x′) = 1 − classifier_score(x′)   if ℬ(x′) = ℬ(x)
                   −∞                         otherwise
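     A schematic version of the whole search loop (mutate, oracle, and
     classifier_score are stand-ins for the components on the slides; the
     population sizes and the 0.5 threshold are illustrative):

     import random

     def fitness(v, seed, oracle, classifier_score):
         # Lost malicious behavior is assumed unrecoverable
         if oracle(v) != oracle(seed):
             return float('-inf')
         return 1.0 - classifier_score(v)

     def evolve(seed, mutate, oracle, classifier_score,
                pop_size=48, generations=20):
         population = [mutate(seed) for _ in range(pop_size)]
         for _ in range(generations):
             ranked = sorted(population, reverse=True,
                             key=lambda v: fitness(v, seed, oracle, classifier_score))
             if fitness(ranked[0], seed, oracle, classifier_score) > 0.5:
                 return ranked[0]  # evasive: still malicious, scored benign
             survivors = ranked[:pop_size // 4]
             population = [mutate(random.choice(survivors)) for _ in range(pop_size)]
         return None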


  118. [plot: seeds evaded (out of 500) vs. number of mutations,
     for PDFrate and Hidost]

  119. [same plot]
     Simple transformations often worked

  120. [same plot]
     One discovered transformation:
     (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/)
     Works on 162/500 seeds

  121. [same plot]
     Some seeds required complex transformations

  122. Evading PDFrate
     [plot: classification score for each malware seed (sorted by original
     score): original malicious seeds above the malicious-label threshold,
     discovered evasive variants below it]

  123. Adjust threshold?
     [same plot: discovered evasive variants vs. the malicious-label threshold]
     Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in
     Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.

  124. Adjust threshold?
     [same plot: variants found with threshold = 0.25 and threshold = 0.50]

  125. Hide the Classifier Score?
     [same evolutionary-search diagram: the fitness function uses the
     oracle result and the target classifier’s score]

  126. Binary Classifier Output is Enough
     [same diagram, with the target classifier returning only a
     malicious/benign label]
     ACM CCS 2017

  127. Retrain Classifier
     [same supervised-learning pipeline diagram, with the trained
     classifier being retrained]

  128. [pipeline diagram: evasive variants found by EvadeML are cloned
     back into the labelled training data, and the classifier is retrained]

  129. [plot: seeds evaded (out of 500) vs. generations, for Hidost16]
     Original classifier: takes 614 generations to evade all seeds

  130. [plot: seeds evaded vs. generations, for Hidost16 and the
     retrained HidostR1]

  131. [same plot]

  132. [plot: seeds evaded vs. generations, for Hidost16, HidostR1,
     and HidostR2]

  133. [same plot]

  134. [same plot]
     False Positive Rates
     Genome     Contagio   Benign
     Hidost16   0.00       0.00
     HidostR1   0.78       0.30
     HidostR2   0.85       0.53

  135. [same plot and false-positive-rate table as the previous slide]

  136. Only 8/6987 robust features (Hidost)
     Robust classifier ⇒ high false positives
    /Names
    /Names /JavaScript
    /Names /JavaScript /Names
    /Names /JavaScript /JS
    /OpenAction
    /OpenAction /JS
    /OpenAction /S
    /Pages
    USENIX Security 2019


  137. Malware Classification Moral
     To build robust, effective malware
     classifiers, you need robust features that
     are strong signals for malware.
     If you have features like this, you don’t need ML!
    There are scenarios where adversarial
    training “works” [more tomorrow].


  138. Recap: Adversarial Examples across Domains
     Domain                 Classifier Space              “Reality” Space
     Trojan Wars            Judgment of Trojans:          Physical Reality:
                            f(x) = “gift”                 f*(x) = invading army
     Malware                Malware Detector:             Victim’s Execution:
                            f(x) = “benign”               f*(x) = malicious behavior
     Image Classification   DNN Classifier:               Human Perception:
                            f(x) = t                      f*(x) = y

  139. Tomorrow:
    Defenses
    David Evans
    University of Virginia
    [email protected]
    https://www.cs.virginia.edu/evans
