
Is "adversarial examples" an Adversarial Example?

Keynote talk at the 1st Deep Learning and Security Workshop
May 24, 2018
co-located with the
39th IEEE Symposium on Security and Privacy
San Francisco, California

Abstract:
Over the past few years, there has been an explosion of research on the
security of machine learning, and on adversarial examples in
particular. Although this is in many ways a new and immature research
area, the general problem of adversarial examples has been a core
problem in information security for thousands of years. In this talk,
I'll look at some of the long-forgotten lessons from that quest and
attempt to understand what, if anything, has changed now that we are in
the era of deep learning classifiers. I will survey the prevailing
definitions for "adversarial examples", argue that those definitions
are unlikely to be the right ones, and raise questions about whether
those definitions are leading us astray.

Bio:
David Evans (https://www.cs.virginia.edu/evans/) is a Professor of
Computer Science at the University of Virginia where he leads the
Security Research Group (https://www.jeffersonswheel.org). He is the author of an open computer science textbook
(http://www.computingbook.org) and a children's book on combinatorics and computability (http://www.dori-mic.org). He won the Outstanding Faculty Award from the State Council of Higher Education for Virginia, and was Program Co-Chair for the 24th ACM Conference on Computer and Communications Security (CCS 2017) and the 30th (2009) and 31st (2010) IEEE Symposia on Security and Privacy. He has SB, SM and PhD degrees in Computer Science from MIT and has been a faculty member at the University of Virginia since 1999.

David Evans

May 24, 2018

Transcript

  1. Is "adversarial
    examples" an
    Adversarial
    Example?
    David Evans
    University of Virginia
    evadeML.org
    Deep Learning and
    Security Workshop
    24 May 2018
    San Francisco, CA


  2. GDPR in
    effect May 25
    (tomorrow)!


  3. GDPR in
    effect now!


  4. [Animated counters, updating every second: “GDPR in Effect” (time elapsed),
    “Response Due” (time remaining), and “Maximum Fine (Google)” ticking up past
    $2.1 billion.]
    “Manager’s nightmare, but a researcher’s paradise!”
    – David Basin
    GDPR in effect now!


  5. Article 22


  6. Is “adversarial
    examples” an
    Adversarial
    Example?


  7. Papers on “Adversarial Examples” (Google Scholar)
    [Bar chart: papers per year, 2013–2018; 675 so far in 2018 (as of 5/22).]
    1241.5 papers expected in 2018!


  8. Adversarial Examples before Deep Learning
    7


  9. Adversarial Examples “before ML”
Péter Szőr (1970-2013)


  10. Adversarial Examples before “Oakland”
    9


  11. Adversarial Examples before “Oakland”
    10
    The crowd, uncertain, was split by opposing opinions.
    Then Laocoön rushes down eagerly from the heights of
    the citadel, to confront them all, a large crowd with
    him, and shouts from far off: ‘O unhappy citizens, what
    madness? ... Do you think the enemy’s sailed away? Or
    do you think any Greek gift’s free of treachery? Is that
    Ulysses’s reputation? Either there are Greeks in hiding,
    concealed by the wood, or it’s been built as a machine
    to use against our walls, or spy on our homes, or fall
    on the city from above, or it hides some other trick:
    Trojans, don’t trust this horse. Whatever it is, I’m afraid
    of Greeks even those bearing gifts.’
Virgil, The Aeneid (Book II)


  12. 11
    How should we define
    “adversarial example”?


  13. How should we define
    “adversarial example”?
    12
    “Adversarial examples are inputs to
    machine learning models that an
    attacker has intentionally designed to
    cause the model to make a mistake.”
    Ian Goodfellow, earlier today


  14. Adversarial Examples across Domains
    Domain               | Classifier Space                    | “Reality” Space
    Trojan Wars          | Judgment of Trojans: f(x) = “gift”  | Physical Reality: f*(x) = invading army
    Malware              | Malware Detector: f(x) = “benign”   | Victim’s Execution: f*(x) = malicious behavior
    Image Classification | DNN Classifier: f(x) = y            | Human Perception: f*(x) = z


  15. Malware Adversarial Examples
    Classifier Space vs. Oracle Space (actual program execution, checked with
    the Cuckoo sandbox: https://github.com/cuckoosandbox)


  16. “Oracle” Definition
    Given seed sample x, x′ is an adversarial example iff:
      f(x′) = t        Class is t (for malware, t = “benign”)
      ℬ(x′) = ℬ(x)     Behavior we care about is the same
    Malware: evasive variant preserves malicious behavior of seed, but is
    classified as benign.
    No requirement that x ~ x′ except through ℬ.
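
    In code, this definition is just a conjunction of two checks. A minimal
    sketch (not from the talk), where classifier and behavior_oracle are
    hypothetical stand-ins for the target model and the Cuckoo-style behavioral
    oracle:

    def is_adversarial(x_prime, x_seed, classifier, behavior_oracle, target="benign"):
        """Oracle definition: x' evades iff the classifier outputs the target
        class while the behavioral oracle reports unchanged behavior."""
        evades_classifier = classifier(x_prime) == target                    # f(x') = t
        same_behavior = behavior_oracle(x_prime) == behavior_oracle(x_seed)  # B(x') = B(x)
        return evades_classifier and same_behavior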


  17. Definitions suggest Attacks
    Given seed sample x, x′ is an adversarial example iff:
      f(x′) = t        Class is t (for malware, t = “benign”)
      ℬ(x′) = ℬ(x)     Behavior we care about is the same
    Generic attack: heuristically explore the input space for an x′ that
    satisfies the definition.


  18. Evolutionary Search
    [Diagram: clone the malicious PDF seed, mutate the clones (borrowing content
    from benign PDFs), score each variant against the oracle and the target
    classifier (fitness selection), and repeat mutant generation until an
    evasive variant is found. Weilin Xu, Yanjun Qi.]
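
    As a rough sketch (not the actual EvadeML implementation), the search loop
    looks something like this; mutate, fitness, classifier_label, and
    oracle_same_behavior are hypothetical helpers standing in for the PDF tree
    transformations, the target classifier, and the Cuckoo oracle:

    def evolutionary_search(seed, benign_nodes, pop_size=48, max_generations=100):
        """Sketch of the evolutionary search for evasive PDF variants."""
        population = [seed] * pop_size                  # clone the malicious seed
        for _ in range(max_generations):
            # Mutation: delete/insert/replace a random node, borrowing
            # nodes harvested from benign PDFs.
            variants = [mutate(v, benign_nodes) for v in population]
            for v in variants:
                if oracle_same_behavior(seed, v) and classifier_label(v) == "benign":
                    return v                            # found an evasive variant
            # Fitness selection: keep the most promising variants.
            variants.sort(key=lambda v: fitness(seed, v), reverse=True)
            population = variants[:pop_size]
        return None                                     # no evasive variant found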


  19. Generating Variants
    [Same pipeline diagram, highlighting the mutant generation step.]


  20. Generating Variants
    [Mutation detail: the PDF is a tree of objects (/Root, /Catalog, /Pages,
    /JavaScript containing eval(‘…’)); select a random node, then randomly
    transform it: delete, insert, or replace.]


  21. Generating Variants
    [Mutation detail, continued: inserted and replacement nodes are drawn from a
    pool of nodes harvested from benign PDFs.]


  22. Selecting Promising Variants
    [Same pipeline diagram, highlighting the fitness selection step.]


  23. Selecting Promising Variants
    [Fitness function detail: each candidate variant is sent to both the oracle
    and the target classifier; the fitness score combines the oracle’s verdict
    (still malicious?) with the classifier’s score.]


  24. Oracle: ℬ(x′) = ℬ(x)?
    Execute candidate in
    vulnerable Adobe Reader in
    virtual environment
    Behavioral signature:
    malicious if signature matches
    https://github.com/cuckoosandbox
    Simulated network: INetSim
    Cuckoo
    HTTP_URL + HOST
    extracted from API traces


  25. Fitness Function
    Assumes lost malicious behavior will not be recovered.
    fitness(x′) =  1 − classifier_score(x′)   if ℬ(x′) = ℬ(x)
                   −∞                         otherwise
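
    A direct transcription into Python (a sketch; classifier_score and behavior
    are hypothetical stand-ins for the target classifier’s maliciousness score
    and the Cuckoo behavioral signature):

    def fitness(seed, variant):
        """Prefer variants that keep the seed's malicious behavior and get a
        low maliciousness score from the target classifier."""
        if behavior(variant) != behavior(seed):
            return float("-inf")                # lost the behavior we care about
        return 1.0 - classifier_score(variant)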


  26. [Plot: seeds evaded (out of 500) vs. number of mutations, for the PDFRate
    and Hidost classifiers.]


  27. [Same plot.] Simple transformations often worked.


  28. [Same plot.] One transformation,
    (insert, /Root/Pages/Kids, 3:/Root/Pages/Kids/4/Kids/5/),
    works on 162/500 seeds.


  29. [Same plot.] Some seeds required complex transformations.


  30. Attacks suggest Defenses*
    29
    Definitions suggest Attacks


  31. Attacks suggest Defenses*
    30
    * That only work against a very particular instantiation of that attack.
    Definitions suggest Attacks
    Maginot Line
    Enigma Plugboard


  32. Evading PDFrate
    [Plot: classification score for each malware seed (sorted by original score)
    and its discovered evasive variants, with the malicious label threshold.]


  33. Adjust threshold?
    [Same plot: original malicious seeds, discovered evasive variants, and the
    malicious label threshold.]
    Charles Smutz, Angelos Stavrou. When a Tree Falls: Using Diversity in
    Ensemble Classifiers to Identify Evasion in Malware Detectors. NDSS 2016.


  34. Adjust threshold?
    [Same plot: variants found with threshold = 0.25 and with threshold = 0.50.]


  35. Hide the Classifier Score?
    [Same pipeline diagram: the fitness function still queries the oracle and
    the target classifier’s score.]


  36. Binary Classifier Output is Enough
    [Same pipeline diagram, with the target classifier reduced to a binary
    malicious/benign output.]
    ACM CCS 2017


  37. 36
    Defenses should be designed around clear
    definitions of adversary goals and capabilities,
    not around thwarting particular attacks.
    (The second oldest principle in security.)


  38. Adversarial Examples across Domains
    Domain               | Classifier Space                    | “Reality” Space
    Trojan Wars          | Judgment of Trojans: f(x) = “gift”  | Physical Reality: f*(x) = invading army
    Malware              | Malware Detector: f(x) = “benign”   | Victim’s Execution: f*(x) = malicious behavior
    Image Classification | DNN Classifier: f(x) = y            | Human Perception: f*(x) = z


  39. Adversarial Examples across Domains
    Domain               | Classifier Space                    | “Reality” Space
    Trojan Wars          | Judgment of Trojans: f(x) = “gift”  | Physical Reality: f*(x) = invading army
    Malware              | Malware Detector: f(x) = “benign”   | Victim’s Execution: f*(x) = malicious behavior
    Image Classification | DNN Classifier: f(x) = y            | Human Perception: f*(x) = z
    Fixing (Breaking?) the Definition


  40. Fixing (Breaking?) the Definition
    Image Classification: DNN Classifier, f(x) = y; Human Perception, f*(x) = z


  41. Well-Trained Classifier
    40
    Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
    Classifier Space
    (DNN Model)
    “Oracle” Space
    (human perception)


  42. Adversarial Examples
    41
    Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
    Classifier Space
    (DNN Model)
    “Oracle” Space
    (human perception)


  43. Misleading Visualization
    Classifier Space (DNN Model): cartoon vs. reality
    Cartoon: 2 dimensions; few samples near boundaries; every sample near 1-3
    classes.
    Reality: thousands of dimensions; all samples near boundaries; every sample
    near all classes.


  44. Adversarial Examples
    43
    Adversary’s goal: find a small perturbation that changes class for classifier, but imperceptible to oracle.
    Classifier Space
    (DNN Model)
    “Oracle” Space
    (human perception)


  45. 44
Battista Biggio, et al. ECML-PKDD 2013


  46. “Biggio” Definition
    Assumption (to map to earlier definition): small perturbation does not
    change class in “Reality Space”.
    Given seed sample x, x′ is an adversarial example iff:
      f(x′) = t          Class is t (targeted)
      ∆(x, x′) ≤ δ       Difference below threshold
    ∆(x, x′) is defined in some (simple!) metric space:
      L0 norm (# different), L1, L2 norm (“Euclidean distance”), L∞
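
    A minimal NumPy sketch (not from the talk) of checking this condition for a
    given perturbation, norm, and threshold; the classifier f and target class t
    are assumptions:

    import numpy as np

    def lp_distance(x, x_prime, p):
        """Perturbation size in one of the simple metric spaces above."""
        d = (x_prime - x).ravel()
        if p == 0:
            return np.count_nonzero(d)       # L0: number of changed components
        if p == np.inf:
            return np.abs(d).max()           # L-infinity: largest single change
        return np.linalg.norm(d, ord=p)      # L1, L2 ("Euclidean distance"), ...

    def is_biggio_adversarial(f, x, x_prime, t, p=2, delta=0.5):
        return f(x_prime) == t and lp_distance(x, x_prime, p) <= delta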


  47. “Biggio” Definition
    Given seed sample x, x′ is an adversarial example iff:
      f(x′) = t          Class is t (targeted)
      ∆(x, x′) ≤ δ       Difference below threshold
    ∆(x, x′) is defined in some (simple!) metric space:
      L0 norm (# different), L1, L2 norm (“Euclidean distance”), L∞
    Problem #1: Every model with boundaries has adversarial examples.
    Problem #2: Very unnatural limit on adversary strength.
    Problem #3: Values all adversarial examples equally.


  48. DSML Papers
    Biggio Definition (6): AHHO, CW, GLSQ, HD, MW, SBC
    Oracle Definition (3): KFS, YKLALYP, RG
    No Version On-Line (5)
    Building Classifiers (5): AMNKV, CSS, DAF, SHWS, ZCPS
    Software (2): BGS, XLZX


  49. Impact of Adversarial Perturbations
    [Plot: distance between each layer’s output and its output for the original
    seed; FGSM, ε = 0.0245; CIFAR-10, DenseNet; 5th and 95th percentile bands.]


  50. Impact of Adversarial Perturbations
    [Same plot.] Mainuddin Jonas


  51. Impact of Adversarial Perturbations
    [Same plot, adding random noise of the same amount for comparison with FGSM,
    ε = 0.0245; CIFAR-10, DenseNet.]


  52. Impact of Adversarial Perturbations
    [Same plot for the Carlini-Wagner L2 attack vs. random noise of the same
    amount; CIFAR-10, DenseNet.]
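
    Plots like these can be approximated by comparing intermediate activations
    for the original and perturbed inputs. A hedged PyTorch sketch (not the code
    behind the slides), assuming model is a trained torch.nn.Module and x, x_adv
    are batched input tensors:

    import torch

    def layer_distances(model, x, x_adv):
        """L2 distance between each leaf layer's output on x_adv and on x."""
        acts, hooks = {}, []

        def save(name):
            def hook(module, inputs, output):
                acts[name].append(output.detach().flatten(1))
            return hook

        for name, module in model.named_modules():
            if len(list(module.children())) == 0:        # leaf layers only
                acts[name] = []
                hooks.append(module.register_forward_hook(save(name)))

        with torch.no_grad():
            model(x)        # records activations for the original input
            model(x_adv)    # records activations for the perturbed input

        for h in hooks:
            h.remove()
        return {name: torch.norm(a[1] - a[0], dim=1)
                for name, a in acts.items() if len(a) == 2}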


  53. Definitions Suggest Defenses
    Given seed sample x, x′ is an adversarial example iff:
      f(x′) = t          Class is t (targeted)
      ∆(x, x′) ≤ δ       Difference below threshold
    ∆(x, x′) is defined in some (simple!) metric space:
      L0 norm (# different), L1, L2 norm (“Euclidean distance”), L∞
    Suggested Defense: given an input x*, see how the model behaves on T(x*),
    where T(·) reverses transformations in ∆-space.


  54. Feature Squeezing Detection Framework
    [Diagram: the input goes to the original model (Prediction0) and, through
    Squeezer 1 … Squeezer k, to squeezed models (Prediction1 … Predictionk); a
    distance d(pred0, pred1, …, predk) over the predictions labels the input
    Adversarial or Legitimate.]
    Weilin Xu, Yanjun Qi


  55. Feature Squeezing Detection Framework
    [Same diagram.]
    A feature squeezer coalesces similar inputs into one point:
    • Barely change legitimate inputs.
    • Destroy adversarial perturbations.


  56. Coalescing by Feature Squeezing
    55
    Metric Space 1: Target Classifier Metric Space 2: “Oracle”
    Before: find a small perturbation that changes class for classifier, but imperceptible to oracle.
    Now: change class for both original and squeezed classifier, but imperceptible to oracle.


  57. Example Squeezer: Bit Depth Reduction
    [Plot: signal quantization curves mapping inputs in [0, 1] to outputs at
    8-bit, 3-bit, and 1-bit depth.]
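
    A minimal NumPy version of this squeezer (plus the median filter squeezer
    used later), assuming inputs are float images scaled to [0, 1]; scipy’s
    median_filter stands in for the 2×2 smoothing:

    import numpy as np
    from scipy.ndimage import median_filter

    def reduce_bit_depth(x, bits):
        """Quantize [0, 1] pixel values to 2**bits levels (the step curves above)."""
        levels = 2 ** bits - 1
        return np.round(x * levels) / levels

    def median_smooth(x, size=2):
        """Spatial median filter, another simple feature squeezer."""
        return median_filter(x, size=size)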


  58. Example Squeezer: Bit Depth Reduction
    [Same quantization plot, with examples: a seed digit and adversarial
    versions from CW, BIM, and FGSM attacks, shown before and after bit depth
    reduction.]


  59. Other Potential Squeezers
    Spatial smoothers: median filter, non-local means
    Thermometer encoding (learnable bit depth reduction)
    Image denoising using autoencoder, wavelet, JPEG, etc.
    Image resizing
    ...
    C Xie, et al. Mitigating Adversarial Effects Through Randomization, ICLR 2018.
    J Buckman, et al. Thermometer Encoding: One Hot Way To Resist Adversarial
    Examples, ICLR 2018.
    D Meng and H Chen, MagNet: a Two-Pronged Defense against Adversarial
    Examples, CCS 2017; A Prakash, et al., Deflecting Adversarial Attacks with
    Pixel Deflection, CVPR 2018; ...


  60. “Feature Squeezing” (Vacuous) Conjecture
    For any distance-limited adversarial method,
    there exists some feature squeezer that
    accurately detects its adversarial examples.
    59
    Intuition: if the perturbation is small (in some simple
    metric space), there is some squeezer that coalesces
    original and adversarial example into same sample.


  61. Feature Squeezing Detection
    [Diagram (MNIST): the input goes to the model (a 7-layer CNN) directly
    (Prediction0), after Bit Depth-1 squeezing (Prediction1), and after a
    Median 2×2 filter (Prediction2).]
    Flag as adversarial if max( ‖p0 − p1‖1, ‖p0 − p2‖1 ) > T; otherwise
    legitimate.
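
    A sketch of this detection rule in NumPy (not the reference implementation);
    model_probs is a hypothetical function returning the classifier’s softmax
    vector, and the squeezers are the ones sketched earlier:

    import numpy as np

    def squeezing_score(x, model_probs, squeezers):
        """Max L1 difference between predictions on x and on each squeezed x."""
        p0 = model_probs(x)
        return max(np.abs(p0 - model_probs(squeeze(x))).sum() for squeeze in squeezers)

    def is_adversarial_input(x, model_probs, squeezers, threshold):
        return squeezing_score(x, model_probs, squeezers) > threshold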


  62. Detecting Adversarial Examples
    Distance between original input and its squeezed version
    Adversarial
    inputs
    (CW attack)
    Legitimate
    inputs


  63. Training a detector (MNIST)
    [Histogram: number of examples vs. maximum L1 distance between original and
    squeezed input, for legitimate and adversarial examples.]
    Set the detection threshold to keep the false positive rate below a target:
    threshold = 0.0029 gives detection 98.2%, FP < 4%.
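
    One way to pick that threshold (a sketch, reusing squeezing_score from
    above): compute scores on held-out legitimate inputs and take the percentile
    matching the target false-positive rate.

    import numpy as np

    def train_threshold(legit_inputs, model_probs, squeezers, target_fp_rate=0.05):
        """Threshold such that at most target_fp_rate of legitimate inputs exceed it."""
        scores = [squeezing_score(x, model_probs, squeezers) for x in legit_inputs]
        return float(np.percentile(scores, 100 * (1 - target_fp_rate)))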


  64. ImageNet Configuration
    [Diagram: the input goes to the model (MobileNet) directly (Prediction0),
    after Bit Depth-5 squeezing (Prediction1), after a Median 2×2 filter
    (Prediction2), and after non-local means denoising (Prediction3).]
    Flag as adversarial if max( ‖p0 − p1‖1, ‖p0 − p2‖1, ‖p0 − p3‖1 ) > T;
    otherwise legitimate.


  65. Training a detector (ImageNet)
    [Histogram: number of examples vs. maximum L1 distance between original and
    squeezed input, for legitimate and adversarial examples.]
    threshold = 1.24 gives detection 85%, FP < 5%.


  66. How should we
    evaluate defenses?
    65


  67. Threat Models
    Oblivious attack: The adversary has full knowledge of
    the target model, but is not aware of the detector.
    Adaptive attack: The adversary has full knowledge of
    the target model and the detector.
    66


  68. (Generic) Adaptive Adversary
    Adaptive CW2 attack, unbounded adversary:
    Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song. Adversarial
    Example Defense: Ensembles of Weak Defenses are not Strong. USENIX WOOT ’17.
    minimize  loss(x′, t)  +  c · ∆(x, x′)  +  k · L2score(x′)
    (misclassification term, distance term, detection term)


  69. Adaptive Adversarial Examples
    No successful adversarial examples were found for images originally labeled
    as 3 or 8.
    Attack                    Mean L2
    Untargeted                2.80
    Targeted (next)           4.14
    Targeted (least likely)   4.67


  70. Adaptive Adversary Success Rates
    [Plot: adversary’s success rate vs. clipped ε for Untargeted, Targeted
    (Next), and Targeted (LL) adaptive attacks, comparing a typical ε with the
    unbounded adversary; plotted rates range from 0.01 to 0.68.]


  71. Revisiting Attacker’s Goal
    Find one adversarial example Find many adversarial examples
    Suya Yuan Tian


  72. Attacker Visibility
    “White-box attacker”: knows model architecture and all parameters.
    “Black-box attacker”: interacts with model through an API, with a limited
    number of interactions; output is a vector of scores (e.g., “bird”: 0.09,
    “horse”: 0.84, …), or decision-based: output is just the class.
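
    A black-box attacker’s constraints are easy to express as a wrapper around
    the prediction API; a small sketch (the wrapped model_probs function is an
    assumption):

    class BlackBoxModel:
        """Query-limited view of a classifier: score vector or decision only."""
        def __init__(self, model_probs, max_queries, decision_only=False):
            self.model_probs = model_probs
            self.max_queries = max_queries
            self.decision_only = decision_only
            self.queries = 0

        def query(self, x):
            if self.queries >= self.max_queries:
                raise RuntimeError("query budget exhausted")
            self.queries += 1
            probs = self.model_probs(x)
            return probs.argmax() if self.decision_only else probs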


  73. Black-Box Cost Variance
    [Plot: number of queries per adversarial example found, MNIST and CIFAR-10;
    target: least-likely class, max L2 = 3; 1000 queries per iteration, 256 max
    iterations (256,000 queries).]
    MNIST average: 117,820 queries (fails for 14 seeds); average for the lowest
    20: 50,240.
    CIFAR-10 overall average: 60,378; lowest-cost 20 average: 15,795 (26%).
    Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, Cho-Jui Hsieh. ZOO: Zeroth
    Order Optimization Based Black-box Attacks to Deep Neural Networks without
    Training Substitute Models. AISec 2017.


  74. Easy and Hard Examples
    “Easy” images: the 5 needing the fewest queries to find an adversarial
    example. “Hard” images: the 5 with the highest number of queries (failed
    after 256,000 query attempts without success).
    MNIST, 0 → (least likely) 1: the easy examples took 14,592; 43,008; 43,776;
    49,152; and 49,920 queries.


  75. Easy and Hard Examples
    Same layout, adding CIFAR-10, “airplane” → “frog”: the easy examples took
    9,728; 10,496; 10,752; 12,288; and 13,824 queries; the hard examples failed
    after 256,000 query attempts without success.


  76. White-Box Cost Variance
    [Plot: number of iterations (up to 2000) per adversarial example found,
    MNIST and CIFAR-10; Carlini-Wagner L2 attack; target: least-likely class;
    MNIST: max L2 = 3.0, CIFAR-10: max L2 = 1.0.]
    MNIST average: 566 iterations; average for the lowest 20: 174.
    CIFAR-10 average: 82 iterations.


  77. White-Box Cost Variance
    [Same plot.] CIFAR-10 lowest 20 (average: 3.6); CIFAR-10 average: 82;
    MNIST average: 566; MNIST average for the lowest 20: 174.


  78. How does
    cost-variance
    impact attack
    cost?
    77


  79. Simple Greedy Search Works Well
    [Plot: average queries per adversarial example found (× 10^4) vs. number of
    adversarial examples, MNIST and CIFAR-10, comparing random target selection,
    a greedy heuristic, and the oracle optimal ordering; ZOO black-box attack.]
    Target: 20       MNIST   CIFAR
    Greedy/Optimal   1.50    1.30
    Random/Optimal   2.37    3.86
    Target: 50       MNIST   CIFAR
    Greedy/Optimal   1.46    1.21
    Random/Optimal   1.96    2.45
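
    The greedy heuristic can be sketched as a scheduler that always spends the
    next slice of the query budget on whichever unfinished seed currently looks
    cheapest. This is only one plausible instantiation (attack_step and
    estimated_cost are hypothetical helpers), not the exact heuristic behind the
    numbers above:

    import heapq

    def greedy_batch_attack(seeds, budget, step_queries=1000):
        """Spend the query budget on whichever unfinished seed looks cheapest."""
        found = []
        frontier = [(estimated_cost(s, None), i, s, None) for i, s in enumerate(seeds)]
        heapq.heapify(frontier)
        while budget >= step_queries and frontier:
            cost, i, seed, state = heapq.heappop(frontier)
            state, success = attack_step(seed, state, step_queries)  # run a few iterations
            budget -= step_queries
            if success:
                found.append((seed, state))
            else:
                heapq.heappush(frontier, (estimated_cost(seed, state), i, seed, state))
        return found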


  80. White-Box Batch Attack Cost
    [Plot: average iterations per adversarial example found vs. number of
    adversarial examples, MNIST and CIFAR-10, comparing random target selection,
    a greedy heuristic, and the oracle optimal ordering; CW L2 attack.]
    Target: 20       MNIST   CIFAR
    Greedy/Optimal   2.01    1.22
    Random/Optimal   3.20    20.05
    Target: 50       MNIST   CIFAR
    Greedy/Optimal   1.76    1.50
    Random/Optimal   2.45    15.11


  81. Madry Defense
    [Plots: accuracy per batch (10 samples, sorted by initial distance) for
    MNIST classes “9”, “7”, “0” and CIFAR-10 classes airplane, cars, deer.]
    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras,
    Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks.
    https://github.com/MadryLab/mnist_challenge


  82. History of the
    destruction
    of Troy, 1498
    Conclusion


  83. Security State-of-the-Art
    Field                         | Attack success probability | Threat models                           | Proofs
    Cryptography                  | !"#!$                      | information theoretic, resource bounded | required
    System Security               | !"%!                       | capabilities, motivations, rationality  | common
    Adversarial Machine Learning  | !&; !"#*                   | white-box, black-box                    | making progress?


  84. 83
    Ali Rahimi
    NIPS Test-of-Time Award Speech
    (Dec 2017)
    ”If you're building photo-
    sharing systems alchemy
    is okay but we're beyond
    that; now we're building
    systems that govern
    healthcare and mediate
    our civic dialogue”


  85. 84
    Ali Rahimi
    NIPS Test-of-Time Award Speech
    (Dec 2017)
    ”If you're building photo-
    sharing systems alchemy
    is okay but we're beyond
    that; now we're building
    systems that govern
    healthcare and mediate
    our civic dialogue”


  86. Alchemy (~700 − 1660)
    Well-defined, testable goal (turn
    lead into gold)
    Established theory (four elements:
    earth, fire, water, air)
    Methodical experiments and lab
    techniques (Jabir ibn Hayyan in
    8th century)
    Wrong and ultimately unsuccessful,
    but led to modern chemistry.


  87. Domain               | Classifier Space                    | “Reality” Space
    Trojan Wars          | Judgment of Trojans: f(x) = “gift”  | Physical Reality: f*(x) = invading army
    Malware              | Malware Detector: f(x) = “benign”   | Victim’s Execution: f*(x) = malicious behavior
    Image Classification | DNN Classifier: f(x) = y            | Human Perception: f*(x) = z


  88. Domain               | Classifier Space                      | “Reality” Space
    Trojan Wars          | Judgment of Trojans: f(x) = “gift”    | Physical Reality: f*(x) = invading army
    Malware              | Malware Detector: f(x) = “benign”     | Victim’s Execution: f*(x) = malicious behavior
    Image Classification | DNN Classifier: f(x) = y              | Human Perception: f*(x) = z
    Academic Research    | Conferences, Fun: f(AEs) = “awesome”  | Systems, Society, Ideas: f*(AEs) = ?


  89. David Evans
    University of Virginia
    [email protected]
    EvadeML.org
    Weilin Xu Yanjun Qi Fnu Suya Yuan Tian Mainuddin Jonas
    Funding: NSF, Intel


  90. 89


  91. David Evans
    University of Virginia
    [email protected]
    EvadeML.org
    Weilin Xu Yanjun Qi Fnu Suya Yuan Tian Mainuddin Jonas
    Funding: NSF, Intel


  92. 91
    @_youhadonejob1


  93. 92
