Lessons from the Last 3000 Years of Adversarial Examples

Invited Talk at Huawei Strategy and Technology Workshop (STW)
Shenzhen, China
15 May 2018

David Evans

Transcript

  1. Lessons from the
    Last 3000 Years
    of Adversarial
    Examples
    David Evans
    University of Virginia
    evadeML.org
    Huawei STW
    Shenzhen, China
    15 May 2018


  2. Machine Learning Does Amazing Things
    1


  3. … and can solve all Security Problems!
     Spam
     IDS
     Malware
     Fake Accounts
     “Fake News”


  4. Training (supervised learning): Labelled Training Data → Feature Extraction →
     Vectors → ML Algorithm → Trained Classifier
     Deployment: Operational Data → Feature Extraction → Trained Classifier →
     Malicious / Benign
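To make the pipeline concrete, here is a minimal sketch in Python, assuming scikit-learn and a toy `extract_features` step (the features, data, and model choice are illustrative placeholders, not anything from the talk):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def extract_features(sample: bytes) -> np.ndarray:
    """Toy feature extraction: sample length plus a 16-bin byte-value histogram."""
    hist, _ = np.histogram(np.frombuffer(sample, dtype=np.uint8), bins=16, range=(0, 256))
    return np.concatenate(([len(sample)], hist)).astype(float)

# Training (supervised learning): labelled training data -> feature vectors -> classifier.
rng = np.random.default_rng(0)
samples = [rng.bytes(64) for _ in range(200)]    # placeholder "training data"
labels = rng.integers(0, 2, size=200)            # 0 = benign, 1 = malicious
X = np.stack([extract_features(s) for s in samples])
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)

# Deployment: classify operational data with the trained classifier.
new_sample = rng.bytes(64)
print("malicious" if clf.predict([extract_features(new_sample)])[0] == 1 else "benign")
```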


  5. Training (supervised learning): Labelled Training Data → Feature Extraction →
     Vectors → ML Algorithm → Trained Classifier
     Deployment: Operational Data → Feature Extraction → Trained Classifier →
     Malicious / Benign
     Assumption: Training Data is Representative


  6. Adversaries Don’t Cooperate
    Assumption: Training Data is Representative
    Deployment
    Training


  7. Deployment
    Adversaries Don’t Cooperate
    Assumption: Training Data is Representative
    Training
    Poisoning


  8. Adversaries Don’t Cooperate
    Assumption: Training Data is Representative
    Evading
    Deployment
    Training


  9. Adversarial Examples
    8
“panda” + 0.007 × [noise] = “gibbon”
    Example from: Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy.
    Explaining and Harnessing Adversarial Examples. 2014 (in ICLR 2015)


  10. 9
     Bar chart: papers on “Adversarial Examples” (Google Scholar) per year,
     2013–2018 (2018 counted through 5/13); y-axis 0–1200 papers; labeled count: 654.

  11. Adversarial Examples before Deep Learning
    10


  12. Evasive Malware
    Péter Ször (1970-2013)


  13. 12
    History of the
    destruction of
    Troy, 1498
    Trojans, lumber
    into the city, heart
    fear any
    clever/packed
    ambush of the
    Argives.
    Homer, The Illiad
    (~1200 BCE)


  14. Adversarial Examples across Domains
     13
     Domain                  Classifier Space                      Oracle Space
     Trojan Wars             Judgment of Trojans                   Physical Reality
                             f(x) = “gift”                         f*(x) = invading army
     Malware                 Malware Detector                      Victim’s Execution
                             f(x) = “benign”                       f*(x) = malicious behavior
     Image Classification    DNN Classifier                        Human Perception
                             f(x) = y                              f*(x) = z


  15. Adversarial Examples across Domains
     14
     Domain                  Classifier Space                      Oracle Space
     Trojan Wars             Judgment of Trojans                   Physical Reality
                             f(x) = “gift”                         f*(x) = invading army
     Malware                 Malware Detector                      Victim’s Execution
                             f(x) = “benign”                       f*(x) = malicious behavior
     Image Classification    DNN Classifier                        Human Perception
                             f(x) = y                              f*(x) = z
     Today


  16. Adversarial Examples across Domains
     15
     Domain                  Classifier Space                      Oracle Space
     Trojan Wars             Judgment of Trojans                   Physical Reality
                             f(x) = “gift”                         f*(x) = invading army
     Malware                 Malware Detector                      Victim’s Execution
                             f(x) = “benign”                       f*(x) = malicious behavior
     Image Classification    DNN Classifier                        Human Perception
                             f(x) = y                              f*(x) = z
     Today
     Fixing (Breaking?) the Definition


  17. Adversarial Examples across Domains
     16
     Trojan Wars             f(x) = “gift”                         f*(x) = invading army
     Malware                 Malware Detector                      Victim’s Execution
                             f(x) = “benign”                       f*(x) = malicious behavior
     Image Classification    DNN Classifier                        Human Perception
                             f(x) = y                              f*(x) = z
     Fixing (Breaking?) the Definition


  18. Goal of Machine Learning Classifier
    17
    Classifier Space
    (DNN Model)
    “Oracle” Space
    (human perception)
    Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)


  19. Well-Trained Classifier
    18
    Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
    Classifier Space
    (DNN Model)
    “Oracle” Space
    (human perception)


  20. Adversarial Examples
    19
    Model and visualization based on work by Beilun Wang, Ji Gao and Yanjun Qi (ICLR 2017 Workshop)
    Classifier Space
    (DNN Model)
    “Oracle” Space
    (human perception)


  21. Misleading Visualization
     20
     Cartoon                            Reality
     2 dimensions                       thousands of dimensions
     few samples near boundaries        all samples near boundaries
     every sample near 1-3 classes      every sample near all classes
     Classifier Space (DNN Model)


  22. Adversarial Examples
    21
     Adversary’s goal: find a small perturbation that changes the class assigned by the classifier but is imperceptible to the oracle.
    Classifier Space
    (DNN Model)
    “Oracle” Space
    (human perception)


  23. Adversarial Examples Definition
     22
     Given a seed sample x, x′ is an adversarial example iff:
     f(x′) = t             class is t (targeted)
     Δ(x, x′) ≤ ε          difference below threshold
     Δ(x, x′) is defined in some (simple!) metric space:
     L0 norm (# different), L1, L2 norm (“Euclidean distance”), L∞
     Assumption (to map to the earlier definition):
     a small perturbation does not change the class in Oracle space
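Read literally, the definition can be checked mechanically; a small sketch, with the classifier `f`, target `t`, metric order `p`, and threshold `eps` supplied by the caller (names are illustrative):

```python
import numpy as np

def lp_distance(x, x_adv, p):
    """Delta(x, x') under the common choices: L0 (# changed), L1, L2, Linf."""
    d = (np.asarray(x_adv) - np.asarray(x)).ravel()
    if p == 0:
        return np.count_nonzero(d)
    if p == np.inf:
        return np.abs(d).max()
    return np.linalg.norm(d, ord=p)

def is_adversarial(f, x, x_adv, target, eps, p=np.inf):
    """Targeted definition from the slide: f(x') = t and Delta(x, x') <= eps."""
    return f(x_adv) == target and lp_distance(x, x_adv, p) <= eps
```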


  24. !"
    Adversary: Fast Gradient Sign
    23
    original 0.1 0.2 0.3 0.4 0.5
    Adversary Power: #
    !" adversary power: max(()
    −()
    +) < #
    ()
    + = ()
    − # ⋅ sign(∇loss7
    (())
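A minimal numpy sketch of this step, assuming the caller provides a `loss_gradient(x)` function for the model (the minus sign follows the slide; the common untargeted variant instead adds the signed gradient of the loss on the true label):

```python
import numpy as np

def fgsm(x, loss_gradient, eps, clip_min=0.0, clip_max=1.0):
    """One fast-gradient-sign step: x' = x - eps * sign(grad), clipped to the valid range.

    loss_gradient(x) is assumed to return the gradient of the loss being descended;
    the perturbation satisfies max|x'_i - x_i| <= eps by construction.
    """
    x_adv = x - eps * np.sign(loss_gradient(x))
    return np.clip(x_adv, clip_min, clip_max)

# Toy usage with a made-up quadratic "loss", just to exercise the function:
x = np.random.rand(28, 28)
grad = lambda z: 2.0 * (z - 0.5)     # gradient of sum((z - 0.5)**2)
x_adv = fgsm(x, grad, eps=0.1)
print(np.abs(x_adv - x).max())       # <= 0.1
```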


  25. Many Other Adversarial Methods
     24
     Original (“1”, 100% confidence) + perturbation = adversarial example:
     BIM (L∞):   “4”, 100%
     JSMA (L0):  “2”, 99.9%
     CW2 (L2):   “2”, 83.8%


  26. Impact of Adversarial Perturbations
     25
     Figure: CIFAR-10, DenseNet, FGSM attack, ε = 0.1 (5th and 95th percentiles shown)


  27. Impact of Adversarial Perturbations
     26
     Figure: FGSM attack perturbation vs. random perturbation


  28. Defense Strategies
    1. Hide the gradients
    27


  29. Defense Strategies
    1. Hide the gradients
    − Transferability results
    28


  30. Defense Strategies
    1. Hide the gradients
    − Transferability results
    − Clever adversaries can still find adversarial examples
    29


  31. Defense Strategies
    1. Hide the gradients
    − Transferability results
    − Clever adversaries can still find adversarial examples
    2. Build a robust classifier
    − Adversarial retraining, increasing model capacity, etc.
    30


  32. Defense Strategies
    1. Hide the gradients
    − Transferability results
    − Clever adversaries can still find adversarial examples
    2. Build a robust classifier
    − Adversarial retraining, increasing model capacity, etc.
    − If we could build a perfect model, we would!
    31


  33. Defense Strategies
    1. Hide the gradients
    − Transferability results
    − Clever adversaries can still find adversarial examples
    2. Build a robust classifier
    − Adversarial retraining, increasing model capacity, etc.
    − If we could build a perfect model, we would!
    32
    Our strategy: “Feature Squeezing”
    reduce search space available to the adversary
    Weilin Xu Yanjun Qi


  34. Detection Framework
     Input → Model → Prediction₀
     Input → Squeezer 1 → Model → Prediction₁
     …
     Input → Squeezer k → Model’ → Predictionₖ
     d(Prediction₀, Prediction₁, …, Predictionₖ) exceeds threshold?
     Yes → Adversarial;  No → Legitimate


  35. Detection Framework
     Input → Model → Prediction₀
     Input → Squeezer 1 → Model → Prediction₁
     …
     Input → Squeezer k → Model’ → Predictionₖ
     d(Prediction₀, Prediction₁, …, Predictionₖ) exceeds threshold?
     Yes → Adversarial;  No → Legitimate
     A feature squeezer coalesces similar inputs into one point:
     • Little change for legitimate inputs.
     • Destroys adversarial perturbations.
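A schematic of the framework in Python, with the model, squeezers, comparison function, and threshold all passed in (illustrative names, not the released EvadeML code):

```python
def squeezing_detector(model, squeezers, x, distance, threshold):
    """Flag x as adversarial if any squeezed prediction differs too much from the original.

    model(x) returns a probability vector; each squeezer maps an input to a squeezed
    version of it; distance compares two prediction vectors (e.g., an L1 distance).
    """
    pred0 = model(x)
    scores = [distance(pred0, model(squeeze(x))) for squeeze in squeezers]
    return max(scores) > threshold   # True -> adversarial, False -> legitimate
```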


  36. Bit Depth Reduction
     35
     Signal quantization (plot: 8-bit input in [0, 1] vs. 1-bit and 3-bit quantized output).
     Reduce to 1-bit: xᵢ = round(xᵢ × (2¹ − 1)) / (2¹ − 1) = round(xᵢ)
     Normal example:       X  = [0.012 0.571 …… 0.159 0.951]  →  [0. 1. …… 0. 1. ]
     Adversarial example:  X* = [0.312 0.271 …… 0.159 0.651]  →  [0. 0. …… 0. 1. ]
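A small numpy sketch of the squeezer; the general b-bit form below follows the round-to-nearest-level rule implied by the 1-bit example (an assumption, not necessarily the exact released implementation):

```python
import numpy as np

def reduce_bit_depth(x, bits):
    """Quantize values in [0, 1] to 2**bits levels (bits=1 gives binary images)."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

x_normal = np.array([0.012, 0.571, 0.159, 0.951])   # values from the slide's normal example
x_adv    = np.array([0.312, 0.271, 0.159, 0.651])   # values from the slide's adversarial example
print(reduce_bit_depth(x_normal, 1))   # [0. 1. 0. 1.]
print(reduce_bit_depth(x_adv, 1))      # [0. 0. 0. 1.]
```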


  37. Bit Depth Reduction
     36
     Figure: a seed digit and its CW2, CW∞, BIM, and FGSM adversarial versions;
     predicted labels 1 1 4 2 2 before squeezing and 1 1 1 1 1 after bit depth reduction.


  38. Spatial Smoothing: Median Filter
    Replace a pixel with median of its neighbors.
     Effective in eliminating “salt-and-pepper” noise (L0 attacks)
    37
    Image from https://sultanofswing90.wordpress.com/tag/image-processing/
    3×3 Median Filter
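One way to implement this squeezer is SciPy's median filter; in the sketch below the window size and the per-channel handling of color images are assumptions:

```python
import numpy as np
from scipy.ndimage import median_filter

def median_smooth(x, size=2):
    """Replace each pixel with the median of its size x size neighborhood.

    x is H x W (grayscale) or H x W x C; color images are filtered per channel.
    """
    if x.ndim == 2:
        return median_filter(x, size=size)
    return np.stack([median_filter(x[..., c], size=size) for c in range(x.shape[-1])], axis=-1)

# A single flipped pixel (the kind of change L0 attacks make) is wiped out:
img = np.full((8, 8), 0.5)
img[3, 4] = 1.0
print(median_smooth(img, size=3).max())   # 0.5
```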


  39. Spatial Smoothing: Non-local Means
     Replace a patch with a weighted mean of similar patches (in a search region).
     38
     p′ = Σᵢ w(p, qᵢ) × qᵢ     (for patches q₁, q₂, … similar to p)
     Preserves edges, while removing noise.
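scikit-image ships a non-local means denoiser built on this weighted-mean-of-similar-patches idea; the wrapper below assumes the "13-3-x" notation used on later slides means (search window, patch size, filter strength):

```python
import numpy as np
from skimage.restoration import denoise_nl_means

def nl_means_smooth(gray_image, search_window=13, patch_size=3, h=0.04):
    """Replace each patch with a weighted mean of similar patches in a search region.

    skimage's patch_distance is a radius, so a 13x13 search window corresponds to
    patch_distance = (13 - 1) // 2 = 6; h controls how quickly dissimilar patches
    are down-weighted.
    """
    return denoise_nl_means(
        gray_image.astype(np.float64),
        patch_size=patch_size,
        patch_distance=(search_window - 1) // 2,
        h=h,
    )

# Example on a noisy grayscale image:
img = np.clip(0.5 + 0.1 * np.random.randn(32, 32), 0.0, 1.0)
print(nl_means_smooth(img).shape)   # (32, 32)
```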


  40. 39
    Airplane
    94.4%
    Truck
    99.9%
    Automobile
    56.5%
    Airplane
    98.4%
    Airplane
    99.9%
    Ship
    46.0%
    Airplane
    98.3%
    Airplane
    80.8%
    Airplane
    70.0%
    Median Filter
    (2×2)
    Non-local Means
    (13-3-4)
    Original BIM (L

    ) JSMA (L
    0
    )


  41. Other Potential Squeezers
     40
     Thermometer encoding (learnable bit depth reduction): J. Buckman, et al.,
     Thermometer Encoding: One Hot Way To Resist Adversarial Examples, ICLR 2018.
     Image denoising using autoencoder, wavelet, JPEG, etc.: D. Meng and H. Chen,
     MagNet: a Two-Pronged Defense against Adversarial Examples, CCS 2017;
     A. Prakash, et al., Deflecting Adversarial Attacks with Pixel Deflection, CVPR 2018; ...
     Image resizing: C. Xie, et al., Mitigating Adversarial Effects Through
     Randomization, ICLR 2018.
     ...


  42. “Feature Squeezing” (Vacuous) Conjecture
    For any distance-limited adversarial method, there
    exists some feature squeezer that accurately
    detects its adversarial examples.
    41
    Intuition: if the perturbation is small (in some simple
    metric space), there is some squeezer that coalesces
    original and adversarial example into same sample.


  43. Feature Squeezing Detection
     Input → Model (7-layer CNN) → Prediction₀
     Input → Bit Depth-1 → Model → Prediction₁
     Input → Median 2×2 → Model → Prediction₂
     max( ‖Prediction₀ − Prediction₁‖₁ , ‖Prediction₀ − Prediction₂‖₁ ) > T ?
     Yes → Adversarial;  No → Legitimate
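Spelled out for this configuration, the detection score is the larger of the two L1 distances between the model's softmax output on the original input and on each squeezed input; `model` is any callable returning a probability vector (a sketch, not the released code):

```python
import numpy as np

def detection_score(model, x, squeezers):
    """max over squeezers of || model(x) - model(squeeze(x)) ||_1; compare against T."""
    p0 = model(x)
    return max(np.abs(p0 - model(squeeze(x))).sum() for squeeze in squeezers)

# e.g. squeezers = [lambda x: reduce_bit_depth(x, 1), lambda x: median_smooth(x, 2)]
# flag the input as adversarial if detection_score(model, x, squeezers) > T
```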


  44. Detecting Adversarial Examples
     43
     Figure: distribution of the distance between the original input and its squeezed
     version, for adversarial inputs (CW attack) vs. legitimate inputs.


  45. 44
    0
    200
    400
    600
    800
    0.0 0.4 0.8 1.2 1.6 2.0
    Number of Examples
    Legitimate
    Adversarial
    Maximum !"distance between original and squeezed input
    threshold = 0.0029
    detection: 98.2%, FP < 4%
    Training a detector
    (MNIST)
    set the detection threshold to keep
    false positive rate below target
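The threshold can be read off the score distribution of legitimate inputs; a sketch, assuming a held-out set of legitimate scores and a target false-positive rate:

```python
import numpy as np

def pick_threshold(legitimate_scores, target_fpr=0.05):
    """Threshold at the (1 - target_fpr) quantile of legitimate scores, so that
    roughly a target_fpr fraction of legitimate inputs gets flagged."""
    return float(np.quantile(np.asarray(legitimate_scores), 1.0 - target_fpr))

# Example: scores of held-out legitimate inputs -> a threshold with ~5% false positives.
scores = np.random.rand(1000) * 0.005
print(pick_threshold(scores, target_fpr=0.05))
```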


  46. Aggregated Detection Results
     45
     Dataset     Squeezers                                                   Threshold   False Positive Rate   Detection Rate (SAEs)   ROC-AUC (excluding FAEs)
     MNIST       Bit Depth (1-bit), Median (2×2)                             0.0029      3.98%                 98.2%                   99.44%
     CIFAR-10    Bit Depth (5-bit), Median (2×2), Non-local Means (13-3-2)   1.1402      4.93%                 84.5%                   95.74%
     ImageNet    Bit Depth (5-bit), Median (2×2), Non-local Means (11-3-4)   1.2128      8.33%                 85.9%                   94.24%


  47. Threat Models
    Oblivious attack: The adversary has full knowledge of the
    target model, but is not aware of the detector.
    Adaptive attack: The adversary has full knowledge of the
    target model and the detector.
    46


  48. Adaptive Adversary
     Adaptive CW2 attack, unbounded adversary:
     Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song,
     Adversarial Example Defense: Ensembles of Weak Defenses are not
     Strong, USENIX WOOT’17.
     minimize ‖g(x′) − t‖ + c ⋅ Δ(x, x′) + k ⋅ L1score(x′)
     47
     (misclassification term + distance term + detection term)
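As a sketch, the adaptive objective is just a weighted sum of three callables standing in for the misclassification loss, the distance term, and the detector's score (all three are assumptions here; the actual attack optimizes this with gradient methods):

```python
def adaptive_loss(x_adv, x, misclassify, distance, detector_score, c=1.0, k=1.0):
    """Misclassification term + c * distance term + k * detection term,
    minimized by the adaptive attacker."""
    return misclassify(x_adv) + c * distance(x, x_adv) + k * detector_score(x_adv)
```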


  49. Adaptive Adversarial Examples
     48
     No successful adversarial examples were found for images originally labeled as 3 or 8.
     Attack                      Mean L2
     Untargeted                  2.80
     Targeted (next)             4.14
     Targeted (least likely)     4.67


  50. Adaptive Adversary Success Rates
     49
     Plot: adversary’s success rate vs. clipped ε (0.0–1.0) for Untargeted,
     Targeted (Next), and Targeted (LL) attacks. Success rates for the unbounded
     attacks are 0.68, 0.44, and 0.24; at a typical ε they fall to 0.06, 0.01, and 0.01.


  51. Security State-of-the-Art
     50
                                      Attack success probability   Threat models                               Proofs
     Cryptography                     2⁻¹²⁸                        information theoretic, resource bounded     required
     System Security                  ≈ 2⁻³²                       capabilities, motivations, rationality      common
     Adversarial Machine Learning     ≈ 2⁰                         white-box, black-box                        rare!


  52. Revisiting Attacker’s Goal
     Find one adversarial example → Find many adversarial examples
    Suya Yuan Tian


  53. Attacker Visibility
     “White-box attacker”
     Knows model architecture and all parameters
     “Black-box attacker”
     Interacts with the model through an API
     Limited number of interactions
     Output is a score vector (e.g., “bird”: 0.09, “horse”: 0.84, ...)
     Decision-based: output is just the class


  54. Black-Box Batch Attacker
     Plot (ZOO attack on MNIST): number of queries (up to ~8 × 10⁵) per seed image.
     Effort (number of model interactions) to find an adversarial example varies by
     seed: most require only a few thousand queries; a few require > 10× more effort.


  55. Easy and Hard Examples
     “Easy” images: 5 with the fewest queries needed to find an adversarial example
     (1024, 1280, 1536, 2560, 2816; 4608, 6912, 12,800, 13,568, 14,336 queries).
     “Hard” images: 5 with the highest number of queries needed, or failed
     (71,424, 75,008, 97,792, 101,376, 138,240 queries; 768,000 query attempts
     without success).
     Example tasks: 2 → 7, “bird” → “horse”


  56. Greedy Search Works
     Plots (MNIST and CIFAR-10): average number of queries (×10⁴) vs. number of
     images selected, comparing Greedy Search, Random Search, and Retroactive Optimal.


  57. Conclusions
     Domain expertise still matters: machine learning models designed without
     domain knowledge will not be robust against motivated adversaries.
     Immature, but fun and active research area: need to make progress toward
     meaningful threat models, robustness measures, verifiable defenses.
     Workshop to be held at DSN (Luxembourg, 25 June)
     Workshop to be held at IEEE S&P (San Francisco, 24 May)


  58. David Evans
    University of Virginia
    [email protected]
    EvadeML.org
    Weilin Xu Yanjun Qi Suya Yuan Tian Mainuddin Jonas
    Funding: NSF, Intel
