
Ensembles of Many Diverse Weak Defenses can be Strong

Pooyan Jamshidi
September 01, 2020

Despite achieving state-of-the-art performance across many domains, machine learning systems are highly vulnerable to subtle adversarial perturbations. Although defense approaches have been proposed in recent years, many have been bypassed by even weak adversarial attacks. Previous studies showed that ensembles created by combining multiple weak defenses (i.e., input data transformations) are still weak. In this talk, I will show that it is indeed possible to construct effective ensembles using weak defenses to block adversarial attacks. However, to do so requires a diverse set of such weak defenses. Based on this motivation, I will present Athena, an extensible framework for building effective defenses to adversarial attacks against machine learning systems. I will talk about the effectiveness of ensemble strategies with a diverse set of many weak defenses that comprise transforming the inputs (e.g., rotation, shifting, noising, denoising, and many more) before feeding them to target deep neural network classifiers. I will also discuss the effectiveness of the ensembles with adversarial examples generated by various adversaries in different threat models. In the second half of the talk, I will explain why building defenses based on the idea of many diverse weak defenses works, when it is most effective, and what its inherent limitations and overhead are. Finally, I will show our recent advancement toward synthesizing effective ensemble defenses automatically by identifying complementary weak defenses over the induced space of weak defenses using a combination of search and optimization.



Transcript

  1. Ensembles of Many Diverse
    Weak Defenses can be Strong
    Ying Meng, Jianhai Su, Forest Agostinelli, Pooyan Jamshidi, Jason O’Kane, Biplav Srivastava
    Invited Talk

  2. Artificial Intelligence and Systems Laboratory
    (AISys Lab)
    Machine Learning + Computer Systems + Autonomy
    → Learning-enabled Autonomous Systems
    https://pooyanjamshidi.github.io/AISys/

  3. Research Directions at AISys
    Theory:
    - Transfer Learning
    - Causal Invariances
    - Structure Learning
    - Concept Learning
    - Physics-Informed
    Applications:
    - Systems
    - Autonomy
    - Robotics
    [Figure: the research spans settings from well-known physics with big data to
    limited known physics with small data; Causal AI.]
    Thanks to NASA for supporting our research

  4. So what is this talk about?
    The Security of Deep Machine Learning

  5. Adversarial Examples
    [Engstrom, Tran, Tsipras, Schmidt, Madry 2018]:
    Rotation + Translation can fool classifiers
    [Athalye, Engstrom, Ilyas, Kwok 2017]:
    3D-printed model classified as rifle from most viewpoints
    [Goodfellow et al. 2014]: Imperceptible noise
    can fool DNN classifiers

  6. Adversarial Examples (Security)
    [Sharif et al. 2016]: Glasses that fool face classifiers
    [Carlini et al. 2016]: Voice commands that are imperceptible to humans

  7. Adversarial Examples (RL, NLP)
    [Huang et al. 2017]: Small input changes can degrade RL performance
    [Jia & Liang 2017]: Irrelevant sentences confuse reading comprehension systems

  8. Should we be worried?
    Probably not here! But we should be worried here!
    [Pei et al. 2017]: DeepXplore: Automated Whitebox
    Testing of Deep Learning Systems
    [Tian et al. 2017]: DeepTest: Automated Testing of
    Deep-Neural-Network-driven Autonomous Cars
    [Athalye, Engstrom, Ilyas, Kwok 2017]:
    3D-printed model classified as rifle from most viewpoints


  9. Where Do Adversarial Examples Come From?
    Goal of ML: given a data distribution D, find parameters θ* such that
    E_(x,y)∼D [ ℒ(θ*, x, y) ] is small
    [Figure: examples (x, y) drawn from D, such as images of an orange, a chimpanzee,
    and a palm tree, with predictions from classifiers fθ1 and fθ2, e.g.
    fθ1(x) = palm tree and fθ2(x) = orange for the same input.]

  10. Where Do Adversarial Examples Come From?
    Training (gradient descent to find good parameters θ):
    min_θ ℒ(θ, x, y)
    Attack (find a small perturbation δ that maximizes the loss):
    max_δ ℒ(θ, x + δ, y)  subject to  ||δ||_p ≤ ϵ
    [Ilyas et al. 2019]: Adversarial Examples Are Not Bugs, They Are Features
    “Adversarial vulnerability is a direct result of our models’ sensitivity to
    well-generalizing features in the data.”
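
    In practice, the inner maximization is approximated with gradient-based attacks such as
    FGSM (one signed-gradient step) or PGD (several projected steps). A minimal sketch,
    assuming a caller-supplied grad_loss_fn(x, y) that returns the gradient of the loss with
    respect to the input for the fixed, trained parameters θ:

```python
import numpy as np

def fgsm(x, y, grad_loss_fn, epsilon=0.03):
    """One-step approximation of max_delta L(theta, x + delta, y) with ||delta||_inf <= epsilon."""
    delta = epsilon * np.sign(grad_loss_fn(x, y))   # step in the direction that increases the loss
    return np.clip(x + delta, 0.0, 1.0)             # stay in the valid pixel range

def pgd(x, y, grad_loss_fn, epsilon=0.03, alpha=0.01, steps=10):
    """Iterated variant (PGD): small signed steps, projected back into the epsilon-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_loss_fn(x_adv, y))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)    # project onto ||delta||_inf <= epsilon
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```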

  11. Athena:
    A Framework for Defending
    Machine Learning Systems
    Against Adversarial Attacks

  12. Key idea behind our approach:
    Input transformation
    [Figure: a “7” is classified correctly with confidence 0.9; adding a perturbation δ
    flips the prediction to “9” (0.56); rotating the perturbed image by 180° restores the
    correct prediction “7” (0.4).]
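
    A minimal sketch of the transform-then-classify idea, assuming classify maps an image to
    class probabilities; the perturbation δ was optimized against the untransformed input, so
    a simple transformation such as a 180° rotation can disrupt it. (In Athena, the classifier
    paired with each transformation is additionally trained on transformed data; see slide 17.)

```python
import numpy as np

def rotate180(x):
    """Input transformation T: rotate the image by 180 degrees."""
    return np.rot90(x, k=2, axes=(0, 1))

def weak_defense_predict(classify, x):
    """A single weak defense: transform the input, then run the classifier on it."""
    return classify(rotate180(x))
```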

  13. [Figure: per-attack error rates of individual weak defenses, i.e. input transformations
    such as compression (horizontal & vertical) and denoising (nl_means), against FGSM, JSMA,
    BIM_l2, BIM_linf, PGD, DeepFool_l2, CW_l2, OnePixel, and MIM, compared to the original input.]
    Insights:
    - Effectiveness of WDs varies
    - WDs complement each other
    -> A defense based on an ensemble of WDs can be independent of the particular type of
    adversarial attack
    WD: Weak Defense

  14. Quality and quantity of weak defenses matter
    Adversarial Attack: DeepFool
    [Figure: test accuracy as a function of the number of weak defenses (10 to 70).]

  15. Diversity of weak defenses matters
    Adversarial Attack: One-Pixel
    Error Rate Undefended: 0.5588
    PGD-ADT: Adversarial Training
    [Figure: error rates of a diverse ensemble vs. a homogeneous ensemble vs. the baseline
    defense (PGD-ADT).]

  16. Diversity of weak defenses matters
    Adversarial Attack: BIM_l2 (Error Rate Undefended: 0.92)
    Adversarial Attack: MIM (Error Rate Undefended: 0.94)
    Adversarial Attack: PGD (Error Rate Undefended: 0.96)

  17. Each weak defense is essentially a model trained
    on a particular type of transformation
    [Figure: for every x in D, apply transformation Ti to obtain xti, then train a classifier
    fti on the transformed data; the pair (Ti, fti) is one weak defense.]
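
    A minimal sketch of this training step, where transform plays the role of Ti and the
    returned classifier the role of fti; train_model is a placeholder for any standard
    training routine (e.g., fitting a CNN or an SVM):

```python
def train_weak_defense(transform, train_model, dataset):
    """Build one weak defense: apply the transformation to every input, then fit a classifier.

    dataset is an iterable of (x, y) pairs; train_model(xs, ys) returns a fitted classifier.
    """
    xs, ys = zip(*[(transform(x), y) for x, y in dataset])
    f_ti = train_model(xs, ys)          # classifier trained on T_i-transformed data
    return transform, f_ti              # a weak defense is the (T_i, f_ti) pair
```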

  18. Athena produces the final output based on agreement
    between weak defenses at deployment time
    [Figure: ensemble of n weak defenses. The input x is transformed by T1, ..., Tn into
    xt1, ..., xtn; each weak defense fti predicts a label yti; an ensemble strategy combines
    the predictions (e.g., votes 7, 7, 9, 7) into the final output y (here, 7).]
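
    Putting slides 17 and 18 together, a minimal sketch of the deployment-time pipeline;
    weak_defenses is assumed to be a list of (transform, classifier) pairs as built above,
    with each classifier returning class probabilities, and strategy is any function that
    maps the stacked outputs to a final label (majority voting shown; more strategies
    appear on slide 21):

```python
import numpy as np

def athena_predict(weak_defenses, strategy, x):
    """Transform x with every T_i, classify with f_ti, then combine with an ensemble strategy."""
    probs = np.stack([f_ti(t_i(x)) for t_i, f_ti in weak_defenses])   # y_t1 ... y_tn
    return strategy(probs)

def majority_vote(probs):
    """The label most weak defenses agree on (e.g., votes 7, 7, 9, 7 -> 7)."""
    return int(np.bincount(probs.argmax(axis=1)).argmax())
```

    Usage: athena_predict(weak_defenses, majority_vote, x).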

  19. Evaluation of
    Athena

  20. Threat model: What we can assume about the
    knowledge of the adversary and its strength
    Threat models: zero-knowledge, blackbox, greybox, and whitebox, distinguished by
    whether the adversary knows the parameters of the target classifier, the weak
    defenses, the ensemble strategy, and the existence of a defense.

  21. Although the effectiveness of each weak defense varies,
    Athena is able to decrease the error rate effectively
    Adversarial Attack: FGSM
    Model: 28×10 Wide ResNet
    Dataset: CIFAR100
    Athena (ensemble strategies):
    - MV: Majority Voting
    - T2MV: Top-2 Majority Voting
    - AVEO: Average of Outputs
    - RD: Random Defense
    Baseline defenses:
    - PGD-ADT: Adversarial Training
    - RS: Randomized Smoothing
    [Figure: error rates of Athena’s ensemble strategies compared to the baseline defenses
    and the undefended model; note the tradeoff on benign samples.]
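
    Building on the athena_predict sketch above, plausible implementations of the remaining
    strategies over the same stacked probs array (n_weak_defenses × n_classes); the exact
    definitions, in particular of T2MV, are assumptions inferred from the names on this
    slide rather than taken from the paper:

```python
import numpy as np

def t2mv(probs):
    """Top-2 MV: assumed here to vote over each weak defense's two most likely labels."""
    top2 = np.argsort(probs, axis=1)[:, -2:].ravel()
    return int(np.bincount(top2).argmax())

def aveo(probs):
    """Average of Outputs: argmax of the mean probability vector across weak defenses."""
    return int(probs.mean(axis=0).argmax())

def rd(probs):
    """Random Defense: trust one randomly chosen weak defense's prediction."""
    i = np.random.default_rng().integers(len(probs))
    return int(probs[i].argmax())
```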

  22. Although the effectiveness of each weak defense varies,
    Athena is able to decrease the error rate effectively
    Model: 28×10 Wide ResNet
    Dataset: CIFAR100
    [Figure: results for six attacks: FGSM, BIM_l2, BIM_linf, CW_l2, JSMA, PGD.]

  23. Threat model
    [Threat model recap: zero-knowledge, blackbox, greybox, and whitebox adversaries,
    distinguished by whether they know the parameters of the target classifier, the weak
    defenses, the ensemble strategy, and the existence of the defense.]

  24. Blackbox attack: The transferability-based
    approach
    1. Collect a training data set Dbb = { x | x in D }
    2. Train a substitute classifier f_sub on Dbb
    3. Craft adversarial examples x' against the substitute classifier f_sub
    4. Attack the ensemble model f_ens with the crafted examples
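
    A minimal sketch of these four steps; train_model and craft_adversarial are placeholders
    for any standard training routine and any whitebox attack (e.g., the FGSM/PGD sketch
    earlier), and labeling the substitute data by querying the target ensemble is the usual
    transferability setup assumed here:

```python
def transferability_attack(query_ensemble, train_model, craft_adversarial, inputs):
    """Blackbox attack via a substitute model (steps 1-4 above)."""
    # 1. Collect a training set D_bb by labeling inputs with the target ensemble's predictions.
    d_bb = [(x, query_ensemble(x)) for x in inputs]
    # 2. Train a substitute classifier f_sub on D_bb.
    f_sub = train_model(d_bb)
    # 3. Craft adversarial examples x' against the substitute (whitebox access to f_sub).
    adv_examples = [craft_adversarial(f_sub, x, y) for x, y in d_bb]
    # 4. Attack the ensemble model by replaying the crafted examples against it.
    return [query_ensemble(x_adv) for x_adv in adv_examples]
```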

  25. Athena lowered the “transferability” of adversarial
    examples from the surrogate model to the target model
    Adversarial Attack: BIM_linf
    Model: 28×10 Wide ResNet
    Dataset: CIFAR100
    [Figure: transferability rate for the undefended model vs. Athena.]

  26. Athena lowered the transferability of adversarial
    examples from the surrogate model to the target model

  27. Athena forces the “optimization-based” blackbox attack
    to generate adversarial examples with larger perturbation
    Adversarial Attack: HopSkipJump
    Model: 28×10 Wide ResNet
    Dataset: CIFAR100
    [Figure: perturbation size required against the undefended model vs. Athena.]

  28. Threat model
    [Threat model recap: zero-knowledge, blackbox, greybox, and whitebox adversaries,
    distinguished by whether they know the parameters of the target classifier, the weak
    defenses, the ensemble strategy, and the existence of the defense.]

  29. A strong adaptive white-box adversary may be
    able to successfully bypass the defense
    [Figure: error rates under the adaptive whitebox attack for the undefended model, the
    individual weak defenses, and Athena.]

  30. However, it becomes very easy to “detect” such
    attacks, so a defense+detection would be robust
    [Figure: detected rate vs. max normalized dissimilarity under gray-box and white-box
    attacks, for the MV ensemble, detection + MV ensemble, and the detector alone.]
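
    A minimal sketch of such a detector, flagging an input when the weak defenses disagree
    too strongly; the disagreement measure used here (fraction of weak defenses whose label
    differs from the majority) is an illustrative stand-in for the normalized dissimilarity
    on the slide, not necessarily the exact metric used:

```python
import numpy as np

def disagreement_detector(probs, threshold=0.5):
    """Flag a likely adversarial input when weak defenses disagree strongly.

    probs: stacked weak-defense outputs, shape (n_weak_defenses, n_classes).
    Returns True if the input should be rejected (or sent for inspection).
    """
    votes = probs.argmax(axis=1)
    majority = np.bincount(votes).argmax()
    dissimilarity = float(np.mean(votes != majority))   # 0 = full agreement, ~1 = no agreement
    return dissimilarity > threshold
```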

  31. Also, it comes with a high cost
    [Figure: dissimilarity vs. time for generating one adversarial example (seconds);
    bypassing the defense substantially increases the attacker’s cost per example.]

  32. Is Athena
    a general defense?
    Will it work with different types
    of machine learning models?

  33. Athena performs similarly well with other types of
    machine learning models (DNNs, SVMs, RF)
    Adversarial Attack: FGSM
    Model: ResNet with Shake-Shake regularization
    Dataset: CIFAR100

  34. Athena is similarly effective with other types of models
    Model: ResNet with Shake-Shake regularization
    Dataset: CIFAR100
    Attacks: FGSM, BIM_l2, BIM_linf, CW_l2, JSMA, PGD

  35. However, the effectiveness of the defense may vary
    depending on the type of model
    Model: SVM
    Dataset: MNIST
    Adversarial Attacks: FGSM and CW_l2

  36. What is the
    overhead of Athena?
    - Memory

    - Inference Time

  37. The memory overhead of Athena is linear in the number of WDs;
    the inference time is on par with model inference
    [Figure: the ensemble pipeline from slide 18, annotated with where transformation time
    and inference time are spent.]

  38. Athena is:
    - Flexible

    - Extensible

    - General

    - Moderate overhead

  39. Athena is open source
    https://arxiv.org/abs/2001.00308