
Architecture Support for Robust Deep Learning: Exploiting Software 1.0 Techniques to Defend Software 2.0

Yuhao Zhu
October 28, 2020


Transcript

  1. Yuhao Zhu
    http://horizon-lab.org
    Architecture Support for Robust
    Deep Learning: Exploiting Software 1.0
    Techniques to Defend Software 2.0


  2. Research overview: Algorithm & Software, Hardware Architecture, and "Architecture + X".
     Web & Cloud: [ISCA 2019], [PLDI 2016], [HPCA 2015], [HPCA 2013], [HPCA 2016], [MICRO 2015], [ISCA 2014]
     Robust ML: [CVPR 2020], [CVPR 2019], [ICLR 2019], [MICRO 2020]
     Visual Computing: [MICRO 2020], [ISCA 2019], [ISCA 2018], [IROS 2020], [CVPR 2019], [MICRO 2019], [FPGA 2020]

  4. Deep Learning Isn’t Robust
    Examples: an adversarial "hidden person" patch causes object mis-detection (the person is not detected); fatal Tesla Autopilot crash.
    https://www.vox.com/2017/9/12/16294510/fatal-tesla-crash-self-driving-elon-musk-autopilot


  5. Adversarial Robustness
    4
    https://blog.csiro.au/vaccinating-machine-learning-against-attacks/
    Cx = f(x) ≠ f(x + Δx): a small perturbation Δx changes the prediction.


  6. Adversarial Robustness
    4
    https://blog.csiro.au/vaccinating-machine-learning-against-attacks/
    Cx = f(x) ≠ f(x + Δx)
    Defense goals: efficiency (at inference time) + accuracy.


  7. Software 1.0 vs. Software 2.0
    [Figure: control-flow graph with path-profiling instrumentation (r = 0; count[r]++) and per-path counts.]
    Software 2.0: Neural networks
    as self-written programs.
    Software 1.0: Explicit instructions
    with explicit logic.
    Efficient Path Profiling, MICRO 1996

  8. Software 1.0 vs. Software 2.0
    [Figure: the same path-profiled control-flow graph as on the previous slide.]
    Software 2.0: Neural networks
    as self-written programs.
    Software 1.0: Explicit instructions
    with explicit logic.
    Efficient Path Profiling, MICRO 1996
    Benign input

  9. Software 1.0 vs. Software 2.0
    [Figure: the same path-profiled control-flow graph as on the previous slide.]
    Software 2.0: Neural networks
    as self-written programs.
    Software 1.0: Explicit instructions
    with explicit logic.
    Efficient Path Profiling, MICRO 1996
    Adversarial input


  11. Exploiting Dynamic Behaviors of a DNN
    8
    Software 2.0: Neural networks
    as self-written programs.
    Software 1.0: Explicit instructions
    with explicit logic.
    Adversarial input
    Use program “hot paths” for:
    ‣ Profile-guided optimizations
    ‣ Feedback-driven optimizations
    ‣ Tracing JIT compilation
    ‣ Dynamic dead-code elimination
    ‣ Run-time accelerator offloading
    ‣ Dynamic remote offloading
    ‣ …


  12. Exploiting Dynamic Behaviors of a DNN
    9
    Software 2.0: Neural networks
    as self-written programs.
    Software 1.0: Explicit instructions
    with explicit logic.
    Use program “hot paths” for:
    ‣ Profile-guided optimizations
    ‣ Feedback-driven optimizations
    ‣ Tracing JIT compilation
    ‣ Dynamic dead-code elimination
    ‣ Run-time accelerator offloading
    ‣ Dynamic remote offloading
    ‣ …
    Can we exploit DNN
    paths to detect and
    defend against
    adversarial attacks?
    ✓ Inference-time detection
    ✓ Accurate detection


  13. Defining Activation Path in a DNN
    10
    ‣ Loosely, an activation path is a
    collection of important neurons and
    connections that contribute
    significantly to the inference output.
    ‣ Activation path is input-specific, just
    like a program path.
    Activation path for a
    benign input
    Activation path for
    an adversarial input


  14. Extracting Activation Paths
    11
    ‣ Necessarily a backward process, starting from the last layer.
    Inference
    Extraction


  15. Extracting Activation Paths
    11
    ‣ Necessarily a backward process, starting from the last layer.
    ▹ Last layer Ln has only one important neuron n.
    Inference
    Extraction


  16. Extracting Activation Paths
    11
    ‣ Necessarily a backward process, starting from the last layer.
    ▹ Last layer Ln has only one important neuron n.
    ▹ Important neurons in Ln-1 are given by the minimal set of neurons whose cumulative partial
    sums contribute at least θ of n's value, where 0 ≤ θ ≤ 1. The definition is recursive.
    Inference
    Extraction


  17. Extracting Activation Paths
    11
    ‣ Necessarily a backward process, starting from the last layer.
    ▹ Last layer Ln has only one important neuron n.
    ▹ Important neurons in Ln-1 are given by the minimal set of neurons whose cumulative partial
    sums contribute at least θ of n's value, where 0 ≤ θ ≤ 1. The definition is recursive.
    [Figure: inference runs forward, extraction runs backward; worked example of a fully-connected layer (input vector × kernel = output neuron 0.46).]

  18. Extracting Activation Paths
    11
    ‣ Necessarily a backward process, starting from the last layer.
    ▹ Last layer Ln has only one important neuron n.
    ▹ Important neurons in Ln-1 are given by the minimal set of neurons whose cumulative partial
    sums contribute at least θ of n's value, where 0 ≤ θ ≤ 1. The definition is recursive.
    [Figure: the same fully-connected layer example; the important neuron from the previous layer (the 0.46 output) is highlighted.]

  19. Extracting Activation Paths
    11
    ‣ Necessarily a backward process, starting from the last layer.
    ▹ Last layer Ln has only one important neuron n.
    ▹ Important neurons in Ln-1 are given by the minimal set of neurons whose cumulative partial
    sums contribute at least θ of n's value, where 0 ≤ θ ≤ 1. The definition is recursive.
    [Figure: the same fully-connected layer example; the important neuron from the previous layer (0.46) and the important neurons at the current layer for θ = 0.5 are highlighted.]
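    The cumulative-thresholding rule above, sketched in NumPy for a single fully-connected output neuron. This is a minimal illustration, not the Ptolemy implementation; the function name is made up, and the example values come from the deck's worked example.

    import numpy as np

    def important_neurons_fc(x, w, theta):
        # x: previous-layer activations; w: weights feeding one important
        # output neuron; theta: cumulative threshold in [0, 1].
        # Returns indices of the minimal set of input neurons whose partial
        # sums cover at least theta of the output value (positive psums assumed).
        psums = x * w                          # per-input partial sums
        order = np.argsort(psums)[::-1]        # largest contributions first
        target = theta * psums.sum()
        picked, covered = [], 0.0
        for i in order:
            picked.append(int(i))
            covered += psums[i]
            if covered >= target:
                break
        return picked

    # Worked example from the deck (output neuron value 0.46), theta = 0.6:
    x = np.array([0.1, 1.0, 0.4, 0.3, 0.2])
    w = np.array([2.1, 0.09, 0.2, 0.2, 0.1])
    print(important_neurons_fc(x, w, theta=0.6))   # -> [0, 1], i.e., 0.1 x 2.1 and 1.0 x 0.09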

  20. Extracting Activation Paths
    12
    ‣ Necessarily a backward process, starting from the last layer.
    ▹ Last layer Ln has only one important neuron n.
    ▹ Important neurons in Ln-1 are given by the minimal set of neurons whose cumulative partial
    sums contribute at least θ of n's value, where 0 ≤ θ ≤ 1.
    [Figure: important-neuron extraction in a fully-connected layer and in a convolution layer.]
    Fully-connected layer: 0.46 = 0.1 × 2.1 + 1.0 × 0.09 + 0.4 × 0.2 + 0.3 × 0.2 + 0.2 × 0.1;
    0.1 × 2.1 + 1.0 × 0.09 > 0.6 × 0.46, assuming θ = 0.6.
    Convolution layer: 5.47 = 2.0 × 0.7 + 1.4 × 0.9 + 1.5 × 0.8 + 1.0 × 0.9 + …;
    2.0 × 0.7 + 1.4 × 0.9 + 1.5 × 0.8 > 0.6 × 5.47, assuming θ = 0.6.

  21. Extracting Activation Paths
    12
    ‣ Necessarily a backward process, starting from the last layer.
    ▹ Last layer Ln has only one important neuron n.
    ▹ Important neurons in Ln-1 are given by the minimal set of neurons whose cumulative partial
    sums contribute at least θ of n's value, where 0 ≤ θ ≤ 1.
    [Figure: the same fully-connected and convolution layer examples; the important neuron from the previous layer is highlighted.]

  22. Extracting Activation Paths
    12
    ‣ Necessarily a backward process, starting from the last layer.
    ▹ Last layer Ln has only one important neuron n.
    ▹ Important neurons in Ln-1 are given by the minimal set of neurons whose cumulative partial
    sums contribute at least θ of n's value, where 0 ≤ θ ≤ 1.
    [Figure: the same fully-connected and convolution layer examples; both the important neuron from the previous layer and the important neurons at the current layer (θ = 0.5) are highlighted.]

  23. Extracting Activation Paths
    13
    ‣ θ controls the differentiability of an activation path, i.e., how many important
    neurons are included in an activation path.
    ▹ θ = 1 includes all neurons.
    [Figure: the same fully-connected and convolution layer extraction examples as before.]

  24. Representing Activation Paths
    14
    ‣ An activation path (for an input) is represented by a bit mask.
    [Figure: a network's activation path encoded as a per-neuron 0/1 bit mask.]

  25. From Input Paths to Class Paths
    15
    ‣ A class path for a class c aggregates (e.g., bitwise OR) the activation paths of
    different inputs that are correctly predicted as class c.
    ‣ Generated from a training dataset.
    [Figure: per-input activation-path bit masks are OR-ed into a single class-path bit mask.]
    Pc = ⋃x∈x̄c P(x)

  26. From Input Paths to Class Paths
    15
    ‣ A class path for a class c aggregates (e.g., bitwise OR) the activation paths of
    different inputs that are correctly predicted as class c.
    ‣ Generated from a training dataset.
    [Figure: the same aggregation, with one per-input path highlighted.]
    Pc = ⋃x∈x̄c P(x)
    P(x): activation path for a correctly-predicted benign input x

  27. From Input Paths to Class Paths
    15
    ‣ A class path for a class c aggregates (e.g., bitwise OR) the activation paths of
    different inputs that are correctly predicted as class c.
    ‣ Generated from a training dataset.
    [Figure: the same aggregation, with both the per-input paths and the class path highlighted.]
    Pc = ⋃x∈x̄c P(x)
    P(x): activation path for a correctly-predicted benign input x
    Pc: class path for class c
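    A small sketch of the class-path construction described above: per-input activation paths as boolean bit masks, OR-ed together into the class path Pc. The names and the toy masks are illustrative only.

    import numpy as np

    def class_path(input_paths):
        # input_paths: equal-length 0/1 masks, one per benign input that is
        # correctly predicted as class c; the class path is their bitwise OR.
        masks = np.array(list(input_paths), dtype=bool)
        return np.logical_or.reduce(masks, axis=0)

    # Toy example: three inputs over a network flattened into six neuron slots.
    p1 = [1, 0, 1, 0, 0, 0]
    p2 = [1, 0, 0, 1, 0, 0]
    p3 = [0, 0, 1, 0, 0, 1]
    print(class_path([p1, p2, p3]).astype(int))    # -> [1 0 1 1 0 1]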

  28. Path Similarities
    16
    ‣ Class paths of different classes are very different.
    S = |Pc1 ∩ Pc2| / |Pc1 ∪ Pc2|
    Pc1, Pc2: class paths of classes c1 and c2.

  29. Path Similarities
    16
    ‣ Class paths of different classes are very different.
    [Figure: pairwise class-path similarity matrix, ResNet18 @ CIFAR-10.]
    S = |Pc1 ∩ Pc2| / |Pc1 ∪ Pc2|
    Pc1, Pc2: class paths of classes c1 and c2.

  30. Path Similarities
    16
    ‣ Class paths of different classes are very different.
    [Figures: pairwise class-path similarity matrices for ResNet18 @ CIFAR-10 and AlexNet @ ImageNet (10 random classes).]
    S = |Pc1 ∩ Pc2| / |Pc1 ∪ Pc2|
    Pc1, Pc2: class paths of classes c1 and c2.
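    The inter-class similarity above is a Jaccard index over two class-path bit masks; a minimal illustrative sketch:

    import numpy as np

    def class_path_similarity(pc1, pc2):
        # S = |Pc1 ∩ Pc2| / |Pc1 ∪ Pc2| for two boolean class-path masks.
        pc1, pc2 = np.asarray(pc1, bool), np.asarray(pc2, bool)
        union = np.logical_or(pc1, pc2).sum()
        return np.logical_and(pc1, pc2).sum() / union if union else 1.0

    print(class_path_similarity([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 0]))   # -> 0.4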

  31. Path Similarities
    17
    ‣ Activation paths of benign inputs are similar to their corresponding class path.
    ‣ Activation paths of adversarial inputs are different from the class path.
    S = |P(x) ∩ Pc| / |P(x)|

  32. Path Similarities
    17
    ‣ Activation paths of benign inputs are similar to their corresponding class path.
    ‣ Activation paths of adversarial inputs are different from the class path.
    S = |P(x) ∩ Pc| / |P(x)|
    Pc: class path

  33. Path Similarities
    17
    ‣ Activation paths of benign inputs are similar to their corresponding class path.
    ‣ Activation paths of adversarial inputs are different from the class path.
    S = |P(x) ∩ Pc| / |P(x)|
    P(x): activation path for input x (x is not used in generating Pc)
    Pc: class path

  34. Path Similarities
    17
    ‣ Activation paths of benign inputs are similar to their corresponding class path.
    ‣ Activation paths of adversarial inputs are different from the class path.
    [Figure: normal example and perturbations from different attacks (perturbations enhanced to highlight the differences); similarity results for LeNet.]
    S = |P(x) ∩ Pc| / |P(x)|
    P(x): activation path for input x (x is not used in generating Pc)
    Pc: class path

  35. Path Similarities
    17
    ‣ Activation paths of benign inputs are similar to their corresponding class path.
    ‣ Activation paths of adversarial inputs are different from the class path.
    [Figure: the same LeNet similarity results, with the benign-input distribution highlighted.]
    S = |P(x) ∩ Pc| / |P(x)|
    P(x): activation path for input x (x is not used in generating Pc)
    Pc: class path
    Benign inputs
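    The per-input similarity uses the same set operations but normalizes by the input's own path; again a minimal illustrative sketch:

    import numpy as np

    def input_to_class_similarity(p_x, p_c):
        # S = |P(x) ∩ Pc| / |P(x)|: the fraction of the input's activation
        # path that falls inside the class path of its predicted class.
        p_x, p_c = np.asarray(p_x, bool), np.asarray(p_c, bool)
        return np.logical_and(p_x, p_c).sum() / p_x.sum()

    # Benign inputs typically score close to 1; adversarial inputs score lower.
    print(input_to_class_similarity([1, 0, 1, 1], [1, 1, 1, 0]))   # -> ~0.67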

  36. Basic Idea of the Detection Algorithm
    18
    ‣ For an input x that is predicted as class c, if the activation path of x does not
    resemble the class path of c, x is likely an adversarial sample.


  37. Basic Idea of the Detection Algorithm
    18
    ‣ For an input x that is predicted as class c, if the activation path of x does not
    resemble the class path of c, x is likely an adversarial sample.
    [Diagram: offline, training data feeds class-path construction to produce class paths; online, an input goes through inference and activation-path extraction, and the activation path plus the class path feed an adversarial classifier that outputs the prediction and an adversarial flag.]

  38. Basic Idea of the Detection Algorithm
    18
    ‣ For an input x that is predicted as class c, if the activation path of x does not
    resemble the class path of c, x is likely an adversarial sample.
    [Diagram: the same detection pipeline; the adversarial classifier is a random forest, which is lightweight and works effectively.]
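    One plausible way to realize the adversarial-classification stage: a scikit-learn random forest over per-layer input-to-class similarities. The feature layout, the tiny training set, and the hyperparameters are illustrative assumptions, not the paper's exact setup.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Each row holds per-layer similarities S_l between an input's activation
    # path and the class path of its predicted class; label 1 = adversarial.
    X_train = np.array([
        [0.92, 0.88, 0.95],   # benign
        [0.90, 0.93, 0.91],   # benign
        [0.41, 0.55, 0.48],   # adversarial
        [0.38, 0.47, 0.52],   # adversarial
    ])
    y_train = np.array([0, 0, 1, 1])

    detector = RandomForestClassifier(n_estimators=100, random_state=0)
    detector.fit(X_train, y_train)

    # At inference time, flag the input if the detector predicts class 1.
    print(detector.predict([[0.45, 0.50, 0.49]]))   # -> [1], likely adversarial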

  39. Run-time Overhead
    19
    [Diagram: forward inference (Layer 1 → Layer N) followed by backward path extraction (Layer N → Layer 1).]

  40. Run-time Overhead
    19
    [Diagram: forward inference (Layer 1 → Layer N) computes and stores partial sums; backward path extraction (Layer N → Layer 1) follows.]

  41. Run-time Overhead
    19
    [Diagram: forward inference computes and stores partial sums; backward path extraction follows.]
    [Chart: memory overhead of storing partial sums for LeNet, AlexNet, and ResNet18 (log-scale ticks 1, 8, 64, 512).]

  42. Run-time Overhead
    19
    [Diagram: forward inference (compute, store partial sums) followed by backward path extraction (rank neurons, read partial sums, construct path).]
    [Chart: memory overhead of storing partial sums for LeNet, AlexNet, and ResNet18 (log-scale ticks 1, 8, 64, 512).]

  43. Run-time Overhead
    19
    [Diagram: forward inference (compute, store partial sums) followed by backward path extraction (rank neurons, read partial sums, construct path).]
    [Charts: memory overhead for LeNet, AlexNet, and ResNet18 (ticks 1, 8, 64, 512); operation overhead (%) versus threshold θ from 0.1 to 0.9, up to roughly 40%.]

  44. Run-time Overhead
    19
    [Diagram: forward inference (compute, store partial sums) followed by backward path extraction (rank neurons, read partial sums, construct path).]
    [Charts: memory overhead for LeNet, AlexNet, and ResNet18; operation overhead (%) versus threshold θ.]
    15.4× (AlexNet) and 50.7× (ResNet) overhead for a pure
    software implementation.

  45. Hiding Cost: Extraction Direction
    20
    [Diagram: inference and extraction both proceed forward (Layer 1 → Layer N), so extraction can overlap with inference.]
    ‣ The important neurons are determined locally now.
    ‣ Pros: overlap inference with extraction.
    ‣ Cons: locally important neurons aren’t necessarily eventually important.


  46. Reducing Cost: Selective Extraction + Thresholding
    21
    [Diagram: forward inference with extraction applied to selected layers only.]

  48. Reducing Cost: Selective Extraction + Thresholding
    21
    [Diagram: forward inference with selective extraction; the same fully-connected layer example (input layer × kernel = output 0.46).]
    Cumulative threshold:
    0.46 = 0.1 × 2.1 + 1.0 × 0.09 + 0.4 × 0.2 + 0.3 × 0.2 + 0.2 × 0.1

  49. Reducing Cost: Selective Extraction + Thresholding
    21
    [Diagram: forward inference with selective extraction; the same fully-connected layer example.]
    Absolute threshold: keep a term if its partial sum exceeds a fixed value (here 0.1):
    0.1 × 2.1 > 0.1;  1.0 × 0.09 < 0.1;  0.4 × 0.2 < 0.1;  0.3 × 0.2 < 0.1;  0.2 × 0.1 < 0.1
    Cumulative threshold:
    0.46 = 0.1 × 2.1 + 1.0 × 0.09 + 0.4 × 0.2 + 0.3 × 0.2 + 0.2 × 0.1
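    Absolute thresholding, sketched for the same fully-connected example: each partial sum is compared against a fixed value instead of being accumulated toward a θ fraction, so no sorting is needed. The function name and threshold are illustrative.

    import numpy as np

    def important_neurons_abs(x, w, tau):
        # Keep input neurons whose partial sum exceeds the absolute threshold tau.
        psums = x * w
        return np.flatnonzero(psums > tau).tolist()

    x = np.array([0.1, 1.0, 0.4, 0.3, 0.2])
    w = np.array([2.1, 0.09, 0.2, 0.2, 0.1])
    print(important_neurons_abs(x, w, tau=0.1))    # -> [0]; only 0.1 x 2.1 exceeds 0.1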

  50. Algorithm Design Space
    22
    [Diagram: algorithm design space, spanning extraction direction (forward vs. backward), thresholding mechanism (absolute vs. cumulative), and the number of extracted layers (1 to N).]

  52. Hardware Support
    23
    ‣ Nothing special. Nothing outrageous. Minimal extension to conventional NPU.
    [Block diagram: a conventional DNN accelerator (SRAM for weights, feature maps, partial sums, and masks) augmented with a mask-generation unit and a Path Constructor (sort & merge, accumulate, its own SRAM for partial sums, partial masks, and masks), plus a controller with SRAM for code and paths; DRAM holds input/output, weights, feature maps, partial sums, masks, and paths.]

  53. Hardware Support
    23
    ‣ Nothing special. Nothing outrageous. Minimal extension to conventional NPU.
    [Block diagram: the same accelerator, with the augmented PE shown in detail: multiply input i by weight w, accumulate into psum, compare against a threshold (thd), and emit a 0/1 mask bit via a MUX depending on the mode; psums and masks are written to SRAM.]

  54. Hardware Support
    23
    ‣ Nothing special. Nothing outrageous. Minimal extension to conventional NPU.
    [Block diagram: the same accelerator, with the Path Constructor shown in detail: sort units feeding a merge unit, backed by SRAM, alongside the augmented PE from the previous slide.]

  55. Mapping an Algorithmic Variant to Hardware
    24
    ‣ Key: statically schedule the execution.


  56. Mapping an Algorithmic Variant to Hardware
    24
    ‣ Key: statically schedule the execution.
    Forward extraction
    for j = 1 to L {
    inf(j)

    }


  57. Mapping an Algorithmic Variant to Hardware
    24
    ‣ Key: statically schedule the execution.
    Forward extraction
    for j = 1 to L {
    inf(j)

    }
    Software
    pipelining
    inf(1)
    for j = 1 to L-1 {
    inf(j+1)

    }


  58. Mapping an Algorithmic Variant to Hardware
    24
    ‣ Key: statically schedule the execution.
    Forward extraction
    for j = 1 to L {
    inf(j)

    }
    Software
    pipelining
    inf(1)
    for j = 1 to L-1 {
    inf(j+1)

    }

    Cumulative thresholding
    for i = 1 to N {
    ldpsum(i)
    sort(i)
    acum(i)
    }


  59. Mapping an Algorithmic Variant to Hardware
    24
    ‣ Key: statically schedule the execution.
    Forward extraction
    for j = 1 to L {
    inf(j)

    }
    Software
    pipelining
    inf(1)
    for j = 1 to L-1 {
    inf(j+1)

    }

    Cumulative thresholding
    for i = 1 to N {
    ldpsum(i)
    sort(i)
    acum(i)
    }
    Software
    pipelining
    ldpsum(1)
    sort(1)
    for i = 1 to N-1 {
    ldpsum(i+1)
    sort(i+1)
    acum(i)
    }
    acum(N)


  60. Mapping an Algorithmic Variant to Hardware
    24
    ‣ Key: statically schedule the execution.
    Forward extraction
    for j = 1 to L {
    inf(j)

    }
    [Figure: important-neuron extraction in a fully-connected layer and in a convolution layer.]
    Fully-connected layer: 0.46 = 0.1 × 2.1 + 1.0 × 0.09 + 0.4 × 0.2 + 0.3 × 0.2 + 0.2 × 0.1;
    0.1 × 2.1 + 1.0 × 0.09 > 0.6 × 0.46, assuming θ = 0.6.
    Important neurons identified in the current layer: 1.0, 0.1; important neuron in the OFMap (identified before): 0.46.
    Convolution layer: 5.47 = 2.0 × 0.7 + 1.4 × 0.9 + 1.5 × 0.8 + 1.0 × 0.9 + …;
    2.0 × 0.7 + 1.4 × 0.9 + 1.5 × 0.8 > 0.6 × 5.47, assuming θ = 0.6.
    Important neurons identified in the current layer: 2.0, 1.4, 1.5; important neuron in the OFMap (identified before): 5.47.
    Recompute rather than store the partial sums
    (only 5% of the partial sums are later used).
    Software
    pipelining
    inf(1)
    for j = 1 to L-1 {
    inf(j+1)

    }

    Cumulative thresholding
    for i = 1 to N {
    ldpsum(i)
    sort(i)
    acum(i)
    }
    Software
    pipelining
    ldpsum(1)
    sort(1)
    for i = 1 to N-1 {
    ldpsum(i+1)
    sort(i+1)
    acum(i)
    }
    acum(N)


  61. Mapping an Algorithmic Variant to Hardware
    24
    ‣ Key: statically schedule the execution.
    Forward extraction
    for j = 1 to L {
    inf(j)

    }
    [Figure: the same fully-connected and convolution layer extraction examples as on the previous slide.]
    Recompute rather than store the partial sums
    (only 5% of the partial sums are later used).
    Software
    pipelining
    inf(1)
    for j = 1 to L-1 {
    inf(j+1)

    }

    Cumulative thresholding
    for i = 1 to N {
    ldpsum(i)
    sort(i)
    acum(i)
    }
    Software
    pipelining
    ldpsum(1)
    sort(1)
    for i = 1 to N-1 {
    ldpsum(i+1)
    sort(i+1)
    acum(i)
    }
    acum(N)
    Recompute
    csps(1)
    sort(1)
    for i = 1 to N-1 {
    csps(i+1)
    sort(i+1)
    acum(i)
    }
    acum(N)

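    A sketch of the compute-memory trade-off noted above: rather than storing every partial sum during inference, recompute partial sums only for output neurons already known to be important (the slide reports that only about 5% of the partial sums are ever used). The names and shapes here are illustrative.

    import numpy as np

    def recompute_psums(ifmap, weights, important_outputs):
        # Recompute the per-input partial sums only for the important output
        # neurons of a fully-connected layer, instead of storing them all.
        # ifmap: input activations; weights: [num_outputs, num_inputs] matrix.
        return {o: ifmap * weights[o] for o in important_outputs}

    ifmap = np.array([0.1, 1.0, 0.4, 0.3, 0.2])
    weights = np.random.rand(8, 5)                 # 8 output neurons, 5 inputs
    psums = recompute_psums(ifmap, weights, important_outputs=[2, 5])
    print(sorted(psums))                           # partial sums kept only for neurons 2 and 5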

  62. Putting It Together
    25
    Algorithm Framework
    ‣ Algorithm knobs: selective extraction, extraction direction, thresholding mechanism (used to reduce or hide cost).
    ‣ Offline profiling & extraction: DNN models and legitimate training samples → canary class paths.
    ‣ Online: inference, extraction, and classification.
    Programming Interface
    output = Inference()
    foreach Layer
    ExtractImptNeurons()
    GenMask()
    return Classify()
    Compiler Optimizations
    ✓ Layer-Level Pipelining
    ✓ Neuron-Level Pipelining
    ✓ Comp.-Mem. Trade-off
    ISA
    .set rfsize 0x200
    mov r3, rfsize
    findrf r4, r1
    sort r1, r3, r6
    acum r6, r1, r5
    Hardware Architecture
    ‣ Augmented DNN accelerator, programmable path extractor, memory.
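    The programming interface above (Inference, per-layer ExtractImptNeurons and GenMask, then Classify), spelled out as a hedged end-to-end Python sketch for a small stack of fully-connected ReLU layers. Every name here is an illustrative stand-in for the framework's corresponding stage, not its actual API; class_paths is assumed to hold one boolean mask per layer for each class, and detector is a fitted classifier such as the random forest sketched earlier.

    import numpy as np

    def extract_layer(prev_acts, w_rows, important, theta):
        # Backward extraction for one FC layer: for each important output
        # neuron, pick the minimal set of inputs covering theta of its value.
        mask = np.zeros(prev_acts.size, dtype=bool)
        for o in important:
            psums = prev_acts * w_rows[o]
            order = np.argsort(psums)[::-1]
            covered, target = 0.0, theta * psums.sum()
            for i in order:
                mask[i] = True
                covered += psums[i]
                if covered >= target:
                    break
        return mask

    def detect(x, weights, class_paths, detector, theta=0.5):
        # output = Inference()
        acts = [x]
        for w in weights:
            acts.append(np.maximum(w @ acts[-1], 0.0))
        pred = int(np.argmax(acts[-1]))

        # foreach layer (backward): ExtractImptNeurons(); GenMask()
        important, sims = {pred}, []
        for l in range(len(weights) - 1, -1, -1):
            mask = extract_layer(acts[l], weights[l], important, theta)
            sims.append((mask & class_paths[pred][l]).sum() / max(mask.sum(), 1))
            important = set(np.flatnonzero(mask))

        # return Classify(): the prediction plus an adversarial flag
        return pred, bool(detector.predict([sims])[0])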

  63. Evaluation Setup
    26
    ‣ Hardware: a cycle-level simulator parameterized with synthesis results from
    RTL implementation (Silvaco’s Open-Cell 15nm).
    ‣ Baselines: EP [CVPR 2019], CDRP [CVPR 2018], DeepFense [ICCAD 2018]
    ‣ Dataset: ImageNet, CIFAR-100
    ‣ Network: ResNet18, AlexNet, VGG
    ‣ Attacks:
    ▹ BIM, CWL2, DeepFool, FGSM, and JSMA, which comprehensively cover all three types of
    input perturbation measures (L0, L2, and L∞).
    ▹ Adaptive attacks, which are specifically designed to defeat our detection mechanisms.
    https://github.com/ptolemy-dl/ptolemy


  64. Evaluation Setup
    27
    ‣ Variants:
    ▹ BwCu: Backward + cumulative thresholding
    ▹ BwAb: Backward + absolute thresholding
    ▹ FwAb: Forward + absolute thresholding
    ▹ Hybrid: BwAb + BwCu
    [Diagram: the algorithm design space from before (extraction direction, thresholding mechanism, number of extracted layers).]


  65. Detection Accuracy
    28
    [Charts: detection accuracy of BwCu, BwAb, FwAb, Hybrid, EP, and CDRP on AlexNet @ ImageNet (roughly 0.75 to 1.00) and ResNet18 @ CIFAR-100 (roughly 0.80 to 0.95).]

  66. Detection Overhead
    29
    [Charts: latency and energy overhead (log scale) of BwCu, BwAb, FwAb, Hybrid, and EP on the two networks.]
    ‣ CDRP requires re-training, unsuitable for inference-time attack detection.

  67. Compare with DeepFense
    30
    [Charts: detection accuracy and latency/energy overhead of the Ptolemy variants (BwCu, BwAb, FwAb, Hybrid) versus DeepFense configurations (DFL, DFM, DFH).]
    ‣ DeepFense uses modular redundancy to defend against adversarial samples.
    ‣ Ptolemy is both more accurate and faster.

  68. Adaptive Attacks
    31
    ‣ If attackers know our defense mechanism, what can they do?
    [Diagram: an input x with true class c, and a perturbed input xa = x + Δx whose path resembles that of a different input xt with true class t.]

  69. Adaptive Attacks
    31
    ‣ If attackers know our defense mechanism, what can they do?
    ‣ Given an input x that has a true class c, add minimal amount of perturbation
    Δx to generate an adversarial input xa such that the path of xa resembles a
    totally different input xt whose true class is t (!= c).
    [Diagram: an input x with true class c, and a perturbed input xa = x + Δx whose path resembles that of a different input xt with true class t.]

  70. Adaptive Attacks
    31
    ‣ If attackers know our defense mechanism, what can they do?
    ‣ Given an input x that has a true class c, add minimal amount of perturbation
    Δx to generate an adversarial input xa such that the path of xa resembles a
    totally different input xt whose true class is t (!= c).
    [Diagram: the same adaptive-attack setup.]
    Loss function when generating adversarial samples: ∑i ‖zi(x + δx) − zi(xt)‖₂²

  71. Adaptive Attacks
    31
    ‣ If attackers know our defense mechanism, what can they do?
    ‣ Given an input x that has a true class c, add minimal amount of perturbation
    Δx to generate an adversarial input xa such that the path of xa resembles a
    totally different input xt whose true class is t (!= c).
    ‣ Average Δx is 0.007 (in MSE): xa still looks like x.
    [Diagram: the same adaptive-attack setup.]
    Loss function when generating adversarial samples: ∑i ‖zi(x + δx) − zi(xt)‖₂²
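    A hedged PyTorch-style sketch of the adaptive-attack objective above: gradient descent on the perturbation δ so that the internal activations z(x + δ) resemble z(xt). The model is assumed to return a list of per-layer activations; the optimizer, step count, and learning rate are illustrative, and the paper's actual attack procedure may differ.

    import torch

    def adaptive_attack(model, x, x_target, steps=200, lr=1e-2):
        # model(input) is assumed to return a list of per-layer activations z_i.
        delta = torch.zeros_like(x, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        with torch.no_grad():
            z_t = [z.detach() for z in model(x_target)]
        for _ in range(steps):
            opt.zero_grad()
            z = model(x + delta)
            # sum_i || z_i(x + delta) - z_i(x_t) ||_2^2
            loss = sum(((zi - ti) ** 2).sum() for zi, ti in zip(z, z_t))
            # (a norm penalty or projection on delta, omitted here, would keep
            #  the perturbation small, as the slide's 0.007 MSE suggests)
            loss.backward()
            opt.step()
        return (x + delta).detach()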

  72. Adaptive Attacks
    32
    ‣ 100% attack success rate without our defense.
    ‣ Adaptive attacks are more effective than non-adaptive attacks.
    ‣ When an adversarial input resembles the activations in more layers, it has a
    better chance of fooling our defense.
    [Chart: detection accuracy (0.0 to 1.0) of BwCu and FwAb under adaptive attacks (AT1, AT2, AT3, AT8) and non-adaptive attacks (BIM, CWL2, DeepFool, FGSM, JSMA).]

  73. Early Termination (for Backward Extraction)
    33
    [Charts: normalized latency, normalized energy, and detection accuracy versus termination layer (8 down to 1); terminating earlier means extracting fewer layers.]
    ‣ Terminating earlier lowers overhead but also lowers accuracy.
    ‣ Accuracy plateaus beyond layer 6 (i.e., extracting 3 layers only) but overhead
    still keeps increasing.
    AlexNet (8 layers in total), BwCu

  74. Late Start (for Forward Extraction)
    34
    ‣ Starting earlier increases accuracy and energy overhead.
    ‣ Not much impact on latency, which is hidden anyway.
    AlexNet (8 layers in total), FwAb
    [Charts: detection accuracy, normalized latency, and normalized energy versus start layer (8 down to 1); starting later means extracting fewer layers.]

  75. Optics, Sensor, Computing Co-Defense
    35
    [Diagram: image → vision algorithm.]
    https://thesmartphonephotographer.com/phone-camera-sensor/

  76. Optics, Sensor, Computing Co-Defense
    35
    [Diagram: light → optics → sensor → image → vision algorithm.]
    https://thesmartphonephotographer.com/phone-camera-sensor/

  77. Optics, Sensor, Computing Co-Defense
    35
    [Diagram: light → optics (lens model, phase profile) → sensor (spectral sensitivity function, noise and quantization models) → image → vision algorithm (network, layer weights, sparsity, bit width, loss).]
    https://thesmartphonephotographer.com/phone-camera-sensor/

  78. Optics, Sensor, Computing Co-Defense
    35
    [Diagram: the same light → optics → sensor → vision-algorithm stack with its design knobs.]
    https://thesmartphonephotographer.com/phone-camera-sensor/
    How to co-design optics, image sensor, and DNN to improve the system robustness?

  79. “Adversarial Attacks” in Neural Scientific Computing
    36
    https://blog.yiningkarlli.com/2015/06/attenuated-transmission.html
    Physics simulation Detecting extreme weather
    https://www.slideshare.net/SAMSI_Info/program-on-mathematical-and-statistical-methods-for-climate-and-the-earth-system-deep-learning-for-extreme-weather-detection-prabhat-aug-23-2017
    https://www.osti.gov/servlets/purl/1471083


  80. “Adversarial Attacks” in Neural Scientific Computing
    ‣ Numerical stability cast as an adversarial robustness problem.
    36
    https://blog.yiningkarlli.com/2015/06/attenuated-transmission.html
    Physics simulation Detecting extreme weather
    https://www.slideshare.net/SAMSI_Info/program-on-mathematical-and-statistical-methods-for-climate-and-the-earth-system-deep-learning-for-extreme-weather-detection-prabhat-aug-23-2017
    https://www.osti.gov/servlets/purl/1471083


  81. “Adversarial Attacks” in Neural Scientific Computing
    ‣ Numerical stability cast as an adversarial robustness problem.
    ▹ Can DNNs improve the numerical stability of scientific computing algorithms?
    36
    https://blog.yiningkarlli.com/2015/06/attenuated-transmission.html
    Physics simulation Detecting extreme weather
    https://www.slideshare.net/SAMSI_Info/program-on-mathematical-and-statistical-methods-for-climate-and-the-earth-system-deep-learning-for-extreme-weather-detection-prabhat-aug-23-2017
    https://www.osti.gov/servlets/purl/1471083


  82. “Adversarial Attacks” in Neural Scientific Computing
    ‣ Numerical stability cast as an adversarial robustness problem.
    ▹ Can DNNs improve the numerical stability of scientific computing algorithms?
    ▹ Will DNNs introduce new robustness issues to scientific computing?
    36
    https://blog.yiningkarlli.com/2015/06/attenuated-transmission.html
    Physics simulation Detecting extreme weather
    https://www.slideshare.net/SAMSI_Info/program-on-mathematical-and-statistical-methods-for-climate-and-the-earth-system-deep-learning-for-extreme-weather-detection-prabhat-aug-23-2017
    https://www.osti.gov/servlets/purl/1471083


  83. Connections Between Software 1.0 and Software 2.0
    37
    Software 1.0: Explicit
    instructions with explicit logic.
    Software 2.0: Neural networks
    as self-written programs.
    [Diagram: connections between the two worlds: program paths/traces relate to program optimizations and adversarial defense; over-parameterization relates to approximate computing and model compression; modular redundancy relates to ensembles and fault tolerance; more connections remain to be found (???).]

  84. Summary
    ‣ Robustness is a major roadblock & under-explored in our community.
    ‣ Ptolemy defends DNNs from adversarial attacks by leveraging dynamic program
    paths, a critical connection between Software 1.0 and Software 2.0.
    ‣ Ptolemy is not a single algorithm; it’s an algorithmic framework.
    ‣ Overhead can be very low with the right choice of algorithm variants and
    static instruction scheduling.
    ‣ Hardware extension is minimal and principled.
    38


  85. Architecture Support for Robust
    Deep Learning: Exploiting Software 1.0
    Techniques to Defend Software 2.0
    Qiuyue Sun Sam Triest
    Yu Feng
    Jingwen Leng Amir Taherin Yawo Siatitse
    http://horizon-lab.org
    Yuxian Qiu
    Yiming Gan


  86. Path Similarities
    40
    ‣ Activation paths of benign inputs are similar to the class path.
    ‣ Activation paths of adversarial inputs are different from the class path.
    [Figure: normal example and perturbations from different attacks, AlexNet on ImageNet; perturbations enhanced by 100× to highlight the differences.]
    Sl = |P(x)l ∩ Pcl| / |P(x)l|
    P(x)l: activation path for input x at layer l.
    Pcl: class path at layer l.

  87. Adaptive Attacks
    41
    ‣ Detection accuracy doesn’t change much as more perturbation (Δx) is added,
    likely because the perturbation is very small — a desirable property.
    ‣ Ptolemy is not more vulnerable when the attacker simply targets a similar class
    when generating the attacks.
    [Charts: detection accuracy versus distortion/perturbation (0 to 35, ×10⁻³ MSE) and detection accuracy versus path similarity of the targeted class (0.00 to 0.30).]

  88. Choice of the Final Classifier
    42
    [Figures from the paper: ROC for AlexNet on ImageNet with weight-based joint similarity, and detection-accuracy comparisons under different final classifiers; the random forest clearly outperforms a linear model.]
    ‣ Random forest works the best while having low overhead.
    ‣ C.f., training a dedicated DNN to detect adversarial inputs.