Slide 1

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

Slide 2

Contents ・Abstract ・Introduction ・Related Work ・Approach ・Evaluation ・Appendix

Slide 3

Abstract
・"Visual explanations" make CNNs more transparent
・Grad-CAM is applicable to a wide variety of CNN model families:
 (1) CNNs with FC layers (e.g. VGG)
 (2) CNNs used for structured outputs (e.g. captioning)
 (3) CNNs used in tasks with multi-modal inputs (e.g. VQA)
・We do not have to change the architecture of the model
・We do not have to re-train the model
・In the context of image classification models, visualizations by Grad-CAM
 (a) show that seemingly unreasonable predictions have reasonable explanations
 (b) help achieve model generalization by identifying dataset bias

Slide 4

Introduction
・DNNs lack decomposability into intuitive and understandable components
・Transparency of models is important to build trust in intelligent systems: they should explain why they predict what they predict
・Transparency is useful at three different stages of AI evolution:
 1. When AI is significantly weaker than humans, the goal is to identify failure modes, helping researchers focus their efforts
 2. When AI is on par with humans, the goal is to establish appropriate trust and confidence in users
 3. When AI is significantly stronger than humans (e.g. chess or Go), the goal of explanations is machine teaching

Slide 5

What makes a good visual explanation?
(1) Class-discriminative (i.e. localize the target category in the image)
(2) High-resolution (i.e. capture fine-grained detail)
[Figure: Guided Backpropagation satisfies (2) but not (1), Grad-CAM satisfies (1) but not (2), Guided Grad-CAM satisfies both (1) and (2); panels (b) and (h) are similar, panel (d) focuses on the stripes]

Slide 6

Related Work
Class Activation Mapping (CAM)
FC layer → Conv layer + Global Average Pooling + Softmax
☹ Accuracy decreases
☹ It only works for image classification
☹ The architecture has to be changed
There typically exists a trade-off between accuracy and simplicity or interpretability
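A minimal sketch of how a CAM heat map could be formed under this architecture: the weights of the FC layer that follows global average pooling act as per-class channel weights. The function and variable names are illustrative assumptions, not the authors' code.

```python
import torch

def cam_map(feature_maps: torch.Tensor, fc_weights: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Class Activation Mapping (sketch): weight the last conv feature maps
    by the target class's weights in the FC layer that follows global average pooling."""
    # feature_maps: (K, u, v) output of the last conv layer
    # fc_weights:   (num_classes, K) weights of the single FC layer after GAP
    w = fc_weights[class_idx]                             # (K,) per-channel weights for the class
    return (w[:, None, None] * feature_maps).sum(dim=0)   # (u, v) coarse class activation map
```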

Slide 7

Approach
Deeper representations in a CNN capture higher-level visual constructs.
Convolutional features naturally retain spatial information, which is lost in fully-connected layers.
The last convolutional layers therefore have both high-level semantics and detailed spatial information.
The neurons in these layers look for semantic class-specific information.
To understand the importance of each neuron for a decision of interest, Grad-CAM uses the gradient information flowing into the last convolutional layer.

Slide 8

Approach
To understand the importance of each neuron for a decision of interest, Grad-CAM uses the gradient information flowing into the last convolutional layer.

$$\alpha_k^c \;=\; \underbrace{\frac{1}{Z}\sum_i\sum_j}_{\text{global average pooling}} \; \underbrace{\frac{\partial y^c}{\partial A_{ij}^k}}_{\text{gradients via backprop}}$$

$$L^c_{\text{Grad-CAM}} \;=\; \mathrm{ReLU}\!\left(\sum_k \alpha_k^c A^k\right)$$

A^k ∈ ℝ^{u×v}: feature maps of the last conv. layer
k: channel index
c: class index
y^c: score for class c before softmax
α_k^c: importance of feature map k for class c
L^c_{Grad-CAM}: coarse heat map
ReLU: we are only interested in the features that have a positive influence
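A minimal PyTorch sketch of these equations, assuming a torchvision VGG-16 and a preprocessed input batch x; the hooks, layer choice, and function name are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, x, class_idx, target_layer):
    """Coarse Grad-CAM heat map for one image and one class (sketch)."""
    activations, gradients = [], []
    # Hooks capture A^k and dy^c/dA^k at the chosen convolutional layer.
    h1 = target_layer.register_forward_hook(lambda m, i, o: activations.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

    scores = model(x)                   # y^c: class scores before softmax
    scores[0, class_idx].backward()     # gradients flow back into the conv layer
    h1.remove(); h2.remove()

    A = activations[0][0]               # (K, u, v) feature maps
    dY = gradients[0][0]                # (K, u, v) gradients of y^c w.r.t. A^k
    alpha = dY.mean(dim=(1, 2))         # global average pooling -> alpha_k^c
    cam = F.relu((alpha[:, None, None] * A).sum(dim=0))   # keep only positive influence
    return cam / (cam.max() + 1e-8)     # normalized coarse heat map

# Example usage (illustrative): last conv layer of VGG-16's feature extractor.
model = models.vgg16(weights="IMAGENET1K_V1").eval()
# x = <preprocessed image tensor of shape (1, 3, 224, 224)>
# heat_map = grad_cam(model, x, class_idx=281, target_layer=model.features[28])
```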

Slide 9

Evaluating Localization
The ImageNet localization challenge: images and class labels are given, bounding boxes must be estimated.
[51] requires a change in the model architecture, necessitates re-training, and achieves worse classification error.
Given an image
↓ Obtain class predictions from our network
↓ Generate Grad-CAM maps for each of the predicted classes
↓ Binarize with a threshold of 15% of the max intensity
↓ Draw a bounding box around the single largest segment
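A rough sketch of the last two steps of this pipeline, assuming cam is the Grad-CAM map already resized to the input resolution; scipy's connected-component labeling stands in for whatever the authors used, and the helper name is hypothetical.

```python
import numpy as np
from scipy import ndimage

def cam_to_bbox(cam: np.ndarray, threshold: float = 0.15):
    """Binarize a Grad-CAM map and box the single largest segment (sketch)."""
    mask = cam >= threshold * cam.max()            # binarize at 15% of the max intensity
    labels, n = ndimage.label(mask)                # connected components
    if n == 0:
        return None
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    largest = labels == (np.argmax(sizes) + 1)     # keep the single largest segment
    ys, xs = np.where(largest)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())  # (x1, y1, x2, y2)
```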

Slide 10

Evaluating Visualization
Rating scale: +2 / +1 / 0 / −1 / −2 (n = 52)
Guided Backpropagation: 1.0
Guided Grad-CAM: 1.27

Slide 11

Analyzing Failure Modes for VGG-16

Slide 12

Effect of adversarial noise on VGG-16

$$\alpha_k^c = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A_{ij}^k}, \qquad L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left(\sum_k \alpha_k^c A^k\right)$$

Slide 13

Identifying bias in dataset
Models trained on biased datasets may perpetuate biases and stereotypes (gender, race, age).
Biased training set (VGG-16):
 Doctor: 195 men, 55 women
 Nurse: 25 men, 225 women
This model achieves good accuracy on validation images, but at test time it did not generalize as well (82%).

Slide 14

Identifying bias in dataset
The biased model had learned to look at the person's face and hairstyle to distinguish nurses from doctors ☹ learning a gender stereotype
The unbiased model made the right prediction by looking at the white coat and the stethoscope, and generalized well.
Grad-CAM can help detect and remove biases in datasets → fair and ethical models

Slide 15

Supplementary material 1: Guided Grad-CAM on different layers

Slide 16

Supplementary material 2: Guided Grad-CAM on VQA