Slide 1

Attribute: Neural Network Interpretability in PyTorch and TensorFlow
Plaksha Tech Leaders Fellowship Capstone Project
Praveen Sridhar
Mentored by Nikhil Narayan, Lead Data Scientist, mfine.co

Slide 2

Introduction
Predictions of Deep Learning models are difficult to explain and interpret. Regulated industries such as medicine and finance often hold back from using deep learning techniques for this reason. Techniques for interpretability help users to:
● Trust and understand why predictions are made the way they are
● Hold models accountable when they are used for important decision making

Slide 3

Interpretability in the Medical Industry [Mukundhan et al. Google Research]

Slide 4

Objective
1. Research and implement algorithms for Neural Network Interpretability in the domain of Computer Vision

Slide 5

Literature Review

Slide 6

Pixel-space Attribution [Srinivas et al., NeurIPS 2019]

Slide 7

Saliency Heatmaps: GradCAM, FullGrad, Integrated Gradients, Original Image [Srinivas et al., NeurIPS 2019]

Slide 8

Neuron Visualization: Gradient Ascent, Integrated Gradients, SUMMIT [Hohman et al.]

Slide 9

Implementation

Slide 10

Attribute
A Python library for Neural Network Interpretability in PyTorch and TensorFlow

Slide 11

Design
● Unified API across the different types of techniques (a hypothetical usage sketch follows below)
● Benchmarking for comparing techniques
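The unified-API goal can be illustrated with a hypothetical usage sketch. The import path, class names, and attribute() call below are assumptions for illustration only, not the documented interface of the Attribute library:

import torch
from torchvision.models import resnet18
from attribute import GradCAM, IntegratedGradients  # hypothetical entry points

model = resnet18(pretrained=True).eval()
image = torch.randn(1, 3, 224, 224)  # stand-in input image
for Method in (GradCAM, IntegratedGradients):
    # The same call signature regardless of the underlying technique
    heatmap = Method(model).attribute(image, target=0)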

Slide 12

Details of the Algorithms Implemented

Slide 13

Gradient Attribution
The gradient of the output with respect to the input image is taken as the attribution map.
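A minimal PyTorch sketch of this technique (function and variable names are illustrative, not the Attribute API):

import torch

def gradient_attribution(model, image, class_index):
    # image: input tensor of shape (1, C, H, W)
    image = image.detach().clone().requires_grad_(True)
    score = model(image)[0, class_index]  # scalar class score
    score.backward()                      # populates image.grad
    return image.grad.detach()            # gradient map used as the attribution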

Slide 14

Smooth Gradients
A smoothed version of gradient attribution: random noise is added to the input image several times, and the resulting gradients are averaged to get the final attribution.
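A minimal sketch under the same assumptions (the sample count and noise level are typical defaults, not values from the slides):

import torch

def smooth_grad(model, image, class_index, n_samples=25, sigma=0.15):
    # Average the gradient attributions of several noisy copies of the input
    total = torch.zeros_like(image)
    for _ in range(n_samples):
        noisy = (image.detach() + sigma * torch.randn_like(image)).requires_grad_(True)
        model(noisy)[0, class_index].backward()
        total += noisy.grad
    return total / n_samples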

Slide 15

Integrated Gradients
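Integrated Gradients accumulates gradients along a straight-line path from a baseline to the input. For reference, the published definition (Sundararajan et al., 2017) of the attribution for input x, baseline x', and network output F along dimension i is:

IG_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha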

Slide 16

Integrated Gradients (continued)
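A minimal PyTorch sketch of the standard Riemann-sum approximation of the integral (the black-image baseline and step count are common defaults, not values from the slides):

import torch

def integrated_gradients(model, image, class_index, steps=50):
    x = image.detach()
    baseline = torch.zeros_like(x)  # black-image baseline
    total = torch.zeros_like(x)
    for k in range(1, steps + 1):
        # Point on the straight-line path from baseline to input
        point = (baseline + (k / steps) * (x - baseline)).requires_grad_(True)
        model(point)[0, class_index].backward()
        total += point.grad
    return (x - baseline) * total / steps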

Slide 17

GradCAM
A generalization of CAM (Class Activation Maps) to arbitrary architectures. CAM works only for networks whose last layer is a Global Average Pooling layer:

Alpha = Weights[:, class_index]            # (512,)
FeatureMaps = getLastConvLayer()           # (7, 7, 512)
CAM = (Alpha * FeatureMaps).sum(axis=-1)   # (7, 7), weighted sum over channels

The map is then upsampled to the original image size and overlaid. In GradCAM, the equivalent of global average pooling is performed on the gradients of the output with respect to the feature maps A_ij.
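A minimal PyTorch sketch of GradCAM using hooks (target_layer would be the last convolutional layer; names are illustrative, not the Attribute API):

import torch
import torch.nn.functional as F

def grad_cam(model, image, class_index, target_layer):
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    model(image)[0, class_index].backward()
    h1.remove(); h2.remove()
    A, dA = acts[0], grads[0]                  # both (1, K, h, w)
    alpha = dA.mean(dim=(2, 3), keepdim=True)  # global average pooling of the gradients
    cam = F.relu((alpha * A).sum(dim=1, keepdim=True))
    # Upsample to the input resolution for overlaying on the image
    return F.interpolate(cam, size=image.shape[2:], mode='bilinear', align_corners=False)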

Slide 18

FullGrad
FullGrad saliency uses a decomposition of the neural network output into input-sensitivity and per-neuron bias-sensitivity components to give an exact representation of the attribution.
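As a sketch of the underlying result (Srinivas & Fleuret, NeurIPS 2019): for a ReLU network f with implicit biases, the output decomposes exactly into an input-gradient term and per-neuron bias terms, and the saliency map aggregates both through a post-processing operator \psi (e.g. absolute value and rescaling):

f(x) = \nabla_x f(x)^\top x + \sum_{l}\sum_{c} f^{b}(x)_{lc}

S(x) = \psi\big(\nabla_x f(x) \odot x\big) + \sum_{l}\sum_{c} \psi\big(f^{b}(x)_{lc}\big)

Here f^{b}(x)_{lc} is the bias contribution of channel c in layer l.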

Slide 19

FullGrad
The official implementation requires modifying the network definition to expose the intermediate gradients and biases.

Slide 20

FullGrad
In the Attribute library, the implementation is flexible: it automatically extracts the required intermediate gradients and bias values using internal PyTorch APIs.
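A minimal sketch of how such extraction can be done with PyTorch hooks (this is an assumption about the mechanism, not the exact Attribute internals):

import torch
import torch.nn as nn

def collect_fullgrad_pieces(model, image, class_index):
    grads, handles = [], []
    # Hook every layer that carries a bias to capture the gradient at its output
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear, nn.BatchNorm2d)) and module.bias is not None:
            handles.append(module.register_full_backward_hook(
                lambda m, gin, gout: grads.append(gout[0].detach())))
    image = image.detach().clone().requires_grad_(True)
    model(image)[0, class_index].backward()
    for h in handles:
        h.remove()
    # Returns the input-gradient term plus per-layer output gradients; the
    # FullGrad bias term of a layer is its output gradient times its bias,
    # broadcast over the spatial dimensions
    return image.grad.detach(), grads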

Slide 21

Sample Outputs

Slide 22

Sample Outputs: GradCAM

Slide 23

Benchmarks

Slide 24

Conclusion
The required Neural Network Interpretability library was implemented in both PyTorch and TensorFlow, supporting the following algorithms:
● Gradient Attribution
● Integrated Gradients
● Smoothed Gradients
● GradCAM
● FullGrad
The library features a consistent API across the different techniques, as well as a benchmarking utility.

Slide 25

Future Scope
The project can be expanded with:
● More techniques for interpretability, especially neuron visualisation
● Support for other modalities such as text and speech
A possible line of research was found while testing the FullGrad technique: the heat maps it produced were not very class-discriminative. Combining ideas from GradCAM could potentially solve this.

Slide 26

Thank you