
Attribute - Python library for Neural Network Interpretability


Final capstone presentation for Attribute - a python library for easy to use methods to interpret a TensorFlow or Pytorch based Neural Network.

Praveen Sridhar

May 15, 2020



Transcript

  1. Attribute: Neural Network Interpretability in PyTorch and TensorFlow
     Plaksha Tech Leaders Fellowship Capstone Project
     Praveen Sridhar, mentored by Nikhil Narayan, Lead Data Scientist, mfine.co
  2. Introduction
     Predictions of deep learning models are difficult to explain and interpret. Regulated industries such as medicine and finance often refrain from using deep learning techniques for this reason. Interpretability techniques help users to:
     • Trust and understand why predictions are made in a certain way
     • Provide accountability when models are used for important decisions
  3. Gradient Attribution
     The gradient of the output with respect to the input image is taken as the attribution map.
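The technique on this slide can be sketched in a few lines of PyTorch. This is a minimal illustration, not the attribute library's API; the tiny linear model and shapes are stand-ins for a real CNN.

```python
import torch
import torch.nn as nn

# Hypothetical tiny classifier standing in for a real image network.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
model.eval()

def gradient_attribution(model, image, target_class):
    """Gradient of the target-class logit with respect to the input image."""
    image = image.clone().requires_grad_(True)
    logits = model(image)
    logits[0, target_class].backward()
    return image.grad.detach()  # same shape as the input image

image = torch.rand(1, 3, 8, 8)
saliency = gradient_attribution(model, image, target_class=3)
```

The absolute value (or max over channels) of `saliency` is what is usually visualised as the attribution map.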
  4. Smooth Gradients
     A smoothed version of gradient attribution: random noise is added to the input image, and the gradients over many noisy copies are averaged to get the final attribution.
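The averaging step described above can be sketched as follows; again a simplified illustration under the same stand-in model assumption, not the library's implementation. `n_samples` and `noise_std` are the two knobs the SmoothGrad paper exposes.

```python
import torch
import torch.nn as nn

# Stand-in model; replace with any differentiable classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
model.eval()

def smooth_grad(model, image, target_class, n_samples=25, noise_std=0.1):
    """Average the input gradient over several noisy copies of the image."""
    total = torch.zeros_like(image)
    for _ in range(n_samples):
        noisy = (image + noise_std * torch.randn_like(image)).requires_grad_(True)
        logits = model(noisy)
        logits[0, target_class].backward()
        total += noisy.grad.detach()
    return total / n_samples

image = torch.rand(1, 3, 8, 8)
attribution = smooth_grad(model, image, target_class=0)
```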
  5. GradCAM
     A generalization of CAM (Class Activation Maps) to arbitrary architectures. CAM works only for networks whose last layer is a Global Average Pooling layer:

       Alpha = Weights[:, class_index]        # (512,)
       FeatureMaps = getLastConvLayer()       # (7, 7, 512)
       CAM = sum(Alpha * FeatureMaps, axis=-1)  # (7, 7), summed over channels
       # Upsample to the original image size and overlay

     In GradCAM, the equivalent of global average pooling is performed on the gradients of the output with respect to the feature maps A_ij.
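The gradient-pooling step can be sketched like this. The `TinyCNN` below is a hypothetical minimal network (one conv layer playing the role of the last convolutional block); the pooling of gradients, weighted sum, ReLU, and upsampling follow the GradCAM recipe described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    """Hypothetical minimal CNN exposing its last conv feature maps."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, 10)

    def forward(self, x):
        fmap = F.relu(self.conv(x))              # last conv feature maps
        logits = self.fc(self.pool(fmap).flatten(1))
        return logits, fmap

def grad_cam(model, image, target_class):
    logits, fmap = model(image)
    fmap.retain_grad()                            # keep grad on non-leaf tensor
    logits[0, target_class].backward()
    alpha = fmap.grad.mean(dim=(2, 3), keepdim=True)  # GAP over gradients
    cam = F.relu((alpha * fmap).sum(dim=1, keepdim=True))
    # Upsample to the input resolution for overlaying on the image.
    return F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                         align_corners=False)

model = TinyCNN().eval()
image = torch.rand(1, 3, 16, 16)
cam = grad_cam(model, image, target_class=2)
```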
  6. FullGrad
     FullGrad saliency decomposes the neural network into input-sensitivity and neuron-sensitivity components to give an exact representation of the attribution.
  7. FullGrad
     In the attribute library, the implementation is flexible and automatically extracts the required intermediate gradients and bias values using internal PyTorch APIs.
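A much-simplified sketch of the two kinds of quantities FullGrad aggregates (this is not the attribute library's internals, which use hooks to handle arbitrary architectures): the input-sensitivity term, and a bias-times-bias-gradient term per biased layer.

```python
import torch
import torch.nn as nn

# Stand-in model: two biased Linear layers around a ReLU.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
model.eval()

def full_grad_parts(model, x, target_class):
    """Collect the components FullGrad aggregates into a saliency map."""
    x = x.clone().requires_grad_(True)
    model.zero_grad()
    out = model(x)
    out[0, target_class].backward()
    input_term = (x.grad * x).detach()            # input-sensitivity term
    bias_terms = {name: (m.bias * m.bias.grad).detach()
                  for name, m in model.named_modules()
                  if getattr(m, "bias", None) is not None}
    return input_term, bias_terms

x = torch.rand(1, 4)
input_term, bias_terms = full_grad_parts(model, x, target_class=0)
```

In the full method these per-layer bias terms are post-processed (e.g. absolute value and spatial upsampling for conv layers) and summed with the input term to form the final saliency map.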
  8. Conclusion
     The Neural Network Interpretability library was implemented in both PyTorch and TensorFlow, supporting the following algorithms:
     • Gradient Attribution
     • Integrated Gradients
     • Smoothed Gradients
     • GradCAM
     • FullGrad
     The library features a consistent API across the different techniques, as well as a benchmarking utility.
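Of the algorithms listed, Integrated Gradients is the only one not described on an earlier slide; a minimal Riemann-sum sketch (zero baseline, stand-in model, not the library's API) looks like this:

```python
import torch
import torch.nn as nn

# Stand-in classifier; replace with any differentiable model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
model.eval()

def integrated_gradients(model, image, target_class, steps=32):
    """Approximate Integrated Gradients from a zero baseline."""
    baseline = torch.zeros_like(image)
    total = torch.zeros_like(image)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (image - baseline)).requires_grad_(True)
        model(point)[0, target_class].backward()
        total += point.grad.detach()
    # Average gradient along the path, scaled by the input difference.
    return (image - baseline) * total / steps

image = torch.rand(1, 3, 8, 8)
ig = integrated_gradients(model, image, target_class=1)
```

A useful sanity check is the completeness property: the attributions sum approximately to the difference between the model's output at the input and at the baseline.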
  9. Future Scope
     The project can be expanded with:
     • More techniques for interpretability, especially neuron visualisation
     • Support for other modalities, like text and speech
     A possible line of research emerged while testing the FullGrad technique: the heat maps it produced were not very class-discriminative. A combination of ideas from GradCAM could potentially solve this.