
Keep calm and trust your model - On Explainability of Machine Learning Models

The accuracy of Machine Learning models is going up by the day with advances in Deep Learning, but this comes at the cost of the explainability of these models. There is a need to open up these black boxes for business users, which is especially essential in heavily regulated industries like Finance, Medicine, and Defence.

A lot of research is going on to make ML models interpretable and explainable. In this talk we will go through the various approaches taken to unravel machine learning models and explain the reasoning behind their predictions.

We’ll survey the different approaches by discussing the latest research literature, with a ‘behind the scenes’ view of what happens inside each approach, presented with enough mathematical depth and intuition.

Finally, the aim is to leave the audience with practical know-how on using these approaches to understand deep learning and classical machine learning models with open source tools in Python.

Praveen Sridhar

July 29, 2017

Transcript

1. Problem: As we improve accuracy with complex models, it is becoming increasingly difficult to explain how they make their predictions. (source: DARPA)
2. Interpretability is of prime importance in regulated industries: Clinical & Pharmacy, Finance, Medical Diagnosis, Insurance, Legal, Defence. (source: Lipton 2017)
3. Linear, monotonic functions: for any change in any independent variable, the response function ✦ changes in only one direction ✦ and at a magnitude represented by a readily available coefficient.
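As a worked illustration (my addition, not from the slide), a simple linear model makes this explicit: the coefficient of each feature is exactly the constant rate of change of the response with respect to that feature.

```latex
% A linear model: each coefficient fully describes how the response
% moves when its feature changes, so the model explains itself.
f(x) = \beta_0 + \sum_{k=1}^{K} \beta_k x_k ,
\qquad
\frac{\partial f}{\partial x_k} = \beta_k
\quad \text{(constant sign and magnitude for every } x\text{)}
```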
4. The culprit: non-linear, non-monotonic functions. Most algorithms create functions that change in any direction and at a varying rate. Let's see how these can be explained.
5. Model-specific approaches: explainability methods that are specific to the model itself. ✦ Tree Interpreter ✦ Bayesian Rule Lists ✦ Model-specific visualisations ✦ Attention mechanism as explanations ✦ Generating explanations as part of the model
6. Tree Interpreter: intuitively, for each decision that a tree (or a forest) makes, there is a path (or paths) from the root of the tree to the leaf, consisting of a series of decisions, each guarded by a particular feature and each contributing to the final prediction. (source: treeinterpreter)
7. Tree Interpreter, behind the scenes: for a Decision Tree, the prediction function can be written as f(x) = c + Σ_{k=1..K} contrib(x, k), where K is the number of features, c is the value at the root node, and contrib(x, k) is the contribution from the k-th feature. For a Random Forest, this expands to the average of the bias terms plus the average contribution of each feature across the trees.
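A minimal usage sketch of this decomposition with the treeinterpreter package linked on the final slide; the dataset and model choice here are illustrative.

```python
# Sketch: decompose a random forest prediction into bias + per-feature
# contributions using treeinterpreter (https://github.com/andosa/treeinterpreter).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Explain a single instance
prediction, bias, contributions = ti.predict(model, X[:1])

# prediction(x) = c + sum_k contrib(x, k): the bias (root value, averaged
# over trees) plus one contribution term per feature.
print("prediction:", prediction[0])
print("bias:", bias[0])
print("per-feature contributions:", contributions[0])
print("reconstructed:", bias[0] + contributions[0].sum(axis=0))
```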
8. Bayesian Rule Lists: BRLs are interpretability taken to its extreme! You can just read the model and understand its predictions. Take a look at this BRL classifier output for the Titanic dataset. (source: Letham et al. 2015)
9. Bayesian Rule Lists, behind the scenes: the task of a BRL is to create an ordered list of assertions ‘a’ (e.g. “if x is a male and an adult”) and then find the probability of a predicted label ‘y’. (source: Letham et al. 2015)
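To make the "you can just read the model" point concrete, here is a hand-written sketch of what evaluating an ordered rule list looks like. The rules and probabilities below are purely illustrative; they are not the actual Titanic output from Letham et al.

```python
# Illustrative only: an ordered rule list, evaluated top to bottom.
# Each antecedent 'a' is checked in turn; the first one that fires
# determines the predicted probability of the label y (here: survival).
def rule_list_predict(passenger):
    if passenger["sex"] == "female" and passenger["pclass"] in (1, 2):
        return 0.95          # if upper-class female then P(survived) is high
    elif passenger["age"] < 10 and passenger["pclass"] in (1, 2):
        return 0.90          # else if young child in upper classes then ...
    elif passenger["sex"] == "male" and passenger["age"] >= 18:
        return 0.10          # else if adult male then P(survived) is low
    else:
        return 0.40          # default rule

print(rule_list_predict({"sex": "male", "age": 30, "pclass": 3}))  # -> 0.10
```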
10. Model-specific visualisations: Andrew Ng's famous 'cat' neuron; FC8-layer 'Deep' visualisation of AlexNet. (source: Jason et al. 2015)
11. Attention Mechanism as Explanations, behind the scenes: an attention mechanism ✦ takes n arguments y1, y2, …, yn ✦ and a context vector c, and returns a vector z which is the 'summary' of the y, focusing on the information linked to the context c. More formally, it returns a weighted mean of the y, with the weights chosen according to the relevance of each yi given the context c.
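A minimal numpy sketch of that definition; the dot-product relevance score and the variable names are assumptions for illustration, not from the talk.

```python
# Sketch: attention as a weighted mean of the y_i, with weights given by the
# relevance of each y_i to the context c (scored here by a dot product).
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def attend(Y, c):
    """Y: (n, d) array of arguments y_1..y_n; c: (d,) context vector."""
    scores = Y @ c                 # relevance of each y_i given the context c
    weights = softmax(scores)      # normalise relevances to sum to one
    z = weights @ Y                # weighted mean of the y_i: the 'summary' z
    return z, weights

Y = np.random.randn(5, 8)
c = np.random.randn(8)
z, w = attend(Y, c)
print(w)   # the weights themselves can be read as an explanation
```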
12. Generating explanations as part of the model: MIT CSAIL's research on "Making computers explain themselves". (source: Lei et al. 2016)
13. Generating explanations as part of the model: Berkeley AI Lab's research on "Generating Visual Explanations". (source: Lisa Anne et al. 2016)
14. Model-agnostic approaches: general methods for explainability. ✦ Global scope ➡ Variable importance measures ➡ Residual plots ➡ Partial Dependence plots ➡ Surrogate models (see the sketch below) ✦ Local scope ➡ LIME (Local Interpretable Model-agnostic Explanations)
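As one concrete global-scope example, here is a minimal sketch of a surrogate model: fit a small, readable decision tree to the predictions of the black box. The dataset, black-box model and tree depth are illustrative choices, not from the talk.

```python
# Sketch: global surrogate - approximate a black-box model with a shallow,
# readable decision tree trained on the black box's own predictions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

# The "black box" we want to explain
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

# Global surrogate: a shallow tree trained to mimic the black box's predictions
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the surrogate agrees with the black box
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"surrogate fidelity: {fidelity:.2%}")

# The surrogate itself reads as a short set of if/else rules
print(export_text(surrogate, feature_names=list(data.feature_names)))
```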
15. LIME (Local Interpretable Model-agnostic Explanations), behind the scenes: the idea is to treat the model as a black box, perturb the instance we want to explain, and learn a sparse linear model around it as the explanation. Intuitively, an “explanation” is a local linear approximation of the model's behaviour. (source: Ribeiro et al. 2016)
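A minimal usage sketch with the lime package linked on the final slide; the dataset and parameters are illustrative.

```python
# Sketch: explain one prediction of a black-box classifier with LIME.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    discretize_continuous=True,
)

# Perturb the instance, fit a sparse linear model locally, and report
# the top feature weights as the explanation.
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=4)
print(exp.as_list())
```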
16. Layerwise Relevance Propagation: a technique for determining which inputs in the feature vector contribute most strongly to the neural network's output.
17. Layerwise Relevance Propagation, behind the scenes: the goal of LRP is to define a relevance measure R over the input vector such that the network output can be expressed as the sum of the values of R. If we can decompose this function in terms of its partial derivatives, we can use that decomposition to approximate the relevance propagation function.
18. Layerwise Relevance Propagation, behind the scenes: the process is similar in spirit to backpropagation. Deep Taylor Decomposition: we can use a Taylor series to approximate the value of a function f(x) near a point x0; the closer x is to x0, the better the approximation. One clever thing we can do is set x0 to be a “root point” of the forward propagation function, that is, a point such that f(x0) = 0. This simplifies the Taylor expression to a sum of per-input first-order terms, as sketched below.
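The equations implied on this slide, reconstructed as a standard first-order Taylor expansion; the exact notation on the original slide may differ.

```latex
% First-order Taylor expansion of the network output f around a point x_0
f(x) \approx f(x_0) + \nabla f(x_0)^{\top} (x - x_0)
           = f(x_0) + \sum_i \frac{\partial f}{\partial x_i}\Big|_{x_0} (x_i - x_{0,i})

% Choosing x_0 as a root point, f(x_0) = 0, leaves only the sum of
% per-input terms, which serve as the relevances R_i:
f(x) \approx \sum_i \underbrace{\frac{\partial f}{\partial x_i}\Big|_{x_0} (x_i - x_{0,i})}_{R_i}
```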
19. Layerwise Relevance Propagation, behind the scenes: root points of the forward propagation function lie on the local decision boundary, so the gradients at those points give us the most information about how the function separates the inputs by class.
20. LIVE DEMO: Let's apply some of the approaches we discussed. Thanks to the following open source packages:
✦ Tree Interpreter: https://github.com/andosa/treeinterpreter
✦ LIME: https://github.com/marcotcr/lime
✦ ELI5: https://github.com/TeamHG-Memex/eli5
✦ BRL: https://github.com/tmadl/sklearn-expertsys
https://github.com/psbots/explainableML