Slide 1

"Why Should I Trust You?" Explaining the Predictions of Any Classifier FATML 2016 November 18th 2016 Sameer Singh Marco Tulio Ribeiro Carlos Guestrin

Slide 2

Machine Learning is everywhere

Slide 3

Predict: wolf vs. husky. Only 1 mistake!

Slide 4

Visual Question Answering
Is there a moustache in the picture? > Yes
What is the moustache made of? > Banana

Slide 5

Is it doing what we want? Or is it a neural network with more than a thousand layers?

Slide 6

They are essentially black boxes.
Trust: How can we trust that the predictions are correct?
Predict: How can we understand and predict the model's behavior?
Improve: How do we improve the model to prevent potential mistakes?

Slide 7

Fairness in ML (currently)
User: "I would like to apply for a loan…"
Machine Learning: "Your loan has been denied."
User: "What? Why?"
Machine Learning: "Cannot explain… [0.25, -4.5, 3.5, -10.4, …]"

Slide 8

Predict: wolf vs. husky. Only 1 mistake! We've built a snow detector…

Slide 9


Slide 10

Visual Question Answering
What is the moustache made of? > Banana
What are the eyes made of? > Bananas
What? > Banana
What is? > Banana

Slide 11

What an explanation looks like
Why did this happen?
From: Keith Richards
Subject: Christianity is the answer
NNTP-Posting-Host: x.x.com
I think Christianity is the one true religion. If you'd like to know more, send me a note
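As a concrete illustration, an explanation of this kind can be produced with the open-source lime package (github.com/marcotcr/lime). In the sketch below, `clf` is an assumed scikit-learn text pipeline with a `predict_proba` method, the class names are illustrative, and argument names reflect the package's early API and may differ between versions:

```python
from lime.lime_text import LimeTextExplainer

# The document from the slide above.
document = ("From: Keith Richards\nSubject: Christianity is the answer\n"
            "NNTP-Posting-Host: x.x.com\n"
            "I think Christianity is the one true religion. "
            "If you'd like to know more, send me a note")

explainer = LimeTextExplainer(class_names=["atheism", "christianity"])
# clf.predict_proba maps a list of raw documents to class probabilities,
# e.g. a trained vectorizer + classifier pipeline (assumed, not defined here).
exp = explainer.explain_instance(document, clf.predict_proba, num_features=6)
print(exp.as_list())  # [(word, weight), ...] with the highest-impact words first
```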

Slide 12

Three must-haves for a good explanation
[Figure: a "definitely not interpretable" model vs. a "potentially interpretable" one]
Interpretability: humans can easily interpret the reasoning

Slide 13

Three must-haves for a good explanation
[Figure: a learned model in (x, y) space and an explanation that is not faithful to it]
Interpretability: humans can easily interpret the reasoning
Faithful: describes how the model actually works

Slide 14

Three must-haves for a good explanation
Interpretability: humans can easily interpret the reasoning
Faithful: describes how the model actually works
Model-agnostic: can explain any classifier

Slide 15

Local, Interpretable, Model-Agnostic Explanations (LIME)
1. Pick a model class interpretable by humans (line, shallow decision tree, sparse features, …)
   - May not be globally faithful…
2. Locally approximate the global (black-box) model
   - The simple model is globally bad, but locally good
A locally faithful, simple decision boundary ➔ a good explanation for the prediction

Slide 16

LIME: General framework
Instance: x
Explanation family: the universe of possible explanations to search over
Faithfulness of the explanation: is the explanation faithful to the model in the context of x?
Interpretability: is the explanation simple enough to read?
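In the accompanying KDD 2016 paper ("Why Should I Trust You?", Ribeiro, Singh, and Guestrin), these ingredients are combined into a single objective. Sketched in the paper's notation, with f the black-box model, G the explanation family, π_x a proximity kernel defining the neighborhood of x, L a locality-aware loss measuring unfaithfulness, and Ω a complexity penalty:

```latex
% Choose, from the family G, the explanation g that is most locally faithful
% to f around x while staying simple enough to read.
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)
```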

Slide 17

Sparse Linear Explanations
1. Sample points around x_i
2. Use the complex model to predict labels for each sample
3. Weigh samples according to their distance to x_i
4. Learn a new simple model on the weighted samples
5. Use the simple model to explain
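A minimal sketch of these five steps for a tabular instance, assuming a `predict_fn` that returns the probability of the class being explained. This is an illustration, not the lime package's implementation (which, for example, discretizes features and uses K-LASSO to enforce sparsity):

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain(x_i, predict_fn, num_samples=5000, kernel_width=0.75, top_k=5):
    rng = np.random.default_rng(0)
    # 1. Sample points around x_i (Gaussian perturbations).
    Z = x_i + rng.normal(size=(num_samples, x_i.shape[0]))
    # 2. Use the complex model to predict labels for each sample.
    y = predict_fn(Z)
    # 3. Weigh samples according to their distance to x_i (exponential kernel).
    dist = np.linalg.norm(Z - x_i, axis=1)
    weights = np.exp(-dist ** 2 / kernel_width ** 2)
    # 4. Learn a new simple (weighted linear) model on the samples.
    g = Ridge(alpha=1.0)
    g.fit(Z, y, sample_weight=weights)
    # 5. Use the simple model to explain: report the largest coefficients.
    top = np.argsort(np.abs(g.coef_))[::-1][:top_k]
    return [(int(i), float(g.coef_[i])) for i in top]
```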

Slide 18

Sampling example: images
Original image: P(labrador) = 0.21
[Perturbed instances, with P(labrador) = 0.92, 0.001, 0.34]
Locally weighted regression ➔ explanation
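For images, this perturb-predict-regress loop is packaged in the open-source lime library (github.com/marcotcr/lime). A usage sketch, where `image` and `inception_predict` (an assumed wrapper that takes a batch of images and returns class probabilities) are placeholders, and argument names may differ across versions:

```python
import numpy as np
from lime.lime_image import LimeImageExplainer

explainer = LimeImageExplainer()
# Perturb superpixels of the image, query the black-box model on each
# perturbation, and fit a locally weighted linear model.
explanation = explainer.explain_instance(np.asarray(image), inception_predict,
                                         top_labels=3, num_samples=1000)
# Visualize the superpixels that push the prediction toward the top class.
img, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                           positive_only=True, num_features=5)
```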

Slide 19

Explaining Google's Inception NN
[Image explanations for three predicted classes, with P( · ) = 0.21, 0.24, and 0.32]

Slide 20

Predict: wolf vs. husky. Only 1 mistake!

Slide 21

Explanations for the neural network's predictions
We've built a great snow detector… ☹

Slide 22

Did ML experts notice it?
[Bar chart: % of subjects (out of 27) who didn't trust the model and who had the "snow insight", before vs. after explanations]

Slide 23

Explanations should match our intuitions
[Figure: original image, explanation from the "bad" classifier, explanation from the "good" classifier]

Slide 24

Explanation for a bad classifier
From: Keith Richards
Subject: Christianity is the answer
NNTP-Posting-Host: x.x.com
I think Christianity is the one true religion. If you'd like to know more, send me a note
After looking at the explanation, we shouldn't trust the model!

Slide 25

Explanation for a good classifier
It seems to be picking up on more reasonable things… good!

Slide 26

Comparing models with explanations

Slide 27

Comparing models
If we picked based on accuracy, we would get it wrong.
[Bar chart: % who picked the better model, guessing vs. with explanations]
89% of users identify the more trustworthy model

Slide 28

Problem with linear explanations
They give a "general idea", but are not precise.
What can I change, and not change, to keep the same prediction?

Slide 29

Prediction Invariance
Instance: x
Set of constraints on x: a conjunction of constraints on individual features
Quality of the constraints: do all instances that satisfy the constraints e get the same prediction?
Interpretability: is the set of constraints small?
Find a set of constraints such that any other change to the input does not change the prediction.
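One way to picture this search problem (a sketch for illustration, not the authors' algorithm): greedily fix features until perturbations that respect the constraints almost always preserve the prediction. Here `sample_fn` is a hypothetical perturbation sampler that holds the constrained features of x fixed and resamples the rest, and `predict_fn` returns class probabilities:

```python
import numpy as np

def invariant_constraints(x, predict_fn, sample_fn, precision=0.95, n=1000):
    # The class we want to remain invariant under allowed perturbations.
    target = predict_fn(x[np.newaxis, :]).argmax(axis=1)[0]
    constrained, free = set(), set(range(len(x)))
    while free:
        Z = sample_fn(x, constrained, n)
        if (predict_fn(Z).argmax(axis=1) == target).mean() >= precision:
            break  # prediction is (almost) invariant to the remaining free features
        # Greedily add the single feature constraint that best restores the prediction.
        best = max(free, key=lambda f: (predict_fn(sample_fn(x, constrained | {f}, n))
                                        .argmax(axis=1) == target).mean())
        constrained.add(best)
        free.remove(best)
    return constrained  # indices of features that must stay fixed
```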

Slide 30

Tabular Data

Slide 31

Visual QA

Slide 32

Image Classification

Slide 33

"Programs" as Explanations
Instance: x
Set of programs: the universe of all possible programs/functions on x
Quality of the program: does the function e perform similarly to the model on instances similar to x?
Interpretability: is the program short?
There are so many choices for interpretable models: rules, decision trees, decision sets, falling rule lists, sparse linear models, etc. Why decide between them? Search over ALL possible "programs"!

Slide 34

Salary Prediction

Slide 35

Model-Agnostic Explanations
Model-agnostic explanations are critical for understanding ML, and can be applied to future models as well.
LIME: Local, Interpretable, Model-Agnostic Explanations
- Linear explanations: useful for complex text and image classifiers
- Prediction-invariant explanations: a more precise definition
- "Programs" as explanations: generalize over any representation

Slide 36

Thank you!
LIME: Local, Interpretable, Model-Agnostic Explanations
- Linear explanations: useful for complex text and image classifiers
- Prediction-invariant explanations: a more precise definition
- "Programs" as explanations: generalize over any representation
www.sameersingh.org
github.com/marcotcr/lime