Slide 1

Opening the black box: Attempts to understand the results of machine learning models
Michael Tiernay, PhD
05/12/2017

Slide 2

The Problem

Slide 3

No content

Slide 4

Inferential Models Might Be Bad

[Figure: scatterplot of pressure vs. temperature with a poorly fitting linear model]

Slide 5

Non-Parametric Models Are Hard to Understand

Slide 6

No content

Slide 7

No content

Slide 8

Local Vs. Global Interpretability

Slide 9

Local Vs. Global Interpretability

1. Local Interpretability - Focus on how a model works around a single observation or a cluster of similar observations
2. Global Interpretability - Focus on how a model works across all observations (e.g. coefficients from a linear regression)

Slide 10

Why Do we Want Local Interpretability?

- Understand why a prediction is positive/negative
- Trust individual predictions (i.e. reasons for a prediction make sense to domain experts)
- Provide guidance for intervention strategies (i.e. the cancer is predicted to be caused by X, which can be treated with Y)
- These problems have been addressed by recent literature

Slide 11

Why Do we Want Local Interpretability?

Intuit: The team I work with develops models that detect risk/fraud among merchants who use our Quickbooks products to perform credit card / ACH transactions with their customers. When we evaluate individual transactions and deem some to be high risk, we pass them along to agents who review them more closely and determine whether to take some sort of action on the transaction. However, we want to provide guidance to these agents: rather than just a risk score, we want human-readable intuitions behind the score that point the agents in (what we believe to be) the right direction with their investigation.

Edmunds: Dealer Churn

Slide 12

Why Do we Want Global Interpretability?

- Hypothesis Generation: the model can help generate new ideas that can be tested experimentally
- A global understanding of the 'causes' of an outcome can drive significant business/product changes
- This problem has not received much attention in the machine learning literature

Slide 13

Let’s Look At a Toy Example

Slide 14

Logistic Regression Example

Slide 15

Raw Data

##   Survived Pclass Sex Age
## 1  class_0      3   2  22
## 2  class_1      1   1  38
## 3  class_1      3   1  26
## 4  class_1      1   1  35
## 5  class_0      3   2  35
## 7  class_0      1   2  54
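A minimal sketch of how a table like this could be produced, assuming the Titanic training data from the `titanic` R package; the recoding of Sex to 1/2 and Survived to class_0/class_1 is an assumption inferred from the printed output:

library(titanic)  # provides the titanic_train data frame

d <- titanic_train[, c("Survived", "Pclass", "Sex", "Age")]
# Assumed recoding to match the slide: female = 1, male = 2
d$Sex <- ifelse(d$Sex == "female", 1, 2)
d$Survived <- factor(d$Survived, levels = c(0, 1),
                     labels = c("class_0", "class_1"))
d <- na.omit(d)  # drop rows with missing Age, which would explain
                 # why row 6 is absent in the printed output
head(d)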

Slide 16

Logistic Regression

##
## Call:
## NULL
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.7270  -0.6799  -0.3947   0.6483   2.4668
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  7.578137   0.618091  12.261  < 2e-16 ***
## Pclass      -1.288545   0.139259  -9.253  < 2e-16 ***
## Sex         -2.522131   0.207283 -12.168  < 2e-16 ***
## Age         -0.036929   0.007628  -4.841 1.29e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
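The printed Call is NULL, so the exact invocation isn't shown; a call consistent with this summary would be the following sketch, assuming the recoded data frame `d` from the previous slide:

fit <- glm(Survived ~ Pclass + Sex + Age,
           family = binomial(link = "logit"), data = d)
summary(fit)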

Slide 17

Interaction terms?

##   Sex Pclass   n  survived
## 1   1      1  85 0.9647059
## 2   1      2  74 0.9189189
## 3   1      3 102 0.4607843
## 4   2      1 101 0.3960396
## 5   2      2  99 0.1515152
## 6   2      3 253 0.1501976
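One way to compute this table and probe the interaction; a sketch assuming the `d` data frame and `fit` from the earlier slides, with `dplyr` used for the aggregation:

library(dplyr)

# Survival rate by Sex and Pclass: class matters far more for women
# (Sex = 1) than an additive model alone would suggest
d %>%
  group_by(Sex, Pclass) %>%
  summarise(n = n(), survived = mean(Survived == "class_1"))

# A model with an interaction term lets the class effect differ by sex
fit_int <- glm(Survived ~ Pclass * Sex + Age, family = binomial, data = d)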

Slide 18

LIME (Local Interpretable Model-agnostic Explanations)

Slide 19

LIME (Local Interpretable Model-agnostic Explanations)

- Mainly created for images and text
- Model agnostic
- Focus on one observation (x) at a time
- Sample other observations (z), weighted by distance to x
- Compute f(z) (the predicted outcome)
- Select K features with LASSO, then compute least squares
- Coefficients from LS are the 'local effects'
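A hand-rolled sketch of these steps for tabular data (not the `lime` package; the kernel width, per-feature sampling scheme, and helper name `lime_sketch` are illustrative assumptions):

library(glmnet)

lime_sketch <- function(fit, X, x0, n = 5000, K = 3, width = 0.75) {
  # 1. Sample perturbed observations z, each feature drawn independently
  #    from its empirical distribution in the training data X
  Z <- sapply(X, function(col) sample(col, n, replace = TRUE))
  # 2. Weight each z by proximity to x0 (Gaussian kernel on
  #    centered/scaled features; scaling matters for the distance)
  Zs  <- scale(Z)
  x0s <- (as.numeric(x0) - attr(Zs, "scaled:center")) / attr(Zs, "scaled:scale")
  d2  <- rowSums(sweep(Zs, 2, x0s)^2)
  w   <- exp(-d2 / (2 * width^2))
  # 3. Query the black box: f(z)
  fz <- predict(fit, newdata = as.data.frame(Z), type = "response")
  # 4. LASSO to select at most K features...
  lasso <- glmnet(Z, fz, weights = w)
  s   <- lasso$lambda[max(which(lasso$df <= K))]
  b   <- as.matrix(coef(lasso, s = s))[-1, 1]
  sel <- which(b != 0)
  # ...then weighted least squares on the selected features; its
  # coefficients are the local effects for x0
  lm(fz ~ Z[, sel, drop = FALSE], weights = w)
}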

Slide 20

Local Interpretability Example

Slide 21

Titanic Effects (Men)

Slide 22

Titanic Effects (Women)

Slide 23

Shortcomings of LIME

- A good out-of-the-box solution that requires little thought, BUT:
- Doesn't allow control over the local space
- Features need to be centered/scaled for the distance calculation
- Moving up vs. down on binary/ordinal features can have different effects

Slide 24

ICEbox

Slide 25

ICEbox

- For one feature, predict the likelihood of survival across the entire range of possible values for that feature
- Generalization of partial-dependence plots: one curve per observation rather than the average
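A sketch of generating plots like the following with the ICEbox R package; the argument names are from the package documentation as I recall them and may need checking:

library(ICEbox)

X <- d[, c("Pclass", "Sex", "Age")]
y <- as.numeric(d$Survived == "class_1")

ice_sex <- ice(object = fit, X = X, y = y, predictor = "Sex",
               logodds = TRUE,  # display on the log-odds scale, as below
               predictfcn = function(object, newdata) {
                 predict(object, newdata = newdata, type = "response")
               })
plot(ice_sex)  # one curve per observation across the range of Sex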

Slide 26

Gender

[ICE plot: partial log-odds of survival across Sex (1, 2)]

Slide 27

Class

[ICE plot: partial log-odds of survival across Pclass (1, 2, 3)]

Slide 28

My Simulation Ideas (Distribution of effects)

Slide 29

My Simulation Ideas (Distribution of effects)

- Local makes sense
- Control the changes
- BUT you can generalize up to the entire population
- Similar to average partial effects in econometrics
- HELP! [email protected]
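A sketch of the idea in the spirit of average partial effects: apply the same controlled change to every observation and look at the resulting distribution of effects (the helper name `effect_dist` and the unit change are assumptions for illustration):

# For each observation, simulate a controlled change to one feature and
# record the change in predicted probability; the result is a
# distribution of local effects that generalizes to the population
effect_dist <- function(fit, X, feature, delta = 1) {
  X2 <- X
  X2[[feature]] <- X2[[feature]] + delta
  predict(fit, newdata = X2, type = "response") -
    predict(fit, newdata = X, type = "response")
}

effects <- effect_dist(fit, d, "Pclass", delta = 1)
hist(effects)  # the distribution of effects across the population
mean(effects)  # an average-partial-effect-style global summary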

Slide 30

Links

Slide 31

https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
https://github.com/marcotcr/lime
https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-expl
http://marcotcr.github.io/lime/tutorials/Tutorial%20-%20continuous%20and%20categorical%20features.html
https://arxiv.org/pdf/1602.04938.pdf
http://scweiss.blogspot.com/2015/12/beyond-beta-relationships-between.html
https://cran.r-project.org/web/packages/ICEbox/ICEbox.pdf
https://arxiv.org/pdf/1309.6392.pdf
https://stats.stackexchange.com/questions/73449/average-partial-effects