Opening the Black Box: Attempts to Understand the Results of Machine Learning Models - Michael Tiernay - LA Data Science Meetup - May 2017

Data Science LA
May 12, 2017


Transcript

  1. Opening the black box: Attempts to understand the results of machine learning models

    Michael Tiernay, PhD
    05/12/2017
  2. Inferential Models Might Be Bad

    [Figure: plot with axes labeled temperature and pressure]
  3. Local vs. Global Interpretability

    1. Local interpretability: focus on how a model works around a single observation or a cluster of similar observations
    2. Global interpretability: focus on how a model works across all observations (e.g. coefficients from a linear regression)
  4. Why Do We Want Local Interpretability?

    Understand why a prediction is positive/negative
    Trust individual predictions (i.e. the reasons for a prediction make sense to domain experts)
    Provide guidance for intervention strategies (e.g. the cancer is predicted to be caused by X, which can be treated with Y)
    These problems have been addressed by recent literature
  5. Why Do We Want Local Interpretability?

    Intuit: The team I work with develops models that detect risk/fraud among merchants who use our QuickBooks products to perform credit card / ACH transactions with their customers. When we evaluate individual transactions and deem some to be high risk, we pass them along to agents who review them more closely and decide whether to take action on the transaction. We don't want to give these agents only a risk score; we want to give them human-readable intuitions about the score that point their investigation in (what we believe to be) the right direction.
    Edmunds: Dealer Churn
  6. Why Do We Want Global Interpretability?

    Hypothesis generation: a model can help generate new ideas that can be tested experimentally
    A global understanding of the ‘causes’ of an outcome can drive significant business/product changes
    This problem has not received much attention in the machine learning literature
  7. Raw Data

    ##   Survived Pclass Sex Age
    ## 1  class_0      3   2  22
    ## 2  class_1      1   1  38
    ## 3  class_1      3   1  26
    ## 4  class_1      1   1  35
    ## 5  class_0      3   2  35
    ## 7  class_0      1   2  54
  8. Logistic Regression

    ##
    ## Call:
    ## NULL
    ##
    ## Deviance Residuals:
    ##     Min       1Q   Median       3Q      Max
    ## -2.7270  -0.6799  -0.3947   0.6483   2.4668
    ##
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)
    ## (Intercept)  7.578137   0.618091  12.261  < 2e-16 ***
    ## Pclass      -1.288545   0.139259  -9.253  < 2e-16 ***
    ## Sex         -2.522131   0.207283 -12.168  < 2e-16 ***
    ## Age         -0.036929   0.007628  -4.841 1.29e-06 ***
    ## ---
    ## Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1
    ##
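As a worked example of reading these coefficients, the sketch below converts the fitted log-odds into a survival probability for the first passenger in the raw data (Pclass 3, Sex 2, Age 22). It assumes the printed coefficients apply to the raw, unscaled columns shown on the raw-data slide; the name `predict_survival` is illustrative and not from the talk.

```python
import math

# Coefficients copied from the logistic regression output above.
INTERCEPT = 7.578137
COEF = {"Pclass": -1.288545, "Sex": -2.522131, "Age": -0.036929}


def predict_survival(passenger):
    """Return the predicted survival probability for a dict of feature values."""
    log_odds = INTERCEPT + sum(COEF[name] * passenger[name] for name in COEF)
    # The inverse logit maps log-odds onto a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))


first_row = {"Pclass": 3, "Sex": 2, "Age": 22}
probability = predict_survival(first_row)
print(f"predicted survival probability: {probability:.3f}")
```

For this passenger the log-odds come out at about -2.14, which corresponds to a survival probability of roughly 0.10.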
  9. Interaction terms?

    ##   Sex Pclass   n  survived
    ## 1   1      1  85 0.9647059
    ## 2   1      2  74 0.9189189
    ## 3   1      3 102 0.4607843
    ## 4   2      1 101 0.3960396
    ## 5   2      2  99 0.1515152
    ## 6   2      3 253 0.1501976
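One way to see why the table raises the question of interaction terms is to compare the class-to-class change in observed log-odds separately for each value of Sex. This is a minimal sketch using only the survival rates in the table; the helper names `logit` and `class_step` are illustrative.

```python
import math

# Observed survival rates by (Sex, Pclass), copied from the table above.
SURVIVAL_RATE = {
    (1, 1): 0.9647059, (1, 2): 0.9189189, (1, 3): 0.4607843,
    (2, 1): 0.3960396, (2, 2): 0.1515152, (2, 3): 0.1501976,
}


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


def class_step(sex, low, high):
    """Change in observed log-odds of survival when Pclass goes from low to high."""
    return logit(SURVIVAL_RATE[(sex, high)]) - logit(SURVIVAL_RATE[(sex, low)])


for sex in (1, 2):
    for low, high in ((1, 2), (2, 3)):
        print(f"Sex={sex}, Pclass {low}->{high}: {class_step(sex, low, high):+.2f}")
```

A logistic regression without interaction terms assumes the same step (the Pclass coefficient) in every cell. The observed steps differ widely: for Sex 1 the large drop is between classes 2 and 3, while for Sex 2 that step is close to zero.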
  10. LIME (Local Interpretable Model-agnostic Explanations)

    Mainly created for images and text
    Model agnostic
    Focus on one observation (x) at a time
    Sample other observations (z), weighted by distance to x
    Compute f(z) (the predicted outcome)
    Select K features with LASSO, then fit least squares on them
    Coefficients from the least-squares fit are ‘local effects’
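The steps above can be sketched in a few lines. This is a simplified illustration, not the LIME library: it skips the LASSO feature-selection step, perturbs every feature (including Pclass and Sex) as if it were continuous, and uses the logistic regression coefficients from the earlier slide as a stand-in for the black-box f. The `scales` values, the kernel width and the function names are all assumptions made for the example.

```python
import math
import random

# Stand-in black box: intercept, Pclass, Sex, Age coefficients from the earlier slide.
COEF = [7.578137, -1.288545, -2.522131, -0.036929]


def f(z):
    """Predicted survival probability for a point z = [Pclass, Sex, Age]."""
    log_odds = COEF[0] + sum(c * v for c, v in zip(COEF[1:], z))
    return 1.0 / (1.0 + math.exp(-log_odds))


def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def local_effects(x, scales, n_samples=2000, kernel_width=1.0, seed=0):
    """Weighted least-squares slopes of f around x, per one scale unit of each feature."""
    rng = random.Random(seed)
    rows, weights, targets = [], [], []
    for _ in range(n_samples):
        offsets = [rng.gauss(0.0, 1.0) for _ in x]            # standardized perturbation
        z = [xi + d * s for xi, d, s in zip(x, offsets, scales)]
        dist2 = sum(d * d for d in offsets)                   # squared distance to x
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # closer samples count more
        rows.append([1.0] + offsets)
        targets.append(f(z))
    k = len(x) + 1
    # Weighted normal equations: (X' W X) beta = X' W y.
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets)) for i in range(k)]
    return solve(xtwx, xtwy)[1:]  # drop the intercept


effects = local_effects(x=[3, 2, 22], scales=[1.0, 1.0, 10.0])
print(dict(zip(["Pclass", "Sex", "Age"], effects)))
```

Because the offsets are expressed in scale units, the returned slopes are comparable across features, which is one reason features need to be centered and scaled before the distance calculation.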
  11. Shortcomings of LIME

    Good out-of-the-box solution that requires little thought
    Doesn’t allow control over the local space
    Need to center/scale features for the distance calculation
    Effects of moving a binary/ordinal feature up versus down can differ
  12. ICE Box (Individual Conditional Expectation)

    For one feature, predict the likelihood of survival across the entire range of possible values for that feature
    Generalization of partial-dependence plots
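A minimal sketch of the idea, under stated assumptions: the model is the logistic regression from the earlier slide, the observations are the six rows shown on the raw-data slide, and the age grid is arbitrary. Each observation gets its own curve, and averaging the curves pointwise gives the partial-dependence curve. The function names are illustrative and are not the API of any ICE package.

```python
import math

COEF = {"Pclass": -1.288545, "Sex": -2.522131, "Age": -0.036929}
INTERCEPT = 7.578137

# The six rows shown on the raw-data slide.
ROWS = [
    {"Pclass": 3, "Sex": 2, "Age": 22},
    {"Pclass": 1, "Sex": 1, "Age": 38},
    {"Pclass": 3, "Sex": 1, "Age": 26},
    {"Pclass": 1, "Sex": 1, "Age": 35},
    {"Pclass": 3, "Sex": 2, "Age": 35},
    {"Pclass": 1, "Sex": 2, "Age": 54},
]


def predict(row):
    """Predicted survival probability from the logistic regression coefficients."""
    log_odds = INTERCEPT + sum(COEF[name] * row[name] for name in COEF)
    return 1.0 / (1.0 + math.exp(-log_odds))


def ice_curves(rows, feature, grid):
    """One curve per observation: predictions as `feature` sweeps over `grid`."""
    return [[predict({**row, feature: value}) for value in grid] for row in rows]


def partial_dependence(curves):
    """Average the ICE curves pointwise to get the partial-dependence curve."""
    return [sum(column) / len(column) for column in zip(*curves)]


age_grid = list(range(0, 81, 10))  # hypothetical grid of ages
curves = ice_curves(ROWS, "Age", age_grid)
pdp = partial_dependence(curves)

for row, curve in zip(ROWS, curves):
    print(row, [round(p, 2) for p in curve])
print("partial dependence:", [round(p, 2) for p in pdp])
```

Plotting the individual curves rather than only their average is what lets differences between observations show up.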
  13. My Simulation Ideas (Distribution of effects)

    Local makes sense
    Control the changes
    BUT you can generalize up to the entire population
    Similar to average partial effects in econometrics
    HELP! [email protected]
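The slide does not spell out the procedure, so the following is only one possible reading of it: apply a controlled change to each observation, record the change in prediction (a distribution of local effects), then average those effects across observations, in the spirit of an average partial effect. The model is again the logistic regression from the earlier slide, the rows come from the raw-data slide, and the choice of change (setting Sex from 2 to 1) is purely illustrative.

```python
import math

COEF = {"Pclass": -1.288545, "Sex": -2.522131, "Age": -0.036929}
INTERCEPT = 7.578137

# The six rows shown on the raw-data slide.
ROWS = [
    {"Pclass": 3, "Sex": 2, "Age": 22},
    {"Pclass": 1, "Sex": 1, "Age": 38},
    {"Pclass": 3, "Sex": 1, "Age": 26},
    {"Pclass": 1, "Sex": 1, "Age": 35},
    {"Pclass": 3, "Sex": 2, "Age": 35},
    {"Pclass": 1, "Sex": 2, "Age": 54},
]


def predict(row):
    """Predicted survival probability from the logistic regression coefficients."""
    log_odds = INTERCEPT + sum(COEF[name] * row[name] for name in COEF)
    return 1.0 / (1.0 + math.exp(-log_odds))


def effect_of_change(row, feature, new_value):
    """Change in prediction when one feature of one observation is set to new_value."""
    return predict({**row, feature: new_value}) - predict(row)


# Local effects: one controlled change per observation that currently has Sex == 2.
effects = [effect_of_change(row, "Sex", 1) for row in ROWS if row["Sex"] == 2]

# Generalizing up: the mean of the local effects over those observations.
average_effect = sum(effects) / len(effects)

print("distribution of effects:", [round(e, 3) for e in effects])
print("average effect:", round(average_effect, 3))
```

The same change yields a different effect for each observation because the logistic curve is nonlinear, so the spread of the individual effects carries information that the average alone hides.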