Opening the Black Box: Attempts to Understand the Results of Machine Learning Models - Michael Tiernay - LA Data Science Meetup - May 2017

Data Science LA
May 12, 2017

Transcript

  1. Opening the black box: Attempts to understand
    the results of machine learning models
    Michael Tiernay, PhD
    05/12/2017

  2. Inferential Models Might Be Bad
    [Figure: pressure plotted against temperature]

  3. Non-Parametric Models Are Hard to Understand

  4. Local Vs. Global Interpretability

  5. Local Vs. Global Interpretability
    1. Local Interpretability - Focus on how a model works around a
    single observation or a cluster of similar observations
    2. Global Interpretability - Focus on how a model works across all
    observations (e.g. coefficients from a linear regression)

  6. Why Do we Want Local Interpretability?
    Understand why a prediction is positive/negative
    Trust individual predictions (i.e. reasons for a prediction make
    sense to domain experts)
    Provide guidance for intervention strategies (e.g. the cancer is
    predicted to be caused by X, which can be treated with Y)
    These problems have been addressed by recent literature

  7. Why Do we Want Local Interpretability?
    Intuit: The team I work with develops models that detect risk/fraud
    among merchants who use our QuickBooks products to
    perform credit card / ACH transactions with their customers.
    So if we’re evaluating individual transactions, and we deem
    some to be high risk, then we pass them along to agents who
    review them more closely and determine whether to take some
    sort of action on the transaction. However, we want to be able
    to provide guidance to these agents - we don’t want to simply
    provide some sort of risk score, we want to provide some sort
    of human-readable intuitions regarding the score to point the
    agents in (what we believe to be) the right direction with their
    investigation.
    Edmunds: Dealer Churn

  8. Why Do we Want Global Interpretability?
    Hypothesis Generation: Model can help generate new ideas
    that can be tested experimentally
    A global understanding of the ‘causes’ of an outcome can drive
    significant business/product changes
    This problem has not received much attention in the machine
    learning literature

  9. Let’s Look At a Toy Example

  10. Logistic Regression Example

  11. Raw Data
    ##   Survived Pclass Sex Age
    ## 1  class_0      3   2  22
    ## 2  class_1      1   1  38
    ## 3  class_1      3   1  26
    ## 4  class_1      1   1  35
    ## 5  class_0      3   2  35
    ## 7  class_0      1   2  54

  12. Logistic Regression
    ##
    ## Call:
    ## NULL
    ##
    ## Deviance Residuals:
    ##     Min       1Q   Median       3Q      Max
    ## -2.7270  -0.6799  -0.3947   0.6483   2.4668
    ##
    ## Coefficients:
    ##              Estimate Std. Error z value Pr(>|z|)
    ## (Intercept)  7.578137   0.618091  12.261  < 2e-16 ***
    ## Pclass      -1.288545   0.139259  -9.253  < 2e-16 ***
    ## Sex         -2.522131   0.207283 -12.168  < 2e-16 ***
    ## Age         -0.036929   0.007628  -4.841 1.29e-06 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
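
As a worked reading of the coefficients above, this Python sketch computes the predicted probability for the first passenger in the raw data (Pclass 3, Sex 2, Age 22). The name `predict_survival` is illustrative, and the sketch assumes the modeled outcome is class_1 (survived), which the slides do not state explicitly.

```python
import math

# Coefficients copied from the logistic regression output above.
COEF = {"intercept": 7.578137, "Pclass": -1.288545, "Sex": -2.522131, "Age": -0.036929}

def predict_survival(pclass, sex, age):
    """Return the predicted probability of the modeled class (assumed: survived)."""
    log_odds = (COEF["intercept"] + COEF["Pclass"] * pclass
                + COEF["Sex"] * sex + COEF["Age"] * age)
    # The inverse logit maps log-odds to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

# First row of the raw data: third class, Sex coded 2, 22 years old.
p = predict_survival(pclass=3, sex=2, age=22)
print(round(p, 3))
```

The log-odds for this passenger come out near -2.14, which corresponds to a probability of roughly 0.10.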

  13. Interaction terms?
    ##   Sex Pclass   n  survived
    ## 1   1      1  85 0.9647059
    ## 2   1      2  74 0.9189189
    ## 3   1      3 102 0.4607843
    ## 4   2      1 101 0.3960396
    ## 5   2      2  99 0.1515152
    ## 6   2      3 253 0.1501976
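
One way to see why this table raises the question of interaction terms: a logistic model with no Sex x Pclass term implies the same log-odds shift between classes for both sexes. The sketch below, a rough check that ignores Age, compares that shift in the observed rates. The names `rates`, `logit`, and `class_effect` are illustrative.

```python
import math

# Observed survival rates by (Sex, Pclass), copied from the table above.
rates = {
    (1, 1): 0.9647059, (1, 2): 0.9189189, (1, 3): 0.4607843,
    (2, 1): 0.3960396, (2, 2): 0.1515152, (2, 3): 0.1501976,
}

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

# Change in log-odds when moving from first to third class, within each sex.
class_effect = {
    sex: logit(rates[(sex, 3)]) - logit(rates[(sex, 1)]) for sex in (1, 2)
}
print(class_effect)
```

The drop from first to third class is much larger in log-odds for Sex 1 than for Sex 2, which a purely additive model cannot reproduce.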

  14. Lime (Local Interpretable Model-agnostic Explanations)

  15. Lime (Local Interpretable Model-agnostic Explanations)
    Mainly created for images and text
    Model agnostic
    Focus on one observation (x) at a time
    Sample other observations (z) weighted by distance to x
    Compute f(z) (The predicted outcome)
    Select K features with LASSO then compute least squares
    Coefficients from LS are ‘local effects’
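
The steps above can be sketched in plain Python. This is a simplified illustration, not the `lime` package's API: it treats every feature as continuous, skips the LASSO feature-selection step, and uses the toy logistic regression as the black box. The names `black_box`, `solve`, and `local_effects`, the Gaussian perturbation scheme, and the kernel width are all assumptions made for the example.

```python
import math
import random

def black_box(z):
    """Stand-in model f: the logistic regression from the toy example."""
    pclass, sex, age = z
    log_odds = 7.578137 - 1.288545 * pclass - 2.522131 * sex - 0.036929 * age
    return 1.0 / (1.0 + math.exp(-log_odds))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def local_effects(f, x, scales, n_samples=2000, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate to f around x."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Perturb x in standardized units so distances are comparable across features.
        step = [rng.gauss(0.0, 1.0) for _ in x]
        z = [xi + di * si for xi, di, si in zip(x, step, scales)]
        dist2 = sum(d * d for d in step)
        rows.append([1.0] + z)          # leading 1.0 is the intercept column
        targets.append(f(z))            # f(z): the black-box prediction
        weights.append(math.exp(-dist2 / kernel_width ** 2))
    k = len(x) + 1
    # Weighted normal equations: (X' W X) beta = X' W y.
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets))
            for i in range(k)]
    beta = solve(xtwx, xtwy)
    return beta[1:]  # drop the intercept; the rest are the local effects

# Explain the first passenger in the raw data: (Pclass, Sex, Age).
effects = local_effects(black_box, x=[3, 2, 22], scales=[0.5, 0.5, 10.0])
print(effects)
```

The returned coefficients are on the probability scale and describe the model only in the neighborhood of the chosen passenger; a different `x`, `scales`, or `kernel_width` gives a different local picture.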

  16. Local Interpretability Example

  17. Titanic Effects (Men)

  18. Titanic Effects (Women)

  19. Shortcomings of Lime
    Good out-of-the-box solution that requires little thought
    Doesn’t allow control over local space
    Need to center/scale features for distance calculation
    Binary/ordinal features can have different effects when moved up versus down
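
To illustrate the centering/scaling point, the sketch below compares Euclidean distances before and after standardizing the six passengers from the raw data. The helper names `standardize` and `distance` are illustrative, and standardizing to unit variance is only one of several reasonable scaling choices.

```python
import math
from statistics import mean, pstdev

# (Pclass, Sex, Age) for the six passengers shown in the raw data.
passengers = [(3, 2, 22), (1, 1, 38), (3, 1, 26), (1, 1, 35), (3, 2, 35), (1, 2, 54)]

def standardize(rows):
    """Center each column at 0 and scale it to unit standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

scaled = standardize(passengers)

# Passenger 0 vs 4: same class and sex, ages 22 and 35.
# Passenger 0 vs 2: same class, different sex, ages 22 and 26.
raw_gap = (distance(passengers[0], passengers[4]), distance(passengers[0], passengers[2]))
scaled_gap = (distance(scaled[0], scaled[4]), distance(scaled[0], scaled[2]))
print(raw_gap, scaled_gap)
```

On the raw scale Age dominates, so the passenger of a different sex looks like the nearer neighbor; after standardizing, the ordering flips.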

  20. ICEbox
    For one feature, predict the likelihood of survival across the
    entire range of possible values for that feature
    Generalization of partial-dependence plots
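
A minimal sketch of the idea, written in plain Python rather than with the ICEbox R package: each passenger gets one curve, traced by varying Pclass over its possible values while holding the other features fixed. The toy logistic regression stands in for the model, and the names `predict`, `ice_curves`, and `pdp` are illustrative.

```python
import math

def predict(pclass, sex, age):
    """Predicted probability from the toy logistic regression coefficients."""
    log_odds = 7.578137 - 1.288545 * pclass - 2.522131 * sex - 0.036929 * age
    return 1.0 / (1.0 + math.exp(-log_odds))

# (Pclass, Sex, Age) for the six passengers shown in the raw data.
passengers = [(3, 2, 22), (1, 1, 38), (3, 1, 26), (1, 1, 35), (3, 2, 35), (1, 2, 54)]
grid = [1, 2, 3]  # every value Pclass can take

# One ICE curve per passenger: vary Pclass over the grid, hold Sex and Age fixed.
ice_curves = [[predict(g, sex, age) for g in grid] for (_, sex, age) in passengers]

# Averaging the curves pointwise gives the partial-dependence curve.
pdp = [sum(curve[i] for curve in ice_curves) / len(ice_curves)
       for i in range(len(grid))]

for curve in ice_curves:
    print([round(v, 3) for v in curve])
print("PDP:", [round(v, 3) for v in pdp])
```

With a logistic regression the curves are parallel on the log-odds scale, so the individual curves add little beyond the average; they become more informative when the model is non-parametric and the curves can cross or diverge.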

  21. Gender
    [Figure: ICE plot of partial log-odds against Sex]

  22. Class
    [Figure: ICE plot of partial log-odds against Pclass]

  23. My Simulation Ideas (Distribution of effects)

  24. My Simulation Ideas (Distribution of effects)
    Local makes sense
    Control the changes
    BUT you can generalize up to the entire population
    Similar to average partial effects in econometrics
    HELP!
    [email protected]
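
One possible reading of the simulation idea, offered as an assumption rather than the speaker's actual procedure: apply the same controlled change to every observation, record how each prediction moves, then inspect the distribution of those local effects and their average. The function `effect_of_change` and the choice of a ten-year Age shift are hypothetical.

```python
import math
from statistics import mean

def predict(pclass, sex, age):
    """Predicted probability from the toy logistic regression coefficients."""
    log_odds = 7.578137 - 1.288545 * pclass - 2.522131 * sex - 0.036929 * age
    return 1.0 / (1.0 + math.exp(-log_odds))

# (Pclass, Sex, Age) for the six passengers shown in the raw data.
passengers = [(3, 2, 22), (1, 1, 38), (3, 1, 26), (1, 1, 35), (3, 2, 35), (1, 2, 54)]

def effect_of_change(row, feature_index, delta):
    """Change in predicted probability when one feature is shifted by delta."""
    changed = list(row)
    changed[feature_index] += delta
    return predict(*changed) - predict(*row)

# Controlled change: make every passenger ten years older (Age is index 2).
effects = [effect_of_change(p, 2, 10) for p in passengers]

# The list is the distribution of local effects; its mean plays the role of
# an average partial effect over this small sample.
average_effect = mean(effects)
print([round(e, 3) for e in effects], round(average_effect, 3))
```

Even with a simple logistic model the per-passenger effects differ, because the same shift in log-odds moves the probability more for passengers whose predictions are near 0.5.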

  25. https://www.oreilly.com/ideas/ideas-on-interpreting-machine-learning
    https://github.com/marcotcr/lime
    https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-expl
    http://marcotcr.github.io/lime/tutorials/Tutorial%20-%20continuous%20and%20categorical%20features.html
    https://arxiv.org/pdf/1602.04938.pdf
    http://scweiss.blogspot.com/2015/12/beyond-beta-relationships-between.html
    https://cran.r-project.org/web/packages/ICEbox/ICEbox.pdf
    https://arxiv.org/pdf/1309.6392.pdf
    https://stats.stackexchange.com/questions/73449/average-partial-effects
