Without operationalization and explainability, data science projects inevitably fail. In this presentation we discuss how to tackle both of these problems.
• Signals from across enterprise systems of record: Support, Engineering, Product, Sales, Email, Slack, Discussion Forums • Deep NLP to analyze unstructured data • Ensemble models consume raw signals • Escalation prediction • Backlog prioritization • Actionable signals routed to the appropriate owners
• Everyone hypes ML • Everyone has an ML project • But ML certainly hasn’t yet delivered on its promise… • Why not? • Operationalization • Explainability • [and more]
but no one is around to use it…” • In the commercial world, it’s crucial to • Affect behavior • Improve decisions • Drive outcomes • Deliver ROI • Many great DS projects result in nothing more than a PowerPoint • Failure to operationalize
Operationalization: the first of many hurdles • Applied ML • Application of existing algorithms • Needs a partnership between engineers and data scientists • Operationalized solutions need ongoing maintenance • Models may need retraining • A simple, understandable model in production often delivers benefits
• Recent explosion of solutions for basic ops • Standards languishing (PMML, PFA) • How do I integrate into users’ existing tooling? • How does my prediction API get queried? • How much fragile glue logic is required? • How do I build workflows around my predictions?
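As one concrete (and assumed) illustration of the glue logic involved, the sketch below wraps an already-trained model behind an HTTP prediction endpoint with Flask; the model file name, port, and payload shape are placeholders rather than anything from the talk.

```python
# Minimal sketch of a prediction API (assumed setup: a scikit-learn model
# already serialized to "model.pkl", clients POSTing JSON feature vectors).
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:   # hypothetical artifact from a training job
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                 # e.g. {"features": [[5.1, 3.5, 1.4, 0.2]]}
    predictions = model.predict(payload["features"])
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    app.run(port=8080)
```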
• When a user sees a prediction, how should they respond? • What is my engine warning light telling me? • Sometimes we intuitively know: it is predicted to rain tomorrow • Sometimes the end-user doesn’t necessarily need to care: case assignment • Sometimes understanding is fundamental to driving correct actions: escalation prediction, intelligent backlog prioritization
Explainability can be a regulatory requirement • GDPR requirements • Debugging: why did the model make this prediction? • Validation: is my model doing what I think it is? • Model simplification: many of these features don’t improve fidelity • Actions and workflows: turn-key workflows
AutoML: tooling to accelerate • Feature engineering • Model selection • Hyper-parameter tuning • Model simplification • Many effective OSS solutions • For many problems, automated solutions are adequate • How does AutoML impact explainability?
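As a hedged illustration of what automated model selection and hyper-parameter tuning look like in OSS tooling, here is a minimal scikit-learn sketch; the dataset, model, and parameter grid are arbitrary examples, not recommendations from the talk.

```python
# Illustrative automated hyper-parameter search with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

param_grid = {"n_estimators": [100, 300], "max_depth": [3, 6, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```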
Explainability is affected by the choice of model • Complex models can deliver higher-fidelity predictions • But this often comes at the cost of explainability • A variety of model-agnostic techniques can provide insights into arbitrarily complex models
Global explainability • How does the model behave globally? • General information about which features are the most important • Individual observation explainability • Inspect an individual prediction of the model • Determine why the model made the decision it made
Some models are inherently interpretable • Linear regression • Logistic regression • Decision tree • GLM • GAM • Linear regression: y = β0 + β1x1 + … + βpxp; predictions are a weighted sum of the features, making them understandable • Logistic regression: y = 1 / (1 + exp(−(β0 + β1x1 + … + βpxp))); a change in a feature by one unit changes the odds ratio (multiplicative) by a factor of exp(βj)
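A minimal sketch of reading logistic regression coefficients as odds-ratio multipliers with scikit-learn; the dataset and the scaling choice are illustrative assumptions.

```python
# Fit a logistic regression and interpret each coefficient as an odds-ratio
# multiplier exp(beta_j). Dataset is illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(data.data, data.target)

coefs = model.named_steps["logisticregression"].coef_[0]
for name, beta in zip(data.feature_names, coefs):
    # A one-unit increase in the (scaled) feature multiplies the odds of the
    # positive class by exp(beta).
    print(f"{name}: odds ratio = {np.exp(beta):.2f}")
```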
Feature importance • The Good • High-level view of model behavior • Provides a good model sanity check • Easy to compute • The Bad • Limited global information • Beware correlated features • Concerns with model-specific methods
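One common model-agnostic option for this is permutation importance; a sketch with scikit-learn follows, where the dataset and model are illustrative.

```python
# Model-agnostic permutation importance: shuffle one feature at a time and
# measure the drop in held-out score. Dataset and model are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = result.importances_mean.argsort()[::-1]
for idx in ranking[:5]:
    print(f"{X.columns[idx]}: {result.importances_mean[idx]:.3f}")
```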
Partial dependence plots (PDP) • The Good • Show how a feature affects predictions on average • Richer information compared with feature importance • The Bad • Beware correlated features • Average marginal plots can hide details • E.g. features displaying both negative and positive associations with the target
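A sketch of producing a PDP with scikit-learn; the estimator, dataset, and chosen features are illustrative.

```python
# Partial dependence: the average predicted response as one feature varies,
# marginalizing over the others. Estimator and features are illustrative.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

PartialDependenceDisplay.from_estimator(model, X, features=["bmi", "bp"])
plt.show()
```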
Individual conditional expectation (ICE) plots • Like a PDP, but displaying each individual observation • One line per instance shows how that instance’s prediction changes when a feature changes • Interesting insights won’t be lost because of the averaging inherent in the PDP
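The same scikit-learn API can overlay ICE curves on the averaged PDP; a sketch reusing the same illustrative model and data as the PDP example above.

```python
# ICE: one curve per instance, overlaid on the averaged PDP ("both"), so
# effects that differ across instances are not hidden by averaging.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

PartialDependenceDisplay.from_estimator(model, X, features=["bmi"], kind="both")
plt.show()
```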
Surrogate models • Train an interpretable model on the predictions of the black box model • The surrogate needs to approximate the predictions of the black box as accurately as possible • Yet the surrogate must be interpretable • Train a single global surrogate to probe high-level behavior • Train local surrogates to understand individual predictions • Important to understand the fidelity of the approximation • R² measure
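A sketch of a global surrogate: fit a shallow decision tree to the black box's predictions and report the R² fidelity; the models and data are illustrative choices.

```python
# Global surrogate: train a shallow decision tree to mimic the black box's
# predictions, then measure how faithfully it does so (R^2 against the
# black-box outputs). Model choices are illustrative.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)

black_box = RandomForestRegressor(random_state=0).fit(X, y)
black_box_preds = black_box.predict(X)

surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, black_box_preds)
fidelity = r2_score(black_box_preds, surrogate.predict(X))
print(f"Surrogate fidelity (R^2 vs. black box): {fidelity:.2f}")
```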
LIME* • Simple linear models are easily explainable • Complex models are (approximately) locally linear • Basic flow • Probe the model around an observation using slight perturbations in feature values • Train a linear model on the results • Use the linear model to understand the features driving the prediction *https://github.com/marcotcr/lime
1. Create permuted copies of the observation of interest 2. Make predictions for the permuted observations using the black box model 3. Compute the distance of each permutation from the original observation 4. Convert the distances to similarity scores • Exponential kernel of a user-defined width 5. Fit a linear (ridge) model to the permuted data • Permuted data is further modified before training • Permuted data is weighted by its similarity to the original observation • Probabilities form the outcomes when explaining classifiers 6. Feature weights from the linear model drive explanations of the complex model’s local behavior
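A sketch of this flow with the lime package's tabular explainer; the dataset and classifier are illustrative stand-ins.

```python
# LIME on tabular data: perturb the instance, fit a locally weighted linear
# model, and report which features drive this one prediction.
# Dataset and classifier are illustrative.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)
explanation = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
print(explanation.as_list())   # (feature condition, local linear weight) pairs
```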
LIME goes beyond tabular data • Text • Perturb input by randomly removing words from the observation text • Train an interpretable model on the permuted observations • Uses cosine similarity to compute similarity scores • Leverage the model to understand the words driving the black box prediction • Images • Perturb the image via a superpixel construct • Superpixels defined using scikit-image segmentation methods • 'quickshift', 'slic', 'felzenszwalb'
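A sketch of the text variant using LimeTextExplainer; the pipeline and newsgroup categories are illustrative assumptions.

```python
# LIME for text: words are randomly removed from the document and a local
# linear model learns which words drive the black-box prediction.
# The pipeline and categories are illustrative.
from lime.lime_text import LimeTextExplainer
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

categories = ["sci.space", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=categories)

pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(train.data, train.target)

explainer = LimeTextExplainer(class_names=categories)
explanation = explainer.explain_instance(
    train.data[0], pipeline.predict_proba, num_features=6
)
print(explanation.as_list())   # words and their local weights
```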
LIME is not infallible • Don’t trust it blindly • Prediction quality illustrates how well it approximates the black box • Low quality → the explanation shouldn’t be trusted • Many tweakable parameters influence outcomes • Big open question • What constitutes local?
SHAP: Shapley values from coalitional game theory • Determine how much each player (aka feature) in a collaborative game has contributed to success • Computationally intensive for real-world models & data sets • SHAP leverages approximations to control compute costs • Local linear models to estimate SHAP values for any model • Shapley-derived weighting & sampling • A “high-speed” exact method for tree ensembles
SHAP shows the contribution of each feature to the model output • In a force plot, features “push” the model output from the base value to the final prediction • Features pushing the prediction higher are shown in red • Features pushing the prediction lower are shown in blue
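A sketch with the shap package: the fast exact TreeExplainer for a tree ensemble plus a force plot for one prediction; the model, data, regression setting, and the legacy shap_values API are assumptions, not part of the original talk.

```python
# SHAP on a tree ensemble: TreeExplainer gives per-feature contributions; a
# force plot shows how they push one prediction away from the base value
# (red = higher, blue = lower). Model and data are illustrative.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # one row of contributions per observation

shap.initjs()                            # enables the interactive plot in notebooks
shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0])
```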
Scoped rules, referred to as anchors • An anchor explanation is a rule that sufficiently “anchors” the prediction locally • Changes to the rest of the instance’s feature values do not matter • For instances on which the anchor holds, the prediction is (almost) always the same • Explains the scope/coverage of the explanation • Provides clearer user understanding • https://github.com/marcotcr/anchor
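A sketch following the linked project's README; the dataset and model are illustrative, and the constructor arguments may differ between package versions.

```python
# Anchor explanation for a single prediction (anchor package, linked above).
# Dataset and model are illustrative stand-ins.
from anchor import anchor_tabular
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = anchor_tabular.AnchorTabularExplainer(
    list(data.target_names),    # class names
    list(data.feature_names),   # feature names
    data.data,                  # training data used to sample perturbations
)
exp = explainer.explain_instance(data.data[0], model.predict, threshold=0.95)

print("Anchor:", " AND ".join(exp.names()))  # the rule that "anchors" this prediction
print("Precision:", exp.precision())         # how often the rule yields the same prediction
print("Coverage:", exp.coverage())           # how much of the data the rule applies to
```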
Explanation techniques give info on the features/values driving the prediction • It is still necessary to explain what this means to the user in human terms • Especially true when feature engineering creates non-obvious features
Explanations matter for • Audit & accountability reasons • Confidence & acceptance • Understanding of appropriate responses/actions • Important to understand both global behavior and individual predictions • Simple models can be inherently interpretable • Black-box models offer benefits but introduce complexity • Open source tools exist for attempting to explain black-box models • Try using them! • But don’t trust them blindly