data, in order to make predictions or inferences.
๏ A model is a mapping f : X → Y
๏ The internal workings of a black-box model are opaque, e.g. neural networks
๏ The internal workings of a white-box model are available for inspection, e.g. linear regression, logistic regression
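The black-box versus white-box distinction can be illustrated with a small sketch. The class `LinearModel` and the helper `make_black_box` are hypothetical names introduced here for illustration only. The "black box" is just a closure that hides its parameters, standing in for a model such as a neural network.

```python
from typing import Callable, Sequence

# A model is a mapping f : X -> Y; here X is a list of floats and Y is a float.
Model = Callable[[Sequence[float]], float]


class LinearModel:
    """White-box model: weights and bias are open to inspection."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights = list(weights)
        self.bias = bias

    def __call__(self, x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias


def make_black_box() -> Model:
    """Black-box model: callers can only query inputs and observe outputs."""
    hidden = LinearModel([2.0, -1.0], 0.5)  # parameters hidden inside the closure
    return lambda x: max(0.0, hidden(x)) ** 2


white = LinearModel([2.0, -1.0], 0.5)
black = make_black_box()

print("white-box weights:", white.weights, "bias:", white.bias)
print("white-box prediction:", white([1.0, 1.0]))
print("black-box prediction:", black([1.0, 1.0]))
```

Both objects are mappings from inputs to outputs, but only the white-box model exposes the parameters that produce its prediction.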
understand the cause of a decision”
Miller, Tim. “Explanation in artificial intelligence: Insights from the social sciences.” arXiv preprint arXiv:1706.07269 (2017)
“Interpretability is the degree to which a human can consistently predict the model’s result”
Kim, Been, Rajiv Khanna, and Oluwasanmi O. Koyejo. “Examples are not enough, learn to criticize! Criticism for interpretability.” Advances in Neural Information Processing Systems (2016)
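to explain the individual predictions of a black-box model.
๏ The local surrogate model should have local fidelity: it should approximate the black-box model well in the neighbourhood of the instance being explained.
Source: https://github.com/marcotcr/lime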
model being explained ($f$)
๏ $g \in G$ is an interpretable model
๏ $\mathcal{L}(f, g, \pi_x)$ measures how unfaithful $g$ is in approximating $f$ in the locality defined by $\pi_x$
๏ $\Omega(g)$ measures the complexity of $g$
๏ There is a fidelity-interpretability trade-off

$$\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)$$
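A minimal sketch of the local surrogate objective for a one-dimensional input follows. It is an illustration under stated assumptions, not the implementation in the lime package. The names `black_box`, `proximity` and `explain_locally` are hypothetical. The loss is assumed to be a proximity-weighted squared error, and G is restricted to straight lines, so the complexity term is the same for every candidate and drops out of the minimisation.

```python
import math
import random
from typing import Callable, Tuple


def black_box(x: float) -> float:
    """Hypothetical stand-in for the opaque model f."""
    return math.sin(x) + 0.1 * x * x


def proximity(x0: float, z: float, width: float = 0.5) -> float:
    """pi_x: exponential kernel that gives samples near x0 larger weight."""
    return math.exp(-((z - x0) ** 2) / (width ** 2))


def explain_locally(
    f: Callable[[float], float],
    x0: float,
    n_samples: int = 500,
    width: float = 0.5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Fit g(z) = intercept + slope * z by weighted least squares around x0.

    Minimises sum_z pi_x(z) * (f(z) - g(z))**2 over perturbed samples z.
    """
    rng = random.Random(seed)
    zs = [x0 + rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # perturbations
    ws = [proximity(x0, z, width) for z in zs]                 # locality weights
    ys = [f(z) for z in zs]                                    # black-box queries

    total = sum(ws)
    z_mean = sum(w * z for w, z in zip(ws, zs)) / total
    y_mean = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (z - z_mean) * (y - y_mean) for w, z, y in zip(ws, zs, ys))
    var = sum(w * (z - z_mean) ** 2 for w, z in zip(ws, zs))

    slope = cov / var
    intercept = y_mean - slope * z_mean
    return intercept, slope


intercept, slope = explain_locally(black_box, x0=0.0)
print(f"local surrogate near x0=0: g(z) = {intercept:.3f} + {slope:.3f} * z")
```

The fitted slope serves as the explanation: it describes how the black-box output changes with the input near the chosen instance, and it is only expected to be faithful inside the region where the kernel weights are non-negligible.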