
Black Box Machine Learning: Complexity and Trust

Modern ML algorithms are achieving remarkable results thanks to the continuous development of deeper architectures that can identify complex patterns closer to human understanding. However, these results come at a price: understanding ML predictions is becoming increasingly difficult and, in some cases, even impossible. The problem is made more pressing by regulatory requirements that often explicitly demand well-formed and understandable explanations for automatic decisions made by ML-powered processes. The need for ML arises when you know the questions and the answers but you don't know an easy way to get from one to the other.

Massimo Belloni

July 08, 2021
Transcript

  1. Black Box Machine Learning: Complexity and Trust
     Massimo Belloni - [email protected]
     Dataiku Meetup // 8th July 2021, 5PM BST
     LinkedIn: in/massibelloni/ | Medium: @massibelloni
  2. Massimo Belloni, Senior Data Scientist @ Bumble
     • MLOps, NLP and miscellaneous
     • Interests in philosophy of science, consciousness, Strong vs Weak AI
       ◦ Coding consciousness
       ◦ Interpretability and trust
     • MSc Computer Science and Engineering (Politecnico di Milano)
  3. [Figure: a model classifying a cat image, outputting dog 0.98 / cat 0.02]
     During training we change the model's weights in order to maximise performance on the training set, for which we know the labels. If the model generalises well, we are confident enough to use those weights to predict on unseen images.
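The training idea above can be sketched with a toy one-feature logistic regression trained by gradient descent (a hypothetical stand-in for the slide's image classifier, not its actual model):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy labelled training set (stand-ins for "cat"/"dog" images):
# class 0 clusters around -2, class 1 around +2.
rng = random.Random(42)
train = [(rng.gauss(-2, 0.5), 0) for _ in range(50)] + \
        [(rng.gauss(+2, 0.5), 1) for _ in range(50)]

# During training we change the weights to maximise performance
# (here: minimise log-loss) on the labelled training set.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    for x, y in train:
        p = sigmoid(w * x + b)
        w -= lr * (p - y) * x   # gradient of log-loss w.r.t. w
        b -= lr * (p - y)       # gradient of log-loss w.r.t. b

# If the model generalises, the learned weights also work on unseen inputs.
unseen_positive = sigmoid(w * 2.5 + b)
unseen_negative = sigmoid(w * -2.5 + b)
```

The same loop, scaled up to millions of weights, is what makes deep models powerful and, at the same time, hard to inspect.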
  4. Task complexity for a human vs. task complexity for a machine:
     • Easy for humans, easy for machines: Arithmetic, Calculus, ...
     • Easy for humans, hard for machines: Natural Language Understanding, Image Classification
     • Hard for humans, easy for machines: Fraud Detection, Credit Risk Modelling, ...
     • Hard for humans, hard for machines: Generating/summarising knowledge/art, elaborating complex strategies, ...
  5. What does interpretability actually mean?
     "The degree to which a human can understand the cause of a decision." [1]
     • A model is more interpretable than another model if its decisions are easier for a human to comprehend than decisions from the other model. [2]
     • A model is as interpretable as the quality of the explanations given for its predictions.
     • More complex models often require more complex interpretation techniques.
     [1] Miller, Tim. "Explanation in artificial intelligence: Insights from the social sciences."
     [2] Molnar, Christoph. "Interpretable Machine Learning."
  6. Why do we need interpretability?
     • Trust: if we know how the model thinks, we tend to trust its performance more before shipping to production.
     • Extracting knowledge: learning the model's weights and reasoning is the task itself. Humans (or natural events) are able to label a dataset but need to find the reasons or rules behind the decisions.
     • Regulations: in highly regulated sectors (e.g. banking, finance) we must provide users with the reasons behind an automated decision. Accuracy of the prediction isn't enough.
  7. [Figure: model power (task complexity for a machine) vs. explainability of the predictions]
     From most explainable to most powerful: linear models, decision trees, random forests, XGBoost (et al.), shallow NNs, Deep Learning.
     In every case, even the most complex model is still a deterministic function! No black magic here :)
  8. Inherently interpretable models
     • Logistic Regression: the higher the value of a coefficient b, the stronger the correlation between the corresponding feature and the output probability.
     • Decision Tree: a sequence of rules (if x_1 == A ..., if x_n == B ...); the closer a feature is to the root, the more relevant it is.
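The "read the coefficients" point can be made concrete: in logistic regression, exp(b_i) is exactly the factor by which the odds of the positive class multiply when feature i increases by one unit. A minimal sketch with hypothetical, made-up coefficients:

```python
import math

def predict_proba(weights, bias, x):
    """Logistic regression: probability of the positive class."""
    z = bias + sum(w_i * x_i for w_i, x_i in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Hypothetical fitted coefficients for three features.
weights = [1.2, -0.8, 0.05]
bias = -0.3

# Increase feature 0 by one unit and compare the odds before/after:
x      = [1.0, 0.5, 2.0]
x_plus = [2.0, 0.5, 2.0]   # feature 0 increased by one unit
ratio = (odds(predict_proba(weights, bias, x_plus))
         / odds(predict_proba(weights, bias, x)))
# ratio equals exp(weights[0]) regardless of the starting point x.
```

This direct mapping from weight to effect is exactly what deeper models lose.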
  9. A complex approach: LIME
     • Starting from the input sample, a new dataset consisting of perturbed replicas and their black-box predictions is generated.
     • A new interpretable model (e.g. linear, tree, ...) is trained on the generated dataset and its predictions, with samples weighted by their proximity to the original input.
     • The learned model should be a good approximation of the machine learning model's predictions locally, but it does not have to be a good global approximation.
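The three steps above can be sketched in one dimension; this is a toy illustration of the LIME idea (made-up black box, weighted least-squares surrogate), not the actual `lime` library, which handles tabular, text and image data with far more machinery:

```python
import math
import random

def black_box(x):
    # Opaque model: a nonlinear logit we pretend we cannot inspect.
    z = x ** 3 - 2 * x
    return 1.0 / (1.0 + math.exp(-z))

def lime_1d(f, x0, n=500, spread=0.5, kernel_width=0.25, seed=0):
    """Fit a local linear surrogate to f around x0 (LIME-style)."""
    rng = random.Random(seed)
    # 1) perturbed replicas of the input, labelled by the black box
    xs = [x0 + rng.uniform(-spread, spread) for _ in range(n)]
    ys = [f(x) for x in xs]
    # 2) proximity weights: replicas near x0 matter more
    ws = [math.exp(-((x - x0) ** 2) / kernel_width ** 2) for x in xs]
    # 3) weighted least-squares line through the (replica, prediction) pairs
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    slope = (sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
             / sum(w * (x - mx) ** 2 for w, x in zip(ws, xs)))
    intercept = my - slope * mx
    return slope, intercept

slope, intercept = lime_1d(black_box, x0=1.5)
```

The surrogate's slope explains the prediction at x0 ("increasing this feature increases the score"), while saying nothing about the black box far from x0.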
  10. ML Interpretability Techniques
      • Do we need to know how the model works?
        ◦ Yes: intrinsic, model-specific methods that analyse the model's architecture/weights.
        ◦ No: post hoc, model-agnostic methods that analyse the inference process after training.
      • Feature summary statistics: obtain an understanding of which input features contributed to a prediction / have more importance.
      • Data points: extract data points from the input space that help understand a prediction (e.g. counterfactual explanations). [1]
      [1] Piccinini, Belloni, Della Valle. "Enhancing Fraud Detection Through Interpretable Machine Learning."
  11. Do we trust the model more?
  12. Measuring accuracy
      • Training set: used to fit the model's weights.
      • Validation set: needed to make decisions about hyperparameters (etc.).
      • Test set: finally, an unseen set of samples is used to check the performance of the trained model (e.g. Accuracy = 95%).
      Training and validation sets are usually merged together to train the final model used for production (using the hyperparameters chosen as above!).
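The split described above can be sketched as follows (a minimal illustration; the function name and 70/15/15 fractions are illustrative, not from the slides):

```python
import random

def three_way_split(samples, val_pct=15, test_pct=15, seed=0):
    """Shuffle and split samples into train / validation / test sets."""
    rows = list(samples)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_test = n * test_pct // 100
    n_val = n * val_pct // 100
    test = rows[:n_test]                  # held out until the very end
    val = rows[n_test:n_test + n_val]     # used for hyperparameter decisions
    train = rows[n_test + n_val:]         # used to fit the weights
    return train, val, test

train, val, test = three_way_split(range(1000))

# For the final production model, train and validation are merged,
# reusing the hyperparameters chosen on the validation set.
final_train = train + val
```

The one invariant that matters for trust: the test set must never influence any decision made before the final measurement.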
  13. What can go wrong?
      • Wrong implementation of the classical ML pipeline (data leakage, etc.) leading to an overly optimistic performance estimate.
      • Test set not representative of the production load:
        ◦ Unbalanced target (= wrongly estimated precision)
        ◦ Different input distribution
        ◦ ...
      • Bonus: ill-posed problem, messy definition of the targets (common in NLP tasks).
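The "unbalanced target" bullet is worth a worked example with made-up numbers: on a set with 1% fraud, a model that never flags anything still scores 99% accuracy while catching zero fraud.

```python
# 1,000 transactions with 1% fraud (1 = fraud, 0 = legit),
# and a degenerate model that always predicts "legit".
labels = [1] * 10 + [0] * 990
preds = [0] * 1000

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
caught = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
recall = caught / sum(labels)
# accuracy is 0.99, yet recall on the fraud class is 0.0.
```

This is why a headline accuracy on an unrepresentative test set says little about how the model will behave in production.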
  14. Wrongly estimating the performance affects trust in a model significantly more than the inability to interpret its results!
  15. TL;DR
      • Interpreting a task isn't inherently possible for all of them, since humans can struggle to explain why they make decisions in the first place.
      • ML interpretability is a fast-growing domain that definitely helps to understand what's going on inside black boxes and to spot major mistakes or biases.
      • Trust in ML models and AI is more a matter of well-executed validation pipelines with reliable datasets than the ability to locally interpret some cherry-picked cases.