How to guarantee your machine learning model will fail on first contact with the real world.

Jesper Dramsch, PyData Global 2020

99.8 % Accuracy

Chihuahua or Blueberry Muffin? Real-World Applications are rarely 100%

Augmentations build better models, yet decrease metrics

Ignore the possibility of Overfitting and Data Leakage

Are you overfitting?

Is there Data Leakage? [1]
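One of the simplest leakage checks is to look for samples that appear in both the training and the test split. The sketch below assumes rows can be compared exactly; the rows and the helper name `overlapping_rows` are made up for illustration. It will not catch near-duplicates or leakage through features.

```python
def overlapping_rows(train_rows, test_rows):
    """Return the rows that occur in both splits (exact duplicates only)."""
    return set(map(tuple, train_rows)) & set(map(tuple, test_rows))


# Hypothetical feature rows; (2.0, 0.5) was split into both sets by accident.
train_rows = [(1.0, 0.1), (2.0, 0.5), (3.0, 0.9)]
test_rows = [(2.0, 0.5), (4.0, 0.2)]

leaked = overlapping_rows(train_rows, test_rows)
print(f"{len(leaked)} of {len(test_rows)} test rows also appear in training")
```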

Assume everything is IID

IID: Independent and Identically Distributed

The real world is rarely independent or identically distributed.

Did you account for class imbalances? [1]

Always use Accuracy

Imbalanced Metrics [1]
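A minimal sketch of why accuracy misleads on imbalanced data, assuming a made-up dataset with 998 negatives and 2 positives and a "model" that always predicts the majority class. Balanced accuracy (the mean of per-class recall) is used here as one possible alternative, not as the only suitable metric.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: 998 negatives, 2 positives.
y_true = [0] * 998 + [1] * 2
# A "model" that never predicts the rare class.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.3f}, balanced accuracy={bal_acc:.3f}")
```

The constant predictor reaches 99.8 % accuracy while its balanced accuracy is 0.5, which is no better than guessing.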

Cities Dataset for Semantic Segmentation [1]

Losses for Semantic Segmentation [1]

Collecting more Data is Better

A Good Data Scientist is Data Critical
• CERN throws away most of the data it collects at 25 GB/s [1]
• Geophysical data has to be reprocessed for many different use cases [2]
• Someone decides on social taxonomies. ImageNet has a class “loser / failure” for a person. [3]
• GPT-2 was trained on web pages linked from Reddit. Try asking it about Earth Science. [4]

Strategies That Work (Sometimes)
• Multiple Interpreters (Inter-interpreter)
• Repeat Interpretations (Intra-interpreter)
• Take Responsibility to Change Questionable Taxonomies
• Collect Representative Samples

Cross-Validation solves Everything

Cross-Validation to the rescue?

Class Imbalances call for Stratification
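A sketch of stratified fold assignment, assuming a hypothetical dataset with 90 negatives and 10 positives. Each class is dealt round-robin across the folds so that every fold keeps roughly the overall class ratio. Shuffling is left out to keep the example deterministic; scikit-learn's `StratifiedKFold` offers a maintained implementation of the same idea.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Assign each sample index to a fold so class ratios stay similar."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal the samples of each class round-robin across the folds.
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)
    return folds


# Hypothetical labels: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, n_folds=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print(positives_per_fold)
```

With a plain contiguous split of the same labels, four of the five folds would contain no positives at all.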

Cross-Validation for Time Series Data
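For time series, a random split lets the model train on the future and test on the past. The sketch below shows one common alternative, an expanding-window split in which every test block lies strictly after its training data. The function name and the parameters `n_splits` and `test_size` are illustrative assumptions; scikit-learn's `TimeSeriesSplit` follows a similar scheme.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train, test) index lists; training always precedes the test block."""
    for k in range(n_splits):
        test_end = n_samples - (n_splits - 1 - k) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for this many splits")
        yield list(range(test_start)), list(range(test_start, test_end))


# Ten time-ordered samples, three splits with two test samples each.
splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train, test in splits:
    print(f"train=0..{train[-1]}, test={test}")
```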

Cross-Validation for Spatial Data
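Neighbouring spatial samples tend to be correlated, so a random split places near-copies of the test points in the training set. One way to reduce this, sketched below under the assumption of 2D coordinates and a hypothetical `block_size`, is to assign whole grid blocks to folds. Blocking does not remove correlation across block borders; it only limits it.

```python
def spatial_block_folds(coords, block_size, n_folds):
    """Return a fold id per point; all points in one grid block share a fold."""
    def block_of(point):
        x, y = point
        return (int(x // block_size), int(y // block_size))

    blocks = sorted({block_of(p) for p in coords})
    block_to_fold = {block: i % n_folds for i, block in enumerate(blocks)}
    return [block_to_fold[block_of(p)] for p in coords]


# Hypothetical sample locations in arbitrary map units.
coords = [(0.5, 0.5), (0.7, 0.2), (3.1, 0.4), (3.9, 0.9), (0.2, 3.3), (3.5, 3.5)]
fold_ids = spatial_block_folds(coords, block_size=2.0, n_folds=2)
print(fold_ids)
```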

Are you Cross-Validating your data preparation? [6]
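Preprocessing that is fitted on the full dataset leaks test information into training. The sketch below uses standardization as an assumed example: the scaler statistics must come from the training fold only. The values are made up, with an outlier that lands in the test fold. In scikit-learn, wrapping the preprocessing and the model in a `Pipeline` before cross-validating has the same effect.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Estimate mean and standard deviation for standardization."""
    sigma = pstdev(values) or 1.0  # guard against a zero standard deviation
    return mean(values), sigma


def transform(values, scaler):
    """Standardize values with previously fitted statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values; the last one ends up in the test fold.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = values[:4], values[4:]

leaky_scaler = fit_scaler(values)  # wrong: statistics include the test fold
clean_scaler = fit_scaler(train)   # right: statistics from the training fold only

print("leaky mean:", leaky_scaler[0], "clean mean:", clean_scaler[0])
print("test value after clean scaling:", transform(test, clean_scaler))
```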

Even Cross-Validation has its Flaws [5]

Absolutely ignore Model Simplicity

News Item on AI for Earthquake Aftershock Prediction [8]

One Neuron outperforms a Deep Neural Network [9]

Can we Crash Test our Machine Learning?

Trust any Counter-Intuitive Results

Extraordinary Claims Require Extraordinary Evidence

Inferring the face of a person from their speech patterns surely is extraordinary [1]

“AI” hiring decisions directly from video [1]

Research into Bias in ML
• Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification - Buolamwini and Gebru [1]
• Critical Perspectives on Computer Vision - Denton [2]
• Excavating AI - Crawford and Paglen [3]
• Tutorial on Fairness Accountability Transparency and Ethics in Computer Vision at CVPR 2020 [4]
• The Uncanny Valley of ML - Andrews [5]
• Bias in Facial Recognition [6]

Subject Matter Experts have often forgotten more about a Subject than a Data Scientist has learned during a Project

Data can often be explained by many hypotheses. [1]

Explainability shows how a Machine Learning Model thinks

Post-Hoc Explainability will explain “Why?” even on wrong decisions made with 99% confidence [1]

Calibration of Classifiers [1]
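A classifier is calibrated when its stated confidence matches how often it is right. The sketch below bins predictions by confidence and compares mean confidence with accuracy per bin, then summarizes the gap as the expected calibration error (ECE). The data are made up: ten predictions at 99 % confidence, of which only six are correct. The choice of five bins is an arbitrary assumption.

```python
def reliability_bins(confidences, correct, n_bins=5):
    """Return (mean confidence, accuracy, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            mean_conf = sum(c for c, _ in members) / n
            accuracy = sum(ok for _, ok in members) / n
            rows.append((mean_conf, accuracy, n))
    return rows


def expected_calibration_error(confidences, correct, n_bins=5):
    """Count-weighted mean gap between confidence and accuracy."""
    total = len(confidences)
    rows = reliability_bins(confidences, correct, n_bins)
    return sum(n / total * abs(conf - acc) for conf, acc, n in rows)


# Hypothetical predictions: always 99 % confident, right 6 times out of 10.
confidences = [0.99] * 10
correct = [1] * 6 + [0] * 4

ece = expected_calibration_error(confidences, correct)
print(f"ECE = {ece:.2f}")
```

scikit-learn's `calibration_curve` produces the per-bin values for plotting a reliability diagram.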

SHAP Library for Machine Learning Explainability [1]

Interpretability: Explainable Forests, Linear Models, RuleFit
Explainability: SHAP, Partial Dependence Plots, LIME, Feature Importance

A Machine Learning Model can outperform your Assumptions and Baseline Data

Extracting information and establishing relationships is limiting the machine learning model.

Speedround: When your Machine Learning Model isn’t scoring perfectly, you can still Spice Up Your Results

It is uncomfortably common to hand-select “good” results [1]

It is uncomfortably common to overfit on benchmarks to “sell” a method [1]

Committing the “Inverse Crime” [1]

Measuring what’s easy to measure rather than meaningful

Main Takeaways
• Use nothing else but accuracy
• Under no circumstance spend extensive time on validation
• Blindly trust counter-intuitive results because the model converged
• Explainability is overrated but has all the answers
• Take all these points as gospel
