Slide 1

Slide 1 text

How to guarantee your machine learning model will fail on first contact with the real world
Jesper Dramsch, PyData Global 2020

Slide 2

Slide 2 text

99.8 % Accuracy

Slide 3

Slide 3 text

Chihuahua or Blueberry Muffin? Real-World Applications are rarely 100% accurate

Slide 4

Slide 4 text

Augmentations build better models, yet decrease metrics
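
As an illustration only (not code from the talk), a minimal sketch of a label-preserving augmentation in NumPy; the function name, array layout, and noise scale are assumptions:

import numpy as np

def augment_images(images, rng=None, noise_scale=0.01):
    # `images` is assumed to have shape (n_samples, height, width, channels).
    rng = np.random.default_rng() if rng is None else rng
    flipped = images[:, :, ::-1, :]                      # horizontal flip
    noisy = flipped + rng.normal(0.0, noise_scale, flipped.shape)
    return np.concatenate([images, noisy], axis=0)       # original + augmented copies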

Slide 5

Slide 5 text

Ignore the possibility of Overfitting and Data Leakage

Slide 6

Slide 6 text

Are you overfitting?
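
One quick check, sketched with scikit-learn: compare the score on the training data with a cross-validated score; a large gap hints at overfitting. Dataset and model here are placeholders, not from the talk:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

train_score = model.score(X, y)                       # score on data the model has seen
cv_score = cross_val_score(model, X, y, cv=5).mean()  # score on held-out folds
print(f"train: {train_score:.3f}  cross-val: {cv_score:.3f}  gap: {train_score - cv_score:.3f}")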

Slide 7

Slide 7 text

Is there Data Leakage? [1]
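
A common leak, sketched below with scikit-learn, is fitting preprocessing such as a scaler on the full dataset before splitting; the dataset and model are placeholders:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Leaky: fitting the scaler on ALL data lets test-set statistics leak into training.
#   X_scaled = StandardScaler().fit_transform(X)
#   X_train, X_test, y_train, y_test = train_test_split(X_scaled, y)

# Safer: split first, then fit the scaler on the training set only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)
print(model.score(scaler.transform(X_test), y_test))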

Slide 8

Slide 8 text

Assume everything is IID

Slide 9

Slide 9 text

IID: Independent and Identically Distributed

Slide 10

Slide 10 text

The real world is rarely independent or identically distributed.

Slide 11

Slide 11 text

Did you account for class imbalances? [1]

Slide 12

Slide 12 text

Always use Accuracy

Slide 13

Slide 13 text

Imbalanced Metrics [1]
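
A sketch of why accuracy misleads on imbalanced data: a classifier that always predicts the majority class scores high accuracy but shows no skill on balanced accuracy and F1. Synthetic data, illustrative only:

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# 95% of samples belong to class 0.
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# A "model" that always predicts the majority class.
pred = DummyClassifier(strategy="most_frequent").fit(X_train, y_train).predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))                    # ~0.95, looks great
print("balanced accuracy:", balanced_accuracy_score(y_test, pred))  # 0.5, no skill
print("f1:", f1_score(y_test, pred))                                # 0.0 on the minority class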

Slide 14

Slide 14 text

Cities Dataset for Semantic Segmentation [1]

Slide 15

Slide 15 text

Losses for Semantic Segmentation [1]

Slide 16

Slide 16 text

Collecting more Data is always Better

Slide 17

Slide 17 text

A Good Data Scientist is Data Critical
● CERN throws away most of its collected data at 25 GB/s [1]
● Geophysical data has to be reprocessed for many different use cases [2]
● Someone decides on social taxonomies: ImageNet assigns the class “loser / failure” to persons. [3]
● GPT-2 was trained on Reddit comments. Try asking it about Earth Science. [4]

Slide 18

Slide 18 text

Strategies That Work (Sometimes)
● Multiple Interpreters (Inter-interpreter)
● Repeat Interpretations (Intra-interpreter)
● Take Responsibility to Change Questionable Taxonomies
● Collect Representative Samples

Slide 19

Slide 19 text

Cross-Validation solves Everything

Slide 20

Slide 20 text

Cross-Validation to the rescue?

Slide 21

Slide 21 text

Class Imbalances call for Stratification
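
A minimal sketch of stratified cross-validation with scikit-learn, which keeps class proportions roughly equal in every fold; the data here is synthetic and illustrative:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)

# Each fold preserves the 90/10 class split instead of leaving it to chance.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="f1")
print("F1 per stratified fold:", np.round(scores, 3))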

Slide 22

Slide 22 text

Cross-Validation for Time Series Data
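
For time series, folds have to respect temporal order so the model never trains on the future; a minimal sketch with scikit-learn's TimeSeriesSplit:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)   # 12 time steps, oldest to newest
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    # Test indices always come after the training indices.
    print("train:", train_idx, "test:", test_idx)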

Slide 23

Slide 23 text

Cross-Validation for Spatial Data
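
One common approach, sketched here as an assumption rather than the talk's exact method, is to group samples into spatial blocks and keep whole blocks out of the training folds with GroupKFold:

import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))   # x/y locations of the samples
X = rng.normal(size=(200, 5))
y = rng.normal(size=200)

# Assign each sample to a coarse spatial tile so neighbours share a fold.
groups = (coords[:, 0] // 25).astype(int) * 4 + (coords[:, 1] // 25).astype(int)

for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups):
    # No tile appears in both the training and the test fold.
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])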

Slide 24

Slide 24 text

Are you Cross-Validating your data preparation? [6]
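
A sketch of cross-validating the data preparation itself: wrapping preprocessing and model in a scikit-learn Pipeline means every step is refit on each training fold only. Dataset and steps are placeholders:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling and feature selection happen inside each fold, so nothing leaks from the test folds.
pipeline = make_pipeline(StandardScaler(), SelectKBest(k=10), LogisticRegression(max_iter=1000))
print(cross_val_score(pipeline, X, y, cv=5).mean())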

Slide 25

Slide 25 text

Even Cross-Validation has its Flaws [5]

Slide 26

Slide 26 text

Absolutely ignore Model Simplicity

Slide 27

Slide 27 text

News Item on AI for Earthquake Aftershock Prediction [8]

Slide 28

Slide 28 text

One Neuron outperforms a Deep Neural Network [9]
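
Not the aftershock study's data or models, just an illustrative sketch: a logistic regression (essentially one neuron) compared against a small neural network under the same cross-validation:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

one_neuron = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
deep_net = make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0))

# Compare the simple baseline against the deeper model before trusting the latter.
print("one neuron:", cross_val_score(one_neuron, X, y, cv=5).mean())
print("deep net:  ", cross_val_score(deep_net, X, y, cv=5).mean())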

Slide 29

Slide 29 text

Can we Crash Test our Machine Learning?

Slide 30

Slide 30 text

Trust any Counter-Intuitive Results

Slide 31

Slide 31 text

Extraordinary Claims Require Extraordinary Evidence

Slide 32

Slide 32 text

Inferring a person’s face from their speech patterns surely is extraordinary [1]

Slide 33

Slide 33 text

“AI” hiring decisions directly from video [1]

Slide 34

Slide 34 text

Research into Bias in ML
[1] Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification - Buolamwini and Gebru
[2] Critical Perspectives on Computer Vision - Denton
[3] Excavating AI - Crawford and Paglen
[4] Tutorial on Fairness, Accountability, Transparency and Ethics in Computer Vision at CVPR 2020
[5] The Uncanny Valley of ML - Andrews
[6] Bias in Facial Recognition

Slide 35

Slide 35 text

Subject Matter Experts have often forgotten more about a Subject than a Data Scientist learns during a Project

Slide 36

Slide 36 text

Data can often be explained by many hypotheses. [1]

Slide 37

Slide 37 text

Explainability shows how a Machine Learning Model thinks

Slide 38

Slide 38 text

Post-Hoc Explainability will explain “Why?” even for wrong decisions made with 99% confidence [1]

Slide 39

Slide 39 text

Calibration of Classifiers [1]
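
A minimal scikit-learn sketch of inspecting and fixing calibration; the classifier and data are placeholders:

from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Wrap an uncalibrated classifier; probabilities are calibrated via cross-validation.
clf = CalibratedClassifierCV(SVC(), method="sigmoid", cv=5).fit(X_train, y_train)
prob = clf.predict_proba(X_test)[:, 1]

# Fraction of positives vs. mean predicted probability per bin; ideally these match.
frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=10)
print(list(zip(mean_pred.round(2), frac_pos.round(2))))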

Slide 40

Slide 40 text

SHAP Library for machine learning explainability [1]
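
A minimal sketch of the SHAP library on a tree model, assuming the shap package is installed; dataset and model are placeholders:

import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global summary: which features push predictions up or down, and by how much.
shap.summary_plot(shap_values, X)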

Slide 41

Slide 41 text

Interpretability: Explainable Forests, Linear Models, RuleFit
Explainability: SHAP, Partial Dependence Plots, LIME, Feature Importance
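
One of the simpler tools from that list, permutation feature importance, in a minimal scikit-learn sketch with placeholder data:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# How much does the validation score drop when each feature is shuffled?
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
ranking = sorted(zip(result.importances_mean, X.columns), reverse=True)
for importance, name in ranking[:5]:
    print(f"{name}: {importance:.3f}")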

Slide 42

Slide 42 text

A Machine Learning Model can outperform your Assumptions and Baseline Data

Slide 43

Slide 43 text

A machine learning model is limited by the information that can be extracted and the relationships that can be established from the data.

Slide 44

Slide 44 text

Speed round: When your Machine Learning Model isn’t scoring perfectly, you can still Spice Up Your Results

Slide 45

Slide 45 text

It is uncomfortably common to hand-select “good” results [1]

Slide 46

Slide 46 text

It is uncomfortably common to overfit on benchmarks to “sell” a method [1]

Slide 47

Slide 47 text

Committing the “Inverse Crime” [1]

Slide 48

Slide 48 text

Measuring what’s easy to measure rather than meaningful

Slide 49

Slide 49 text

Main Takeaways
● Use nothing else but accuracy
● Under no circumstances spend extensive time on validation
● Blindly trust counter-intuitive results because the model converged
● Explainability is overrated but has all the answers
● Take all these points as gospel