How to guarantee your machine learning model will fail on first contact with the real world.

Recently I had my PhD thesis rejected. As a failure, I am uniquely positioned to recognize failure. While I am unwaveringly enthusiastic about machine learning, I aim to share my insights into failed machine learning modelling from real-world examples in science and industry. This talk is for you if you have an introductory understanding of machine learning and would like to avoid common pitfalls.

Jesper Dramsch

November 15, 2020

Transcript

  1. How to guarantee your machine learning model will fail on first contact with the real world
    Jesper Dramsch, PyData Global 2020
  2. A Good Data Scientist is Data Critical
    • CERN throws away most of its collected data at 25 GB/s [1]
    • Geophysical data has to be reprocessed for many different use cases [2]
    • Someone decides on social taxonomies: ImageNet has a class “loser / failure” for a person [3]
    • GPT-2 was trained on web pages linked from Reddit. Try asking it about Earth Science. [4]
  3. Strategies That Work (Sometimes)
    • Multiple Interpreters (Inter-interpreter)
    • Repeat Interpretations (Intra-interpreter)
    • Take Responsibility to Change Questionable Taxonomies
    • Collect Representative Samples
  4. Research into Bias in ML
    • Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification - Buolamwini and Gebru [1]
    • Critical Perspectives on Computer Vision - Denton [2]
    • Excavating AI - Crawford and Paglen [3]
    • Tutorial on Fairness Accountability Transparency and Ethics in Computer Vision at CVPR 2020 [4]
    • The Uncanny Valley of ML - Andrews [5]
    • Bias in Facial Recognition [6]
  5. Subject Matter Experts have often forgotten more about a Subject than a Data Scientist has learned during a Project
  6. Main Takeaways
    • Use nothing but accuracy
    • Under no circumstances spend extensive time on validation
    • Blindly trust counter-intuitive results because the model converged
    • Explainability is overrated but has all the answers
    • Take all these points as gospel
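
The first takeaway, relying on accuracy alone, is a recipe for failure. Here is a minimal sketch of why, not taken from the talk. The data is a hypothetical imbalanced set with 95 negatives and 5 positives, and the "model" always predicts the majority class. The helper name `confusion_counts` is made up for this illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


# Hypothetical data: 95 negatives, 5 positives (assumption for illustration).
y_true = [0] * 95 + [1] * 5
# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)            # fraction of correct predictions
recall = tp / (tp + fn)                       # fraction of positives that were found
specificity = tn / (tn + fp)                  # fraction of negatives that were found
balanced_accuracy = (recall + specificity) / 2

print(f"accuracy          = {accuracy:.2f}")
print(f"recall            = {recall:.2f}")
print(f"balanced accuracy = {balanced_accuracy:.2f}")
```

The accuracy looks excellent while the recall shows that not a single positive sample was detected. Which additional metrics matter depends on the problem; recall and balanced accuracy are only two common choices.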
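
The "Multiple Interpreters" strategy on slide 3 can be made measurable by computing how well two interpreters agree beyond chance. The sketch below uses Cohen's kappa, which is one common choice and not necessarily the measure used in the talk. The labels ("fault", "salt") and the function name `cohen_kappa` are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of samples labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both interpreters labelled independently
    # with their own class frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both interpreters only ever used the same single class
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two interpreters for the same eight samples.
interpreter_1 = ["fault", "fault", "salt", "salt", "fault", "salt", "salt", "fault"]
interpreter_2 = ["fault", "salt", "salt", "salt", "fault", "salt", "fault", "fault"]

kappa = cohen_kappa(interpreter_1, interpreter_2)
print(f"kappa = {kappa:.2f}")
```

The two interpreters agree on six of eight samples, but half of that agreement is expected by chance, so kappa is lower than the raw agreement rate. The same function applied to one interpreter's repeated labelling of the same samples gives an intra-interpreter check.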