
Gerrit Gruben: Limits of Data Science and other ethical considerations

Faking statistics and doing bogus research on data is a classic topic. In the big data age, we observe otherwise rare phenomena such as Simpson's paradox more often. There are also limits to our methods, both theoretical (think "black swan") and human (think biases). I want to touch on several topics to raise your awareness and sharpen your critical thinking as an ethical data scientist. Since everyone in machine learning has created a faulty experimental design at least once, this presentation is also of high practical value. I will showcase concrete examples where model evaluation has gone wrong to the disadvantage of human beings.

MunichDataGeeks

January 31, 2018

Transcript

  1. about.me
     • Freelance DS; before that worked as DS/SWE.
     • Training people in a 3-month boot camp to be DS → datascienceretreat.com
     • Org. of the Kaggle Berlin meetup
     • ML PhD dropout @ Potsdam
     • Degrees in Math. & CS, going for Laws (sic!)

  2. Main points
     • No data positivism in ML: an inductive bias is always there; the IID assumption is idealistic.
     • Can't predict everything.
     • ML systems are prone to manipulation (fragility).
  3. "I beseech you, in the bowels of Christ, think it possible that you may be mistaken." --- Oliver Cromwell
     Dennis Lindley: avoid prior probabilities of 0 and 1.
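Lindley's advice follows directly from Bayes' rule: a prior of exactly 0 (or 1) can never be moved by any amount of evidence. A minimal sketch in Python (the likelihood values are invented for illustration):

```python
def posterior(prior, likelihood_h, likelihood_not_h):
    """Bayes' rule for a single hypothesis H after one observation."""
    evidence = prior * likelihood_h + (1 - prior) * likelihood_not_h
    return prior * likelihood_h / evidence

# Strong evidence for H: P(data|H)=0.99 vs. P(data|not H)=0.01.
print(posterior(0.5, 0.99, 0.01))  # ~0.99: a non-dogmatic prior updates
print(posterior(0.0, 0.99, 0.01))  # 0.0: a prior of zero never moves
```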
  4. Problem of Induction
     • More general than the black swan problem.
     • ML models have an inductive bias.
     "The process of inferring a general law or principle from the observation of particular instances." --- Oxford Dictionary (the direct opposite of deduction)
  5. "When you have two competing theories that make exactly the same predictions, the simpler one is the better." --- Ockham's Razor
  6. Multiple Testing: retrying tests until one "hits" the significance level by chance.
     Solutions: Bayesian methods, a correction (e.g. the Bonferroni correction), or a different experimental design.
     Data Snooping: http://bit.ly/2iWoFrV
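A small simulation shows both the problem and the fix. All 100 null hypotheses below are true by construction, yet uncorrected testing still "finds" about five significant results (sample sizes and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, alpha = 100, 0.05

# 100 A/B comparisons on pure noise: every null hypothesis is true.
pvals = np.array([
    stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
    for _ in range(n_tests)
])

print((pvals < alpha).sum())            # ~5 "significant" results by chance
print((pvals < alpha / n_tests).sum())  # Bonferroni-corrected: almost always 0
```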
  10. "P-hacking" II "When a measure becomes a target, it ceases

    to be a good measure" --- Goodhart's law
  8. Paper: http://bit.ly/2gBIR1M
     Prefer to call it "over-selection"; in "Learning with Kernels", Smola & Schölkopf name it (exercise 5.10) "overfitting on the test set".
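Over-selection is easy to reproduce: evaluate enough models on the same test set and the best score looks impressive even when every model is guessing. A sketch with invented sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_test, n_models = 200, 500

# Binary labels with no learnable signal, and 500 "models" that guess at random.
y_test = rng.integers(0, 2, n_test)
test_accs = [(rng.integers(0, 2, n_test) == y_test).mean()
             for _ in range(n_models)]

# Selecting the winner on the test set yields ~0.58-0.62 accuracy,
# well above the true chance level of 0.5, for a pure coin flip.
print(max(test_accs))
```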
  9. Messing up your experiments
     • The data split strategy is part of the experiment.
     • Mainly care about: class distribution, and problem-domain issues such as time (see the sketch below).
     "Validation and test sets should model nature, and nature is not accommodating." --- Data Scientist's Proverbs
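As a sketch of the two bullet points above, scikit-learn's built-in splitters handle both concerns; the toy data here is invented:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)  # imbalanced classes

# Stratified split: every fold keeps the 90/10 class distribution.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    print("positive rate in test fold:", y[test_idx].mean())  # always 0.1

# Time-ordered data: train only on the past, never on the future.
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train up to", train_idx.max(), "-> test from", test_idx.min())
```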
  10. "Model evaluation, model selection…" by Sebastian Raschka: http://bit.ly/2p6PGY0
      "Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms" (Dietterich, 1998): http://bit.ly/2wyItF6
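Dietterich's paper recommends, among others, McNemar's test for comparing two classifiers evaluated on the same test set. A minimal sketch with statsmodels, using invented disagreement counts:

```python
from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table of test-set outcomes for classifiers A and B:
# rows = A correct/wrong, columns = B correct/wrong.
table = [[520, 40],   # both correct / only A correct
         [10, 30]]    # only B correct / both wrong

# Only the off-diagonal disagreement cells (40 vs. 10) matter.
result = mcnemar(table, exact=True)
print(result.pvalue)  # small p-value -> the classifiers genuinely differ
```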
  11. Feedback loops abused
      Tay.ai was a chatbot deployed on Twitter by Microsoft for just a day. Trolls started to "subvert" the bot by "teaching" it to be politically incorrect through focused exposure to extreme content.
  12. Smaller tips for ML
      • Always model uncertainty (see the sketch below).
      • Read this
      • Don't mock values of a non-existent predictive model.
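"Always model uncertainty" can be as simple as reporting a prediction interval instead of a point estimate. One way to do that is quantile regression; a sketch with scikit-learn on invented toy data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)  # noisy toy data

# Instead of a single point prediction, fit the 10th and 90th percentiles.
lo = GradientBoostingRegressor(loss="quantile", alpha=0.1).fit(X, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X, y)

x_new = [[5.0]]
print("80% prediction interval:",
      lo.predict(x_new)[0], "to", hi.predict(x_new)[0])
```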
  13. Other Links
      • https://www.ma.utexas.edu/users/mks/statmistakes/StatisticsMistakes.html
      • Quantopian Lecture Series: p-Hacking and Multiple Comparisons Bias: https://www.youtube.com/watch?v=YiDfbYtgUPc
      • David Hume: A Treatise of Human Nature: http://www.davidhume.org/texts/thn.html