Lies, Damned Lies and Statistics @ PyCon Italia 2018

Statistics show that eating ice cream causes death by drowning.

If this sounds baffling, this talk will help you to understand correlation, bias, statistical significance and other statistical techniques that are commonly (mis)used to support an argument that leads, by accident or on purpose, to drawing the wrong conclusions.

The casual observer is exposed to the use of statistics and probability in everyday life, but it is extremely easy to fall victim to a statistical fallacy, even for professional users.

The purpose of this talk is to help the audience understand how to recognise and avoid these fallacies, by combining an introduction to statistics with examples of lies and damned lies, in a way that is approachable for beginners.

Marco Bonzanini

April 19, 2018


Transcript

  1. This talk is about: • The misuse of statistics in everyday life • How (not) to lie with statistics. This talk is not about: • Python • Advanced Statistical Models. The audience (you!): • Good citizens • An interest in statistical literacy (without an advanced Math degree?)
  2. Correlation • Informal: a connection between two things • Formal: a measure of the strength of the association between two variables
  3. Correlation and causation • A causes B, or B causes A • A and B both cause C • C causes A and B • A causes C, and C causes B • No connection between A and B
  4. University of California, Berkeley: graduate school admissions in 1973 https://en.wikipedia.org/wiki/Simpson%27s_paradox
  5. University of California, Berkeley: graduate school admissions in 1973 https://en.wikipedia.org/wiki/Simpson%27s_paradox — Gender bias?
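A small sketch of Simpson's paradox with hypothetical numbers (NOT the real 1973 Berkeley figures): each department admits women at the higher rate, yet the aggregate rate favours men, because women applied mostly to the more selective department:

```python
# Hypothetical admissions data: (admitted, applied) per department and group.
# These are NOT the real 1973 Berkeley numbers, just a minimal illustration.
admissions = {
    "Dept A": {"men": (80, 100), "women": (18, 20)},
    "Dept B": {"men": (2, 20), "women": (18, 100)},
}

def rate(admitted, applied):
    return admitted / applied

# Per department, women are admitted at the higher rate (90% vs 80%, 18% vs 10%)
for dept, groups in admissions.items():
    for group, counts in groups.items():
        print(dept, group, f"{rate(*counts):.0%}")

# ...yet overall, men are admitted at the higher rate (68% vs 30%)
for group in ("men", "women"):
    admitted = sum(admissions[d][group][0] for d in admissions)
    applied = sum(admissions[d][group][1] for d in admissions)
    print("overall", group, f"{rate(admitted, applied):.0%}")
```

The aggregate hides the lurking variable (which department each group applied to), which is exactly why the "gender bias?" question cannot be answered from the overall totals alone.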
  10. Sampling • A selection of a subset of individuals • Purpose: estimate properties of the whole population • Hello Big Data!
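A minimal sketch of the idea, assuming a synthetic population of 100,000 values: a modest random sample already gives a good estimate of the population mean.

```python
import random
import statistics

random.seed(42)  # reproducible

# Synthetic population, e.g. heights in cm
population = [random.gauss(170, 10) for _ in range(100_000)]

# Simple random sample of 500 individuals
sample = random.sample(population, 500)

print(round(statistics.mean(population), 1))
print(round(statistics.mean(sample), 1))  # close to the population mean
```

Randomness alone is not enough, though: if the sampling frame is biased, no sample size fixes it, as the next slide shows.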
  11. “Dewey Defeats Truman” https://en.wikipedia.org/wiki/Dewey_Defeats_Truman • The Chicago Tribune printed the wrong headline on election night • The editor trusted the results of the phone survey • … in 1948, a sample of phone users was not representative of the general population
  12. Survivorship Bias • Bill Gates, Steve Jobs, Mark Zuckerberg are all college drop-outs • … should you quit studying?
  13. Statistically Significant Results • We are quite sure they are reliable (not by chance) • Maybe they’re not “big” • Maybe they’re not important • Maybe they’re not useful for decision making
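A sketch of the “significant but not big” case, on assumed toy data: with large enough samples, even a trivial 0.05-unit difference in means becomes statistically significant (a simple two-sample z-test, assuming known sigma = 1 for both groups):

```python
import math
import random

random.seed(2)
n = 200_000

# Two large groups whose true means differ by a trivial 0.05 units
group_a = [random.gauss(0.00, 1) for _ in range(n)]
group_b = [random.gauss(0.05, 1) for _ in range(n)]

diff = sum(group_b) / n - sum(group_a) / n
se = math.sqrt(1 / n + 1 / n)          # standard error, known sigma = 1
z = diff / se
p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value, normal approximation

print(round(diff, 3))  # a tiny difference...
print(p < 0.05)        # ...that is nonetheless "statistically significant"
```

Whether a 0.05-unit difference matters is a judgment about the domain, not something the p-value can answer.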
  14. p-values • Probability of observing our results (or more extreme) when the null hypothesis is true • Probability, not certainty • Often p < 0.05 (arbitrary) • Can we afford to be fooled by randomness every 1 time out of 20?
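To make the “1 time out of 20” concrete, a simulation sketch: flip a fair coin, so the null hypothesis is true by construction, and count how often the test still comes out “significant” (the p-value here uses the normal approximation to the binomial):

```python
import math
import random

random.seed(0)

def p_value_fair_coin(heads, n):
    """Two-sided p-value for H0: the coin is fair (normal approximation)."""
    z = (heads - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))

experiments, flips = 2_000, 100
false_positives = 0
for _ in range(experiments):
    heads = sum(random.random() < 0.5 for _ in range(flips))  # H0 is true
    if p_value_fair_coin(heads, flips) < 0.05:
        false_positives += 1

# Roughly 1 experiment in 20 looks "significant" despite pure chance
print(false_positives / experiments)
```

Each of those false positives would make a perfectly publishable-looking result on its own.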

  16. Data dredging • a.k.a. data fishing or p-hacking • Convention: formulate hypothesis, collect data, prove/disprove hypothesis • Data dredging: look for patterns until something statistically significant comes up • Looking for patterns is ok; testing the hypothesis on the same data set is not
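A sketch of why dredging works “too well”: run twenty unrelated tests on pure noise (fair coins, normal-approximation p-values) and the chance of at least one spurious “discovery” is already about 64%:

```python
import math
import random

random.seed(1)

def p_value_fair_coin(heads, n):
    """Two-sided p-value for H0: the coin is fair (normal approximation)."""
    z = (heads - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))

# Dredging: test 20 unrelated "hypotheses", each of which is pure noise
flips = 100
p_values = []
for _ in range(20):
    heads = sum(random.random() < 0.5 for _ in range(flips))
    p_values.append(p_value_fair_coin(heads, flips))

discoveries = [p for p in p_values if p < 0.05]
print(len(discoveries))  # often at least one "discovery" in pure noise

# With 20 independent tests at alpha = 0.05, the chance of at least one
# false positive is 1 - 0.95**20, about 64%
print(round(1 - 0.95 ** 20, 2))
```

This is why a pattern found by searching must be re-tested on fresh data before it counts as evidence.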
  17. • Good Science™ vs. Big headlines • Nobody is immune • Ask questions: What is the context? Who’s paying? What’s missing? • … “so what?”