Slide 1

Slide 1 text

Lies, Damned Lies and Statistics
@MarcoBonzanini
EuroPython 2018, Edinburgh, UK, July 2018

Slide 2

Slide 2 text

In the Vatican City there are 5.88 popes per square mile
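
The arithmetic behind this line can be sketched as follows. The inputs are assumptions, since the slide does not state them: one pope, and an area of roughly 0.17 square miles (about 0.44 square kilometres) for the Vatican City.

```python
# Assumed inputs (not stated on the slide): one pope, and an area of
# roughly 0.17 square miles (about 0.44 square kilometres).
POPES = 1
AREA_SQUARE_MILES = 0.17

popes_per_square_mile = POPES / AREA_SQUARE_MILES
print(f"{popes_per_square_mile:.2f} popes per square mile")
```

The division is correct, yet the result describes an area larger than the country itself, which is why a technically true statistic can still mislead.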

Slide 3

Slide 3 text

This talk is about:
• The misuse of statistics in everyday life
• How (not) to lie with statistics

This talk is not about:
• Python
• Advanced Statistical Models

The audience (you!):
• Good citizens
• An interest in statistical literacy (without an advanced Math degree?)

Slide 4

Slide 4 text

LIES, DAMNED LIES
 AND CORRELATION

Slide 5

Slide 5 text

Correlation

Slide 6

Slide 6 text

Correlation
• Informal: a connection between two things
• Measure the strength of the association between two variables

Slide 7

Slide 7 text

Linear Correlation

Slide 8

Slide 8 text

Linear Correlation
[Two scatter plots of y against x: Positive, Negative]
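
One common way to measure linear correlation is the Pearson coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). The slides do not name a specific measure, so the sketch below is an assumption; the helper `pearson` and the sample data are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # y grows with x
falling = [10, 8, 6, 4, 2]   # y shrinks as x grows

print(pearson(x, rising))    # close to +1: positive linear correlation
print(pearson(x, falling))   # close to -1: negative linear correlation
```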

Slide 9

Slide 9 text

Correlation Example

Slide 10

Slide 10 text

Correlation Example
[Scatter plot: Temperature vs. Ice Cream Sales ($$$)]

Slide 11

Slide 11 text

“Correlation does not imply causation”

Slide 12

Slide 12 text

[Scatter plot: Deaths by drowning vs. Ice Cream Sales ($$$)]

Slide 13

Slide 13 text

Lurking Variable

Slide 14

Slide 14 text

Lurking Variable
[Two scatter plots: Temperature vs. Ice Cream Sales ($$$); Temperature vs. Deaths by drowning]
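
A small simulation can illustrate how a lurking variable produces a correlation between two quantities that do not influence each other. Everything in this sketch is hypothetical: the model, the coefficients, the noise levels and the `pearson` helper are assumptions chosen for illustration, not data from the talk.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical model: temperature drives both quantities independently.
temperature = [rng.uniform(0, 35) for _ in range(500)]
ice_cream_sales = [10 * t + rng.gauss(0, 40) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1.5) for t in temperature]

r_sales_drownings = pearson(ice_cream_sales, drownings)
r_sales_temp = pearson(ice_cream_sales, temperature)
r_drownings_temp = pearson(drownings, temperature)

# Partial correlation: what is left between sales and drownings
# once the linear effect of temperature is removed from both.
partial = (r_sales_drownings - r_sales_temp * r_drownings_temp) / sqrt(
    (1 - r_sales_temp ** 2) * (1 - r_drownings_temp ** 2)
)

print(f"sales vs drownings:            {r_sales_drownings:.2f}")
print(f"same, controlling temperature: {partial:.2f}")
```

In this model the raw correlation is strong, while the partial correlation stays near zero, because temperature accounts for the whole association.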

Slide 15

Slide 15 text

More Lurking Variables

Slide 16

Slide 16 text

More Lurking Variables
[Scatter plot: Damage caused by fire vs. Firefighters deployed]

Slide 17

Slide 17 text

More Lurking Variables
[Scatter plot: Damage caused by fire vs. Firefighters deployed]
Fire severity?

Slide 18

Slide 18 text

Correlation and causation

Slide 19

Slide 19 text

Correlation and causation
• A causes B, or B causes A
• A and B both cause C
• C causes A and B
• A causes C, and C causes B
• No connection between A and B

Slide 20

Slide 20 text

http://www.tylervigen.com/spurious-correlations

Slide 21

Slide 21 text

http://www.tylervigen.com/spurious-correlations

Slide 22

Slide 22 text

https://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations

Slide 23

Slide 23 text

https://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations

Slide 24

Slide 24 text

http://www.nejm.org/doi/full/10.1056/NEJMon1211064

Slide 25

Slide 25 text

LIES, DAMNED LIES,
 SLICING AND DICING
 YOUR DATA

Slide 26

Slide 26 text

Simpson’s Paradox

Slide 27

Slide 27 text

University of California, Berkeley
Graduate school admissions in 1973
https://en.wikipedia.org/wiki/Simpson%27s_paradox

Slide 28

Slide 28 text

University of California, Berkeley
Graduate school admissions in 1973
https://en.wikipedia.org/wiki/Simpson%27s_paradox
Gender bias?

Slide 29

Slide 29 text

University of California, Berkeley
Graduate school admissions in 1973
https://en.wikipedia.org/wiki/Simpson%27s_paradox

Slide 30

Slide 30 text

University of California, Berkeley
Graduate school admissions in 1973
https://en.wikipedia.org/wiki/Simpson%27s_paradox

Slide 31

Slide 31 text

University of California, Berkeley
Graduate school admissions in 1973
https://en.wikipedia.org/wiki/Simpson%27s_paradox

Slide 32

Slide 32 text

University of California, Berkeley
Graduate school admissions in 1973
https://en.wikipedia.org/wiki/Simpson%27s_paradox
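
The admission tables are not part of the slide text, so the sketch below uses made-up numbers (explicitly not the Berkeley figures) to show how Simpson's paradox can arise: one group does better in every department and still does worse overall. The department names and the `rate` helper are hypothetical.

```python
# Hypothetical numbers, NOT the Berkeley figures:
# (admitted, applied) per department and group.
applications = {
    "Department X (easy)": {"men": (80, 100), "women": (9, 10)},
    "Department Y (hard)": {"men": (2, 10), "women": (30, 100)},
}


def rate(admitted, applied):
    """Admission rate as a fraction between 0 and 1."""
    return admitted / applied


# Within every department, women are admitted at a higher rate.
for department, groups in applications.items():
    for group, (admitted, applied) in groups.items():
        print(f"{department:20} {group:6} {rate(admitted, applied):.0%}")

# Pooled over departments, the direction flips.
overall = {}
for group in ("men", "women"):
    admitted = sum(groups[group][0] for groups in applications.values())
    applied = sum(groups[group][1] for groups in applications.values())
    overall[group] = rate(admitted, applied)
    print(f"{'Overall':20} {group:6} {overall[group]:.0%}")
```

The reversal comes from the mix of applications: in this toy data most women apply to the department with the low admission rate, which drags down their pooled rate.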

Slide 33

Slide 33 text

LIES, DAMNED LIES
 AND SAMPLING BIAS

Slide 34

Slide 34 text

Sampling

Slide 35

Slide 35 text

Sampling
• A selection of a subset of individuals
• Purpose: estimate something about the whole population
• Hello Big Data!

Slide 36

Slide 36 text

Bias

Slide 37

Slide 37 text

Bias
• Prejudice? Intuition?
• Cultural context?
• In science: a systematic error

Slide 38

Slide 38 text

“Dewey defeats Truman”

Slide 39

Slide 39 text

“Dewey defeats Truman”
https://en.wikipedia.org/wiki/Dewey_Defeats_Truman

Slide 40

Slide 40 text

“Dewey defeats Truman”
https://en.wikipedia.org/wiki/Dewey_Defeats_Truman
• The Chicago Tribune printed the wrong headline on election night
• The editor trusted the results of the phone survey
• … in 1948, a sample of phone users was not representative of the general population
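
The effect of surveying only phone users can be sketched with a toy electorate. The group sizes and vote shares below are hypothetical, chosen only to show the mechanism; they are not 1948 data.

```python
# Hypothetical electorate, NOT 1948 data: (number of voters, share voting A).
population = {
    "phone owners": (400, 0.70),
    "no phone": (600, 0.35),
}

total_voters = sum(n for n, _ in population.values())
votes_for_a = sum(n * share for n, share in population.values())
true_share = votes_for_a / total_voters

# A phone survey can only reach the first group, so (ignoring sampling
# noise) its estimate is that group's share, not the population's.
phone_survey_estimate = population["phone owners"][1]

print(f"Phone survey estimate for A: {phone_survey_estimate:.0%}")
print(f"True share for A:            {true_share:.0%}")
```

A larger phone sample would not help here: the error is systematic, so it does not shrink as the sample grows.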

Slide 41

Slide 41 text

Survivorship Bias

Slide 42

Slide 42 text

Survivorship Bias
• Bill Gates, Steve Jobs, Mark Zuckerberg are all college drop-outs
• … should you quit studying?

Slide 43

Slide 43 text

LIES, DAMNED LIES
 AND DATAVIZ

Slide 44

Slide 44 text

“A picture is worth a thousand words”

Slide 45

Slide 45 text

https://en.wikipedia.org/wiki/Anscombe%27s_quartet
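
Anscombe's quartet consists of four small data sets with nearly identical summary statistics that look completely different when plotted. The sketch below recomputes two of those statistics. The values are the quartet as commonly reproduced (they are not on the slide), and `pearson` is a helper written for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Anscombe's quartet (Anscombe, 1973), as commonly reproduced.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by sets I to III
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (xs, ys) in quartet.items():
    mean_y = sum(ys) / len(ys)
    summary[name] = (mean_y, pearson(xs, ys))
    print(f"{name:>3}: mean(y) = {mean_y:.2f}, r = {summary[name][1]:.3f}")
```

The printed numbers barely differ between the four sets, which is the point: only a plot reveals that one set is curved, one has a single outlier, and one has a single point carrying the whole correlation.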

Slide 46

Slide 46 text

https://venngage.com/blog/misleading-graphs/

Slide 47

Slide 47 text

https://venngage.com/blog/misleading-graphs/

Slide 48

Slide 48 text

https://venngage.com/blog/misleading-graphs/

Slide 49

Slide 49 text

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

Slide 50

Slide 50 text

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

Slide 51

Slide 51 text

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

Slide 52

Slide 52 text

https://www.raiplay.it/video/2016/04/Agor224-del-08042016-4d84cebb-472c-442c-82e0-df25c7e4d0ce.html

Slide 53

Slide 53 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 54

Slide 54 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 55

Slide 55 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 56

Slide 56 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 57

Slide 57 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 58

Slide 58 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 59

Slide 59 text

LIES, DAMNED LIES
 AND SIGNIFICANCE

Slide 60

Slide 60 text

Significant = Important?

Slide 61

Slide 61 text

Statistically Significant Results

Slide 62

Slide 62 text

Statistically Significant Results
• We are quite sure they are reliable (not due to chance)
• Maybe they’re not “big”
• Maybe they’re not important
• Maybe they’re not useful for decision making
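
A result can be statistically significant and still tiny. The sketch below assumes a hypothetical experiment (the sample size and effect are made up) and a two-sample z-test with known unit variance, which is one simple choice among many.

```python
from math import erfc, sqrt

# Hypothetical experiment: two groups of one million users each, whose
# averages differ by 0.01 standard deviations (a very small effect).
n_per_group = 1_000_000
effect_in_sd_units = 0.01

# Two-sample z-test with known unit variance in both groups.
standard_error = sqrt(2 / n_per_group)
z = effect_in_sd_units / standard_error
p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value

print(f"z = {z:.2f}, p = {p_value:.1e}")
```

With enough data almost any non-zero difference becomes "significant", so the p-value alone says nothing about whether the difference matters.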

Slide 63

Slide 63 text

p-values

Slide 64

Slide 64 text

https://en.wikipedia.org/wiki/Misunderstandings_of_p-values

Slide 65

Slide 65 text

p-values
• Probability of observing our results (or more extreme) when the null hypothesis is true
• Probability, not certainty
• Often p < 0.05 (arbitrary)
• Can we afford to be fooled by randomness 1 time out of every 20?
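
The "1 in 20" figure can be checked with a simulation in which the null hypothesis is true by construction. This is a sketch under assumptions: normally distributed data with known variance, a two-sided z-test, and arbitrary choices for the seed, sample size and number of experiments.

```python
import random
from math import erfc, sqrt

rng = random.Random(2018)  # fixed seed so the run is repeatable

N_EXPERIMENTS = 2000
SAMPLE_SIZE = 50
ALPHA = 0.05


def p_value_for_null_sample():
    """Draw a sample where the null hypothesis (mean 0, sd 1) is true
    and return the two-sided z-test p-value for 'mean differs from 0'."""
    sample = [rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    z = (sum(sample) / SAMPLE_SIZE) * sqrt(SAMPLE_SIZE)
    return erfc(abs(z) / sqrt(2))


false_positives = sum(
    p_value_for_null_sample() < ALPHA for _ in range(N_EXPERIMENTS)
)
false_positive_rate = false_positives / N_EXPERIMENTS
print(f"'Significant' results with no real effect: {false_positive_rate:.1%}")
```

The rate should land near 5%: roughly one experiment in twenty reports an effect that does not exist.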

Slide 66

Slide 66 text

Data dredging

Slide 67

Slide 67 text


Slide 68

Slide 68 text

Data dredging
• a.k.a. Data fishing or p-hacking
• Convention: formulate hypothesis, collect data, prove/disprove hypothesis
• Data dredging: look for patterns until something statistically significant comes up
• Looking for patterns is OK; testing the hypothesis on the same data set is not
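
Why dredging works so well can be shown with a short calculation. It assumes the tests are independent and that every null hypothesis is true; the function name `chance_of_false_positive` is made up for this sketch.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests of
    true null hypotheses comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20, 100):
    print(f"{n_tests:>3} tests: {chance_of_false_positive(n_tests):.0%}")
```

Under these assumptions, trying 20 unrelated patterns already gives better than even odds of finding at least one "significant" result by chance alone.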

Slide 69

Slide 69 text

SUMMARY

Slide 70

Slide 70 text

“Everybody lies” (Dr. House)

Slide 71

Slide 71 text

• Good Science™ vs. Big headlines
• Nobody is immune
• Ask questions: What is the context? Who’s paying? What’s missing?
• … “so what?”

Slide 72

Slide 72 text

THANK YOU
@MarcoBonzanini
speakerdeck.com/marcobonzanini
GitHub.com/bonzanini
marcobonzanini.com
