Slide 1

Slide 1 text

Lies, Damned Lies and Statistics
@MarcoBonzanini
PyCon UK 2019

Slide 2

Slide 2 text

In the Vatican City
 there are 5.88 popes
 per square mile

Slide 3

Slide 3 text

This talk is about: the misuse of stats in everyday life
This talk is NOT about: Python
The audience (you!): good citizens with an interest in statistical literacy (without an advanced Math degree?)

Slide 4

Slide 4 text

LIES, DAMNED LIES
 AND CORRELATION

Slide 5

Slide 5 text

Correlation

Slide 6

Slide 6 text

Correlation
• Informal: a connection between two things
• Measures the strength of the association between two variables
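
One common way to put a number on "the strength of the association" is the Pearson correlation coefficient. The slides do not name a specific measure, so treat the choice of Pearson's r, the helper name `pearson_r`, and the made-up data below as assumptions for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (n times the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative data, not taken from the talk.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # grows with x
falling = [10, 8, 6, 4, 2]   # shrinks as x grows

print(pearson_r(x, rising))   # close to +1: positive linear correlation
print(pearson_r(x, falling))  # close to -1: negative linear correlation
```

Values near +1 or -1 indicate a strong linear association, and values near 0 indicate a weak one.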

Slide 7

Slide 7 text

Linear Correlation

Slide 8

Slide 8 text

Linear Correlation
[Figure: two plots of y against x, labelled Positive and Negative]

Slide 9

Slide 9 text

Correlation Example

Slide 10

Slide 10 text

Correlation Example
[Figure: plot relating Temperature and Ice Cream Sales ($$$)]

Slide 11

Slide 11 text

“Correlation 
 does not imply
 causation”

Slide 12

Slide 12 text

[Figure: plot relating Deaths by drowning and Ice Cream Sales ($$$)]

Slide 13

Slide 13 text

Lurking Variable

Slide 14

Slide 14 text

Lurking Variable
[Figure: Temperature paired with Ice Cream Sales ($$$), and Temperature paired with Deaths by drowning]
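
The lurking-variable idea can be sketched with a small simulation. Everything here is invented for illustration: the coefficients, the noise levels, and the helper names `pearson_r` and `residuals` are assumptions, not figures from the talk. Temperature drives both quantities, so they correlate with each other even though neither influences the other in the simulated data.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(42)

# All numbers below are invented for illustration.
temperature = [rng.uniform(0, 35) for _ in range(1000)]
ice_cream_sales = [10 * t + rng.gauss(0, 30) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

raw = pearson_r(ice_cream_sales, drownings)
controlled = pearson_r(residuals(ice_cream_sales, temperature),
                       residuals(drownings, temperature))

print(f"ice cream vs drownings: {raw:.2f}")
print(f"same, after removing the effect of temperature: {controlled:.2f}")
```

Once the part explained by temperature is subtracted from both series, the remaining correlation should be close to zero.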

Slide 15

Slide 15 text

More Lurking Variables

Slide 16

Slide 16 text

More Lurking Variables
[Figure: plot relating Damage caused by fire and Firefighters deployed]

Slide 17

Slide 17 text

More Lurking Variables
[Figure: Damage caused by fire and Firefighters deployed, with a third variable: Fire severity?]

Slide 18

Slide 18 text

Correlation and causation

Slide 19

Slide 19 text

Correlation and causation
[Figure: diagrams of possible relationships between variables A, B and C]

Slide 20

Slide 20 text

http://www.tylervigen.com/spurious-correlations

Slide 21

Slide 21 text

http://www.tylervigen.com/spurious-correlations

Slide 22

Slide 22 text

https://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations

Slide 23

Slide 23 text

https://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations

Slide 24

Slide 24 text

http://www.nejm.org/doi/full/10.1056/NEJMon1211064

Slide 25

Slide 25 text

LIES, DAMNED LIES,
 SLICING AND DICING
 YOUR DATA

Slide 26

Slide 26 text

Simpson’s Paradox

Slide 27

Slide 27 text

University of California, Berkeley
Graduate school admissions in 1973
https://en.wikipedia.org/wiki/Simpson%27s_paradox

Slide 28

Slide 28 text

University of California, Berkeley
Graduate school admissions in 1973
Gender bias?
https://en.wikipedia.org/wiki/Simpson%27s_paradox

Slide 29

Slide 29 text

University of California, Berkeley
Graduate school admissions in 1973
https://en.wikipedia.org/wiki/Simpson%27s_paradox

Slide 30

Slide 30 text

University of California, Berkeley
Graduate school admissions in 1973
https://en.wikipedia.org/wiki/Simpson%27s_paradox

Slide 31

Slide 31 text

University of California, Berkeley
Graduate school admissions in 1973
https://en.wikipedia.org/wiki/Simpson%27s_paradox

Slide 32

Slide 32 text

University of California, Berkeley
Graduate school admissions in 1973
https://en.wikipedia.org/wiki/Simpson%27s_paradox
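
A minimal sketch of how Simpson's paradox can arise. The counts are hypothetical and are NOT the real Berkeley figures (see the linked article for those); the department names and the helpers `rate` and `overall_rate` are likewise assumptions. In this toy data, women have the higher admission rate in each department, yet the lower rate overall, because most women applied to the department that admits fewer people.

```python
# Hypothetical (applicants, admitted) counts, NOT the real Berkeley data.
admissions = {
    "Dept X": {"men": (800, 480), "women": (100, 70)},
    "Dept Y": {"men": (200, 40), "women": (900, 225)},
}

def rate(applicants, admitted):
    return admitted / applicants

# Admission rate per department and group.
for dept, groups in admissions.items():
    for group, (applicants, admitted) in groups.items():
        print(f"{dept} {group}: {rate(applicants, admitted):.0%}")

def overall_rate(group):
    """Admission rate for a group with all departments pooled together."""
    applicants = sum(dept[group][0] for dept in admissions.values())
    admitted = sum(dept[group][1] for dept in admissions.values())
    return rate(applicants, admitted)

print(f"overall men: {overall_rate('men'):.1%}")
print(f"overall women: {overall_rate('women'):.1%}")
```

The pooled numbers and the per-department numbers point in opposite directions, which is why the level of aggregation matters when reading such tables.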

Slide 33

Slide 33 text

LIES, DAMNED LIES
 AND SAMPLING BIAS

Slide 34

Slide 34 text

Sampling

Slide 35

Slide 35 text

Sampling
• A selection of a subset of individuals
• Purpose: make estimates about the whole population
• Hello Big Data!

Slide 36

Slide 36 text

Bias

Slide 37

Slide 37 text

Bias
• Prejudice? Intuition?
• Cultural context?
• In science: a systematic error

Slide 38

Slide 38 text

“Dewey defeats Truman”

Slide 39

Slide 39 text

“Dewey defeats Truman”
https://en.wikipedia.org/wiki/Dewey_Defeats_Truman

Slide 40

Slide 40 text

“Dewey defeats Truman”
• The Chicago Tribune printed the wrong headline on election night
• The editor trusted the results of the phone survey
• … in 1948, a sample of phone users was not representative of the general population
https://en.wikipedia.org/wiki/Dewey_Defeats_Truman
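
The effect of an unrepresentative sample can be sketched with a toy simulation. The parameters below are hypothetical and chosen only to make the bias visible; they are not 1948 polling data, and the candidates are deliberately left as "A" and "B". The assumption is that phone owners lean towards candidate A more than the rest of the population does.

```python
import random

rng = random.Random(1948)

# Hypothetical parameters, chosen only to make the effect visible.
P_PHONE = 0.4             # share of voters who own a phone
P_A_GIVEN_PHONE = 0.6     # support for candidate A among phone owners
P_A_GIVEN_NO_PHONE = 0.4  # support for candidate A among everyone else

# Each voter is a pair: (has_phone, votes_for_a).
population = []
for _ in range(100_000):
    has_phone = rng.random() < P_PHONE
    p_a = P_A_GIVEN_PHONE if has_phone else P_A_GIVEN_NO_PHONE
    population.append((has_phone, rng.random() < p_a))

def share_for_a(voters):
    return sum(votes_a for _, votes_a in voters) / len(voters)

true_share = share_for_a(population)

# Biased sample: only people who can be reached by phone.
phone_owners = [voter for voter in population if voter[0]]
phone_share = share_for_a(rng.sample(phone_owners, 2000))

# Unbiased sample of the same size, for comparison.
random_share = share_for_a(rng.sample(population, 2000))

print(f"true support for A:   {true_share:.1%}")
print(f"phone survey says:    {phone_share:.1%}")
print(f"random sample says:   {random_share:.1%}")
```

In this setup the phone survey predicts a win for A while the population as a whole prefers B. Taking a larger phone sample would not help, because the error is systematic rather than random.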

Slide 41

Slide 41 text

Survivorship Bias

Slide 42

Slide 42 text

Survivorship Bias
• Bill Gates, Steve Jobs, Mark Zuckerberg are all college drop-outs
• … should you quit studying?

Slide 43

Slide 43 text

LIES, DAMNED LIES
 AND DATAVIZ

Slide 44

Slide 44 text

“A picture is worth 
 a thousand words”

Slide 45

Slide 45 text

https://en.wikipedia.org/wiki/Anscombe%27s_quartet
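
As a sketch of why the quartet is a standard argument for plotting data, the snippet below computes a few summary statistics for Anscombe's four published datasets. The helper `pearson_r` and the choice of statistics to print are assumptions for illustration; the data values are those of the quartet as given in the linked article.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet: datasets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

summary = {
    name: (mean(xs), mean(ys), pearson_r(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, (mean_x, mean_y, r) in summary.items():
    print(f"{name:>3}: mean x = {mean_x:.2f}, mean y = {mean_y:.2f}, r = {r:.3f}")
```

The four rows come out nearly identical, although the four scatter plots look nothing alike.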

Slide 46

Slide 46 text

https://venngage.com/blog/misleading-graphs/

Slide 47

Slide 47 text

https://venngage.com/blog/misleading-graphs/

Slide 48

Slide 48 text

https://venngage.com/blog/misleading-graphs/

Slide 49

Slide 49 text

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

Slide 50

Slide 50 text

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

Slide 51

Slide 51 text

http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2?IR=T

Slide 52

Slide 52 text

https://www.raiplay.it/video/2016/04/Agor224-del-08042016-4d84cebb-472c-442c-82e0-df25c7e4d0ce.html

Slide 53

Slide 53 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 54

Slide 54 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 55

Slide 55 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 56

Slide 56 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 57

Slide 57 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 58

Slide 58 text

https://www.theguardian.com/news/datablog/2014/may/12/lies-election-leaflets-five-tricks-european-elections

Slide 59

Slide 59 text

LIES, DAMNED LIES
 AND SIGNIFICANCE

Slide 60

Slide 60 text

Significant = Important?

Slide 61

Slide 61 text

Statistically Significant Results

Slide 62

Slide 62 text

Statistically Significant Results
• We are quite sure they are reliable (not by chance)
• Maybe they’re not “big”
• Maybe they’re not important
• Maybe they’re not useful for decision making
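
A sketch of how a result can be statistically significant without being "big". It assumes a two-sample z-test with known, equal standard deviations, which is a simplification picked for brevity; the function name `two_sided_p` and the effect size are hypothetical. The same tiny difference between two groups goes from unremarkable to highly significant purely because the sample grows.

```python
from math import erfc, sqrt

def two_sided_p(mean_diff, sd, n_per_group):
    """p-value of a two-sample z-test with known, equal standard deviations."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))

# Hypothetical effect: the group means differ by 0.01 standard deviations.
MEAN_DIFF, SD = 0.01, 1.0

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,} per group: p = {two_sided_p(MEAN_DIFF, SD, n):.3g}")
```

The effect is equally small in all three rows. Only the confidence that it is not exactly zero changes, which says nothing about whether it matters for a decision.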

Slide 63

Slide 63 text

p-values

Slide 64

Slide 64 text

https://en.wikipedia.org/wiki/Misunderstandings_of_p-values

Slide 65

Slide 65 text

p-values
• Probability of observing our results (or more extreme) when the null hypothesis is true
• Probability, not certainty
• Often p < 0.05 (arbitrary)
• Can we afford to be fooled by randomness 1 time out of every 20?
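
The "1 time out of 20" figure can be checked with a simulation in which the null hypothesis is true by construction. The setup is an assumption made for illustration: each experiment draws 30 values from a standard normal distribution and runs a z-test of "the mean is 0" with known standard deviation. Any result with p < 0.05 is then a false alarm.

```python
import random
from math import erfc, sqrt

rng = random.Random(0)

ALPHA = 0.05
N_EXPERIMENTS = 10_000
SAMPLE_SIZE = 30

def p_value_under_null():
    """Sample data for which the null (mean 0, sd 1) is true, then z-test it."""
    sample_mean = sum(rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE
    z = sample_mean * sqrt(SAMPLE_SIZE)
    return erfc(abs(z) / sqrt(2))

false_positives = sum(p_value_under_null() < ALPHA for _ in range(N_EXPERIMENTS))
false_positive_rate = false_positives / N_EXPERIMENTS

print(f"experiments with p < {ALPHA}: {false_positive_rate:.1%}")
```

The printed rate should land near 5%, which is what the threshold means: about one experiment in twenty "finds" something when there is nothing to find.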

Slide 66

Slide 66 text

Data dredging

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

Data dredging
• a.k.a. Data fishing or p-hacking
• Convention: formulate hypothesis, collect data, prove/disprove hypothesis
• Data dredging: look for patterns until something statistically significant comes up
• Looking for patterns is OK; testing the hypothesis on the same data set is not
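
A back-of-the-envelope sketch of why dredging tends to find something. It assumes the tests are independent and that there is no real effect anywhere; the function name `chance_of_false_alarm` is hypothetical. With a threshold of 0.05, each test has a 95% chance of staying quiet, so the chance that all of them stay quiet shrinks quickly as more patterns are tried.

```python
def chance_of_false_alarm(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests on pure noise
    comes out with p < alpha."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 5, 20, 100):
    print(f"{n_tests:>3} tests: {chance_of_false_alarm(n_tests):.0%}")
```

Under these assumptions, trying 20 unrelated patterns already gives a better-than-even chance of at least one "significant" result.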

Slide 69

Slide 69 text

LIES, DAMNED LIES
 AND CELEBRITIES ON TWITTER

Slide 70

Slide 70 text

https://twitter.com/billgates/status/1118196606975787008

Slide 71

Slide 71 text

P(mosquito|death) ≠ P(death|mosquito)
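
A toy calculation showing how far apart the two conditional probabilities can be. The counts are invented for illustration and are not real epidemiological data. The two probabilities share the same numerator but are divided by very different denominators.

```python
from fractions import Fraction

# Invented counts for a toy population; not real epidemiological data.
people_bitten = 900_000        # people bitten by a mosquito
deaths = 1_000                 # all deaths being compared
deaths_due_to_mosquito = 700   # deaths that followed from a mosquito bite

# Of those who died, what share was due to a mosquito?
p_mosquito_given_death = Fraction(deaths_due_to_mosquito, deaths)

# Of those who were bitten, what share died?
p_death_given_mosquito = Fraction(deaths_due_to_mosquito, people_bitten)

print(f"P(mosquito|death) = {float(p_mosquito_given_death):.2%}")
print(f"P(death|mosquito) = {float(p_death_given_mosquito):.4%}")
```

In this toy population, a mosquito is behind most of the deaths, yet a bite is very rarely fatal.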

Slide 72

Slide 72 text

SUMMARY

Slide 73

Slide 73 text

“Everybody lies” (Dr. House)

Slide 74

Slide 74 text

• Good Science ™ vs. Big headlines
• Nobody is immune
• Ask questions:
 What is the context?
 Who’s paying?
 What’s missing?
• … “so what?”

Slide 75

Slide 75 text

THANK YOU
@MarcoBonzanini
@PyDataLondon