Slide 1

Slide 1 text

Causal Thinking for Descriptive Research
Richard McElreath, MPI-EVA Leipzig

Slide 2

Slide 2 text

Light | Dark Good | Evil Civilization | Barbarism Inferential | Descriptive

Slide 3

Slide 3 text

No causes in; nothing much out Tests of non-causal laws require background information too and much of it will necessarily be causal. There is a slogan in the book about attempts to infer causes purely from probabilities: “No causes in; no causes out.” But the stronger conclusion is intended as well: “No causes in; nothing much out.” Nancy Cartwright 1989 Nature’s Capacities and Their Measurement

Slide 4

Slide 4 text

Causal thinking for everyone • The principles that license causal inference in experiments are fundamentally the same as those that license description • Causal inference depends upon trustworthy description • Description depends upon causal information • Big data alone just means bigger bias

Slide 5

Slide 5 text

Some ordinary descriptive terrors • Missing values • Measurement • Relevance

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Where is your god now?
[Figure: population size plotted against time (year), from -10000 to 2000. Legend: moralizing gods present, moralizing gods absent, moralizing gods unknown.]
Caption: Missing values in the Moralizing_gods data. The blue points, both open and filled, are observed values for the presence of beliefs about moralizing gods. The x symbols are unknowns, the missing values.
Source: McElreath 2020 Statistical Rethinking, page 514, chapter "Missing Data and Other Opportunities"

Slide 9

Slide 9 text

Where is your god now? • Missing data are not the real problem. • The cause of the missingness is the problem. • Data for Hawaii: Captain James Cook (1728–1779)

Slide 10

Slide 10 text

Missing methods • A circus of methods for handling missing values • drop all cases with missing values • replace missing values with means • “multiple imputation” • Bayesian imputation • prayer MISSING DATA ARE A CAUSAL PROBLEM

Slide 11

Slide 11 text

X Y Drawing our assumptions X Y observed variable unobserved variable a causal relationship

Slide 12

Slide 12 text

Ỵ Y Missing is a mechanism true observed with missing values

Slide 13

Slide 13 text

Ỵ D Y Missing is a mechanism true observed with missing values mechanism (a dog)

Slide 14

Slide 14 text

Ỵ X D Y Benign missingness true observed with missing values mechanism (a dog) cause

Slide 15

Slide 15 text

Ỵ X D Y Benign missingness [Scatterplot: Y against X]

Slide 16

Slide 16 text

Ỵ X D Y Less benign missingness

Slide 17

Slide 17 text

Ỵ X D Y Less benign missingness [Scatterplot: Y against X] If we can model X –> Y, we can recover p(Y)

Slide 18

Slide 18 text

Ỵ X D Y Utterly evil missingness

Slide 19

Slide 19 text

Ỵ X D Y Utterly evil missingness [Scatterplot: Y against X] If we can model Y –> D, there is some hope
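The three missingness slides can be compared in a small simulation. This is a minimal sketch, not the speaker's code: the effect sizes, the thresholds, the mechanism names and all function names are assumptions chosen for illustration. D = True means the value of Y was lost. The naive estimate averages only the observed Y; the model-based estimate fits Y on X in the observed cases and averages the predictions over every X, which relies on the assumed linear X –> Y relationship.

```python
import random
import statistics


def simulate(mechanism, n=20000, seed=1):
    """Simulate X -> Y plus a missingness indicator D (True = Y is lost).

    All numeric choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)          # X -> Y
        if mechanism == "benign":        # D ignores both X and Y
            d = rng.random() < 0.5
        elif mechanism == "less_benign":  # X -> D
            d = x > 0
        elif mechanism == "evil":        # Y -> D
            d = y > 0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((x, y, d))
    return rows


def true_mean(rows):
    """Mean of Y over all cases, which a real analyst never sees."""
    return statistics.fmean([y for _, y, _ in rows])


def naive_mean(rows):
    """Mean of Y over the cases that survived (complete-case estimate)."""
    return statistics.fmean([y for _, y, d in rows if not d])


def model_based_mean(rows):
    """Fit Y = a + b*X on observed cases, then average predictions over all X."""
    xs = [x for x, _, d in rows if not d]
    ys = [y for _, y, d in rows if not d]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return statistics.fmean([intercept + slope * x for x, _, _ in rows])


if __name__ == "__main__":
    for mechanism in ("benign", "less_benign", "evil"):
        rows = simulate(mechanism)
        print(mechanism, true_mean(rows), naive_mean(rows), model_based_mean(rows))
```

Under these assumptions the naive estimate is close to the truth only in the benign case. Modeling X –> Y repairs the estimate when X causes D, but the same regression stays biased when Y itself causes D; that case needs a model of Y –> D instead.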

Slide 20

Slide 20 text

Where is your god now?
[Figure repeated from Slide 8: missing values in the Moralizing_gods data, population size against time (year).]
McElreath 2020 Statistical Rethinking, page 514

Slide 21

Slide 21 text

[Figure: population size plotted against time (year), from -10000 to 2000. Legend: moralizing gods present, moralizing gods absent, moralizing gods unknown. Chapter heading: "Missing Data and Other Opportunities".]
Cross-tabulation of gods by literacy (NA = unknown):
          literacy
  gods      0     1    NA
    0      16     1     0
    1       9   310     0
    NA    442    86     0
Cropped book text: "… of … missing values are for nonliter… of any kind in most cases. And as you can see in … with smaller polities. This is possibly because … to be literate. These data are structured by the … moralizing gods and missing values. Beneath that … moralizing gods could be common or rare depe…"

Slide 22

Slide 22 text

[Figure, literacy-by-gods table, and cropped book text repeated from Slide 21.]

Slide 23

Slide 23 text

[Figure, literacy-by-gods table, and cropped book text repeated from Slide 21.]

Slide 24

Slide 24 text

G̣ D G moralizing gods

Slide 25

Slide 25 text

G̣ L D G literacy moralizing gods

Slide 26

Slide 26 text

G̣ L D G P population literacy moralizing gods

Slide 27

Slide 27 text

G̣ L D G P
Only large P have L = 1
G observed only when L = 1
No info re assoc P and G when L = 0
No way to reconstruct distribution of G
R code: with( Moralizing_gods , table( gods=moralizin…
          literacy
  gods      0     1    NA
    0      16     1     0
    1       9   310     0
    NA    442    86     0
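The counts on this slide already show how lopsided the missingness is. The sketch below only does arithmetic on those six counts. It assumes the table is read with gods in rows and literacy in columns, and the labels "absent", "present" and "unknown" and both function names are introduced here for illustration.

```python
# Counts copied from the slide's cross-tabulation: (gods, literacy) -> cases.
# Reading rows as gods and columns as literacy is an assumption.
counts = {
    ("absent", 0): 16,
    ("absent", 1): 1,
    ("present", 0): 9,
    ("present", 1): 310,
    ("unknown", 0): 442,
    ("unknown", 1): 86,
}


def share_unknown(literacy):
    """Fraction of cases at this literacy level where G is missing."""
    total = sum(n for (_, lit), n in counts.items() if lit == literacy)
    return counts[("unknown", literacy)] / total


def share_present_among_observed(literacy):
    """Fraction with moralizing gods among cases where G was recorded."""
    present = counts[("present", literacy)]
    absent = counts[("absent", literacy)]
    return present / (present + absent)


if __name__ == "__main__":
    for literacy in (0, 1):
        print(
            literacy,
            round(share_unknown(literacy), 3),
            round(share_present_among_observed(literacy), 3),
        )
```

Most non-literate cases have no recorded value for G, while most literate cases do. The observed values therefore say very little about G where L = 0.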

Slide 28

Slide 28 text

Missed opportunities • Many sources of data have partial missingness • Essential to explore causes of the missing values • Sometimes missing values are benign • Sometimes missing values preclude description (without more causal assumptions) • NO CAUSES IN; NO DESCRIPTION OUT

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

Nick Blurton-Jones (right) interviews a Hadza great-grandmother (second from left) and her younger kinswoman (second from right) in 1999. PHOTO: ANNETTE WAGNER FROM FILMING OF TINDIGA—THOSE WHO ARE RUNNING AND HADZABE MEANS: US PEOPLE

Slide 31

Slide 31 text

[Figure: Modal ages of adult death. Frequency of ages at death, f(x), plotted against age. Series: hunter-gatherers, acculturated hunter-gatherers, forager-horticulturalists, Sweden, United States.]
Note: Frequency distribution of ages at death f(x) for individuals over age … shows strong peaks for hunter-gatherers, forager-horticulturalists, acculturated hunter-gatherers, Sweden …, and the United States ….
Gurven & Kaplan 2008 Longevity Among Hunter-Gatherers

Slide 32

Slide 32 text

Demography not so simple • Most humans did not and do not know their birthdays • Many records are estimates or simply falsified • The direction and magnitude of error changes with age • Analogies in many other kinds of data

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

Newman 2020 Supercentenarian and remarkable age records

Slide 35

Slide 35 text

Figure 2. Number and per capita rate of attaining supercentenarian status across US states, relative …
Newman 2020 Supercentenarian and remarkable age records

Slide 36

Slide 36 text

Demography not so simple • Most humans do not know their birthdays • Many records are estimates or simply falsified • The direction and magnitude of error changes with age • CONCLUSION: Most census records do not describe the target population • But there is hope! We can use causal knowledge to refine age/fertility estimates and get better descriptions.

Slide 37

Slide 37 text

Causal information about age • Every human has exactly one biological father and one biological mother • Human gestation is about 270 ± 15 days • Female fertility is tightly bounded between 20 and 45 years in most populations • If we know family structure and birth order, we can do a lot with these facts • All of this can be made algorithmic, repeatable, auditable
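As one way to picture how these facts could be made algorithmic, the sketch below tightens uncertain birth-year intervals using the slide's fertility window and a minimum spacing of one gestation between successive births. It is an illustration only, not the speaker's method: the interval representation, the function names, and the choice to ignore twins and to use the 270-day figure as a hard minimum are all assumptions.

```python
from dataclasses import dataclass

MIN_MOTHER_AGE = 20.0            # lower fertility bound from the slide (years)
MAX_MOTHER_AGE = 45.0            # upper fertility bound from the slide (years)
MIN_SPACING = 270 / 365.25       # assumed: one gestation between births, no twins


@dataclass(frozen=True)
class Interval:
    """Earliest and latest plausible birth year."""
    lo: float
    hi: float

    def intersect(self, other: "Interval") -> "Interval":
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        if lo > hi:
            raise ValueError("records are mutually inconsistent")
        return Interval(lo, hi)


def child_bounds_from_mother(mother: Interval) -> Interval:
    """A child is born while the mother is inside the fertility window."""
    return Interval(mother.lo + MIN_MOTHER_AGE, mother.hi + MAX_MOTHER_AGE)


def refine_siblings(mother: Interval, reported: list) -> list:
    """Tighten reported birth-year intervals of siblings listed in birth order."""
    limit = child_bounds_from_mother(mother)
    kids = [r.intersect(limit) for r in reported]
    lo = [k.lo for k in kids]
    hi = [k.hi for k in kids]
    # Forward pass: each child is born after the previous one.
    for i in range(1, len(kids)):
        lo[i] = max(lo[i], lo[i - 1] + MIN_SPACING)
    # Backward pass: each child is born before the next one.
    for i in range(len(kids) - 2, -1, -1):
        hi[i] = min(hi[i], hi[i + 1] - MIN_SPACING)
    refined = []
    for low, high in zip(lo, hi):
        if low > high:
            raise ValueError("birth order contradicts the reported ages")
        refined.append(Interval(low, high))
    return refined


if __name__ == "__main__":
    mother = Interval(1950.0, 1960.0)          # hypothetical record
    reported = [                               # hypothetical records, birth order
        Interval(1970.0, 1990.0),
        Interval(1975.0, 1978.0),
        Interval(1972.0, 1995.0),
    ]
    for interval in refine_siblings(mother, reported):
        print(interval)
```

In this hypothetical family the well-dated middle child narrows the intervals of both siblings, and contradictory records raise an error instead of passing silently, which keeps the procedure repeatable and open to audit.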

Slide 38

Slide 38 text

Hitting the Target • Basic problem: Sample is not the target • Post-stratification & Transport: Transparent, principled methods for extrapolating from sample to population • Post-strat requires causal model of reasons sample differs from population • NO CAUSES IN; NO DESCRIPTION OUT

Slide 39

Slide 39 text

Cartoon example A B C D Four age groups:

Slide 40

Slide 40 text

Cartoon example A B C D Four age groups: Proportions of sample: A B C D
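The cartoon can be worked through with made-up numbers. In this sketch the population shares, sample shares and group means are all hypothetical, chosen only so the arithmetic is easy to follow. The raw sample mean weights each age group by its share of the sample; the post-stratified estimate reweights the same group means by the population shares.

```python
# Hypothetical numbers for four age groups A-D (not taken from the slides).
population_share = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
sample_share = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40}
group_mean = {"A": 0.8, "B": 0.6, "C": 0.4, "D": 0.2}   # e.g. share agreeing


def weighted_mean(weights, values):
    """Average of group values, weighted by the given group shares."""
    total = sum(weights.values())
    return sum(weights[g] * values[g] for g in weights) / total


raw_estimate = weighted_mean(sample_share, group_mean)
poststratified_estimate = weighted_mean(population_share, group_mean)

if __name__ == "__main__":
    print("raw sample mean:", raw_estimate)
    print("post-stratified:", poststratified_estimate)
```

The reweighting is only justified if age is the reason the sample differs from the population. That is a causal assumption, not something the sample can confirm by itself.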

Slide 41

Slide 41 text

Multi-level regression & post-stratification (MRP)

Slide 42

Slide 42 text

X Y Age Attitude Selection nodes

Slide 43

Slide 43 text

X Y Age Attitude S Selection
 by Age Selection nodes S : “Sample differs because of differences in what I point to”

Slide 44

Slide 44 text

Selection ubiquitous • Many sources of data are already filtered by selection effects • Crime statistics • Employment & job performance • Health • Preservation & curation

Slide 45

Slide 45 text

X Y Age Attitude S Selection
 by Age “Young people don’t answer their phones”

Slide 46

Slide 46 text

X Y S “Anarchists don’t answer their phones” X Y S “Young people don’t answer their phones”
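The contrast between the two diagrams can be simulated. This is a sketch under stated assumptions: a binary age group and a binary attitude, with response probabilities invented for illustration, and function names that are not from the talk. When selection depends on age, post-stratifying by age recovers the population share; when selection depends on the attitude itself, the same adjustment stays biased.

```python
import random


def simulate_poll(select_on, n=100000, seed=7):
    """Return (young, yes, answered) for each person; all rates are assumptions."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        young = rng.random() < 0.5
        yes = rng.random() < (0.7 if young else 0.3)     # Age -> Attitude
        if select_on == "age":                            # X -> S
            p_answer = 0.2 if young else 0.8
        elif select_on == "attitude":                     # Y -> S
            p_answer = 0.2 if yes else 0.8
        else:
            raise ValueError(f"unknown selection: {select_on}")
        answered = rng.random() < p_answer
        people.append((young, yes, answered))
    return people


def true_share(people):
    """Population share answering yes, unavailable to a real pollster."""
    return sum(yes for _, yes, _ in people) / len(people)


def raw_share(people):
    """Share answering yes among respondents only."""
    answers = [yes for _, yes, answered in people if answered]
    return sum(answers) / len(answers)


def poststratified_share(people):
    """Reweight respondent means in each age group by population age shares."""
    estimate = 0.0
    for group in (True, False):
        population_share = sum(young == group for young, _, _ in people) / len(people)
        answers = [
            yes for young, yes, answered in people if answered and young == group
        ]
        estimate += population_share * sum(answers) / len(answers)
    return estimate


if __name__ == "__main__":
    for select_on in ("age", "attitude"):
        people = simulate_poll(select_on)
        print(select_on, true_share(people), raw_share(people),
              poststratified_share(people))
```

Which variable the selection node points to decides whether reweighting by age is enough, and that cannot be read off the respondent data alone.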

Slide 47

Slide 47 text

X Y Age Attitude S Selection
 by Age “Young people don’t answer their phones and misreport their age” X̣ Reported
 Age

Slide 48

Slide 48 text

A Causal Framework for Cross-Cultural Generalizability
Dominik Deffner1∗, Julia M. Rohrer2, Richard McElreath1
1Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
2Department of Psychology, Leipzig University, Leipzig, Germany
∗Corresponding author: dominik_deffner@eva.mpg.de
Behavioral researchers increasingly recognize the need for samples from more diverse populations that capture the breadth of human experience. Current attempts to establish generalizability across populations focus on threats to validity and the accumulation of large cross-cultural datasets. For continued progress, more diverse data and lists of things that can go wrong are not sufficient; we also need a framework that lets us determine which inferences can be drawn and how to make informative cross-cultural comparisons. We introduce a formal generative causal modeling framework and outline simple graphical criteria to derive analytic strategies and implied generalization from causal diagrams. Using both simulated and real data, we demonstrate methods to project and compare estimates across populations. We conclude with a discussion of how a formal framework for generalizability can assist researchers in designing maximally informative cross-cultural studies and thus provides a more solid foundation for cumulative and generalizable behavioral research.
Keywords: Cross-cultural research, generalizability, WEIRD samples problem, causal inference, poststratification.
1. Introduction
The behavioral and social sciences have been criticized for relying almost exclusively on WEIRD samples in which most participants are Western, educated, and from industrialized, rich, and democratic countries (Henrich et al. 2010, Apicella et al. 2020, Henrich 2020). Research has established substantial cross-cultural variation in key psychological domains, such as thinking styles (e.g. Masuda and Nisbett 2001, Nisbett and Miyamoto …
Coming next month to a preprint server near you

Slide 49

Slide 49 text

Many Qs are really post-strat Qs • Justified descriptions require causal information and post-stratification • Other tasks are structurally similar • Causal effects also require post-stratification, e.g. vaccines • Proper time trends account for changes in measurement/population, post-strat correctly for each time period • Comparison is post-stratification from one population to another

Slide 50

Slide 50 text

Honest Methods for Modest Questions

Slide 51

Slide 51 text

Simple 4-step plan for honest digital scholarship • (1) What are we trying to describe? • (2) What is the ideal data for doing so? • (3) What data do we actually have? • (4) What causes the differences between (2) and (3)? • (5) [optional] Is there a statistical way to use (3) + (4) to accomplish (1)?