Causal Thinking for Descriptive Research

Abstract: Causal inference is hard, and everyone knows it. It is less
recognized that descriptive and comparative scholarship also rely upon
causal inference. How data are sampled and curated influences how we
should process the data, in order to accurately describe or compare
the people, times, and places of interest. I'll present some examples
to illustrate the problems that ignoring causal structure can create,
along with some solutions.

Richard McElreath

September 21, 2021

Transcript

  1. Causal Thinking for Descriptive Research
    [Background word cloud: Biography, Surveys, Almanacs, Collection, Folktales, Ethnography]
    Richard McElreath MPI-EVA Leipzig

  2. Light | Dark
    Good | Evil
    Civilization | Barbarism
    Inferential | Descriptive


  3. No causes in; nothing much out
    Tests of non-causal laws require
    background information too
    and much of it will necessarily
    be causal. There is a slogan in
    the book about attempts to infer
    causes purely from probabilities:
    “No causes in; no causes out.”
    But the stronger conclusion is
    intended as well: “No causes in;
    nothing much out.”
    Nancy Cartwright 1989 Nature’s Capacities and Their Measurement


  4. Causal thinking for everyone
    • The principles that license causal inference in
    experiments are fundamentally the same as
    those that license description
    • Causal inference depends upon trustworthy
    description
    • Description depends upon causal information
    • Big data alone just means bigger bias
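
The last bullet can be illustrated with a small simulation. This is a minimal sketch, not taken from the talk: the selection rule (positive values recorded 80% of the time, others 20%) and the function name `biased_sample_mean` are assumptions made up for illustration. It shows that a larger sample drawn through the same filter gives a more precise estimate of the wrong number.

```python
import random
import statistics


def biased_sample_mean(n, seed=1):
    """Mean of Y among n recorded units when recording depends on Y."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        y = rng.gauss(0.0, 1.0)  # target population: Y ~ Normal(0, 1), true mean 0
        # Hypothetical selection rule: positive values are recorded 80% of
        # the time, all other values only 20% of the time.
        record_prob = 0.8 if y > 0 else 0.2
        if rng.random() < record_prob:
            kept.append(y)
    return statistics.fmean(kept)


small_estimate = biased_sample_mean(100)
large_estimate = biased_sample_mean(100_000)
print(f"n = 100:     {small_estimate:.3f}")
print(f"n = 100,000: {large_estimate:.3f}")
```

Under this assumed rule the recorded mean converges to about 0.48 rather than the true 0, so adding data narrows the uncertainty around a biased value.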


  5. Some ordinary descriptive terrors
    • Missing values
    • Measurement
    • Relevance


  6. View Slide

  7. View Slide

  8. MISSING DATA AND OTHER OPPORTUNITIES
    [Figure: population size against time (year), -10000 to 2000. Legend: moralizing gods present, moralizing gods absent, moralizing gods unknown.]
    Figure 15.7. Missing values in the Moralizing_gods data. The blue
    points, both open and filled, are observed values for the presence of beliefs
    about moralizing gods. The x symbols are unknowns, the missing values.
    Where is your god now?
    McElreath 2020 Statistical Rethinking, page 514

  9. Where is your god now?
    • Missing data are not the real problem.
    • The cause of the missingness is the problem.
    • Data for Hawaii:
    Captain James Cook (1728–1779)


  10. Missing methods
    • A circus of methods for handling
    missing values
    • drop all cases with missing values
    • replace missing values with means
    • “multiple imputation”
    • Bayesian imputation
    • prayer
    MISSING DATA ARE A CAUSAL PROBLEM

  11. X
    Y
    Drawing our assumptions
    X Y
    observed variable
    unobserved variable
    a causal relationship



  12. Y
    Missing is a mechanism
    true observed with
    missing values



  13. D
    Y
    Missing is a mechanism
    true observed with
    missing values
    mechanism
    (a dog)



  14. X D
    Y
    Benign missingness
    true observed with
    missing values
    mechanism
    (a dog)
    cause



  15. X D
    Y
    Benign missingness
    [Scatterplot: Y against X]



  16. X D
    Y
    Less benign missingness



  17. X D
    Y
    Less benign missingness
    [Scatterplot: Y against X]
    If we can model X –> Y, can recover p(Y)
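
A minimal sketch of that idea, under assumptions that are not from the slides: a linear X -> Y relationship with made-up coefficients, and a hypothetical missingness rule in which Y is lost 90% of the time when X > 0. The sketch recovers only the mean of Y rather than the whole distribution p(Y), by fitting Y on X in the complete cases and predicting for every case.

```python
import random
import statistics

rng = random.Random(2021)
n = 20_000

# Hypothetical generative model: X -> Y, so the true mean of Y is 0.
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [0.7 * x + rng.gauss(0.0, 0.5) for x in xs]

# X -> D: Y goes missing 90% of the time when X > 0, 10% otherwise.
y_seen = [rng.random() > (0.9 if x > 0 else 0.1) for x in xs]

x_obs = [x for x, seen in zip(xs, y_seen) if seen]
y_obs = [y for y, seen in zip(ys, y_seen) if seen]

# Complete-case estimate: ignores why values are missing.
naive_mean = statistics.fmean(y_obs)

# Least-squares fit of Y on X using the complete cases only.
x_bar = statistics.fmean(x_obs)
y_bar = statistics.fmean(y_obs)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_obs, y_obs)) / sum(
    (x - x_bar) ** 2 for x in x_obs
)
intercept = y_bar - slope * x_bar

# X is observed for everyone, so predict Y for every case and average.
model_mean = statistics.fmean([intercept + slope * x for x in xs])

print(f"complete-case mean of Y: {naive_mean:.3f}")
print(f"model-based mean of Y:   {model_mean:.3f}")
```

The fit stays trustworthy here because missingness depends only on X, so the relationship between X and Y is the same in the observed and unobserved cases.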



  18. X D
    Y
    Utterly evil missingness



  19. X D
    Y
    Utterly evil missingness
    [Scatterplot: Y against X]
    If we can model Y –> D, some hope
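
One way to read "some hope" is inverse-probability weighting, which is a technique swapped in here for illustration and not necessarily what the talk had in mind. The sketch assumes the missingness mechanism is known exactly (the hypothetical function `prob_seen`), which is a strong assumption in practice.

```python
import random
import statistics

rng = random.Random(7)
n = 50_000


def prob_seen(y):
    """Hypothetical, assumed-known mechanism Y -> D: large Y is rarely recorded."""
    return 0.1 if y > 0 else 0.9


ys = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mean of Y is 0
y_obs = [y for y in ys if rng.random() < prob_seen(y)]

# Complete-case estimate: badly biased because Y itself drives missingness.
naive_mean = statistics.fmean(y_obs)

# Weight each observed value by the inverse of its chance of being seen.
weights = [1.0 / prob_seen(y) for y in y_obs]
ipw_mean = sum(w * y for w, y in zip(weights, y_obs)) / sum(weights)

print(f"complete-case mean: {naive_mean:.3f}")
print(f"weighted mean:      {ipw_mean:.3f}")
```

If `prob_seen` is wrong, the weighted estimate is wrong too, which is why the mechanism has to be argued for on causal grounds rather than read off the data.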


  20. MISSING DATA AND OTHER OPPORTUNITIES
    [Figure: population size against time (year), -10000 to 2000. Legend: moralizing gods present, moralizing gods absent, moralizing gods unknown.]
    Figure 15.7. Missing values in the Moralizing_gods data. The blue
    points, both open and filled, are observed values for the presence of beliefs
    about moralizing gods. The x symbols are unknowns, the missing values.
    Where is your god now?
    McElreath 2020 Statistical Rethinking, page 514

  21. MISSING DATA AND OTHER OPPORTUNITIES
    [Figure: population size against time (year), -10000 to 2000. Legend: moralizing gods present, moralizing gods absent, moralizing gods unknown.]
            literacy
    gods      0   1 <NA>
      0      16   1    0
      1       9 310    0
      <NA>  442  86    0
    [Cropped book excerpt: "… of missing values are for nonliter… of any kind in most cases. And as you can see in … with smaller polities. This is possibly because s… to be literate. These data are structured by the … izing gods and missing values. Beneath that m… moralizing gods could be common or rare, depe…"]

  22. [Same figure, table, and excerpt as slide 21]

  23. [Same figure, table, and excerpt as slide 21]


  24. D
    G
    moralizing
    gods



  25. L D
    G
    literacy
    moralizing
    gods



  26. L D
    G
    P
    population
    literacy
    moralizing
    gods



  27. L D
    G
    P
    Only large P have L = 1
    G observed only when L = 1
    No information about the association between P and G when L = 0
    No way to reconstruct distribution of G
    R code
    with( Moralizing_gods ,
    table( gods=moralizin…
            literacy
    gods      0   1 <NA>
      0      16   1    0
      1       9 310    0
      <NA>  442  86    0
    [Cropped book excerpt: "… of missing valu… of any kind in most cases. An… with smaller polities. This is … to be literate. These data are … izing gods and missing value… moralizing gods could be com…"]
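
The argument on this slide can be made concrete with two invented worlds. This is a sketch under simplifying assumptions that are stronger than the real data: literacy is taken to be fully determined by polity size, and all counts are hypothetical. The two worlds disagree about moralizing gods only among non-literate polities, so they leave identical records.

```python
def make_world(n_small, gods_small, n_large, gods_large):
    """Build a list of polities as (size, literate, has_moralizing_gods)."""
    world = []
    for size, n, n_gods in (("small", n_small, gods_small), ("large", n_large, gods_large)):
        for i in range(n):
            literate = size == "large"  # assumption: only large P have L = 1
            world.append((size, literate, i < n_gods))
    return world


def observe(world):
    """G is recorded only when L = 1; otherwise it is missing (None)."""
    return [(size, lit, gods if lit else None) for size, lit, gods in world]


def true_prevalence(world):
    """Share of all polities with moralizing gods, observed or not."""
    return sum(gods for _, _, gods in world) / len(world)


# Two hypothetical worlds that differ only where nobody could write it down.
world_common = make_world(n_small=400, gods_small=360, n_large=100, gods_large=95)
world_rare = make_world(n_small=400, gods_small=40, n_large=100, gods_large=95)

same_data = observe(world_common) == observe(world_rare)
print(same_data)                      # True: the recorded data are identical
print(true_prevalence(world_common))  # 0.91
print(true_prevalence(world_rare))    # 0.27
```

No amount of processing of the observed table can tell these worlds apart; only outside causal information about small, non-literate polities could.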

  28. Missed opportunities
    • Many sources of data have partial missingness
    • Essential to explore causes of the missing values
    • Sometimes missing values are benign
    • Sometimes missing values preclude description
    (without more causal assumptions)
    • NO CAUSES IN; NO DESCRIPTION OUT


  29. View Slide

  30. Nick Blurton-Jones (right) interviews a Hadza great-grandmother (second
    from left) and her younger kinswoman (second from right) in 1999.
    PHOTO: ANNETTE WAGNER FROM FILMING OF TINDIGA—THOSE WHO ARE
    RUNNING AND HADZABE MEANS: US PEOPLE


  31. FIGURE: Modal ages of adult death
    NOTE: Frequency distribution of ages at death, f(x), for individuals over age […] shows strong peaks for hunter-gatherers, forager-horticulturalists, acculturated hunter-gatherers, Sweden […], and the United States […].
    [Plot: f(x) against age. Series: hunter-gatherers, acculturated hunter-gatherers, forager-horticulturalists, Sweden, United States.]
    Gurven & Kaplan 2008 Longevity Among Hunter-Gatherers

  32. Demography not so simple
    • Most humans did not and do not know
    their birthdays
    • Many records are estimates or simply
    falsified
    • The direction and magnitude of error
    changes with age
    • Analogies in many other kinds of data


  33. View Slide

  34. Newman 2020 Supercentenarian and remarkable age records

  35. Figure 2. Number and per capita rate of attaining supercentenarian status across US states, relative
    Newman 2020 Supercentenarian and remarkable age records


  36. Demography not so simple
    • Most humans do not know their birthdays
    • Many records are estimates or simply falsified
    • The direction and magnitude of error changes with age
    • CONCLUSION: Most census records do not describe the
    target population
    • But there is hope! Can use causal knowledge to refine age/
    fertility estimates, get better descriptions.


  37. Causal information about age
    • Every human has exactly one biological father and one
    biological mother
    • Human gestation is about 270 ± 15 days
    • Female fertility tightly bounded between 20 and 45 years in
    most populations
    • If we know family structure and birth order, we can do a lot with
    these facts
    • All of this can be made algorithmic, repeatable, auditable
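
As a rough sketch of what "algorithmic" could look like, the facts above can be written as interval constraints on ages. The function names and the example intervals are hypothetical, the fertility bounds are the ones quoted on the slide, and the sibling rule assumes singleton births. A real procedure would iterate these rules over a whole genealogy and probably treat ages probabilistically.

```python
MIN_MATERNAL_AGE = 20.0           # fertility bounds quoted on the slide, in years
MAX_MATERNAL_AGE = 45.0
MIN_BIRTH_SPACING = 270 / 365.25  # one gestation, in years (assumes singleton births)


def tighten_mother_child(mother, child):
    """Shrink (low, high) age intervals so mother - child stays within [20, 45]."""
    m_lo, m_hi = mother
    c_lo, c_hi = child
    m_lo = max(m_lo, c_lo + MIN_MATERNAL_AGE)
    m_hi = min(m_hi, c_hi + MAX_MATERNAL_AGE)
    c_lo = max(c_lo, m_lo - MAX_MATERNAL_AGE)
    c_hi = min(c_hi, m_hi - MIN_MATERNAL_AGE)
    if m_lo > m_hi or c_lo > c_hi:
        raise ValueError("reported ages contradict the kinship record")
    return (m_lo, m_hi), (c_lo, c_hi)


def tighten_siblings(elder, younger):
    """Birth order: the elder sibling is at least one gestation older."""
    e_lo, e_hi = elder
    y_lo, y_hi = younger
    e_lo = max(e_lo, y_lo + MIN_BIRTH_SPACING)
    y_hi = min(y_hi, e_hi - MIN_BIRTH_SPACING)
    if e_lo > e_hi or y_lo > y_hi:
        raise ValueError("reported ages contradict the birth order")
    return (e_lo, e_hi), (y_lo, y_hi)


# Hypothetical reports: a mother "between 40 and 70" with a child "30 to 35",
# and a younger sibling whose age is only known to be "20 to 40".
mother, first_child = tighten_mother_child(mother=(40.0, 70.0), child=(30.0, 35.0))
first_child, second_child = tighten_siblings(elder=first_child, younger=(20.0, 40.0))
print(mother, first_child, second_child)
```

In this example the mother's lower bound rises from 40 to 50 and the younger sibling's upper bound drops below 35, without any new measurement. Every narrowing step is traceable to a stated rule.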


  38. Hitting the Target
    • Basic problem: Sample is not the target
    • Post-stratification & Transport:
    Transparent, principled methods for
    extrapolating from sample to population
    • Post-strat requires causal model of
    reasons sample differs from population
    • NO CAUSES IN; NO
    DESCRIPTION OUT


  39. Cartoon example
    Four age groups: A B C D

  40. Cartoon example
    Four age groups: A B C D
    Proportions of sample: A B C D

  41. Multi-level regression & post-stratification (MRP)
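
The post-stratification step for the four-group cartoon reduces to a reweighting. All numbers below are hypothetical, and the group means are taken as given. In MRP they would come from a multilevel regression, which this sketch does not fit.

```python
# Hypothetical shares and group-level means for age groups A to D.
population_share = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
sample_share = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40}
group_mean = {"A": 0.8, "B": 0.6, "C": 0.4, "D": 0.2}


def weighted_mean(share, value):
    """Average the group values using the given group shares as weights."""
    return sum(share[g] * value[g] for g in share)


# What the raw sample average reports: groups weighted as they were sampled.
sample_estimate = weighted_mean(sample_share, group_mean)

# Post-stratified estimate: the same group means, reweighted to the target.
poststratified = weighted_mean(population_share, group_mean)

print(f"raw sample estimate:      {sample_estimate:.2f}")  # 0.40
print(f"post-stratified estimate: {poststratified:.2f}")   # 0.50
```

The reweighting is only justified if the group means in the sample match those in the population, which is a causal claim about why the sample differs.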


  42. X Y
    Age Attitude
    Selection nodes


  43. X Y
    Age Attitude
    S
    Selection by Age
    Selection nodes
    S : “Sample differs because of
    differences in what I point to”


  44. Selection ubiquitous
    • Many sources of data are already
    filtered by selection effects
    • Crime statistics
    • Employment & job performance
    • Health
    • Preservation & curation


  45. X Y
    Age Attitude
    S
    Selection by Age
    “Young people don’t answer their phones”


  46. X Y
    S
    “Anarchists
    don’t answer
    their phones”
    X Y
    S
    “Young people
    don’t answer
    their phones”
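
The contrast between the two diagrams can be worked through exactly, without simulation. This sketch uses hypothetical shares and response probabilities and computes the large-sample limit of a post-stratified estimate (post-stratifying on age) under each selection story.

```python
POP_SHARE = {"young": 0.5, "old": 0.5}  # hypothetical census shares
P_AGREE = {"young": 0.8, "old": 0.4}    # hypothetical P(attitude = agree | age)


def expected_poststrat(respond_prob):
    """Large-sample post-stratified share agreeing, given respond_prob(age, agrees)."""
    estimate = 0.0
    for age, share in POP_SHARE.items():
        agree = P_AGREE[age] * respond_prob(age, True)
        disagree = (1.0 - P_AGREE[age]) * respond_prob(age, False)
        # Share agreeing among respondents of this age, weighted by census share.
        estimate += share * agree / (agree + disagree)
    return estimate


truth = sum(POP_SHARE[age] * P_AGREE[age] for age in POP_SHARE)

# Selection by age (X -> S): young people rarely answer.
by_age = expected_poststrat(lambda age, agrees: 0.2 if age == "young" else 0.8)

# Selection by attitude (Y -> S): people who agree rarely answer.
by_attitude = expected_poststrat(lambda age, agrees: 0.2 if agrees else 0.8)

print(f"truth:                 {truth:.3f}")
print(f"selection by age:      {by_age:.3f}")
print(f"selection by attitude: {by_attitude:.3f}")
```

Post-stratifying on age repairs the first case completely and leaves the second badly off, because the thing driving selection is the outcome itself.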


  47. X Y
    Age Attitude
    S
    Selection by Age
    “Young people don’t answer their phones
    and misreport their age”
    Reported Age

  48. A Causal Framework for Cross-Cultural Generalizability
    Dominik Deffner (1, *), Julia M. Rohrer (2), Richard McElreath (1)
    (1) Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
    (2) Department of Psychology, Leipzig University, Leipzig, Germany
    (*) Corresponding author: dominik_deffner@eva.mpg.de
    Behavioral researchers increasingly recognize the need for samples from more diverse populations
    that capture the breadth of human experience. Current attempts to establish generalizability across
    populations focus on threats to validity and the accumulation of large cross-cultural datasets. For
    continued progress, more diverse data and lists of things that can go wrong are not sufficient;
    we also need a framework that lets us determine which inferences can be drawn and how to make
    informative cross-cultural comparisons. We introduce a formal generative causal modeling framework
    and outline simple graphical criteria to derive analytic strategies and implied generalization from
    causal diagrams. Using both simulated and real data, we demonstrate methods to project and
    compare estimates across populations. We conclude with a discussion of how a formal framework
    for generalizability can assist researchers in designing maximally informative cross-cultural studies
    and thus provides a more solid foundation for cumulative and generalizable behavioral research.
    Keywords: Cross-cultural research, generalizability, WEIRD samples problem, causal inference,
    poststratification.
    1. Introduction
    The behavioral and social sciences have been criticized for relying almost exclusively
    on WEIRD samples in which most participants are Western, educated, and from industrialized,
    rich, and democratic countries (Henrich et al. 2010, Apicella et al. 2020, Henrich
    2020). Research has established substantial cross-cultural variation in key psychological
    domains, such as thinking styles (e.g. Masuda and Nisbett 2001, Nisbett and Miyamoto …
    Coming next month to a preprint server near you

  49. Many Qs are really post-strat Qs
    • Justified descriptions require causal information and post-
    stratification
    • Other tasks are structurally similar
    • Causal effects also require post-stratification, e.g. vaccines
    • Proper time trends account for changes in measurement/
    population, post-strat correctly for each time period
    • Comparison is post-stratification from one population to
    another
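
The last bullet can be sketched as direct standardization: compare two populations after giving them the same age mix. The rates and age mixes below are hypothetical. The two populations are deliberately given identical age-specific rates, so any crude difference is purely compositional.

```python
# Hypothetical age-specific rates, identical in the two populations.
RATE_1 = {"young": 0.10, "old": 0.40}
RATE_2 = {"young": 0.10, "old": 0.40}

# Hypothetical age mixes: population 2 is much older.
AGE_MIX_1 = {"young": 0.8, "old": 0.2}
AGE_MIX_2 = {"young": 0.3, "old": 0.7}


def average_rate(rate_by_age, age_mix):
    """Population-level rate implied by age-specific rates and an age mix."""
    return sum(age_mix[age] * rate_by_age[age] for age in age_mix)


# Crude comparison: each population averaged over its own age mix.
crude_1 = average_rate(RATE_1, AGE_MIX_1)
crude_2 = average_rate(RATE_2, AGE_MIX_2)

# Standardized comparison: population 2 post-stratified to population 1's mix.
standardized_2 = average_rate(RATE_2, AGE_MIX_1)

print(f"crude difference:        {crude_2 - crude_1:.2f}")         # 0.15
print(f"standardized difference: {standardized_2 - crude_1:.2f}")  # 0.00
```

Which age mix to standardize to depends on the question being asked, and the choice assumes age is the relevant reason the populations differ.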


  50. Honest Methods for Modest Questions
    [Background word cloud: Satellites, Surveys, Almanacs, Collection, Archives, Ethnography]

  51. Simple 4-step plan for honest digital scholarship
    • (1) What are we trying to describe?
    • (2) What is the ideal data for doing so?
    • (3) What data do we actually have?
    • (4) What causes the differences between (2) and (3)?
    • (5) [optional] Is there a statistical way to use (3) + (4) to
    accomplish (1)?
