Lecture 06 of the Dec 2018 through March 2019 edition of Statistical Rethinking. Covers multiple regression and basic causal inference (back-door criterion).
available predictors to model • “We controlled for...” • Almost always a bad idea • Adding variables creates confounds • Residual confounding • Overfitting
[Book excerpt on slide, cropped: "[…] still be associated with D entirely through the indirect path. That [is] known as mediation, and we'll have an example later. […] the indirect path actually does no work. How can we show that? […] marriage rate is positively associated with divorce rate. But that isn't […] the path M → D is positive. It could be that the association between […] from A's influence on both M and D. […] consistent with the inferences from models m5.1 and m5.2. So which is it? […] marriage rate, or rather is age at marriage just driving both, creating […] between marriage rate and divorce rate?"]
[DAG on slide: Age at marriage (A) → Marriage rate (M), Age at marriage (A) → Divorce rate (D)]
Z is a common cause of X and Y. De-confounding: conditioning on Z removes the dependency between X and Y: X _||_ Y | Z
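The fork can be checked with a small simulation. This is a minimal sketch, not the lecture's own code: the unit effects of Z on X and Y, the standard-normal noise, the sample size, and the helper names `slope` and `residuals` are all assumptions chosen for illustration. Conditioning on Z is done by regressing the residuals of Y on the residuals of X after each has been regressed on Z, which gives the same slope for X as a regression of Y on both X and Z.

```python
import random

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, z):
    """Residuals of y after a simple regression on z (with intercept)."""
    b = slope(z, y)
    mz, my = sum(z) / len(z), sum(y) / len(y)
    return [yi - (my + b * (zi - mz)) for yi, zi in zip(y, z)]

random.seed(6)
n = 5000  # assumed sample size

# Fork: X <- Z -> Y, with NO arrow from X to Y (assumed unit effects)
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

marginal = slope(x, y)                                  # ignores Z
conditional = slope(residuals(x, z), residuals(y, z))   # conditions on Z

print(f"slope of Y on X, ignoring Z:       {marginal:.2f}")
print(f"slope of Y on X, conditional on Z: {conditional:.2f}")
```

Under these assumptions the marginal slope should land near 0.5 even though X has no effect on Y, and the conditional slope should land near zero, which is the X _||_ Y | Z statement in numeric form.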
Conditioning on Z removes the dependency between X and Y: X _||_ Y | Z
X → Z → Y: Z mediates the association between X and Y. The data do not distinguish this from the fork: X _||_ Y | Z in both.
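The same conditional independence can be illustrated for the pipe with a partial correlation. This is a hedged sketch: the unit coefficients, the noise scale, and the helper names `pearson` and `partial_corr` are assumptions, and the first-order partial correlation formula is a standard identity rather than anything specific to the lecture.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

random.seed(4)
n = 5000  # assumed sample size

# Pipe: X -> Z -> Y (assumed unit effects); X reaches Y only through Z
x = [random.gauss(0, 1) for _ in range(n)]
z = [xi + random.gauss(0, 1) for xi in x]
y = [zi + random.gauss(0, 1) for zi in z]

r_marginal = pearson(x, y)
r_given_z = partial_corr(x, y, z)

print(f"cor(X, Y):     {r_marginal:.2f}")
print(f"cor(X, Y | Z): {r_given_z:.2f}")
```

The pattern (clear marginal association, near-zero association given Z) is the same one a fork produces, which is why the data alone cannot tell the two graphs apart.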
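• Example: Influence of grandparents (G) and parents (P) on education of children (C)
[DAG on slide, nodes: G, P, C, U. Slide annotations: U is an unobserved variable; P is a collider!]
[Book excerpt on slide, cropped, from the chapter "The Haunted DAG & the Causal Terror": "[…] is a common consequence of G and U, so if we condition on P, it will bia[s] […]"]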
[Figure 6.6: plot with axes grandparent education (G) and grandchild education (C); points distinguished by good neighborhoods vs. bad neighborhoods, with parents in the 45th to 60th centiles highlighted. Caption, cropped on slide: "Unobserved confounds and collider bias. In this example, grandparents influence grandkids only indirectly, through parents. However, unobserved neighborhood effects on parents and their children create the illusion that grandparents harm their grandkids' education. Parental education is a collider: once we condition on it, grandparental education becomes negatively associated with grandchild education."]
[Further book text, cropped: "[…] we leave the scale alone, we should be able to recover something close to those […] apologize for using vague priors here, just to push forward in the example."]
Consider those P in the 45th to 60th centile of education. P in bad neighborhoods must have had more educated G. P in good neighborhoods must have had less educated G. Otherwise they wouldn't all be in the same quantile.
• The back-door criterion: Confounding is caused by the existence of open back-door paths from X to Y • If you know your elements, you know how to open/close each of them
[Book excerpt on slide, cropped: "But what exactly is confounding? And which principles explain why som[…] variables and sometimes adding them can produce the same phenomeno[n] […] causal monsters lurking out there, haunting our graphs? We require som[…] Confounding is any context in which the association between an outcom[e] […] of interest X is not the same as it would be, if we had experimentally d[…] of X. For example, in the previous example, the association between […] is confounded by the unobserved variable U. If we had assigned educ[…] we'd get a different estimate for the association. Directly manipulat[…] the graph on the left into the graph on the right: [two DAGs over E, U, W] […] does it do this? In the graph on the left, there are two paths connect[…]"]
• The Fork (X ← Z → Y): Open unless you condition on Z
• The Pipe (X → Z → Y): Open unless you condition on Z
• The Collider (X → Z ← Y): Closed until you condition on Z
• The Descendant (A is a descendant of Z): Conditioning on A is like conditioning on Z
[Book excerpt on slide, cropped: "[…] the graph on the left into the graph on the right: [two DAGs over E, U, W] […] does it do this? In the graph on the left, there are two paths connec[ting] […] E → W and E ← U → W. A 'path' here just means any series […] walk through to get from one variable to another, ignoring the direction[s] […] [ma]nipulation removes the influence of U on E. This then stops information […] [bet]ween E and W through U. It blocks the second path. Once the path is bl[ocked] […] one way for information to go between E and W, and then measuring […] [bet]ween E and W could yield a useful measure of causal influence. Manipu[lation] […] confounding, because it blocks the other path between E and W. Now consider that there are statistical ways to achieve the same result, w[ithout] [ma]nipulating E. How? The most obvious is to add U to the model, to condit[ion] […] this also remove the confounding? Because it also blocks the flow of i[nformation] […]"]
Two paths from E to W: (1) E → W (2) E ← U → W. Close the 2nd path by conditioning on U, which closes the fork at U.
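The equivalence between manipulating E and conditioning on U can be sketched with simulated data. Everything numeric here is an assumption made for illustration: a binary U, a true E → W effect of 0.5, unit noise, and the helper names `slope` and `stratum_slope`. Conditioning on U is done by stratifying on it and averaging the within-stratum slopes, which is only reasonable here because the two strata are about equally large.

```python
import random

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

random.seed(9)
n = 5000
TRUE_EFFECT = 0.5  # assumed causal effect of E on W

# Observational world: E <- U -> W plus E -> W
u = [random.choice([-1, 1]) for _ in range(n)]
e_obs = [ui + random.gauss(0, 1) for ui in u]
w_obs = [TRUE_EFFECT * ei + ui + random.gauss(0, 1) for ei, ui in zip(e_obs, u)]

# Experimental world: E is assigned at random, so the arrow U -> E is removed
e_do = [random.gauss(0, 1) for _ in range(n)]
w_do = [TRUE_EFFECT * ei + ui + random.gauss(0, 1) for ei, ui in zip(e_do, u)]

def stratum_slope(level):
    """Slope of W on E among observational units with U == level."""
    idx = [i for i in range(n) if u[i] == level]
    return slope([e_obs[i] for i in idx], [w_obs[i] for i in idx])

naive = slope(e_obs, w_obs)                             # both paths open
experimental = slope(e_do, w_do)                        # back door removed by design
stratified = (stratum_slope(-1) + stratum_slope(1)) / 2 # back door closed by conditioning

print(f"naive: {naive:.2f}  experimental: {experimental:.2f}  stratified on U: {stratified:.2f}")
```

In practice U is often unobserved, in which case the stratified estimate is not available; the sketch only shows why it would work if U could be measured.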
[Book excerpt on slide, cropped: "[…] if we condition on P, it will bias inference […] never get to measure U. I don't expect that fact to be immediately […] [thr]ough a quantitative example. […] triads of grandparents, parents, and children. This simulation […] [p]roject our DAG as a series of implied functional relationships. The […] [functi]on of G and U […] [functi]on of G, P, and U […] functions of any other known variables."]
3 paths from G to C: (1) G → C (2) G → P → C (3) G → P ← U → C
Condition on P: Closes (2) but opens (3)
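A quantitative version of this example can be sketched as a simulation of grandparent, parent, and child triads. The coefficient values below (no direct effect of G on C, unit effects along G → P and P → C, a neighborhood effect of 2) and the sample size are assumptions for illustration, as is the small `ols` helper, which solves the normal equations by Gauss-Jordan elimination so the example needs no external libraries.

```python
import random

def ols(columns, y):
    """OLS coefficients of y on the predictor columns (intercept first)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations: [X'X | X'y]
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]

random.seed(10)
n = 2000
# ASSUMED effect sizes, chosen for illustration
b_GP, b_GC, b_PC, b_U = 1.0, 0.0, 1.0, 2.0

U = [random.choice([-1, 1]) for _ in range(n)]   # bad vs. good neighborhood
G = [random.gauss(0, 1) for _ in range(n)]
P = [b_GP * g + b_U * u + random.gauss(0, 1) for g, u in zip(G, U)]
C = [b_PC * p + b_GC * g + b_U * u + random.gauss(0, 1)
     for p, g, u in zip(P, G, U)]

# Conditioning on the collider P opens path (3): G looks harmful
_, bP_biased, bG_biased = ols([P, G], C)
# Also conditioning on U (if it could be measured) closes path (3) again
_, bP_adj, bG_adj, bU_adj = ols([P, G, U], C)

print(f"C ~ P + G:     effect of G = {bG_biased:.2f}")
print(f"C ~ P + G + U: effect of G = {bG_adj:.2f}")
```

With these assumed values the coefficient on G is clearly negative when only P is included, even though the simulated direct effect of G on C is exactly zero, and it returns to roughly zero once U is added.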
condition on to infer X → Y? • Procedure: (1) Find all paths. (2) Open/close as necessary.
[DAG on slide, nodes: A, B, C, U, X, Y]
[Book excerpt on slide, cropped: "[…] the path from X to Y. The same holds for colliders. If you con[dition] […] [d]escendent of a collider, it'll still be like weakly conditioning on a collider. […] how complicated a causal DAG appears, it is always built out of these four […] And since you know how to open and close each, you or your computer […] out which variables you need to control, or not, in order to shut the backdoor […] some examples. […] The DAG below contains an exposure of interest X, an outcome of interest […] [unob]served variable U, and three observed covariates (A, B, and C)."]
condition on to infer X → Y? • Condition on A or C. Do not condition on B.
[DAG on slide, nodes: A, B, C, U, X, Y]
[Book excerpt on slide, cropped: "[…] We are interested in the blue path, the causal effect of X on Y. Which of the observed covari[ates] do we need to add to the model […] backdoor paths. Aside from the direc[t] […]"]
Back-door paths:
(1) X ← U ← A → C → Y: This path is open.
(2) X ← U → B ← C → Y: This path is closed.
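The two-step procedure (find all paths, then open or close them) can be sketched in a few functions. The edge list is read off the two back-door paths and the direct path listed above; the function names are hypothetical, and this is a plain path-by-path check rather than an optimized algorithm. U is treated as unobserved, so only A, B, and C are tried as conditioning variables.

```python
# Edges (cause, effect) implied by X -> Y and the two listed back-door paths
EDGES = [("U", "X"), ("A", "U"), ("A", "C"), ("C", "Y"),
         ("U", "B"), ("C", "B"), ("X", "Y")]

def descendants(node, edges):
    """All nodes reachable from `node` by following arrows forward."""
    out, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if a == cur and b not in out:
                out.add(b)
                stack.append(b)
    return out

def all_paths(start, end, edges, path=None):
    """Every simple path from start to end, ignoring arrow direction."""
    path = path or [start]
    if path[-1] == end:
        yield path
        return
    for a, b in edges:
        for here, nxt in ((a, b), (b, a)):
            if here == path[-1] and nxt not in path:
                yield from all_paths(start, end, edges, path + [nxt])

def is_open(path, edges, given):
    """Apply the fork/pipe/collider/descendant rules to each interior node."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edges and (nxt, mid) in edges
        if collider:
            # Closed until we condition on the collider or a descendant of it
            if mid not in given and not (descendants(mid, edges) & given):
                return False
        elif mid in given:
            return False  # fork or pipe: conditioning closes it
    return True

def open_backdoors(given, x="X", y="Y", edges=EDGES):
    """Open paths from x to y that begin with an arrow pointing INTO x."""
    given = set(given)
    return [p for p in all_paths(x, y, edges)
            if (p[1], p[0]) in edges and is_open(p, edges, given)]

for adj in ([], ["A"], ["C"], ["B"]):
    print(adj, "->", [" - ".join(p) for p in open_backdoors(adj)])
```

With nothing conditioned on, only the path through A and C is open; conditioning on A or on C closes it, while conditioning on B opens the collider path, which matches the slide's advice.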
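to infer W → D?
[DAG on slide, from the book section "Confronting confounding", nodes: S (South), W (Waffle Houses), M (Marriage), D (Divorce), A (Age at marriage)]
[Book excerpt on slide, cropped: "[…] S is whether or not a State is in the southern United States, A is […]"]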
[Same DAG as the previous slide: A, D, M, S, W]
[Book excerpt on slide, cropped: "[…] or not a State is in the southern United States, A is median age […] conditional independencies, pairs of variables that are not associated once we condition on some set of other variables. By listing these implied conditional independencies, and assessing each, we can at least test some of the features of a graph. You can find conditional independencies using the same path logic you learned for finding and closing backdoors. You just have to focus on a pair of variables, find all paths connecting them, and figure out if there is any set of variables you could condition on to close them all. In a large graph, this is quite a chore, because there are many pairs of variables and possibly many paths. But your computer is good at such chores. In this case, there are three implied conditional independencies:"]
R code: impliedConditionalIndependencies( dag_6.2 )
A _||_ W | S
D _||_ S | A, M, W
M _||_ W | S
(1) A and W independent, conditioning on S (2) D and S independent, conditioning on A, M, & W (3) M and W independent, conditioning on S
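The three implied conditional independencies can be checked by hand with a short d-separation routine. This is a sketch under a stated assumption: the slide shows only the nodes, so the edge list below (S → A, S → M, S → W, A → M, A → D, M → D, W → D) is an assumed reading of the DAG, and the function names are hypothetical. The check uses the ancestral moral graph method: keep only the ancestors of the variables involved, connect parents that share a child, drop arrow directions, delete the conditioned variables, and see whether the pair is still connected.

```python
from itertools import combinations

# ASSUMED edges (cause, effect) for the Waffle House DAG
EDGES = [("S", "A"), ("S", "M"), ("S", "W"),
         ("A", "M"), ("A", "D"), ("M", "D"), ("W", "D")]

def ancestors(nodes, edges):
    """The given nodes plus everything with a directed path into them."""
    found, stack = set(nodes), list(nodes)
    while stack:
        cur = stack.pop()
        for parent, child in edges:
            if child == cur and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def d_separated(x, y, given, edges=EDGES):
    """True if x and y are d-separated by the set `given`."""
    given = set(given)
    keep = ancestors({x, y} | given, edges)
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    # Moralize: link every pair of parents sharing a child, drop directions
    links = {frozenset(e) for e in sub}
    for child in keep:
        parents = [a for a, b in sub if b == child]
        links |= {frozenset(pair) for pair in combinations(parents, 2)}
    # Remove conditioned nodes, then search for any route from x to y
    links = {l for l in links if not (l & given)}
    seen, stack = {x}, [x]
    while stack:
        cur = stack.pop()
        for l in links:
            if cur in l:
                (other,) = l - {cur}
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return y not in seen

checks = [("A", "W", ["S"]), ("D", "S", ["A", "M", "W"]), ("M", "W", ["S"])]
for x, y, given in checks:
    print(f"{x} _||_ {y} | {', '.join(given)}: {d_separated(x, y, given)}")
```

Unlike the R call on the slide, this only verifies independencies that are handed to it; it does not search the graph to list them.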
cause • Experiments not required! • Experiments not always practical & ethical • Disease, evolution, development, dynamics of popular music, global climate, war • Experiments must choose an intervention • Interventions influence many variables at once • Experimentally manipulate obesity?
small world constructs • Residual confounding: • Misclassification • Measurement error • Missingness • DAGs can accommodate these problems, but maybe tell us there are no solutions • Eventually need *real* models of the system