L06 Statistical Rethinking Winter 2019

Lecture 06 of the Dec 2018 through March 2019 edition of Statistical Rethinking. Covers multiple regression and basic causal inference (back-door criterion).

Richard McElreath

January 11, 2019

Transcript

  1. The Haunted DAG & The Causal Terror Statistical Rethinking Winter

    2019 Lecture 06 / Week 3
  2. Index variable

    … simple … these priors will wash out very quickly … in general, we should be care… actually more unsure about male height than female height, a priori? Is there another way?

    Another approach available to us, using the same information, is to use an index variable instead. An index variable contains integers that correspond to different categories. The integers are just names, but they also let us reference a list of corresponding parameters, one for each category. In this case, we can construct our index like this:

    R code:
        d$sex <- ifelse( d$male==1 , 2 , 1 )
        str( d$sex )
         num [1:544] 2 1 1 2 1 2 1 2 1 2 ...

    Now "1" means female and "2" means male. No order is implied. These are just labels. And the mathematical version of the model becomes:

        h_i \sim \mathrm{Normal}(\mu_i, \sigma)
        \mu_i = \alpha_{\mathrm{sex}[i]}
        \alpha_j \sim \mathrm{Normal}(178, 20), \quad j = 1, 2
        \sigma \sim \mathrm{Uniform}(0, 50)

    What this does is create a list of α parameters, one for each unique value in the index variable. So in this case, we end up with two α parameters, named α_1 and α_2. The numbers correspond to the values in the index variable sex. I know this seems overly complicated, but it solves our problem with the priors. Now the same prior can be assigned to each, corresponding to the notion that all the categories are the same, prior to the data. Neither categ…
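The index-variable idea can be sketched outside R as well. The following is a minimal Python illustration, not the lecture's code: the six `male`/`height` values are made up for the example, and the per-category sample mean stands in for the α parameters that the lecture estimates with a Bayesian model.

```python
# Hypothetical toy data (NOT the data set used in the lecture).
male = [1, 0, 0, 1, 0, 1]                         # indicator: 1 = male, 0 = female
height = [160.0, 150.0, 148.0, 165.0, 152.0, 158.0]

# Index variable: 1 = female, 2 = male. The integers are labels, not an ordering.
sex = [2 if m == 1 else 1 for m in male]

def category_means(values, index):
    """One 'alpha' per category; here simply the sample mean of each group."""
    groups = {}
    for v, j in zip(values, index):
        groups.setdefault(j, []).append(v)
    return {j: sum(vs) / len(vs) for j, vs in sorted(groups.items())}

alpha = category_means(height, sex)

# mu_i = alpha[sex[i]]: each observation looks up the parameter for its category.
mu = [alpha[j] for j in sex]
print(sex, alpha)
```

The lookup `alpha[sex[i]]` is the whole trick: adding a third category only adds one more entry to `alpha`, with no new indicator columns.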
  3. Index variable

    R code:
        m5.8 <- quap(
            alist(
                height ~ dnorm( mu , sigma ) ,
                mu <- a[sex] ,
                a[sex] ~ dnorm( 178 , 20 ) ,
                sigma ~ dunif( 0 , 50 )
            ) , data=d )
        precis( m5.8 , depth=2 )

                mean   sd   5.5%  94.5%
        a[1]  134.91 1.61 132.34 137.48
        a[2]  142.58 1.70 139.86 145.29
        sigma  27.31 0.83  25.98  28.63

    Note the depth=2 that I added to precis. This tells it to show any vector parameters, like our new a vector. Vector and matrix parameters are hidden by precis by default, because sometimes there are lots of these and you don't want to inspect their individual values. You'll see what I mean in later chapters.

    Interpreting these parameters is easy enough: they are the expected heights in each category. But often we are interested in differences between categories. In this case, what is the expected difference between females and males? We can compute this using samples from the posterior. In fact, I'll extract posterior samples into a data frame and insert our …
  4. Differences

    … But often we are interested in differences between categories. In this case, what is the expected difference between females and males? We can compute this using samples from the posterior. In fact, I'll extract posterior samples into a data frame and insert our calculation directly into the same frame:

    R code:
        post <- extract.samples(m5.8)
        post$diff_fm <- post$a[,1] - post$a[,2]
        precis( post , depth=2 )

        quap posterior: 10000 samples from m5.8
                   mean   sd   5.5%  94.5%
        sigma     27.29 0.84  25.95  28.63
        a[1]     134.91 1.59 132.37 137.42
        a[2]     142.60 1.71 139.90 145.35
        diff_fm   -7.70 2.33 -11.41  -3.97

    Our calculation appears at the bottom, as if it were a new parameter in the posterior. This is the expected difference between a female and male in the sample. This kind of calculation is called a contrast. No matter how many categories you have, you can compute the contrast between any two by using samples from the posterior to compute their difference. Then you get the posterior distribution of the difference.

    Many categories. Binary categories are easy, whether you use an indicator variable or instead an index variable. But when there are more than two categories, the indicator variable approach explodes. You'll need a new indicator variable for each new category. If you have k unique categories, you need k − 1 indicator variables. Automated tools like R's lm do in fact go this route, constructing k − 1 indicator variables for you and returning k − 1 …
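A contrast is just arithmetic on posterior samples. The sketch below is in Python rather than the lecture's R, and it makes a simplifying assumption that should be stated loudly: it draws a[1] and a[2] as independent normals using the means and standard deviations from the precis table on the previous slide, instead of sampling a fitted model's joint posterior as extract.samples does.

```python
import random
import statistics

rng = random.Random(1)
n = 10_000

# ASSUMPTION: independent normal approximations to the two marginal posteriors,
# with mean/sd copied from the slide's precis output (not a real model fit).
a_female = [rng.gauss(134.91, 1.61) for _ in range(n)]
a_male = [rng.gauss(142.58, 1.70) for _ in range(n)]

# The contrast: subtract sample by sample, then summarize the differences.
diff_fm = [f - m for f, m in zip(a_female, a_male)]

mean_diff = statistics.fmean(diff_fm)
sd_diff = statistics.stdev(diff_fm)
ordered = sorted(diff_fm)
lower, upper = ordered[int(0.055 * n)], ordered[int(0.945 * n)]  # ~89% interval

print(round(mean_diff, 2), round(sd_diff, 2), round(lower, 2), round(upper, 2))
```

The point of working sample by sample is that the result is a full distribution for the difference, so its uncertainty comes along for free; summarizing each parameter first and subtracting the summaries would lose that.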
  5. data(Hurricanes)

  6. Why aren’t surprising things true? Dr Felisa Wolfe-Simon at Mono

    Lake
  7. Selection-distortion effect

    [Figure 6.1: scatterplot of trustworthiness (vertical axis) against newsworthiness (horizontal axis), with selected and rejected proposals marked. The caption is cropped; the legible fragments mention no correlation before selection and strongly related criteria after selection.]
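The selection-distortion effect is easy to reproduce. This is a hedged Python sketch, not the lecture's code: the sample size (200 proposals), the selection rate (top 10% by combined score), and the seed are my assumptions, since the slide's caption is cropped.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1914)
N, p = 200, 0.1                                   # ASSUMED: 200 proposals, top 10% selected

newsworthiness = [rng.gauss(0, 1) for _ in range(N)]
trustworthiness = [rng.gauss(0, 1) for _ in range(N)]   # independent by construction
score = [nw + tw for nw, tw in zip(newsworthiness, trustworthiness)]

n_selected = round(N * p)
cutoff = sorted(score, reverse=True)[n_selected - 1]
selected = [(nw, tw) for nw, tw, s in zip(newsworthiness, trustworthiness, score)
            if s >= cutoff]

r_all = pearson(newsworthiness, trustworthiness)
r_selected = pearson([nw for nw, _ in selected], [tw for _, tw in selected])
print(round(r_all, 2), round(r_selected, 2))
```

Before selection the two scores are uncorrelated; among the selected proposals they are negatively correlated, because a proposal that got in with low newsworthiness must have had high trustworthiness, and vice versa.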
  8. Regression as a wicked oracle • Regression automatically focuses on

    the most informative cases • Cases that don’t help are automatically ignored • But not kind — ask carefully
  9. Why not just add everything? • Could just add all

    available predictors to model • “We controlled for...” • Almost always a bad idea • Adding variables creates confounds • Residual confounding • Overfitting
  10. Ye Olde Causal Alchemy: The Four Elemental Confounds

    The Pipe: X → Z → Y. The Fork: X ← Z → Y. The Collider: X → Z ← Y. The Descendant: X, Z, Y, with A a descendant of Z.
  11. The Confounding Fork

    X ← Z → Y

    … could still be associated with D, entirely through the indirect path. That … known as mediation, and we'll have an example later. … the indirect path actually does no work. How can we show that? We … marriage rate is positively associated with divorce rate. But that isn't … the path M → D is positive. It could be that the association between … from A's influence on both M and D. Like this:

    [DAG: A → M, A → D]

    … consistent with the inferences from models m5.1 and m5.2. So which is it? … marriage rate, or rather is age at marriage just driving both, creating … between marriage rate and divorce rate?

    A: Age at marriage. M: Marriage rate. D: Divorce rate.

    Z is a common cause of X and Y. DE-confounding! Conditioning on Z removes dependency between X and Y: X _||_ Y | Z
  12. The Perplexing Pipe X causes Z causes Y conditioning on

    Z removes dependency between X and Y: X _||_ Y | Z X Z Y Z mediates association between X and Y data do not distinguish from fork! X _||_ Y | Z in both
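The claim that the data cannot tell a pipe from a fork can be checked by simulation. The sketch below is Python, not the lecture's R; the binary X and Z, the probabilities 0.2 and 0.8, and the effect size 2 are arbitrary choices of mine, picked so that both structures imply the same joint pattern.

```python
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def bern(rng, p):
    return int(rng.random() < p)

def simulate(kind, n, rng):
    """Draw (x, z, y) triples from a fork (X <- Z -> Y) or a pipe (X -> Z -> Y)."""
    rows = []
    for _ in range(n):
        if kind == "fork":
            z = bern(rng, 0.5)
            x = bern(rng, 0.2 + 0.6 * z)
        else:  # pipe
            x = bern(rng, 0.5)
            z = bern(rng, 0.2 + 0.6 * x)
        y = rng.gauss(2 * z, 1)        # in both structures Y depends only on Z
        rows.append((x, z, y))
    return rows

def summarize(rows):
    """Correlation of X and Y overall, and within each stratum of Z."""
    marginal = pearson([x for x, _, _ in rows], [y for _, _, y in rows])
    within = {}
    for level in (0, 1):
        sub = [(x, y) for x, z, y in rows if z == level]
        within[level] = pearson([x for x, _ in sub], [y for _, y in sub])
    return marginal, within

rng = random.Random(11)
results = {kind: summarize(simulate(kind, 4000, rng)) for kind in ("fork", "pipe")}
for kind, (marginal, within) in results.items():
    print(kind, round(marginal, 2), {k: round(v, 2) for k, v in within.items()})
```

In both cases X and Y are associated overall and the association vanishes within each level of Z, which is exactly X _||_ Y | Z. Only outside knowledge about the direction of the arrows separates the two stories.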
  13. Post-treatment bias • The pipe confounds when we ignore it

    • Post-treatment bias: Controlling for consequence of treatment statistically knocks out treatment y x1 x2 Treatment Consequence of treatment Outcome
  14. Post-treatment bias

    R code (partly cropped):
        … -> H1 … -> …
        coordinates( plant_dag ) <- list( x=c(H0=0,T=2,F=1.5,H1=1) ,
                                          y=c(H0=0,T=0,F=1,H1=2) )
        …( plant_dag )

    [DAG: H0 → H1 ← F ← T]

    … treatment T influences the presence of fungus F, which influences plant hei… Plant height at time 1 is also influenced by plant height at time 0, H0. When we include F, the post-treatment effect, in the model, we end up bl… from the treatment to the outcome. This is the DAG way of saying that le… treatment tells us nothing about the outcome, once we know the fungus status.

    T: Anti-fungal Treatment. F: Fungus. H1: Outcome. H0: Initial height. (Section 6.2)
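To see post-treatment bias in numbers, one can simulate the plant experiment implied by this DAG. The sketch is Python, and every numeric value in it (initial height, fungus probabilities, growth penalty) is an assumption of mine for illustration; the slide gives only the graph. Stratifying by fungus stands in for adding fungus to a regression.

```python
import random
from statistics import fmean

rng = random.Random(71)
n = 2000

plants = []
for i in range(n):
    h0 = rng.gauss(10, 2)                                  # ASSUMED initial height
    treatment = i % 2                                      # alternate control / treated
    fungus = int(rng.random() < 0.5 - 0.4 * treatment)     # treatment reduces fungus
    h1 = h0 + rng.gauss(5 - 3 * fungus, 1)                 # fungus reduces growth
    plants.append((treatment, fungus, h1 - h0))

def treatment_effect(rows):
    """Difference in mean growth between treated and control plants."""
    treated = [g for t, _, g in rows if t == 1]
    control = [g for t, _, g in rows if t == 0]
    return fmean(treated) - fmean(control)

total_effect = treatment_effect(plants)
effect_given_fungus = {f: treatment_effect([r for r in plants if r[1] == f])
                       for f in (0, 1)}
print(round(total_effect, 2), {f: round(e, 2) for f, e in effect_given_fungus.items()})
```

The treatment clearly helps overall, yet within each fungus stratum its apparent effect is close to zero: fungus is the pipe through which the treatment works, so conditioning on it knocks the treatment out.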
  15. Post-treatment bias Observational studies harder income status job “Treatment” education

    Controlling for every available variable likely to block a pipe someplace.
  16. The Explosive Collider X and Y jointly cause Z conditioning

    on Z creates dependency between X and Y X Z Y X and Y independent learning X and Z reveals Y
  17. Conditioning on a Collider switch Light electricity

  18. Conditioning on a Collider switch Light electricity ON OFF ?

  19. Conditioning on a Collider switch Light electricity ON ON ?
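The light-switch picture can be turned into a tiny simulation. This Python sketch assumes, purely for illustration, that the switch and the electricity are each on with probability 0.5, independently, and that the light is on only when both are.

```python
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(3)
n = 4000
switch = [int(rng.random() < 0.5) for _ in range(n)]
electricity = [int(rng.random() < 0.5) for _ in range(n)]
light = [s & e for s, e in zip(switch, electricity)]       # the collider

r_all = pearson(switch, electricity)

# Condition on the collider: keep only the cases where the light is OFF.
off = [(s, e) for s, e, l in zip(switch, electricity, light) if l == 0]
r_light_off = pearson([s for s, _ in off], [e for _, e in off])

# With the light off and the switch on, the electricity must be off.
forced = all(e == 0 for s, e in off if s == 1)
print(round(r_all, 2), round(r_light_off, 2), forced)
```

Switch and electricity are unrelated until we look only at cases with a known light state; then learning one tells us about the other.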

  20. Conditioning on a Collider newsworthy published trustworthy

  21. Conditioning on a Collider newsworthy published trustworthy in Nature Not

    ?
  22. Are taller people better at basketball? 473 NBA players, 2016-2017

    season
  23. marry happy Collider confounding Conditioning on collider is like selecting

    on sub-population. Are older people less happy? Should we control for marriage status? age + + ?
  24. Collider simulation

    • Assumptions:
    • 20 people born each year
    • Uniform happiness at birth, never changes
    • At 18 years old, eligible to marry. Probability of marriage in each year proportional to happiness.
    • Married people remain married until death.
    • At age 65, move to south coast of Spain.

    … statistical association between age and happiness. And this can mislead us to think that happi… changes with age, when in fact it is constant.

    To convince you of this, let's do another simulation. Simulations are useful in these examples, because these are the only times when we know the true causal model. If a proce… cannot figure out the truth in a simulated example, we shouldn't trust it in a real one. W… going to do a fancier simulation this time, using an agent-based model of aging and marr… to produce a simulated data set to use in a regression. Here is the simulation design:

    (1) Each year, 20 people are born with uniformly distributed happiness values.
    (2) Each year, each person ages one year. Happiness does not change.
    (3) At age 18, individuals can become married. The odds of marriage each year … proportional to an individual's happiness.
    (4) Once married, an individual remains married.
    (5) After age 65, individuals leave the sample. (They move to Spain.)

    I've written this algorithm into the rethinking package. You can run it out for 1000 y… and collect the resulting data:

    R code:
        library(rethinking)
        d <- sim_happiness( seed=1977 , N_years=1000 )
        precis(d)

        'data.frame': 1300 obs. of 3 variables:
                   mean    sd  5.5% 94.5%
        age        33.0 18.77  4.00 62.00
        married     0.3  0.46  0.00  1.00
        happiness   0.0  1.21 -1.79  1.79

    These data comprise 1300 people of all ages from birth to 65 years old. The variables co…
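The simulation design above is simple enough to re-implement. What follows is a Python sketch under stated assumptions, not the rethinking package's sim_happiness: the evenly spaced happiness values from -2 to 2 and the yearly marriage probability logistic(happiness - 4) are my choices for "probability proportional to happiness". Conditioning on marriage is done by centering age and happiness within each marital group before pooling, which gives the same slope as a regression with one intercept per group.

```python
import math
import random

def sim_happiness(n_years=1000, n_births=20, age_marry=18, max_age=65, seed=1977):
    """Agent-based sketch: returns a list of [age, married, happiness] records."""
    rng = random.Random(seed)
    birth_h = [-2 + 4 * i / (n_births - 1) for i in range(n_births)]  # ASSUMED spacing
    people = []
    for _ in range(n_years):
        for p in people:
            p[0] += 1                                    # everyone ages one year
        people = [p for p in people if p[0] <= max_age]  # the rest move to Spain
        for p in people:
            if p[0] >= age_marry and not p[1]:
                # ASSUMED link: happier people are more likely to marry each year
                if rng.random() < 1 / (1 + math.exp(-(p[2] - 4))):
                    p[1] = True
        people.extend([0, False, h] for h in birth_h)    # happiness never changes
    return people

def slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def centered(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

adults = [p for p in sim_happiness() if p[0] >= 18]

# Ignoring marriage: happiness is unrelated to age by construction.
slope_ignoring_marriage = slope([p[0] for p in adults], [p[2] for p in adults])

# Conditioning on marriage: centre within each marital group, then pool.
age_c, hap_c = [], []
for status in (False, True):
    group = [p for p in adults if p[1] == status]
    age_c += centered([p[0] for p in group])
    hap_c += centered([p[2] for p in group])
slope_given_marriage = slope(age_c, hap_c)

print(slope_ignoring_marriage, slope_given_marriage)
```

Age has no effect on happiness in this world, yet once marriage status is held fixed the slope turns negative. Marriage is a common consequence of age and happiness, so conditioning on it manufactures the association.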
  25. Collider of sorrow

    … a very strong prior, but again, it at least helps bound inference to realis… the intercepts. Each α is the value of µ_i when A_i = 0. In this case, that … we need to allow α to cover the full range of happiness scores. Normal(0, 1) … the mass in the −2 to +2 interval.

    Finally, let's approximate the posterior. We need to construct the m… variable, as well. I'll do that, and then immediate… present the quap code.

    R code:
        d2$mid <- d2$married + 1
        m6.9 <- quap(
            alist(
                happiness ~ dnorm( mu , sigma ),
                mu <- a[mid] + bA*A,
                a[mid] ~ dnorm( 0 , 1 ),
                bA ~ dnorm( 0 , 2 ),
                sigma ~ dexp(1)
            ) , data=d2 )
        precis(m6.9,depth=2)

               mean   sd  5.5% 94.5%
        a[1]  -0.23 0.06 -0.34 -0.13
        a[2]   1.26 0.08  1.12  1.40
        bA    -0.75 0.11 -0.93 -0.57
        sigma  0.99 0.02  0.95  1.03

    The model is quite sure that age is negatively associated with happiness. W… the inferences from this model to a model that omits marriage status. H…

    Index labels: a[1] = single, a[2] = married.
  26. Collider bias

    [Figure 6.5: happiness (vertical axis, −2 to 2) against age (horizontal axis, 0 to 60), married and unmarried individuals marked.] Simulated data, assuming that happiness is uniformly distributed and never changes. Each point is a person. Married individuals are shown with filled blue points. At each age after 18, the happiest individ…
  27. Collider bias

    [Figure 6.5, shown again as on the previous slide.]
  28. marry happy Collider confounding Are older people less happy? Controlling

    for marriage status creates a confound. Cannot know whether to control for some variable, without a causal model. age + + ?
  29. The Haunted DAG

    • Unmeasured variables can also create colliders
    • Example: Influence of grandparents (G) and parents (P) on education of children (C)

    … haunted DAG. Collider bias arises from conditioning on a common conse… previous example. If we can just get our graph sorted, we can avoid it. But … to see a potential collider, because there may be unmeasured causes. U… can still induce collider bias. So I'm sorry to say that we also have to co… that our DAG may be haunted.

    … for example that we are interested in inferring the direct influence of bo… grandparents (G) on the educational achievement of children (C). Since … presumably influence their own children's education, there is an arrow G … pretty easy, so far. It's just like our divorce rate example from last chapter:

    [DAG nodes: C, G, P, U]
  30. The Haunted DAG

    • Unmeasured variables can also create colliders
    • Example: Influence of grandparents (G) and parents (P) on education of children (C)

    [DAG nodes: C, G, P, U. Annotations: U is the unobserved variable; P is the collider!]

    … is a common consequence of G and U, so if we condition on P, it will bia…
  31. Simulated haunting

    [DAG nodes: C, G, P, U]

    Now P is a common consequence of G and U, so if we condition … about G → C, even if we never get to measure U. I don't expect th… obvious. So let's crawl through a quantitative example.

    First, let's simulate 200 triads of grandparents, parents, and c… will be simple. We'll just project our DAG as a series of implied fun… DAG above implies that:

    (1) P is some function of G and U
    (2) C is some function of G, P, and U
    (3) G and U are not functions of any other known variables

    We can make these implications into a simple simulation, using rnorm to generate simulated observations. But to do this, we need to be a bit more precise than "some function of." So I'll invent some strength of association:

    R code:
        N <- 200 # number of grandparent-parent-child triads
        b_GP <- 1 # direct effect of G on P
        b_GC <- 0 # direct effect of G on C
        b_PC <- 1 # direct effect of P on C
        b_U <- 2 # direct effect of U on P and C

    These parameters are like slopes in a regression model. Notice that I've assumed that grandparents G have zero effect on their grandkids C. The example doesn't depend upon that effect being exactly zero, but it will make the lesson clearer. Now we use these slopes to draw random observations:

    R code:
        set.seed(1)
        U <- 2*rbern( N , 0.5 ) - 1
        G <- rnorm( N )
        P <- rnorm( N , b_GP*G + b_U*U )
        C <- rnorm( N , b_PC*P + b_GC*G + b_U*U )
        d <- data.frame( C=C , P=P , G=G , U=U )

    I've made the neighborhood effect, U, binary. This will make the example easier to understand. But the example doesn't depend upon that assumption. The other lines are just linear models embedded in rnorm.

    Now what happens when we try to infer the influence of grandparents? Since some of the total effect of grandparents passes through parents, we realize we need to control for parents. Here is a simple regression of C on P and G. Normally I would advise standardizing the variables, because it makes establishing sensible priors a lot easier. But I'm going to keep …
  32. Simulated haunting

    • Conditioning on parents distorts inference about grandparents
    • Reason: Opens a “backdoor” through U to C

    [Figure, cropped: grandchild education against grandparent education (G); bad neighborhoods marked.] … effects on parents and their children create the illusion that grandparents harm their grandkids' education. Parental education is a collider: once we condition on it, grandparental education becomes negatively associated with grandchild education.

    … we leave the scale alone, we should be able to recover something close to those values. So I apologize for using vague priors here, just to push forward in the example.

    R code:
        m6.11 <- quap(
            alist(
                C ~ dnorm( mu , sigma ),
                mu <- a + b_PC*P + b_GC*G,
                a ~ dnorm( 0 , 1 ),
                c(b_PC,b_GC) ~ dnorm( 0 , 1 ),
                sigma ~ dexp( 1 )
            ), data=d )
        precis(m6.11)

               mean   sd  5.5% 94.5%
        a     -0.12 0.10 -0.28  0.04
        b_PC   1.79 0.04  1.72  1.86
        b_GC  -0.84 0.11 -1.01 -0.67
        sigma  1.41 0.07  1.30  1.52

    The inferred effect of parents looks too big, almost twice as large as it should be. That isn't …
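The haunted-DAG simulation can be re-run with ordinary least squares to see the same distortion. This Python sketch uses the slopes shown in the slides (b_GP = 1, b_GC = 0, b_PC = 1, b_U = 2, N = 200), but it swaps the lecture's quap fit for plain least squares solved through the normal equations; the `ols` helper and the seed are mine, so the numbers will not match the slide's table exactly.

```python
import random

def ols(y, *predictors):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *xs] for xs in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    aug = [row + [b] for row, b in zip(xtx, xty)]
    for col in range(k):                       # Gauss-Jordan with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

rng = random.Random(1)
N = 200
b_GP, b_GC, b_PC, b_U = 1, 0, 1, 2             # slopes as given in the slides

U = [2 * (rng.random() < 0.5) - 1 for _ in range(N)]      # neighborhood: -1 or +1
G = [rng.gauss(0, 1) for _ in range(N)]
P = [rng.gauss(b_GP * g + b_U * u, 1) for g, u in zip(G, U)]
C = [rng.gauss(b_PC * p + b_GC * g + b_U * u, 1) for p, g, u in zip(P, G, U)]

# U unobserved: regress C on P and G only.
_, bP_haunted, bG_haunted = ols(C, P, G)
# If U could be measured: regress C on P, G and U.
_, bP_full, bG_full, bU_full = ols(C, P, G, U)

print(round(bP_haunted, 2), round(bG_haunted, 2), round(bG_full, 2))
```

Without U, the grandparent coefficient comes out clearly negative even though the true direct effect is zero, and the parent coefficient is inflated. Adding U, were it measurable, brings the grandparent coefficient back near zero.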
  33. Figure 6.6: grandchild education (C) against grandparent education (G)

    [Figure 6.6: scatterplot with good and bad neighborhoods marked; parents in the 45th to 60th centiles highlighted.] Unobserved confounds and collider bias. In this example, grandparents influence grandkids only indirectly, through parents. However, unobserved neighborhood effects on parents and their children create the illusion that grandparents harm their grandkids' education. Parental education is a collider: once we condition on it, grandparental education becomes negatively associated with grandchild education.

    Consider those P in 45-60th centile of education. P in bad neighborhoods must have had more educated G. P in good neighborhoods must have had less educated G. Otherwise they wouldn’t all be in same quantile.
  34. Shutting the back door

    • What ties these examples together:
    • The back-door criterion: Confounding caused by existence of open back door paths from X to Y
    • If you know your elements, you know how to open/close each of them

    But what exactly is confounding? And which principles explain why some… variables and sometimes adding them can produce the same phenomenon… causal monsters lurking out there, haunting our graphs? We require some…

    Confounding is any context in which the association between an outcome… of interest X is not the same as it would be, if we had experimentally d… of X. For example, in the previous example, the association between … is confounded by the unobserved variable U. If we had assigned educ… we'd get a different estimate for the association. Directly manipulat… the graph on the left into the graph on the right:

    [DAGs: nodes E, U, W (left) and E, U, W (right)]

    … does it do this? In the graph on the left, there are two paths connect…
  35. Ye Olde Causal Alchemy: The Four Elemental Confounds

    The Pipe: X → Z → Y. The Fork: X ← Z → Y. The Collider: X → Z ← Y. The Descendant: X, Z, Y, with A a descendant of Z.
  36. The Four Elemental Confounds: how to open and close each

    The Pipe (X → Z → Y): Open unless you condition on Z. The Fork (X ← Z → Y): Open unless you condition on Z. The Collider (X → Z ← Y): Closed until you condition on Z. The Descendant (A is a descendant of Z): Conditioning on A is like conditioning on Z.
  37. Two paths from E to W: (1) E → W (2) E ← U → W

    … we'd get a different estimate for the association. Directly manipula… the graph on the left into the graph on the right:

    [DAGs: nodes E, U, W (left) and E, U, W (right)]

    … does it do this? In the graph on the left, there are two paths connec… (1) E → W and (2) E ← U → W. A "path" here just means any series o… walk through to get from one variable to another, ignoring the direction… manipulation removes the influence of U on E. This then stops information … between E and W through U. It blocks the second path. Once the path is bl… one way for information to go between E and W, and then measuring … between E and W could yield a useful measure of causal influence. Manipu… confounding, because it blocks the other path between E and W.

    Now consider that there are statistical ways to achieve the same result, w… manipulating E. How? The most obvious is to add U to the model, to condit… this also remove the confounding? Because it also blocks the flow of i…

    Close 2nd path by conditioning on U, closing the pipe.
  38. 3 paths from G to C

    [DAG nodes: C, G, P, U]

    … consequence of G and U, so if we condition on P, it will bias inference … never get to measure U. I don't expect that fact to be immediately … through a quantitative example. … triads of grandparents, parents, and children. This simulation … project our DAG as a series of implied functional relationships. The … function of G and U … function of G, P, and U … functions of any other known variables.

    3 paths from G to C: (1) G → C (2) G → P → C (3) G → P ← U → C. Condition on P: Closes (2) but opens (3).
  39. Something more interesting

    • Which variables, if any, should you condition on to infer X → Y?
    • Procedure: (1) Find all paths. (2) Open/close as necessary.

    … block the path from X to Y. The same holds for colliders. If you con… descendent of a collider, it'll still be like (weakly) conditioning on a co… matter how complicated a causal DAG appears, it is always built out o… relations. And since you know how to open and close each, you (or you… out which variables you need to control, or not, in order to shut th… some examples.

    … roads. The DAG below contains an exposure of interest X, an outcom… unobserved variable U, and three observed covariates (A, B, and C).

    [DAG nodes: A, B, C, U, X, Y]
  40. Something more interesting

    • Which variables, if any, should you condition on to infer X → Y?
    • Condition on A or C. Do not condition on B.

    … collider, it'll still be like (weakly) conditioning on a collider. … complicated a causal DAG appears, it is always built out of these four … since you know how to open and close each, you (or your computer) … variables you need to control, or not, in order to shut the backdoor. … The DAG below contains an exposure of interest X, an outcome of interest … an unobserved variable U, and three observed covariates (A, B, and C).

    [DAG nodes: A, B, C, U, X, Y]

    We are interested in the blue path, the causal effect of X on Y. Which of the observed covariates do we need to add to the model … backdoor paths. Aside from the direc…

    (1) X ← U ← A → C → Y. This path is open.
    (2) X ← U → B ← C → Y. This path is closed.
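The find-paths-then-open-or-close procedure can be mechanized in a few lines. This is a hedged Python sketch, not the dagitty package the lecture uses: the edge list is read off the two back-door paths listed on this slide plus the causal arrow X → Y, and the helper names are hypothetical.

```python
# Edges implied by the two listed back-door paths, plus the effect of interest.
EDGES = [("A", "U"), ("A", "C"), ("U", "X"), ("U", "B"),
         ("C", "B"), ("C", "Y"), ("X", "Y")]

def simple_paths(edges, start, end, path=None):
    """All paths from start to end that ignore arrow direction and repeat no node."""
    path = path or [start]
    if path[-1] == end:
        yield path
        return
    neighbors = sorted({b for a, b in edges if a == path[-1]} |
                       {a for a, b in edges if b == path[-1]})
    for nxt in neighbors:
        if nxt not in path:
            yield from simple_paths(edges, start, end, path + [nxt])

def descendants(edges, node):
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if a == cur and b not in found:
                found.add(b)
                stack.append(b)
    return found

def is_open(edges, path, given):
    """A path is open unless a conditioned non-collider or an unconditioned collider blocks it."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edges and (nxt, mid) in edges
        if collider:
            if mid not in given and not (descendants(edges, mid) & given):
                return False
        elif mid in given:
            return False
    return True

def open_backdoors(edges, x, y, given=()):
    """Open paths from x to y that start with an arrow pointing into x."""
    given = set(given)
    return [p for p in simple_paths(edges, x, y)
            if (p[1], x) in edges and is_open(edges, p, given)]

for given in ([], ["A"], ["C"], ["B"]):
    print(given, open_backdoors(EDGES, "X", "Y", given))
```

With nothing conditioned, only the path through A and C is open; conditioning on A or on C closes it, while conditioning on B opens the path through the collider. U is unobserved, so it is not offered as a conditioning option here.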
  41. Waffles Requiem

    • Remember the waffles.
    • Which to control to infer W → D?

    [DAG nodes: A, D, M, S, W] … S is whether or not a State is in the southern United States, A is …

    S: South. W: Waffle Houses. M: Marriage. D: Divorce. A: Age at marriage.
  42. Waffles Requiem

    [Same slide content as the previous slide.]
  43. Implied conditional independence

    • Given DAG, can test some implications

    [DAG nodes: A, D, M, S, W] … whether or not a State is in the southern United States, A is median age … conditional independencies, pairs of variables that are not associated, once we condition on some set of other variables. By listing these implied conditional independencies and assessing each, we can at least test some of the features of a graph.

    You can find conditional independencies using the same path logic you learned for finding and closing backdoors. You just have to focus on a pair of variables, find all paths connecting them, and figure out if there is any set of variables you could condition on to close them all. In a large graph, this is quite a chore, because there are many pairs of variables and possibly many paths. But your computer is good at such chores. In this case, there are three implied conditional independencies:

    R code:
        impliedConditionalIndependencies( dag_6.2 )

        A _||_ W | S
        D _||_ S | A, M, W
        M _||_ W | S

    (1) A and W independent, conditioning on S
    (2) D and S independent, conditioning on A, M, & W
    (3) M and W independent, conditioning on S
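The three implied independencies can be checked with the same path logic in code. This Python sketch is not dagitty, and it rests on an assumption: the transcript lost the arrows of the waffle DAG, so the edge list below (S → A, S → M, S → W, A → M, A → D, M → D, W → D) is my reading of it. The helper names are hypothetical.

```python
# ASSUMED edges for the waffle DAG; the slide text preserves only the node labels.
EDGES = {("S", "A"), ("S", "M"), ("S", "W"), ("A", "M"),
         ("A", "D"), ("M", "D"), ("W", "D")}

def simple_paths(x, y, path=None):
    """All direction-ignoring paths from x to y without repeated nodes."""
    path = path or [x]
    if path[-1] == y:
        yield path
        return
    for a, b in sorted(EDGES):
        for nxt in ((b,) if a == path[-1] else (a,) if b == path[-1] else ()):
            if nxt not in path:
                yield from simple_paths(x, y, path + [nxt])

def descendants(node):
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for a, b in EDGES:
            if a == cur and b not in found:
                found.add(b)
                stack.append(b)
    return found

def is_open(path, given):
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in EDGES and (nxt, mid) in EDGES
        if collider and mid not in given and not (descendants(mid) & given):
            return False                       # unconditioned collider blocks
        if not collider and mid in given:
            return False                       # conditioned pipe or fork blocks
    return True

def independent(x, y, given=()):
    """True when every path between x and y is closed, given the conditioning set."""
    given = set(given)
    return not any(is_open(p, given) for p in simple_paths(x, y))

def open_backdoors(x, y, given=()):
    given = set(given)
    return [p for p in simple_paths(x, y) if (p[1], x) in EDGES and is_open(p, given)]

print(independent("A", "W", {"S"}),
      independent("D", "S", {"A", "M", "W"}),
      independent("M", "W", {"S"}))
print(open_backdoors("W", "D"), open_backdoors("W", "D", {"S"}))
```

Under the assumed graph, all three listed independencies hold, and conditioning on S alone leaves no open back-door path from W to D, which answers the question posed on the Waffles Requiem slides.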
  44. Causal inference hard but possible • Demonstrate capable of inferring

    cause • Experiments not required! • Experiments not always practical & ethical • Disease, evolution, development, dynamics of popular music, global climate, war • Experiments must choose an intervention • Interventions influence many variables at once • Experimentally manipulate obesity?
  45. More than the Back Door

    • Closing back doors is not the only option
    • Front-door criterion
    • Instrumental variables

    … education E, we expect the inference to be biased by factors that influence … education. For example, industrious people may both complete more edu… higher wages, generating a correlation between education and wages. But … necessarily mean that education causes higher wages.

    … can't measure the common confounds, it might be possible to get a good … influence of education on wages. What is needed is an instrumental … just an opaque term for a direct influence on education that cannot directly … such a variable makes education into a collider of itself and the unmeasured … since education is a collider, that means learning about one of the paths into … information about the other.

    … like a light switch. The light being on depends upon both the switch being … working light bulb. Light is a collider of the switch and the bulb. If we see … off and then learn that the switch is on, we can infer that the bulb must be … This is the sense in which a collider can be useful. Once we learn one of … information about the other.

    … DAG below. The central problem is that education E and wages W are … influenced by an unobserved cause U.

    [DAGs: nodes E, Q, U, W; nodes U, X, Y, Z]
  46. Directed Acyclic Gaffes • Don’t get cocky • DAGs are

    small world constructs • Residual confounding: • Misclassification • Measurement error • Missingness • DAGs can accommodate these problems, but maybe tell us there are no solutions • Eventually need *real* models of the system
  47. Moving forward • Homework: DAG practice • Next week, Chapter

    7 • Sailing between (1) the whirlpool of underfitting (2) the many-headed monster of overfitting