Statistical Rethinking Fall 2017 Lecture 06

Week 3, Lecture 6, Statistical Rethinking: A Bayesian Course with Examples in R and Stan. This lecture covers Chapter 5 of the book.

Richard McElreath

November 10, 2017

Transcript

  1. Posterior predictions • Goal: Compute implied predictions for observed cases

    • Check model fit — golems do make mistakes • Find model failures, stimulate new ideas • Always average over the posterior distribution • Using only MAP leads to overconfidence • Embrace the uncertainty
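
The advice on this slide (average over the posterior rather than plugging in the MAP values) can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, `lm()` stands in for a MAP fit, and the posterior of the coefficients is approximated by a multivariate normal. None of the names below come from the lecture.

```r
# Sketch: implied predictions for observed cases, averaged over the posterior.
# Assumptions (not from the lecture): simulated data, lm() as a stand-in for a
# MAP fit, multivariate normal approximation of the coefficient posterior.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- rnorm(n, 1 + 0.5 * x, 1)
fit <- lm(y ~ x)

# Draw coefficient samples: z %*% chol(V) has covariance V
n_samp <- 2000
V <- vcov(fit)
z <- matrix(rnorm(n_samp * 2), nrow = n_samp)
post <- sweep(z %*% chol(V), 2, coef(fit), "+")   # columns: intercept, slope

# One row per posterior sample, one column per observed case
mu <- post[, 1] + outer(post[, 2], x)

mu_mean <- colMeans(mu)                                   # average prediction
mu_PI <- apply(mu, 2, quantile, probs = c(0.055, 0.945))  # 89% interval of the mean
resid_mean <- y - mu_mean   # positive residual: more than expected
```

Plotting `mu_mean` against `y`, with `mu_PI` as vertical segments, gives the kind of predicted-against-observed display used to look for model failures.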
  2. Figure 5.6 • Distribution of residuals for each State • negative residual: less divorce than expected • positive residual: more divorce than expected

    [Figure 5.6: predicted against observed divorce, the divorce error (residual) for each State, and divorce error against Waffles per capita, with AL, AR, GA, ID, ME, MS and SC labeled.]
  3. Figure 5.6

    [Figure 5.6, repeated. Axes: Observed divorce, Predicted divorce, Waffles per capita, Divorce error.] Figure 5.6. Posterior predictive plots for the multivariate divorce model, m5.3. (a) Predicted divorce rate against observed, with confidence intervals of the average prediction. The dashed line shows perfect prediction.
  4. Synthetic spurious association • Thinking generatively very useful • Simulate spurious association to better understand how simple it is

    [Diagram: x_real influences both y and x_spur]

    R code:

      N <- 100                          # number of cases
      x_real <- rnorm( N )              # x_real as Gaussian with mean 0 and stddev 1
      x_spur <- rnorm( N , x_real )     # x_spur as Gaussian with mean=x_real
      y <- rnorm( N , x_real )          # y as Gaussian with mean=x_real
      d <- data.frame(y,x_real,x_spur)  # bind all together in data frame

    Now the data frame d has 100 simulated cases. Because x_real influences both y and x_spur, you can think of x_spur as another outcome of x_real, but one which we mistake as a potential predictor of y. As a result, both x_real and x_spur are correlated with y. You can see this in the scatterplots from pairs(d). But when you include both x variables in a linear regression predicting y, the posterior mean for the association between y and x_spur will be close to zero, while the comparable mean for x_real will be closer to 1.
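
The claim in the last sentence can be checked with a short sketch. It assumes `lm()` as a stand-in for the `map()` fits used in the course (least-squares estimates rather than posterior means), and the seed is arbitrary.

```r
# Sketch: the spurious predictor looks useful alone, but not next to x_real.
# Assumption: lm() is used instead of map(); estimates are least-squares
# coefficients, not posterior means.
set.seed(5)
N <- 100
x_real <- rnorm(N)
x_spur <- rnorm(N, x_real)
y <- rnorm(N, x_real)
d <- data.frame(y, x_real, x_spur)

b_spur_alone <- coef(lm(y ~ x_spur, data = d))[["x_spur"]]
b_both <- coef(lm(y ~ x_real + x_spur, data = d))

print(round(c(spur_alone = b_spur_alone, b_both[-1]), 2))
```

With both predictors in the model, the coefficient for `x_spur` should shrink toward zero while the one for `x_real` stays near its simulated value of 1.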
  5. Masked association • Sometimes association between outcome and predictor masked

    by another variable • Need both variables to see influence of either • Tends to arise when • Another predictor associated with outcome in opposite direction • Both predictors associated with one another • Noise in predictors can also mask association
  6. Milk and Brain • Eulemur fulvus: 0.49 kcal/g, 55% neocortex • Homo sapiens: 0.71 kcal/g, 75% neocortex • Cebus apella: 0.89 kcal/g, 68% neocortex
  7. Masked influence • Primate milk data

      library(rethinking)
      data(milk)
      d <- milk
      pairs(~kcal.per.g+log(mass)+neocortex.perc, data=d)

    [Pairs plot of kcal.per.g, log(mass) and neocortex.perc]
  10. Complete cases • Missing values in primate milk data • Drop cases (species) with missing values • Much later, see how to “impute” missing values, so can use all the data

    Error message:

      The start values for the parameters were invalid. This could be caused by missing values (NA) in the data or by start values outside the parameter constraints. If there are no NA values in the data, try using explicit start values.

    What has gone wrong here? This particular error message means that the model didn't return a valid posterior probability for even the starting parameter values. In this case, the culprit is the missing values in the neocortex.perc column. Take a look inside that column and see for yourself:

      d$neocortex.perc

       [1] 55.16    NA    NA    NA    NA 64.54 64.54 67.64    NA 68.85 58.85 61.69
      [13] 60.32    NA    NA 69.97    NA 70.41    NA 73.40    NA 67.53    NA 71.26
      [25] 72.60    NA 70.24 76.30 75.49

    Each NA in the output is a missing value. If you pass a vector like this to a likelihood function like dnorm, it doesn't know what to do. After all, what's the probability of a missing value? Whatever the answer, it isn't a number, and so dnorm returns a […]. Unable to even get started, map (or rather optim, which does the real work) gives up and barks about some weird thing called vmmin not being finite. This kind of opaque error message is unfortunately the norm in R. This is easy to fix, though. What you need to do here is manually drop all the cases with missing values. More automated black-box commands, like lm and glm, will drop such cases for you. But this isn't always a good thing, if you aren't aware of it. In the next chapter, you'll see one reason why. So indulge me for now. It's worth learning how to do this yourself. To make a new data frame with only complete cases in it, just use:

      dcc <- d[ complete.cases(d) , ]

    This makes a new data frame, dcc, that consists of the rows from d that have values in all columns. Now let's work with the new data frame. All that is new in the code is using dcc instead of d:

      m5.5 <- map(
          alist(
              kcal.per.g ~ dnorm( mu , sigma ) ,
              […]
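
The complete-cases step can be tried on a toy data frame. The species labels and values below are hypothetical and only serve to show the mechanics; the real data are in `milk` from the rethinking package.

```r
# Sketch: dropping incomplete rows by hand (toy, hypothetical values).
toy <- data.frame(
  species   = c("A", "B", "C", "D"),
  kcal      = c(0.49, 0.71, NA, 0.89),
  neocortex = c(55.16, NA, 64.54, 68.85)
)

# A density evaluated at a missing value is itself missing
lik_missing <- dnorm(NA_real_, mean = 0, sd = 1)

keep <- complete.cases(toy)   # TRUE only for rows with no NA in any column
toy_cc <- toy[keep, ]

print(toy_cc)
```

Doing this explicitly, rather than relying on a fitting function to drop rows silently, keeps the number of cases actually used in view.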
  11. Bivariate models • Figure 5.7 • kcal.per.g ~ dnorm(mu,sigma), mu <- a + bp*neocortex.perc • kcal.per.g ~ dnorm(mu,sigma), mu <- a + bm*log(mass)

    [Figure 5.7, bivariate panels: kcal.per.g against neocortex.perc (left) and kcal.per.g against log.mass (right)]
  12. Multivariate model

    […] that log-mass is negatively correlated with kilocalories. This influence does […] than that of neocortex percent, although in the opposite direction. It is quite uncertain though, with a wide confidence interval that is consistent with a wide range of […] and stronger relationships. This regression is shown in the upper-right of Figure 5.7. Now let's see what happens when we add both predictor variables at the same time to the regression. This is the multivariate model, in math form:

      $k_i \sim \mathrm{Normal}(\mu_i, \sigma)$
      $\mu_i = \alpha + \beta_n n_i + \beta_m \log(m_i)$
      $\alpha \sim \mathrm{Normal}(0, 100)$
      $\beta_n \sim \mathrm{Normal}(0, 1)$
      $\beta_m \sim \mathrm{Normal}(0, 1)$
      $\sigma \sim \mathrm{Uniform}(0, 1)$

    Above, k is kcal.per.g, n is neocortex.perc, and m is mass. Fitting the joint model is just as you'd expect by now: […]
  13. Multivariate model

    […] $\beta_m \sim \mathrm{Normal}(0, 1)$, $\sigma \sim \mathrm{Uniform}(0, 1)$. Above, k is kcal.per.g, n is neocortex.perc, and m is mass. Fitting the joint model is just as you'd expect by now:

      m5.7 <- map(
          alist(
              kcal.per.g ~ dnorm( mu , sigma ) ,
              mu <- a + bn*neocortex.perc + bm*log.mass ,
              a ~ dnorm( 0 , 100 ) ,
              bn ~ dnorm( 0 , 1 ) ,
              bm ~ dnorm( 0 , 1 ) ,
              sigma ~ dunif( 0 , 1 )
          ) ,
          data=dcc )
      precis(m5.7)

             Mean StdDev  5.5% 94.5%
      a     -1.09   0.47 -1.83 -0.34
      bn     0.03   0.01  0.02  0.04
      bm    -0.10   0.02 -0.13 -0.06
      sigma  0.11   0.02  0.08  0.15

    By incorporating both predictor variables in the regression, the estimated association of both with the outcome has increased. The posterior mean for the association of neocortex percent has increased more than sixfold, and its 89% interval is now entirely above zero. The posterior mean for log body mass is more strongly negative. Let's plot the intervals for the predicted mean kilocalories, for this new model. […]
  14. Figure 5.7 • Bivariate • Multivariate

    [Figure 5.7: four panels of kcal.per.g against neocortex.perc (left column) and log.mass (right column); bivariate fits in the top row, multivariate fits in the bottom row.] Figure 5.7. Milk energy and neocortex among primates. In the top two plots, simple bivariate regressions of kilocalories per gram of milk on (left) neocortex percent and (right) log female body mass show weak and uncertain […]
  15. Figure 5.7

    Let's plot the intervals for the predicted mean kilocalories, for this new model. Here's the code for the relationship between kilocalories and neocortex percent. These are counterfactual plots, so we'll use the mean log body mass in this calculation, showing only how predicted energy varies as a function of neocortex percent:

      mean.log.mass <- mean( log(dcc$mass) )
      np.seq <- 0:100
      pred.data <- list(
          neocortex.perc=np.seq,
          log.mass=mean.log.mass
      )
      mu <- link( m5.7 , data=pred.data )
      mu.mean <- apply( mu , 2 , mean )
      mu.PI <- apply( mu , 2 , PI )
      plot( kcal.per.g ~ neocortex.perc , data=dcc , type="n" )
      lines( np.seq , mu.mean )
      lines( np.seq , mu.PI[1,] , lty=2 )
      lines( np.seq , mu.PI[2,] , lty=2 )

    This plot is displayed in the lower-left of Figure 5.7. The analogous plot for log(mass) is shown in the lower-right. I leave it to the reader to modify the code above to replicate the plot in the lower-right. Why did adding neocortex and body mass to the same model lead to larger estimated effects of both? This is a context in which there are two variables correlated with the outcome, but one is positively correlated with it and the other is negatively correlated with it.

    [Figure 5.7, lower-left panel: kcal.per.g against neocortex.perc. The caption is cropped on the slide.]
  16. Synthetic masked association

    [Diagram: x_pos and x_neg are positively associated (+); x_pos is positively associated with y (+); x_neg is negatively associated with y (–)]

    Overthinking: Simulating a masking relationship. Just as with understanding spurious association, it may help to simulate data in which two meaningful predictors act to mask one another. Suppose again a single outcome, y, and two predictors, x_pos and x_neg. The predictor x_pos is positively associated with y, while x_neg is negatively associated with y. Furthermore, the two predictors are positively correlated with one another. Here's code to produce data meeting these criteria:

      N <- 100                         # number of cases
      rho <- 0.7                       # correlation btw x_pos and x_neg
      x_pos <- rnorm( N )              # x_pos as Gaussian
      x_neg <- rnorm( N , rho*x_pos ,  # x_neg correlated with x_pos
          sqrt(1-rho^2) )
      y <- rnorm( N , x_pos - x_neg )  # y equally associated with x_pos, x_neg
      d <- data.frame(y,x_pos,x_neg)   # bind all together in data frame

    The first thing to do is enter pairs(d) to see the bivariate relationships among the variables. Now if you fit two bivariate regressions predicting y, using either x_pos or x_neg, you get posterior distributions that underestimate the true association (which should be about 1 or −1, respectively). But if you then fit a model predicting y using both predictors, you'll get a posterior distribution that better matches the underlying truth. If you move the value of rho closer to zero, this masking phenomenon will diminish. If you make rho closer to 1 or −1, it will magnify. But if rho gets very close to 1 or −1, then the two predictors contain exactly the same information, and there's no hope for any statistical model to tease out the true underlying association used in the simulation.

    Why should two predictors be correlated in this way? They might both be influenced by another, unmeasured variable. Or one of them, say x_neg, is influenced partly by x_pos, but also by its own unique processes. They might both partly influence one another, in a case of reciprocal causation. In the primate milk example, it may be that the positive association between large body size and neocortex percent arises from a tradeoff between lifespan and learning. Large animals tend to live a long time. And in such animals, an investment in learning may be a better investment, because learning can be amortized over a longer lifespan. Both large body size and large neocortex then influence milk […]
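
To see the masking numerically, the simulated data can be fit three ways. As before, this is a sketch that assumes `lm()` in place of `map()`, so the numbers are least-squares coefficients; the seed is arbitrary.

```r
# Sketch: each predictor alone understates its association with y;
# together they recover slopes near +1 and -1.
# Assumption: lm() stands in for the map() fits used in the course.
set.seed(7)
N <- 100
rho <- 0.7
x_pos <- rnorm(N)
x_neg <- rnorm(N, rho * x_pos, sqrt(1 - rho^2))
y <- rnorm(N, x_pos - x_neg)
d <- data.frame(y, x_pos, x_neg)

b_pos_alone <- coef(lm(y ~ x_pos, data = d))[["x_pos"]]
b_neg_alone <- coef(lm(y ~ x_neg, data = d))[["x_neg"]]
b_joint <- coef(lm(y ~ x_pos + x_neg, data = d))

print(round(c(pos_alone = b_pos_alone, neg_alone = b_neg_alone, b_joint[-1]), 2))
```

With rho = 0.7, the bivariate slopes are expected to sit near 1 − rho = 0.3 and −0.3 on average, which is why each predictor looks weak until the other is included.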
  17. Regression as a wicked oracle • Regression automatically focuses on

    the most informative cases • Cases that don’t help are automatically ignored • But not kind — ask carefully
  18. Why not just add everything? • Could just add all

    available predictors to model • Almost always a bad idea • Multicollinearity • Confounding colliders • Loss of interpretability • Loss of precision • Overfitting
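
One item on this list, overfitting, is easy to demonstrate. The sketch below is an assumption-laden toy, not from the lecture: the outcome is pure noise, and every added predictor is also pure noise, yet the in-sample fit keeps improving.

```r
# Sketch: R-squared never goes down as predictors are added, even useless ones.
# Assumptions: simulated noise outcome and noise predictors; lm() fits.
set.seed(10)
n <- 30
y <- rnorm(n)
noise <- as.data.frame(matrix(rnorm(n * 20), nrow = n))  # columns V1..V20

r2 <- sapply(1:20, function(k) {
  dat <- cbind(y = y, noise[, 1:k, drop = FALSE])  # first k noise predictors
  summary(lm(y ~ ., data = dat))$r.squared
})

print(round(r2, 2))
```

Because none of the predictors carries information about `y`, the rising R-squared reflects only the model memorizing the sample.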
  19. Multicollinear legs

             height leg_left leg_right
      1   15.384202 7.115039  7.139183
      2   12.176479 5.718942  5.729024
      3    9.634356 4.278725  4.275795
      4    7.671892 3.158348  3.166970
      5    8.592127 3.518352  3.543422
      6    7.747036 3.397380  3.384179
      7    9.623175 4.601825  4.603800
      8    7.735412 3.852066  3.848137
      9   12.083202 5.502614  5.521156
      10  11.080817 4.847354  4.790418
      11  11.631615 5.017371  4.996615
      12   6.477359 3.023023  3.036469
      13   8.870094 3.708882  3.764201
      14  12.703396 6.073339  6.076483
      15  11.416840 5.444431  5.441192
      16  10.758823 5.286965  5.297677
      17  11.464688 5.596979  5.604316
      18   9.747457 4.003333  4.012955
      19  12.211823 6.092597  6.100131
      20  12.671249 6.184386  6.193254
  20. […] height […]. This is […]. Now let's see what happens instead:

      m5.8 <- map(
          alist(
              height ~ dnorm( mu , sigma ) ,
              mu <- a + bl*leg_left + br*leg_right ,
              a ~ dnorm( 10 , 100 ) ,
              bl ~ dnorm( 2 , 10 ) ,
              br ~ dnorm( 2 , 10 ) ,
              sigma ~ dunif( 0 , 10 )
          ) ,
          data=d )
      precis(m5.8)

             Mean StdDev  5.5% 94.5%
      a      0.70   0.31  0.20  1.20
      bl    -0.43   2.18 -3.92  3.06
      br     2.48   2.19 -1.01  5.98
      sigma  0.62   0.04  0.55  0.69

    Those posterior means and standard deviations look crazy. This is a case in which a graphical view of the precis output is more useful, because it displays the posterior means and intervals in a way that allows us, with a glance, to see that something has gone wrong here:

      plot(precis(m5.8))

    [Plot of the precis output: sigma, br, bl and a on a Value axis from -4 to 6]
  21. Multicollinear legs • Q: What is value of learning left/right leg, once we already know right/left leg? • A: Almost nothing, on average.

    [Figure 5.8: posterior density of the sum of bl and br. Caption, cropped on the slide: "Left: Posterior distribution of the association of each leg with […]"]
  22. Multicollinear legs

    […] variables contain almost exactly the same information, if you insist on […] model, then there will be a practically infinite number of combinations […] produce the same predictions. […] of this phenomenon is that you have approximated this model:

      $y_i \sim \mathrm{Normal}(\mu_i, \sigma)$
      $\mu_i = \alpha + \beta_1 x_i + \beta_2 x_i$

    The variable y is the outcome, like height in the example, and x is a single predictor, like the leg lengths in the example. Here x is used twice, which is a perfect example of the problem caused by using the almost-identical leg lengths. From the computer's perspective, this likelihood is really:

      $y_i \sim \mathrm{Normal}(\mu_i, \sigma)$
      $\mu_i = \alpha + (\beta_1 + \beta_2) x_i$

    All I've done is factor $x_i$ out of each term. The parameters $\beta_1$ and $\beta_2$ cannot be pulled apart, because they never separately influence the mean $\mu$. Only their sum, $\beta_1 + \beta_2$, influences $\mu$. So this means the posterior distribution ends up reporting the practically infinite combinations of $\beta_1$ and $\beta_2$ that make their sum close to the actual association of x with y.

    [Figure 5.8: posterior density of the sum of bl and br. Caption, cropped on the slide: "Left: Posterior distribution of the association of each leg with […]"]
  23. [Figure 5.8: posterior density of the sum of bl and br. Scatterplot of height against leg_total.]

      d$leg_total <- d$leg_left + d$leg_right
      plot( height ~ leg_total , d , col=rangi2 )
      leg_list <- seq(from=1,to=15,length.out=30)
      leg_dat <- list(leg_left=leg_list, leg_right=leg_list)
      mu <- link( m5.8 , data=leg_dat )
      mu.mean <- apply( mu , 2 , mean )
      mu.PI <- apply( mu , 2 , PI )
      lines( leg_list*2 , mu.mean )
      shade( mu.PI , leg_list*2 )

    Model did what you asked!
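
The leg example can be reproduced without the rethinking package. The generative process below is an assumption made for illustration (heights around 10, legs at 40 to 50 percent of height plus small measurement error); the slides show only the resulting data. `lm()` again stands in for `map()`.

```r
# Sketch: two nearly identical predictors; only their sum is well identified.
# Assumptions: hypothetical simulation settings, lm() instead of map().
set.seed(9)
N <- 100
height <- rnorm(N, 10, 2)
leg_prop <- runif(N, 0.4, 0.5)                       # leg as a proportion of height
leg_left <- leg_prop * height + rnorm(N, 0, 0.02)    # small measurement error
leg_right <- leg_prop * height + rnorm(N, 0, 0.02)

fit_both <- lm(height ~ leg_left + leg_right)
fit_one <- lm(height ~ leg_left)

se_both <- summary(fit_both)$coefficients["leg_left", "Std. Error"]
se_one <- summary(fit_one)$coefficients["leg_left", "Std. Error"]
sum_both <- sum(coef(fit_both)[c("leg_left", "leg_right")])
b_one <- coef(fit_one)[["leg_left"]]

print(round(c(se_both = se_both, se_one = se_one,
              sum_both = sum_both, b_one = b_one), 2))
```

The individual coefficients in `fit_both` are very uncertain, but their sum should land close to the single-leg coefficient, which is the sense in which the model "did what you asked".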
  24. Correlated predictors • Multicollinearity: strong correlations among predictor variables

    [Pairs plot of kcal.per.g, perc.fat and perc.lactose.] Figure caption: A pairs plot of the total energy, percent fat, and percent lactose variables from the primate milk data. Percent fat and percent lactose are strongly negatively correlated with one another, providing essentially the same information.
  25. Correlated predictors • perc.fat or perc.lactose alone: Strong association with kcal.per.g

    [Pairs plot of kcal.per.g, perc.fat and perc.lactose, with the caption from the previous slide]

      […]
          alist(
              kcal.per.g ~ dnorm( mu , sigma ) ,
              mu <- a + bl*perc.lactose ,
              a ~ dnorm( 0.6 , 10 ) ,
              bl ~ dnorm( 0 , 1 ) ,
              sigma ~ dunif( 0 , 10 )
          ) ,
          data=d )
      precis( m5.10 , digits=3 )
      precis( m5.11 , digits=3 )

             Mean StdDev  5.5% 94.5%
      a     0.301  0.036 0.244 0.358
      bf    0.010  0.001 0.008 0.012
      sigma 0.073  0.010 0.058 0.089

              Mean StdDev   5.5%  94.5%
      a      1.166  0.043  1.098  1.235
      bl    -0.011  0.001 -0.012 -0.009
      sigma  0.062  0.008  0.049  0.075

    The posterior mean for bf, the association of percent fat with milk […] interval […]. The posterior mean in the second model […] with […] interval […]. These posterior means are […]
  26. • Together: Both reduced association?

    Given the strong association of each predictor with the outcome, we might conclude that both variables are reliable predictors of total energy in milk, across species. The more fat, the more kilocalories in the milk. The more lactose, the fewer kilocalories in milk. But watch what happens when we place both predictor variables in the same regression model:

      m5.12 <- map(
          alist(
              kcal.per.g ~ dnorm( mu , sigma ) ,
              mu <- a + bf*perc.fat + bl*perc.lactose ,
              a ~ dnorm( 0.6 , 10 ) ,
              bf ~ dnorm( 0 , 1 ) ,
              bl ~ dnorm( 0 , 1 ) ,
              sigma ~ dunif( 0 , 10 )
          ) ,
          data=d )
      precis( m5.12 , digits=3 )

              Mean StdDev   5.5%  94.5%
      a      1.007  0.200  0.688  1.327
      bf     0.002  0.002 -0.002  0.006
      bl    -0.009  0.002 -0.013 -0.005
      sigma  0.061  0.008  0.048  0.074

    Now the posterior means of both bf and bl are closer to zero. And the standard deviations for both parameters are twice as large as in the bivariate models (m5.10 and m5.11). In the case of percent fat, the posterior mean is essentially zero. What has happened here? This is the same phenomenon as in the leg length example. What has happened is that the variables perc.fat and perc.lactose contain much of the same information. They are substitutes for one another. As a result, when you include both in a regression, the posterior distribution ends up describing a long ridge of combinations […]

    [Pairs plot of kcal.per.g, perc.fat and perc.lactose, repeated]
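
How quickly precision degrades as two predictors become more correlated can be explored with a small simulation. Everything here is a hypothetical setup for illustration (the function name `se_for_rho`, the sample size and the correlation values are not from the lecture), and `lm()` standard errors stand in for posterior standard deviations.

```r
# Sketch: standard error of one coefficient as the correlation between
# two predictors increases. Hypothetical simulation, lm() fits.
set.seed(11)
se_for_rho <- function(rho, n = 200) {
  x1 <- rnorm(n)
  x2 <- rnorm(n, rho * x1, sqrt(1 - rho^2))  # cor(x1, x2) is about rho
  y <- rnorm(n, x1)                          # only x1 matters for y
  summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]
}

rhos <- c(0, 0.5, 0.9, 0.99)
ses <- sapply(rhos, se_for_rho)

print(round(setNames(ses, rhos), 3))
```

The standard error grows slowly at first and then sharply as the correlation approaches 1, roughly in proportion to 1 / sqrt(1 − rho^2).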
  27. Post-treatment bias • Headline: Thoughtlessly adding predictors is a bad idea. • Another danger: Post-treatment bias • Controlling for consequence of treatment statistically knocks out treatment

    [Diagram: x1 (Treatment), x2 (Mediator), y (Outcome)]
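
A simulated example makes the "knock out" concrete. The setup is an assumption for illustration: a randomized binary treatment that affects the outcome only through a mediator, with `lm()` used for the fits.

```r
# Sketch: conditioning on a consequence of treatment hides the treatment effect.
# Assumptions: hypothetical simulation, treatment works only through the mediator.
set.seed(12)
n <- 500
treatment <- rbinom(n, 1, 0.5)          # randomized treatment
mediator <- rnorm(n, 1.0 * treatment)   # treatment shifts the mediator
outcome <- rnorm(n, 1.0 * mediator)     # outcome depends on the mediator only

b_total <- coef(lm(outcome ~ treatment))[["treatment"]]
b_cond <- coef(lm(outcome ~ treatment + mediator))[["treatment"]]

print(round(c(total = b_total, conditional_on_mediator = b_cond), 2))
```

The treatment clearly works in this simulation, yet the model that includes the mediator reports a coefficient near zero for it, because the mediator already carries all of the treatment's information about the outcome.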
  28. Beware the Collider • Collider: A variable X that is

    influenced by two other variables, Y and Z • Want to know Z ~ Y • Don’t condition on X (or anything X causes) • Common trap: Selection on X forces conditioning on X
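
The selection trap in the last bullet can be shown with simulated data. This is a sketch under assumed settings (standard normal Y and Z, a collider X built from both, and selection of the top 20 percent of X); none of these specifics come from the lecture.

```r
# Sketch: selecting on a collider X induces an association between its causes.
# Assumptions: hypothetical simulation; Y and Z are independent by construction.
set.seed(13)
n <- 5000
Y <- rnorm(n)
Z <- rnorm(n)
X <- Y + Z + rnorm(n, 0, 0.5)            # X is influenced by both Y and Z

cor_all <- cor(Y, Z)                     # whole population: near zero

selected <- X > quantile(X, 0.8)         # keep only cases with high X
cor_sel <- cor(Y[selected], Z[selected]) # selected sample: negative

print(round(c(all = cor_all, selected = cor_sel), 2))
```

Among the selected cases, a high Y makes a high Z less necessary to pass the threshold, so the two appear negatively related even though neither influences the other.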
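  29. Beware the Collider • Also happens with model specification. • Are older people less happy? • Should we control for marriage status?

    [Diagram: age and happy both influence marry (+, +); the association between age and happy is the open question (?)]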
  30. No relationship between age & happiness. • 5 happiest people get married each year. • What happens when we control for marriage status?

    [Scatterplot of happiness (0.0 to 1.0) against age (0 to 100), with married and unmarried individuals distinguished]
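  31. model: happiness ~ age + married

    [Coefficient plot: married and age_std on a Value axis from -0.2 to 0.5] [Diagram: age and happy both influence marry (+, +); conditioning on marry yields a negative (–) association between age and happy]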
  32. model: happiness ~ age + married, compared with model: happiness ~ age

    [Coefficient plots on a Value axis from -0.2 to 0.5: married and age_std for happiness ~ age + married; age_std alone for happiness ~ age]
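
The comparison between the two models can be imitated with a simplified simulation. This is not the simulation behind the slides: the marriage rule below (a logistic function of age and happiness with made-up coefficients) is a hypothetical stand-in, and `lm()` replaces the Bayesian fit.

```r
# Sketch: controlling for a collider (marriage) creates an age effect
# that is not there. Assumptions: hypothetical marriage rule, lm() fits.
set.seed(14)
n <- 2000
age <- runif(n, 18, 80)
happiness <- runif(n)                    # independent of age by construction

# Chance of being married rises with both age and happiness (made-up numbers)
p_married <- plogis(-6 + 0.08 * age + 5 * happiness)
married <- rbinom(n, 1, p_married)

age_std <- as.numeric(scale(age))
fit_age <- lm(happiness ~ age_std)
fit_both <- lm(happiness ~ age_std + married)

b_age_only <- coef(fit_age)[["age_std"]]
b_age_cond <- coef(fit_both)[["age_std"]]
b_married <- coef(fit_both)[["married"]]

print(round(c(age_only = b_age_only, age_given_married = b_age_cond,
              married = b_married), 3))
```

Without `married` in the model the age coefficient sits near zero; with it, age picks up a negative coefficient, matching the sign pattern on the slides.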
  33. Additional Nightmares • Causal inference is very hard™ • Residual

    confounding looms • Always model dependent • Real interventions change many variables at once • Complex systems –> everything “causes” everything • No secret weapon
  34. Next week • Homework: 5H1, 5H2, 5H3 • Please put

    your name in the file • Next week, we are in the big lecture hall downstairs • Next week, Chapter 6 • Sailing between (1) the whirlpool of underfitting (2) the many-headed monster of overfitting