L20 Statistical Rethinking Winter 2019

Lecture 20 of the Dec 2018 through March 2019 edition of Statistical Rethinking. Covers Chapter 15, measurement error and missing data.

Richard McElreath

March 01, 2019

Transcript

  1. Missing Data & Other Opportunities

    Statistical Rethinking Winter 2019, Lecture 20 / Week 10
  2. 1 2 3

  3. 1 2 3 You are served: Probability other side is

    burnt?
  4. Avoid being clever • Intuition terrible guide to probability •

    No need to be clever; just ruthlessly apply conditional probability • Pr(want to know|already know)
  5. Pr(want to know|already know)

    In this case, we know the up side is burnt. We want to know whether or not the down side is also burnt. The definition of conditional probability tells us:

    Pr(burnt down|burnt up) = Pr(burnt up, burnt down) / Pr(burnt up)

    This is just the definition of conditional probability, labeled with our pancake problem. We want to know if the down side is burnt, and the information we have is that the up side is burnt. We condition on the information, so we update our state of information in light of it. The definition tells us that the probability we want is just the probability of the burnt/burnt pancake divided by the probability of seeing a burnt side up. The probability of the burnt/burnt pancake is 1/3, because a pancake was selected at random. The probability the up side is burnt must average over each way we can get dealt a burnt top side of the pancake. This is:

    Pr(burnt up) = Pr(BB)(1) + Pr(BU)(0.5) + Pr(UU)(0) = (1/3) + (1/3)(1/2) = 0.5

    All that remains is the probability of getting the pancake that is burnt on both sides, and this is 1/3 from the problem definition. So all together:

    Pr(burnt down|burnt up) = (1/3) / (1/2) = 2/3

    If you don't quite believe this answer, you can do a quick simulation to confirm it.
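The conditional-probability result can be checked by brute force. The following is a minimal sketch in base R, assuming three pancakes (burnt/burnt, burnt/unburnt, unburnt/unburnt) served at random with a random side up; the function name `sim_pancake`, the seed, and the number of replications are illustrative choices, not taken from the slides.

```r
# Simulate one serving: pick a pancake at random, then a random side up.
sim_pancake <- function() {
  pancake <- sample(1:3, 1)                          # which pancake is served
  # Columns are pancakes; 1 = burnt side, 0 = unburnt side
  sides <- matrix(c(1, 1,  1, 0,  0, 0), nrow = 2, ncol = 3)[, pancake]
  sample(sides)                                      # random orientation: c(up, down)
}

set.seed(20)                                         # arbitrary seed for reproducibility
pancakes <- replicate(1e4, sim_pancake())            # 2 x 10000 matrix
up   <- pancakes[1, ]
down <- pancakes[2, ]

# Pr(burnt down | burnt up): among burnt tops, the fraction with a burnt bottom
p_down_given_up <- sum(up == 1 & down == 1) / sum(up == 1)
print(p_down_given_up)                               # should land near 2/3
```

The estimate conditions on the observed information (a burnt top) simply by restricting attention to the simulated servings where it holds.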
  9. 1 2 3 You are served: Probability other side is

    burnt?
  10. Getting Ruthless • Express information as constraints and distributions =>

    let logic discover implications • No need to be clever • Examples: • Measurement error • Missing data
  11. Measurement error • Measurement always entails error • Typical linear

    regression: interpret sigma as “error” on outcome • What if error isn’t constant? • What if error is on predictors?
  12. Error on outcome • data(WaffleDivorce) • Consider error on outcome,

    divorce rate • Heterogeneity in error • Small State => large error • [Figure: divorce rate by median age at marriage; vertical bars show the Gaussian uncertainty in measured divorce rate]
  13. Figure 15.1

    Left: Divorce rate by median age of marriage, States of the United States. Vertical bars show plus and minus one standard deviation of the Gaussian uncertainty in measured divorce rate. Right: Divorce rate, again with standard deviations, against log population of each State. Smaller States produce more uncertain estimates.
  14. Figure 15.1, as above, with the accompanying book text

    [...] how the measurement error arises. It is just part of the statistical model, and [...] causal model. Recall the causal model of the divorce example from Chapter 5. Let's take that same model and now add observation error on the outcome:

    [DAG with nodes A, D, D_obs, M, N]

    There's a lot going on here. But we can proceed one step at a time. The top triangle of this DAG is the same system that we worked with back in Chapter 5. Age at marriage A influences divorce D both directly and indirectly, passing through marriage rate M. Then we have the observation model. The true divorce rate D is observed as D_obs, which is a function of both the true rate and the population size of each State, N. States with smaller populations [...] the true rate, because there is less data. [...] the reported standard errors were measured using this fact [...] information in a statistical model, then? It's just like a simulation, but [...]
  15. Error on outcome

    • Approach: • Treat true divorce rate as unknown parameter • Observed rate is sample from Gaussian distribution:

    D_obs,i ~ Normal(D_true,i, D_SE,i)

    with D_obs,i observed (data), D_true,i true (parameter), D_SE,i std error (data).

    Book excerpt: [...] measurement error here shrinks, all the probability piles up on [...] But when [...] measurements are more and less plausible. This is what I mean by [...] data are a special case of a distribution. And here is the key insight: If we [...] in this example, then we can just put a parameter there and let [...] In terms of the DAG above, knowing N lets us assign a standard deviation [...] process [...] how to define the distribution for each divorce rate. For each [...] there will be one parameter D_true,i, defined by D_obs,i ~ Normal(D_true,i, D_SE,i). [...] is define the measurement D_obs,i as having the specified Gaussian [...] the unknown parameter D_est,i. So the above defines a probability for [...] divorce rate, given a known measurement error. [...] a lot to take in. But we'll go one step at a time. Recall that the goal is [...] D as a linear function of age at marriage A and marriage rate M. Here's [...] like, with the measurement errors highlighted in blue:
  16. Error on outcome: model

    [DAG with nodes A, D, D_obs, M, N]

    D_obs,i ~ Normal(D_true,i, D_SE,i)   [distribution for obse...]
    D_true,i ~ Normal(µ_i, σ)            [distribution fo...]
    µ_i = α + β_A A_i + β_M M_i          [linear model to ass...]
    α ~ Normal(0, 0.2)
    β_A ~ Normal(0, 0.5)
    β_M ~ Normal(0, 0.5)
    σ ~ Exponential(1)

    Book excerpt: [...] like a linear regression, but with the addition of the top line that connect[...] to the true value. Each D_true parameter also gets a second role as the mean [...], one that predicts the observed measurement. A cool implication [...] that information flows in both directions: the uncertainty in measurement [...]
  17. Error on outcome: model, as above, with labels

    estimate • standard error of observation
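To make the observation model concrete, it can help to run it forwards as a simulation. The sketch below is a minimal base-R illustration of the DAG described in the slides (A influences M and D, and D is observed as D_obs with an error that depends on population size N). All numeric values, the 1/sqrt(population) form of the standard error, and the variable names `n_states`, `log_pop` and `abs_err` are assumptions for illustration only.

```r
set.seed(15)                                     # arbitrary seed
n_states <- 50                                   # hypothetical number of States
A <- rnorm(n_states)                             # standardized age at marriage
M <- rnorm(n_states, mean = -0.5 * A)            # A influences marriage rate M
D_true <- rnorm(n_states, mean = -0.5 * A + 0.2 * M, sd = 0.5)  # A and M influence D

log_pop <- runif(n_states, 0, 3)                 # made-up log population sizes
D_SE <- 0.8 / sqrt(exp(log_pop))                 # assumed: smaller State => larger error
D_obs <- rnorm(n_states, mean = D_true, sd = D_SE)  # D_obs,i ~ Normal(D_true,i, D_SE,i)

# Compare the typical observation error in small and large States
abs_err <- abs(D_obs - D_true)
small <- log_pop < median(log_pop)
print(c(small_states = mean(abs_err[small]), large_states = mean(abs_err[!small])))
```

Fitting the statistical model runs this process in reverse: D_obs and D_SE are data, and each D_true becomes a parameter to estimate.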
  18. Error on outcome: fitting

    m15.1 <- ulam(
        alist(
            D_obs ~ dnorm( D_true , D_sd ),
            vector[N]:D_true ~ dnorm( mu , sigma ),
            mu <- a + bA*A + bM*M,
            a ~ dnorm(0,0.2),
            bA ~ dnorm(0,0.5),
            bM ~ dnorm(0,0.5),
            sigma ~ dexp(1)
        ) , data=dlist , chains=4 , cores=4 )
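The model code refers to a `dlist` object that the slide does not show. A plausible way to build it, assuming the `WaffleDivorce` columns `Divorce`, `Divorce.SE`, `Marriage` and `MedianAgeMarriage` and standardized variables, is sketched below. The helper names `standardize` (a base-R stand-in for the rethinking function of the same name) and `make_dlist` are illustrative, and the `toy` rows are made-up numbers used only to show the shape of the result.

```r
# Base-R stand-in for standardizing: center and scale to unit standard deviation.
standardize <- function(x) (x - mean(x)) / sd(x)

make_dlist <- function(d) {
  list(
    D_obs = standardize(d$Divorce),
    D_sd  = d$Divorce.SE / sd(d$Divorce),   # put the standard error on the same scale
    M     = standardize(d$Marriage),
    A     = standardize(d$MedianAgeMarriage),
    N     = nrow(d)
  )
}

# Made-up rows, NOT real data: only the column names mirror WaffleDivorce.
toy <- data.frame(
  Divorce           = c(8, 10, 12, 14),
  Divorce.SE        = c(0.5, 1.0, 1.5, 2.0),
  Marriage          = c(15, 20, 25, 30),
  MedianAgeMarriage = c(27, 26, 25, 24)
)
dlist <- make_dlist(toy)
str(dlist)
```

Dividing the standard error by the standard deviation of the outcome keeps D_sd in the same standardized units as D_obs, which the measurement line of the model requires.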
  20. • Divorce rate estimates move from observed values. Why?

    [Figure 15.2, partly shown: D_est – D_obs against D_sd (left); divorce rate (std) against median age marriage (std) (right)]
  21. Figure 15.2

    Left: Shrinkage resulting from modeling the measurement error. The less error in the original measurement, the less shrinkage in the posterior estimate. Right: Comparison of regression that ignores measurement error (dashed line and gray shading) with regression that incorporates measurement error (blue line and shading). The points and line segments show [...] • Shrinkage! Uncertain or extreme states shrink to regression line.
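The shrinkage pattern has a simple closed form in a simplified setting. The sketch below assumes the regression mean `mu` and residual scale `sigma` are known, in which case the posterior mean of a true value is a precision-weighted average of the observation and `mu`. In the fitted model those quantities are estimated jointly, so this is an illustration of the mechanism rather than the model's actual computation; the function name `shrink` and the input values are hypothetical.

```r
# Posterior mean of D_true given D_obs ~ Normal(D_true, D_sd) and
# D_true ~ Normal(mu, sigma), with mu and sigma treated as known (an assumption).
shrink <- function(D_obs, D_sd, mu, sigma) {
  w <- (1 / D_sd^2) / (1 / D_sd^2 + 1 / sigma^2)   # weight placed on the observation
  w * D_obs + (1 - w) * mu
}

D_sd <- c(0.2, 0.6, 1.0, 1.4)            # increasing measurement error
est <- shrink(D_obs = 2, D_sd = D_sd, mu = 0, sigma = 1)
print(round(est - 2, 2))                 # D_est - D_obs: more error, more shrinkage
```

An observation with a large standard error carries little weight, so its estimate moves most of the way toward the regression expectation, and an extreme observation has further to move.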
  22. Error on predictor • What about error on predictor? •

    Many procedures invented • errors-in-variables • reduced major axis • total least squares • Our approach will be logical • State information • Deduce implications • Garbage in? You know what comes out. • [Figure: marriage rate against log population]
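Why error on a predictor matters can be seen in a small simulation. The sketch below, in base R, regresses an outcome on a predictor observed with Gaussian noise and compares the slope with the one obtained from the error-free predictor. The sample size, the true slope of 1, and the noise scale are made-up values, and ordinary least squares via `lm` stands in for the Bayesian model used in the lecture.

```r
set.seed(22)                                   # arbitrary seed
n <- 5000
x_true <- rnorm(n)                             # true predictor, sd 1
y <- rnorm(n, mean = 1 * x_true, sd = 0.5)     # true slope is 1
x_obs <- rnorm(n, mean = x_true, sd = 1)       # observed with error of sd 1

slope_true <- unname(coef(lm(y ~ x_true))[2])
slope_obs  <- unname(coef(lm(y ~ x_obs))[2])
print(c(true_x = slope_true, noisy_x = slope_obs))
```

With equal variances for the predictor and its error, the classical attenuation factor var(x) / (var(x) + var(error)) is one half, so ignoring the error roughly halves the estimated slope in this setup.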
  23. Error on predictor: model

    Book excerpt: [...] for marriage rate [...] Here's the updated model, with the new bits in blue:

    D_obs,i ~ Normal(D_true,i, D_SE,i)   [distribution ...]
    D_true,i ~ Normal(µ_i, σ)            [distribution ...]
    µ_i = α + β_A A_i + β_M M_true,i
    M_obs,i ~ Normal(M_true,i, M_SE,i)   [distribution ...]
    M_true,i ~ Normal(0, 1)              [distribution ...]
    α ~ Normal(0, 0.2)
    β_A ~ Normal(0, 0.5)
    β_M ~ Normal(0, 0.5)
    σ ~ Exponential(1)

    The M_true parameters will hold the posterior distributions of the true [...] Fitting the model is much like before:
  24. Error on predictor: model, as above, with labels

    use estimates in regression • estimated marriage rate • standard error of marriage rate • likelihood for observed rate
  25. Error on predictor: model, as above, with labels

    prior rates • Not the best approach: M and A are associated! Will do better later on. • [Figure 15.1 and DAG with nodes A, D, D_obs, M, N, repeated from the book page]
  26. Figure 15.3

    [Figure: divorce rate (std) against marriage rate (std)] • filled circles: observed • open circles: estimated • lines connect points for same State
  27. Error on predictor • Both divorce rate and marriage rate

    shrink • Divorce shrinks more. Why? • Marriage rate not strongly associated with outcome => not much pooling through regression => not much shrinkage • [Figure: divorce rate (std) against marriage rate (std)]
  28. Measurement error • Common malady: “data” come from uncertain procedure,

    but uncertainty discarded at analysis • Examples: • Predicting with averages • Parentage analysis • Phylogenetics: distribution of trees • Archaeology/paleontology/forensics: identification, sexing, aging, dating • Propagate uncertainty
  29. Missing data • Missing values commonplace • Usual approach: complete-case

    analysis • drop all cases with any missing values • Discards a lot of information • Alternatives • replace missing with mean of column: NEVER DO THIS • Multiple imputation • Bayesian imputation • others
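One way to see why replacing missing values with the column mean is discouraged is to try it on simulated data. The sketch below assumes a single predictor with 40% of its values deleted completely at random; all numbers and variable names are made up for illustration.

```r
set.seed(29)                               # arbitrary seed
n <- 2000
x <- rnorm(n)
y <- rnorm(n, mean = 0.8 * x, sd = 0.6)

miss <- runif(n) < 0.4                     # delete 40% of x completely at random
x_imp <- x
x_imp[miss] <- mean(x[!miss])              # the "never do this" step

print(c(sd_full = sd(x), sd_imputed = sd(x_imp)))
print(c(cor_full = cor(x, y), cor_imputed = cor(x_imp, y)))
```

Every imputed case is treated as a perfectly known, perfectly average value, so the spread of the predictor and its association with the outcome both look smaller than they are, and the uncertainty about the missing values is discarded.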
  30. Why impute? • Missingness can be a confound

    [Figure labels: Body mass • Proportion brain neocortex • Milk energy (kcal)]

    Book excerpt: [...] and not conditioning on the right variables and imputing the missing values. All this will become clear if we draw some diagrams. Let's return to the primate milk example, from Chapter 5. We used data(milk) to illustrate masking, using both neocortex percent and body mass to predict milk energy. One aspect of those data are 12 missing values in the neocortex.perc column. We used a complete case analysis back then, which means we dropped those 12 cases from the analysis. That means we also dropped 12 perfectly good body mass and milk energy values. That left us with only 17 cases to work with. Was that a bad idea?

    To answer that question, we need to think more clearly about why those values are missing. The basic DAG from this example is:

    [DAG with nodes B, K, M, U]

    where M is body mass, B is neocortex percent, K is milk energy, and U is some unobserved variable that renders M and B positively correlated. We want to add missingness to this graph. What that means is realizing that we haven't observed B, neocortex percent. We've instead observed B_obs, a partially observed set of values generated by B and some process. Let's name the process that generates missing values R_B and now add it to our graph:
  31. Three Types of Missingness

    • MCAR: MISSING COMPLETELY AT RANDOM • MAR: MISSING AT RANDOM • MNAR: MISSING NOT AT RANDOM • Possibly most confusing statistical terms ever invented.

    [Three book-page excerpts, one per type, each with a DAG over nodes B, B_obs, K, M, R_B, U]

    Book excerpt (second type): [...] In this case, we can shut the backdoor by conditioning on M. We might have done this anyway, because we want the direct influence of B on K. This type of missingness is known by another unfortunately awkward name, missing at random (MAR). We don't need to discover the missingness process above. But there is something else we [...] it didn't produce bias. Now it will produce bias, because it remov[...] other variables and confounds our inference. But we can luckily [...] and possibly make valid inferences. With MCAR, the first type, [...] useful. With MAR, this second type, it is mandatory.
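The three mechanisms can be compared by simulating each one. The sketch below, in base R, generates B and M from a shared unobserved cause U, deletes B under each mechanism, and reports the mean of B among the cases that remain observed. The sample size, effect sizes, logistic missingness probabilities and all variable names are assumptions for illustration; milk energy K is left out to keep the example short.

```r
set.seed(31)                                  # arbitrary seed
n <- 20000
U <- rnorm(n)                                 # unobserved common cause
M <- rnorm(n, mean = U)                       # body mass
B <- rnorm(n, mean = U)                       # neocortex percent (complete, unobserved)

inv_logit <- function(z) 1 / (1 + exp(-z))
R_mcar <- rbinom(n, 1, 0.5)                   # MCAR: missingness ignores everything
R_mar  <- rbinom(n, 1, inv_logit(-2 * M))     # MAR: small M => more likely missing
R_mnar <- rbinom(n, 1, inv_logit(-2 * B))     # MNAR: small B => more likely missing

mean_obs <- function(R) mean(B[R == 0])       # mean of B among observed cases (R = 0)
res <- c(full = mean(B), MCAR = mean_obs(R_mcar),
         MAR = mean_obs(R_mar), MNAR = mean_obs(R_mnar))
print(round(res, 2))
```

Under MCAR the observed cases look like the full sample. Under MAR and MNAR they do not, but only in the MAR case is the variable driving the missingness (M) available to condition on.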
  32. Missingness mechanism • Observed B • True (unobserved) B

    [DAG with nodes B, B_obs, K, M, R_B, U]

    Book excerpt: [...] where M is body mass, B is neocortex percent, K is milk energy, and U is some unobserved variable that renders M and B positively correlated. We want to add missingness to this graph. What that means is realizing that we haven't observed B, neocortex percent. We've instead observed B_obs, a partially observed set of values generated by B and some process. Let's name the process that generates missing values R_B and now add it to our graph. The way to read this is to think of the observed-with-missingness B_obs as being a function of the complete-but-unobserved B and the missingness process R_B. We can treat R_B like another possible confound. Then we can use our friend the backdoor criterion to figure out when we need information about, or need to condition on, R_B. In the graph above, we want to infer the influence of B on K. To figure out when the estimate is confounded, we find all the paths from B_obs to K. If any of them are backdoors, we need to close those to get the causal [...]
  33. Are there any backdoors from B_obs to K?

    [Same DAG and book excerpt as the previous slide]
  34. Can condition on M for direct effect. Either way, R_B is ignorable.

    Book excerpt, continued: [...] influence of B. In the case above, we need to condition on M to close the indirect path, just as earlier in the book. But there is no path through R_B. So it doesn't confound inference.
  35. Missing Completely At Random • MCAR: K is unconditionally independent

    of R_B • Do not need to condition on anything for R_B not to be a confound • On right, no path through R_B, conditioning on B_obs • Do not NEED to impute • But imputation adds precision • [On right: DAG with nodes B, B_obs, K, M, R_B, U]
  36. Does MCAR ever happen in real data? Research assistant randomly deletes values?

    [DAG with nodes B, B_obs, K, M, R_B, U]
  37. Missing At Random • Missingness more likely for specific values of M. How can this happen?

    [DAG with nodes B, B_obs, K, M, R_B, U]

    Book excerpt: Another possibility is that some other variable influences the missingness process. Now M influences R_B, which means for example that species with smaller bodies are more or less likely to have missing values in B_obs. This could happen if researchers are less interested in small species and so do not often go through the trouble of making detailed brain measurements for them. What happens in this case? There is now a backdoor path from B_obs through R_B to K. So the missingness process can confound our inference, unless we can close the backdoor. In this case, we can shut the backdoor by conditioning on M. We might [...]
  38. Missing At Random • Backdoor path from B_obs to K?

    [Same DAG and book excerpt as the previous slide]
  39. Missing At Random • Backdoor path from B_obs to K? Can condition on M to close.

    [Same DAG and book excerpt as the previous slide]
  40. Missing (Simply) At Random • MAR: K is conditionally independent

    of R_B • Must condition on M for R_B not to be a confound • Still must impute to de-bias estimates • Why? If you delete cases of M/K where B is missing, missingness obscures causation. • [DAG with nodes B, B_obs, K, M, R_B, U, in which M influences R_B]
  41. Missing Not At Random

    Missingness more likely for specific values of B. How can this happen?

    The third type of missingness is a true terror. Suppose now that species with low values of B are the ones that tend to be missing. This could happen, for example, if species with lots of neocortex are studied exactly for that reason. There would be many precise measurements about such brains, but few precise measurements about brains with less neocortex. The DAG might look like this: [DAG with nodes B, B_obs, K, M, R_B, U]
    This is a real problem. Now there is a backdoor from B_obs through the mechanism R_B all the way to K. There is no way to close this backdoor, because we can't condition on the complete variable B. Conditioning on M doesn't help. If we can model the missingness mechanism R_B, there is still hope. But in general there are no guarantees here. This scenario is sometimes called missing not at random (MNAR).
  42. Missing Not At Random

    No way to shut the backdoor!
  43. Missing Not At Random

    Can also arise through unobserved variables (right).

    MNAR arises when there is no set of variables to condition on that closes the backdoors through R_B. Lots of different graphs can lead to that. [DAG with nodes B, B_obs, K, M, R_B, U1, U2]
    Now it isn't the B values themselves that produce missingness. Rather, an unobserved variable influences both B and missingness. It could be, for example, similarity to humans. Humans have an unreasonable amount of neocortex, and other primates closely related to us also tend to. If those primates are studied more intensely, B values will be missing more ...
  44. Missing Not At Random

    • MNAR: K is unconditionally dependent on R_B
    • If you can model R_B, might be okay
    • No guarantees
  45. MISSING COMPLETELY AT RANDOM, MISSING AT RANDOM, MISSING NOT AT RANDOM

    [Three DAGs, one per panel, each over the nodes H*, A, D, H]
    • Missing completely at random: DOG EATS ANY HOMEWORK
    • Missing at random: DOG EATS STUDENTS’ HOMEWORK
    • Missing not at random: DOG EATS BAD HOMEWORK
    H: Homework
    H*: Homework with missing values
    A: Attribute of student
    D: Dog (missingness mechanism)
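The three panels can be sketched as a small simulation. This is a minimal base R sketch, not code from the lecture: the effect sizes, the threshold rules for the dog, and the names `D_mcar`, `D_mar`, `D_mnar`, and `slope_observed` are assumptions chosen for illustration.

```r
# Illustrative simulation of the three dog-eats-homework mechanisms.
# All numbers and names below are assumptions, not taken from the slides.
set.seed(15)
N <- 10000
A <- rnorm(N)        # attribute of student
H <- A + rnorm(N)    # homework quality, influenced by A (true slope = 1)

# D = 1 means the dog ate the homework (the value is missing in H*)
D_mcar <- rbinom(N, 1, 0.5)     # dog eats any homework
D_mar  <- as.integer(A > 0)     # dog's behavior depends on the student's attribute
D_mnar <- as.integer(H < 0)     # dog eats bad homework

# Slope of H on A, estimated only from homework that survives
slope_observed <- function(D) {
  keep <- D == 0
  unname(coef(lm(H[keep] ~ A[keep]))[2])
}

slopes <- c(mcar = slope_observed(D_mcar),
            mar  = slope_observed(D_mar),
            mnar = slope_observed(D_mnar))

# Mean of surviving homework under each mechanism (true mean is about 0)
means <- c(mcar = mean(H[D_mcar == 0]),
           mar  = mean(H[D_mar == 0]),
           mnar = mean(H[D_mnar == 0]))

print(round(slopes, 2))
print(round(means, 2))
```

In this sketch the surviving homework has a shifted mean under MAR, but the relationship conditional on A is preserved. Under MNAR, conditioning on A does not rescue the slope, which is the sense in which the backdoor cannot be shut.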
  46. Milk imputation

    • data(milk)
    • 12 missing values for neocortex
    • Suppose values are Missing At Random (MAR)
    • Distribution of observed values provides information
    • Can use to impute missing values
    • Same procedure for MCAR

        kcal.per.g   mass neocortex.perc
     1        0.49   1.95          55.16
     2        0.51   2.09             NA
     3        0.46   2.51             NA
     4        0.48   1.62             NA
     5        0.60   2.19             NA
     6        0.47   5.25          64.54
     7        0.56   5.37          64.54
     8        0.89   2.51          67.64
     9        0.91   0.71             NA
    10        0.92   0.68          68.85
    11        0.80   0.12          58.85
    12        0.46   0.47          61.69
    13        0.71   0.32          60.32
    14        0.71   0.60             NA
    15        0.73   3.47             NA
    16        0.68   1.55          69.97
    17        0.72   7.08             NA
    18        0.97   3.24          70.41
    19        0.79   7.94             NA
    20        0.84  12.30          73.40
    21        0.48   7.59             NA
    22        0.62   5.37          67.53
    23        0.51  10.72             NA
    24        0.54  35.48          71.26
    25        0.49  79.43          72.60
    26        0.53  97.72             NA
    27        0.48  40.74          70.24
    28        0.55  33.11          76.30
    29        0.71  54.95          75.49
  47. Milk energy MAR

    • Consider just neocortex variable:
    • Q: What is your best guess of each missing value?
    • A: Posterior distribution derived from remaining data

    neocortex.perc (rows 1 to 29): 55.16, NA, NA, NA, NA, 64.54, 64.54, 67.64, NA, 68.85, 58.85, 61.69, 60.32, NA, NA, 69.97, NA, 70.41, NA, 73.40, NA, 67.53, NA, 71.26, 72.60, NA, 70.24, 76.30, 75.49
  48. Milk energy MCAR

    • Place a unique parameter for each missing value
    • NC1 ... NC12
    • These are values to be imputed

    neocortex.perc (rows 1 to 29): 55.16, NC1, NC2, NC3, NC4, 64.54, 64.54, 67.64, NC5, 68.85, 58.85, 61.69, 60.32, NC6, NC7, 69.97, NC8, 70.41, NC9, 73.40, NC10, 67.53, NC11, 71.26, 72.60, NC12, 70.24, 76.30, 75.49
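The bookkeeping behind "a unique parameter for each missing value" can be sketched in base R. This is only an illustration: the helper `merge_values` and the placeholder `candidates` are hypothetical names, and in a real model the filled-in values would be parameters sampled alongside everything else rather than a fixed number.

```r
# Neocortex percent column as listed on the slide (29 species, 12 NA)
neocortex <- c(55.16, NA, NA, NA, NA, 64.54, 64.54, 67.64, NA, 68.85,
               58.85, 61.69, 60.32, NA, NA, 69.97, NA, 70.41, NA, 73.40,
               NA, 67.53, NA, 71.26, 72.60, NA, 70.24, 76.30, 75.49)

miss_idx <- which(is.na(neocortex))   # positions that each get a parameter (NC1 ... NC12)
n_miss <- length(miss_idx)

# Hypothetical helper: build the mixed vector of observed data and candidate values.
merge_values <- function(observed, imputed) {
  out <- observed
  out[is.na(observed)] <- imputed
  out
}

# Placeholder candidates: the observed mean, repeated once per missing value.
candidates <- rep(mean(neocortex, na.rm = TRUE), n_miss)
merged <- merge_values(neocortex, candidates)

print(miss_idx)
print(n_miss)
```

The index vector is what ties parameter k to row `miss_idx[k]` of the data, so observed rows are never overwritten.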
  49. Milk energy MAR: model

    The obstacle in practice is that we have to conceive of the predictor now as a mixed vector of data and parameters. In our case, the variable with missing values is neocortex percent. Again we'll call it B, for "brain":

    $$B = [0.55, B_2, B_3, B_4, 0.65, 0.65, \dots, 0.76, 0.75]$$

    For every index i at which there is a missing value, there is also a parameter B_i that will form a posterior distribution for it. This is the model we need, with the neocortex pieces in blue:

    $$\begin{aligned}
    K_i &\sim \mathrm{Normal}(\mu_i, \sigma) \\
    \mu_i &= \alpha + \beta_B B_i + \beta_M \log M_i \\
    B_i &\sim \mathrm{Normal}(\nu, \sigma_B) \\
    \alpha &\sim \mathrm{Normal}(0, 0.5) \\
    \beta_B &\sim \mathrm{Normal}(0, 0.5) \\
    \beta_M &\sim \mathrm{Normal}(0, 0.5) \\
    \sigma &\sim \mathrm{Exponential}(1) \\
    \nu &\sim \mathrm{Normal}(0.5, 1) \\
    \sigma_B &\sim \mathrm{Exponential}(1)
    \end{aligned}$$

    Note that when B_i is observed, then the third line above is a likelihood, just like a regression. The model learns the distributions of ν and σ_B that are consistent with the data.
  50. Milk energy MAR: model

    $$\mu_i = \alpha + \beta_B B_i + \beta_M \log M_i$$

    linear model using mix of observed and imputed values
  51. Milk energy MAR: model

    $$B_i \sim \mathrm{Normal}(\nu, \sigma_B)$$

    • When observed, likelihood; when imputed, prior
    • ν: mean neocortex (to be estimated)
    • σ_B: std dev of neocortex (to be estimated)
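To see why an imputed value is informed by both its prior and the outcome, it can help to write out the conditional distribution of one missing B_i with all other parameters held fixed. The following is a standard normal-normal calculation under the model on these slides, offered as a sketch rather than something shown in the lecture; the symbols r_i, τ_i, and m_i are introduced here for illustration only.

```latex
% Residual of the outcome after removing the intercept and body-mass term
r_i = K_i - \alpha - \beta_M \log M_i

% Multiply the likelihood of K_i by the "prior" B_i ~ Normal(nu, sigma_B)
p(B_i \mid K_i, M_i, \alpha, \beta_B, \beta_M, \sigma, \nu, \sigma_B)
  \propto \exp\!\left( -\frac{(r_i - \beta_B B_i)^2}{2\sigma^2}
                       -\frac{(B_i - \nu)^2}{2\sigma_B^2} \right)

% Completing the square gives a normal with precision tau_i and mean m_i
\tau_i = \frac{\beta_B^2}{\sigma^2} + \frac{1}{\sigma_B^2}, \qquad
m_i = \frac{1}{\tau_i}\left( \frac{\beta_B\, r_i}{\sigma^2} + \frac{\nu}{\sigma_B^2} \right), \qquad
B_i \mid \cdots \sim \mathrm{Normal}\!\left( m_i, \tau_i^{-1/2} \right)
```

When β_B is near zero the conditional mean collapses to ν, so the imputed value is just the prior mean. Body mass enters only through the residual r_i, not through any direct relationship between M and B.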
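  52. Fitting

    library(rethinking)

    m15.3 <- ulam(
        alist(
            # outcome: milk energy as a function of neocortex and body mass
            K ~ dnorm( mu , sigma ),
            mu <- a + bB*B + bM*M,
            # B contains NA values: likelihood when observed, prior when missing
            B ~ dnorm( nu , sigma_B ),
            # priors
            c(a,nu) ~ dnorm( 0 , 0.5 ),
            c(bB,bM) ~ dnorm( 0, 0.5 ),
            sigma_B ~ dexp( 1 ),
            sigma ~ dexp( 1 )
        ) , data=dat_list , chains=4 , cores=4 )

    ulam detects NA values and tries to cope. More explicit example in text.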
  53. When you start the model, it will notify you that

    it found 12 NA values and is trying to impute them. Once it finishes, take a look at the posterior summary:

    precis( m15.3 , depth=2 )

    [Output table with columns mean, sd, 5.5%, 94.5%, n_eff, Rhat: the regression parameters (including nu, sigma_B, and sigma) followed by B_impute[1] through B_impute[12]. The numeric values are not recoverable from the transcript.]

    Each of the 12 imputed distributions for missing values is shown here, along with the ordinary regression parameters above them.
  54. Compared to complete-cases

    Comparing this posterior to the previous will be easier with a plot:

    plot( coeftab(m15.3,m15.4) , pars=c("bB","bM") )

    [Coefficient plot: bB and bM for m15.3 and m15.4; horizontal axis "Value", from -1.0 to 1.0]

    The model that imputes the missing values, m15.3, has narrower marginal distributions for both effects. How could this happen? We used more information: the values of body mass that were not missing but are discarded by m15.4. These values suggest a slightly smaller influence of body mass, and this also cascades into bB. Let's do some plotting to visualize what's happened here.

    m15.3: full sample (with imputation)
    m15.4: complete-cases only
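What "complete-cases only" discards can be sketched with the 29 rows listed on the Milk imputation slide. This is a base R illustration under assumptions: the helper `std` stands in for a standardize step, `dat_full` and `dat_cc` are hypothetical names, and the least-squares fit is only a rough stand-in for the Bayesian model m15.4, not a reproduction of it.

```r
# Data as listed on the "Milk imputation" slide
d <- data.frame(
  kcal.per.g = c(0.49, 0.51, 0.46, 0.48, 0.60, 0.47, 0.56, 0.89, 0.91, 0.92,
                 0.80, 0.46, 0.71, 0.71, 0.73, 0.68, 0.72, 0.97, 0.79, 0.84,
                 0.48, 0.62, 0.51, 0.54, 0.49, 0.53, 0.48, 0.55, 0.71),
  mass = c(1.95, 2.09, 2.51, 1.62, 2.19, 5.25, 5.37, 2.51, 0.71, 0.68,
           0.12, 0.47, 0.32, 0.60, 3.47, 1.55, 7.08, 3.24, 7.94, 12.30,
           7.59, 5.37, 10.72, 35.48, 79.43, 97.72, 40.74, 33.11, 54.95),
  neocortex.perc = c(55.16, NA, NA, NA, NA, 64.54, 64.54, 67.64, NA, 68.85,
                     58.85, 61.69, 60.32, NA, NA, 69.97, NA, 70.41, NA, 73.40,
                     NA, 67.53, NA, 71.26, 72.60, NA, 70.24, 76.30, 75.49)
)

# Assumed standardization: z-scores that ignore NA when computing mean and sd
std <- function(x) as.numeric(scale(x))

dat_full <- data.frame(K = std(d$kcal.per.g),
                       B = std(d$neocortex.perc),
                       M = std(log(d$mass)))

# Complete-case analysis keeps only rows with no missing values
cc <- complete.cases(dat_full)
dat_cc <- dat_full[cc, ]

# Rough least-squares analogue of the complete-case regression
fit_cc <- lm(K ~ B + M, data = dat_cc)

print(c(full = nrow(dat_full), complete = nrow(dat_cc)))
print(coef(fit_cc))
```

The 12 dropped rows still carry observed K and M values, which is the extra information the imputation model gets to use.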
  55. Imputed values track regression trend

    [Figure 15.4, two panels. Left: kcal milk (std) against neocortex percent (std). Right: neocortex percent (std) against log body mass (std).]
    Figure 15.4. Left: Inferred distribution of milk energy (vertical) and neocortex proportion (horizontal), with imputed values shown by open points. The line segments are posterior compatibility intervals. Right: Inferred distribution between the two predictors, neocortex proportion and log mass. Imputed values again shown by open points.
  56. Imputed values do not track other predictor!

    [Figure 15.4 repeated. Left: kcal milk (std) against neocortex percent (std). Right: neocortex percent (std) against log body mass (std), with imputed values shown by open points.]
  57. Results

    • Observed neocortex positively associated with observed body mass
    • Imputed neocortex NOT associated with observed body mass
    • Can do better
    • Imputation model can use full causal model

    [Figure 15.4 repeated, with the plotting code for the right panel and the DAG with nodes B, B_obs, K, M, R_B, U]
    plot( dat_list$M , dat_list$B , pch=16 , col=rangi2 , ylab="neocortex percent (std)" , xlab="log body mass (std)" )
    Mi <- dat_list$M[miss_idx]
  58. Full Flavor Imputation

    [DAG with nodes B, B_obs, K, M, R_B, U]
    Both body mass M and neocortex B influence milk energy K. And they are associated with one another, through some unknown mechanism U. This means that when we impute missing values for B, we might do a better job if we use knowledge of U. So let's build a model now that better matches the DAG above.

    The notion is to change the imputation line of the model from

    $$B_i \sim \mathrm{Normal}(\nu, \sigma_B)$$

    to a bivariate normal that includes both M and B:

    $$(M_i, B_i) \sim \mathrm{MVNormal}((\mu_M, \mu_B), S)$$

    The S matrix is another covariance matrix, and it will measure the correlation between M and B using the observed cases and then use that correlation to infer the missing values.

    Here's the ulam implementation. This is complex code, because we need a variable that includes both the observed M values and the merged observed and imputed B values. I'll also do the merging more explicitly. At the end, I'll walk through how the Stan code works.
  59. Full Flavor Imputation

    m15.5 <- ulam(
        alist(
            # K as function of B and M
            K ~ dnorm( mu , sigma ),
            mu <- a + bB*B_merge + bM*M,
            # M and B correlation
            MB ~ multi_normal( c(muM,muB) , Rho_BM , Sigma_BM ),
            matrix[29,2]:MB <<- append_col( M , B_merge ),
            # define B_merge as mix of observed and imputed values
            vector[29]:B_merge <- merge_missing( B , B_impute ),
            # priors
            c(a,muB,muM) ~ dnorm( 0 , 0.5 ),
            c(bB,bM) ~ dnorm( 0, 0.5 ),
            sigma ~ dexp( 1 ),
            Rho_BM ~ lkj_corr(2),
            Sigma_BM ~ exponential(1)
        ) , data=dat_list , chains=4 , cores=4 )

    [DAG with nodes B, B_obs, K, M, R_B, U]
    We don't need to discover the missingness process above. We need to impute missing values for B_obs. Why? If we drop species with any missing values, then we are polluting the other variables with the missingness process. This didn't happen in the previous MCAR example, because R_B didn't have any association with the other variables.
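The gain from modeling M and B jointly can be sketched with point estimates in base R. This is not the Bayesian model above: it simulates data with an assumed correlation `rho`, deletes values completely at random, and compares two simple fills (the marginal mean of B versus the bivariate-normal conditional mean of B given M). All names and numbers here are hypothetical.

```r
# Sketch: conditional-mean imputation under a bivariate normal (assumed setup)
set.seed(7)
N <- 2000
rho <- 0.6                                   # assumed correlation between M and B
M <- rnorm(N)
B <- rho * M + sqrt(1 - rho^2) * rnorm(N)    # B has unit variance, cor(M, B) = rho

miss <- rbinom(N, 1, 0.4) == 1               # delete 40% of B completely at random
B_obs <- ifelse(miss, NA, B)

# Estimate the bivariate normal pieces from the complete cases
mu_M <- mean(M[!miss])
mu_B <- mean(B_obs, na.rm = TRUE)
s_M  <- sd(M[!miss])
s_B  <- sd(B_obs, na.rm = TRUE)
r    <- cor(M[!miss], B_obs[!miss])

# E[B | M] for a bivariate normal: mu_B + r * (s_B / s_M) * (M - mu_M)
B_hat_mv   <- mu_B + r * (s_B / s_M) * (M[miss] - mu_M)
B_hat_marg <- rep(mu_B, sum(miss))           # ignores M entirely

rmse <- function(est) sqrt(mean((est - B[miss])^2))
print(c(marginal = rmse(B_hat_marg), conditional = rmse(B_hat_mv)))
```

The conditional fill follows M by construction, which mirrors the point of the slide: once the imputation model knows about the M and B association, imputed values track the other predictor. A full Bayesian treatment would propagate uncertainty in the fills instead of using point estimates.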
  60. [Figure 15.5 and posterior summary for m15.5]

    [Figure 15.5, right panel: neocortex percent (std) against log body mass (std)]
    Figure 15.5. Same relationships as shown in Figure 15.4, but now for the imputation model that estimates the association between the predictors. The information in the association between predictors has been used to infer a stronger relationship between milk energy and the imputed values.

    [Output table with columns mean, sd, 5.5%, 94.5%: the two slopes followed by Rho_BM[1,1], Rho_BM[1,2], Rho_BM[2,1], Rho_BM[2,2]. The numeric values are not reliably recoverable from the transcript.]

    The slopes bM and bB haven't changed much. The posterior correlation between M and B is quite strong. Figure 15.5 displays the same plots for this imputation model. On the right, the imputed values now preserve the positive association between neocortex and body mass.
  61. MVNormal, Normal

    [Two panels side by side, each showing neocortex percent (std) against log body mass (std): the MVNormal imputation model (Figure 15.5) and the Normal imputation model (Figure 15.4)]
  62. Missing data

    • Can also impute discrete values, but need another technique (see text)
    • Extends to many model types:
      • Mark-recapture, occupancy (presence/absence)
      • Latent-state models (hidden Markov models)
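The slide defers the discrete case to the text. One standard approach for a discrete unknown, assumed here purely for illustration, is to average the likelihood over the possible values of the missing variable instead of sampling it. The sketch below does this for a single binary predictor; the model, the function names `marginal_loglik` and `post_prob_x1`, and all numbers are hypothetical.

```r
# Numerically stable log(sum(exp(x)))
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Hypothetical model: y ~ Normal(a + b*x, sigma), with x in {0, 1} and Pr(x = 1) = p.
# When x is missing, the likelihood of y is a mixture over both possible values.
marginal_loglik <- function(y, a, b, sigma, p) {
  log_sum_exp(c(log(p)    + dnorm(y, a + b, sigma, log = TRUE),   # case x = 1
                log1p(-p) + dnorm(y, a,     sigma, log = TRUE)))  # case x = 0
}

# Posterior probability that the missing x was 1, given y and the parameters
post_prob_x1 <- function(y, a, b, sigma, p) {
  exp(log(p) + dnorm(y, a + b, sigma, log = TRUE) -
        marginal_loglik(y, a, b, sigma, p))
}

ll <- marginal_loglik(y = 1.2, a = 0, b = 1, sigma = 1, p = 0.3)
pp <- post_prob_x1(y = 1.2, a = 0, b = 1, sigma = 1, p = 0.3)
print(ll)
print(pp)
```

Working on the log scale avoids underflow when many such terms are summed across observations.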
  63. Final Homework

    • A little imputation practice
    • Finish for complete sense of accomplishment

    [Homework excerpt, truncated: "... differences. Make whatever additional calculations ... pursuit of an answer."]
    [Image: Bos primigenius]
  64. The Golem of Prague

    “Even the most perfect of Golem, risen to life to protect us, can easily change into a destructive force. Therefore let us treat carefully that which is strong, just as we bow kindly and patiently to that which is weak.”
    Rabbi Judah Loew ben Bezalel (1512–1609)
    From Breath of Bones: A Tale of the Golem