Statistical Rethinking - Lecture 20

Lecture 20 - Measurement error, missing data imputation - Statistical Rethinking: A Bayesian Course with R Examples

Richard McElreath

March 12, 2015

Transcript

  1. Error on predictor

    • What about error on predictor?
    • Many procedures invented
      • errors-in-variables
      • reduced major axis
      • total least squares
    • Our approach will be logical
      • State information
      • Deduce implications
    [Figure: marriage rate against log population]
  2. Error on predictor: model

    Error on both outcome and predictor. What happens when there is error on predictor variables as well? The approach is the same: we define parameters, one for each observed value, and then make those parameters the means of Gaussian distributions with known standard deviations. In the divorce data, the marriage rate predictor values also come with standard errors. So let's incorporate that information as well. Here's the new model:

      D_{est,i} ∼ Normal(µ_i, σ)                    [likelihood]
      µ_i = α + β_A A_i + β_R R_{est,i}             [linear model]
      D_{obs,i} ∼ Normal(D_{est,i}, D_{SE,i})       [prior]
      R_{obs,i} ∼ Normal(R_{est,i}, R_{SE,i})       [prior]
      α ∼ Normal(…, …)
      β_A ∼ Normal(…, …)
      β_R ∼ Normal(…, …)
      σ ∼ Cauchy(…, …)

    The R_est parameters will hold the posterior distributions of the true marriage rates. Fitting the model is much like before:

      dlist <- list( div_obs=d$Divorce , …
  3. Error on predictor: model

    Annotations on the model lines:

      R_{obs,i} ∼ Normal(R_{est,i}, R_{SE,i})
        • likelihood for each observed rate
        • R_{est,i}: estimated marriage rate
        • R_{SE,i}: standard error of marriage rate
      µ_i = α + β_A A_i + β_R R_{est,i}
        • use estimates in regression
  4. Filled circles: observed. Open circles: estimated. Lines connect points for the same State.

    [Figure: divorce rate (posterior) against marriage rate (posterior)]
  5. Error on predictor

    • Both divorce rate and marriage rate shrink
    • Divorce shrinks much more. Why?
    • Marriage rate not strongly associated with outcome => not much pooling through regression => not much shrinkage

    [Figure 14.3. Left: Shrinkage for the predictor variable, marriage rate (estimated minus observed marriage rate against marriage rate standard error). Notice that shrinkage is not balanced, but rather that the model believes the observed values tended to be overestimates. Right: Shrinkage of both the outcome, divorce rate, and marriage rate. Solid points are the observed values. Open points are posterior means. Lines connect pairs of points for the same State.]

    Also note that since there isn't much association between divorce and marriage rate, there is less movement of the marriage rate estimates. That is to say that there isn't much information in divorce rate to help us improve estimates of marriage rate. In contrast, since the relationship between divorce and median age at marriage is strong, there's a lot of information in age at marriage to help us improve estimates of divorce rate. That's why divorce …
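Why a strong regression relationship produces more shrinkage can be sketched with the standard Gaussian conjugate update. This is only an illustration in Python (not the deck's R), under a simplifying assumption that is not in the slides: the regression prediction `pred_mean` and residual scale `sigma` are held fixed, whereas the full model estimates them jointly with the true values. The function name `shrink` and all numbers are hypothetical.

```python
def shrink(observed, se, pred_mean, sigma):
    """Posterior mean and sd of one true value, holding pred_mean and sigma fixed.

    prior (from the regression): true ~ Normal(pred_mean, sigma)
    measurement model:           observed ~ Normal(true, se)
    """
    w_obs = 1.0 / se**2      # precision of the measurement
    w_reg = 1.0 / sigma**2   # precision of the regression prediction
    post_var = 1.0 / (w_obs + w_reg)
    post_mean = post_var * (w_obs * observed + w_reg * pred_mean)
    return post_mean, post_var**0.5


# Larger standard error: the estimate moves further from the observed value.
for se in (0.5, 1.0, 2.0):
    mean, sd = shrink(observed=12.0, se=se, pred_mean=10.0, sigma=1.0)
    print(f"se={se}: posterior mean {mean:.2f}, sd {sd:.2f}")

# Weak association (large residual sigma): little pooling, little shrinkage.
print(shrink(observed=12.0, se=1.0, pred_mean=10.0, sigma=5.0))
```

In this simplified picture, a predictor that is only weakly related to the other variables corresponds to a large `sigma`, so its estimates barely move, which matches the small shrinkage of marriage rate described above.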
  6. Measurement error

    • Common malady: “data” come from uncertain procedure, but uncertainty discarded at analysis
    • Examples:
      • Predicting with averages; use posterior of average
      • DNA sequence data: respect error rate
      • Parentage analysis: probability distribution over possible parents
      • Phylogenetics: distribution of trees
      • Archaeology/paleontology/forensics: identification, sexing, aging, dating
    • Propagate uncertainty
  7. Missing data

    • Missing values commonplace
    • Usual approach: complete-case analysis
      • drop all cases with any missing values
      • Discards a lot of information
    • Alternatives
      • replace missing with mean of column: NEVER DO THIS
      • Multiple imputation
      • Bayesian imputation
      • others
  8. Milk energy again

    • data(milk)
    • 12 missing values for neocortex
    • Suppose values are Missing Completely At Random (MCAR)
    • MCAR: NAs sprinkled randomly
    • Distribution of observed values provides information
    • Can use to impute missing values
    • Must model the predictor

        kcal.per.g   mass  neocortex.perc
     1        0.49   1.95           55.16
     2        0.51   2.09              NA
     3        0.46   2.51              NA
     4        0.48   1.62              NA
     5        0.60   2.19              NA
     6        0.47   5.25           64.54
     7        0.56   5.37           64.54
     8        0.89   2.51           67.64
     9        0.91   0.71              NA
    10        0.92   0.68           68.85
    11        0.80   0.12           58.85
    12        0.46   0.47           61.69
    13        0.71   0.32           60.32
    14        0.71   0.60              NA
    15        0.73   3.47              NA
    16        0.68   1.55           69.97
    17        0.72   7.08              NA
    18        0.97   3.24           70.41
    19        0.79   7.94              NA
    20        0.84  12.30           73.40
    21        0.48   7.59              NA
    22        0.62   5.37           67.53
    23        0.51  10.72              NA
    24        0.54  35.48           71.26
    25        0.49  79.43           72.60
    26        0.53  97.72              NA
    27        0.48  40.74           70.24
    28        0.55  33.11           76.30
    29        0.71  54.95           75.49
  9. Milk energy MCAR

    • Suppose your undergrad assistant lost those neocortex values
    • Consider just neocortex variable:
      • Q: What is your best guess of each missing value?
      • A: Posterior distribution derived from remaining data

    neocortex.perc (rows 1 to 29): 55.16, NA, NA, NA, NA, 64.54, 64.54, 67.64, NA, 68.85, 58.85, 61.69, 60.32, NA, NA, 69.97, NA, 70.41, NA, 73.40, NA, 67.53, NA, 71.26, 72.60, NA, 70.24, 76.30, 75.49
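A rough way to see what "a distribution derived from the remaining data" looks like, and why filling in the column mean is a bad idea, is sketched below in Python (not the deck's R). The neocortex values are the ones shown on the slide. The plug-in predictive spread `pred_sd` is an approximation chosen for illustration; it is an assumption of this sketch and not the posterior the lecture's model produces.

```python
import math
import statistics

# neocortex.perc as listed on the slide; None marks a missing value (NA)
neocortex = [55.16, None, None, None, None, 64.54, 64.54, 67.64, None, 68.85,
             58.85, 61.69, 60.32, None, None, 69.97, None, 70.41, None, 73.40,
             None, 67.53, None, 71.26, 72.60, None, 70.24, 76.30, 75.49]

observed = [x for x in neocortex if x is not None]
n_obs = len(observed)
n_missing = len(neocortex) - n_obs

obs_mean = statistics.mean(observed)
obs_sd = statistics.stdev(observed)

# Approximate spread of a guess for one missing value: the observed spread
# plus uncertainty about the mean itself (a plug-in approximation).
pred_sd = obs_sd * math.sqrt(1 + 1 / n_obs)

# Mean imputation: every NA replaced by the same number, with no uncertainty.
mean_filled = [obs_mean if x is None else x for x in neocortex]
filled_sd = statistics.stdev(mean_filled)

print(f"{n_obs} observed, {n_missing} missing")
print(f"guess for a missing value: about {obs_mean:.2f} with sd {pred_sd:.2f}")
print(f"sd of observed values: {obs_sd:.2f}; after mean imputation: {filled_sd:.2f}")
```

Mean imputation makes the variable look less variable than it is, because the filled-in cases all sit exactly at the center; treating each missing value as an uncertain quantity avoids that.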
  10. Milk energy MCAR

    • Place a unique parameter for each missing value
      • NC1 ... NC12
    • These are values to be imputed

    neocortex.perc (rows 1 to 29): 55.16, NC1, NC2, NC3, NC4, 64.54, 64.54, 67.64, NC5, 68.85, 58.85, 61.69, 60.32, NC6, NC7, 69.97, NC8, 70.41, NC9, 73.40, NC10, 67.53, NC11, 71.26, 72.60, NC12, 70.24, 76.30, 75.49
  11. Milk energy MCAR: model

    The obstacle in practice is that we have to conceive of the predictor now as a mixture of data and parameters. In our case, the variable with missing values is neocortex percent. Call it N: a vector whose entries are the observed values where they exist and parameters where they do not. For every index i at which there is a missing value, there is also a parameter N_i that will form a posterior distribution for it. This is the model we need:

      k_i ∼ Normal(µ_i, σ)                  [likelihood for outcome k]
      µ_i = α + β_N N_i + β_M log M_i       [linear model]
      N_i ∼ Normal(ν, σ_N)                  [likelihood/prior for obs/missing N]
      α ∼ Normal(0, 100)
      β_N ∼ Normal(0, 10)
      β_M ∼ Normal(0, 10)
      σ ∼ Cauchy(0, 1)
      ν ∼ Normal(0.5, 1)
      σ_N ∼ Cauchy(0, 1)

    Note that when N_i is observed, the third line above is a likelihood, just like any old linear regression. But when N_i is missing, and therefore a parameter, that same line is interpreted as a prior. The model learns the distributions of ν and σ_N that are consistent …
  12. Milk energy MCAR: model

    Annotation on the model:

      µ_i = α + β_N N_i + β_M log M_i
        • linear model using mix of observed and imputed values
  13. Milk energy MCAR: model

    Annotations on the model:

      N_i ∼ Normal(ν, σ_N)
        • when obs, likelihood; when imputed, prior
        • ν: mean neocortex (to be estimated)
        • σ_N: std dev of neocortex (to be estimated)
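The idea that one line of the model acts as a likelihood for observed values and as a prior for missing ones can be made concrete by writing out a log posterior by hand. The following is a minimal Python sketch (not the deck's R, and not how map2stan or Stan is implemented). The function names, the dictionary layout, and the three-row toy data are hypothetical; the prior scales follow the fitting code in this lecture as best they can be read and should be treated as assumptions.

```python
import math


def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def half_cauchy_logpdf(x, scale):
    """Log density of Cauchy(0, scale) restricted to x > 0."""
    if x <= 0:
        return -math.inf
    return math.log(2.0 / (math.pi * scale)) - math.log(1.0 + (x / scale) ** 2)


def log_posterior(p, kcal, neocortex, logmass):
    """Unnormalized log posterior; missing neocortex entries are None."""
    if p["sigma"] <= 0 or p["sigma_N"] <= 0:
        return -math.inf
    lp = (normal_logpdf(p["a"], 0, 100)
          + normal_logpdf(p["bN"], 0, 10)
          + normal_logpdf(p["bM"], 0, 10)
          + normal_logpdf(p["nu"], 0.5, 1)
          + half_cauchy_logpdf(p["sigma"], 1)
          + half_cauchy_logpdf(p["sigma_N"], 1))
    imputed = iter(p["N_impute"])  # one parameter per missing value, in order
    for k, n, lm in zip(kcal, neocortex, logmass):
        n_i = next(imputed) if n is None else n
        # Same term either way: a likelihood when n was observed,
        # a prior when n_i is an imputed parameter.
        lp += normal_logpdf(n_i, p["nu"], p["sigma_N"])
        mu = p["a"] + p["bN"] * n_i + p["bM"] * lm
        lp += normal_logpdf(k, mu, p["sigma"])
    return lp


# Toy data: three rows of the milk table, the middle one missing neocortex.
kcal = [0.49, 0.51, 0.47]
neocortex = [0.5516, None, 0.6454]
logmass = [math.log(1.95), math.log(2.09), math.log(5.25)]

params = {"a": 0.0, "bN": 1.0, "bM": 0.0, "sigma": 0.2,
          "nu": 0.65, "sigma_N": 0.05, "N_impute": [0.65]}
print(log_posterior(params, kcal, neocortex, logmass))
```

Because the imputed value enters both the `nu`/`sigma_N` term and the regression term, its posterior is informed by the other neocortex values and by its own case's outcome; a sampler would explore `N_impute` alongside the other parameters.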
  14. Fitting

    library(rethinking)
    data(milk)
    d <- milk
    d$neocortex.prop <- d$neocortex.perc / 100
    d$logmass <- log(d$mass)

    The formula list looks much as you'd expect:

    # prep data
    data_list <- list(
        kcal = d$kcal.per.g,
        neocortex = d$neocortex.prop,
        logmass = d$logmass )

    # fit model
    m14.3 <- map2stan(
        alist(
            kcal ~ dnorm(mu,sigma),
            mu <- a + bN*neocortex + bM*logmass,
            neocortex ~ dnorm(nu,sigma_N),
            a ~ dnorm(0,100),
            c(bN,bM) ~ dnorm(0,10),
            nu ~ dnorm(0.5,1),
            sigma_N ~ dcauchy(0,1),
            sigma ~ dcauchy(0,1)
        ) ,
        data=data_list , iter=1e4 , chains=2 )

    Take a look at the estimates:

    precis(m14.3, depth=2)

    • Distribution on predictor signals map2stan to look for NAs. If it finds any, it replaces them with parameters.
  15. Results

    • Reduced slopes compared to complete case analysis
      • bN: 2.8 => 1.2
      • bM: –0.10 => –0.05
    • 12 imputed variables
      • wide confidence intervals
      • NOT same as prior
      • Why differ?
    [Figure: posterior estimates (Value) for nu and neocortex_impute[1] through neocortex_impute[12]]
  16. Results

    • Imputed values weakly track regression
      • observed neocortex associated with milk energy
      • imputed values weakly associated with paired milk energy
      • this is logical, a consequence of the model definition
    [Figure 14.4. Left: Inferred relationship between milk energy, kcal per gram (vertical), and neocortex proportion (horizontal), with imputed values shown by open points. The line segments are posterior intervals.]
  17. Results

    • Observed neocortex positively associated with observed body mass
    • Imputed neocortex NOT associated with observed body mass
    • Can do better
    • Imputation model should use body mass (at least)
    [Figure 14.4. Left: Inferred relationship between milk energy, kcal per gram (vertical), and neocortex proportion (horizontal), with imputed values shown by open points. The line segments are posterior intervals. Right: neocortex proportion against log(mass); the rest of the caption is cut off.]
  18. Milk energy MCAR: Model 2

    • Naive imputation model:
        N_i ∼ Normal(ν, σ_N)
    • Slightly less naive imputation model:
        N_i ∼ Normal(ν_i, σ_N)
        ν_i = α_N + γ_M log M_i        (γ_M: ordinary slope; M_i: body mass)

    But notice here that the imputed values do not show an upward slope, because the imputation model (the first regression, with observed neocortex as the outcome) assumed no relationship. So odds are we can improve the model by changing the imputation model to estimate the relationship between the two predictors. Let's do that now. The notion is to change the imputation line of the model from the naive form to the one with its own linear model.
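To see what the added line ν_i = α_N + γ_M log M_i buys, one can fit that relationship by ordinary least squares to the complete cases alone. This Python sketch (not the deck's R) is a stand-in for illustration and is not the joint Bayesian fit of Model 2: it ignores the outcome and all uncertainty. The data are the 17 complete rows of the milk table shown earlier; `impute_mean` is a hypothetical helper.

```python
import math

# Complete cases from the milk table: body mass and neocortex percent
mass = [1.95, 5.25, 5.37, 2.51, 0.68, 0.12, 0.47, 0.32, 1.55,
        3.24, 12.30, 5.37, 35.48, 79.43, 40.74, 33.11, 54.95]
neocortex_perc = [55.16, 64.54, 64.54, 67.64, 68.85, 58.85, 61.69, 60.32, 69.97,
                  70.41, 73.40, 67.53, 71.26, 72.60, 70.24, 76.30, 75.49]

log_mass = [math.log(m) for m in mass]
prop = [p / 100 for p in neocortex_perc]  # proportion scale, as in the fitting code

n = len(prop)
mean_x = sum(log_mass) / n
mean_y = sum(prop) / n
sxx = sum((x - mean_x) ** 2 for x in log_mass)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_mass, prop))

slope = sxy / sxx                      # plays the role of gamma_M
intercept = mean_y - slope * mean_x    # plays the role of alpha_N


def impute_mean(m):
    """Center of the imputation distribution for a species of body mass m."""
    return intercept + slope * math.log(m)


print(f"slope on log mass: {slope:.3f}")
print(f"mass 0.71 -> {impute_mean(0.71):.2f}, mass 97.72 -> {impute_mean(97.72):.2f}")
```

With a positive slope, a missing neocortex value for a large-bodied species is centered higher than one for a small-bodied species, which is the information the naive model with a single ν throws away.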
  19. Milk energy MCAR: Model 2

    • Slopes steeper now
    • Confidence intervals on imputed values tighter
    • Information used to update imputed values:
      • neocortex association with milk energy
      • neocortex association with log body mass

    precis(m14.4, depth=2)

    [R output: posterior means and standard deviations for neocortex_impute[1] through neocortex_impute[12], a, bN, bM, gM, a_N, sigma_N and sigma]

    The marginal posterior for gM confirms what you already knew …
  20. Range of imputed values still quite wide

    • Bayes is not magic, just logic
    • Imputation just logical consequence of defining full model for (1) outcome and (2) predictors
    • Other methods illogical: Prevent feedback from regression to imputed values
    [Figure 14.5. Same relationships as shown in Figure 14.4 (kcal per gram against neocortex proportion; neocortex proportion against log(mass)), but now for the imputation model that estimates the association between the predictors. The information in the association between predictors has been used to infer a stronger relationship between milk energy and the imputed values.]
  21. Cutting feedback: Don’t

    • Some people “cut” feedback from regression to variables (BUGS allows this)
    • Justification?
      • Want results like multiple imputation?
      • Don’t trust the regression model, but do trust the error model?
    • Cutting is bad news
      • Markov chains never converge, but people think they do (Martyn Plummer simulated to prove BUGS is broken)
      • No longer doing Bayesian inference, but might think you are
    • Preventing feedback like refusing to update estimates from previous cafés (Chapter 12)
  22. The Golem of Prague

    “Even the most perfect of Golem, risen to life to protect us, can easily change into a destructive force. Therefore let us treat carefully that which is strong, just as we bow kindly and patiently to that which is weak.”
    Rabbi Judah Loew ben Bezalel (1512–1609)
    From Breath of Bones: A Tale of the Golem
  23. Stats not substitute for science

    • Assume
      • Probability false positive finding is 5%
      • Probability true positive finding is 80% (power)
    • Conditional on positive finding, what is probability finding is true?
  24. Stats not substitute for science

    • Assume
      • Probability false positive finding is 5%
      • Probability true positive finding is 80% (power)
    • Conditional on positive finding, what is probability finding is true?

      Pr(T|+) = Pr(+|T) Pr(T) / Pr(+)
              = Pr(+|T) Pr(T) / [ Pr(+|T) Pr(T) + Pr(+|F) Pr(F) ]
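The question on this slide can be worked through numerically with Bayes' theorem. The Python sketch below (not the deck's R) uses the slide's 5% false-positive rate and 80% power as defaults; the function name and the particular base rates tried are illustrative choices, not values from the lecture.

```python
def prob_true_given_positive(base_rate, power=0.8, false_positive=0.05):
    """Pr(T|+) = Pr(+|T) Pr(T) / [Pr(+|T) Pr(T) + Pr(+|F) Pr(F)]."""
    true_pos = power * base_rate
    false_pos = false_positive * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


for base_rate in (0.5, 0.1, 0.01):
    p = prob_true_given_positive(base_rate)
    print(f"Pr(T) = {base_rate}: Pr(T|+) = {p:.3f}")
```

The answer depends heavily on the base rate Pr(T): with the same test properties, a positive finding is very likely true when half of all hypotheses are true and much less likely when true hypotheses are rare.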
  25. [Figure: Prob(True|+) against base rate Pr(T), two panels. One panel shows curves for Pr(+|T) = 0.5, 0.8 and 1; the other shows curves for Pr(+|F) = 0.05, 0.10 and 0.15. Panel annotations: Pr(+|F) = 0.05, Pr(+|T) = 0.5.]
  26. What’s the base rate?

    • No one knows the base rate
      • except for GWAS: Pr(T) < 10^-5
    • Frighteningly low, judging by replication results
    [Figures 1a and 1b. Replication results organized by replication effect size, 1a for Cohen’s d estimates, 1b for partial eta-squared estimates. When available, the triangle indicates the effect size obtained in the original study (Elaboration Likelihood main effect estimate does not appear because it was extremely large, partial eta-squared of .59). Large circles represent the aggregate effect size obtained across all participants. Error bars represent 99% noncentral confidence intervals around the effects. Small x’s represent the effect sizes obtained within each site.]
  27. Recipes and mantras

    • Anxiety => statistical compulsive hand washing
    • Made worse by field of Statistics being autonomous
    • Objective: Everyone does it the same way => safe
    • Subjective: Expertise matters
    • But if we must have recipes and mantras...
  28. Recipes and mantras

    • Recipe for Bayesian data analysis
      • Define model(s)
      • Fit model(s)
      • Check fit(s)
      • Critique model(s)
      • Repeat
    • Details always depend upon context, purpose
  29. Recipes and mantras

    • Recipe for choosing likelihood functions
      • What constraints do you know, before you see the data?
      • What aspects of the data do you care about?
      • What can you actually calculate and understand?
      • Nothing forces you to choose only one
    • Recipe for choosing priors
      • Guard against overfitting (flat never best)
      • Meaningful parameter: What do you already know? Exploit maximum entropy again.
      • No ideas? Try different priors and see how sensitive the results are
  30. Recipes and mantras

    • Mantras:
      • Assume an effect and estimate it
      • Embrace and propagate uncertainty
      • Fitting is easy; prediction is hard
      • There is no right, only less wrong
      • Math is not real; only then can it be real