
Statistical Rethinking - Lecture 12


Lecture 12 - MCMC / Maximum Entropy - Statistical Rethinking: A Bayesian Course with R Examples

Richard McElreath

February 12, 2015


Transcript

  1. A wild chain • Two observations: {–1,1} • Estimate mean

     and standard deviation. [Book excerpt] Taming a wild chain. One common problem with some models is that there are broad, flat regions of the posterior density. This happens most often, as you might guess, when one uses flat priors. The problem this can generate is a wild, wandering Markov chain that erratically samples extremely positive and extremely negative parameter values. Let's look at a simple example. The code below tries to estimate the mean and standard deviation of the two Gaussian observations −1 and 1.
     R code:
         y <- c(-1,1)
         m8.2 <- map2stan(
             alist(
                 y ~ dnorm( mu , sigma ) ,
                 mu <- alpha
             ) ,
             data=list(y=y) , start=list(alpha=0,sigma=1) ,
             chains=2 , iter=1e4 , warmup=1000 )
         precis(m8.2)
     The precis table is nonsense: the estimates for alpha and sigma run into the millions, with n_eff of only 18 and 183 and Rhat of 1.10 and 1.01. Whoa! Those estimates can't be right. The mean of −1 and 1 is zero, so we're hoping to get a mean value for alpha around zero. Instead we get crazy values and implausibly wide intervals.
  2. A wild chain • Problem is flat priors • Flat means flat

     forever • Little information in likelihood • So most of probability in posterior is out to 30-million • King Monty's car keeps driving • Flat prior is improper (integrates to infinity) • Fix with weakly informative priors. [Book excerpt] The reason the chain drifts wildly in both dimensions is that there is very little data, just two observations, and flat priors. The flat priors say that every possible value of the parameter is equally plausible, a priori. For parameters that can take a potentially infinite number of values, this means the Markov chain needs to occasionally sample some pretty implausible values, like negative 30 million. These extreme drifts overwhelm the chain. If the likelihood were stronger, then the chain would be fine, because it wouldn't need much information in the prior to stop this foolishness. Let's use this model instead:
     yᵢ ∼ Normal(µ, σ)
     µ = α
     α ∼ Normal(1, 10)
     σ ∼ HalfCauchy(0, 1)
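The gap between "flat forever" and weakly informative can be sketched numerically. The following Python check (an illustration, not from the lecture; the prior choices mirror the slide's Normal(1, 10) and HalfCauchy(0, 1)) compares log-posterior heights at a sane point and at an absurd one:

```python
import math

# The two observations from the slide's example
y = [-1.0, 1.0]

def log_lik(alpha, sigma):
    """Gaussian log-likelihood of the two observations."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (yi - alpha)**2 / (2 * sigma**2) for yi in y)

def log_prior_weak(alpha, sigma):
    """Weakly informative priors: alpha ~ Normal(1, 10), sigma ~ HalfCauchy(0, 1)."""
    lp_alpha = -0.5 * math.log(2 * math.pi * 100.0) - (alpha - 1.0)**2 / 200.0
    lp_sigma = math.log(2.0 / (math.pi * (1.0 + sigma**2)))  # half-Cauchy density, scale 1
    return lp_alpha + lp_sigma

# Under flat priors the log-posterior IS the log-likelihood, and along the
# ridge alpha ~ sigma the penalty grows only logarithmically in sigma:
lik_gap = log_lik(30e6, 30e6) - log_lik(0.0, 1.0)
# The weak priors add a quadratic penalty that crushes the same absurd point:
prior_gap = log_prior_weak(30e6, 30e6) - log_prior_weak(0.0, 1.0)
print(lik_gap, prior_gap)
```

With flat priors the absurd point costs only a few dozen log units, so the chain can wander out there; the weak priors make it cost trillions.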
  3. A wild chain µ = α, α ∼ Normal(1, 10),

     σ ∼ HalfCauchy(0, 1). [Book excerpt] I've just added weakly informative priors for α and σ. We'll plot these priors in a moment, so you will be able to see just how weak they are. But let's re-estimate first.
     R code:
         m8.3 <- map2stan(
             alist(
                 y ~ dnorm( mu , sigma ) ,
                 mu <- alpha ,
                 alpha ~ dnorm( 1 , 10 ) ,
                 sigma ~ dcauchy( 0 , 1 )
             ) ,
             data=list(y=y) , start=list(alpha=0,sigma=1) ,
             chains=2 , iter=1e4 , warmup=1000 )
         precis(m8.3)
               Mean StdDev lower 0.95 upper 0.95 n_eff Rhat
         alpha 0.02   1.75      -3.28       3.72  2266    1
         sigma 2.08   2.01       0.41       5.26  2259    1
  4. [Figure] Diagnosing and healing a sick Markov chain. Left:

     Trace plot from two independent chains defined by model m8.2. These chains are not stationary and should not be used for inference. Right: Adding weakly informative priors (see m8.3) clears up the condition right away. These chains are fine to use for inference.
  5. A wild chain • Even with only 2 observations, these priors

     have no effect on inference! Except to allow you to make inferences... [Figure: prior (dashed) and posterior (blue) densities for alpha and sigma under the model with weakly informative priors, m8.3. Even with only two observations, the likelihood easily overcomes these priors. Yet the model cannot be successfully estimated without them.] That's much better.
  6. Unidentified

     [Book excerpt] To construct a non-identifiable model, we first simulate 100 observations from a Gaussian distribution with mean zero and standard deviation one. By simulating the data, we know the right answer. Then we'll fit this model:
     yᵢ ∼ Normal(µ, σ)
     µ = α₁ + α₂
     σ ∼ HalfCauchy(0, 1)
     The linear model contains two parameters, α₁ and α₂, which cannot be identified. Only their sum can be identified, and it should be about zero after estimation. Let's run the Markov chain and see what happens. This chain is going to take much longer than the previous ones. But it should still finish after a few minutes.
     R code:
         y <- rnorm( 100 , mean=0 , sd=1 )
         m8.4 <- map2stan(
             alist(
                 y ~ dnorm( mu , sigma ) ,
                 mu <- a1 + a2 ,
                 sigma ~ dcauchy( 0 , 1 )
             ) ,
             data=list(y=y) , start=list(a1=0,a2=0,sigma=1) ,
             chains=2 , iter=1e4 )
         precis(m8.4)
                  Mean  StdDev lower 0.95 upper 0.95 n_eff Rhat
         a1    8102.03 4427.47     845.25   14281.67     1 4.27
         a2   -8102.01 4427.47  -14281.82    -845.17     1 4.27
         sigma    0.92    0.08       0.79       1.10     6 1.86
     Those estimates look suspicious, and the n_eff and Rhat values are terrible.
  7. Unidentified • Use weak priors, again

     [Book excerpt] The standard deviations of the chains are massive. This is of course a result of the fact that we cannot simultaneously estimate a1 and a2, but only their sum. Looking at the trace plot reveals more. [Figure] shows two Markov chains from the model above. These chains do not look like they are stationary, nor do they seem to be mixing very well. Indeed, when you see a pattern like this, it is reason to worry. Don't use these samples. Again, weak priors can rescue us. Now the model fitting code is:
     R code:
         m8.5 <- map2stan(
             alist(
                 y ~ dnorm( mu , sigma ) ,
                 mu <- a1 + a2 ,
                 a1 ~ dnorm( 0 , 10 ) ,
                 a2 ~ dnorm( 0 , 10 ) ,
                 sigma ~ dcauchy( 0 , 1 )
             ) ,
             data=list(y=y) , start=list(a1=0,a2=0,sigma=1) ,
             chains=2 , iter=1e4 )
         precis(m8.5)
                Mean StdDev lower 0.95 upper 0.95 n_eff Rhat
         a1    -0.12   7.21     -14.08      14.24  1818    1
         a2     0.14   7.21     -14.23      14.12  1819    1
         sigma  0.95   0.07       0.82       1.07  2502    1
     [Figure: trace plot from the same model as above, but now with weak priors. The chains converge and mix well now.]
  8. Unidentified • Can add a1+a2 to get identified parameter •

     Weak priors also make sampling faster! [Book excerpt] The estimates for a1 and a2 are better identified now. And take a look at the trace plot. Notice also that the model sampled a lot faster: with flat priors, sampling m8.4 can take far longer than m8.5. Often a model that is very slow to sample is underidentified. And since the chain is working now, in this case we can actually fix the weird parameterization problem without changing the model or running the chains again. Note that in each chain the trace for a1 is the mirror image of the trace for a2. This is because the chain is successfully estimating the posterior distribution of the sum of a1 and a2. You can confirm this with the samples:
     R code:
         post <- extract.samples( m8.5 )
         sum_a <- post$a1 + post$a2
         mean(sum_a)
         mean(y)
         [1] 0.007220409
         [1] 0.007496347
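Why a1 and a2 cannot be identified is easy to see directly: the likelihood depends on them only through their sum, so it is perfectly flat along every line a1 + a2 = constant. A minimal Python illustration (hypothetical stand-in data, not the lecture's simulated rnorm(100)):

```python
import math

def log_lik(a1, a2, sigma, y):
    """Log-likelihood for the model mu = a1 + a2."""
    mu = a1 + a2  # a1 and a2 enter only through their sum
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (yi - mu)**2 / (2 * sigma**2) for yi in y)

y = [0.3, -1.2, 0.5, 0.1]  # stand-in data for illustration
# Wildly different (a1, a2) pairs with the same sum are indistinguishable:
print(log_lik(1000.0, -1000.0, 1.0, y) == log_lik(0.0, 0.0, 1.0, y))  # True
```

With flat priors the posterior inherits that infinitely long ridge, which is exactly what the wandering traces for a1 and a2 show; the Normal(0, 10) priors give the ridge gentle walls.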
  9. A final example • Recall data(foxes) from old homework •

     Predict weight using food and group size. Two model specifications:
     wᵢ ∼ Normal(µᵢ, σ)
     µᵢ = α + β_F Fᵢ + β_G Gᵢ
     α ∼ Normal(0, 100)
     β_F ∼ Normal(0, 1)
     β_G ∼ Normal(0, 1)
     σ ∼ HalfCauchy(0, 1)
     and the same model with Laplace(0, 1) priors on β_F and β_G in place of the Normal(0, 1) priors.
  10. A final example • Recall data(foxes) from old homework •

     Predict weight using food and group size. (Same model specifications as the previous slide: Normal(0, 1) versus Laplace(0, 1) priors on β_F and β_G.)
  11. An example • Recall data(foxes) from old homework • Predict

     weight using food and group size
         m1 <- map2stan(
             alist(
                 weight ~ dnorm( mu , sigma ) ,
                 mu <- a + bf*avgfood + bg*groupsize ,
                 a ~ dnorm(0,100) ,
                 bf ~ dnorm(0,1) ,
                 bg ~ dnorm(0,1) ,
                 sigma ~ dcauchy(0,1)
             ) ,
             data=d ,
             start=list( a=mean(d$weight) , bf=0 , bg=0 , sigma=sd(d$weight) ) ,
             warmup=1000 , iter=11000 , chains=2 )
         m2 <- map2stan(
             alist(
                 weight ~ dnorm( mu , sigma ) ,
                 mu <- a + bf*avgfood + bg*groupsize ,
                 a ~ dnorm(0,100) ,
                 bf ~ dlaplace(0,1) ,
                 bg ~ dlaplace(0,1) ,
                 sigma ~ dcauchy(0,1)
             ) ,
             data=d ,
             start=list( a=mean(d$weight) , bf=0 , bg=0 , sigma=sd(d$weight) ) ,
             warmup=1000 , iter=11000 , chains=2 )
  12. [Figure: posterior density of bf under m1 (Gaussian prior)

     and m2 (Laplace prior)]
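The two prior shapes compared on this slide differ in a characteristic way. A small Python check (an illustration, not from the lecture) evaluates the densities used for bf, dnorm(0, 1) versus dlaplace(0, 1):

```python
import math

def dnorm(x, mu=0.0, sd=1.0):
    """Gaussian density, matching the dnorm(0, 1) prior in m1."""
    return math.exp(-(x - mu)**2 / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))

def dlaplace(x, mu=0.0, b=1.0):
    """Laplace (double-exponential) density, matching the dlaplace(0, 1) prior in m2."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

print(dlaplace(0.0), dnorm(0.0))  # ~0.500 vs ~0.399: more mass right at zero
print(dlaplace(4.0), dnorm(4.0))  # ~0.0092 vs ~0.00013: much fatter tails
```

The Laplace prior simultaneously concentrates more mass at zero and decays more slowly in the tails, which is why it shrinks small effects harder while letting large ones through.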
  13. [Figure: posterior density of bf under m1 (Gaussian prior)

     and m2 (Laplace prior), fit to the full data and to half the data]
  14. WAIC? (slides 15–20 repeat this table, adding the bullets below one at a time)

     Model, data     pD   WAIC
     Gauss, all      3.2  365.2
     Laplace, all    3.8  364.5
     Gauss, half     2.5  187.8
     Laplace, half   2.6  187.9
     • Laplace priors more flexible, unless likelihood near zero ⇒ pD greater
     • Laplace fits the sample better
     • Expected to overfit more
     • WAIC is a wash
     • Gaussian prior: expect many small effects
     • Laplace prior: expect only a few important effects
  21. [Figure: 100 pebbles thrown into 5 buckets, one possible

     allocation shown]
  22. [Figure: buckets 1–5 holding counts n1, n2, n3, n4, n5]

     Number of ways: W = N! / (n1! n2! n3! n4! n5!)
     lppd = Σᵢ log ∫ p(yᵢ|θ) p(θ) dθ = log of product of average likelihoods = sum of logs of average likelihoods
  23. Suppose only 10 pebbles.... [Figure: bucket allocations]

     (0, 0, 10, 0, 0): 1 way
     (0, 1, 8, 1, 0): 90 ways
     (0, 2, 6, 2, 0): 1260 ways
     (1, 2, 4, 2, 1): 37800 ways
  24. [Figure: bucket allocations, continued]

     (0, 2, 6, 2, 0): 1260 ways
     (1, 2, 4, 2, 1): 37800 ways
     (2, 2, 2, 2, 2): 113400 ways
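The "ways" on these slides are multinomial coefficients, and the counts can be checked directly. A short Python sketch (an illustration, not lecture code):

```python
from math import factorial

def ways(counts):
    """Multinomial coefficient: orderings of N pebbles that land in these bucket counts."""
    w = factorial(sum(counts))
    for c in counts:
        w //= factorial(c)
    return w

# The allocations from the slides, 10 pebbles in 5 buckets:
for counts in ([0,0,10,0,0], [0,1,8,1,0], [0,2,6,2,0], [1,2,4,2,1], [2,2,2,2,2]):
    print(counts, ways(counts))
# -> 1, 90, 1260, 37800, 113400 ways
```

The flattest allocation, two pebbles per bucket, can be realized in vastly more ways than any peaked one.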
  25. [Figure: buckets 1–5 holding counts n1, n2, n3, n4, n5]

     For large N:
     W = N! / (n1! n2! n3! n4! n5!)
     (1/N) log W ≈ −Σᵢ (nᵢ/N) log(nᵢ/N) = −Σᵢ pᵢ log pᵢ
     lppd = Σᵢ log ∫ p(yᵢ|θ) p(θ) dθ = log of product of average likelihoods = sum of logs of average likelihoods
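The large-N approximation above can be checked numerically. This Python sketch (an illustration, not from the lecture) uses lgamma so the factorials never overflow, and shows (1/N) log W approaching the entropy of the bucket proportions:

```python
import math

def log_ways(counts):
    """log W = log( N! / prod(n_i!) ), via lgamma to avoid giant integers."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

# Same flat shape over 5 buckets at growing N:
# (1/N) log W approaches -sum p log p = log 5 ~ 1.609
for n in (10, 100, 1000):
    print(n, log_ways([n // 5] * 5) / n)
print(entropy([0.2] * 5))
```

At N = 10 the approximation is rough; by N = 1000 it is within a couple of percent of log 5.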
  26. Maximum entropy • Due to Edwin T. Jaynes (1922–1998) •

     The maxent principle: • The distribution with largest entropy is the distribution most consistent with stated assumptions • Can happen the largest number of ways • For parameters, provides a way to construct priors • For observations, a way to construct likelihoods • Also reproduces Bayesian updating as a special case (minimum cross-entropy) [Photo: E. T. Jaynes]
  27. Maximum entropy • Ye olde information entropy: • Q: What

     kind of distribution maximizes this quantity? • A: The flattest distribution still consistent with the constraints. This is the distribution that can happen the most unique ways. • Whatever does happen, bound to be one of those ways. [Book excerpt] In Chapter 6 you met the basics of information theory. In brief, we seek a measure of uncertainty that satisfies three criteria: (1) the measure should be continuous; (2) it should increase as the number of possible events increases; and (3) it should be additive. The resulting unique measure of the uncertainty of a probability distribution p, with probabilities pᵢ for each possible event i, turns out to be just the average log-probability:
     H(p) = −Σᵢ pᵢ log pᵢ
     This function is known as information entropy.
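The "flattest wins" answer is easy to verify for a few candidate distributions over the same four events. A minimal Python check (an illustration, not lecture code):

```python
import math

def H(ps):
    """Information entropy: the negative average log-probability."""
    return -sum(p * math.log(p) for p in ps if p > 0)

flat   = [0.25, 0.25, 0.25, 0.25]
lumpy  = [0.40, 0.30, 0.20, 0.10]
peaked = [0.70, 0.10, 0.10, 0.10]
print(H(flat), H(lumpy), H(peaked))  # flattest is largest: ~1.386 > ~1.280 > ~0.940
```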
  28. Uniform distribution • Constraints: bounded between a and b •

     Maxent distribution is uniform, because flattest • What if there are other constraints, such that flat is impossible? [Figure: three densities on [a, b]; the flat one has entropy 0, the others −0.19 and −0.13]
  29. [Figure: 100 pebbles in 5 buckets, repeated from earlier]
  30. [Figure: number line from −8 to 8] Constraint: variance

     must equal 1
  31. Revisit Gaussian • Constraints: unbounded real value, finite variance • Add

     up fluctuations, distribution of sums converges to Gaussian • Why? Vastly many more ways to realize Gaussian than any other shape. • Flattest distribution with the given variance • Ergo, Gaussian has maximum entropy among all continuous, unbounded distributions with finite variance. [Figure: comparison of a Gaussian (blue) and several generalized normal distributions with the same variance; entropy is maximized where the generalized normal's shape parameter matches the Gaussian]
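The maxent claim can be spot-checked with closed-form differential entropies: fix the variance at 1 and compare distribution families. A small Python illustration (my own example, not from the lecture):

```python
import math

# Closed-form differential entropies, each distribution scaled to variance 1
h_gauss   = 0.5 * math.log(2 * math.pi * math.e)   # Normal(0, 1): ~1.419
h_laplace = 1 + 0.5 * math.log(2)                  # Laplace with scale b = 1/sqrt(2): ~1.347
h_uniform = 0.5 * math.log(12)                     # Uniform of width sqrt(12): ~1.242
print(h_gauss, h_laplace, h_uniform)
```

At equal variance, the Gaussian beats both the fatter-tailed Laplace and the bounded uniform, consistent with it being the maxent distribution under a finite-variance constraint.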
  32. Revisit binomial • Constraints: binary outcomes, constant expected value across

     trials • Maxent now is binomial. [Book excerpt] In both examples we have to assign probability to each possible outcome while keeping the expected value of the distribution constant, and in both cases the distribution that maximizes entropy is the binomial distribution with the same expected value. Example: suppose, again like in Chapter 2, that we have a bag with an unknown number of blue and white marbles within it. We draw two marbles from the bag. There are therefore four possible sequences: (1) two white marbles; (2) one blue and then one white; (3) one white and then one blue; and (4) two blue marbles. We assign probabilities to each of these possible outcomes. Suppose we know that the expected number of blue marbles over two draws is exactly 1. Let's consider four candidate distributions:
     Distribution   ww    bw    wb    bb    entropy
     A             1/4   1/4   1/4   1/4    1.39
     B             2/6   1/6   1/6   2/6    1.33
     C             1/6   2/6   2/6   1/6    1.33
     D             1/8   4/8   2/8   1/8    1.21
     Distribution A is the binomial distribution with n = 2 and p = 0.5. The outcomes bw and wb could be collapsed into the same outcome type, but in principle they are different, whether we care about the order of outcomes or not.
  33. Revisit binomial • Constraints: binary outcomes, constant expected value across

     trials • Maxent now is binomial • e.g.: 2 trials, expected value 1.4 [Figure: candidate distributions A–D over (ww, bw, wb, bb), all with expected value 1.4; the binomial-shaped distribution has the highest entropy]
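The expected-value-1.4 case is also easy to check directly. This Python sketch (the rival distributions are made up for illustration; only the binomial column comes from the slide's setup, n = 2, p = 0.7):

```python
import math

def H(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

def expected_blue(ps):
    # outcomes (ww, bw, wb, bb) contain 0, 1, 1, 2 blue marbles
    return ps[1] + ps[2] + 2 * ps[3]

binom = [0.09, 0.21, 0.21, 0.49]  # binomial: n = 2, p = 0.7
alt1  = [0.10, 0.20, 0.20, 0.50]  # made-up rival with the same expected value
alt2  = [0.05, 0.25, 0.25, 0.45]  # another made-up rival
for ps in (binom, alt1, alt2):
    print(expected_blue(ps), H(ps))
# Every candidate has expected value 1.4; the binomial has the largest entropy.
```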
  34. Maximum entropy

     Constraints                          Maxent distribution
     Real value in interval               Uniform
     Real value, finite variance          Gaussian
     Binary events, fixed probability     Binomial
     Non-negative real, has mean          Exponential