Statistical Rethinking Fall 2017 Lecture 11

Week 6, Lecture 11, Statistical Rethinking: A Bayesian Course with Examples in R and Stan. This lecture covers Chapters 8 and 9 of the book.

Richard McElreath

December 01, 2017

Transcript

  1. Week 6: Markov Chains & Maximum Entropy • Richard McElreath • Statistical Rethinking
  2. Check the chain • Sometimes it doesn’t work • First and most important check: trace plot

    [Figure: trace plot of the Markov chain from the ruggedness model, m8.1stan. This is a clean, healthy Markov chain.]
    A trace plot merely plots the samples in sequential order. A trace plot isn’t the last thing analysts do to inspect a chain, but it’s nearly always the first. The gray region in each plot, the first one thousand samples, marks the adaptation (warmup) samples. During adaptation, the Markov chain is learning to sample more efficiently from the posterior distribution, so these samples are not necessarily reliable. They are automatically discarded by extract.samples.
  3. “Hairy caterpillar ocular inspection test”

  4. Warmup • What is “warmup”? • Adaptation to posterior for efficient sampling • Samples during warmup NOT from posterior • Automatically discarded by precis/summary and other functions • Warmup is NOT “burn in”
  5. Convergence diagnostics • n_eff: “effective” number of samples • n_eff/n < 0.1, be alarmed • R-hat –> Run multiple chains! • R-hat: crudely, ratio of variance between chains to variance within chains • Should approach 1 • Both may mislead

    There is one change to note here, but to explain later. The uniform prior on sigma has been changed to a half-Cauchy prior. The Cauchy distribution is a useful thick-tailed probability distribution related to the Student t distribution. There’s an Overthinking box about it later in the chapter. You can think of it as a weakly regularizing prior for standard deviations. We’ll use it again later in the chapter. And you’ll have many chances to get used to it as the book continues. But note that it is not necessary to use a half-Cauchy. The uniform prior will still work, and a simple exponential prior is also appropriate. In this example, as in many, there is so much data that the prior hardly matters. There is a practice problem at the end of this chapter to guide you in comparing these priors.

    After messages about translating, compiling, and sampling, map2stan returns an object that contains a bunch of summary information, as well as samples from the posterior distribution of all parameters. You can compare estimates:

    R code: precis(m8.1stan)

                Mean  StdDev  lower 0.89  upper 0.89  n_eff  Rhat
        ?       9.24    0.14        9.03        9.47    291     1
        ?      -0.21    0.08       -0.32       -0.07    306     1
        ?      -1.97    0.23       -2.31       -1.58    351     1
        ?       0.40    0.13        0.20        0.63    350     1
        sigma   0.95    0.05        0.86        1.03    566     1

    (The row labels of the first four parameters did not survive extraction.)
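The "crudely, ratio of variance between chains to variance within chains" description can be made concrete with a small sketch. This is Python rather than the lecture's R, and it implements the classic (non-split) Gelman-Rubin form, which is an assumption here and not necessarily the exact computation Stan reports. The function name `gelman_rubin_rhat` and the toy chains are hypothetical.

```python
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin R-hat for several equal-length chains."""
    n = len(chains[0])                               # draws per chain
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)       # W: average within-chain variance
    between = n * variance(chain_means)              # B: n times variance of chain means
    pooled = (n - 1) / n * within + between / n      # pooled estimate of posterior variance
    return (pooled / within) ** 0.5


# Two chains exploring the same region (hypothetical draws).
agree = [[0.1, -0.3, 0.4, 0.0, -0.2, 0.3],
         [0.2, -0.1, -0.4, 0.3, 0.1, -0.2]]
# Two chains stuck in different regions.
disagree = [[0.1, -0.3, 0.4, 0.0, -0.2, 0.3],
            [10.2, 9.9, 9.6, 10.3, 10.1, 9.8]]

rhat_agree = gelman_rubin_rhat(agree)
rhat_disagree = gelman_rubin_rhat(disagree)
print(f"chains agree:    R-hat = {rhat_agree:.2f}")
print(f"chains disagree: R-hat = {rhat_disagree:.2f}")
```

With a single chain there is no between-chain variance to compare, which is why the slide says to run multiple chains.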
  6. A wild chain • Two observations: {–1,1} • Estimate mean and standard deviation

    [Rhat can look fine] even for an invalid chain. So view it perhaps as a signal of danger, but never of safety. For conventional models, these metrics typically work well.

    Taming a wild chain. One common problem with some models is that there are broad, flat regions of the posterior density. This happens most often, as you might guess, when one uses flat priors. The problem this can generate is a wild, wandering Markov chain that erratically samples extremely positive and extremely negative parameter values. Let’s look at a simple example. The code below tries to estimate the mean and standard deviation of the two Gaussian observations –1 and 1. But it uses totally flat priors.

    R code:

        y <- c(-1,1)
        m8.2 <- map2stan(
            alist(
                y ~ dnorm( mu , sigma ) ,
                mu <- alpha
            ) ,
            data=list(y=y) , start=list(alpha=0,sigma=1) ,
            chains=2 , iter=4000 , warmup=1000 )

    Now let’s look at the precis output.

    R code: precis(m8.2)

                    Mean      StdDev    lower 0.89  upper 0.89  n_eff  Rhat
        alpha   21583691    54448550  -19611287.92   129922812      7  1.36
        sigma  139399593  1147514738         29.06   185868167     72  1.02

    Whoa! Those estimates can’t be right. The mean of –1 and 1 is zero, so we’re hoping to get a mean value for alpha around zero. Instead we get crazy values.
  7. [Figure: trace plots of alpha and sigma over 4000 samples. Flat priors: n_eff = 7 (alpha), n_eff = 72 (sigma). Weakly informative priors: n_eff = 1121 (alpha).]
  8. A wild chain • Problem is flat priors • Flat means flat forever • Little information in likelihood • Most probability is out to 30-million • King Monty’s car keeps driving • Flat prior is improper (integrates to infinity) • Fix with weakly informative priors

    But it doesn’t take much information in the prior to stop this foolishness, even without a stronger likelihood. Let’s use this model:

        $y_i \sim \mathrm{Normal}(\mu, \sigma)$
        $\mu = \alpha$
        $\alpha \sim \mathrm{Normal}(1, 10)$
        $\sigma \sim \mathrm{HalfCauchy}(0, 1)$
  9. A wild chain • $\alpha \sim \mathrm{Normal}(1, 10)$ • $\sigma \sim \mathrm{HalfCauchy}(0, 1)$

    I’ve just added weakly informative priors for α and σ. We’ll plot these priors in a moment, so you will be able to see just how weak they are. But let’s re-estimate first.

    R code:

        m8.3 <- map2stan(
            alist(
                y ~ dnorm( mu , sigma ) ,
                mu <- alpha ,
                alpha ~ dnorm( 1 , 10 ) ,
                sigma ~ dcauchy( 0 , 1 )
            ) ,
            data=list(y=y) , start=list(alpha=0,sigma=1) ,
            chains=2 , iter=4000 , warmup=1000 )
        precis(m8.3)

                Mean  StdDev  lower 0.89  upper 0.89  n_eff  Rhat
        alpha  -0.01    1.60       -1.98        2.37   1121     1
        sigma   1.98    1.91        0.47        3.45   1077     1

    That’s much better. Take a look at the bottom row in Figure 8.5. This trace plot looks healthy. Both chains are stationary around the same values, and mixing is good. No more wild detours off into the negative millions. To appreciate what has happened, take a look at the priors (dashed) and posteriors (blue) in Figure 8.6.
  10. [Figure 8.5: Diagnosing and healing a sick Markov chain. Top row: trace plot from two independent chains defined by model m8.2 (n_eff = 7 for alpha, n_eff = 72 for sigma). These chains are not stationary and should not be used for inference. Bottom row: adding weakly informative priors (see m8.3) clears up the condition right away (n_eff = 1121 for alpha, n_eff = 1077 for sigma). These chains are fine to use for inference.]
  11. A wild chain • Even with only 2 observations, these priors have no effect on inference! • Except to allow you to make inferences...

    [Figure 8.6: Prior (dashed) and posterior (blue) for the model with weakly informative priors, m8.3, in two density panels, alpha and sigma. Even with only two observations, the likelihood easily overcomes these priors. Yet the model cannot be successfully estimated without them.]
  12. Unidentified

    R code:

        y <- rnorm( 100 , mean=0 , sd=1 )

    By simulating the data, we know the right answer. Then we fit this model:

        $y_i \sim \mathrm{Normal}(\mu, \sigma)$
        $\mu = \alpha_1 + \alpha_2$
        $\sigma \sim \mathrm{HalfCauchy}(0, 1)$

    The linear model contains two parameters, α1 and α2, which cannot be identified. Only their sum can be identified, and it should be about zero, after estimation. Let’s run the Markov chain and see what happens. This chain is going to take much longer than the previous ones. But it should still finish after a few minutes.

    R code:

        m8.4 <- map2stan(
            alist(
                y ~ dnorm( mu , sigma ) ,
                mu <- a1 + a2 ,
                sigma ~ dcauchy( 0 , 1 )
            ) ,
            data=list(y=y) , start=list(a1=0,a2=0,sigma=1) ,
            chains=2 , iter=4000 , warmup=1000 )
        precis(m8.4)

                   Mean   StdDev  lower 0.89  upper 0.89  n_eff  Rhat
        a1     -1194.76  1344.19    -2928.62     1053.52      1  2.83
        a2      1194.81  1344.19    -1054.86     2927.39      1  2.83
        sigma      0.92     0.07        0.81        1.02     17  1.13

    Those estimates look suspicious, and the n_eff and Rhat values are terrible.
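Why a1 and a2 cannot be identified can be seen directly from the likelihood, which depends on the two parameters only through their sum. The following is a minimal Python sketch (not the lecture's R code); the data values and the helper `log_lik` are hypothetical, chosen only for illustration.

```python
import math


def log_lik(y, a1, a2, sigma):
    """Gaussian log-likelihood with mean mu = a1 + a2."""
    mu = a1 + a2
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (yi - mu) ** 2 / (2 * sigma ** 2)
        for yi in y
    )


y = [-0.8, 0.3, 1.1, -0.2, 0.5]              # hypothetical observations near zero

ll_zero = log_lik(y, 0.0, 0.0, 1.0)          # a1 + a2 = 0
ll_far = log_lik(y, 1194.0, -1194.0, 1.0)    # a1 + a2 = 0 again, far from the origin
ll_shift = log_lik(y, 1.0, 1.0, 1.0)         # a1 + a2 = 2, a different sum

print(ll_zero, ll_far, ll_shift)
```

Any pair with the same sum scores identically, so without priors nothing stops the chain from wandering along the ridge a1 + a2 = constant. Only a change in the sum changes the fit.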
  13. [Figure: trace plots of a1, a2, and sigma from m8.4 over 4000 samples: n_eff = 1 (a1), n_eff = 1 (a2), n_eff = 17 (sigma). The right-hand panels are cropped.]
  14. Unidentified • Use weak priors, again

    Those estimates look suspicious, and the n_eff and Rhat values are terrible. The means for a1 and a2 are almost exactly the same distance from zero, but on opposite sides. And the standard deviations of the chains are massive. This is of course a result of the fact that we cannot simultaneously estimate a1 and a2, but only their sum. Looking at the trace plot reveals more. The left column in Figure 8.7 shows two Markov chains from the model above. These chains do not look like they are stationary, nor do they seem to be mixing very well. Indeed, when you see a pattern like this, it is reason to worry. Don’t use these samples.

    Again, weak priors can rescue us. Now the model fitting code is:

    R code:

        m8.5 <- map2stan(
            alist(
                y ~ dnorm( mu , sigma ) ,
                mu <- a1 + a2 ,
                a1 ~ dnorm( 0 , 10 ) ,
                a2 ~ dnorm( 0 , 10 ) ,
                sigma ~ dcauchy( 0 , 1 )
            ) ,
            data=list(y=y) , start=list(a1=0,a2=0,sigma=1) ,
            chains=2 , iter=4000 , warmup=1000 )
        precis(m8.5)

                Mean  StdDev  lower 0.89  upper 0.89  n_eff  Rhat
        a1     -0.23    6.97      -11.25       10.95   1223     1
        a2      0.28    6.97      -10.85       11.36   1224     1
        sigma   0.93    0.07        0.82        1.04   1631     1

    The estimates for a1 and a2 are better identified now.
  15. [Figure 8.7: Left column: a chain with wandering parameters, a1 and a2, generated by m8.4 (n_eff = 1 for a1, 1 for a2, 17 for sigma). Right column: same model but now with weakly informative priors, m8.5 (n_eff = 1223 for a1, 1224 for a2, 1631 for sigma).]
  16. Homework • Problems 8H1, 8H2, 8H3 • Next week: Generalized Linear Models (GLMs), Chapters 9, 10, 11 • Next next week: Holiday break, resume in 2018
  17. [Figure: five buckets numbered 1 to 5 and a set of numbered pebbles] • 100 pebbles
  18. 1 2 3 4 5

  19. 1 2 3 4 5 100

  20. 1 2 3 4 5 100

  21. Buckets 1 2 3 4 5 • Pebbles per bucket: 5 22 12 37 24

  22. 1 2 3 4 5 n1 n2 n3 n4 n5

  23. Buckets 1 2 3 4 5 • Pebbles per bucket: n1 n2 n3 n4 n5 • Number of ways: $W = \dfrac{N!}{n_1!\, n_2!\, n_3!\, n_4!\, n_5!}$
  24. Suppose only 10 pebbles.... • All 10 in one bucket: 1 way
  25. Suppose only 10 pebbles.... • All 10 in one bucket: 1 way • Counts 1, 8, 1
  26. Suppose only 10 pebbles.... • All 10 in one bucket: 1 way • Counts 1, 8, 1: 90 ways
  27. Suppose only 10 pebbles.... • All 10 in one bucket: 1 way • Counts 1, 8, 1: 90 ways • Counts 2, 6, 2
  28. Suppose only 10 pebbles.... • All 10 in one bucket: 1 way • Counts 1, 8, 1: 90 ways • Counts 2, 6, 2: 1260 ways
  29. Suppose only 10 pebbles.... • All 10 in one bucket: 1 way • Counts 1, 8, 1: 90 ways • Counts 2, 6, 2: 1260 ways • Counts 1, 2, 4, 2, 1
  30. Suppose only 10 pebbles.... • All 10 in one bucket: 1 way • Counts 1, 8, 1: 90 ways • Counts 2, 6, 2: 1260 ways • Counts 1, 2, 4, 2, 1: 37800 ways
  31. Counts 2, 6, 2: 1260 ways • Counts 1, 2, 4, 2, 1: 37800 ways • Counts 2, 2, 2, 2, 2: 113400 ways
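The counts of ways on the ten-pebble slides follow from the multinomial formula, the number of ways being N! divided by the product of the factorials of the bucket counts. A small Python sketch (not the lecture's R) reproduces them. Which bucket holds which count is an assumption, since the transcript keeps only the bar labels; the function name `ways` is hypothetical.

```python
from math import factorial


def ways(counts):
    """Number of distinct ways to place numbered pebbles so that the
    buckets end up with the given counts: N! / (n1! n2! ... nk!)."""
    total = factorial(sum(counts))
    for n in counts:
        total //= factorial(n)   # each partial quotient is an exact integer
    return total


# Bucket positions are assumed; only the counts come from the slides.
distributions = [
    (0, 0, 10, 0, 0),
    (0, 1, 8, 1, 0),
    (0, 2, 6, 2, 0),
    (1, 2, 4, 2, 1),
    (2, 2, 2, 2, 2),
]

for d in distributions:
    print(d, ways(d))
```

The flatter the distribution of pebbles, the more ways it can happen, which is the intuition the later entropy slides build on.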
  32. Buckets 1 2 3 4 5 • Pebbles per bucket: n1 n2 n3 n4 n5 • For large N:

    $W = \dfrac{N!}{n_1!\, n_2!\, n_3!\, n_4!\, n_5!}$

    $\dfrac{1}{N} \log W \approx -\sum_i \dfrac{n_i}{N} \log \dfrac{n_i}{N} = -\sum_i p_i \log p_i$
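The large-N approximation can be checked numerically: hold the proportions fixed, scale up the number of pebbles, and compare (1/N) log W with the entropy of the proportions. This Python sketch uses `math.lgamma` for log-factorials; the proportions (1, 2, 4, 2, 1)/10 are borrowed from the ten-pebble example, and the helper names are hypothetical.

```python
from math import lgamma, log


def log_ways_per_pebble(counts):
    """(1/N) * log of N! / (n1! ... nk!), using lgamma(n + 1) = log(n!)."""
    n_total = sum(counts)
    log_w = lgamma(n_total + 1) - sum(lgamma(n + 1) for n in counts)
    return log_w / n_total


def entropy(p):
    """Information entropy -sum(p_i * log(p_i)), skipping zero entries."""
    return -sum(q * log(q) for q in p if q > 0)


base = [1, 2, 4, 2, 1]                      # proportions 0.1, 0.2, 0.4, 0.2, 0.1
h = entropy([b / sum(base) for b in base])

gaps = []
for k in (1, 10, 1_000, 1_000_000):         # N = 10, 100, 10^4, 10^7 pebbles
    counts = [b * k for b in base]
    value = log_ways_per_pebble(counts)
    gaps.append(h - value)
    print(f"N = {sum(counts):>8}: (1/N) log W = {value:.6f}, entropy = {h:.6f}")
```

The gap shrinks as N grows, which is the sense in which entropy counts the ways a distribution can arise.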
  34. Maximum entropy • Due to Edwin T. Jaynes (1922–1998) • The maxent principle: • Distribution with largest entropy is distribution most consistent with stated assumptions • Can happen the largest number of ways • For parameters, provides way to construct priors • For observations, way to construct likelihood • Also reproduces Bayesian updating as special case (minimum cross-entropy)

    [Photo: E. T. Jaynes (1922–1998)]
  36. Maximum entropy • Ye olde information entropy: $H(p) = -\sum_i p_i \log p_i$ • Q: What kind of distribution maximizes this quantity? • A: Flattest distribution still consistent with constraints. This is the distribution that can happen the most unique ways. • Whatever does happen, bound to be one of those ways.

    Maximum entropy. In an earlier chapter you met the basics of information theory. In brief, we seek a measure of uncertainty that satisfies three criteria: (1) the measure should be continuous; (2) it should increase as the number of possible events increases; and (3) it should be additive. The resulting unique measure of the uncertainty of a probability distribution p, with probability p_i for each possible event i, turns out to be just the average log-probability, with the sign flipped, as in the formula above. This function is known as information entropy.
  37. Uniform distribution • Constraints: bounded between a and b • Maxent distribution is uniform, because flattest • What if there are other constraints, such that flat is impossible?

    [Figure: three densities on the interval from a to b, labeled entropy = 0, entropy = -0.19, and entropy = -0.13]
  38. [Figure: five buckets numbered 1 to 5 and a set of numbered pebbles] • 100 pebbles
  39. 1 2 3 4 5

  40. [Figure: number line of buckets from –8 to 8] • Constraint: variance must equal 1
  41. Revisit Gaussian • Constraints: unbounded real, finite variance • Add up fluctuations, distribution of sums converges to Gaussian • Why? Vastly many more ways to realize Gaussian than another shape. • Flattest distribution with given variance • Ergo, Gaussian has the largest entropy of all continuous, unbounded distributions with the same finite variance

    [Figure 9.1: Maximum entropy and the Gaussian. Left: comparison of Gaussian (blue) and several other distributions with the same variance (x-axis: value, y-axis: Density). Right: entropy plotted against the shape parameter of a generalized normal distribution (x-axis: shape, 1.0 to 4.0; y-axis: entropy).]
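The right-hand panel of that figure (entropy against shape) can be sketched numerically. The density on the slide is truncated, so this Python sketch assumes the standard generalized normal density, β / (2αΓ(1/β)) · exp(−(|y − µ|/α)^β), whose variance is α²Γ(3/β)/Γ(1/β) and whose entropy is 1/β − log(β / (2αΓ(1/β))). The function name and the grid of shapes are hypothetical.

```python
from math import e, gamma, log, pi, sqrt


def gen_normal_entropy(shape, variance=1.0):
    """Differential entropy of a generalized normal with the given shape (beta),
    with the scale alpha chosen so that the variance is held fixed."""
    # variance = alpha^2 * Gamma(3/beta) / Gamma(1/beta), solved for alpha
    alpha = sqrt(variance * gamma(1 / shape) / gamma(3 / shape))
    return 1 / shape - log(shape / (2 * alpha * gamma(1 / shape)))


shapes = [1 + 0.25 * k for k in range(13)]           # 1.0, 1.25, ..., 4.0
entropies = {s: gen_normal_entropy(s) for s in shapes}
best_shape = max(entropies, key=entropies.get)

for s in shapes:
    print(f"shape = {s:.2f}  entropy = {entropies[s]:.4f}")
print("largest entropy at shape", best_shape)
print("Gaussian entropy 0.5 * log(2 * pi * e) =", 0.5 * log(2 * pi * e))
```

Shape 2 is the Gaussian. Pushing the shape in either direction while holding the variance at 1 lowers the entropy.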
  42. Revisit binomial • Constraints: binary outcomes, constant expected value across trials • Maxent now is binomial

    Suppose again that we have a bag with blue and white marbles within it. We draw two marbles from the bag. There are therefore four possible sequences: (1) two white marbles, (2) one blue and then one white, (3) one white and then one blue, and (4) two blue marbles. Our task is to assign probabilities to each of these possible outcomes. Suppose we know that the expected number of blue marbles over two draws is exactly 1. This is the expected value constraint on the distributions we’ll consider. We seek the distribution with the biggest entropy. Let’s consider four candidate distributions, A, B, C, and D, over the outcomes ww, bw, wb, bb. Distribution A is the binomial distribution with n = 2 and p = 0.5. The outcomes bw and wb are usually collapsed into the same outcome type. But in principle they are different outcomes, whether we care about the order of outcomes or not.

    [Figure 9.2: the four candidate distributions over ww, bw, wb, bb, with entropies A = 1.39, B = 1.33, C = 1.33, D = 1.21. The table of probabilities did not survive extraction.]
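The entropies attached to the four candidates can be reproduced with a few lines of Python. The probabilities below are an assumption: the table on the slide was lost, so these are candidates chosen to satisfy the stated constraint (expected number of blue marbles equal to 1), and they happen to give the entropies shown on the slide.

```python
from math import log


def entropy(p):
    """Information entropy -sum(p_i * log(p_i)), skipping zero entries."""
    return -sum(q * log(q) for q in p if q > 0)


def expected_blue(p):
    """Expected number of blue marbles; outcomes ordered ww, bw, wb, bb."""
    return 0 * p[0] + 1 * p[1] + 1 * p[2] + 2 * p[3]


# Assumed candidate distributions over (ww, bw, wb, bb).
candidates = {
    "A": [1 / 4, 1 / 4, 1 / 4, 1 / 4],   # binomial with n = 2, p = 0.5
    "B": [2 / 6, 1 / 6, 1 / 6, 2 / 6],
    "C": [1 / 6, 2 / 6, 2 / 6, 1 / 6],
    "D": [1 / 8, 4 / 8, 2 / 8, 1 / 8],
}

results = {name: round(entropy(p), 2) for name, p in candidates.items()}
for name, p in candidates.items():
    print(name, "expected blue =", round(expected_blue(p), 3), "entropy =", results[name])
```

All four satisfy the same constraint, and the binomial candidate A is the flattest and has the largest entropy.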
  43. Revisit binomial • Constraints: binary outcomes, constant expected value across trials • Maxent now is binomial • e.g.: 2 trials, expected value 1.4

    [Figure: four example distributions A, B, C, D over ww, bw, wb, bb, and a density plot of entropy (x-axis: Entropy, 0.7 to 1.2; y-axis: Density) marking A, B, C, D and the binomial.]
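The claim for 2 trials with expected value 1.4 can be probed by simulation: generate many random distributions over ww, bw, wb, bb that all have expected value 1.4, and compare their entropies with the binomial with p = 0.7. This is a Python sketch under stated assumptions (the construction of the random distributions, the seed, and the helper names are hypothetical), not the lecture's own code.

```python
import random
from math import log


def entropy(p):
    """Information entropy -sum(p_i * log(p_i)), skipping zero entries."""
    return -sum(q * log(q) for q in p if q > 0)


def expected_blue(p):
    """Expected number of blue marbles; outcomes ordered ww, bw, wb, bb."""
    return p[1] + p[2] + 2 * p[3]


def sim_p(rng, target=1.4):
    """Random distribution over (ww, bw, wb, bb) with the target expected value."""
    x1, x2, x3 = rng.random(), rng.random(), rng.random()
    # Solve (x2 + x3 + 2*x4) / (x1 + x2 + x3 + x4) = target for x4.
    x4 = (target * (x1 + x2 + x3) - x2 - x3) / (2 - target)
    total = x1 + x2 + x3 + x4
    return [x / total for x in (x1, x2, x3, x4)]


rng = random.Random(2017)                    # fixed seed, arbitrary choice
sims = [sim_p(rng) for _ in range(10_000)]

p = 0.7                                      # 2 * p = 1.4
binomial = [(1 - p) ** 2, p * (1 - p), (1 - p) * p, p ** 2]
h_binomial = entropy(binomial)
h_max_sim = max(entropy(s) for s in sims)

print(f"binomial entropy:          {h_binomial:.4f}")
print(f"largest simulated entropy: {h_max_sim:.4f}")
```

No simulated distribution should beat the binomial, and the ones that come close look nearly binomial themselves.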
  44. Generalized Linear Models • Goal: Connect linear model to outcome variable • Strategy: 1. Pick an outcome distribution 2. Model its parameters using links to linear models 3. Compute posterior • Can model multivariate relationships and non-linear responses • Building blocks of multilevel models