Slide 1

Slide 1 text

Week 4: Model Comparison Richard McElreath Statistical Rethinking

Slide 2

Slide 2 text

http://facultyweb.berry.edu/ttimberlake/copernican/

Slide 3

Slide 3 text

Ockham’s Razor? William of Ockham (c.1288–c.1348) Numquam ponenda est pluralitas sine necessitate. (Plurality should never be posited without necessity.)

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Ulysses’ Compass • Two major hazards: 1. Too simple 2. Too complex

Slide 6

Slide 6 text

Stargazing • Stargazing: Using asterisks (p < 0.05) to decide which variables improve prediction • Arbitrary 5% is arbitrary; doesn’t optimize anything

Coefficients:
       Estimate      Std. Error    z value       Pr(z)
a      1.5699e+02    9.3802e-16    1.6736e+17    < 2.2e-16 ***
b1     1.6540e-01    6.6628e-14    2.4825e+12    < 2.2e-16 ***
b2    -4.7063e-02    3.2586e-13   -1.4443e+11    < 2.2e-16 ***
b3     1.9168e-03    5.6805e-11    3.3743e+07    < 2.2e-16 ***
b4    -1.4002e-05    6.6694e-11   -2.0994e+05    < 2.2e-16 ***
b5    -4.7965e-07    4.7818e-08   -1.0031e+01    < 2.2e-16 ***
b6     6.6002e-09    9.5819e-10    6.8882e+00    5.651e-12 ***
tau    1.2132e-01    5.2829e-20    2.2965e+18    < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Slide 7

Slide 7 text

Goals this week • Understand overfitting and underfitting • Learn AIC/DIC/WAIC as ways to: • guard against overfitting and underfitting • explicitly compare models • Introduce regularizing priors as a complementary strategy • Learn how to average predictions across models

Slide 8

Slide 8 text

A B C D E F

Slide 9

Slide 9 text

The Problem with Parameters • Underfitting: Learning too little from the data. Models that are too simple both fit and predict poorly. • Overfitting: Learning too much from the data. More complex models always fit the sample better, but often predict worse. • Need to find a model that navigates between underfitting and overfitting

Slide 10

Slide 10 text

The Problem with Parameters

Figure 6.2. Average brain volume in cubic centimeters (cc) against body mass in kilograms (kg) for seven hominin species: afarensis, africanus, habilis, boisei, rudolfensis, ergaster, sapiens. What model best describes the relationship between brain size and body size?

Slide 11

Slide 11 text

Hominin brains • Simplest model:

The simplest model that relates brain size to body size is the linear one. It will be the first model we consider.

$v_i \sim \mathrm{Normal}(\mu_i, \sigma)$
$\mu_i = \alpha + \beta_1 m_i$

This is just to say that the average brain volume $v_i$ of species $i$ is a linear function of its body mass $m_i$. Priors are necessarily flat here, since we're using lm. Now fit this model to the data using lm:

m6.1 <- lm( brain ~ mass , data=d )

Instead of pausing to plot the fit model, like we did in previous chapters, let's focus on the question of how well this model fits the data. The conventional measure of fit in this context is R^2, the proportion of variance “explained” by the model. I put “explained” in scare quotes because explanation implies understanding, and that may not be the case with such models. What is really meant here is that the linear model retrodicts some proportion of the total variation in the data it was fit to.

Figure panels (body mass (kg) against brain volume (cc)): (a) R^2 = 0.49, (c) R^2 = 0.68.

Slide 12

Slide 12 text

Hominin brains • Why not parabola?

The remaining variation is just the variation of the residuals. You can easily compute R^2 yourself with:

1 - var(resid(m6.1))/var(d$brain)
[1] 0.490158

Take a look at the bottom of the output from summary(m6.1), and you'll find the same number, labeled there “Multiple R-squared.”

Let's get some other models to compare to the fit of m6.1. We'll consider five more models, each more complex than the last. Each of these models will just be a polynomial of higher degree. For example, a second-degree polynomial that relates body size to brain size is a parabola. In math form, it is:

$v_i \sim \mathrm{Normal}(\mu_i, \sigma)$
$\mu_i = \alpha + \beta_1 m_i + \beta_2 m_i^2$

This model family adds one more parameter, $\beta_2$, but uses all of the same data as m6.1. Fit this model to the data:

m6.2 <- lm( brain ~ mass + I(mass^2) , data=d )

Look back at the earlier discussion if that I(mass^2) thing confuses you. Now let's fit the rest of the model families. The models m6.3 through m6.6 are just third-degree, fourth-degree, fifth-degree, and sixth-degree polynomials, built and fit in the same way. Here is the code to fit all of them to the data:

m6.3 <- lm( brain ~ mass + I(mass^2) + I(mass^3) , data=d )

Figure panels (body mass (kg) against brain volume (cc)): (a) R^2 = 0.49, (b) R^2 = 0.54, (c) R^2 = 0.68, (d) R^2 = 0.81.

Slide 13

Slide 13 text

Hominin brains • Why not higher order polynomials?

The predicted mean swings around wildly in the interval where there are no data. In Figure 6.3 (f) especially, the swing is so extreme that I had to extend the range of the vertical axis to display the depth at which the predicted mean finally turns back around. At one point in this interval, the model predicts a negative brain size! The model pays no price (yet) for this absurdity, because there are no cases in the data with body mass near that value.

Why does the sixth-degree polynomial fit perfectly? Because it has enough parameters to assign one to each point of data. The model's equation for the mean has 7 parameters:

$\mu_i = \alpha + \beta_1 m_i + \beta_2 m_i^2 + \beta_3 m_i^3 + \beta_4 m_i^4 + \beta_5 m_i^5 + \beta_6 m_i^6$

and there are 7 species to predict brain sizes for. So effectively, this model assigns one parameter to just reiterate each observed brain size. This is a general phenomenon: If you adopt a model family with enough parameters, you can fit the data exactly. But such a model will make rather absurd predictions for yet-to-be-observed cases.

Rethinking: Model fitting as compression. Another perspective on the absurd model just above is to consider that model fitting can be considered a form of data compression. Parameters summarize relationships among the data. These summaries compress the data into a simpler form, although with loss of information (“lossy” compression).
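The claim that R^2 climbs to 1 as parameters are added can be checked with a short sketch. The seven mass and brain values below are not given on the slides; they are assumed here for illustration and should be checked against the original data set. The sketch also swaps in `poly()` (orthogonal polynomial terms) for the raw powers used in the slide code, which gives the same fitted values with better numerical behavior.

```r
# Hominin data: values ASSUMED for illustration (not listed on the slides).
d <- data.frame(
  species = c("afarensis", "africanus", "habilis", "boisei",
              "rudolfensis", "ergaster", "sapiens"),
  brain = c(438, 452, 612, 521, 752, 871, 1350),       # brain volume (cc)
  mass  = c(37.0, 35.5, 34.5, 41.5, 55.5, 61.0, 53.5)  # body mass (kg)
)

# R^2 by hand: 1 - var(residuals) / var(outcome), as in the slide code.
r_squared <- function(fit, y) 1 - var(resid(fit)) / var(y)

# Fit polynomials of degree 1 to 6. A degree-k polynomial has k + 1
# parameters in the mean, so degree 6 has 7: one per species.
degrees <- 1:6
r2 <- sapply(degrees, function(k) {
  fit <- lm(brain ~ poly(mass, k), data = d)
  r_squared(fit, d$brain)
})

print(round(data.frame(degree = degrees, parameters = degrees + 1, R2 = r2), 2))
```

The last row is the interesting one: with as many parameters as data points, the residuals vanish and R^2 reaches 1, regardless of whether the curve makes sense between the points.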

Slide 14

Slide 14 text

Figure 6.3. Polynomial linear models of increasing degree, fit to the hominin data. Each plot shows the predicted mean in black, with an interval of the mean shaded. R^2 is displayed above each plot. Axes: body mass (kg) against brain volume (cc).
(a) First-degree polynomial, R^2 = 0.49
(b) Second degree, R^2 = 0.54
(c) Third degree, R^2 = 0.68
(d) Fourth degree, R^2 = 0.81
(e) Fifth degree, R^2 = 0.99
(f) Sixth degree, R^2 = 1.00

Slide 15

Slide 15 text

Figure 6.5. Underfitting and overfitting as under-sensitivity and over-sensitivity to sample. In both plots, a regression is fit to the seven sets of data made by dropping one row from the original data. Axes: body mass (kg) against brain volume (cc).
(a) Underfitting: an underfit model is insensitive to the exact data, changing little as individual points are dropped.
(b) Overfitting: an overfit model is very sensitive to the exact data, changing dramatically as individual points are dropped.
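The drop-one-row experiment in this figure can be sketched as follows. The data values are again an assumption, and the choice of a fourth-degree polynomial for the overfit model is arbitrary (the slide does not say which degree was plotted). Sensitivity is summarized here as the average standard deviation of the predicted mean across the seven refits, which is one possible summary, not the figure's own.

```r
# Hominin data: values ASSUMED for illustration (not listed on the slides).
d <- data.frame(
  brain = c(438, 452, 612, 521, 752, 871, 1350),       # brain volume (cc)
  mass  = c(37.0, 35.5, 34.5, 41.5, 55.5, 61.0, 53.5)  # body mass (kg)
)

# Body masses at which the refitted curves are compared.
grid <- data.frame(mass = seq(35, 60, length.out = 50))

# Refit a polynomial of the given degree to each of the seven data sets
# made by dropping one row; return predictions (one column per dropped row).
loo_predictions <- function(degree) {
  sapply(seq_len(nrow(d)), function(i) {
    fit <- lm(brain ~ poly(mass, degree), data = d[-i, ])
    predict(fit, newdata = grid)
  })
}

# Sensitivity to the sample: how much predictions disagree across refits,
# averaged over the grid.
spread <- function(pred) mean(apply(pred, 1, sd))

spread_under <- spread(loo_predictions(1))  # straight line
spread_over  <- spread(loo_predictions(4))  # degree 4 (arbitrary choice)
print(c(linear = spread_under, degree4 = spread_over))
```

Plotting the columns of `loo_predictions(1)` and `loo_predictions(4)` with `matplot(grid$mass, ...)` gives a picture along the lines of the two panels.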

Slide 16

Slide 16 text

Importance of being regular • Want the regular features of the sample • Strategies • Cross-validation • Regularizing priors (penalized likelihood) • Information criteria • Science! (iterative group learning) • Proper approach depends upon purpose

Slide 17

Slide 17 text

Information criteria Compare models?

Slide 18

Slide 18 text

The road to AIC/DIC/WAIC • What’s a good target? • How to measure distance from the target? • How can we estimate that distance? • How can we adjust that estimate to account for overfitting?

Slide 19

Slide 19 text

How far from truth? • Truth: The real joint probability of events • Truth defines probability distribution • Model defines another • Need a way to measure distance of a model from truth • Distance needs to accommodate complexity of prediction task

“So by rate of correct prediction alone,” the newcomer announces, “I'm the best person for the job.” The newcomer is right. Define hit rate as the average chance of a correct prediction. By that measure, the newcomer scores more hits per day than the current weatherperson. The newcomer wins.

Costs and benefits. But it's not hard to find another criterion, other than rate of correct prediction, that makes the newcomer look foolish. Any consideration of costs and benefits will suffice. Suppose for example that you hate getting caught in the rain, but you also hate carrying an umbrella. Define the cost of getting wet and the cost of carrying an umbrella each as a loss of points of happiness. Suppose your chance of carrying an umbrella is equal to the forecast probability of rain. Your job is now to maximize your happiness by choosing a weatherperson.

[Table: points per day when following either the current weatherperson or the newcomer.]
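A small sketch makes the contrast between hit rate and costs concrete. Every number below is an assumption chosen for illustration: ten days with rain on three of them, a current weatherperson who forecasts rain with probability 1 on the rainy days and 0.6 otherwise, a newcomer who always forecasts 0, and costs of -5 for getting wet and -1 for carrying an umbrella.

```r
# ASSUMED scenario: 10 days, rain on the first 3 (illustrative values only).
rain <- c(rep(1, 3), rep(0, 7))

# ASSUMED forecast probabilities of rain from each forecaster.
forecast <- list(
  current  = c(rep(1, 3), rep(0.6, 7)),
  newcomer = rep(0, 10)
)

# Hit rate: average probability assigned to what actually happened.
hit_rate <- function(p) mean(ifelse(rain == 1, p, 1 - p))

# Expected happiness when you carry an umbrella with probability p:
# pay the umbrella cost with probability p, and the wet cost when it
# rains and you are not carrying one.
cost_wet <- -5       # assumed
cost_umbrella <- -1  # assumed
happiness <- function(p) sum(cost_umbrella * p + cost_wet * rain * (1 - p))

hit_rates    <- sapply(forecast, hit_rate)
total_points <- sapply(forecast, happiness)
print(rbind(hit_rate = hit_rates, points = total_points))
```

Under these assumed values the newcomer has the higher hit rate but the lower total happiness, which is the point of the passage: the ranking of forecasters depends on the scoring rule.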

Slide 20

Slide 20 text

Information theory • Information: Reduction in uncertainty caused by learning an outcome. Today Tomorrow ?

Slide 21

Slide 21 text

Information theory • Information: Reduction in uncertainty caused by learning an outcome. Today Tomorrow

Slide 22

Slide 22 text

Today Tomorrow Los Angeles Seattle ? ? ? Atlanta

Slide 23

Slide 23 text

Information theory • Information: Reduction in uncertainty caused by learning an outcome. • How to quantify uncertainty? Should be: 1. Continuous 2. Increasing with number of possible events 3. Additive • These criteria are intuitive, but effectiveness is why we keep using them • Like Bayes: intuitive, but effectiveness is the reason to use it

Slide 24

Slide 24 text

Information entropy • 1948, Claude Shannon derived information entropy: Shannon (1916–2001) Uncertainty in a probability distribution is the average (minus) log-probability of an event.

The uncertainty over the four combinations of these events (rain/hot, rain/cold, shine/hot, shine/cold) should be the sum of the separate uncertainties. There is only one function that satisfies these desiderata. This function is usually known as information entropy, and has a surprisingly simple definition. If there are $n$ different possible events and each event $i$ has probability $p_i$, and we call the list of probabilities $p$, then the unique measure of uncertainty we seek is:

$H(p) = -\mathrm{E}\log(p_i) = -\sum_{i=1}^{n} p_i \log(p_i)$

In plainer words: the uncertainty contained in a probability distribution is the (negative) average log-probability of an event. “Event” might refer to a type of weather, like rain or shine, or even a particular nucleotide in a DNA sequence. While it's not worth going into the details of the derivation of $H$, it is worth pointing out that nothing about this function is arbitrary. Every part of it derives from the three requirements.
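The entropy formula is short enough to write directly. The sketch below defines a hypothetical helper `entropy()` and checks two of the three criteria numerically: entropy grows with the number of equally likely events, and it is additive over independent events. The weather and temperature probabilities are made-up values for illustration.

```r
# Information entropy: H(p) = -sum(p * log(p)).
# Events with probability 0 contribute nothing (0 * log(0) is taken as 0).
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Increasing with the number of possible events (all equally likely).
h_two  <- entropy(rep(1/2, 2))
h_four <- entropy(rep(1/4, 4))

# Additivity: for two independent variables, the entropy of the joint
# distribution is the sum of the separate entropies.
weather <- c(rain = 0.3, shine = 0.7)  # assumed probabilities
temp    <- c(hot = 0.4, cold = 0.6)    # assumed probabilities
joint   <- as.vector(outer(weather, temp))  # rain/hot, shine/hot, rain/cold, shine/cold

print(c(two_events = h_two, four_events = h_four))
print(c(joint = entropy(joint), sum_of_parts = entropy(weather) + entropy(temp)))
```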

Slide 25

Slide 25 text

Entropy to accuracy • Two probability distributions: p, q • How accurate is q, for describing p? • Distance from q to p: Divergence

Suppose for example that the true distribution of events is $p_1 = 0.3$, $p_2 = 0.7$. If we believe instead that these events happen with probabilities $q_1$, $q_2$, how much additional uncertainty have we introduced, as a consequence of using $q = \{q_1, q_2\}$ to approximate $p = \{p_1, p_2\}$? The formal answer to this question is based upon $H$, and has a similarly simple formula:

$D_{KL}(p, q) = \sum_i p_i \left( \log(p_i) - \log(q_i) \right)$

In plainer language, the divergence is the average difference in log probability between the target (p) and model (q). This divergence is just the difference between two entropies: the entropy of the target distribution p and the entropy arising from using q to predict p.

Good news, everyone! Distance from q to p is the average difference in log-probability.

Slide 26

Slide 26 text

Entropy to accuracy

When $p = q$, we know the actual probabilities of the events. In that case:

$D_{KL}(p, q) = D_{KL}(p, p) = \sum_i p_i \left( \log(p_i) - \log(p_i) \right) = 0$

There is no additional uncertainty induced when we use a probability distribution to represent itself. That's somehow a comforting thought. But more importantly, as q grows more different from p, the divergence $D_{KL}$ also grows. What divergence can do for us now is help us contrast different approximations to p.

Figure: divergence of q from p (vertical axis) against q[1] (horizontal axis), with q = p marked. The plot is produced by:

p <- c(0.3,0.7)
DKL <- function(p,q) sum(p*(log(p)-log(q)))
q1seq <- seq(from=0.01,to=0.99,by=0.01)
DKLseq <- sapply(q1seq, function(q1) DKL(p,c(q1,1-q1)) )
plot( q1seq , DKLseq )

Slide 27

Slide 27 text

Direction matters

Slide 28

Slide 28 text

Figure: information entropy, on a scale from 0 to 4, with Earth and Mars labeled.
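That direction matters can be seen by computing the divergence both ways. The sketch reuses the `DKL` function from the earlier slide. The water and land proportions for Earth and Mars are not given on the slides; they are assumed values picked so that one planet is high-entropy and the other is nearly all land.

```r
# K-L divergence of q from p, as defined on the earlier slide.
DKL <- function(p, q) sum(p * (log(p) - log(q)))

# ASSUMED proportions of surface covered by water and land.
earth <- c(water = 0.7,  land = 0.3)
mars  <- c(water = 0.01, land = 0.99)

# Using Mars as the model for Earth, and Earth as the model for Mars.
mars_predicts_earth <- DKL(earth, mars)
earth_predicts_mars <- DKL(mars, earth)

print(c(mars_predicts_earth = mars_predicts_earth,
        earth_predicts_mars = earth_predicts_mars))
```

With these assumed values, approximating Earth with Mars costs more than the reverse: a model that expects almost no water is badly surprised by a watery planet, while a model that expects both outcomes is only mildly surprised by a dry one.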

Slide 29

Slide 29 text

Estimating divergence • How to estimate DKL? • Don’t know p! Don’t need it. Focus on difference between two approximating models:

The divergence of a model q from the target p is $D_{KL}(p, q) = \sum_i p_i \left( \log(p_i) - \log(q_i) \right)$. The term involving only p appears in the divergence of both q and r. This term has no effect on the distance of q and r from one another. So while we don't know where p is, we can estimate how far apart q and r are, and which is closer to the target. It's as if we can't tell how far any particular archer is from the target, but we can tell which archer is closer and by how much.

$D_{KL}(p, q) - D_{KL}(p, r) = -\sum_i p_i (\log q_i - \log p_i) - \left( -\sum_i p_i (\log r_i - \log p_i) \right) = -\sum_i p_i (\log q_i - \log r_i) = -\left( \mathrm{E}\log q_i - \mathrm{E}\log r_i \right)$

All of this also means that all we need to know is a model's average log-probability: $\mathrm{E}\log(q_i)$ for q and $\mathrm{E}\log(r_i)$ for r.
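The cancellation of p can be verified numerically. In the sketch below the target p is taken from the slide's plotting code, while the two models q and r are hypothetical. The last step pretends p is unknown and estimates the same difference from sampled events alone, which is how the quantity would be approximated in practice.

```r
DKL <- function(p, q) sum(p * (log(p) - log(q)))

p <- c(0.3, 0.7)    # target (known here only so the answer can be checked)
q <- c(0.25, 0.75)  # hypothetical model q
r <- c(0.5, 0.5)    # hypothetical model r

# Difference in divergence, computed with full knowledge of p.
diff_exact <- DKL(p, q) - DKL(p, r)

# Same difference from average log-probabilities only: -(E log q - E log r).
diff_avg_logprob <- -(sum(p * log(q)) - sum(p * log(r)))

# Without knowing p: let "nature" present events, then average the
# log-probability each model assigned to what happened.
set.seed(1)
events <- sample(1:2, size = 1e5, replace = TRUE, prob = p)
diff_estimated <- -(mean(log(q[events])) - mean(log(r[events])))

print(c(exact = diff_exact, from_avg_logprob = diff_avg_logprob,
        estimated = diff_estimated))
```

A negative difference means q is closer to the target than r.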

Slide 30

Slide 30 text

Deviance • Deviance, estimate of relative information divergence:

The expressions $\mathrm{E}\log(q_i)$ and $\mathrm{E}\log(r_i)$ look a lot like log-probabilities of outcomes, like the log-likelihoods you've been using already. Indeed, just summing the log-likelihoods of each case provides an approximation of $\mathrm{E}\log(q_i)$. We don't have to know the p inside the expectation, because nature takes care of presenting the events for us.

So we can compare the average log-probability from each model to get an estimate of the relative distance of each model from the target. This also means that the absolute magnitude of these values will not be interpretable; neither $\mathrm{E}\log(q_i)$ nor $\mathrm{E}\log(r_i)$ by itself suggests a good or bad model. Only the difference $\mathrm{E}\log(q_i) - \mathrm{E}\log(r_i)$ informs us about the divergence of each model from the target p.

All of this delivers us to a very common measure of model fit, one that also turns out to be an approximation of K-L divergence: the deviance, which is defined as:

$D(q) = -2 \sum_i \log(q_i)$

where i indexes each observation (case), and each $q_i$ is just the likelihood of case i. The −2 in front doesn't do anything important. It's there for historical reasons.

You can compute the deviance for any model you've fit already in this book, just by using the MAP estimates to compute a log-probability of the observed data for each row. These probabilities are the q values. Then you add these log-probabilities together and multiply by −2. Here's a quick example, using the hominin brain data again.

Deviance is the sum log-probability of the data, with a minus-two because reasons. Strange, since not average, but rather sum log-prob.

$D \propto \mathrm{lppd}$
lppd = log of product of average likelihoods = sum of logs of average likelihoods
$\mathrm{lppd} = \sum_{i=1}^{N} \log \int \Pr(y_i \mid \theta) \Pr(\theta) \, d\theta$

Slide 31

Slide 31 text

Deviance • Compute it: • Compute log probability of each observation • Sum all of these log probabilities • Multiply by –2 • Typical to use MAP estimates for probabilities, but can use entire posterior • Will do so later, when we compute WAIC as an estimate of deviance

The deviance is defined as $D(q) = -2 \sum_i \log(q_i)$, where i indexes each observation (case), and each $q_i$ is just the likelihood of case i. Here's a quick example, using the hominin brain data again (the call continues beyond the visible part of the slide):

m6.8 <- map(
    alist(
        brain ~ dnorm( mu , sigma ) ,
        mu <- a + b*mass
    ) ,
    data=d ,
    start=list(a=mean(d$brain),b=0,sigma=sd(d$brain)) ,
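The three steps listed above can be followed with base R alone. This sketch is a stand-in for the `map` fit on the slide, not a reproduction of it: it fits the linear model with `lm`, uses the maximum-likelihood estimate of sigma in place of the MAP estimate, and relies on assumed data values. It then checks the hand computation against `logLik()`.

```r
# Hominin data: values ASSUMED for illustration (not listed on the slides).
d <- data.frame(
  brain = c(438, 452, 612, 521, 752, 871, 1350),       # brain volume (cc)
  mass  = c(37.0, 35.5, 34.5, 41.5, 55.5, 61.0, 53.5)  # body mass (kg)
)

fit <- lm(brain ~ mass, data = d)

# Maximum-likelihood sigma (divides by N, not N - 2), to match logLik().
sigma_ml <- sqrt(mean(resid(fit)^2))

# Step 1: log-probability of each observation under the fitted model.
log_q <- dnorm(d$brain, mean = fitted(fit), sd = sigma_ml, log = TRUE)

# Steps 2 and 3: sum the log-probabilities and multiply by -2.
dev_manual <- -2 * sum(log_q)

# Cross-check. Note that deviance(fit) is NOT this quantity for an lm
# object; it returns the residual sum of squares.
dev_loglik <- -2 * as.numeric(logLik(fit))

print(c(manual = dev_manual, from_logLik = dev_loglik))
```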

Slide 32

Slide 32 text

The road to AIC/DIC/WAIC ✓ What’s a good prediction? ✓ How far is the model from the target? ✓ How can we estimate that distance? • How can we adjust that estimate to account for overfitting?

Slide 33

Slide 33 text

Deviance overfits • A meta-model of forecasting: • Two samples: training and testing, each of size N • Fit model to training sample, get Dtrain • Use the fit from training to compute Dtest • Difference Dtest – Dtrain is overfitting

Slide 34

Slide 34 text

Deviance overfits

Deviance is measured in and out of sample, using a simple prediction scenario. To visualize the results of the thought experiment, what we'll do now is conduct the thought experiment thousands of times, for each of five different linear regression models. The model that generates the data is a Gaussian outcome y with mean

$\mu_i = \beta_1 x_{1,i} + \beta_2 x_{2,i}$

This corresponds to an intercept of α = 0 and slopes for each of two predictors, with $\beta_1$ positive and $\beta_2$ negative. The models for analyzing the data are linear regressions with between 1 and 5 free parameters. The first model, with 1 free parameter to estimate, is just a linear regression with an unknown mean. Each parameter added to the model adds a predictor variable and its beta-coefficient. Since the “true” model has non-zero coefficients for only the first two predictors, we can say that the true model has 3 parameters. By fitting all five models, with between 1 and 5 parameters, to training samples from the same processes, we can get an impression for how deviance behaves. Figure 6.7 shows the results of the simulations for each model type, at two different sample sizes.

Data generating model:
$\mu_i = \beta_1 x_{1,i} + \beta_2 x_{2,i}$

Models fit to data (flat priors):
$\mu_i = \alpha$
$\mu_i = \alpha + \beta_1 x_{1,i}$
$\mu_i = \alpha + \beta_1 x_{1,i} + \beta_2 x_{2,i}$
$\mu_i = \alpha + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i}$
$\mu_i = \alpha + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i} + \beta_4 x_{4,i}$
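A compact version of this simulation can be written with least-squares fits, which correspond to flat priors. Several settings below are assumptions, since the slide does not give them in recoverable form: the slopes 0.15 and -0.4, an outcome standard deviation fixed at 1, N = 20, and 2000 replications. Within each replication all five models are fit to the same training sample and scored on the same test sample.

```r
set.seed(6)
N      <- 20             # sample size (assumed; the figure also uses N = 100)
n_sim  <- 2000           # number of replications (assumed)
b_true <- c(0.15, -0.4)  # slopes of the data generating model (assumed)

# Deviance of outcomes y given predicted means mu, with sd fixed at 1 (assumed).
deviance_of <- function(y, mu) -2 * sum(dnorm(y, mean = mu, sd = 1, log = TRUE))

# One sample: a column of ones plus 4 predictors; only the first two matter.
make_sample <- function() {
  X <- cbind(1, matrix(rnorm(N * 4), N, 4))
  y <- rnorm(N, mean = as.vector(X[, 2:3] %*% b_true), sd = 1)
  list(X = X, y = y)
}

# Fit models with k = 1..5 parameters to the training sample, then
# score each fit on both the training and the testing sample.
sim_once <- function() {
  train <- make_sample()
  test  <- make_sample()
  sapply(1:5, function(k) {
    cols <- seq_len(k)
    beta <- lm.fit(train$X[, cols, drop = FALSE], train$y)$coefficients
    c(train = deviance_of(train$y, as.vector(train$X[, cols, drop = FALSE] %*% beta)),
      test  = deviance_of(test$y,  as.vector(test$X[, cols, drop = FALSE] %*% beta)))
  })
}

# Average deviance over replications: rows are train/test, columns are k.
avg <- apply(replicate(n_sim, sim_once()), c(1, 2), mean)
colnames(avg) <- paste0("k=", 1:5)
print(round(avg, 1))
```

In-sample deviance can only go down as parameters are added, because the models are nested. Out-of-sample deviance is always worse than in-sample, and under these assumed settings it is lowest for the model with 3 parameters, the one that matches the data generating process.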

Slide 35

Slide 35 text

Deviance overfits

Figure 6.7. Deviance in and out of sample. In each plot, models with different numbers of predictor variables are shown on the horizontal axis (number of parameters, 1 to 5). Deviance across the simulations is shown on the vertical. Panels: N = 20 and N = 100. Each panel shows deviance in sample ("in") and out of sample ("out"), with +1SD and –1SD marked. The data generating model is marked on the plot.

Slide 36

Slide 36 text

Deviance overfits

Figure 6.7. Deviance in and out of sample. In each plot, models with different numbers of predictor variables are shown on the horizontal axis (number of parameters, 1 to 5). Deviance across the simulations is shown on the vertical. Panels: N = 20 and N = 100. Each panel shows deviance in sample ("in") and out of sample ("out"), with +1SD and –1SD marked.

Slide 37

Slide 37 text

Regularization • Use informative, conservative priors to reduce overfitting => model learns less from sample • But if too informative, model learns too little • Such priors are regularizing

So the problem is really one of tuning. But as you'll see, even mild skepticism can help a model do better, and doing better is all we can really hope for in the large world, where no model nor prior is optimal.

Consider this Gaussian model:

$y_i \sim \mathrm{Normal}(\mu_i, \sigma)$
$\mu_i = \alpha + \beta x_i$

with a Normal prior on α, a Normal prior centered on zero for β (the regularizing prior), and a Uniform prior on σ. Assume, as is good practice, that the predictor x is standardized so that its standard deviation is one and its mean is zero. Then the prior on α is a nearly-flat prior that has no practical effect.

Figure: densities of three regularizing priors over parameter value (−3 to 3): N(0,1); N(0,0.5), thin solid; N(0,0.2), thick solid.
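How a tighter prior makes the model learn less from the sample can be shown with a simplified version of this model. The simplifications are assumptions made so that the MAP estimate has a closed form: no intercept, σ fixed at 1, and simulated data with a made-up true slope of 0.5. With a Normal(0, s) prior on β, setting the derivative of the log posterior to zero gives β_MAP = Σxy / (Σx² + σ²/s²).

```r
set.seed(1)
N <- 20
x <- as.numeric(scale(rnorm(N)))       # standardized predictor
y <- rnorm(N, mean = 0.5 * x, sd = 1)  # simulated outcome (assumed true slope 0.5)

# MAP slope for y ~ Normal(beta * x, sigma) with prior beta ~ Normal(0, prior_sd).
# As prior_sd grows without bound the penalty vanishes (least squares).
map_slope <- function(prior_sd, sigma = 1) {
  sum(x * y) / (sum(x^2) + sigma^2 / prior_sd^2)
}

# The three priors from the slide, plus a flat prior for reference.
prior_sds <- c("flat" = Inf, "N(0,1)" = 1, "N(0,0.5)" = 0.5, "N(0,0.2)" = 0.2)
slopes <- sapply(prior_sds, map_slope)
print(round(slopes, 3))
```

Each step toward a narrower prior pulls the estimate closer to zero. Whether that shrinkage helps or hurts out of sample is the tuning problem the slide describes.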

Slide 38

Slide 38 text

Regularization

Figure 6.9. Regularizing priors and out-of-sample deviance. Panels: N = 20 and N = 100. Horizontal axis: number of parameters (1 to 5). Vertical axis: deviance. Each panel compares the priors N(0,1), N(0,0.5), and N(0,0.2), in sample and out of sample.

Slide 39

Slide 39 text

Regularization

Figure 6.9. Regularizing priors and out-of-sample deviance. Panels: N = 20 and N = 100. Horizontal axis: number of parameters (1 to 5). Vertical axis: deviance. Each panel compares the priors N(0,1), N(0,0.5), and N(0,0.2), in sample and out of sample.