Statistical Rethinking Fall 2017 Lecture 07

Week 4, Lecture 7, Statistical Rethinking: A Bayesian Course with Examples in R and Stan. This lecture covers Chapter 6 of the book.

Richard McElreath

November 15, 2017

Transcript

  1. Overfitting, Regularization, and Information Criteria

    [Diagram: Ptolemaic model and Copernican model, each showing Earth and Sun.]
    Figure 6.1. Ptolemaic (left) and Copernican (right) models of the solar system. Both models use epicycles (circles on circles), and both models produce exactly the same predictions. However, the Copernican model re[…]
  2. Ockham’s Razor?

    William of Ockham (c.1288–c.1348): Numquam ponenda est pluralitas sine necessitate. (Plurality should never be posited without necessity.)
  3. Stargazing

    • Stargazing: Using asterisks (p < 0.05) to decide which variables improve prediction
    • Arbitrary 5% is arbitrary

    Coefficients:
          Estimate     Std. Error  z value      Pr(z)
    a      1.5699e+02  9.3802e-16   1.6736e+17  < 2.2e-16 ***
    b1     1.6540e-01  6.6628e-14   2.4825e+12  < 2.2e-16 ***
    b2    -4.7063e-02  3.2586e-13  -1.4443e+11  < 2.2e-16 ***
    b3     1.9168e-03  5.6805e-11   3.3743e+07  < 2.2e-16 ***
    b4    -1.4002e-05  6.6694e-11  -2.0994e+05  < 2.2e-16 ***
    b5    -4.7965e-07  4.7818e-08  -1.0031e+01  < 2.2e-16 ***
    b6     6.6002e-09  9.5819e-10   6.8882e+00  5.651e-12 ***
    tau    1.2132e-01  5.2829e-20   2.2965e+18  < 2.2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  4. Goals this week

    • Understand overfitting and underfitting
    • Learn AIC/DIC/WAIC as ways to:
      • guard against overfitting and underfitting
      • explicitly compare models
    • Introduce regularizing priors as a complementary strategy
    • Learn how to average predictions across models
  5. The Problem with Parameters

    • Underfitting: Learning too little from the data. Models that are too simple both fit and predict poorly.
    • Overfitting: Learning too much from the data. More complex models always fit the sample better, but often predict worse.
    • Need to find a model that navigates between underfitting and overfitting
  6. The Problem with Parameters

    [Figure 6.2: average brain volume (cc) against body mass (kg) for the hominin species afarensis, africanus, habilis, boisei, rudolfensis, ergaster, and sapiens. Which model best describes the relationship between brain size and body mass?]
  7. Hominin brains

    • Simplest model:

    $v_i \sim \mathrm{Normal}(\mu_i, \sigma)$
    $\mu_i = \alpha + \beta_1 m_i$

    Book excerpt: […] model that relates brain size to body size is the linear one. […] the average brain volume $v_i$ of species $i$ is a linear function […] $m_i$. Priors are necessarily flat here, since we're using lm. Now fit […] data using lm: m6.1 <- lm( brain ~ mass , data=d ) […] to plot the fit model, like we did in previous chapters, let's focus […] of how well this model fits the data. The conventional measure […] is R^2, the proportion of variance "explained" by the model.

    [Figure 6.3, partial: panel (a) R^2 = 0.49 and panel (c) R^2 = 0.68; axes are body mass (kg) and brain volume (cc).]
  8. Hominin brains

    • Why not parabola?

    $v_i \sim \mathrm{Normal}(\mu_i, \sigma)$
    $\mu_i = \alpha + \beta_1 m_i + \beta_2 m_i^2$

    Book excerpt: […] the bottom of the output from summary(m6.1) and you'll find the […] labeled there "Multiple R-squared." […] other models to compare to the fit of m6.1. We'll consider five […] each more complex than the last. Each of these models will just […] of higher degree. For example, a second-degree polynomial that […] to brain size is a parabola. […] adds one more parameter, $\beta_2$, but uses all of the same data as […] model to the data: […] brain ~ mass + I(mass^2) , data=d ) […] if that I(mass^2) thing confuses you […] the rest of the model families. The models m6.3 through m6.6 are […] fourth-degree, fifth-degree, and sixth-degree polynomials built […]

    [Figure 6.3, partial: panels (a) R^2 = 0.49, (b) R^2 = 0.54, (c) R^2 = 0.68, (d) R^2 = 0.81; axes are body mass (kg) and brain volume (cc).]
  9. Hominin brains

    • Why not higher order polynomials?

    Book excerpt: […] around wildly in this interval. In Figure 6.3 (f) especially, the swing is so extreme that I had to extend the range of the vertical axis to display the depth at which the predicted mean finally turns back around. At around […] kg, the model predicts a negative brain size! The model pays no price (yet) for this absurdity, because there are no cases in the data with body mass near […] kg.

    Why does the sixth-degree polynomial fit perfectly? Because it has enough parameters to assign one to each point of data. The model's equation for the mean has 7 parameters:

    $\mu_i = \alpha + \beta_1 m_i + \beta_2 m_i^2 + \beta_3 m_i^3 + \beta_4 m_i^4 + \beta_5 m_i^5 + \beta_6 m_i^6$

    and there are 7 species to predict brain sizes for. So effectively, this model assigns one parameter to just reiterate each observed brain size. This is a general phenomenon: If you adopt a model family with enough parameters, you can fit the data exactly. But such a model will make rather absurd predictions for yet-to-be-observed cases.

    Rethinking: Model fitting as compression. Another perspective on the absurd model just above is to consider that model fitting can be considered a form of data compression. Parameters summarize relationships among the data. These summaries compress the data into a simpler form, although with loss of information ("lossy" compression) […]
  10. Figure 6.3

    [Figure 6.3: six panels of brain volume (cc) against body mass (kg). Polynomial linear models of increasing degree, fit to the hominin data. Each plot shows the predicted mean in black, with an interval of the mean shaded. R^2 is displayed above each plot. (a) First-degree polynomial, R^2 = 0.49. (b) Second-degree, R^2 = 0.54. (c) Third-degree, R^2 = 0.68. (d) Fourth-degree, R^2 = 0.81. (e) Fifth-degree, R^2 = 0.99. (f) Sixth-degree, R^2 = 1.00.]
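The climb in R^2 across the panels of Figure 6.3 can be reproduced with a few lines of base R. This is a minimal sketch on made-up data, not the hominin measurements: the seven (x, y) pairs, the helper name r_squared, and the choice of poly() for the polynomial terms are all illustrative assumptions.

```r
# Toy data: seven made-up points (an assumption for illustration,
# NOT the hominin brain data shown on the slides).
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7),
  y = c(2.1, 3.9, 3.2, 7.8, 6.4, 11.7, 9.3)
)

# R^2 computed by hand: 1 - residual sum of squares / total sum of squares
r_squared <- function(fit, y) {
  1 - sum(resid(fit)^2) / sum((y - mean(y))^2)
}

# Least-squares polynomial fits of degree 1 through 6.
# poly() builds orthogonal polynomial terms, which keeps the fit stable.
degrees <- 1:6
r2 <- sapply(degrees, function(k) {
  fit <- lm(y ~ poly(x, k), data = d)
  r_squared(fit, d$y)
})

# R^2 never decreases as terms are added, and the degree-6 polynomial
# has one coefficient per data point, so it fits the sample exactly.
print(round(r2, 3))
```

With seven points, the sixth-degree fit has as many coefficients as observations, which is why its R^2 reaches 1 whatever the data look like.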
  15. Figure 6.5

    • Underfitting: Insensitive to exact data
    • Overfitting: Very sensitive to exact data

    [Figure 6.5: two panels of brain volume (cc) against body mass (kg). Underfitting and overfitting as under-sensitivity and over-sensitivity to sample. In both plots, a regression is fit to the seven sets of data made by dropping one row from the original data. (a) An underfit model is insensitive to the sample, changing little as individual points are dropped. (b) An overfit model is sensitive to the sample, changing dramatically […]]
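The drop-one-row exercise behind Figure 6.5 can be sketched in base R. The data below are a toy assumption (a straight line plus a small alternating disturbance), and loo_error is a hypothetical helper name; the sketch measures sensitivity as how far each refit curve lands from the row that was dropped, which is one of several reasonable ways to quantify it.

```r
# Toy data (illustrative assumption): a straight line plus a small
# alternating disturbance, with seven rows like the figure.
d <- data.frame(x = 1:7)
d$y <- 2 * d$x + c(1, -1, 1, -1, 1, -1, 1)

# Refit a polynomial of the given degree with row j dropped, then
# record how far the refit curve lands from the dropped point.
loo_error <- function(degree) {
  sapply(seq_len(nrow(d)), function(j) {
    fit <- lm(y ~ poly(x, degree), data = d[-j, ])
    d$y[j] - predict(fit, newdata = d[j, , drop = FALSE])
  })
}

# Degree 1: a straight line, which barely moves when one row is dropped.
mse_simple <- mean(loo_error(1)^2)

# Degree 5: passes exactly through the six remaining points, so the
# curve swings strongly depending on which row was dropped.
mse_complex <- mean(loo_error(5)^2)

print(c(simple = mse_simple, complex = mse_complex))
```

The flexible model reproduces every retained point exactly, yet it misses the dropped point by far more than the straight line does.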
  17. Importance of being regular

    • Want the regular features of the sample
    • Strategies
      • Cross-validation
      • Regularizing priors (penalized likelihood)
      • Information criteria
      • Science! (iterative group learning)
    • Proper approach depends upon purpose
  18. The road to AIC/DIC/WAIC

    • What’s a good target?
      • average correct?
      • average probability correct?
      • average log probability correct?
    • How do we measure distance from the target?
    • How can we estimate that distance?
    • How can we adjust that estimate to account for overfitting?
  19. How far from truth?

    • Truth: The real joint probability of events
    • Truth defines probability distribution
    • Model defines another
    • Need a way to measure distance of a model from truth
    • Distance needs to accommodate complexity of prediction task

    Book excerpt: "So by rate of correct prediction alone," the newcomer announces, "I'm the best person for the job." The newcomer is right. Define hit rate as the average chance of a correct prediction. So for the current weatherperson, she gets […] hits in […] days, for a rate of […] correct predictions per day. In contrast, the newcomer gets […] hits per day. The newcomer wins.

    Costs and benefits. But it's not hard to find another criterion, other than rate of correct prediction, that makes the newcomer look foolish. Any consideration of costs and benefits will suffice. Suppose for example that you hate getting caught in the rain, but you also hate carrying an umbrella. Let's define the cost of getting wet as −[…] points of happiness and the cost of carrying an umbrella as −[…] points of happiness. Suppose your chance of carrying an umbrella is equal to the forecast probability of rain. Your job is now to maximize your happiness by choosing a weatherperson. Here are your points, following either the current weatherperson or the newcomer:

    [Table: rows Day, Observed, and Points for Current and for Newcomer.]
  20. Information theory

    • Information: Reduction in uncertainty caused by learning an outcome.
    • How to quantify uncertainty? Should be:
      1. Continuous
      2. Increasing with number of possible events
      3. Additive
    • These criteria are intuitive, but effectiveness is why we keep using them
    • Like Bayes: intuitive, but effectiveness is the reason to use it
  21. Information entropy

    • 1948, Claude Shannon (1916–2001) derived information entropy:
    • Uncertainty in a probability distribution is average (minus) log-probability of an event.

    $H(p) = -\mathrm{E} \log(p_i) = -\sum_{i=1}^{n} p_i \log(p_i)$

    Book excerpt: […] then the uncertainty about hot or cold […] different possible events […] uncertainty over the four combinations of these events (rain/hot, rain/cold, shine/hot, shine/cold) should be the sum of the separate uncertainties. […] function that satisfies these desiderata. This function is usually […] information entropy, and has a surprisingly simple definition. If […] different possible events and each event $i$ has probability $p_i$, and we […] probabilities $p$, then the unique measure of uncertainty we seek is the $H(p)$ above. […] uncertainty contained in a probability distribution is the average log-probability of an event. […] might refer to a type of weather, like rain or shine, or a particular […] or even a particular nucleotide in a DNA sequence. While it's not […] to the details of the derivation of $H$, it is worth pointing out that […]
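The entropy definition on this slide translates directly into base R. This is a small sketch: the function name entropy and the example distributions are assumptions chosen for illustration, and the natural logarithm is used.

```r
# Information entropy: the average (minus) log-probability of an event.
# Events with probability zero contribute nothing, by the usual
# convention that 0 * log(0) is taken to be 0.
entropy <- function(p) {
  stopifnot(all(p >= 0), abs(sum(p) - 1) < 1e-8)
  p <- p[p > 0]
  -sum(p * log(p))
}

H_uneven <- entropy(c(0.3, 0.7))   # two events, unequal chances
H_even   <- entropy(c(0.5, 0.5))   # two events, equal chances
H_four   <- entropy(rep(0.25, 4))  # four equally likely events

print(c(uneven = H_uneven, even = H_even, four = H_four))
```

The three values illustrate the criteria from the previous slide: uncertainty is largest when events are equally likely, it grows with the number of possible events, and the four-event case (two independent fair coin flips) carries exactly twice the uncertainty of one flip.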
  22. Entropy to accuracy

    • Two probability distributions: p, q
    • How accurate is q, for describing p?
    • Distance from q to p: Divergence
    • Distance from q to p is the average difference in log-probability.

    $D_{KL}(p, q) = \sum_i p_i \left( \log(p_i) - \log(q_i) \right)$

    Book excerpt (Information theory and model performance): Suppose for example that the true distribution of events is $p_1 = 0.3$, $p_2 = 0.7$. If we believe instead that these events happen with probabilities $q_1 = […]$, $q_2 = […]$, how much additional uncertainty have we introduced, as a consequence of using $q = \{q_1, q_2\}$ to approximate $p = \{p_1, p_2\}$? The formal answer to this question is based upon $H$, and has a similarly simple formula, the $D_{KL}$ above. In plainer language, the divergence is the average difference in log probability between the target (p) and model (q). This divergence is just the difference between two entropies: the entropy of the target distribution p and the entropy arising from using q to predict p. When p = q, we know the actual probabilities of the events. […]
  23. Entropy to accuracy

    $D_{KL}(p, q) = D_{KL}(p, p) = \sum_i p_i \left( \log(p_i) - \log(p_i) \right) = 0$

    Book excerpt: […] When p = q, we know the actual probabilities of the events. In that case the divergence is zero, as above. There is no additional uncertainty induced when we use a probability distribution to represent itself. That's somehow a comforting thought. But more importantly, […] more different from p, the divergence $D_{KL}$ also grows. […] divergence can do for us now is help us contrast different approxima[…]

    [Figure: divergence of q from p (vertical axis, 0.0 to 2.5) against q[1] (horizontal axis, 0.0 to 1.0), with the point q = p marked.]

    # true distribution of the two events
    p <- c(0.3,0.7)
    # K-L divergence of q from p
    DKL <- function(p,q) sum(p*(log(p)-log(q)))
    # candidate values of q[1]; q[2] is 1 - q[1]
    q1seq <- seq(from=0.01,to=0.99,by=0.01)
    DKLseq <- sapply(q1seq, function(q1) DKL(p,c(q1,1-q1)) )
    plot( q1seq , DKLseq )
  24. Estimating divergence

    • How to estimate $D_{KL}$?

    $D_{KL}(p, q) = \sum_i p_i \left( \log(p_i) - \log(q_i) \right)$

    In the formula, p is labeled "truth" and q is labeled "model". In plainer language, the divergence is the average difference in log probability between the target (p) and model (q).
  25. Estimating divergence

    • Don’t know p! Don’t need it. Focus on difference between two approximating models (model A is q, model B is r):

    $D_{KL}(p, q) - D_{KL}(p, r) = -\sum_i p_i (\log q_i - \log p_i) - \left( -\sum_i p_i (\log r_i - \log p_i) \right)$
    $= -\sum_i p_i (\log q_i - \log r_i)$
    $= -(\mathrm{E} \log q_i - \mathrm{E} \log r_i)$

    Book excerpt (Information theory and model performance): […] divergence of both q and r. This term has no effect on the distance of q and r from one another. So while we don't know where p is, we can estimate how far apart q and r are, and which is closer to the target. It's as if we can't tell how far any particular archer is from the target, but we can tell which archer is closer and by how much. All of this also means that all we need to know is a model's average log-probability: $\mathrm{E} \log(q_i)$ for q and $\mathrm{E} \log(r_i)$ for r. These expressions look a lot like log-probabilities of outcomes, like the log-likelihoods you've been using already. Indeed, just summing the log-likelihoods of each case provides an approximation of $\mathrm{E} \log(q_i)$. We don't have to know the p inside the expectation, because nature takes care of presenting the events for us.
  26. Estimating divergence

    • Don’t know p! Don’t need it. Focus on difference between two approximating models (model A is q, model B is r):

    $D_{KL}(p, q) - D_{KL}(p, r) = -(\mathrm{E} \log q_i - \mathrm{E} \log r_i)$

    • log-probability scores (deviance e.g.) provide estimate of $\mathrm{E} \log q_i$
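The claim that p drops out of the comparison can be checked numerically. The sketch below is an illustration under assumptions: p = (0.3, 0.7) plays the unknown truth, the two candidate models q and r are made-up values, and the sample size and seed are arbitrary.

```r
set.seed(1)

p <- c(0.3, 0.7)    # "truth"; unknown in practice, fixed here for the demo
q <- c(0.25, 0.75)  # model A (illustrative values)
r <- c(0.5, 0.5)    # model B (illustrative values)

DKL <- function(p, q) sum(p * (log(p) - log(q)))

# Exact difference in divergence; this route needs p twice over
exact_diff <- DKL(p, q) - DKL(p, r)

# The same quantity from the two average log-probabilities alone:
# the entropy of p cancels in the subtraction
elog_diff <- -(sum(p * log(q)) - sum(p * log(r)))

# Estimate without using p directly: observe events that nature
# generates, then average each model's log-probability of them
events <- sample(1:2, size = 1e5, replace = TRUE, prob = p)
est_diff <- -(mean(log(q[events])) - mean(log(r[events])))

print(c(exact = exact_diff, from_Elog = elog_diff, estimated = est_diff))
```

A negative difference means model A is closer to the target than model B. The estimate from observed events lands near the exact value even though the code computing it never looks at p.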
  27. Deviance (classic estimate)

    • How bad the model is, not how good
    • Compute it:
      • Compute log probability of each observation
      • Sum all of these log probabilities
      • Multiply by -2
    • Common to use MAP estimates for probabilities, but can use entire posterior
    • Will do so later, when we compute WAIC as an estimate of deviance

    $D(q) = -2 \sum_i \log(q_i)$

    Book excerpt: […] target p. All of this delivers us to a very common measure of model fit, one that also turns out to be an approximation of K-L divergence, the deviance, which is defined as the $D(q)$ above, where $i$ indexes each observation (case), and each $q_i$ is just the likelihood of case $i$. The -2 in front doesn't do anything important. It's there for historical reasons. You can compute the deviance for any model you've fit already in this book, just by using the MAP estimates to compute a log-probability of the observed data for each row. These probabilities are the q values. Then you add these log-probabilities together and multiply by -2. Here's a quick example, using the hominin brain data again.

    m6.8 <- map( alist( brain ~ dnorm( mu , sigma ) , mu ~ a + b*mass ) , data=d , start=list(a=mean(d$brain),b=0,sigma=sd(d$brain)) , […]
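The three steps for computing deviance can be followed by hand in base R. This sketch assumes toy simulated data rather than the hominin data, and it uses lm(), whose least-squares estimates coincide with MAP estimates under flat priors; the variable names are illustrative.

```r
# Toy data (illustrative assumption; not the hominin brain data)
set.seed(2)
d <- data.frame(x = 1:20)
d$y <- 3 + 0.5 * d$x + rnorm(20)

# Least-squares fit, equivalent to MAP estimates with flat priors
fit <- lm(y ~ x, data = d)

# Step 1: log-probability of each observation at the estimates.
# sigma is the maximum-likelihood value, sqrt(RSS / n).
sigma_hat <- sqrt(mean(resid(fit)^2))
log_q <- dnorm(d$y, mean = fitted(fit), sd = sigma_hat, log = TRUE)

# Steps 2 and 3: sum the log-probabilities and multiply by -2
dev_by_hand <- -2 * sum(log_q)

# Cross-check against R's built-in log-likelihood for the same fit
dev_loglik <- as.numeric(-2 * logLik(fit))

print(c(by_hand = dev_by_hand, from_logLik = dev_loglik))
```

The two numbers agree because logLik() for a linear model evaluates the same Gaussian log-probabilities at the same estimates.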
  28. The road to AIC/DIC/WAIC

    ✓ What’s a good prediction?
    ✓ How far is the model from the target?
    ✓ How can we estimate that distance?
    • How can we adjust that estimate to account for overfitting?
  29. Everybody overfits

    • A meta-model of forecasting:
      • Two samples: training and testing, each of size N
      • Fit model to training sample, get D_train
      • Use fit to training to compute D_test
      • Difference D_test - D_train is overfitting
  30. Everybody overfits

    • Data generating model:
      $y_i \sim \mathrm{Normal}(\mu_i, […])$
      $\mu_i = ([…])\, x_{1,i} - ([…])\, x_{2,i}$
    • Models fit to data (flat priors):
      $\mu_i = \alpha$
      $\mu_i = \alpha + \beta_1 x_{1,i}$
      $\mu_i = \alpha + \beta_1 x_{1,i} + \beta_2 x_{2,i}$
      $\mu_i = \alpha + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i}$
      $\mu_i = \alpha + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i} + \beta_4 x_{4,i}$
    • lppd = log of product of average likelihoods = sum of logs of average likelihoods

    Book excerpt: […] measured in and out of sample, using a simple prediction scenario. To visualize the results of the thought experiment, what we'll do now is conduct the thought experiment […] thousand times, for each of […] different linear regression […] model that generates the data is the one above. This corresponds to a Gaussian outcome y for which the intercept is $\alpha = […]$ […] for each of two predictors are $\beta_1 = […]$ and $\beta_2 = -[…]$. The models for […] data are linear regressions with between […] and […] free parameters. The first model […] parameter to estimate, is just a linear regression with an unknown mean and […] Each parameter added to the model adds a predictor variable and its beta-coefficient. […] the "true" model has non-zero coefficients for only the first two predictors, we […] the true model has […] parameters. By fitting all five models, with between […] and […] to training samples from the same processes, we can get an impression for […] behaves. Figure 6.7 shows the results of […] thousand simulations for each model type […] different sample sizes. The function that conducts the simulations is sim.train[…]
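The verbal definition of lppd on this slide can be written as a formula. The notation is an assumption introduced here, not taken from the slide: $y_i$ are the $N$ observations and $\theta_s$ are $S$ samples from the posterior, so the "average likelihood" of observation $i$ is its likelihood averaged over those samples.

```latex
\mathrm{lppd}
  = \log \prod_{i=1}^{N} \left( \frac{1}{S} \sum_{s=1}^{S} p(y_i \mid \theta_s) \right)
  = \sum_{i=1}^{N} \log \left( \frac{1}{S} \sum_{s=1}^{S} p(y_i \mid \theta_s) \right)
```

The second equality is just the rule that the log of a product is the sum of the logs, which is the step the slide states in words.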
  31. Everybody overfits

    [Figure 6.7: deviance (vertical axis) against number of parameters (1 to 5, horizontal axis), in sample ("in") and out of sample ("out"), with +1 SD and -1 SD marked. Left panel: N = 20. Right panel: N = 100. The data generating model is marked. Deviance in and out of sample. In each plot, models with different numbers of predictor variables are shown on the horizontal axis. Deviance across […] thousand simulations is shown on the vertical. Blue shows […]]
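The train/test thought experiment behind Figure 6.7 can be sketched in base R. Everything specific here is an assumption for illustration: the slope values 0.15 and -0.4, the unit noise, independent standard-normal predictors, 1000 replications, and least-squares fitting with lm.fit() in place of the course's own simulation function.

```r
set.seed(7)

N <- 20                            # size of each sample
n_sims <- 1000                     # replications (assumed, for speed)
b_true <- c(0, 0.15, -0.4, 0, 0)   # intercept and four slopes (assumed values)

# Deviance: -2 times the summed log-probability of the observations
deviance_of <- function(y, mu, sigma) {
  -2 * sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

simulate_once <- function() {
  # Training and testing samples from the same process
  X_train <- cbind(1, matrix(rnorm(N * 4), N, 4))
  X_test  <- cbind(1, matrix(rnorm(N * 4), N, 4))
  y_train <- as.vector(X_train %*% b_true) + rnorm(N)
  y_test  <- as.vector(X_test %*% b_true) + rnorm(N)

  # Fit models with k = 1..5 coefficients to the training sample only
  sapply(1:5, function(k) {
    Xk_train <- X_train[, 1:k, drop = FALSE]
    Xk_test  <- X_test[, 1:k, drop = FALSE]
    fit <- lm.fit(Xk_train, y_train)
    sigma_hat <- sqrt(mean(fit$residuals^2))
    mu_train <- as.vector(Xk_train %*% fit$coefficients)
    mu_test  <- as.vector(Xk_test %*% fit$coefficients)
    c(deviance_of(y_train, mu_train, sigma_hat),
      deviance_of(y_test, mu_test, sigma_hat))
  })
}

# 2 x 5 x n_sims array, averaged over simulations
sims <- replicate(n_sims, simulate_once())
avg <- apply(sims, c(1, 2), mean)
dimnames(avg) <- list(c("train", "test"), paste0("k=", 1:5))

print(round(avg, 1))
```

In this setup the average training deviance can only fall as coefficients are added, while the average test deviance sits above it for every model; the gap between the two rows is the overfitting that the information criteria aim to estimate.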
  32. Everybody overfits

    [Figure 6.7: deviance in and out of sample against number of parameters (1 to 5), with +1 SD and -1 SD marked; panels N = 20 and N = 100.]