L07 Statistical Rethinking Winter 2019

Lecture 07 of the Dec 2018 through March 2019 edition of Statistical Rethinking. Covers the back-door criterion and an introduction to overfitting, cross-validation, and information criteria.

Richard McElreath

January 14, 2019

Transcript

  1. Shutting the back door • What ties these examples together?

    • The back-door criterion: confounding is caused by the existence of open back-door paths from X to Y • If you know your elements, you know how to open or close each of them
  2. The Pipe (X → Z → Y): open unless you condition on Z • The Fork (X ← Z → Y): open unless you condition on Z

    • The Collider (X → Z ← Y): closed until you condition on Z • The Descendant (A, a descendant of Z): conditioning on A is like conditioning on Z
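The four rules on this slide fit in a tiny function. A minimal Python sketch; the function name and string labels are hypothetical, and a descendant of Z is handled by passing `conditioned=True`, following the slide's rule that conditioning on A is like conditioning on Z (in practice a weaker version of it).

```python
def path_through_z_is_open(element: str, conditioned: bool = False) -> bool:
    """Is the path between X and Y through Z open, for one elemental relation?

    element: "pipe" (X -> Z -> Y), "fork" (X <- Z -> Y) or "collider" (X -> Z <- Y).
    conditioned: True if we condition on Z, or on a descendant A of Z
    (treated here as conditioning on Z itself; an assumption of this sketch).
    """
    if element in ("pipe", "fork"):
        return not conditioned  # open unless you condition on Z
    if element == "collider":
        return conditioned      # closed until you condition on Z
    raise ValueError(f"unknown element: {element!r}")


for element in ("pipe", "fork", "collider"):
    for conditioned in (False, True):
        state = "open" if path_through_z_is_open(element, conditioned) else "closed"
        print(f"{element:8s} conditioned={conditioned!s:5s} -> {state}")
```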
  3. Two paths from E to W (education E, wages W, unobserved U): (1) E → W (2) E ← U → W

    • Experimentally manipulating E removes the influence of U on E, which blocks the second path • Statistically, we can close the 2nd path by conditioning on U, closing the fork
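A quick simulation shows why the open path E ← U → W matters and why conditioning on U fixes it. A standard-library sketch; the linear Gaussian structure, the assumed effect size 0.5, and the seed are illustrative, not values from the lecture. The slope of W on E alone should land near 1.0, and the slope after adding U near the assumed 0.5.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(7)
n = 10_000
true_effect = 0.5  # assumed causal effect of E on W (illustrative)
U = [random.gauss(0, 1) for _ in range(n)]                # unobserved confound
E = [u + random.gauss(0, 1) for u in U]                    # U -> E
W = [true_effect * e + u + random.gauss(0, 1)              # E -> W <- U
     for e, u in zip(E, U)]

# W ~ E: the back-door path E <- U -> W is open, so the slope is biased
naive = cov(E, W) / cov(E, E)

# W ~ E + U: conditioning on U closes the fork (two-predictor OLS slope for E)
s_ee, s_uu, s_eu = cov(E, E), cov(U, U), cov(E, U)
s_ew, s_uw = cov(E, W), cov(U, W)
adjusted = (s_uu * s_ew - s_eu * s_uw) / (s_ee * s_uu - s_eu ** 2)

print(f"W ~ E:     {naive:.2f}")
print(f"W ~ E + U: {adjusted:.2f}  (assumed truth {true_effect})")
```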
  4. Grandparents (G), parents (P), children (C), and an unobserved confound (U)

    • 3 paths from G to C: (1) G → C (2) G → P → C (3) G → P ← U → C • Condition on P: closes (2) but opens (3), so it will bias inference about G → C even though we never get to measure U
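The collider trap can be simulated the same way. A standard-library sketch under assumed linear Gaussian relations: G → P and P → C with effect 1, U → P and U → C with effect 2, and no direct G → C effect; all numbers are illustrative. Regressing C on G alone recovers the total effect of G (about 1), but adding P opens path (3) and pushes the G coefficient to roughly −0.8, even though G has no direct effect on C.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def two_predictor_slopes(y, x1, x2):
    """OLS slopes for y ~ x1 + x2 (intercept implied), from sample covariances."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


random.seed(11)
n = 10_000
b_GP, b_GC, b_PC, b_U = 1.0, 0.0, 1.0, 2.0   # assumed effect sizes
G = [random.gauss(0, 1) for _ in range(n)]
U = [random.gauss(0, 1) for _ in range(n)]   # never measured
P = [b_GP * g + b_U * u + random.gauss(0, 1) for g, u in zip(G, U)]
C = [b_PC * p + b_GC * g + b_U * u + random.gauss(0, 1)
     for p, g, u in zip(P, G, U)]

total_effect_of_G = cov(G, C) / cov(G, G)              # C ~ G: paths (1) and (2)
g_given_p, p_given_g = two_predictor_slopes(C, G, P)   # C ~ G + P: opens (3)
print(f"C ~ G:     b_G = {total_effect_of_G:.2f}")
print(f"C ~ G + P: b_G = {g_given_p:.2f}, b_P = {p_given_g:.2f}")
```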
  5. Something more interesting • Which variables, if any, should you condition on to infer X → Y?

    • Procedure: (1) Find all paths. (2) Open/close as necessary. • DAG: exposure of interest X, outcome of interest Y, an unobserved variable U, and three observed covariates A, B, and C
  6. Something more interesting • Which variables, if any, should you condition on to infer X → Y?

    • Condition on A or C. Do not condition on B. • Aside from the direct path X → Y, there are two back-door paths: (1) X ← U ← A → C → Y: this path is open. (2) X ← U → B ← C → Y: this path is closed, because B is a collider; conditioning on B would open it.
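The two-step procedure (find all paths, then open or close them) is mechanical enough to code. A minimal standard-library sketch; the edge list is read off the two back-door paths above plus the direct X → Y arrow, and all helper names are hypothetical. A collider blocks a path unless it or one of its descendants is conditioned on; any other interior node blocks the path when conditioned on. U is unobserved, so it cannot be part of an adjustment set.

```python
# Edges of the slide's DAG, as (parent, child) pairs
EDGES = [("A", "U"), ("A", "C"), ("U", "X"), ("U", "B"),
         ("C", "B"), ("C", "Y"), ("X", "Y")]


def parents(node):
    return [a for a, b in EDGES if b == node]


def children(node):
    return [b for a, b in EDGES if a == node]


def descendants(node):
    found, stack = set(), [node]
    while stack:
        for child in children(stack.pop()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def all_paths(path, end):
    """Yield every simple path extending `path` to `end`, ignoring arrow direction."""
    if path[-1] == end:
        yield path
        return
    for nxt in children(path[-1]) + parents(path[-1]):
        if nxt not in path:
            yield from all_paths(path + [nxt], end)


def is_open(path, conditioned):
    """A path is open unless some interior node blocks it."""
    for left, mid, right in zip(path, path[1:], path[2:]):
        if mid in children(left) and mid in children(right):  # collider
            if mid not in conditioned and not descendants(mid) & conditioned:
                return False
        elif mid in conditioned:                               # pipe or fork
            return False
    return True


def backdoor_paths(x, y):
    """Paths from x to y that begin with an arrow pointing into x."""
    return [p for p in all_paths([x], y) if p[1] in parents(x)]


def closes_all_backdoors(conditioned, x="X", y="Y"):
    return not any(is_open(p, conditioned) for p in backdoor_paths(x, y))


for p in backdoor_paths("X", "Y"):
    print(" - ".join(p), "->", "open" if is_open(p, set()) else "closed")
for z in ({"A"}, {"C"}, {"B"}, set()):
    print(sorted(z), "closes all back doors:", closes_all_backdoors(z))
```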
  7. Waffles Requiem • Remember the waffles. • Which to control to infer W → D?

    DAG over A, D, M, S, W: S = South (whether a State is in the southern United States), W = Waffle Houses, M = Marriage, D = Divorce, A = Age at marriage
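One way to see what the waffle DAG implies is to simulate from it with no causal effect of W on D and compare adjustment sets. A standard-library sketch; the edge list (S → A, S → M, S → W, A → M, A → D, M → D, W → D) follows the usual waffle DAG but is an assumption here, and all coefficients are made up. Conditioning on S, or on both A and M, should close every back-door path from W to D, so those slopes land near the assumed zero while the unadjusted slope does not. `ols` is a small hypothetical least-squares helper, not a library function.

```python
import random


def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...] for y ~ xs (normal equations)."""
    rows = [[1.0, *vals] for vals in zip(*xs)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


random.seed(3)
n = 10_000
# Assumed structural equations (illustrative coefficients only)
S = [1.0 if random.random() < 0.5 else 0.0 for _ in range(n)]        # southern state?
A = [-1.0 * s + random.gauss(0, 1) for s in S]                        # S -> A
M = [0.5 * s - 0.5 * a + random.gauss(0, 1) for s, a in zip(S, A)]    # S -> M <- A
W = [2.0 * s + random.gauss(0, 1) for s in S]                         # S -> W
D = [-0.5 * a + 0.5 * m + 0.0 * w + random.gauss(0, 1)                # W has no effect
     for a, m, w in zip(A, M, W)]

print("D ~ W        :", round(ols(D, W)[1], 3))
print("D ~ W + S    :", round(ols(D, W, S)[1], 3))
print("D ~ W + A + M:", round(ols(D, W, A, M)[1], 3))
```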
  9. Implied conditional independence • Given a DAG, we can test some of its implications

    • Conditional independencies are pairs of variables that are not associated once we condition on some set of other variables; find them with the same path logic used for closing back doors • The waffle DAG (A, D, M, S, W) implies three: (1) A and W independent, conditioning on S (2) D and S independent, conditioning on A, M, & W (3) M and W independent, conditioning on S
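Implications (1) and (3) involve a single conditioning variable, so they can be checked with a partial correlation. A standard-library sketch on simulated data; the structural equations are assumptions chosen only to match the waffle DAG's arrows (they are not fit to the real waffle data). A and W, and likewise M and W, should be clearly correlated marginally but nearly uncorrelated once S is conditioned on. Checking (2) would need a multiple regression of D on S, A, M, and W.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def partial_corr(x, y, z):
    """Correlation of x and y after (linearly) conditioning on a single z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


random.seed(5)
n = 10_000
# Assumed structural equations (illustrative coefficients only)
S = [1.0 if random.random() < 0.5 else 0.0 for _ in range(n)]
A = [-1.0 * s + random.gauss(0, 1) for s in S]
M = [0.5 * s - 0.5 * a + random.gauss(0, 1) for s, a in zip(S, A)]
W = [2.0 * s + random.gauss(0, 1) for s in S]

for name, v in (("A", A), ("M", M)):
    print(f"cor({name}, W) = {corr(v, W):+.3f}   "
          f"cor({name}, W | S) = {partial_corr(v, W, S):+.3f}")
```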
  10. Causal inference: hard but possible • Can demonstrate that we are capable of inferring cause

    • Experiments not required! • Experiments not always practical or ethical • Disease, evolution, development, dynamics of popular music, global climate, war • Experiments must choose an intervention • Interventions influence many variables at once • Experimentally manipulate obesity? David Hume (1711–1776) rates your DAG 12/10
  11. More than the Back Door • Closing back doors is not the only option

    • Front-door criterion • Instrumental variables • Example: education E and wages W are confounded by an unobserved cause U; an instrumental variable Q directly influences E but cannot directly influence W, which can make it possible to estimate the influence of E on W even when U cannot be measured • Diagrams: the instrumental-variable DAG (Q, E, U, W) and a front-door DAG (U, X, Y, Z)
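The instrumental-variable idea can be illustrated with the simplest IV estimator, the ratio cov(Q, W) / cov(Q, E), often called the Wald estimator. This is a swapped-in classical estimator, not the Bayesian model used in the course, and the effect sizes are illustrative assumptions. Regressing W on E is biased by the open path E ← U → W; the ratio uses only the variation in E that comes from Q, which is unrelated to U.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(42)
n = 10_000
true_effect = 0.5                                    # assumed effect of E on W
Q = [random.gauss(0, 1) for _ in range(n)]           # instrument
U = [random.gauss(0, 1) for _ in range(n)]           # unobserved confound
E = [q + u + random.gauss(0, 1) for q, u in zip(Q, U)]                # Q -> E <- U
W = [true_effect * e + u + random.gauss(0, 1) for e, u in zip(E, U)]  # E -> W <- U

ols_slope = cov(E, W) / cov(E, E)  # biased: back-door path through U is open
iv_slope = cov(Q, W) / cov(Q, E)   # Wald ratio: Q reaches W only through E
print(f"OLS: {ols_slope:.2f}   IV: {iv_slope:.2f}   assumed truth: {true_effect}")
```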
  12. Directed Acyclic Gaffes • Don’t get cocky • DAGs are small world constructs

    • Residual confounding: • Misclassification • Measurement error • Missingness • DAGs can accommodate these problems, but may tell us there are no solutions • We will see some solutions in later weeks • Eventually need *real* models of the system
  13. Ptolemaic Model and Copernican Model

    Figure: Ptolemaic (left) and Copernican (right) models of the solar system, with Earth and Sun labeled. Both models use epicycles (circles on circles), and both models produce exactly the same predictions. However, the Copernican model …
  14. Ockham’s Razor? William of Ockham (c.1288–c.1348) Numquam ponenda est pluralitas

    sine necessitate. (Plurality should never be posited without necessity.)
  15. Stargazing • Stargazing: Using asterisks (p < 0.05) to select variables

    • Arbitrary 5% is arbitrary • p-values do not regulate accuracy

    Coefficients:
          Estimate     Std. Error   z value      Pr(z)
    a      1.5699e+02  9.3802e-16   1.6736e+17  < 2.2e-16 ***
    b1     1.6540e-01  6.6628e-14   2.4825e+12  < 2.2e-16 ***
    b2    -4.7063e-02  3.2586e-13  -1.4443e+11  < 2.2e-16 ***
    b3     1.9168e-03  5.6805e-11   3.3743e+07  < 2.2e-16 ***
    b4    -1.4002e-05  6.6694e-11  -2.0994e+05  < 2.2e-16 ***
    b5    -4.7965e-07  4.7818e-08  -1.0031e+01  < 2.2e-16 ***
    b6     6.6002e-09  9.5819e-10   6.8882e+00  5.651e-12 ***
    tau    1.2132e-01  5.2829e-20   2.2965e+18  < 2.2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  16. Goals • Understand overfitting and underfitting • Introduce regularization •

    Cross-validation & information criteria: • estimate predictive accuracy • estimate overfitting risk • understand how overfitting relates to complexity • identify influential observations • See that prediction and causal inference are different objectives AIC LOO WAIC
  17. The Problem with Parameters • What should a model learn

    from a sample? • Underfitting: Learning too little from the data. Too simple models both fit and predict poorly. • Overfitting: Learning too much from the data. Complex models tend to fit better, predict worse. • Want to find a model that navigates between underfitting and overfitting • Problem: Fit to sample always* improves as we add parameters *Not true of multilevel models & other types
  18. The Problem with Parameters

    Figure 7.2: Average brain volume in cubic centimeters against body mass in kilograms, for seven hominin species (afarensis, africanus, habilis, boisei, rudolfensis, ergaster, sapiens). What model best describes the relationship between brain size and body size?
  19. Variance “explained” • The most common (and most misused) measure of model fit is R-squared:

    $R^2 = \frac{\operatorname{var}(\text{outcome}) - \operatorname{var}(\text{residuals})}{\operatorname{var}(\text{outcome})} = 1 - \frac{\operatorname{var}(\text{residuals})}{\operatorname{var}(\text{outcome})}$ • Interpretation: proportion of variance explained • How does R-squared behave? Like other measures of fit to sample, it increases as more predictor variables are added, even when the added variables are just random numbers with no relation to the outcome • So it is no good to choose among models using only fit to the data: more complex models fit the data better, but they often predict new data worse
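The definition of R² above, and its habit of never decreasing as predictors are added, can be checked directly. A standard-library sketch; the data are synthetic (one real predictor plus pure-noise columns), and `ols` is a small hypothetical least-squares helper. Each added noise column leaves R² the same or higher, even though it has no relation to the outcome.

```python
import random
from statistics import pvariance


def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...] for y ~ xs (normal equations)."""
    rows = [[1.0, *vals] for vals in zip(*xs)]
    k = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def r_squared(y, *xs):
    """R^2 = 1 - var(residuals) / var(outcome)."""
    coef = ols(y, *xs)
    fitted = [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in zip(*xs)]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return 1 - pvariance(resid) / pvariance(y)


random.seed(2)
n = 30
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 * xi + random.gauss(0, 2) for xi in x]
noise = [[random.gauss(0, 1) for _ in range(n)] for _ in range(5)]  # unrelated to y

r2 = [r_squared(y, x, *noise[:k]) for k in range(6)]
for k, value in enumerate(r2):
    print(f"x + {k} noise columns: R^2 = {value:.3f}")
```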
  20. Hominin brains • Simplest model: a linear regression of brain volume b on standardized body mass m

    $b_i \sim \text{Normal}(\mu_i, \sigma)$, $\mu_i = \alpha + \beta m_i$, with Normal priors on α and β and a log-normal prior on σ • Body mass is standardized (mean zero, standard deviation one); brain volume is rescaled by its largest observed value rather than standardized, to preserve zero as a reference point (no brain at all) • Panels: m7.1 (R^2 = 0.51) and m7.3 (R^2 = 0.69), brain volume (cc) against body mass (kg)
  21. Hominin brains • Why not a parabola?

    • Five more models to compare to m7.1, each a polynomial of higher degree than the last • The second-degree polynomial relating body size to brain size is a parabola; it adds one parameter, β₂, but uses all of the same data as m7.1: $b_i \sim \text{Normal}(\mu_i, \sigma)$, $\mu_i = \alpha + \beta_1 m_i + \beta_2 m_i^2$, with Normal priors on each β_j • Panels: m7.1 (R^2 = 0.51), m7.2 (R^2 = 0.54), m7.3 (R^2 = 0.69), m7.4 (R^2 = 0.82)
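The sequence of polynomial models can be sketched with ordinary least squares instead of the Bayesian fits used on the slides, so the R² values come out close to, not identical to, the ones shown (about 0.49 rather than 0.51 for the linear model). The seven species values are assumed to match the book's hominin data; check them against the text before reuse. Body mass is standardized before taking powers to keep the least-squares problem well conditioned, and `ols` is a small hypothetical helper.

```python
from statistics import mean, pstdev, pvariance

# Species order: afarensis, africanus, habilis, boisei, rudolfensis, ergaster, sapiens
brain_cc = [438, 452, 612, 521, 752, 871, 1350]        # assumed values
mass_kg = [37.0, 35.5, 34.5, 41.5, 55.5, 61.0, 53.5]   # assumed values


def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...] for y ~ xs (normal equations)."""
    rows = [[1.0, *vals] for vals in zip(*xs)]
    k = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def poly_r_squared(degree):
    """In-sample R^2 of a least-squares polynomial in standardized body mass."""
    mu, sd = mean(mass_kg), pstdev(mass_kg)
    z = [(m - mu) / sd for m in mass_kg]
    powers = [[zi ** d for zi in z] for d in range(1, degree + 1)]
    coef = ols(brain_cc, *powers)
    fitted = [coef[0] + sum(b * col[i] for b, col in zip(coef[1:], powers))
              for i in range(len(z))]
    resid = [obs - fit for obs, fit in zip(brain_cc, fitted)]
    return 1 - pvariance(resid) / pvariance(brain_cc)


r2 = {d: poly_r_squared(d) for d in range(1, 7)}
for d, value in r2.items():
    print(f"degree {d}: R^2 = {value:.2f}")
```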
  22. Hominin brains • Why not higher-order polynomials?

    • The sixth-degree polynomial fits perfectly because it has enough parameters to assign one to each data point: $\mu_i = \alpha + \beta_1 m_i + \beta_2 m_i^2 + \beta_3 m_i^3 + \beta_4 m_i^4 + \beta_5 m_i^5 + \beta_6 m_i^6$ has seven parameters, and there are seven species • Where the data are sparse, the predicted mean swings wildly, even to a negative brain size, and the model pays no price (yet) for this absurdity because there are no cases in the data at those body masses • With enough parameters, a model family can fit the data exactly, but it will make rather absurd predictions for yet-to-be-observed cases • Rethinking: model fitting as compression. Parameters summarize relationships among the data, compressing them into a simpler form with loss of information (“lossy” compression); a model with one parameter per data point just restates each observed brain size
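The "one parameter per data point" argument can be made concrete: with seven species and seven coefficients, the sixth-degree polynomial is the unique curve through every point, so it can be found by solving a square linear system instead of a regression. A standard-library sketch using the same assumed species values; it reproduces each observed brain volume (no compression at all) and then evaluates the curve on a grid of body masses, where nothing in the data constrains it.

```python
from statistics import mean, pstdev

brain_cc = [438, 452, 612, 521, 752, 871, 1350]       # assumed values
mass_kg = [37.0, 35.5, 34.5, 41.5, 55.5, 61.0, 53.5]  # assumed values
mu, sd = mean(mass_kg), pstdev(mass_kg)


def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    k = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coef, m):
    z = (m - mu) / sd
    return sum(c * z ** d for d, c in enumerate(coef))


# 7 data points, 7 coefficients (alpha, beta_1..beta_6): a square Vandermonde system
vander = [[((m - mu) / sd) ** d for d in range(7)] for m in mass_kg]
coef = solve(vander, brain_cc)

max_error = max(abs(predict(coef, m) - b) for m, b in zip(mass_kg, brain_cc))
grid = [34.5 + 0.5 * i for i in range(54)]  # 34.5 kg to 61.0 kg
lowest = min(grid, key=lambda m: predict(coef, m))
print(f"largest miss at the data: {max_error:.2e} cc")
print(f"lowest prediction on the grid: {predict(coef, lowest):.0f} cc at {lowest} kg")
```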
  23. Figure 7.3: Polynomial linear models of increasing degree for the hominin data

    Panels (brain volume (cc) against body mass (kg)): m7.1 R^2 = 0.51 • m7.2 R^2 = 0.54 • m7.3 R^2 = 0.69 • m7.4 R^2 = 0.82 • m7.5 R^2 = 0.99 • m7.6 R^2 = 1
  28. Figure 7.5 • Underfitting: insensitive to exact data • Overfitting: very sensitive to exact data

    Figure 7.5: Underfitting and overfitting as under-sensitivity and over-sensitivity to sample. In both plots, a regression is fit to the seven sets of data made by dropping one row from the original data. Left: an underfit model is insensitive to the sample, changing little as individual points are dropped. Panels: m7.1 (left) and m7.4 (right), brain volume (cc) against body mass (kg).
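The drop-one-row experiment behind this figure can be sketched with least-squares fits: refit each model to the seven data sets made by dropping one species, then measure how far apart the refitted curves are. A standard-library sketch with the same assumed species values; `ols`, `fit_poly`, and `spread` are hypothetical helpers, and OLS stands in for the Bayesian fits on the slide. The fourth-degree polynomial's refits should disagree far more than the straight line's.

```python
from statistics import mean, pstdev

brain_cc = [438, 452, 612, 521, 752, 871, 1350]       # assumed values
mass_kg = [37.0, 35.5, 34.5, 41.5, 55.5, 61.0, 53.5]  # assumed values
mu, sd = mean(mass_kg), pstdev(mass_kg)


def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...] for y ~ xs (normal equations)."""
    rows = [[1.0, *vals] for vals in zip(*xs)]
    k = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def fit_poly(ms, bs, degree):
    z = [(m - mu) / sd for m in ms]
    return ols(bs, *[[zi ** d for zi in z] for d in range(1, degree + 1)])


def predict(coef, m):
    z = (m - mu) / sd
    return sum(c * z ** d for d, c in enumerate(coef))


def spread(degree, grid):
    """Largest disagreement, over the grid, among the 7 drop-one-row refits."""
    fits = [fit_poly(mass_kg[:i] + mass_kg[i + 1:], brain_cc[:i] + brain_cc[i + 1:], degree)
            for i in range(len(mass_kg))]
    return max(max(predict(c, m) for c in fits) - min(predict(c, m) for c in fits)
               for m in grid)


grid = [35 + 0.5 * i for i in range(51)]  # 35 kg to 60 kg
spread_linear, spread_quartic = spread(1, grid), spread(4, grid)
print(f"degree 1 (like m7.1): refits differ by up to {spread_linear:.0f} cc")
print(f"degree 4 (like m7.4): refits differ by up to {spread_quartic:.0f} cc")
```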
  30. Importance of being regular • Want the regular features of

    the sample • Strategies • Regularizing priors (penalized likelihood) • Cross-validation • Information criteria • Science! • Proper approach depends upon purpose • Answers are never only in the data, but they do usually require data
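A regularizing prior and a penalized likelihood are two views of the same move: with a Normal(0, τ) prior on each slope and a fixed residual scale σ, the posterior mode is the ridge-regression estimate with penalty λ = σ²/τ². A standard-library sketch of that correspondence, assuming standardized predictors and a centered outcome (so no intercept is needed); the toy data and λ values are illustrative, and `ridge`, `solve`, and `standardize` are hypothetical helpers. Stronger penalties, meaning tighter priors, pull the coefficients toward zero.

```python
import math
import random
from statistics import mean, pstdev


def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    k = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def ridge(xs, y, lam):
    """Minimize sum((y - X b)^2) + lam * sum(b^2); xs are standardized columns, y is centered."""
    k = len(xs)
    xtx = [[sum(a * b for a, b in zip(xs[i], xs[j])) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(col, y)) for col in xs]
    return solve(xtx, xty)


def standardize(v):
    mu, sd = mean(v), pstdev(v)
    return [(x - mu) / sd for x in v]


random.seed(8)
n = 20
xs = [standardize([random.gauss(0, 1) for _ in range(n)]) for _ in range(6)]
y_raw = [1.5 * a + random.gauss(0, 1) for a in xs[0]]  # only the first column matters
y_bar = mean(y_raw)
y = [v - y_bar for v in y_raw]

lambdas = [0.0, 1.0, 10.0, 100.0]  # lambda = sigma^2 / tau^2 under the Normal-prior view
norms = []
for lam in lambdas:
    b = ridge(xs, y, lam)
    norms.append(math.sqrt(sum(bi ** 2 for bi in b)))
    print(f"lambda = {lam:6.1f}  coefficients = {[round(bi, 2) for bi in b]}")
```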
  31. The road to CV & WAIC • How to measure accuracy?

    • How to measure distance from the target? • How can we estimate that distance? • How can we predict accuracy on new data?
  32. Information theory • Machine prediction obeys information theory • Information:

    Reduction in uncertainty caused by learning an outcome. Today Tomorrow ?
  33. Information theory • Information: Reduction in uncertainty caused by learning an outcome.

    • How to quantify uncertainty? The measure should be: 1. Continuous 2. Increasing with the number of possible events 3. Additive • These criteria are intuitive, but their effectiveness is why we keep using them
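The three desiderata can be checked numerically for the measure that satisfies them, information entropy H(p) = −Σ p_i log p_i. A standard-library sketch; the weather probabilities are illustrative assumptions. Uncertainty grows with the number of equally likely events, a tiny change in p gives a tiny change in H, and for two independent features (rain or shine, hot or cold) the uncertainty over the four combinations equals the sum of the separate uncertainties.

```python
import math
from itertools import product


def entropy(p):
    """Information entropy in nats; events with p_i = 0 contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


# Increasing with the number of possible (equally likely) events: H = log(n)
uniform = [entropy([1 / n] * n) for n in range(1, 7)]

# Additive: independent rain/shine and hot/cold (illustrative probabilities)
rain_shine = [0.3, 0.7]
hot_cold = [0.6, 0.4]
combos = [a * b for a, b in product(rain_shine, hot_cold)]  # rain-hot, rain-cold, ...

# Continuous: nudging p by 1e-4 changes H only slightly
nudge = abs(entropy([0.3, 0.7]) - entropy([0.3001, 0.6999]))

print("H of uniform over 1..6 events:", [round(h, 3) for h in uniform])
print("H(joint) =", round(entropy(combos), 6),
      " H(rain) + H(hot) =", round(entropy(rain_shine) + entropy(hot_cold), 6))
print("change in H after nudging p by 1e-4:", nudge)
```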
  34. Information entropy • In 1948, Claude Shannon derived information entropy:

    $H(p) = -\mathrm{E}\log(p_i) = -\sum_{i=1}^{n} p_i \log(p_i)$ • For n possible events, where event i has probability p_i: uncertainty in a probability distribution is the average (minus) log-probability of an event • An “event” might be a type of weather (rain or shine), a particular species of bird, or even a particular nucleotide in a DNA sequence • Claude Shannon (1916–2001)
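The formula translates directly into code. A standard-library sketch using natural logarithms (so entropy is in nats; another base only rescales it) and the usual convention that 0 · log 0 = 0; the example distributions are illustrative. A nearly certain forecast carries little uncertainty, while spreading probability over more events carries more.

```python
import math


def entropy(p):
    """H(p) = -sum_i p_i log(p_i), in nats, with 0 * log(0) taken as 0."""
    if abs(sum(p) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


examples = {
    "rain 0.3 / shine 0.7": [0.3, 0.7],
    "rain 0.01 / shine 0.99": [0.01, 0.99],
    "sun 0.7 / rain 0.15 / snow 0.15": [0.7, 0.15, 0.15],
}
for label, p in examples.items():
    print(f"{label:32s} H = {entropy(p):.3f}")
# expected (rounded): 0.611, 0.056, 0.819
```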
  35. Entropy to accuracy • Two probability distributions: p, q • p is true, q is model

    • How accurate is q for describing p? • Distance from q to p: divergence, $D_{KL}(p, q) = \sum_i p_i \left(\log(p_i) - \log(q_i)\right)$ • Distance from q to p is the average difference in log-probability between the target (p) and the model (q) • Equivalently, it is the cross entropy from using q to predict p minus the entropy of p itself; when p = q, the divergence is zero
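The divergence formula can be computed alongside the two entropies it connects. A standard-library sketch; the distributions p, q_near, and q_far are illustrative values, not taken from the slide. It shows that the divergence is zero when q = p, grows as q moves away from p, equals cross entropy minus entropy, and is not symmetric (so it is a "distance" only loosely).

```python
import math


def entropy(p):
    """H(p) = -sum_i p_i log(p_i), in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log(q_i): average log-probability the model q assigns to events from p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """D_KL(p, q) = sum_i p_i (log p_i - log q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


p = [0.3, 0.7]          # "true" distribution (illustrative)
q_near = [0.25, 0.75]   # model close to p (illustrative)
q_far = [0.9, 0.1]      # model far from p (illustrative)

for name, q in (("p", p), ("q_near", q_near), ("q_far", q_far)):
    print(f"D_KL(p, {name}) = {kl_divergence(p, q):.4f}")
print(f"D_KL(q_far, p) = {kl_divergence(q_far, p):.4f}  (not symmetric)")
```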