p is true, q is model • How accurate is q, for describing p? • Distance from q to p: Divergence

[Book excerpt, "Information theory and model performance":] For example, suppose the true distribution of events is p₁ = 0.3, p₂ = 0.7. If we believe instead that these events happen with probabilities q₁ = 0.25, q₂ = 0.75, how much additional uncertainty have we introduced as a consequence of using q = {q₁, q₂} to approximate p = {p₁, p₂}? The formal answer to this question is the Kullback-Leibler divergence. It builds upon H and has a similarly simple formula: D_KL(p, q) = Σᵢ pᵢ (log(pᵢ) − log(qᵢ)). In plain language, the divergence is the average difference in log probability between the target (p) and model (q). This divergence is just the difference between two entropies: the entropy of the target distribution p and the cross entropy arising from using q to predict p. When p = q, we know the actual probabilities of the events.

Distance from q to p is the average difference in log-probability.
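A minimal sketch of the divergence computation in plain Python, using p = {0.3, 0.7} and q = {0.25, 0.75} as the example distributions:

```python
import math

def kl_divergence(p, q):
    """D_KL(p, q) = sum_i p_i * (log(p_i) - log(q_i)):
    the average extra uncertainty from using q to approximate p."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

p = [0.3, 0.7]    # true distribution of events
q = [0.25, 0.75]  # model's approximating distribution

dkl = kl_divergence(p, q)
print(round(dkl, 4))  # small divergence: q is a decent approximation of p
```

Note the asymmetry: D_KL(p, q) is a divergence from q to p, not a symmetric distance, and it is exactly zero when q matches p.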
“Deviance” • Smaller values are better • A meta-model of forecasting: • Two samples: training and testing, size N • Fit model to training sample, get Dtrain • Use posterior from training to compute Dtest • Difference Dtest – Dtrain is overfitting
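The train/test meta-model above can be sketched in plain Python. This is an assumption-laden toy: a simple least-squares Gaussian fit stands in for the Bayesian model, and the data-generating process (y = 0.6x plus unit noise) is invented for illustration:

```python
import math, random

random.seed(7)
N = 20  # sample size for both training and testing samples

def simulate(n):
    # assumed toy process: y = 0.6 * x + Normal(0, 1) noise
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [0.6 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

def ols_fit(xs, ys):
    # ordinary least squares: slope, intercept, residual sd (MLE)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    sigma = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n)
    return a, b, sigma

def deviance(params, xs, ys):
    # deviance = -2 * log-likelihood under the Gaussian model
    a, b, sigma = params
    ll = sum(-math.log(sigma) - 0.5 * math.log(2 * math.pi)
             - 0.5 * ((y - (a + b * x)) / sigma) ** 2
             for x, y in zip(xs, ys))
    return -2 * ll

train, test = simulate(N), simulate(N)
fit = ols_fit(*train)          # fit model to training sample only
d_train = deviance(fit, *train)
d_test = deviance(fit, *test)  # score the same fit on fresh data
print(d_train, d_test, d_test - d_train)  # the gap estimates overfitting
```

On average across simulations, D_test exceeds D_train, and the gap grows with model flexibility; any single simulated pair can go either way.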
Use informative, conservative priors to reduce overfitting ⇒ the model learns less from the sample • But if too skeptical, the model learns too little • Such priors are regularizing
Figure 7.9: Regularizing priors and out-of-sample deviance. Two panels (N = 20 and N = 100) plot deviance against number of parameters, in sample and out of sample, for priors N(0,1), N(0,0.5), and N(0,0.2).
[Figure: brain volume (cc) vs. body mass (kg), models m7.1 and m7.4] Cross-validation • Leave out some observations • Train on remaining; score on those left out • Average over many leave-out sets is estimate of out-of-sample accuracy
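The leave-one-out procedure can be sketched with an intercept-only Gaussian model (the model and the three-point toy dataset are assumptions for illustration, not the brain-volume models from the slide):

```python
import math

def gauss_logpdf(y, mu, sigma):
    return -math.log(sigma) - 0.5 * math.log(2 * math.pi) \
           - 0.5 * ((y - mu) / sigma) ** 2

def fit_gaussian(ys):
    # maximum-likelihood mean and sd of an intercept-only Gaussian model
    n = len(ys)
    mu = sum(ys) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in ys) / n)
    return mu, sigma

def loo_score(ys):
    # leave each observation out, fit on the rest, score the held-out point
    scores = []
    for i, y in enumerate(ys):
        rest = ys[:i] + ys[i + 1:]
        mu, sigma = fit_gaussian(rest)
        scores.append(gauss_logpdf(y, mu, sigma))
    return sum(scores) / len(scores)

ys = [1.0, 2.0, 3.0]  # toy data (assumed)
mu, sigma = fit_gaussian(ys)
in_sample = sum(gauss_logpdf(y, mu, sigma) for y in ys) / len(ys)
print(in_sample, loo_score(ys))  # held-out score is worse (lower) than in-sample
```

The gap between the in-sample score and the leave-one-out score is the same overfitting gap as D_test − D_train, estimated without a separate test set.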
Useful approximation: Importance sampling (IS) • More useful: Pareto-smoothed importance sampling (PSIS) • PSIS-LOO is accurate, with lots of useful diagnostics • LOO function in rethinking • See also the loo package • [Photo: Prof Aki Vehtari (Helsinki), "smooth estimator"]
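A minimal sketch of plain importance-sampling LOO (without the Pareto smoothing that PSIS adds to stabilize the weights). The posterior draws, data, and known likelihood sd are all assumptions; a real analysis would take the draws from MCMC:

```python
import math, random

random.seed(42)
ys = [0.2, -0.5, 1.1, 0.3, -0.9]  # toy observations (assumed)
SIGMA = 1.0                        # known likelihood sd (assumed)

def gauss_pdf(y, mu, sigma=SIGMA):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# stand-in "posterior draws" for the mean parameter
mu_draws = [random.gauss(0.0, 0.3) for _ in range(2000)]

def is_loo_elpd(y, draws):
    # importance weights w_s = 1 / p(y | theta_s) reweight full-posterior
    # draws toward the leave-one-out posterior; the IS estimate of the
    # held-out density then reduces to a harmonic mean over draws
    inv = [1.0 / gauss_pdf(y, mu) for mu in draws]
    return math.log(len(draws) / sum(inv))

elpd = sum(is_loo_elpd(y, mu_draws) for y in ys)
print(elpd)  # estimated out-of-sample log score, no refitting required
```

The raw weights 1/p(y|θ) can have huge variance when one observation is influential; PSIS replaces the largest weights with a fitted Pareto tail, and the tail-shape estimate k doubles as the diagnostic the slide mentions.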
Every president elected in a year ending in digit "0" died in office • W. H. Harrison first, "Old Tippecanoe" • Lincoln, Garfield, McKinley, Harding, F. D. Roosevelt • J. F. Kennedy last, assassinated in 1963 • Reagan broke the curse! • Trying all possible models: a formula for overfitting • Be thoughtful • Be honest: admit data exploration
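The "trying all possible models" failure mode is easy to reproduce. In this assumed toy setup, every candidate predictor is pure noise, yet screening many of them and keeping the best in-sample correlation still produces something that looks predictive:

```python
import random

random.seed(0)
n, n_predictors = 12, 200  # small sample, many candidate predictors (assumed)

y = [random.gauss(0, 1) for _ in range(n)]  # outcome is pure noise too

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((v - my) ** 2 for v in ys) ** 0.5
    return sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / (sx * sy)

# screen 200 noise predictors and keep the best in-sample correlation
best = max(abs(corr([random.gauss(0, 1) for _ in range(n)], y))
           for _ in range(n_predictors))
print(best)  # the winning noise predictor correlates strongly with y in sample
```

The winning correlation carries no out-of-sample information at all, which is exactly the presidents' curse: search enough patterns and some pattern will fit.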