[Figure 12.1: Empirical proportions of survivors in each tadpole tank, shown by the filled blue points, plotted with the per-tank estimates from the multilevel model, shown by the black circles. The dashed line locates the overall average proportion of survivors across all tanks. The vertical … Panels: small tanks, medium tanks, large tanks. Horizontal axis: tank. Legend: fixed estimate, multilevel estimate, raw mean, pop mean.]
Population mean not equal to raw empirical mean. Why? Imbalance in amount of evidence across tanks.
change and become more uncertain
• Meaning of parameter changes: no longer mean of data, but rather mean of distribution of intercepts
• Uncertainty larger, because many combinations of alpha, sigma, a[tank]’s can produce same empirical mean of data
[Figure: density of the alpha estimate in the fixed model and in the varying intercept model. Horizontal axis: estimate; vertical axis: Density.]
• Further from mean, more shrinkage
• Fewer data in cluster, more shrinkage
• Same as regression to the mean, really
[Figure: estimated probability of survival by tank. Horizontal axis: tank; vertical axis: probability of survival in tank.]
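The two shrinkage rules above can be checked with a small sketch. It assumes a Gaussian varying-intercept model with known standard deviations, where the partially pooled estimate is a precision-weighted average of the cluster mean and the population mean. That closed form is a textbook simplification, not the model fitted in the lecture, and the function name and all numbers are illustrative.

```python
def partial_pool(cluster_mean, n, pop_mean, sigma_within=1.0, sigma_between=0.5):
    """Precision-weighted compromise between a cluster mean and the population mean.

    Assumes Gaussian data with known within-cluster and between-cluster
    standard deviations (an illustrative simplification).
    """
    w_data = n / sigma_within**2      # precision of the cluster's own mean
    w_pop = 1.0 / sigma_between**2    # precision contributed by the population
    return (w_data * cluster_mean + w_pop * pop_mean) / (w_data + w_pop)

pop_mean = 0.0

# Fewer data in the cluster: the estimate is pulled further toward the population mean.
small = partial_pool(cluster_mean=2.0, n=10, pop_mean=pop_mean)
large = partial_pool(cluster_mean=2.0, n=35, pop_mean=pop_mean)

# Further from the mean: larger absolute shrinkage at the same sample size.
near = partial_pool(cluster_mean=0.5, n=10, pop_mean=pop_mean)
far = partial_pool(cluster_mean=2.0, n=10, pop_mean=pop_mean)

print(f"n=10: 2.0 -> {small:.3f}; n=35: 2.0 -> {large:.3f}")
print(f"shrinkage near the mean: {0.5 - near:.3f}; far from the mean: {2.0 - far:.3f}")
```

In this simplified setting the proportional shrinkage depends only on the sample size, so a cluster twice as far from the mean moves twice as far in absolute terms.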
estimates of other tanks
• The model doesn’t have amnesia!
• Effect of pooling influenced by
  • amount of data in cluster
  • amount of variation among clusters (sigma)
Pool, or the terrorists win
• Result struck many as paradoxical
• Proof was non-Bayesian
• Suggested estimator similar to Bayes’ suggestion (Bayes’ is better)
• Following in Wald’s footsteps

INADMISSIBILITY OF THE USUAL ESTIMATOR FOR THE MEAN OF A MULTIVARIATE NORMAL DISTRIBUTION
CHARLES STEIN
STANFORD UNIVERSITY

1. Introduction

If one observes the real random variables $X_1, \ldots, X_n$ independently normally distributed with unknown means $\xi_1, \ldots, \xi_n$ and variance 1, it is customary to estimate $\xi_i$ by $X_i$. If the loss is the sum of squares of the errors, this estimator is admissible for $n \le 2$, but inadmissible for $n \ge 3$. Since the usual estimator is best among those which transform correctly under translation, any admissible estimator for $n \ge 3$ involves an arbitrary choice. While the results of this paper are not in a form suitable for immediate practical application, the possible improvement over the usual estimator seems to be large enough to be of practical importance if n is large.

Let X be a random n-vector whose expected value is the completely unknown vector $\xi$ and whose components are independently normally distributed with variance 1. We consider the problem of estimating $\xi$ with the loss function L given by

(1) $L(\xi, d) = (\xi - d)^2 = \sum (\xi_i - d_i)^2$

where d is the vector of estimates. In section 2 we give a short proof of the inadmissibility of the usual estimator

(2) $d = \varphi_0(X) = X$,

for $n \ge 3$. For n = 2, the admissibility of $\varphi_0$ is proved in section 4. For n = 1 the admissibility of $\varphi_0$ is well known (see, for example, [1], [2], [3]) and also follows from the result for n = 2. Of course, all of the results concerning this problem apply with obvious modifications if the assumption that the components of X are independently distributed with variance 1 is replaced by the condition that the covariance matrix $\Sigma$ of X is known and nonsingular and the loss function (1) is replaced by

(3) $L(\xi, d) = (\xi - d)' \Sigma^{-1} (\xi - d)$.

Charles Stein (1920–)
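Stein's result can be illustrated by simulation. The sketch below uses the James-Stein shrinkage estimator, which James and Stein published in 1961 and which is not the estimator constructed in the 1956 paper excerpted above. It shrinks toward the origin, and the dimension, true means, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10                                # dimension; the result needs n >= 3
xi = rng.normal(0.0, 1.0, size=n)     # hypothetical true means, fixed across replications
reps = 20_000

# Each row is one draw X ~ Normal(xi, I).
X = xi + rng.normal(0.0, 1.0, size=(reps, n))

# Usual estimator: d = X.
loss_usual = np.sum((X - xi) ** 2, axis=1)

# James-Stein estimator: multiply X by a data-dependent factor below 1.
shrink = 1.0 - (n - 2) / np.sum(X**2, axis=1, keepdims=True)
js = shrink * X
loss_js = np.sum((js - xi) ** 2, axis=1)

risk_usual = loss_usual.mean()        # equals n in expectation
risk_js = loss_js.mean()
print(f"risk of usual estimator:       {risk_usual:.2f}")
print(f"risk of James-Stein estimator: {risk_js:.2f}")
```

The loss is the sum of squared errors over all n components, as in equation (1), so the improvement is for the vector as a whole and not for every component individually.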
more accurate than fixed effects (no pooling)?
• Grand mean: maximum underfitting
• Fixed effects: maximum overfitting
• Varying effects: adaptive regularization
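The three strategies can be compared on simulated data. This sketch assumes Gaussian outcomes with known within-cluster and between-cluster standard deviations, so the partially pooled estimate has a closed form; a real multilevel model would estimate those standard deviations from the data. The cluster sizes and every other number are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n_clusters = 2000
sigma_between = 1.0       # sd of true cluster means (assumed known)
sigma_within = 2.0        # sd of observations within a cluster (assumed known)
sizes = rng.choice([5, 10, 25, 35], size=n_clusters)

true_means = rng.normal(0.0, sigma_between, size=n_clusters)

# Observed cluster means, drawn from the sampling distribution of each mean.
obs_means = rng.normal(true_means, sigma_within / np.sqrt(sizes))

# Grand mean: every cluster gets the same estimate (maximum underfitting).
grand_mean = np.average(obs_means, weights=sizes)
est_pooled = np.full(n_clusters, grand_mean)

# No pooling: every cluster keeps its own mean (maximum overfitting).
est_none = obs_means

# Partial pooling: precision-weighted average of the two.
data_precision = sizes / sigma_within**2
w = data_precision / (data_precision + 1.0 / sigma_between**2)
est_partial = w * obs_means + (1.0 - w) * grand_mean

def mse(est):
    """Mean squared error against the simulated true cluster means."""
    return float(np.mean((est - true_means) ** 2))

print("grand mean:     ", round(mse(est_pooled), 3))
print("no pooling:     ", round(mse(est_none), 3))
print("partial pooling:", round(mse(est_partial), 3))
```

The weight `w` is larger for bigger clusters, which is the sense in which the regularization adapts to the amount of data.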
[Figure 12.3: Error of no-pooling and partial pooling estimates, for the simulated tadpole ponds. The horizontal axis displays pond number. The vertical axis measures the absolute error in the predicted proportion of survivors, compared to the true value used in the simulation. The higher the point … Panels: tiny (5), small (10), medium (25), large (35). Horizontal axis: pond; vertical axis: absolute error.]
When can raw estimate be more accurate than multilevel estimate? Sometimes outliers are really outliers. Can use Student-t or Cauchy (fat tails) to reduce shrinkage
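The effect of fat tails on shrinkage can be seen with a one-parameter grid approximation. The sketch assumes a single observation with unit Gaussian noise and compares a standard normal distribution with a standard Cauchy distribution for the cluster effect. The observation value, the grid, and the function names are arbitrary choices for illustration.

```python
import numpy as np

def posterior_mean(y, log_prior, grid):
    """Posterior mean of theta on a grid, for y ~ Normal(theta, 1)."""
    log_post = -0.5 * (y - grid) ** 2 + log_prior(grid)
    weights = np.exp(log_post - log_post.max())   # subtract max for numerical stability
    return float(np.sum(grid * weights) / np.sum(weights))

def log_normal(theta):
    return -0.5 * theta**2         # Normal(0, 1), up to an additive constant

def log_cauchy(theta):
    return -np.log1p(theta**2)     # Cauchy(0, 1), up to an additive constant

grid = np.linspace(-30.0, 30.0, 60_001)
y = 6.0                            # an outlying observation

mean_normal = posterior_mean(y, log_normal, grid)
mean_cauchy = posterior_mean(y, log_cauchy, grid)
print(f"normal: {mean_normal:.2f}")
print(f"Cauchy: {mean_cauchy:.2f}")
```

Under the normal distribution the outlier is pulled halfway back to zero, while the Cauchy's heavy tails leave it close to where it was observed.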
clusters?
• Same clusters: proceed as usual
• New clusters: should average over distribution of varying effects
• In this case:
  • Same clusters: Predictions for these chimpanzees
  • New clusters: Prediction for a new chimpanzee or rather for population of chimpanzees
as before: varying effects are just parameters; you know the model; push samples back through the model
• link() and sim() obey this rule
• New actors (counterfactual):
  • which actor (cluster) to use for counterfactual predictions?
  • average actor
  • marginal of actor
  • show sample of actors from posterior
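Two of the options for a new actor can be sketched directly from posterior samples. The code below does not call link() or sim(). It assumes a logistic varying-intercept model and substitutes made-up draws of alpha and sigma for a fitted posterior, so the variable names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in posterior samples (hypothetical; a fitted model would supply these).
n_samples = 4000
alpha = rng.normal(0.5, 0.3, size=n_samples)            # mean of actor intercepts
sigma = np.abs(rng.normal(2.0, 0.4, size=n_samples))    # sd among actors

# Average actor: set the new actor's intercept to the population mean alpha.
p_average = inv_logit(alpha)

# Marginal of actor: draw one new intercept per posterior sample,
# which averages over the distribution of varying effects.
a_new = rng.normal(alpha, sigma)
p_marginal = inv_logit(a_new)

def interval(p):
    """89% percentile interval of a vector of probabilities."""
    lo, hi = np.percentile(p, [5.5, 94.5])
    return float(lo), float(hi)

print("average actor, 89% interval:    ", interval(p_average))
print("marginal of actor, 89% interval:", interval(p_marginal))
```

The marginal interval is wider because it carries the variation among actors as well as the uncertainty about alpha. Plotting `inv_logit(a_new)` for a handful of individual draws would correspond to showing a sample of actors from the posterior.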