Richard McElreath
February 18, 2019
2.3k

L17 Statistical Rethinking Winter 2019

Lecture 17 of the Dec 2018 through March 2019 edition of Statistical Rethinking. Covers Chapter 14, varying slopes and other covariance models.

Richard McElreath

February 18, 2019

Transcript

Week 9
2. Kinds of varying effects • Varying intercepts: means differ by

cluster • Varying slopes: effects of predictors vary by cluster • Any parameter can be made into a varying effect • (1) split into vector of parameters by cluster • (2) define population distribution -6 -4 -2 0 2 4 6 -6 -4 -2 0 2 4 6 -6 -4 -2 0 2 4 6 -6 -4 -2 0 2 4 6
3. Varying slopes • Why varying slopes? • drugs affect people

differently • after school programs don’t work for everyone • not every unit has same relationship to predictor • variation is important, whether for intervention or inference • Average effect misleading? • Pooling, shrinkage, mnesia
4. Population pooling • Major innovation with varying slopes is pooling

across parameters (intercepts & slopes) • Features of units have correlation structure • Learn one feature —> info about other features • e.g. Higher intercepts associated with smaller slopes
5. Café Robot • Robot programmed to visit cafés, order coffee,

record wait time • Visits in morning and afternoon • Intercepts: avg morning wait • Slopes: avg difference btw afternoon and morning • Are intercepts and slopes related? • Yes => pooling across parameter types!   .6-5* 2 4 6 8 wait time (minutes) M A M A M A M A M A 2 4 6 8 wait time (minutes) M A M A M A M A M A Café A Café B
6. Population of Cafés -3 -2 -1 0 1 2 3

0.0 0.2 0.4 intercept Density -3 -2 -1 0 1 2 3 0.0 0.4 0.8 slope Density -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 intercept slope intercepts slopes population
7. Population of Cafés • 2-dimensional Gaussian distribution • vector of

means • variance-covariance matrix -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 intercept slope  .6-5*-&7&- .0%&-4 **  JT UIF NFBO JOUFSDFQU UIF XBJU JO UIF NPSOJOH "OE UIF WBMVF JO  JT ČFSFODF JO XBJU CFUXFFO BęFSOPPO BOE NPSOJOH BODFT BOE DPWBSJBODFT JT BSSBOHFE MJLF UIJT σ α σασβρ σασβρ σ β QUT JT σ α BOE UIF WBSJBODF JO TMPQFT JT σ β  ćFTF BSF GPVOE BMPOH UIF ćF PUIFS UXP FMFNFOUT PG UIF NBUSJY BSF UIF TBNF σασβρ ćJT JT UIF FSDFQUT BOE TMPQFT *UT KVTU UIF QSPEVDU PG UIF UXP TUBOEBSE EFWJBUJPOT intercepts variance slopes variance covariance correlation
8. Simulated Cafés   .6-5*-&7&- .0%&-4 ** 2 3 4

5 6 -2.0 -1.5 -1.0 -0.5 intercepts (a_cafe) slopes (b_cafe) 'ĶĴłĿĲ ƉƋƊ  DBGÏT TBN UJTUJDBM QPQVMBUJPO ćF UIF JOUFSDFQU BWFSBHF NPSO DBGF ćF WFSUJDBM BYJT JT EJČFSFODF CFUXFFO BęFSOP XBJU GPS FBDI DBGÏ ćF HSB UIF NVMUJWBSJBUF (BVTTJBO Q DFQUT BOE TMPQFT 20 cafés 5 days morning & afternoon 200 observations
9. Varying slopes model   "%7&/563&4 */ \$07"3*"/\$& WBSZJOH JOUFSDFQUT

ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBOBUJPO UP GPMMPX 8J ∼ /PSNBM(µJ, σ) µJ = αİĮĳ˦[J] + βİĮĳ˦[J] "J αİĮĳ˦ βİĮĳ˦ ∼ .7/PSNBM α β , 4 [population o 4 = σα   σβ 3 σα   σβ [construct co α ∼ /PSNBM(, ) [prior for a β ∼ /PSNBM(−, .) [prior fo σ ∼ &YQPOFOUJBM() [prior std σα ∼ &YQPOFOUJBM() [prior stddev a σβ ∼ &YQPOFOUJBM() [prior stdde 3 ∼ -,+DPSS() [prior for c ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJOU JO UIF CPPL
10. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

8J ∼ /PSNBM(µJ, σ) µJ = αİĮĳ˦[J] + βİĮĳ˦[J] "J αİĮĳ˦ βİĮĳ˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ varying intercepts varying slopes
11. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

8J ∼ /PSNBM(µJ, σ) µJ = αİĮĳ˦[J] + βİĮĳ˦[J] "J αİĮĳ˦ βİĮĳ˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ multivariate prior
12. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

8J ∼ /PSNBM(µJ, σ) µJ = αİĮĳ˦[J] + βİĮĳ˦[J] "J αİĮĳ˦ βİĮĳ˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ pop avg intercept pop avg slope covariance matrix
13. Covariance matrix shuffle • m-by-m covariance matrix requires estimating •

m standard deviations (or variances) • (m2 – m)/2 correlations (for covariances) • total of m(m + 1)/2 parameters • Several ways specify priors • Conjugate: inverse-Wishart (inv_wishart) • inverse-Wishart cannot pull apart stddev and correlations • Better to decompose: α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK ∼ /PSNBM(, σ) K = ... σ ∼ \$BVDIZ(, ) "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S R {
14. Covariance matrix shuffle • m-by-m covariance matrix requires estimating •

m standard deviations (or variances) • (m2 – m)/2 correlations (for covariances) • total of m(m + 1)/2 parameters • Several ways specify priors • Conjugate: inverse-Wishart (inv_wishart) • inverse-Wishart cannot pull apart stddev and correlations • Better to decompose: α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK ∼ /PSNBM(, σ) K = ... σ ∼ \$BVDIZ(, ) "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S R {
15. Matrixes are nice • Matrix algebra just shortcuts for working

with lists of numbers • A few simple rules • Can you make an omelet? You can multiply matrixes.
16. a b c d A B C D = αıĲĽŁ[J]

+ βıĲĽŁ[J] NJ ∼ .7/PSNBM α β , 4 = σα   σβ 3 σα   σβ ∼ /PSNBM(, ) βıĲĽŁ β 4 = σα   σβ 3 σ  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) (σα, σβ) ∼ )BMG\$BVDIZ(, ) MPHJU(QJ) = αıĲĽŁ[J] + β αıĲĽŁ βıĲĽŁ ∼ .7/PSNBM 4 = σα   σβ α ∼ /PSNBM(, Ł β 4 = σα   σβ 3 σα  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) β) ∼ )BMG\$BVDIZ(, )
17. a b c d A B C D = αıĲĽŁ[J]

+ βıĲĽŁ[J] NJ ∼ .7/PSNBM α β , 4 = σα   σβ 3 σα   σβ ∼ /PSNBM(, ) βıĲĽŁ β 4 = σα   σβ 3 σ  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) (σα, σβ) ∼ )BMG\$BVDIZ(, ) MPHJU(QJ) = αıĲĽŁ[J] + β αıĲĽŁ βıĲĽŁ ∼ .7/PSNBM 4 = σα   σβ α ∼ /PSNBM(, Ł β 4 = σα   σβ 3 σα  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) β) ∼ )BMG\$BVDIZ(, )
18. a b c d A B C D = αıĲĽŁ[J]

+ βıĲĽŁ[J] NJ ∼ .7/PSNBM α β , 4 = σα   σβ 3 σα   σβ ∼ /PSNBM(, ) βıĲĽŁ β 4 = σα   σβ 3 σ  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) (σα, σβ) ∼ )BMG\$BVDIZ(, ) MPHJU(QJ) = αıĲĽŁ[J] + β αıĲĽŁ βıĲĽŁ ∼ .7/PSNBM 4 = σα   σβ α ∼ /PSNBM(, Ł β 4 = σα   σβ 3 σα  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) β) ∼ )BMG\$BVDIZ(, )
19. a b c d A B Aa + Bc C

D = αıĲĽŁ[J] + βıĲĽŁ[J] NJ ∼ .7/PSNBM α β , 4 = σα   σβ 3 σα   σβ ∼ /PSNBM(, ) βıĲĽŁ β 4 = σα   σβ 3 σ  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) (σα, σβ) ∼ )BMG\$BVDIZ(, ) MPHJU(QJ) = αıĲĽŁ[J] + β αıĲĽŁ βıĲĽŁ ∼ .7/PSNBM 4 = σα   σβ α ∼ /PSNBM(, Ł β 4 = σα   σβ 3 σα  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) β) ∼ )BMG\$BVDIZ(, )
20. a b c d A B Aa + Bc Ab

+ Bd C D Ca + Dc Cb + Dd = αıĲĽŁ[J] + βıĲĽŁ[J] NJ ∼ .7/PSNBM α β , 4 = σα   σβ 3 σα   σβ ∼ /PSNBM(, ) βıĲĽŁ β 4 = σα   σβ 3 σ  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) (σα, σβ) ∼ )BMG\$BVDIZ(, ) MPHJU(QJ) = αıĲĽŁ[J] + β αıĲĽŁ βıĲĽŁ ∼ .7/PSNBM 4 = σα   σβ α ∼ /PSNBM(, Ł β 4 = σα   σβ 3 σα  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) β) ∼ )BMG\$BVDIZ(, )
21. Matrixes are nice N αK βNK ∼ .7/PSNBM  

, Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S
22. Matrixes are nice N αK βNK ∼ .7/PSNBM  

, Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S N αK ∼ /PSNBM(, σ) K = ... σ ∼ \$BVDIZ(, ) "JK ∼ #JOPNJBM(OJ, QJK) JU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK NK ∼ .7/PSNBM   , Σ K = ... ρσα σβ σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 ?
23. Matrixes are nice N αK βNK ∼ .7/PSNBM  

, Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S N αK ∼ /PSNBM(, σ) K = ... σ ∼ \$BVDIZ(, ) "JK ∼ #JOPNJBM(OJ, QJK) JU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK NK ∼ .7/PSNBM   , Σ K = ... ρσα σβ σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 σα ρσα ρσβ σβ
24. Matrixes are nice N αK βNK ∼ .7/PSNBM  

, Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 σα ρσα ρσβ σβ ?
25. Matrixes are nice N αK βNK ∼ .7/PSNBM  

, Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 σα ρσα ρσβ σβ σα ρσα ρσβ σβ σ α ρσα σβ ρσα σβ σ β
26. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

8J ∼ /PSNBM(µJ, σ) µJ = αİĮĳ˦[J] + βİĮĳ˦[J] "J αİĮĳ˦ βİĮĳ˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ build cov matrix
27. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

8J ∼ /PSNBM(µJ, σ) µJ = αİĮĳ˦[J] + βİĮĳ˦[J] "J αİĮĳ˦ βİĮĳ˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ fixed (non-adaptive) priors
28. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

8J ∼ /PSNBM(µJ, σ) µJ = αİĮĳ˦[J] + βİĮĳ˦[J] "J αİĮĳ˦ βİĮĳ˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ correlation matrix prior
29. LKJ Correlation prior • After Lewandowski, Kurowicka, and Joe (LKJ)

2009 • One parameter, eta, specifies concentration or dispersion from identity matrix (zero correlations) • eta = 1, uniform correlation matrices • eta > 1, stomps on extreme correlations • eta < 1, elevates extreme correlations  7"3:*/( 4-0 -1.0 -0.5 0.0 0.5 1.0 0.0 0.2 0.4 0.6 0.8 1.0 correlation Density eta=4 eta=2 eta=1 PG JU BT B SFHVMBSJ[JOH QSJPS GPS DPSSFMBUJPOT DPOUSPMT IPX TLFQUJDBM UIF QSJPS JT PG MBSHF
30. Varying slopes estimation m14.1 <- ulam( alist( wait ~ normal(

mu , sigma ), mu <- a_cafe[cafe] + b_cafe[cafe]*afternoon, c(a_cafe,b_cafe)[cafe] ~ multi_normal( c(a,b) , Rho , sigma_cafe ), a ~ normal(5,2), b ~ normal(-1,0.5), sigma_cafe ~ exponential(1), sigma ~ exponential(1), Rho ~ lkj_corr(2) ) , data=d , chains=4 , cores=4 )
31. Varying slopes estimation m14.1 <- ulam( alist( wait ~ normal(

mu , sigma ), mu <- a_cafe[cafe] + b_cafe[cafe]*afternoon, c(a_cafe,b_cafe)[cafe] ~ multi_normal( c(a,b) , Rho , sigma_cafe ), a ~ normal(5,2), b ~ normal(-1,0.5), sigma_cafe ~ exponential(1), sigma ~ exponential(1), Rho ~ lkj_corr(2) ) , data=d , chains=4 , cores=4 )
32. Varying slopes estimation m14.1 <- ulam( alist( wait ~ normal(

mu , sigma ), mu <- a_cafe[cafe] + b_cafe[cafe]*afternoon, c(a_cafe,b_cafe)[cafe] ~ multi_normal( c(a,b) , Rho , sigma_cafe ), a ~ normal(5,2), b ~ normal(-1,0.5), sigma_cafe ~ exponential(1), sigma ~ exponential(1), Rho ~ lkj_corr(2) ) , data=d , chains=4 , cores=4 )
33. Varying slopes estimation m14.1 <- ulam( alist( wait ~ normal(

mu , sigma ), mu <- a_cafe[cafe] + b_cafe[cafe]*afternoon, c(a_cafe,b_cafe)[cafe] ~ multi_normal( c(a,b) , Rho , sigma_cafe ), a ~ normal(5,2), b ~ normal(-1,0.5), sigma_cafe ~ exponential(1), sigma ~ exponential(1), Rho ~ lkj_corr(2) ) , data=d , chains=4 , cores=4 )
34. Posterior correlation QFFL BU UIF SBX 4UBO DPEF XJUI ./)*

ǿ(ǎǑǡǎȀ ćF OBNF (0'/\$Ǿ)*-(' J VTFT JO JUT SBX DPEF ćF TJNJMBS 3 GVODUJPOT BSF (1)*-( BOE (1)*-(Ǐ /PX JOTUFBE PG MPPLJOH BU UIF NBSHJOBM QPTUFSJPS EJTUSJCVUJPOT JO UIF +- \$. HP TUSBJHIU UP JOTQFDUJOH UIF QPTUFSJPS EJTUSJCVUJPO PG WBSZJOH FČFDUT 'JSTU MFUT F QPTUFSJPS DPSSFMBUJPO CFUXFFO JOUFSDFQUT BOE TMPQFT 3 DPEF  +*./ ʚǶ 3/-/ǡ.(+' .ǿ(ǎǑǡǎȀ  ).ǿ +*./ɶ#*ȁǢǎǢǏȂ Ȁ ćF SFTVMU JT TIPXO JO 'ĶĴłĿĲ Ɖƌƌ XJUI TPNF BEEJUJPOBM EFDPSBUJPO BOE UIF UIF QSJPS GPS DPNQBSJTPO ćF CMVF EFOTJUZ JT UIF QPTUFSJPS EJTUSJCVUJPO PG UIF CFUXFFO JOUFSDFQUT BOE TMPQFT ćF QPTUFSJPS JT DPODFOUSBUFE PO OFHBUJWF WBMVFT NPEFM IBT MFBSOFE UIF OFHBUJWF DPSSFMBUJPO ZPV DBO TFF JO 'ĶĴłĿĲ ƉƌƊ ,FFQ JO UIF NPEFM EJE OPU HFU UP TFF UIF USVF JOUFSDFQUT BOE TMPQFT "MM JU IBE UP XPSL GS PCTFSWFE XBJU UJNFT JO NPSOJOH BOE BęFSOPPO *G ZPV BSF DVSJPVT BCPVU UIF JNQBDU PG UIF QSJPS UIFO ZPV TIPVME DIBOHF UI SFQFBU UIF BOBMZTJT * TVHHFTU USZJOH B ĘBU QSJPS -,+DPSS  BOE UIFO B NPSF TU MBSJ[JOH QSJPS MJLF -,+DPSS  PS -,+DPSS   /FYU DPOTJEFS UIF TISJOLBHF ćF NVMUJMFWFM NPEFM FTUJNBUFT QPTUFSJPS EJTUS JOUFSDFQUT BOE TMPQFT PG FBDI DBGÏ ćF JOGFSSFE DPSSFMBUJPO CFUXFFO UIFTF WBS XBT VTFE UP QPPM JOGPSNBUJPO BDSPTT UIFN ćJT JT KVTU BT UIF JOGFSSFE WBSJBUJPO UFSDFQUT QPPMT JOGPSNBUJPO BNPOH UIFN BT XFMM BT IPX UIF JOGFSSFE WBSJBUJPO BN QPPMT JOGPSNBUJPO BNPOH UIFN "MM UPHFUIFS UIF WBSJBODFT BOE DPSSFMBUJPO E GFSSFE NVMUJWBSJBUF (BVTTJBO QSJPS GPS UIF WBSZJOH FČFDUT "OE UIJT QSJPS MFBSO   "%7&/563&4 */ \$07"3*"/\$& -1.0 -0.5 0.0 0.5 1.0 0.0 0.5 1.0 1.5 2.0 2.5 correlation Density posterior prior 'ĶĴłĿĲ Ɖƌƌ 1PTUFSJP SFMBUJPO CFUXFFO JOUFS 1PTUFSJPS EJTUSJCVUJPO BCMZ CFMPX [FSP %BT UIF -,+DPSS  EFOTJUZ
35. Posterior shrinkage  "%7&/56 -1.0 -0.5 0.0 0.5 1.0 0.0

0.5 1.0 1.5 2.0 2.5 correlation Density posterior prior QFFL BU UIF SBX 4UBO DPEF XJUI ./)* ǿ   "%7&/563&4 */ \$07"3*"/\$& 2 3 4 5 6 -2.0 -1.5 -1.0 -0.5 intercept slope 2 3 4 5 6 1 2 3 4 morning wait afternoon wait 'ĶĴłĿĲ Ɖƌƍ 4ISJOLBHF JO UXP EJNFOTJPOT -Fę 3BX VOQPPMFE JOUFSDFQUT BOE TMPQFT ĕMMFE CMVF DPNQBSFE UP QBSUJBMMZ QPPMFE QPTUFSJPS NFBOT PQFO DJSDMFT  ćF HSBZ DPOUPVST TIPX UIF JOGFSSFE QPQVMBUJPO PG WBSZJOH FČFDUT
36. 2 3 4 5 6 -2.0 -1.5 -1.0 -0.5 intercept

slope 1 2 3 4 afternoon wait 10 30 50 80 99
37. parameter scale outcome scale   "%7&/563&4 */ \$07"3*"/\$& 2

3 4 5 6 -2.0 -1.5 -1.0 -0.5 intercept slope 2 3 4 5 6 1 2 3 4 morning wait afternoon wait 'ĶĴłĿĲ Ɖƌƍ 4ISJOLBHF JO UXP EJNFOTJPOT -Fę 3BX VOQPPMFE JOUFSDFQUT BOE TMPQFT ĕMMFE CMVF DPNQBSFE UP QBSUJBMMZ QPPMFE QPTUFSJPS NFBOT PQFO   "%7&/563&4 */ \$07"3*"/\$& 2 3 4 5 6 -2.0 -1.5 -1.0 -0.5 intercept slope 2 3 4 5 6 1 2 3 4 morning wait afternoon wait 'ĶĴłĿĲ Ɖƌƍ 4ISJOLBHF JO UXP EJNFOTJPOT -Fę 3BX VOQPPMFE JOUFSDFQUT BOE TMPQFT ĕMMFE CMVF DPNQBSFE UP QBSUJBMMZ QPPMFE QPTUFSJPS NFBOT PQFO
38. Multi-dimensional shrinkage • Joint distribution of varying effects pools information

across intercepts & slopes • Correlation btw effects => shrinkage in one dimension induces shrinkage in others • Improved accuracy, just like varying intercepts   "%7&/563&4 */ 2 3 4 5 6 -2.0 -1.5 -1.0 -0.5 intercept slope afternoon wait 'ĶĴłĿĲ Ɖƌƍ 4ISJOLBHF JO UXP EJNFOTJP BOE TMPQFT ĕMMFE CMVF DPNQBSFE UP QBSU DJSDMFT  ćF HSBZ DPOUPVST TIPX UIF JOGF 3JHIU ćF TBNF FTUJNBUFT PO UIF PVUDPN
39. Many effects, many clusters • Let’s try something more ambitious

• Chimpanzees data • 4 treatments • Each can vary by actor or block  "%7&/563&4 */ \$07"3*"/\$& BOE JUT MJOFBS NPEFM MPPLT MJLF -J ∼ #JOPNJBM(, QJ) MPHJU(QJ) = γŁĶı[J] + αĮİŁļĿ[J],ŁĶı[J] + βįĹļİĸ[J],ŁĶı[J] NPEFM GPS MPHJU(QJ) DPOUBJOT BO BWFSBHF MPHPEET GPS FBDI USFBUNFOU γŁ BDI BDUPS JO FBDI USFBUNFOU αĮİŁļĿ[J],ŁĶı[J] BOE ĕOBMMZ BO FČFDU GPS FBDI C NFOU βįĹļİĸ[J],ŁĶı[J]  ćJT JT FTTFOUJBMMZ BO JOUFSBDUJPO NPEFM UIBU BMMPXT UIF F NFOU UP WBSZ CZ FBDI BDUPS BOE FBDI CMPDL ćJT JT UP TBZ UIBU UIF BWFSBHF USF Mean treatment effects Each actor in each treatment Each block in each treatment
40. Covariance matrixes • One matrix for each cluster (actor, block)

• 7*4 = 28 actor parameters • 6*4 = 24 block parameters • Each covariance matrix: 6 correlations + 4 sigmas • Total: 28 + 24 + 20 + 4 = 76 parameters GPS MPHJU(QJ) DPOUBJOT BO BWFSBHF MPHPEET GPS FBDI USFBUNFOU γŁĶı[J] BO PS JO FBDI USFBUNFOU αĮİŁļĿ[J],ŁĶı[J] BOE ĕOBMMZ BO FČFDU GPS FBDI CMPDL JO Ĺļİĸ[J],ŁĶı[J]  ćJT JT FTTFOUJBMMZ BO JOUFSBDUJPO NPEFM UIBU BMMPXT UIF FČFDU PG WBSZ CZ FBDI BDUPS BOE FBDI CMPDL ćJT JT UP TBZ UIBU UIF BWFSBHF USFBUNFOU CMPDL BOE UIF FBDI JOEJWJEVBM DIJNQBO[FF DBO BMTP SFTQPOE BDSPTT CMPDLT EJČFSFOUMZ ćJT ZJFMET B UPUBM PG +×+× =  QBSBNFUFST 1PPMJOH FSF NF QPPMJOH ćF OFYU QBSU PG UIF NPEFM BSF UIF BEBQUJWF QSJPST 4JODF UFS UZQFT BDUPST BOE CMPDLT UIFSF BSF UXP NVMUJWBSJBUF (BVTTJBO QSJPST (BVTTJBO QSJPST BSF CPUI EJNFOTJPOBM JO UIJT FYBNQMF CFDBVTF UIFSF BSF JO HFOFSBM ZPV DBO DIPPTF UP IBWF EJČFSFOU WBSZJOH FČFDUT JO EJČFSFOU F BSF UIF UXP QSJPST JO UIJT DBTF ⎡ ⎢ ⎢ ⎣ αK, αK, αK, αK, ⎤ ⎥ ⎥ ⎦ ∼ .7/PSNBM ⎛ ⎜ ⎜ ⎝ ⎡ ⎢ ⎢ ⎣     ⎤ ⎥ ⎥ ⎦ , 4ĮİŁļĿ ⎞ ⎟ ⎟ ⎠ ⎡ ⎢ ⎢ ⎣ βK, βK, βK, βK, ⎤ ⎥ ⎥ ⎦ ∼ .7/PSNBM ⎛ ⎜ ⎜ ⎝ ⎡ ⎢ ⎢ ⎣     ⎤ ⎥ ⎥ ⎦ , 4įĹļİĸ ⎞ ⎟ ⎟ ⎠ 1 ρ12 ρ13 ρ14 ρ12 1 ρ23 ρ24 ρ13 ρ23 1 ρ34 ρ14 ρ24 ρ34 1
41. m14.2 <- ulam( alist( L ~ binomial(1,p), logit(p) <- g[tid]

+ alpha[actor,tid] + beta[block_id,tid], # adaptive priors vector[4]:alpha[actor] ~ multi_normal(0,Rho_actor,sigma_actor), vector[4]:beta[block_id] ~ multi_normal(0,Rho_block,sigma_block), # fixed priors g[tid] ~ dnorm(0,1), sigma_actor ~ dexp(1), Rho_actor ~ dlkjcorr(4), sigma_block ~ dexp(1), Rho_block ~ dlkjcorr(4) ) , data=dat , chains=4 , cores=4 ) Maximally random chimps
42. m14.2 <- ulam( alist( L ~ binomial(1,p), logit(p) <- g[tid]

+ alpha[actor,tid] + beta[block_id,tid], # adaptive priors vector[4]:alpha[actor] ~ multi_normal(0,Rho_actor,sigma_actor), vector[4]:beta[block_id] ~ multi_normal(0,Rho_block,sigma_block), # fixed priors g[tid] ~ dnorm(0,1), sigma_actor ~ dexp(1), Rho_actor ~ dlkjcorr(4), sigma_block ~ dexp(1), Rho_block ~ dlkjcorr(4) ) , data=dat , chains=4 , cores=4 ) Maximally random chimps
43. m14.2 <- ulam( alist( L ~ binomial(1,p), logit(p) <- g[tid]

+ alpha[actor,tid] + beta[block_id,tid], # adaptive priors vector[4]:alpha[actor] ~ multi_normal(0,Rho_actor,sigma_actor), vector[4]:beta[block_id] ~ multi_normal(0,Rho_block,sigma_block), # fixed priors g[tid] ~ dnorm(0,1), sigma_actor ~ dexp(1), Rho_actor ~ dlkjcorr(4), sigma_block ~ dexp(1), Rho_block ~ dlkjcorr(4) ) , data=dat , chains=4 , cores=4 ) Maximally random chimps

45. Non-centered form • Non-centered easy for uni-variate priors: Just factor

out sigma • But now need to factor correlation matrix out of the prior and smuggle into linear model • Can be done: Cholesky factor André-Louis Cholesky (1875–1918)
46. I. NOTICES SCIENTIFIQUES Commandant BENOIT '. NOTE SUR UNE IVIETHODE

DE RESOLUTION DES EflUA- TIONS NOR~IALES PROVENANT DE L'APPLICATION BE LA METHODE DES IVIOINDRES CARRIES A UN SYSTEME D'EQUATIONS LINP.AIRES EN NOI~IBRE INF~.RIEUR A CELUI DES INCONNUES. -- APPLICATION DE LA MP~- THODE A LA RF.SOLUTION D'UN Su DEFINI D'P.QUATIONS LINEAIRES. (Procdd~ du Commandant CnoLl~S~'~-.) Le Commandant d' krtillerie Cholesky, du Service g6ographi- que de l':krm6e, tu6 pendant la grande guerre, a imagin6, au cours de recherches sur la compensation des r6seaux g6od6si- ques, un proc6d6 tr~s ing,~nieux de r6solution des 6quations dites normales, obtenues par application de la m6thode des moindres carr6s '~ des 6quations lin6aires en hombre inf6rieur h celui des inconnues, lien a conclu une m6thode g6n6rale de r6solution des 6quations lin6aires. Nous suivrons, pour la d6monstration de cette m6thode, la progression m6me qui a ser~i au Commandant Cholesky pour l'imaginer. x. De l'~krtillerie coloniale, ancien officier g6od6sien au Service g~ographique de l'Xrm@ et au Service g6ographique de l'Indo-Chine, Membre du Comit6 national franc ais de Gdoddsie et G6oph3-sique. 2. Sur le Commandant Cholesky, tu6 h l'ennemi le 3~ ao6t I9~8, voir la notice biographique insdr6e dans le volume du B~dlelin g~o- d~sique de x922 intitul6 : Unior~ 9~od~siq~le et ggophysique inlernalio- hale, Premibre Assemblde 9~n~rale, Rome, mai 1929, Seclion de G~o- dgsie, Toulouse, Privat, I9~, in-8 ~ 2~I p., pp. x59 ~ x6i. 5 D'EQUATIONS LINP.AIRES EN NOI~IBRE INF~.RIEUR A CELUI DES INCONNUES. -- APPLICATION DE LA MP~- THODE A LA RF.SOLUTION D'UN Su DEFINI D'P.QUATIONS LINEAIRES. (Procdd~ du Commandant CnoLl~S~'~-.) Le Commandant d' krtillerie Cholesky, du Service g6ographi- que de l':krm6e, tu6 pendant la grande guerre, a imagin6, au cours de recherches sur la compensation des r6seaux g6od6si- ques, un proc6d6 tr~s ing,~nieux de r6solution des 6quations dites normales, obtenues par application de la m6thode des moindres carr6s '~ des 6quations lin6aires en hombre inf6rieur h celui des inconnues, lien a conclu une m6thode g6n6rale de r6solution des 6quations lin6aires. Nous suivrons, pour la d6monstration de cette m6thode, la progression m6me qui a ser~i au Commandant Cholesky pour l'imaginer. x. De l'~krtillerie coloniale, ancien officier g6od6sien au Service g~ographique de l'Xrm@ et au Service g6ographique de l'Indo-Chine, Membre du Comit6 national franc ais de Gdoddsie et G6oph3-sique.
47. Cholesky magic N <- 1e4 sigma1 <- 2 sigma2 <-

0.5 rho <- 0.6 z1 <- rnorm( N ) z2 <- rnorm( N ) a1 <- z1 * sigma1 a2 <- ( rho*z1 + sqrt( 1-rho^2 )*z2 )*sigma2 > cor(z1,z2) [1] -0.0005542644 > cor(a1,a2) [1] 0.5999334 > sd(a1) [1] 1.997036 > sd(a2) [1] 0.4989456
48. Non-centered random chimps m14.3 <- ulam( alist( L ~ binomial(1,p),

logit(p) <- g[tid] + alpha[actor,tid] + beta[block_id,tid], # adaptive priors - non-centered transpars> matrix[actor,4]:alpha <- compose_noncentered( sigma_actor , L_Rho_actor , z_actor ), transpars> matrix[block_id,4]:beta <- compose_noncentered( sigma_block , L_Rho_block , z_block ), matrix[4,actor]:z_actor ~ normal( 0 , 1 ), matrix[4,block_id]:z_block ~ normal( 0 , 1 ), # fixed priors g[tid] ~ normal(0,1), vector[4]:sigma_actor ~ dexp(1), cholesky_factor_corr[4]:L_Rho_actor ~ lkj_corr_cholesky( 2 ), vector[4]:sigma_block ~ dexp(1), cholesky_factor_corr[4]:L_Rho_block ~ lkj_corr_cholesky( 2 ) ) , data=dat , chains=4 , cores=4 , log_lik=TRUE )
49. Non-centered random chimps m14.3 <- ulam( alist( L ~ binomial(1,p),

logit(p) <- g[tid] + alpha[actor,tid] + beta[block_id,tid], # adaptive priors - non-centered transpars> matrix[actor,4]:alpha <- compose_noncentered( sigma_actor , L_Rho_actor , z_actor ), transpars> matrix[block_id,4]:beta <- compose_noncentered( sigma_block , L_Rho_block , z_block ), matrix[4,actor]:z_actor ~ normal( 0 , 1 ), matrix[4,block_id]:z_block ~ normal( 0 , 1 ), # fixed priors g[tid] ~ normal(0,1), vector[4]:sigma_actor ~ dexp(1), cholesky_factor_corr[4]:L_Rho_actor ~ lkj_corr_cholesky( 2 ), vector[4]:sigma_block ~ dexp(1), cholesky_factor_corr[4]:L_Rho_block ~ lkj_corr_cholesky( 2 ) ) , data=dat , chains=4 , cores=4 , log_lik=TRUE )
50. Non-centered random chimps m14.3 <- ulam( alist( L ~ binomial(1,p),

logit(p) <- g[tid] + alpha[actor,tid] + beta[block_id,tid], # adaptive priors - non-centered transpars> matrix[actor,4]:alpha <- compose_noncentered( sigma_actor , L_Rho_actor , z_actor ), transpars> matrix[block_id,4]:beta <- compose_noncentered( sigma_block , L_Rho_block , z_block ), matrix[4,actor]:z_actor ~ normal( 0 , 1 ), matrix[4,block_id]:z_block ~ normal( 0 , 1 ), # fixed priors g[tid] ~ normal(0,1), vector[4]:sigma_actor ~ dexp(1), cholesky_factor_corr[4]:L_Rho_actor ~ lkj_corr_cholesky( 2 ), vector[4]:sigma_block ~ dexp(1), cholesky_factor_corr[4]:L_Rho_block ~ lkj_corr_cholesky( 2 ) ) , data=dat , chains=4 , cores=4 , log_lik=TRUE )
51. Non-centered random chimps   "%7&/563&4 */ \$07"3*"/\$& 200 400

600 800 1000 1200 1000 1500 2000 centered (default) non-centered (cholesky) 'ĶĴłĿĲ ƉƌƎ %JTUS TBNQMFT )Ǿ !! G OPODFOUFSFE QBSBNF DMBTTJĕFE WBSZJOH TMP (ǎǑǡǐ SFTQFDUJWFMZ FRVJWBMFOU JOGFSFODFT WFSTJPO TBNQMFT NVDI number of effective parameters
52. Random chimpanzees  "%7"/\$&% 7"3:*/( 4-01&4  8F DBO JOTQFDU

UIF TUBOEBSE EFWJBUJPO QBSBNFUFST UP HFU B TFOTF PG IPX BHHSFTTJWFMZ UIF WBSZJOH FČFDUT BSF CFJOH SFHVMBSJ[FE 3 DPEF  +- \$.ǿ (ǎǑǡǐ Ǣ  +/#ʙǏ Ǣ +-.ʙǿǫ.\$"(Ǿ/*-ǫǢǫ.\$"(Ǿ'*&ǫȀ Ȁ ( ) . ǒǡǒʉ ǖǑǡǒʉ )Ǿ !! #/ .\$"(Ǿ/*-ȁǎȂ ǎǡǐǖ ǍǡǑǖ ǍǡǕǍ ǏǡǏǑ ǖǍǓ ǎ .\$"(Ǿ/*-ȁǏȂ ǍǡǖǏ ǍǡǐǕ ǍǡǑǑ ǎǡǓǑ ǎǍǓǍ ǎ .\$"(Ǿ/*-ȁǐȂ ǎǡǕǓ Ǎǡǒǔ ǎǡǎǑ ǏǡǕǖ ǎǎǖǎ ǎ .\$"(Ǿ/*-ȁǑȂ ǎǡǒǖ ǍǡǓǓ ǍǡǕǓ ǏǡǕǎ ǎǎǑǕ ǎ .\$"(Ǿ'*&ȁǎȂ ǍǡǑǍ ǍǡǐǏ ǍǡǍǐ ǍǡǖǕ ǎǍǐǐ ǎ .\$"(Ǿ'*&ȁǏȂ ǍǡǑǑ ǍǡǐǓ ǍǡǍǑ ǎǡǎǍ ǖǑǑ ǎ .\$"(Ǿ'*&ȁǐȂ ǍǡǐǍ ǍǡǏǔ ǍǡǍǏ Ǎǡǔǖ ǎǓǍǓ ǎ .\$"(Ǿ'*&ȁǑȂ ǍǡǑǔ ǍǡǐǕ ǍǡǍǐ ǎǡǎǒ ǎǍǔǐ ǎ 8IJMF UIFTF BSF KVTU QPTUFSJPS NFBOT BOE UIF BNPVOU PG TISJOLBHF BWFSBHFT PWFS UIF FOUJSF QPTUFSJPS ZPV DBO HFU B TFOTF GSPN UIF TNBMM WBMVFT UIBU TISJOLBHF JT QSFUUZ BHHSFTTJWF IFSF FTQFDJBMMZ JO UIF DBTF PG UIF CMPDLT ćJT JT XIBU UBLFT UIF NPEFM GSPN  BDUVBM QBSBNFUFST UP  FČFDUJWF QBSBNFUFST BT NFBTVSFE CZ 8"*\$ PS 14*4-00JU BHSFFT JO UIJT DBTF  ćJT JT B HPPE FYBNQMF PG IPX WBSZJOH FČFDUT BEBQU UP UIF EBUB ćF PWFSĕUUJOH SJTL JT NVDI NJMEFS IFSF UIBO JU XPVME CF XJUI PSEJOBSZ ĕYFE FČFDUT *U DBO PG DPVSTF CF DIBMMFOHJOH UP EFĕOF BOE ĕU UIFTF NPEFMT #VU JG ZPV EPOU DIFDL GPS WBSJBUJPO JO TMPQFT ZPV NBZ OFWFS
53. Correlations Rho_actor[4,4] Rho_actor[4,3] Rho_actor[4,2] Rho_actor[4,1] Rho_actor[3,4] Rho_actor[3,3] Rho_actor[3,2] Rho_actor[3,1] Rho_actor[2,4]

Rho_actor[2,3] Rho_actor[2,2] Rho_actor[2,1] Rho_actor[1,4] Rho_actor[1,3] Rho_actor[1,2] Rho_actor[1,1] -1.0 -0.5 0.0 0.5 1.0 Value
54. proportion left lever 0 0.5 1 actor 1 actor 2

actor 3 actor 4 actor 5 actor 6 actor 7 R/N L/N R/P L/P 'ĶĴłĿĲ ƉƌƏ 1PTUFSJPS QSFEJDUJPOT JO CMBDL BHBJOTU UIF SBX EBUB JO CMVF GPS NPEFM (ǎǑǡǐ UIF DSPTTDMBTTJĕFE WBSZJOH FČFDUT NPEFM ćF MJOF TFH NFOUT BSF  DPNQBUJCJMJUZ JOUFSWBMT 0QFO DJSDMFT BSF USFBUNFOUT XJUIPVU B QBSUOFS 'JMMFE DJSDMFT BSF USFBUNFOUT XJUI B QBSUOFS ćF QSPTPDJBM MPDB UJPO BMUFSOBUFT SJHIUMFęSJHIUMFę BT MBCFMFE JO BDUPS  ȕ -2 / !*- ǿ % \$) ǿǎǣǔȀȁǶǏȂ Ȁ ȃ '\$) .ǿ ǿ%ǶǎȀȉǑʔǿǎǢǐȀǶ3* Ǣ +'ȁ%ǢǿǎǢǐȀȂ Ǣ '2ʙǏ Ǣ *'ʙ-)"\$Ǐ Ȁ  "%7"/\$&% 7"3:*/( 4-01&4 8F DBO JOTQFDU UIF TUBOEBSE EFWJBUJPO QBSBNFUFST UP HFU B TFOTF PG IP WBSZJOH FČFDUT BSF CFJOH SFHVMBSJ[FE +- \$.ǿ (ǎǑǡǐ Ǣ  +/#ʙǏ Ǣ +-.ʙǿǫ.\$"(Ǿ/*-ǫǢǫ.\$"(Ǿ'*&ǫȀ ( ) . ǒǡǒʉ ǖǑǡǒʉ )Ǿ !! #/ .\$"(Ǿ/*-ȁǎȂ ǎǡǐǖ ǍǡǑǖ ǍǡǕǍ ǏǡǏǑ ǖǍǓ ǎ .\$"(Ǿ/*-ȁǏȂ ǍǡǖǏ ǍǡǐǕ ǍǡǑǑ ǎǡǓǑ ǎǍǓǍ ǎ .\$"(Ǿ/*-ȁǐȂ ǎǡǕǓ Ǎǡǒǔ ǎǡǎǑ ǏǡǕǖ ǎǎǖǎ ǎ .\$"(Ǿ/*-ȁǑȂ ǎǡǒǖ ǍǡǓǓ ǍǡǕǓ ǏǡǕǎ ǎǎǑǕ ǎ .\$"(Ǿ'*&ȁǎȂ ǍǡǑǍ ǍǡǐǏ ǍǡǍǐ ǍǡǖǕ ǎǍǐǐ ǎ .\$"(Ǿ'*&ȁǏȂ ǍǡǑǑ ǍǡǐǓ ǍǡǍǑ ǎǡǎǍ ǖǑǑ ǎ .\$"(Ǿ'*&ȁǐȂ ǍǡǐǍ ǍǡǏǔ ǍǡǍǏ Ǎǡǔǖ ǎǓǍǓ ǎ