L17 Statistical Rethinking Winter 2019

L17 Statistical Rethinking Winter 2019

Lecture 17 of the Dec 2018 through March 2019 edition of Statistical Rethinking. Covers Chapter 14, varying slopes and other covariance models.

A0f2f64b2e58f3bfa48296fb9ed73853?s=128

Richard McElreath

February 18, 2019
Tweet

Transcript

  1. Adventures in Covariance Statistical Rethinking Winter 2019 Lecture 17 /

    Week 9
  2. Kinds of varying effects • Varying intercepts: means differ by

    cluster • Varying slopes: effects of predictors vary by cluster • Any parameter can be made into a varying effect • (1) split into vector of parameters by cluster • (2) define population distribution -6 -4 -2 0 2 4 6 -6 -4 -2 0 2 4 6 -6 -4 -2 0 2 4 6 -6 -4 -2 0 2 4 6
  3. Varying slopes • Why varying slopes? • drugs affect people

    differently • after school programs don’t work for everyone • not every unit has same relationship to predictor • variation is important, whether for intervention or inference • Average effect misleading? • Pooling, shrinkage, mnesia
  4. Population pooling • Major innovation with varying slopes is pooling

    across parameters (intercepts & slopes) • Features of units have correlation structure • Learn one feature —> info about other features • e.g. Higher intercepts associated with smaller slopes
  5. Café Robot • Robot programmed to visit cafés, order coffee,

    record wait time • Visits in morning and afternoon • Intercepts: avg morning wait • Slopes: avg difference btw afternoon and morning • Are intercepts and slopes related? • Yes => pooling across parameter types!   .6-5* 2 4 6 8 wait time (minutes) M A M A M A M A M A 2 4 6 8 wait time (minutes) M A M A M A M A M A Café A Café B
  6. Population of Cafés -3 -2 -1 0 1 2 3

    0.0 0.2 0.4 intercept Density -3 -2 -1 0 1 2 3 0.0 0.4 0.8 slope Density -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 intercept slope intercepts slopes population
  7. Population of Cafés • 2-dimensional Gaussian distribution • vector of

    means • variance-covariance matrix -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 intercept slope  .6-5*-&7&- .0%&-4 **  JT UIF NFBO JOUFSDFQU UIF XBJU JO UIF NPSOJOH "OE UIF WBMVF JO  JT ČFSFODF JO XBJU CFUXFFO BęFSOPPO BOE NPSOJOH BODFT BOE DPWBSJBODFT JT BSSBOHFE MJLF UIJT σ α σασβρ σασβρ σ β QUT JT σ α BOE UIF WBSJBODF JO TMPQFT JT σ β  ćFTF BSF GPVOE BMPOH UIF ćF PUIFS UXP FMFNFOUT PG UIF NBUSJY BSF UIF TBNF σασβρ ćJT JT UIF FSDFQUT BOE TMPQFT *UT KVTU UIF QSPEVDU PG UIF UXP TUBOEBSE EFWJBUJPOT intercepts variance slopes variance covariance correlation
  8. Simulated Cafés   .6-5*-&7&- .0%&-4 ** 2 3 4

    5 6 -2.0 -1.5 -1.0 -0.5 intercepts (a_cafe) slopes (b_cafe) 'ĶĴłĿIJ ƉƋƊ  DBGÏT TBN UJTUJDBM QPQVMBUJPO ćF UIF JOUFSDFQU BWFSBHF NPSO DBGF ćF WFSUJDBM BYJT JT EJČFSFODF CFUXFFO BęFSOP XBJU GPS FBDI DBGÏ ćF HSB UIF NVMUJWBSJBUF (BVTTJBO Q DFQUT BOE TMPQFT 20 cafés 5 days morning & afternoon 200 observations
  9. Varying slopes model   "%7&/563&4 */ $07"3*"/$& WBSZJOH JOUFSDFQUT

    ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBOBUJPO UP GPMMPX 8J ∼ /PSNBM(µJ, σ) µJ = αİĮij˦[J] + βİĮij˦[J] "J αİĮij˦ βİĮij˦ ∼ .7/PSNBM α β , 4 [population o 4 = σα   σβ 3 σα   σβ [construct co α ∼ /PSNBM(, ) [prior for a β ∼ /PSNBM(−, .) [prior fo σ ∼ &YQPOFOUJBM() [prior std σα ∼ &YQPOFOUJBM() [prior stddev a σβ ∼ &YQPOFOUJBM() [prior stdde 3 ∼ -,+DPSS() [prior for c ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJOU JO UIF CPPL
  10. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

    8J ∼ /PSNBM(µJ, σ) µJ = αİĮij˦[J] + βİĮij˦[J] "J αİĮij˦ βİĮij˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ varying intercepts varying slopes
  11. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

    8J ∼ /PSNBM(µJ, σ) µJ = αİĮij˦[J] + βİĮij˦[J] "J αİĮij˦ βİĮij˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ multivariate prior
  12. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

    8J ∼ /PSNBM(µJ, σ) µJ = αİĮij˦[J] + βİĮij˦[J] "J αİĮij˦ βİĮij˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ pop avg intercept pop avg slope covariance matrix
  13. Covariance matrix shuffle • m-by-m covariance matrix requires estimating •

    m standard deviations (or variances) • (m2 – m)/2 correlations (for covariances) • total of m(m + 1)/2 parameters • Several ways specify priors • Conjugate: inverse-Wishart (inv_wishart) • inverse-Wishart cannot pull apart stddev and correlations • Better to decompose: α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK ∼ /PSNBM(, σ) K = ... σ ∼ $BVDIZ(, ) "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S R {
  14. Covariance matrix shuffle • m-by-m covariance matrix requires estimating •

    m standard deviations (or variances) • (m2 – m)/2 correlations (for covariances) • total of m(m + 1)/2 parameters • Several ways specify priors • Conjugate: inverse-Wishart (inv_wishart) • inverse-Wishart cannot pull apart stddev and correlations • Better to decompose: α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK ∼ /PSNBM(, σ) K = ... σ ∼ $BVDIZ(, ) "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S R {
  15. Matrixes are nice • Matrix algebra just shortcuts for working

    with lists of numbers • A few simple rules • Can you make an omelet? You can multiply matrixes.
  16. a b c d A B C D = αıIJĽŁ[J]

    + βıIJĽŁ[J] NJ ∼ .7/PSNBM α β , 4 = σα   σβ 3 σα   σβ ∼ /PSNBM(, ) βıIJĽŁ β 4 = σα   σβ 3 σ  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) (σα, σβ) ∼ )BMG$BVDIZ(, ) MPHJU(QJ) = αıIJĽŁ[J] + β αıIJĽŁ βıIJĽŁ ∼ .7/PSNBM 4 = σα   σβ α ∼ /PSNBM(, Ł β 4 = σα   σβ 3 σα  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) β) ∼ )BMG$BVDIZ(, )
  17. a b c d A B C D = αıIJĽŁ[J]

    + βıIJĽŁ[J] NJ ∼ .7/PSNBM α β , 4 = σα   σβ 3 σα   σβ ∼ /PSNBM(, ) βıIJĽŁ β 4 = σα   σβ 3 σ  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) (σα, σβ) ∼ )BMG$BVDIZ(, ) MPHJU(QJ) = αıIJĽŁ[J] + β αıIJĽŁ βıIJĽŁ ∼ .7/PSNBM 4 = σα   σβ α ∼ /PSNBM(, Ł β 4 = σα   σβ 3 σα  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) β) ∼ )BMG$BVDIZ(, )
  18. a b c d A B C D = αıIJĽŁ[J]

    + βıIJĽŁ[J] NJ ∼ .7/PSNBM α β , 4 = σα   σβ 3 σα   σβ ∼ /PSNBM(, ) βıIJĽŁ β 4 = σα   σβ 3 σ  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) (σα, σβ) ∼ )BMG$BVDIZ(, ) MPHJU(QJ) = αıIJĽŁ[J] + β αıIJĽŁ βıIJĽŁ ∼ .7/PSNBM 4 = σα   σβ α ∼ /PSNBM(, Ł β 4 = σα   σβ 3 σα  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) β) ∼ )BMG$BVDIZ(, )
  19. a b c d A B Aa + Bc C

    D = αıIJĽŁ[J] + βıIJĽŁ[J] NJ ∼ .7/PSNBM α β , 4 = σα   σβ 3 σα   σβ ∼ /PSNBM(, ) βıIJĽŁ β 4 = σα   σβ 3 σ  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) (σα, σβ) ∼ )BMG$BVDIZ(, ) MPHJU(QJ) = αıIJĽŁ[J] + β αıIJĽŁ βıIJĽŁ ∼ .7/PSNBM 4 = σα   σβ α ∼ /PSNBM(, Ł β 4 = σα   σβ 3 σα  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) β) ∼ )BMG$BVDIZ(, )
  20. a b c d A B Aa + Bc Ab

    + Bd C D Ca + Dc Cb + Dd = αıIJĽŁ[J] + βıIJĽŁ[J] NJ ∼ .7/PSNBM α β , 4 = σα   σβ 3 σα   σβ ∼ /PSNBM(, ) βıIJĽŁ β 4 = σα   σβ 3 σ  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) (σα, σβ) ∼ )BMG$BVDIZ(, ) MPHJU(QJ) = αıIJĽŁ[J] + β αıIJĽŁ βıIJĽŁ ∼ .7/PSNBM 4 = σα   σβ α ∼ /PSNBM(, Ł β 4 = σα   σβ 3 σα  α ∼ /PSNBM(, ) β ∼ /PSNBM(, ) β) ∼ )BMG$BVDIZ(, )
  21. Matrixes are nice N αK βNK ∼ .7/PSNBM  

    , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S
  22. Matrixes are nice N αK βNK ∼ .7/PSNBM  

    , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S N αK ∼ /PSNBM(, σ) K = ... σ ∼ $BVDIZ(, ) "JK ∼ #JOPNJBM(OJ, QJK) JU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK NK ∼ .7/PSNBM   , Σ K = ... ρσα σβ σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 ?
  23. Matrixes are nice N αK βNK ∼ .7/PSNBM  

    , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S N αK ∼ /PSNBM(, σ) K = ... σ ∼ $BVDIZ(, ) "JK ∼ #JOPNJBM(OJ, QJK) JU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK NK ∼ .7/PSNBM   , Σ K = ... ρσα σβ σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 σα ρσα ρσβ σβ
  24. Matrixes are nice N αK βNK ∼ .7/PSNBM  

    , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 σα ρσα ρσβ σβ ?
  25. Matrixes are nice N αK βNK ∼ .7/PSNBM  

    , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 S "JK ∼ #JOPNJBM(OJ, QJK) MPHJU QJK = α + αK + (βN + βNK)NJK α ∼ /PSNBM(, ) βN ∼ /PSNBM(, ) αK βNK ∼ .7/PSNBM   , Σ K = ... Σ = σ α ρσα σβ ρσα σβ σ β = σα   σβ  ρ ρ  σα   σβ = 434 σα ρσα ρσβ σβ σα ρσα ρσβ σβ σ α ρσα σβ ρσα σβ σ β
  26. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

    8J ∼ /PSNBM(µJ, σ) µJ = αİĮij˦[J] + βİĮij˦[J] "J αİĮij˦ βİĮij˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ build cov matrix
  27. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

    8J ∼ /PSNBM(µJ, σ) µJ = αİĮij˦[J] + βİĮij˦[J] "J αİĮij˦ βİĮij˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ fixed (non-adaptive) priors
  28. WBSZJOH JOUFSDFQUT ćJT JT UIF WBSZJOH TMPQFT NPEFM XJUI FYQMBO

    8J ∼ /PSNBM(µJ, σ) µJ = αİĮij˦[J] + βİĮij˦[J] "J αİĮij˦ βİĮij˦ ∼ .7/PSNBM α β , 4 4 = σα   σβ 3 σα   σβ α ∼ /PSNBM(, ) β ∼ /PSNBM(−, .) σ ∼ &YQPOFOUJBM() σα ∼ &YQPOFOUJBM() σβ ∼ &YQPOFOUJBM() 3 ∼ -,+DPSS() ćF MJLFMJIPPE BOE MJOFBS NPEFM OFFE OP FYQMBOBUJPO BU UIJT QPJO MJOF XIJDI EFĕOFT UIF QPQVMBUJPO PG WBSZJOH JOUFSDFQUT BOE TMPQ correlation matrix prior
  29. LKJ Correlation prior • After Lewandowski, Kurowicka, and Joe (LKJ)

    2009 • One parameter, eta, specifies concentration or dispersion from identity matrix (zero correlations) • eta = 1, uniform correlation matrices • eta > 1, stomps on extreme correlations • eta < 1, elevates extreme correlations  7"3:*/( 4-0 -1.0 -0.5 0.0 0.5 1.0 0.0 0.2 0.4 0.6 0.8 1.0 correlation Density eta=4 eta=2 eta=1 PG JU BT B SFHVMBSJ[JOH QSJPS GPS DPSSFMBUJPOT DPOUSPMT IPX TLFQUJDBM UIF QSJPS JT PG MBSHF
  30. Varying slopes estimation m14.1 <- ulam( alist( wait ~ normal(

    mu , sigma ), mu <- a_cafe[cafe] + b_cafe[cafe]*afternoon, c(a_cafe,b_cafe)[cafe] ~ multi_normal( c(a,b) , Rho , sigma_cafe ), a ~ normal(5,2), b ~ normal(-1,0.5), sigma_cafe ~ exponential(1), sigma ~ exponential(1), Rho ~ lkj_corr(2) ) , data=d , chains=4 , cores=4 )
  31. Varying slopes estimation m14.1 <- ulam( alist( wait ~ normal(

    mu , sigma ), mu <- a_cafe[cafe] + b_cafe[cafe]*afternoon, c(a_cafe,b_cafe)[cafe] ~ multi_normal( c(a,b) , Rho , sigma_cafe ), a ~ normal(5,2), b ~ normal(-1,0.5), sigma_cafe ~ exponential(1), sigma ~ exponential(1), Rho ~ lkj_corr(2) ) , data=d , chains=4 , cores=4 )
  32. Varying slopes estimation m14.1 <- ulam( alist( wait ~ normal(

    mu , sigma ), mu <- a_cafe[cafe] + b_cafe[cafe]*afternoon, c(a_cafe,b_cafe)[cafe] ~ multi_normal( c(a,b) , Rho , sigma_cafe ), a ~ normal(5,2), b ~ normal(-1,0.5), sigma_cafe ~ exponential(1), sigma ~ exponential(1), Rho ~ lkj_corr(2) ) , data=d , chains=4 , cores=4 )
  33. Varying slopes estimation m14.1 <- ulam( alist( wait ~ normal(

    mu , sigma ), mu <- a_cafe[cafe] + b_cafe[cafe]*afternoon, c(a_cafe,b_cafe)[cafe] ~ multi_normal( c(a,b) , Rho , sigma_cafe ), a ~ normal(5,2), b ~ normal(-1,0.5), sigma_cafe ~ exponential(1), sigma ~ exponential(1), Rho ~ lkj_corr(2) ) , data=d , chains=4 , cores=4 )
  34. Posterior correlation QFFL BU UIF SBX 4UBO DPEF XJUI ./)*

    ǿ(ǎǑǡǎȀ ćF OBNF (0'/$Ǿ)*-(' J VTFT JO JUT SBX DPEF ćF TJNJMBS 3 GVODUJPOT BSF (1)*-( BOE (1)*-(Ǐ /PX JOTUFBE PG MPPLJOH BU UIF NBSHJOBM QPTUFSJPS EJTUSJCVUJPOT JO UIF +- $. HP TUSBJHIU UP JOTQFDUJOH UIF QPTUFSJPS EJTUSJCVUJPO PG WBSZJOH FČFDUT 'JSTU MFUT F QPTUFSJPS DPSSFMBUJPO CFUXFFO JOUFSDFQUT BOE TMPQFT 3 DPEF  +*./ ʚǶ 3/-/ǡ.(+' .ǿ(ǎǑǡǎȀ  ).ǿ +*./ɶ#*ȁǢǎǢǏȂ Ȁ ćF SFTVMU JT TIPXO JO 'ĶĴłĿIJ Ɖƌƌ XJUI TPNF BEEJUJPOBM EFDPSBUJPO BOE UIF UIF QSJPS GPS DPNQBSJTPO ćF CMVF EFOTJUZ JT UIF QPTUFSJPS EJTUSJCVUJPO PG UIF CFUXFFO JOUFSDFQUT BOE TMPQFT ćF QPTUFSJPS JT DPODFOUSBUFE PO OFHBUJWF WBMVFT NPEFM IBT MFBSOFE UIF OFHBUJWF DPSSFMBUJPO ZPV DBO TFF JO 'ĶĴłĿIJ ƉƌƊ ,FFQ JO UIF NPEFM EJE OPU HFU UP TFF UIF USVF JOUFSDFQUT BOE TMPQFT "MM JU IBE UP XPSL GS PCTFSWFE XBJU UJNFT JO NPSOJOH BOE BęFSOPPO *G ZPV BSF DVSJPVT BCPVU UIF JNQBDU PG UIF QSJPS UIFO ZPV TIPVME DIBOHF UI SFQFBU UIF BOBMZTJT * TVHHFTU USZJOH B ĘBU QSJPS -,+DPSS  BOE UIFO B NPSF TU MBSJ[JOH QSJPS MJLF -,+DPSS  PS -,+DPSS   /FYU DPOTJEFS UIF TISJOLBHF ćF NVMUJMFWFM NPEFM FTUJNBUFT QPTUFSJPS EJTUS JOUFSDFQUT BOE TMPQFT PG FBDI DBGÏ ćF JOGFSSFE DPSSFMBUJPO CFUXFFO UIFTF WBS XBT VTFE UP QPPM JOGPSNBUJPO BDSPTT UIFN ćJT JT KVTU BT UIF JOGFSSFE WBSJBUJPO UFSDFQUT QPPMT JOGPSNBUJPO BNPOH UIFN BT XFMM BT IPX UIF JOGFSSFE WBSJBUJPO BN QPPMT JOGPSNBUJPO BNPOH UIFN "MM UPHFUIFS UIF WBSJBODFT BOE DPSSFMBUJPO E GFSSFE NVMUJWBSJBUF (BVTTJBO QSJPS GPS UIF WBSZJOH FČFDUT "OE UIJT QSJPS MFBSO   "%7&/563&4 */ $07"3*"/$& -1.0 -0.5 0.0 0.5 1.0 0.0 0.5 1.0 1.5 2.0 2.5 correlation Density posterior prior 'ĶĴłĿIJ Ɖƌƌ 1PTUFSJP SFMBUJPO CFUXFFO JOUFS 1PTUFSJPS EJTUSJCVUJPO BCMZ CFMPX [FSP %BT UIF -,+DPSS  EFOTJUZ
  35. Posterior shrinkage  "%7&/56 -1.0 -0.5 0.0 0.5 1.0 0.0

    0.5 1.0 1.5 2.0 2.5 correlation Density posterior prior QFFL BU UIF SBX 4UBO DPEF XJUI ./)* ǿ   "%7&/563&4 */ $07"3*"/$& 2 3 4 5 6 -2.0 -1.5 -1.0 -0.5 intercept slope 2 3 4 5 6 1 2 3 4 morning wait afternoon wait 'ĶĴłĿIJ Ɖƌƍ 4ISJOLBHF JO UXP EJNFOTJPOT -Fę 3BX VOQPPMFE JOUFSDFQUT BOE TMPQFT ĕMMFE CMVF DPNQBSFE UP QBSUJBMMZ QPPMFE QPTUFSJPS NFBOT PQFO DJSDMFT  ćF HSBZ DPOUPVST TIPX UIF JOGFSSFE QPQVMBUJPO PG WBSZJOH FČFDUT
  36. 2 3 4 5 6 -2.0 -1.5 -1.0 -0.5 intercept

    slope 1 2 3 4 afternoon wait 10 30 50 80 99
  37. parameter scale outcome scale   "%7&/563&4 */ $07"3*"/$& 2

    3 4 5 6 -2.0 -1.5 -1.0 -0.5 intercept slope 2 3 4 5 6 1 2 3 4 morning wait afternoon wait 'ĶĴłĿIJ Ɖƌƍ 4ISJOLBHF JO UXP EJNFOTJPOT -Fę 3BX VOQPPMFE JOUFSDFQUT BOE TMPQFT ĕMMFE CMVF DPNQBSFE UP QBSUJBMMZ QPPMFE QPTUFSJPS NFBOT PQFO   "%7&/563&4 */ $07"3*"/$& 2 3 4 5 6 -2.0 -1.5 -1.0 -0.5 intercept slope 2 3 4 5 6 1 2 3 4 morning wait afternoon wait 'ĶĴłĿIJ Ɖƌƍ 4ISJOLBHF JO UXP EJNFOTJPOT -Fę 3BX VOQPPMFE JOUFSDFQUT BOE TMPQFT ĕMMFE CMVF DPNQBSFE UP QBSUJBMMZ QPPMFE QPTUFSJPS NFBOT PQFO
  38. Multi-dimensional shrinkage • Joint distribution of varying effects pools information

    across intercepts & slopes • Correlation btw effects => shrinkage in one dimension induces shrinkage in others • Improved accuracy, just like varying intercepts   "%7&/563&4 */ 2 3 4 5 6 -2.0 -1.5 -1.0 -0.5 intercept slope afternoon wait 'ĶĴłĿIJ Ɖƌƍ 4ISJOLBHF JO UXP EJNFOTJP BOE TMPQFT ĕMMFE CMVF DPNQBSFE UP QBSU DJSDMFT  ćF HSBZ DPOUPVST TIPX UIF JOGF 3JHIU ćF TBNF FTUJNBUFT PO UIF PVUDPN
  39. Many effects, many clusters • Let’s try something more ambitious

    • Chimpanzees data • 4 treatments • Each can vary by actor or block  "%7&/563&4 */ $07"3*"/$& BOE JUT MJOFBS NPEFM MPPLT MJLF -J ∼ #JOPNJBM(, QJ) MPHJU(QJ) = γŁĶı[J] + αĮİŁļĿ[J],ŁĶı[J] + βįĹļİĸ[J],ŁĶı[J] NPEFM GPS MPHJU(QJ) DPOUBJOT BO BWFSBHF MPHPEET GPS FBDI USFBUNFOU γŁ BDI BDUPS JO FBDI USFBUNFOU αĮİŁļĿ[J],ŁĶı[J] BOE ĕOBMMZ BO FČFDU GPS FBDI C NFOU βįĹļİĸ[J],ŁĶı[J]  ćJT JT FTTFOUJBMMZ BO JOUFSBDUJPO NPEFM UIBU BMMPXT UIF F NFOU UP WBSZ CZ FBDI BDUPS BOE FBDI CMPDL ćJT JT UP TBZ UIBU UIF BWFSBHF USF Mean treatment effects Each actor in each treatment Each block in each treatment
  40. Covariance matrixes • One matrix for each cluster (actor, block)

    • 7*4 = 28 actor parameters • 6*4 = 24 block parameters • Each covariance matrix: 6 correlations + 4 sigmas • Total: 28 + 24 + 20 + 4 = 76 parameters GPS MPHJU(QJ) DPOUBJOT BO BWFSBHF MPHPEET GPS FBDI USFBUNFOU γŁĶı[J] BO PS JO FBDI USFBUNFOU αĮİŁļĿ[J],ŁĶı[J] BOE ĕOBMMZ BO FČFDU GPS FBDI CMPDL JO Ĺļİĸ[J],ŁĶı[J]  ćJT JT FTTFOUJBMMZ BO JOUFSBDUJPO NPEFM UIBU BMMPXT UIF FČFDU PG WBSZ CZ FBDI BDUPS BOE FBDI CMPDL ćJT JT UP TBZ UIBU UIF BWFSBHF USFBUNFOU CMPDL BOE UIF FBDI JOEJWJEVBM DIJNQBO[FF DBO BMTP SFTQPOE BDSPTT CMPDLT EJČFSFOUMZ ćJT ZJFMET B UPUBM PG +×+× =  QBSBNFUFST 1PPMJOH FSF NF QPPMJOH ćF OFYU QBSU PG UIF NPEFM BSF UIF BEBQUJWF QSJPST 4JODF UFS UZQFT BDUPST BOE CMPDLT UIFSF BSF UXP NVMUJWBSJBUF (BVTTJBO QSJPST (BVTTJBO QSJPST BSF CPUI EJNFOTJPOBM JO UIJT FYBNQMF CFDBVTF UIFSF BSF JO HFOFSBM ZPV DBO DIPPTF UP IBWF EJČFSFOU WBSZJOH FČFDUT JO EJČFSFOU F BSF UIF UXP QSJPST JO UIJT DBTF ⎡ ⎢ ⎢ ⎣ αK, αK, αK, αK, ⎤ ⎥ ⎥ ⎦ ∼ .7/PSNBM ⎛ ⎜ ⎜ ⎝ ⎡ ⎢ ⎢ ⎣     ⎤ ⎥ ⎥ ⎦ , 4ĮİŁļĿ ⎞ ⎟ ⎟ ⎠ ⎡ ⎢ ⎢ ⎣ βK, βK, βK, βK, ⎤ ⎥ ⎥ ⎦ ∼ .7/PSNBM ⎛ ⎜ ⎜ ⎝ ⎡ ⎢ ⎢ ⎣     ⎤ ⎥ ⎥ ⎦ , 4įĹļİĸ ⎞ ⎟ ⎟ ⎠ 1 ρ12 ρ13 ρ14 ρ12 1 ρ23 ρ24 ρ13 ρ23 1 ρ34 ρ14 ρ24 ρ34 1
  41. m14.2 <- ulam( alist( L ~ binomial(1,p), logit(p) <- g[tid]

    + alpha[actor,tid] + beta[block_id,tid], # adaptive priors vector[4]:alpha[actor] ~ multi_normal(0,Rho_actor,sigma_actor), vector[4]:beta[block_id] ~ multi_normal(0,Rho_block,sigma_block), # fixed priors g[tid] ~ dnorm(0,1), sigma_actor ~ dexp(1), Rho_actor ~ dlkjcorr(4), sigma_block ~ dexp(1), Rho_block ~ dlkjcorr(4) ) , data=dat , chains=4 , cores=4 ) Maximally random chimps
  42. m14.2 <- ulam( alist( L ~ binomial(1,p), logit(p) <- g[tid]

    + alpha[actor,tid] + beta[block_id,tid], # adaptive priors vector[4]:alpha[actor] ~ multi_normal(0,Rho_actor,sigma_actor), vector[4]:beta[block_id] ~ multi_normal(0,Rho_block,sigma_block), # fixed priors g[tid] ~ dnorm(0,1), sigma_actor ~ dexp(1), Rho_actor ~ dlkjcorr(4), sigma_block ~ dexp(1), Rho_block ~ dlkjcorr(4) ) , data=dat , chains=4 , cores=4 ) Maximally random chimps
  43. m14.2 <- ulam( alist( L ~ binomial(1,p), logit(p) <- g[tid]

    + alpha[actor,tid] + beta[block_id,tid], # adaptive priors vector[4]:alpha[actor] ~ multi_normal(0,Rho_actor,sigma_actor), vector[4]:beta[block_id] ~ multi_normal(0,Rho_block,sigma_block), # fixed priors g[tid] ~ dnorm(0,1), sigma_actor ~ dexp(1), Rho_actor ~ dlkjcorr(4), sigma_block ~ dexp(1), Rho_block ~ dlkjcorr(4) ) , data=dat , chains=4 , cores=4 ) Maximally random chimps
  44. Divergence, my old friend

  45. Non-centered form • Non-centered easy for uni-variate priors: Just factor

    out sigma • But now need to factor correlation matrix out of the prior and smuggle into linear model • Can be done: Cholesky factor André-Louis Cholesky (1875–1918)
  46. I. NOTICES SCIENTIFIQUES Commandant BENOIT '. NOTE SUR UNE IVIETHODE

    DE RESOLUTION DES EflUA- TIONS NOR~IALES PROVENANT DE L'APPLICATION BE LA METHODE DES IVIOINDRES CARRIES A UN SYSTEME D'EQUATIONS LINP.AIRES EN NOI~IBRE INF~.RIEUR A CELUI DES INCONNUES. -- APPLICATION DE LA MP~- THODE A LA RF.SOLUTION D'UN Su DEFINI D'P.QUATIONS LINEAIRES. (Procdd~ du Commandant CnoLl~S~'~-.) Le Commandant d' krtillerie Cholesky, du Service g6ographi- que de l':krm6e, tu6 pendant la grande guerre, a imagin6, au cours de recherches sur la compensation des r6seaux g6od6si- ques, un proc6d6 tr~s ing,~nieux de r6solution des 6quations dites normales, obtenues par application de la m6thode des moindres carr6s '~ des 6quations lin6aires en hombre inf6rieur h celui des inconnues, lien a conclu une m6thode g6n6rale de r6solution des 6quations lin6aires. Nous suivrons, pour la d6monstration de cette m6thode, la progression m6me qui a ser~i au Commandant Cholesky pour l'imaginer. x. De l'~krtillerie coloniale, ancien officier g6od6sien au Service g~ographique de l'Xrm@ et au Service g6ographique de l'Indo-Chine, Membre du Comit6 national franc ais de Gdoddsie et G6oph3-sique. 2. Sur le Commandant Cholesky, tu6 h l'ennemi le 3~ ao6t I9~8, voir la notice biographique insdr6e dans le volume du B~dlelin g~o- d~sique de x922 intitul6 : Unior~ 9~od~siq~le et ggophysique inlernalio- hale, Premibre Assemblde 9~n~rale, Rome, mai 1929, Seclion de G~o- dgsie, Toulouse, Privat, I9~, in-8 ~ 2~I p., pp. x59 ~ x6i. 5 D'EQUATIONS LINP.AIRES EN NOI~IBRE INF~.RIEUR A CELUI DES INCONNUES. -- APPLICATION DE LA MP~- THODE A LA RF.SOLUTION D'UN Su DEFINI D'P.QUATIONS LINEAIRES. (Procdd~ du Commandant CnoLl~S~'~-.) Le Commandant d' krtillerie Cholesky, du Service g6ographi- que de l':krm6e, tu6 pendant la grande guerre, a imagin6, au cours de recherches sur la compensation des r6seaux g6od6si- ques, un proc6d6 tr~s ing,~nieux de r6solution des 6quations dites normales, obtenues par application de la m6thode des moindres carr6s '~ des 6quations lin6aires en hombre inf6rieur h celui des inconnues, lien a conclu une m6thode g6n6rale de r6solution des 6quations lin6aires. Nous suivrons, pour la d6monstration de cette m6thode, la progression m6me qui a ser~i au Commandant Cholesky pour l'imaginer. x. De l'~krtillerie coloniale, ancien officier g6od6sien au Service g~ographique de l'Xrm@ et au Service g6ographique de l'Indo-Chine, Membre du Comit6 national franc ais de Gdoddsie et G6oph3-sique.
  47. Cholesky magic N <- 1e4 sigma1 <- 2 sigma2 <-

    0.5 rho <- 0.6 z1 <- rnorm( N ) z2 <- rnorm( N ) a1 <- z1 * sigma1 a2 <- ( rho*z1 + sqrt( 1-rho^2 )*z2 )*sigma2 > cor(z1,z2) [1] -0.0005542644 > cor(a1,a2) [1] 0.5999334 > sd(a1) [1] 1.997036 > sd(a2) [1] 0.4989456
  48. Non-centered random chimps m14.3 <- ulam( alist( L ~ binomial(1,p),

    logit(p) <- g[tid] + alpha[actor,tid] + beta[block_id,tid], # adaptive priors - non-centered transpars> matrix[actor,4]:alpha <- compose_noncentered( sigma_actor , L_Rho_actor , z_actor ), transpars> matrix[block_id,4]:beta <- compose_noncentered( sigma_block , L_Rho_block , z_block ), matrix[4,actor]:z_actor ~ normal( 0 , 1 ), matrix[4,block_id]:z_block ~ normal( 0 , 1 ), # fixed priors g[tid] ~ normal(0,1), vector[4]:sigma_actor ~ dexp(1), cholesky_factor_corr[4]:L_Rho_actor ~ lkj_corr_cholesky( 2 ), vector[4]:sigma_block ~ dexp(1), cholesky_factor_corr[4]:L_Rho_block ~ lkj_corr_cholesky( 2 ) ) , data=dat , chains=4 , cores=4 , log_lik=TRUE )
  49. Non-centered random chimps m14.3 <- ulam( alist( L ~ binomial(1,p),

    logit(p) <- g[tid] + alpha[actor,tid] + beta[block_id,tid], # adaptive priors - non-centered transpars> matrix[actor,4]:alpha <- compose_noncentered( sigma_actor , L_Rho_actor , z_actor ), transpars> matrix[block_id,4]:beta <- compose_noncentered( sigma_block , L_Rho_block , z_block ), matrix[4,actor]:z_actor ~ normal( 0 , 1 ), matrix[4,block_id]:z_block ~ normal( 0 , 1 ), # fixed priors g[tid] ~ normal(0,1), vector[4]:sigma_actor ~ dexp(1), cholesky_factor_corr[4]:L_Rho_actor ~ lkj_corr_cholesky( 2 ), vector[4]:sigma_block ~ dexp(1), cholesky_factor_corr[4]:L_Rho_block ~ lkj_corr_cholesky( 2 ) ) , data=dat , chains=4 , cores=4 , log_lik=TRUE )
  50. Non-centered random chimps m14.3 <- ulam( alist( L ~ binomial(1,p),

    logit(p) <- g[tid] + alpha[actor,tid] + beta[block_id,tid], # adaptive priors - non-centered transpars> matrix[actor,4]:alpha <- compose_noncentered( sigma_actor , L_Rho_actor , z_actor ), transpars> matrix[block_id,4]:beta <- compose_noncentered( sigma_block , L_Rho_block , z_block ), matrix[4,actor]:z_actor ~ normal( 0 , 1 ), matrix[4,block_id]:z_block ~ normal( 0 , 1 ), # fixed priors g[tid] ~ normal(0,1), vector[4]:sigma_actor ~ dexp(1), cholesky_factor_corr[4]:L_Rho_actor ~ lkj_corr_cholesky( 2 ), vector[4]:sigma_block ~ dexp(1), cholesky_factor_corr[4]:L_Rho_block ~ lkj_corr_cholesky( 2 ) ) , data=dat , chains=4 , cores=4 , log_lik=TRUE )
  51. Non-centered random chimps   "%7&/563&4 */ $07"3*"/$& 200 400

    600 800 1000 1200 1000 1500 2000 centered (default) non-centered (cholesky) 'ĶĴłĿIJ ƉƌƎ %JTUS TBNQMFT )Ǿ !! G OPODFOUFSFE QBSBNF DMBTTJĕFE WBSZJOH TMP (ǎǑǡǐ SFTQFDUJWFMZ FRVJWBMFOU JOGFSFODFT WFSTJPO TBNQMFT NVDI number of effective parameters
  52. Random chimpanzees  "%7"/$&% 7"3:*/( 4-01&4  8F DBO JOTQFDU

    UIF TUBOEBSE EFWJBUJPO QBSBNFUFST UP HFU B TFOTF PG IPX BHHSFTTJWFMZ UIF WBSZJOH FČFDUT BSF CFJOH SFHVMBSJ[FE 3 DPEF  +- $.ǿ (ǎǑǡǐ Ǣ  +/#ʙǏ Ǣ +-.ʙǿǫ.$"(Ǿ/*-ǫǢǫ.$"(Ǿ'*&ǫȀ Ȁ ( ) . ǒǡǒʉ ǖǑǡǒʉ )Ǿ !! #/ .$"(Ǿ/*-ȁǎȂ ǎǡǐǖ ǍǡǑǖ ǍǡǕǍ ǏǡǏǑ ǖǍǓ ǎ .$"(Ǿ/*-ȁǏȂ ǍǡǖǏ ǍǡǐǕ ǍǡǑǑ ǎǡǓǑ ǎǍǓǍ ǎ .$"(Ǿ/*-ȁǐȂ ǎǡǕǓ Ǎǡǒǔ ǎǡǎǑ ǏǡǕǖ ǎǎǖǎ ǎ .$"(Ǿ/*-ȁǑȂ ǎǡǒǖ ǍǡǓǓ ǍǡǕǓ ǏǡǕǎ ǎǎǑǕ ǎ .$"(Ǿ'*&ȁǎȂ ǍǡǑǍ ǍǡǐǏ ǍǡǍǐ ǍǡǖǕ ǎǍǐǐ ǎ .$"(Ǿ'*&ȁǏȂ ǍǡǑǑ ǍǡǐǓ ǍǡǍǑ ǎǡǎǍ ǖǑǑ ǎ .$"(Ǿ'*&ȁǐȂ ǍǡǐǍ ǍǡǏǔ ǍǡǍǏ Ǎǡǔǖ ǎǓǍǓ ǎ .$"(Ǿ'*&ȁǑȂ ǍǡǑǔ ǍǡǐǕ ǍǡǍǐ ǎǡǎǒ ǎǍǔǐ ǎ 8IJMF UIFTF BSF KVTU QPTUFSJPS NFBOT BOE UIF BNPVOU PG TISJOLBHF BWFSBHFT PWFS UIF FOUJSF QPTUFSJPS ZPV DBO HFU B TFOTF GSPN UIF TNBMM WBMVFT UIBU TISJOLBHF JT QSFUUZ BHHSFTTJWF IFSF FTQFDJBMMZ JO UIF DBTF PG UIF CMPDLT ćJT JT XIBU UBLFT UIF NPEFM GSPN  BDUVBM QBSBNFUFST UP  FČFDUJWF QBSBNFUFST BT NFBTVSFE CZ 8"*$ PS 14*4-00‰JU BHSFFT JO UIJT DBTF  ćJT JT B HPPE FYBNQMF PG IPX WBSZJOH FČFDUT BEBQU UP UIF EBUB ćF PWFSĕUUJOH SJTL JT NVDI NJMEFS IFSF UIBO JU XPVME CF XJUI PSEJOBSZ ĕYFE FČFDUT *U DBO PG DPVSTF CF DIBMMFOHJOH UP EFĕOF BOE ĕU UIFTF NPEFMT #VU JG ZPV EPOU DIFDL GPS WBSJBUJPO JO TMPQFT ZPV NBZ OFWFS
  53. Correlations Rho_actor[4,4] Rho_actor[4,3] Rho_actor[4,2] Rho_actor[4,1] Rho_actor[3,4] Rho_actor[3,3] Rho_actor[3,2] Rho_actor[3,1] Rho_actor[2,4]

    Rho_actor[2,3] Rho_actor[2,2] Rho_actor[2,1] Rho_actor[1,4] Rho_actor[1,3] Rho_actor[1,2] Rho_actor[1,1] -1.0 -0.5 0.0 0.5 1.0 Value
  54. proportion left lever 0 0.5 1 actor 1 actor 2

    actor 3 actor 4 actor 5 actor 6 actor 7 R/N L/N R/P L/P 'ĶĴłĿIJ ƉƌƏ 1PTUFSJPS QSFEJDUJPOT JO CMBDL BHBJOTU UIF SBX EBUB JO CMVF GPS NPEFM (ǎǑǡǐ UIF DSPTTDMBTTJĕFE WBSZJOH FČFDUT NPEFM ćF MJOF TFH NFOUT BSF  DPNQBUJCJMJUZ JOUFSWBMT 0QFO DJSDMFT BSF USFBUNFOUT XJUIPVU B QBSUOFS 'JMMFE DJSDMFT BSF USFBUNFOUT XJUI B QBSUOFS ćF QSPTPDJBM MPDB UJPO BMUFSOBUFT SJHIUMFęSJHIUMFę BT MBCFMFE JO BDUPS  ȕ -2 / !*- ǿ % $) ǿǎǣǔȀȁǶǏȂ Ȁ ȃ '$) .ǿ ǿ%ǶǎȀȉǑʔǿǎǢǐȀǶ3* Ǣ +'ȁ%ǢǿǎǢǐȀȂ Ǣ '2ʙǏ Ǣ *'ʙ-)"$Ǐ Ȁ  "%7"/$&% 7"3:*/( 4-01&4 8F DBO JOTQFDU UIF TUBOEBSE EFWJBUJPO QBSBNFUFST UP HFU B TFOTF PG IP WBSZJOH FČFDUT BSF CFJOH SFHVMBSJ[FE +- $.ǿ (ǎǑǡǐ Ǣ  +/#ʙǏ Ǣ +-.ʙǿǫ.$"(Ǿ/*-ǫǢǫ.$"(Ǿ'*&ǫȀ ( ) . ǒǡǒʉ ǖǑǡǒʉ )Ǿ !! #/ .$"(Ǿ/*-ȁǎȂ ǎǡǐǖ ǍǡǑǖ ǍǡǕǍ ǏǡǏǑ ǖǍǓ ǎ .$"(Ǿ/*-ȁǏȂ ǍǡǖǏ ǍǡǐǕ ǍǡǑǑ ǎǡǓǑ ǎǍǓǍ ǎ .$"(Ǿ/*-ȁǐȂ ǎǡǕǓ Ǎǡǒǔ ǎǡǎǑ ǏǡǕǖ ǎǎǖǎ ǎ .$"(Ǿ/*-ȁǑȂ ǎǡǒǖ ǍǡǓǓ ǍǡǕǓ ǏǡǕǎ ǎǎǑǕ ǎ .$"(Ǿ'*&ȁǎȂ ǍǡǑǍ ǍǡǐǏ ǍǡǍǐ ǍǡǖǕ ǎǍǐǐ ǎ .$"(Ǿ'*&ȁǏȂ ǍǡǑǑ ǍǡǐǓ ǍǡǍǑ ǎǡǎǍ ǖǑǑ ǎ .$"(Ǿ'*&ȁǐȂ ǍǡǐǍ ǍǡǏǔ ǍǡǍǏ Ǎǡǔǖ ǎǓǍǓ ǎ