
Statistical Rethinking 2022 Lecture 14


Richard McElreath

February 14, 2022


Transcript

  1. Varying effects as confounds
     Varying effect strategy: unmeasured features of clusters leave an imprint on the data that can be measured by (1) repeat observations of each cluster and (2) partial pooling among clusters.
     Predictive perspective: important source of cluster-level variation; regularize.
     Causal perspective: competing causes or unobserved confounds.
  2. Varying effects as confounds
     Causal perspective: competing causes or actual confounds.
     Advantage over "fixed effect" approach: can include other cluster-level (time invariant) causes.
     Fixed effects: varying effects with variance fixed at infinity, no pooling.
     Don't panic: make a generative model and draw the owl.
     [DAG: W (War) with N1, N2 (Nation), G1, G2 (Geography), and X1, X2]
  3. [Plots: Response by Story (1-7); Response by Participant]
     Add clusters: more index variables, more population priors (previous lecture).
     Add features: more parameters, more dimensions in each population prior (this lecture).
     Cluster / Features:
       tanks / survival
       stories / treatment effect
       individuals / average response
       departments / admission rate, bias
  4. Adding Features
     One prior distribution for each cluster.
     One feature: one-dimensional distribution. Two features: two-dimensional distribution. N features: N-dimensional distribution.
     Hard part: learning associations.
     α_j ∼ Normal(ᾱ, σ)
     [α_j, β_j] ∼ MVNormal([ᾱ, β̄], Σ)
     α_{j,1..N} ∼ MVNormal(ᾱ, Σ)
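The jump from one feature to several can be sketched by simulation. The lecture's code is in R; below is a small Python/NumPy sketch with hypothetical population values (means, standard deviations, and correlation are assumptions, not lecture estimates) that draws a 2-vector [alpha_j, beta_j] for each cluster from one 2-dimensional population prior:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population values (not from the lecture): a mean and a
# standard deviation for each of two features, plus their correlation.
abar, bbar = 0.5, -1.0
sd = np.array([1.0, 0.4])
rho = 0.7
Sigma = np.outer(sd, sd) * np.array([[1.0, rho],
                                     [rho, 1.0]])

# One draw per cluster: a 2-vector [alpha_j, beta_j] from the same
# 2-dimensional population prior.
effects = rng.multivariate_normal([abar, bbar], Sigma, size=50_000)

# The "hard part" the slide names, learning associations, corresponds to
# the off-diagonal of Sigma: the sampled alphas and betas correlate.
print(np.corrcoef(effects.T)[0, 1])  # close to rho = 0.7
```

In a fitted model the machinery runs the other way: the covariance matrix Σ is estimated from the clusters, which is what makes the associations learnable.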
  5. Prosocial chimpanzees
     data(chimpanzees): 504 trials, 7 actors, 6 blocks
     4 treatments: (1) right, no partner; (2) left, no partner; (3) right, partner; (4) left, partner
  6. [Plots: treatment log-odds (R/N, L/N, R/P, L/P); posterior densities of sigma_A and sigma_B; probability pull left by actor (1-7), showing actor variation vs treatment/block variation]
  7. P_i ∼ Bernoulli(p_i)
     logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
     (mean + actor-treatment + block-treatment)
     P: pulled left, T: treatment (side, condition), A: actor, B: block
  8. P_i ∼ Bernoulli(p_i)
     logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
     ᾱ_{A[i]}: mean for each actor; α_{A[i],T[i]}: actor offset for each treatment; β̄_{B[i]}: mean for each block; β_{B[i],T[i]}: block offset for each treatment
  9. P_i ∼ Bernoulli(p_i)
     logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
     α_j ∼ MVNormal([0,0,0,0], R_A, S_A) for j ∈ 1..7
     Prior for covarying actor-treatment effects: a vector of 4 parameters for each actor j.
     (mean + actor-treatment + block-treatment)
  10. P_i ∼ Bernoulli(p_i)
      logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
      α_j ∼ MVNormal([0,0,0,0], R_A, S_A) for j ∈ 1..7 (prior for covarying actor-treatment effects: a vector of 4 parameters for each actor j)
      β_k ∼ MVNormal([0,0,0,0], R_B, S_B) for k ∈ 1..6 (prior for covarying block-treatment effects: a vector of 4 parameters for each block k)
      (mean + actor-treatment + block-treatment)
  11. P_i ∼ Bernoulli(p_i)
      logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
      α_j ∼ MVNormal([0,0,0,0], R_A, S_A) for j ∈ 1..7 (covarying actor-treatment effects: 4 parameters per actor j)
      β_k ∼ MVNormal([0,0,0,0], R_B, S_B) for k ∈ 1..6 (covarying block-treatment effects: 4 parameters per block k)
      ᾱ_j ∼ Normal(0, τ_A), β̄_k ∼ Normal(0, τ_B) (priors for actor and block means)
      S_{A,j}, S_{B,j}, τ_A, τ_B ∼ Exponential(1) for j ∈ 1..4 (one standard deviation for each treatment; each standard deviation gets the same prior)
      R_A, R_B ∼ LKJcorr(4) (correlation matrix prior)
  13. data(chimpanzees)
      d <- chimpanzees
      dat <- list(
          P = d$pulled_left,
          A = as.integer(d$actor),
          B = as.integer(d$block),
          T = 1L + d$prosoc_left + 2L*d$condition )

      m14.2 <- ulam(
          alist(
              P ~ bernoulli(p),
              logit(p) <- abar[A] + a[A,T] + bbar[B] + b[B,T],
              # adaptive priors
              vector[4]:a[A] ~ multi_normal(0,Rho_A,sigma_A),
              vector[4]:b[B] ~ multi_normal(0,Rho_B,sigma_B),
              abar[A] ~ normal(0,tau_A),
              bbar[B] ~ normal(0,tau_B),
              # fixed priors
              c(tau_A,tau_B) ~ exponential(1),
              sigma_A ~ exponential(1),
              Rho_A ~ dlkjcorr(4),
              sigma_B ~ exponential(1),
              Rho_B ~ dlkjcorr(4)
          ) , data=dat , chains=4 , cores=4 )
  17. Check yourself before you wreck yourself
      [Plots: number of effective samples vs Rhat; HMC energy density]
      37 divergent transitions; log-probability n_eff = 113
  18. "Centered":
      v ∼ Normal(0,3)
      x ∼ Normal(0, exp(v))
      "Non-centered":
      v ∼ Normal(0,3)
      z ∼ Normal(0,1)
      x = z exp(v)
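The equivalence of the two parameterizations is easy to check by simulation. The lecture's code is in R; this Python/NumPy sketch holds v at an assumed fixed value to show that, conditional on v, sampling x directly and scaling a standard-normal z give the same Normal(0, exp(v)) distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hold the scale parameter v fixed (an assumed value, for illustration).
# Conditional on v, the two parameterizations describe the same
# Normal(0, exp(v)) distribution for x.
v = 1.5
N = 100_000

x_centered = rng.normal(0.0, np.exp(v), size=N)           # x ~ Normal(0, exp(v))
x_noncentered = rng.normal(0.0, 1.0, size=N) * np.exp(v)  # x = z * exp(v)

# Both sample standard deviations sit near exp(1.5), about 4.48.
print(x_centered.std(), x_noncentered.std())
```

The distributions match, but the posterior geometry does not: in the non-centered version the sampler explores independent z and v, which avoids the funnel that produces divergent transitions.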
  19. Centered
      P_i ∼ Bernoulli(p_i)
      logit(p_i) = β_{T[i],B[i]} + α_{A[i]}
      α_j ∼ Normal(ᾱ, σ_A)
      β_{j,k} ∼ Normal(0, σ_B)
      ᾱ ∼ Normal(0, 1.5)
      σ_A, σ_B ∼ Exponential(1)
  20. Centered:
      P_i ∼ Bernoulli(p_i)
      logit(p_i) = β_{T[i],B[i]} + α_{A[i]}
      α_j ∼ Normal(ᾱ, σ_A)
      β_{j,k} ∼ Normal(0, σ_B)
      ᾱ ∼ Normal(0, 1.5)
      σ_A, σ_B ∼ Exponential(1)
      Non-centered:
      P_i ∼ Bernoulli(p_i)
      logit(p_i) = ᾱ + (z_{α,A[i]}) σ_A + (z_{β,T[i],B[i]}) σ_B
      z_{α,j} ∼ Normal(0,1)
      z_{β,j,k} ∼ Normal(0,1)
      ᾱ ∼ Normal(0, 1.5)
      σ_A, σ_B ∼ Exponential(1)
  22. Centered: how can we factor R and S out of the priors?
      P_i ∼ Bernoulli(p_i)
      logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
      α_j ∼ MVNormal([0,0,0,0], R_A, S_A)
      β_k ∼ MVNormal([0,0,0,0], R_B, S_B)
      ᾱ_j ∼ Normal(0, τ_A), β̄_k ∼ Normal(0, τ_B)
      S_{A,j}, S_{B,j}, τ_A, τ_B ∼ Exponential(1)
      R_A, R_B ∼ LKJcorr(4)
  23. P_i ∼ Bernoulli(p_i)
      logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
      α = (diag(S_{A,1}, S_{A,2}, S_{A,3}, S_{A,4}) L_A Z_{T,A})ᵀ
      diag(S_A): diagonal matrix of standard deviations
      L_A: Cholesky factor of correlation matrix across treatments
      Z_{T,A}: matrix of treatment-actor z-scores
      ᵀ: transpose, flips rows and columns, giving a 7-by-4 matrix
  24. P_i ∼ Bernoulli(p_i)
      logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
      α = (diag(S_A) L_A Z_{T,A})ᵀ
      L_A: Cholesky factor of correlation matrix across treatments
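The construction α = (diag(S) L Z)ᵀ can be sketched numerically. The lecture's code is in R; this Python/NumPy version uses assumed values for S and the correlation matrix (two treatments rather than four, for brevity) and checks that the composed effects recover the intended scales and correlation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed target values (not the fitted chimpanzee estimates): standard
# deviations S and a correlation matrix R for two treatment effects.
S = np.array([2.0, 0.5])
R = np.array([[1.0, 0.6],
              [0.6, 1.0]])

L = np.linalg.cholesky(R)              # lower-triangular factor, L @ L.T == R
Z = rng.standard_normal((2, 100_000))  # matrix of independent z-scores

# alpha = (diag(S) L Z)^T : each row holds one cluster's correlated effects.
alpha = (np.diag(S) @ L @ Z).T

# Rows recover the intended scales and the intended correlation.
print(alpha.std(axis=0), np.corrcoef(alpha.T)[0, 1])
```

Only the z-scores are sampled parameters; S and L enter as deterministic transforms, which is exactly what makes this a non-centered parameterization.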
  25. [Scan: "Notices scientifiques", Commandant Benoit: "Note on a method of solving the normal equations arising from the application of the method of least squares to a system of linear equations fewer in number than the unknowns. Application of the method to the solution of a definite system of linear equations. (Procedure of Commandant Cholesky.)"]
      André-Louis Cholesky (1875-1918)
      Translation: The artillery commander Cholesky, of the Geographical Service of the army, killed during the Great War, devised, in the course of research on the adjustment of geodesic networks, a very ingenious procedure for solving the equations known as normal, obtained by applying the method of least squares to linear equations fewer in number than the unknowns. From it he derived a general method of solving linear equations.
  28. # define 2D Gaussian with correlation 0.6
      N <- 1e4
      sigma1 <- 2
      sigma2 <- 0.5
      rho <- 0.6
      # independent z-scores
      z1 <- rnorm( N )
      z2 <- rnorm( N )
      # use Cholesky to blend in correlation
      a1 <- z1 * sigma1
      a2 <- ( rho*z1 + sqrt( 1-rho^2 )*z2 )*sigma2
  30. (same code as above, with output)
      > cor(z1,z2)
      [1] -0.0005542644
      > cor(a1,a2)
      [1] 0.5999334
      > sd(a1)
      [1] 1.997036
      > sd(a2)
      [1] 0.4989456
      α = (diag(S_A) L_A Z_{T,A})ᵀ
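The hand-coded blend in the R snippet above is the 2-dimensional Cholesky factor written out. A quick Python/NumPy check (the lecture itself stays in R) that (rho, sqrt(1 - rho^2)) is exactly the second row of the lower Cholesky factor of the correlation matrix:

```python
import numpy as np

rho = 0.6
R = np.array([[1.0, rho],
              [rho, 1.0]])

# The lower Cholesky factor of a 2x2 correlation matrix has
# first row (1, 0) and second row (rho, sqrt(1 - rho^2)),
# exactly the weights the R code applies to z1 and z2.
L = np.linalg.cholesky(R)
print(L)  # [[1.  0. ]
          #  [0.6 0.8]]
```

For more than two dimensions the same factorization applies, which is why the models below can hand the whole job to a Cholesky-factored prior instead of writing the weights by hand.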
  31. P_i ∼ Bernoulli(p_i)
      logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
      (mean + actor-treatment + block-treatment)
      α = (diag(S_A) L_A Z_{T,A})ᵀ (compute alpha from non-centered pieces)
      β = (diag(S_B) L_B Z_{T,B})ᵀ (compute beta from non-centered pieces)
  32. P_i ∼ Bernoulli(p_i)
      logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
      α = (diag(S_A) L_A Z_{T,A})ᵀ (compute alpha from non-centered pieces)
      β = (diag(S_B) L_B Z_{T,B})ᵀ (compute beta from non-centered pieces)
      Z_{T,A} ∼ Normal(0,1) (matrix of treatment-actor z-scores)
      Z_{T,B} ∼ Normal(0,1) (matrix of treatment-block z-scores)
  33. P_i ∼ Bernoulli(p_i)
      logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
      α = (diag(S_A) L_A Z_{T,A})ᵀ, β = (diag(S_B) L_B Z_{T,B})ᵀ
      Z_{T,A} ∼ Normal(0,1), Z_{T,B} ∼ Normal(0,1) (matrices of treatment-actor and treatment-block z-scores)
      z̄_{A,j}, z̄_{B,k} ∼ Normal(0,1) (mean actor and block z-scores)
      ᾱ = z̄_A τ_A, β̄ = z̄_B τ_B (compute mean effects)
  34. P_i ∼ Bernoulli(p_i)
      logit(p_i) = ᾱ_{A[i]} + α_{A[i],T[i]} + β̄_{B[i]} + β_{B[i],T[i]}
      α = (diag(S_A) L_A Z_{T,A})ᵀ, β = (diag(S_B) L_B Z_{T,B})ᵀ
      Z_{T,A} ∼ Normal(0,1), Z_{T,B} ∼ Normal(0,1) (matrices of treatment-actor and treatment-block z-scores)
      z̄_{A,j}, z̄_{B,k} ∼ Normal(0,1) (mean actor and block z-scores)
      ᾱ = z̄_A τ_A, β̄ = z̄_B τ_B (compute mean effects)
      S_{A,j}, S_{B,j}, τ_A, τ_B ∼ Exponential(1) (each standard deviation gets the same prior)
      R_A, R_B ∼ LKJcorr(4) (correlation matrix prior)
  35. m14.3 <- ulam(
          alist(
              P ~ bernoulli(p),
              logit(p) <- abar[A] + a[A,T] + bbar[B] + b[B,T],
              # adaptive priors - non-centered
              transpars> matrix[A,4]:a <- compose_noncentered( sigma_A , L_Rho_A , zA ),
              transpars> matrix[B,4]:b <- compose_noncentered( sigma_B , L_Rho_B , zB ),
              matrix[4,A]:zA ~ normal( 0 , 1 ),
              matrix[4,B]:zB ~ normal( 0 , 1 ),
              zAbar[A] ~ normal(0,1),
              zBbar[B] ~ normal(0,1),
              transpars> vector[A]:abar <<- zAbar*tau_A,
              transpars> vector[B]:bbar <<- zBbar*tau_B,
              # fixed priors
              c(tau_A,tau_B) ~ exponential(1),
              vector[4]:sigma_A ~ exponential(1),
              cholesky_factor_corr[4]:L_Rho_A ~ lkj_corr_cholesky(4),
              vector[4]:sigma_B ~ exponential(1),
              cholesky_factor_corr[4]:L_Rho_B ~ lkj_corr_cholesky(4),
              # compute ordinary correlation matrixes
              gq> matrix[4,4]:Rho_A <<- Chol_to_Corr(L_Rho_A),
              gq> matrix[4,4]:Rho_B <<- Chol_to_Corr(L_Rho_B)
          ) , data=dat , chains=4 , cores=4 , log_lik=TRUE )
  41. [Plot: effective samples, centered model vs non-centered model; the non-centered model mostly achieves more]
      Examples: sigma_A[4] n_eff = 1191; L_Rho_A[2,1] n_eff = 1718; L_Rho_A[3,1] n_eff = 1912; L_Rho_A[4,1] n_eff = 1863; fixed entries of the Cholesky factor (L_Rho_A[1,1], L_Rho_A[1,2], ...) show n_eff = NaN
  42. [Plots: actor-treatment probability pull left (actors 1-7); block-treatment probability pull left (blocks 1-6)]
  43. [Plots: actor-treatment and block-treatment probability pull left; densities of the standard deviation among actors (actor means by treatment) and among blocks (block means by treatment)]
  44. Correlated Varying Effects
      Priors that learn correlation structure:
      (1) partial pooling across features
      (2) exploit correlations for prediction & causal inference
      Varying effects can be correlated even if the prior doesn't learn the correlations!
      Ethical obligation to do our best.
  45. Course Schedule
      Week 1: Bayesian inference (Chapters 1, 2, 3)
      Week 2: Linear models & Causal Inference (Chapter 4)
      Week 3: Causes, Confounds & Colliders (Chapters 5 & 6)
      Week 4: Overfitting / MCMC (Chapters 7, 8, 9)
      Week 5: Generalized Linear Models (Chapters 10, 11)
      Week 6: Ordered categories & Multilevel models (Chapters 12 & 13)
      Week 7: More Multilevel models (Chapters 13 & 14)
      Week 8: Multilevel Tactics & Gaussian Processes (Chapter 14)
      Week 9: Measurement & Missingness (Chapter 15)
      Week 10: Generalized Linear Madness (Chapter 16)
      https://github.com/rmcelreath/stat_rethinking_2022