Slide 1

Slide 1 text

Statistical Rethinking 14: Correlated Varying Effects 2022

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Varying effects as confounds

Varying effect strategy: Unmeasured features of clusters leave an imprint on the data that can be measured by (1) repeat observations of each cluster and (2) partial pooling among clusters

Predictive perspective: Important source of cluster-level variation; regularize

Causal perspective: Competing causes or unobserved confounds

Slide 4

Slide 4 text

[DAG with nodes: A (Age at marriage), M (Marriage), D (Divorce)]

Slide 5

Slide 5 text

[DAG with nodes: A (Age at marriage), M (Marriage), D (Divorce), R (Region)]

Slide 6

Slide 6 text

[DAG with nodes: G (Grandparents), P (Parents), C (Children), U (Neighborhood)]

Slide 7

Slide 7 text

[DAG with nodes R, X, S, Y, E, G, P, U; labels: Treatment, Participation, Individual]

Slide 8

Slide 8 text

[DAG with nodes R, X, S, Y, E, G, P, U; labels: Treatment, Participation, Individual]

Slide 9

Slide 9 text

[DAG with nodes: W (War), G1 and G2 (Government), X1, X2]

Slide 10

Slide 10 text

[DAG with nodes: W (War), G1 and G2 (Government), N1 and N2 (Nation), X1, X2]

Slide 11

Slide 11 text

Varying effects as confounds

Causal perspective: Competing causes or actual confounds

Advantage over the “fixed effect” approach: Can include other cluster-level (time-invariant) causes

Fixed effects: Varying effects with variance fixed at infinity, no pooling

Don’t panic: Make a generative model and draw the owl

[DAG with nodes: W (War), G1 and G2 (Geography), N1 and N2 (Nation), X1, X2]

Slide 12

Slide 12 text

[Figure: two panels, Response (1 to 7) by Story and Response (1 to 7) by Participant]

Add clusters: More index variables, more population priors (previous lecture)
Add features: More parameters, more dimensions in each population prior (this lecture)

Cluster and its features:
 tanks: survival
 stories: treatment effect
 individuals: average response
 departments: admission rate, bias

Slide 13

Slide 13 text

Adding Features

One prior distribution for each cluster
One feature: one-dimensional distribution
Two features: two-dimensional distribution
N features: N-dimensional distribution
Hard part: learning associations

One feature: α_j ∼ Normal(ᾱ, σ)
Two features: [α_j, β_j] ∼ MVNormal([ᾱ, β̄], Σ)
N features: α_j,1..N ∼ MVNormal(ᾱ, Σ)
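To make the N-dimensional case concrete, here is a minimal numerical sketch in Python/numpy (not part of the lecture's R code; the hyperparameter values ᾱ = 0.5, β̄ = -1, the σ's, and ρ = 0.7 are invented for illustration). It draws one [α_j, β_j] pair per cluster from a two-dimensional MVNormal prior assembled from standard deviations and a correlation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Population-level hyperparameters (illustrative values only)
abar, bbar = 0.5, -1.0        # means of the two features
sigma_a, sigma_b = 1.0, 0.5   # standard deviations of the two features
rho = 0.7                     # correlation between the two features

# Assemble the covariance matrix Sigma = S R S from sds and correlation
S = np.diag([sigma_a, sigma_b])
R = np.array([[1.0, rho], [rho, 1.0]])
Sigma = S @ R @ S

# One [alpha_j, beta_j] pair per cluster, drawn from the joint prior
n_clusters = 10_000
effects = rng.multivariate_normal([abar, bbar], Sigma, size=n_clusters)

print(effects.mean(axis=0))          # close to [0.5, -1.0]
print(np.corrcoef(effects.T)[0, 1])  # close to 0.7
```

The "hard part" the slide names, learning the associations, amounts to inferring R (and S) from data instead of fixing them as above.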

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Prosocial chimpanzees

data(chimpanzees)

504 trials, 7 actors, 6 blocks
4 treatments:
 (1) right, no partner
 (2) left, no partner
 (3) right, partner
 (4) left, partner

Slide 28

Slide 28 text

[Figure: posterior summaries from the previous model. Panels: treatment log-odds (R/N, L/N, R/P, L/P); densities of sigma_A and sigma_B; probability of pulling left by actor (1 to 7). Actor variation vs treatment/block variation]

Slide 29

Slide 29 text

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]   (mean + actor-treatment + block-treatment)

Variables: P pulled left side, T condition, A actor, B block
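The linear model above lives on the log-odds scale; a tiny Python sketch (mine, not the lecture's) of the inverse-logit that converts logit(p_i) back into the probability p_i:

```python
import math

def inv_logit(x: float) -> float:
    """Map a log-odds value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# A total of 0 on the log-odds scale is a 50% chance of pulling left;
# adding +1 (e.g. an actor-treatment offset) moves it to about 73%.
print(inv_logit(0.0))  # 0.5
print(inv_logit(1.0))  # ~0.731
```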

Slide 30

Slide 30 text

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]

ᾱ_A[i]: mean for each actor; α_A[i],T[i]: actor offset for each treatment; β̄_B[i]: mean for each block; β_B[i],T[i]: block offset for each treatment

Slide 31

Slide 31 text

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]   (mean + actor-treatment + block-treatment)
α_j ∼ MVNormal([0,0,0,0], R_A, S_A) for j ∈ 1..7   (prior for covarying actor-treatment effects: a vector of 4 parameters for each actor j)

Slide 32

Slide 32 text

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]   (mean + actor-treatment + block-treatment)
α_j ∼ MVNormal([0,0,0,0], R_A, S_A) for j ∈ 1..7   (prior for covarying actor-treatment effects: a vector of 4 parameters for each actor j)
β_k ∼ MVNormal([0,0,0,0], R_B, S_B) for k ∈ 1..6   (prior for covarying block-treatment effects: a vector of 4 parameters for each block k)

Slide 33

Slide 33 text

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]   (mean + actor-treatment + block-treatment)
α_j ∼ MVNormal([0,0,0,0], R_A, S_A) for j ∈ 1..7   (prior for covarying actor-treatment effects: a vector of 4 parameters for each actor j)
β_k ∼ MVNormal([0,0,0,0], R_B, S_B) for k ∈ 1..6   (prior for covarying block-treatment effects: a vector of 4 parameters for each block k)
ᾱ_j ∼ Normal(0, τ_A), β̄_k ∼ Normal(0, τ_B)   (priors for the actor and block means)
S_A,j, S_B,j, τ_A, τ_B ∼ Exponential(1) for j ∈ 1..4   (one standard deviation for each treatment; each standard deviation gets the same prior)
R_A, R_B ∼ LKJcorr(4)   (correlation matrix prior)

Slide 34

Slide 34 text

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]
α_j ∼ MVNormal([0,0,0,0], R_A, S_A) for j ∈ 1..7
β_k ∼ MVNormal([0,0,0,0], R_B, S_B) for k ∈ 1..6
ᾱ_j ∼ Normal(0, τ_A), β̄_k ∼ Normal(0, τ_B)
S_A,j, S_B,j, τ_A, τ_B ∼ Exponential(1) for j ∈ 1..4
R_A, R_B ∼ LKJcorr(4)

Slide 35

Slide 35 text

data(chimpanzees)
d <- chimpanzees
dat <- list(
    P = d$pulled_left,
    A = as.integer(d$actor),
    B = as.integer(d$block),
    T = 1L + d$prosoc_left + 2L*d$condition )

m14.2 <- ulam(
    alist(
        P ~ bernoulli(p),
        logit(p) <- abar[A] + a[A,T] + bbar[B] + b[B,T],
        # adaptive priors
        vector[4]:a[A] ~ multi_normal(0,Rho_A,sigma_A),
        vector[4]:b[B] ~ multi_normal(0,Rho_B,sigma_B),
        abar[A] ~ normal(0,tau_A),
        bbar[B] ~ normal(0,tau_B),
        # fixed priors
        c(tau_A,tau_B) ~ exponential(1),
        sigma_A ~ exponential(1),
        Rho_A ~ dlkjcorr(4),
        sigma_B ~ exponential(1),
        Rho_B ~ dlkjcorr(4)
    ) , data=dat , chains=4 , cores=4 )

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]
α_j ∼ MVNormal(0, R_A, S_A)
β_k ∼ MVNormal(0, R_B, S_B)
ᾱ_j ∼ Normal(0, τ_A), β̄_k ∼ Normal(0, τ_B)
S_A,j, S_B,j, τ_A, τ_B ∼ Exponential(1)
R_A, R_B ∼ LKJcorr(4)

Slide 36

Slide 36 text

[Same m14.2 code and model as Slide 35]

Slide 37

Slide 37 text

[Same m14.2 code and model as Slide 35]

Slide 38

Slide 38 text

[Same m14.2 code and model as Slide 35]

Slide 39

Slide 39 text

[Figure: HMC diagnostics for m14.2. Panels: number of effective samples vs Rhat; density of HMC energy (log-probability). 37 divergent transitions; n_eff = 113]

Check yourself before you wreck yourself

Slide 40

Slide 40 text

PAUSE

Slide 41

Slide 41 text

“Centered”:
v ∼ Normal(0, 3)
x ∼ Normal(0, exp(v))

“Non-centered”:
v ∼ Normal(0, 3)
z ∼ Normal(0, 1)
x = z exp(v)
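The two parameterizations define the same distribution for x; only the sampling geometry differs. A quick numpy check (my sketch, using the v ∼ Normal(0, 3) from the slide) compares the mean of log|x| under each version, a summary that is robust to the very heavy tails:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 200_000

# "Centered": draw x directly, with a scale that depends on v
v1 = rng.normal(0.0, 3.0, size=N)
x_centered = rng.normal(0.0, np.exp(v1))

# "Non-centered": draw a standard-normal z, then scale it deterministically
v2 = rng.normal(0.0, 3.0, size=N)
z = rng.normal(0.0, 1.0, size=N)
x_noncentered = z * np.exp(v2)

# Same distribution: log|x| = log|z| + v under both parameterizations
print(np.log(np.abs(x_centered)).mean())
print(np.log(np.abs(x_noncentered)).mean())
```

In a sampler, the non-centered version is easier to explore because z and v are independent a priori, so there is no funnel-shaped dependence between the sampled parameters.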

Slide 42

Slide 42 text

Centered:
P_i ∼ Bernoulli(p_i)
logit(p_i) = β_T[i],B[i] + α_A[i]
α_j ∼ Normal(ᾱ, σ_A)
β_j,k ∼ Normal(0, σ_B)
ᾱ ∼ Normal(0, 1.5)
σ_A, σ_B ∼ Exponential(1)

Slide 43

Slide 43 text

Centered:
P_i ∼ Bernoulli(p_i)
logit(p_i) = β_T[i],B[i] + α_A[i]
α_j ∼ Normal(ᾱ, σ_A)
β_j,k ∼ Normal(0, σ_B)
ᾱ ∼ Normal(0, 1.5)
σ_A, σ_B ∼ Exponential(1)

Non-centered:
P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ + z_α,A[i] σ_A + z_β,T[i],B[i] σ_B
z_α,j ∼ Normal(0, 1)
z_β,j ∼ Normal(0, 1)
ᾱ ∼ Normal(0, 1.5)
σ_A, σ_B ∼ Exponential(1)

Slide 44

Slide 44 text

[Same centered and non-centered models as Slide 43]

Slide 45

Slide 45 text

Centered:
P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]
α_j ∼ MVNormal([0,0,0,0], R_A, S_A)
β_k ∼ MVNormal([0,0,0,0], R_B, S_B)
ᾱ_j ∼ Normal(0, τ_A), β̄_k ∼ Normal(0, τ_B)
S_A,j, S_B,j, τ_A, τ_B ∼ Exponential(1)
R_A, R_B ∼ LKJcorr(4)

How can we factor R and S out of the priors?

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

P i ∼ Bernoulli(p i ) α = S A,1 0 0 0 0 S A,2 0 0 0 0 S A,3 0 0 0 0 S A,4 L A Z T,A ⊺ diagonal matrix of standard deviations transpose!
 flips rows and columns a 7-by-4 matrix Cholesky factor of correlation matrix across treatments matrix of treatment- actor z-scores logit(p i ) = ¯ α A[i] + α A[i],T[i] + ¯ β B[i] + β B[i],T[i]

Slide 48

Slide 48 text

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]

α = ( diag(S_A,1, S_A,2, S_A,3, S_A,4) L_A Z_T,A )^⊺ = ( diag(S_A) L_A Z_T,A )^⊺

L_A: Cholesky factor of correlation matrix across treatments
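The factorization can be verified numerically. Below is a numpy sketch (mine; the correlation matrix R and scales S are invented for illustration) that builds α = (diag(S_A) L_A Z_T,A)^⊺ from independent z-scores and checks that the columns recover the intended correlations and standard deviations:

```python
import numpy as np

rng = np.random.default_rng(7)

# Target correlation matrix across 4 treatments and their sds (made up)
R = np.array([
    [1.0, 0.3, 0.1, 0.0],
    [0.3, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.4],
    [0.0, 0.1, 0.4, 1.0],
])
S = np.array([2.0, 0.5, 1.0, 1.5])

# Lower-triangular Cholesky factor: L @ L.T reproduces R
L = np.linalg.cholesky(R)

# Z: 4 treatments x many "actors" of independent standard-normal z-scores
Z = rng.normal(size=(4, 100_000))

# alpha = (diag(S) L Z)^T: each row holds one actor's 4 correlated effects
alpha = (np.diag(S) @ L @ Z).T

print(np.corrcoef(alpha.T).round(2))  # close to R
print(alpha.std(axis=0).round(2))     # close to S
```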

Slide 49

Slide 49 text

[Scan: Commandant Benoit, “Note sur une méthode de résolution des équations normales provenant de l'application de la méthode des moindres carrés à un système d'équations linéaires en nombre inférieur à celui des inconnues (Procédé du Commandant Cholesky)” (Note on a method for solving the normal equations arising from applying the method of least squares to a system of linear equations fewer in number than the unknowns; Commandant Cholesky's procedure)]

André-Louis Cholesky (1875–1918)

Slide 50

Slide 50 text

[Scan repeated from Slide 49]

André-Louis Cholesky (1875–1918)

Slide 51

Slide 51 text

[Scan repeated from Slide 49, with translation:] “The artillery commander Cholesky, of the Geographical Service of the Army, killed during the Great War, devised, in the course of research on the adjustment of geodesic networks, a very ingenious procedure for solving the equations known as normal, obtained by applying the method of least squares to linear equations fewer in number than the unknowns.”

André-Louis Cholesky (1875–1918)

Slide 52

Slide 52 text

# define 2D Gaussian with correlation 0.6
N <- 1e4
sigma1 <- 2
sigma2 <- 0.5
rho <- 0.6

# independent z-scores
z1 <- rnorm( N )
z2 <- rnorm( N )

# use Cholesky to blend in correlation
a1 <- z1 * sigma1
a2 <- ( rho*z1 + sqrt( 1-rho^2 )*z2 )*sigma2

Slide 53

Slide 53 text

[Same Cholesky blending code as Slide 52]

Slide 54

Slide 54 text

# define 2D Gaussian with correlation 0.6
N <- 1e4
sigma1 <- 2
sigma2 <- 0.5
rho <- 0.6

# independent z-scores
z1 <- rnorm( N )
z2 <- rnorm( N )

# use Cholesky to blend in correlation
a1 <- z1 * sigma1
a2 <- ( rho*z1 + sqrt( 1-rho^2 )*z2 )*sigma2

> cor(z1,z2)
[1] -0.0005542644
> cor(a1,a2)
[1] 0.5999334
> sd(a1)
[1] 1.997036
> sd(a2)
[1] 0.4989456

α = (diag(S_A) L_A Z_T,A)^⊺

Slide 55

Slide 55 text

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]   (mean + actor-treatment + block-treatment)
α = (diag(S_A) L_A Z_T,A)^⊺   (compute alpha from non-centered pieces)
β = (diag(S_B) L_B Z_T,B)^⊺   (compute beta from non-centered pieces)

Slide 56

Slide 56 text

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]   (mean + actor-treatment + block-treatment)
α = (diag(S_A) L_A Z_T,A)^⊺   (compute alpha from non-centered pieces)
β = (diag(S_B) L_B Z_T,B)^⊺   (compute beta from non-centered pieces)
Z_T,A ∼ Normal(0, 1)   (matrix of treatment-actor z-scores)
Z_T,B ∼ Normal(0, 1)   (matrix of treatment-block z-scores)

Slide 57

Slide 57 text

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]   (mean + actor-treatment + block-treatment)
α = (diag(S_A) L_A Z_T,A)^⊺   (compute alpha from non-centered pieces)
β = (diag(S_B) L_B Z_T,B)^⊺   (compute beta from non-centered pieces)
Z_T,A ∼ Normal(0, 1)   (matrix of treatment-actor z-scores)
Z_T,B ∼ Normal(0, 1)   (matrix of treatment-block z-scores)
z̄_A,j, z̄_B,k ∼ Normal(0, 1)   (mean actor and block z-scores)
ᾱ = z̄_A τ_A, β̄ = z̄_B τ_B   (compute mean effects)

Slide 58

Slide 58 text

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]   (mean + actor-treatment + block-treatment)
α = (diag(S_A) L_A Z_T,A)^⊺   (compute alpha from non-centered pieces)
β = (diag(S_B) L_B Z_T,B)^⊺   (compute beta from non-centered pieces)
Z_T,A ∼ Normal(0, 1)   (matrix of treatment-actor z-scores)
Z_T,B ∼ Normal(0, 1)   (matrix of treatment-block z-scores)
z̄_A,j, z̄_B,k ∼ Normal(0, 1)   (mean actor and block z-scores)
ᾱ = z̄_A τ_A, β̄ = z̄_B τ_B   (compute mean effects)
S_A,j, S_B,j, τ_A, τ_B ∼ Exponential(1)   (each standard deviation gets the same prior)
R_A, R_B ∼ LKJcorr(4)   (correlation matrix prior)

Slide 59

Slide 59 text

m14.3 <- ulam(
    alist(
        P ~ bernoulli(p),
        logit(p) <- abar[A] + a[A,T] + bbar[B] + b[B,T],
        # adaptive priors - non-centered
        transpars> matrix[A,4]:a <- compose_noncentered( sigma_A , L_Rho_A , zA ),
        transpars> matrix[B,4]:b <- compose_noncentered( sigma_B , L_Rho_B , zB ),
        matrix[4,A]:zA ~ normal( 0 , 1 ),
        matrix[4,B]:zB ~ normal( 0 , 1 ),
        zAbar[A] ~ normal(0,1),
        zBbar[B] ~ normal(0,1),
        transpars> vector[A]:abar <<- zAbar*tau_A,
        transpars> vector[B]:bbar <<- zBbar*tau_B,
        # fixed priors
        c(tau_A,tau_B) ~ exponential(1),
        vector[4]:sigma_A ~ exponential(1),
        cholesky_factor_corr[4]:L_Rho_A ~ lkj_corr_cholesky(4),
        vector[4]:sigma_B ~ exponential(1),
        cholesky_factor_corr[4]:L_Rho_B ~ lkj_corr_cholesky(4),
        # compute ordinary correlation matrixes
        gq> matrix[4,4]:Rho_A <<- Chol_to_Corr(L_Rho_A),
        gq> matrix[4,4]:Rho_B <<- Chol_to_Corr(L_Rho_B)
    ) , data=dat , chains=4 , cores=4 , log_lik=TRUE )

P_i ∼ Bernoulli(p_i)
logit(p_i) = ᾱ_A[i] + α_A[i],T[i] + β̄_B[i] + β_B[i],T[i]
α = (diag(S_A) L_A Z_T,A)^⊺, β = (diag(S_B) L_B Z_T,B)^⊺
Z_T,A ∼ Normal(0, 1), Z_T,B ∼ Normal(0, 1)
z̄_A,j, z̄_B,k ∼ Normal(0, 1)
ᾱ = z̄_A τ_A, β̄ = z̄_B τ_B
S_A,j, S_B,j, τ_A, τ_B ∼ Exponential(1)
R_A, R_B ∼ LKJcorr(4)

Slide 60

Slide 60 text

[Same m14.3 code and model as Slide 59]

Slide 61

Slide 61 text

[Same m14.3 code and model as Slide 59]

Slide 62

Slide 62 text

[Same m14.3 code and model as Slide 59]

Slide 63

Slide 63 text

[Same m14.3 code and model as Slide 59]

Slide 64

Slide 64 text

[Same m14.3 code and model as Slide 59]

Slide 65

Slide 65 text

[Figure: effective samples per parameter, centered model (horizontal axis) vs non-centered model (vertical axis). Examples: sigma_A[4] n_eff = 1191; L_Rho_A[2,1] n_eff = 1718; L_Rho_A[3,1] n_eff = 1912; L_Rho_A[4,1] n_eff = 1863; L_Rho_A[2,2] n_eff = 1235; L_Rho_A[3,2] n_eff = 1716; L_Rho_A[4,2] n_eff = 2206; the fixed elements of the Cholesky factor (L_Rho_A[1,1], L_Rho_A[1,2], L_Rho_A[1,3]) report n_eff = NaN]

Slide 66

Slide 66 text

[Figure: posterior probability of pulling left. Panels: actor-treatment (actors 1 to 7) and block-treatment (blocks 1 to 6)]

Slide 67

Slide 67 text

[Figure: posterior probability of pulling left by actor-treatment (actors 1 to 7) and block-treatment (blocks 1 to 6), plus densities of the standard deviation among actors (actor means by treatment) and the standard deviation among blocks (block means by treatment)]

Slide 68

Slide 68 text

Correlated Varying Effects

Priors that learn correlation structure:
 (1) partial pooling across features
 (2) exploit correlations for prediction & causal inference

Varying effects can be correlated even if the prior doesn’t learn the correlations!

Ethical obligation to do our best

Slide 69

Slide 69 text

Course Schedule

Week 1: Bayesian inference (Chapters 1, 2, 3)
Week 2: Linear models & Causal Inference (Chapter 4)
Week 3: Causes, Confounds & Colliders (Chapters 5 & 6)
Week 4: Overfitting / MCMC (Chapters 7, 8, 9)
Week 5: Generalized Linear Models (Chapters 10, 11)
Week 6: Ordered categories & Multilevel models (Chapters 12 & 13)
Week 7: More Multilevel models (Chapters 13 & 14)
Week 8: Multilevel Tactics & Gaussian Processes (Chapter 14)
Week 9: Measurement & Missingness (Chapter 15)
Week 10: Generalized Linear Madness (Chapter 16)

https://github.com/rmcelreath/stat_rethinking_2022

Slide 70

Slide 70 text

No content