Statistical Rethinking 2023 - Lecture 10

Statistical Rethinking 10. Counts & Hidden Confounds 2023

Generalized Linear Models Linear Models: Expected value is additive (“linear”)
combination of parameters Generalized Linear Models: Expected value is some function of an additive combination of parameters Y i ∼ Normal(μ i , σ) μ i = α + β X X i + β Z Z i Y i ∼ Bernoulli(p i ) f(p i ) = α + β X X i + β Z Z i f(p i )

logit(p i ) = α + βx i Generalized Linear
Models: Expected value is some function of an additive combination of parameters Uniform changes in predictor not uniform changes in prediction All predictor variables interact, moderate one another In uences predictions & uncertainty of predictions

Confounded Admissions G D A gender department admission

Confounded Admissions G D A gender department admission uability

G D A u set.seed(12) N <- 2000 # number
of applicants # even gender distribution G <- sample( 1:2 , size=N , replace=TRUE ) # sample ability, high (1) to average (0) u <- rbern(N,0.1) # gender 1 tends to apply to department 1, 2 to 2 # and G=1 with greater ability tend to apply to 2 as well D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1 # matrix of acceptance rates [dept,gender] p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 ) p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 ) p_u <- list( p_u0 , p_u1 ) # simulate acceptance p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] ) A <- rbern( N , p )

set.seed(12) N <- 2000 # number of applicants # even
gender distribution G <- sample( 1:2 , size=N , replace=TRUE ) # sample ability, high (1) to average (0) u <- rbern(N,0.1) # gender 1 tends to apply to department 1, 2 to 2 # and G=1 with greater ability tend to apply to 2 as well D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1 # matrix of acceptance rates [dept,gender] p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 ) p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 ) p_u <- list( p_u0 , p_u1 ) # simulate acceptance p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] ) A <- rbern( N , p ) G D A u

gender distribution G <- sample( 1:2 , size=N , replace=TRUE ) # sample ability, high (1) to average (0) u <- rbern(N,0.1) # gender 1 tends to apply to department 1, 2 to 2 # and G=1 with greater ability tend to apply to 2 as well D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1 # matrix of acceptance rates [dept,gender] p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 ) p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 ) p_u <- list( p_u0 , p_u1 ) # simulate acceptance p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] ) A <- rbern( N , p ) G D A u > table(G,D,u) , , u = 0 D G 1 2 1 908 0 2 220 645 , , u = 1 D G 1 2 1 0 114 2 28 85

gender distribution G <- sample( 1:2 , size=N , replace=TRUE ) # sample ability, high (1) to average (0) u <- rbern(N,0.1) # gender 1 tends to apply to department 1, 2 to 2 # and G=1 with greater ability tend to apply to 2 as well D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1 # matrix of acceptance rates [dept,gender] p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 ) p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 ) p_u <- list( p_u0 , p_u1 ) # simulate acceptance p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] ) A <- rbern( N , p ) G D A u > p_u0 [,1] [,2] [1,] 0.1 0.1 [2,] 0.1 0.3 > p_u1 [,1] [,2] [1,] 0.3 0.5 [2,] 0.3 0.5 Department 2 discriminates against gender 1

dat_sim <- list( A=A , D=D , G=G ) #
total effect gender m1 <- ulam( alist( A ~ bernoulli(p), logit(p) <- a[G], a[G] ~ normal(0,1) ), data=dat_sim , chains=4 , cores=4 ) # direct effects - now confounded! m2 <- ulam( alist( A ~ bernoulli(p), logit(p) <- a[G,D], matrix[G,D]:a ~ normal(0,1) ), data=dat_sim , chains=4 , cores=4 ) > precis(m1,depth=3) mean sd 5.5% 94.5% n_eff Rhat4 a[1] -1.82 0.09 -1.96 -1.67 1483 1 a[2] -0.89 0.07 -1.00 -0.78 1562 1 > precis(m2,depth=3) mean sd 5.5% 94.5% n_eff Rhat4 a[1,1] -2.05 0.11 -2.23 -1.89 2482 1 a[1,2] -0.66 0.18 -0.96 -0.38 2369 1 a[2,1] -1.77 0.19 -2.06 -1.47 2278 1 a[2,2] -0.68 0.08 -0.81 -0.56 2210 1 total e ect shows disadvantage direct e ect confounded

post2 <- extract.samples(m2) dens(inv_logit(post2$a[,1,1]),lwd=3,col=4,xli m=c(0,0.5),xlab="probability admission") dens(inv_logit(post2$a[,2,1]),lwd=3,col=4,lty =2,add=TRUE) dens(inv_logit(post2$a[,1,2]),lwd=3,col=2,add =TRUE)
dens(inv_logit(post2$a[,2,2]),lwd=3,col=2,lty =2,add=TRUE) > precis(m1,depth=3) mean sd 5.5% 94.5% n_eff Rhat4 a[1] -1.82 0.09 -1.96 -1.67 1483 1 a[2] -0.89 0.07 -1.00 -0.78 1562 1 total e ect shows disadvantage direct e ect confounded D2,G2 D2,G1 0.0 0.1 0.2 0.3 0.4 0.5 0 10 20 30 40 probability admission Density D1,G1 D1,G2

You guessed it: Collider bias Stratifying by D opens non-
causal path through u Can estimate total causal e ect of G, but isn’t what we want Cannot estimate direct e ect of D or G Worst place in the world, Collider Bias country

You guessed it: Collider bias More intuitive explanation: High ability
G1s apply to discriminatory department anyway G1s in that department are higher ability on average than G2s High ability compensates for discrimination => masks evidence G D A u

https://arxiv.org/abs/2207.13665

Citation networks Citation networks of members of NAS Women associated
with lower lifetime citation rate G C gender citations Lerman et al 2022 Gendered citation patterns among the scienti c elite

Membership Elections to NAS Women associated with 3–15 times higher
election rate, controlling for citations gender Card et al 2022 Gender Gaps at the Academies NAS member citations G M C

gender NAS member citations G M C q quality (hidden)

Restrict sample to NAS members, examine citations If men less likely to be elected, then must have higher q,C to compensate

Control for citations, examine elections to NAS G is “treatment”. C is a post-treatment variable! If women less likely to be cited (bias), then women more likely to be elected because they have higher q than indicated by C

No causes in; No causes out Hyped papers with vague
estimands, unwise adjustment sets Policy design through collider bias? We can do better Strong assumptions required Qualitative data useful G M C q

Sensitivity analysis What are the implications of what we don’t
know? Assume confound exists, model its consequences for di erent strengths/kinds of in uence How strong must the confound be to change conclusions? G D A u

Sensitivity analysis A i ∼ Bernoulli(p i ) logit(p i
) = α[G i , D i ] + β G[i] u i G D A u

Sensitivity analysis A i ∼ Bernoulli(p i ) logit(p i
) = α[G i , D i ] + β G[i] u i G D A u (D i = 2) ∼ Bernoulli(q i ) logit(q i ) = δ[G i ] + γ G[i] u i G D A u

A i ∼ Bernoulli(p i ) logit(p i ) =
α[G i , D i ] + β G[i] u i (D i = 2) ∼ Bernoulli(q i ) logit(q i ) = δ[G i ] + γ G[i] u i datl <- dat_sim datl$D2 <- ifelse(datl$D==2,1,0) datl$N <- length(datl$D) datl$b <- c(1,1) datl$g <- c(1,0) mGDu <- ulam( alist( # A model A ~ bernoulli(p), logit(p) <- a[G,D] + b[G]*u[i], matrix[G,D]:a ~ normal(0,1), # D model D2 ~ bernoulli(q), logit(q) <- delta[G] + g[G]*u[i], delta[G] ~ normal(0,1), # declare unobserved u vector[N]:u ~ normal(0,1) ), data=datl , chains=4 , cores=4 ) u j ∼ Normal(0,1)

0.0 0.1 0.2 0.3 0.4 0.5 0 10 20 30
40 probability admission Density 0.0 0.1 0.2 0.3 0.4 0.5 0 10 20 30 40 probability admission Density Ignore confound Assume confound D2,G2 D2,G1 D1,G1 D1,G2

Sensitivity analysis What are the implications of what we don’t
know? Somewhere between pure simulation and pure analysis Vary confound strength over range and show how results change –or– vary other e ects and estimate confound strength Confounds persist — don’t pretend G D A u 0.0 0.1 0.2 0.3 0.4 0.5 0 10 20 30 40 probability admission Density

More parameters (2006) than observations (2000)! A i ∼ Bernoulli(p
i ) logit(p i ) = α[G i , D i ] + β G[i] u i (D i = 2) ∼ Bernoulli(q i ) logit(q i ) = δ[G i ] + γ G[i] u i u j ∼ Normal(0,1)

Charlie Chaplin, Modern Times (1936)

Oceanic Technology How is technological complexity related to population size?
To social structure? data(Kline) In uence of population size and contact on total tools Kline & Boyd 2010 Population size predicts technological complexity in Oceania

Oceanic Technology data(Kline) culture population contact total_tools mean_TU 1 Malekula
1100 low 13 3.2 2 Tikopia 1500 low 22 4.7 3 Santa Cruz 3600 low 24 4.0 4 Yap 4791 high 43 5.0 5 Lau Fiji 7400 high 33 5.0 6 Trobriand 8000 high 19 4.0 7 Chuuk 9200 high 40 3.8 8 Manus 13000 low 28 6.6 9 Tonga 17500 high 55 5.4 10 Hawaii 275000 low 71 6.6 Kline & Boyd 2010 Population size predicts technological complexity in Oceania

Tools Innovations Population Loss Contact

P C T L contact population tools location

P C T L contact population tools location Adjustment set
for P: L Also want to stratify by C, to study interaction

Modeling tools Tool count is not binomial: No maximum Poisson
distribution: Very high maximum and very low probability of each success Here: Many many possible technologies, very few realized in any one place

Poisson link is log Poisson distribution takes shape from expected
value Must be positive Exponential scaling can be surprising! Y i ∼ Poisson(λ i ) log(λ i ) = α + βx i λ i = exp(α + βx i ) log(λ i ) exp

Poisson (poison) priors Exponential scaling can be surprising α ∼
Normal(0,10) log(λ i ) = α

Poisson (poison) priors Exponential scaling can be surprising α ∼
Normal(0,10) log(λ i ) = α 0 20 40 60 80 100 0.00 0.01 0.02 0.03 0.04 0.05 mean number of tools Density

0 20 40 60 80 100 0.00 0.01 0.02 0.03
0.04 0.05 mean number of tools Density Poisson (poison) priors Exponential scaling can be surprising α ∼ Normal(0,10) log(λ i ) = α α ∼ Normal(3,0.5)

Poisson priors -2 -1 0 1 2 0 20 40
60 80 100 x value expected count α ∼ Normal(0,1) log(λ i ) = α + βx i β ∼ Normal(0,10) -2 -1 0 1 2 0 20 40 60 80 100 x value expected count α ∼ Normal(3,0.5) β ∼ Normal(0,0.2)

data(Kline) d <- Kline d$P <- scale( log(d$population) ) d$contact_id
<- ifelse( d$contact=="high" , 2 , 1 ) dat <- list( T = d$total_tools , P = d$P , C = d$contact_id ) # intercept only m11.9 <- ulam( alist( T ~ dpois( lambda ), log(lambda) <- a, a ~ dnorm( 3 , 0.5 ) ), data=dat , chains=4 , log_lik=TRUE ) # interaction model m11.10 <- ulam( alist( T ~ dpois( lambda ), log(lambda) <- a[C] + b[C]*P, a[C] ~ dnorm( 3 , 0.5 ), b[C] ~ dnorm( 0 , 0.2 ) ), data=dat , chains=4 , log_lik=TRUE ) compare( m11.9 , m11.10 , func=PSIS ) Y i ∼ Poisson(λ i ) log(λ i ) = α C[i] + β C[i] log(P i ) α j ∼ Normal(3,0.5) β j ∼ Normal(0,0.2)

data(Kline) d <- Kline d$P <- scale( log(d$population) ) d$contact_id
<- ifelse( d$contact=="high" , 2 , 1 ) dat <- list( T = d$total_tools , P = d$P , C = d$contact_id ) # intercept only m11.9 <- ulam( alist( T ~ dpois( lambda ), log(lambda) <- a, a ~ dnorm( 3 , 0.5 ) ), data=dat , chains=4 , log_lik=TRUE ) # interaction model m11.10 <- ulam( alist( T ~ dpois( lambda ), log(lambda) <- a[C] + b[C]*P, a[C] ~ dnorm( 3 , 0.5 ), b[C] ~ dnorm( 0 , 0.2 ) ), data=dat , chains=4 , log_lik=TRUE ) compare( m11.9 , m11.10 , func=PSIS ) Y i ∼ Poisson(λ i ) log(λ i ) = α C[i] + β C[i] log(P i ) α j ∼ Normal(3,0.5) β j ∼ Normal(0,0.2) > compare( m11.9 , m11.10 , func=PSIS ) Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points. Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points. PSIS SE dPSIS dSE pPSIS weight m11.10 85.9 13.50 0.0 NA 7.3 1 m11.9 141.3 33.69 55.4 33.13 8.0 0

> compare( m11.9 , m11.10 , func=PSIS ) Some Pareto
k values are high (>0.5). Set pointwise=TRUE to inspect individual points. Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points. PSIS SE dPSIS dSE pPSIS weight m11.10 85.9 13.50 0.0 NA 7.3 1 m11.9 141.3 33.69 55.4 33.13 8.0 0 pPSIS is the “penalty”, the e ective number of parameters. How can a model with more parameters (m11.10) have fewer e ective parameters? m11.9 changes more when individual societies are dropped

-1 0 1 2 0 20 40 60 log population
(std) total tools Yap Trobriand Tonga Hawaii k <- PSIS( m11.10 , pointwise=TRUE )$k plot( dat$P , dat$T , xlab="log population (std)" , ylab="total tools" , col=ifelse( dat$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) , ylim=c(0,75) , cex=1+normalize(k) ) # set up the horizontal axis values to compute predictions at P_seq <- seq( from=-1.4 , to=3 , len=100 ) # predictions for C=1 (low contact) lambda <- link( m11.10 , data=data.frame( P=P_seq , C=1 ) ) lmu <- apply( lambda , 2 , mean ) lci <- apply( lambda , 2 , PI ) lines( P_seq , lmu , lty=2 , lwd=1.5 ) shade( lci , P_seq , xpd=TRUE , col=col.alpha(4,0.3) ) # predictions for C=2 (high contact) lambda <- link( m11.10 , data=data.frame( P=P_seq , C=2 ) ) lmu <- apply( lambda , 2 , mean ) lci <- apply( lambda , 2 , PI ) lines( P_seq , lmu , lty=1 , lwd=1.5 ) shade( lci , P_seq , xpd=TRUE , col=col.alpha(2,0.3)) Points scaled by leverage

-1 0 1 2 0 20 40 60 log population
(std) total tools Yap Trobriand Tonga Hawaii 0 50000 150000 250000 0 20 40 60 population total tools Tonga Hawaii log scale Natural scale

0 50000 150000 250000 0 20 40 60 population total
tools Tonga Hawaii

0 50000 150000 250000 0 20 40 6 total tools
Tonga

is model is wack Two immediate ways to improve the
model (1) Use a robust model: gamma-Poisson (neg-binomial) (2) A principled scienti c model 0 50000 150000 250000 0 20 40 60 population total tools Tonga Hawaii

ΔT = αPβ − γT

ΔT = αPβ − γT change in tools

ΔT = αPβ − γT change in tools innovation rate

diminishing returns (elasticity)

diminishing returns (elasticity) rate of loss

ΔT = α C Pβ C − γT diminishing returns
depend upon contact innovation depends upon contact

ΔT = α C Pβ C − γT Di erence
equation: Says how tools change, not expected number What is the equilibrium?

ΔT = α C Pβ C − γT 0 10
20 30 40 50 0 2 4 6 8 10 time Tools P = 1e4 P = 1e3 f <- function(a=0.02,b=0.5,g=0.2,P=1e4,t_max=50) { T <- rep(0,t_max) for ( i in 2:t_max ) T[i] <- T[i-1] + a*P^b - g*T[i-1] return(T) } plot( NULL , xlim=c(0,50) , ylim=c(0,10) , xlab="time" , ylab="Tools" ) T <- f(P=1e3) lines( 1:50 , T , lwd=3 , col=2 ) T <- f(P=1e4) lines( 1:50 , T , lwd=3 , col=2 )

ΔT = α C Pβ C − γT = 0
Solve for equilibrium T

ΔT = α C Pβ C − γT = 0
̂ T = α C Pβ C γ Solve for equilibrium T

̂ T = α C Pβ C γ T i
∼ Poisson(λ i ) λ i = ̂ T

# innovation/loss model dat2 <- list( T=d$total_tools, P=d$population, C=d$contact_id )
m11.11 <- ulam( alist( T ~ dpois( lambda ), lambda <- exp(a[C])*P^b[C]/g, a[C] ~ dnorm(1,1), b[C] ~ dexp(1), g ~ dexp(1) ), data=dat2 , chains=4 , cores=4 ) All parameters must be positive Two ways to do this (1) use exp() (2) use appropriate prior ̂ T = α C Pβ C γ

All parameters must be positive Two ways to do this
(1) use exp() (2) use appropriate prior # innovation/loss model dat2 <- list( T=d$total_tools, P=d$population, C=d$contact_id ) m11.11 <- ulam( alist( T ~ dpois( lambda ), lambda <- exp(a[C])*P^b[C]/g, a[C] ~ dnorm(1,1), b[C] ~ dexp(1), g ~ dexp(1) ), data=dat2 , chains=4 , cores=4 ) ̂ T = α C Pβ C γ > precis(m11.11,2) mean sd 5.5% 94.5% n_eff Rhat4 a[1] 0.85 0.68 -0.26 1.90 698 1 a[2] 0.93 0.83 -0.39 2.31 902 1 b[1] 0.26 0.03 0.21 0.32 1149 1 b[2] 0.29 0.10 0.12 0.45 711 1 g 1.11 0.70 0.32 2.43 862 1

tools Tonga Hawaii # innovation/loss model dat2 <- list( T=d$total_tools, P=d$population, C=d$contact_id ) m11.11 <- ulam( alist( T ~ dpois( lambda ), lambda <- exp(a[C])*P^b[C]/g, a[C] ~ dnorm(1,1), b[C] ~ dexp(1), g ~ dexp(1) ), data=dat2 , chains=4 , cores=4 ) ̂ T = α C Pβ C γ

tools Tonga Hawaii 0 50000 150000 250000 0 20 40 60 population total tools Tonga Hawaii Innovation/loss model Generalized linear model Still have to deal with location as confound

Count GLMs Distributions from constraints Maximum entropy priors: Binomial, Poisson,
and extensions More event types: Multinomial and categorical Robust regressions: Beta-binomial, gamma-Poisson

Course Schedule Week 1 Bayesian inference Chapters 1, 2, 3
Week 2 Linear models & Causal Inference Chapter 4 Week 3 Causes, Confounds & Colliders Chapters 5 & 6 Week 4 Over tting / MCMC Chapters 7, 8, 9 Week 5 Generalized Linear Models Chapters 10, 11 Week 6 Mixtures & ordered categories Chapters 11 & 12 Week 7 Multilevel models I Chapter 13 Week 8 Multilevel models II Chapter 14 Week 9 Measurement & Missingness Chapter 15 Week 10 Generalized Linear Madness Chapter 16 https://github.com/rmcelreath/stat_rethinking_2023

Simpson’s Pandora’s Box Simpson’s paradox: Reversal of an association when
groups are combined or separated No way to know which association (combined/separated) is correct without causal assumptions In nite evils unleashed Simpson 1951. e Interpretation of Interaction in Contingency Tables The container HELD every imaginable estimate. Pandora opens a spreadsheet Simpson’s paradox plagues us to this day.

Berkeley Paradox Unconditional on department: Women admitted at lower rate
Conditional on department: Women admitted slightly more Which is correct? No way to know without assumptions dept admit reject applications gender 1 A 512 313 825 male 2 A 89 19 108 female 3 B 353 207 560 male 4 B 17 8 25 female 5 C 120 205 325 male 6 C 202 391 593 female 7 D 138 279 417 male 8 D 131 244 375 female 9 E 53 138 191 male 10 E 94 299 393 female 11 F 22 351 373 male 12 F 24 317 341 female

Berkeley Paradox Which is correct? No way to know without
assumptions Mediator (department) Collider + confound (ability) dept admit reject applications gender 1 A 512 313 825 male 2 A 89 19 108 female 3 B 353 207 560 male 4 B 17 8 25 female 5 C 120 205 325 male 6 C 202 391 593 female 7 D 138 279 417 male 8 D 131 244 375 female 9 E 53 138 191 male 10 E 94 299 393 female 11 F 22 351 373 male 12 F 24 317 341 female

X Z Y e Pipe X Z Y e Fork
X Z Y e Collider X Z Y e Descendant A

Z = 1 Z = 0 X Z Y -3
-2 -1 0 1 2 3 4 -3 -2 -1 0 1 2 3 X Y cols <- c(4,2) N <- 300 Z <- rbern(N) X <- rnorm(N,2*Z-1) Y <- rnorm(N,2*Z-1) plot( X , Y , col=cols[Z+1] , lwd=3 ) abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3) abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3) abline(lm(Y~X),lwd=3)

Z = 1 Z = 0 cols <- c(4,2) N
<- 300 X <- rnorm(N) Z <- rbern(N,inv_logit(X)) Y <- rnorm(N,(2*Z-1)) plot( X , Y , col=cols[Z+1] , lwd=3 ) abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3) abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3) abline(lm(Y~X),lwd=3) X Z Y -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 X Y

Z = 1 Z = 0 cols <- c(4,2) N
<- 300 X <- rnorm(N) Y <- rnorm(N) Z <- rbern(N,inv_logit(2*X+2*Y-2)) plot( X , Y , col=cols[Z+1] , lwd=3 ) abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3) abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3) abline(lm(Y~X),lwd=3) X Z Y -2 -1 0 1 2 3 -2 -1 0 1 2 3 X Y

Non-linear haunting In event models, e ect reversal can arise
other ways Example: Base rate di erences X Y Z treatment outcome covariate

# base rate differences erasing effect of X N <-
1000 X <- rnorm(N) Z <- rbern(N) p <- inv_logit(X + ifelse(Z==1,-1,5)) Y <- rbern(N,p) mX <- quap( alist( Y ~ dbern(p), logit(p) <- a + b*X, a ~ dnorm(0,1), b ~ dnorm(0,1) ) , data=list(Y=Y,X=X) ) mXZ <- quap( alist( Y ~ dbern(p), logit(p) <- a + b[Z]*X, a ~ dnorm(0,1), b[Z] ~ dnorm(0,1) ) , data=list(Y=Y,X=X,Z=Z+1) ) X Y Z

X Y Z # base rate differences erasing effect of
X N <- 1000 X <- rnorm(N) Z <- rbern(N) p <- inv_logit(X + ifelse(Z==1,-1,5)) Y <- rbern(N,p) mX <- quap( alist( Y ~ dbern(p), logit(p) <- a + b*X, a ~ dnorm(0,1), b ~ dnorm(0,1) ) , data=list(Y=Y,X=X) ) mXZ <- quap( alist( Y ~ dbern(p), logit(p) <- a + b[Z]*X, a ~ dnorm(0,1), b[Z] ~ dnorm(0,1) ) , data=list(Y=Y,X=X,Z=Z+1) ) logit(p i ) = α + βX i

# base rate differences erasing effect of X N <-
1000 X <- rnorm(N) Z <- rbern(N) p <- inv_logit(X + ifelse(Z==1,-1,5)) Y <- rbern(N,p) mX <- quap( alist( Y ~ dbern(p), logit(p) <- a + b*X, a ~ dnorm(0,1), b ~ dnorm(0,1) ) , data=list(Y=Y,X=X) ) mXZ <- quap( alist( Y ~ dbern(p), logit(p) <- a + b[Z]*X, a ~ dnorm(0,1), b[Z] ~ dnorm(0,1) ) , data=list(Y=Y,X=X,Z=Z+1) ) X Y Z logit(p i ) = α + βX i logit(p i ) = α + β Z[i] X i

-2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8
1.0 treatment (X) outcome (Y) -2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8 1.0 treatment (X) outcome (Y) logit(p i ) = α + βX i logit(p i ) = α + β Z[i] X i Z = 1 Z = 2 Z collapsed

-2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8
1.0 treatment (X) outcome (Y) logit(p i ) = α + β Z[i] X i Z = 1 Z = 2 -0.5 0.0 0.5 1.0 0 1 2 3 4 beta[Z] Density

mX <- quap( alist( Y ~ dbern(p), logit(p) <- a
+ b*X, a ~ dnorm(0,1), b ~ dnorm(0,1) ) , data=list(Y=Y,X=X) ) mXZ <- quap( alist( Y ~ dbern(p), logit(p) <- a + b[Z]*X, a ~ dnorm(0,1), b[Z] ~ dnorm(0,1) ) , data=list(Y=Y,X=X,Z=Z+1) ) mXZ2 <- quap( alist( Y ~ dbern(p), logit(p) <- a[Z] + b[Z]*X, a[Z] ~ dnorm(0,1), b[Z] ~ dnorm(0,1) ) , data=list(Y=Y,X=X,Z=Z+1) ) logit(p i ) = α + βX i logit(p i ) = α + β Z[i] X i logit(p i ) = α Z[i] + β Z[i] X i

-2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8
1.0 treatment (X) outcome (Y) logit(p i ) = α + β Z[i] X i Z = 1 Z = 2 logit(p i ) = α Z[i] + β Z[i] X i -2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8 1.0 treatment (X) outcome (Y) Z = 1 Z = 2

-2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8
1.0 treatment (X) outcome (Y) logit(p i ) = α + β Z[i] X i Z = 1 Z = 2 logit(p i ) = α Z[i] + β Z[i] X i -2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8 1.0 treatment (X) outcome (Y) Z = 1 Z = 2 -1.0 -0.5 0.0 0.5 1.0 1.5 0.0 1.0 2.0 3.0 beta[Z] Density -0.5 0.0 0.5 1.0 0 1 2 3 4 beta[Z] Density

logit(p i ) = α Z[i] + β Z[i] X
i Z = 1 Z = 2 -1.0 -0.5 0.0 0.5 1.0 1.5 0.0 1.0 2.0 3.0 beta[Z] Density

Simpson’s Pandora’s Box No paradox, because almost anything can produce
it People do not have intuitions about coe cient reversals Stop naming statistical paradoxes; start teaching scienti c logic Simpson 1951. e Interpretation of Interaction in Contingency Tables The container HELD every imaginable estimate. Pandora opens a spreadsheet Simpson’s paradox plagues us to this day.

Statistical Rethinking 2023 - Lecture 10

Statistical Rethinking 2023 - Lecture 10

More Decks by Richard McElreath

Other Decks in Education

Featured

Transcript