Richard McElreath
February 01, 2023
1.4k

# Statistical Rethinking 2023 - Lecture 10

## Richard McElreath

February 01, 2023

## Transcript

2. ### Generalized Linear Models Linear Models: Expected value is additive (“linear”)

combination of parameters Generalized Linear Models: Expected value is some function of an additive combination of parameters Y i ∼ Normal(μ i , σ) μ i = α + β X X i + β Z Z i Y i ∼ Bernoulli(p i ) f(p i ) = α + β X X i + β Z Z i f(p i )
3. ### logit(p i ) = α + βx i Generalized Linear

Models: Expected value is some function of an additive combination of parameters Uniform changes in predictor not uniform changes in prediction All predictor variables interact, moderate one another In uences predictions & uncertainty of predictions

6. ### G D A u set.seed(12) N <- 2000 # number

of applicants # even gender distribution G <- sample( 1:2 , size=N , replace=TRUE ) # sample ability, high (1) to average (0) u <- rbern(N,0.1) # gender 1 tends to apply to department 1, 2 to 2 # and G=1 with greater ability tend to apply to 2 as well D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1 # matrix of acceptance rates [dept,gender] p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 ) p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 ) p_u <- list( p_u0 , p_u1 ) # simulate acceptance p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] ) A <- rbern( N , p )
7. ### set.seed(12) N <- 2000 # number of applicants # even

gender distribution G <- sample( 1:2 , size=N , replace=TRUE ) # sample ability, high (1) to average (0) u <- rbern(N,0.1) # gender 1 tends to apply to department 1, 2 to 2 # and G=1 with greater ability tend to apply to 2 as well D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1 # matrix of acceptance rates [dept,gender] p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 ) p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 ) p_u <- list( p_u0 , p_u1 ) # simulate acceptance p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] ) A <- rbern( N , p ) G D A u
8. ### set.seed(12) N <- 2000 # number of applicants # even

gender distribution G <- sample( 1:2 , size=N , replace=TRUE ) # sample ability, high (1) to average (0) u <- rbern(N,0.1) # gender 1 tends to apply to department 1, 2 to 2 # and G=1 with greater ability tend to apply to 2 as well D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1 # matrix of acceptance rates [dept,gender] p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 ) p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 ) p_u <- list( p_u0 , p_u1 ) # simulate acceptance p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] ) A <- rbern( N , p ) G D A u
9. ### set.seed(12) N <- 2000 # number of applicants # even

gender distribution G <- sample( 1:2 , size=N , replace=TRUE ) # sample ability, high (1) to average (0) u <- rbern(N,0.1) # gender 1 tends to apply to department 1, 2 to 2 # and G=1 with greater ability tend to apply to 2 as well D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1 # matrix of acceptance rates [dept,gender] p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 ) p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 ) p_u <- list( p_u0 , p_u1 ) # simulate acceptance p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] ) A <- rbern( N , p ) G D A u > table(G,D,u) , , u = 0 D G 1 2 1 908 0 2 220 645 , , u = 1 D G 1 2 1 0 114 2 28 85
10. ### set.seed(12) N <- 2000 # number of applicants # even

gender distribution G <- sample( 1:2 , size=N , replace=TRUE ) # sample ability, high (1) to average (0) u <- rbern(N,0.1) # gender 1 tends to apply to department 1, 2 to 2 # and G=1 with greater ability tend to apply to 2 as well D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1 # matrix of acceptance rates [dept,gender] p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 ) p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 ) p_u <- list( p_u0 , p_u1 ) # simulate acceptance p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] ) A <- rbern( N , p ) G D A u > p_u0 [,1] [,2] [1,] 0.1 0.1 [2,] 0.1 0.3 > p_u1 [,1] [,2] [1,] 0.3 0.5 [2,] 0.3 0.5 Department 2 discriminates against gender 1
11. ### dat_sim <- list( A=A , D=D , G=G ) #

total effect gender m1 <- ulam( alist( A ~ bernoulli(p), logit(p) <- a[G], a[G] ~ normal(0,1) ), data=dat_sim , chains=4 , cores=4 ) # direct effects - now confounded! m2 <- ulam( alist( A ~ bernoulli(p), logit(p) <- a[G,D], matrix[G,D]:a ~ normal(0,1) ), data=dat_sim , chains=4 , cores=4 ) > precis(m1,depth=3) mean sd 5.5% 94.5% n_eff Rhat4 a[1] -1.82 0.09 -1.96 -1.67 1483 1 a[2] -0.89 0.07 -1.00 -0.78 1562 1 > precis(m2,depth=3) mean sd 5.5% 94.5% n_eff Rhat4 a[1,1] -2.05 0.11 -2.23 -1.89 2482 1 a[1,2] -0.66 0.18 -0.96 -0.38 2369 1 a[2,1] -1.77 0.19 -2.06 -1.47 2278 1 a[2,2] -0.68 0.08 -0.81 -0.56 2210 1 total e ect shows disadvantage direct e ect confounded

dens(inv_logit(post2\$a[,2,2]),lwd=3,col=2,lty =2,add=TRUE) > precis(m1,depth=3) mean sd 5.5% 94.5% n_eff Rhat4 a[1] -1.82 0.09 -1.96 -1.67 1483 1 a[2] -0.89 0.07 -1.00 -0.78 1562 1 total e ect shows disadvantage direct e ect confounded D2,G2 D2,G1 0.0 0.1 0.2 0.3 0.4 0.5 0 10 20 30 40 probability admission Density D1,G1 D1,G2
13. ### You guessed it: Collider bias Stratifying by D opens non-

causal path through u Can estimate total causal e ect of G, but isn’t what we want Cannot estimate direct e ect of D or G Worst place in the world, Collider Bias country
14. ### You guessed it: Collider bias More intuitive explanation: High ability

G1s apply to discriminatory department anyway G1s in that department are higher ability on average than G2s High ability compensates for discrimination => masks evidence G D A u

16. ### Citation networks Citation networks of members of NAS Women associated

with lower lifetime citation rate G C gender citations Lerman et al 2022 Gendered citation patterns among the scienti c elite
17. ### Membership Elections to NAS Women associated with 3–15 times higher

election rate, controlling for citations gender Card et al 2022 Gender Gaps at the Academies NAS member citations G M C

19. ### gender NAS member citations G M C q quality (hidden)

Restrict sample to NAS members, examine citations If men less likely to be elected, then must have higher q,C to compensate
20. ### gender NAS member citations G M C q quality (hidden)

Control for citations, examine elections to NAS G is “treatment”. C is a post-treatment variable! If women less likely to be cited (bias), then women more likely to be elected because they have higher q than indicated by C
21. ### No causes in; No causes out Hyped papers with vague

estimands, unwise adjustment sets Policy design through collider bias? We can do better Strong assumptions required Qualitative data useful G M C q
22. ### Sensitivity analysis What are the implications of what we don’t

know? Assume confound exists, model its consequences for di erent strengths/kinds of in uence How strong must the confound be to change conclusions? G D A u
23. ### Sensitivity analysis A i ∼ Bernoulli(p i ) logit(p i

) = α[G i , D i ] + β G[i] u i G D A u
24. ### Sensitivity analysis A i ∼ Bernoulli(p i ) logit(p i

) = α[G i , D i ] + β G[i] u i G D A u (D i = 2) ∼ Bernoulli(q i ) logit(q i ) = δ[G i ] + γ G[i] u i G D A u
25. ### A i ∼ Bernoulli(p i ) logit(p i ) =

α[G i , D i ] + β G[i] u i (D i = 2) ∼ Bernoulli(q i ) logit(q i ) = δ[G i ] + γ G[i] u i datl <- dat_sim datl\$D2 <- ifelse(datl\$D==2,1,0) datl\$N <- length(datl\$D) datl\$b <- c(1,1) datl\$g <- c(1,0) mGDu <- ulam( alist( # A model A ~ bernoulli(p), logit(p) <- a[G,D] + b[G]*u[i], matrix[G,D]:a ~ normal(0,1), # D model D2 ~ bernoulli(q), logit(q) <- delta[G] + g[G]*u[i], delta[G] ~ normal(0,1), # declare unobserved u vector[N]:u ~ normal(0,1) ), data=datl , chains=4 , cores=4 ) u j ∼ Normal(0,1)
26. ### 0.0 0.1 0.2 0.3 0.4 0.5 0 10 20 30

40 probability admission Density 0.0 0.1 0.2 0.3 0.4 0.5 0 10 20 30 40 probability admission Density Ignore confound Assume confound D2,G2 D2,G1 D1,G1 D1,G2
27. ### Sensitivity analysis What are the implications of what we don’t

know? Somewhere between pure simulation and pure analysis Vary confound strength over range and show how results change –or– vary other e ects and estimate confound strength Confounds persist — don’t pretend G D A u 0.0 0.1 0.2 0.3 0.4 0.5 0 10 20 30 40 probability admission Density
28. ### More parameters (2006) than observations (2000)! A i ∼ Bernoulli(p

i ) logit(p i ) = α[G i , D i ] + β G[i] u i (D i = 2) ∼ Bernoulli(q i ) logit(q i ) = δ[G i ] + γ G[i] u i u j ∼ Normal(0,1)

31. ### Oceanic Technology How is technological complexity related to population size?

To social structure? data(Kline) In uence of population size and contact on total tools Kline & Boyd 2010 Population size predicts technological complexity in Oceania
32. ### Oceanic Technology data(Kline) culture population contact total_tools mean_TU 1 Malekula

1100 low 13 3.2 2 Tikopia 1500 low 22 4.7 3 Santa Cruz 3600 low 24 4.0 4 Yap 4791 high 43 5.0 5 Lau Fiji 7400 high 33 5.0 6 Trobriand 8000 high 19 4.0 7 Chuuk 9200 high 40 3.8 8 Manus 13000 low 28 6.6 9 Tonga 17500 high 55 5.4 10 Hawaii 275000 low 71 6.6 Kline & Boyd 2010 Population size predicts technological complexity in Oceania

35. ### P C T L contact population tools location Adjustment set

for P: L Also want to stratify by C, to study interaction
36. ### Modeling tools Tool count is not binomial: No maximum Poisson

distribution: Very high maximum and very low probability of each success Here: Many many possible technologies, very few realized in any one place
37. ### Poisson link is log Poisson distribution takes shape from expected

value Must be positive Exponential scaling can be surprising! Y i ∼ Poisson(λ i ) log(λ i ) = α + βx i λ i = exp(α + βx i ) log(λ i ) exp
38. ### Poisson (poison) priors Exponential scaling can be surprising α ∼

Normal(0,10) log(λ i ) = α
39. ### Poisson (poison) priors Exponential scaling can be surprising α ∼

Normal(0,10) log(λ i ) = α 0 20 40 60 80 100 0.00 0.01 0.02 0.03 0.04 0.05 mean number of tools Density
40. ### 0 20 40 60 80 100 0.00 0.01 0.02 0.03

0.04 0.05 mean number of tools Density Poisson (poison) priors Exponential scaling can be surprising α ∼ Normal(0,10) log(λ i ) = α α ∼ Normal(3,0.5)
41. ### Poisson priors -2 -1 0 1 2 0 20 40

60 80 100 x value expected count α ∼ Normal(0,1) log(λ i ) = α + βx i β ∼ Normal(0,10) -2 -1 0 1 2 0 20 40 60 80 100 x value expected count α ∼ Normal(3,0.5) β ∼ Normal(0,0.2)
42. ### data(Kline) d <- Kline d\$P <- scale( log(d\$population) ) d\$contact_id

<- ifelse( d\$contact=="high" , 2 , 1 ) dat <- list( T = d\$total_tools , P = d\$P , C = d\$contact_id ) # intercept only m11.9 <- ulam( alist( T ~ dpois( lambda ), log(lambda) <- a, a ~ dnorm( 3 , 0.5 ) ), data=dat , chains=4 , log_lik=TRUE ) # interaction model m11.10 <- ulam( alist( T ~ dpois( lambda ), log(lambda) <- a[C] + b[C]*P, a[C] ~ dnorm( 3 , 0.5 ), b[C] ~ dnorm( 0 , 0.2 ) ), data=dat , chains=4 , log_lik=TRUE ) compare( m11.9 , m11.10 , func=PSIS ) Y i ∼ Poisson(λ i ) log(λ i ) = α C[i] + β C[i] log(P i ) α j ∼ Normal(3,0.5) β j ∼ Normal(0,0.2)
43. ### data(Kline) d <- Kline d\$P <- scale( log(d\$population) ) d\$contact_id

<- ifelse( d\$contact=="high" , 2 , 1 ) dat <- list( T = d\$total_tools , P = d\$P , C = d\$contact_id ) # intercept only m11.9 <- ulam( alist( T ~ dpois( lambda ), log(lambda) <- a, a ~ dnorm( 3 , 0.5 ) ), data=dat , chains=4 , log_lik=TRUE ) # interaction model m11.10 <- ulam( alist( T ~ dpois( lambda ), log(lambda) <- a[C] + b[C]*P, a[C] ~ dnorm( 3 , 0.5 ), b[C] ~ dnorm( 0 , 0.2 ) ), data=dat , chains=4 , log_lik=TRUE ) compare( m11.9 , m11.10 , func=PSIS ) Y i ∼ Poisson(λ i ) log(λ i ) = α C[i] + β C[i] log(P i ) α j ∼ Normal(3,0.5) β j ∼ Normal(0,0.2) > compare( m11.9 , m11.10 , func=PSIS ) Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points. Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points. PSIS SE dPSIS dSE pPSIS weight m11.10 85.9 13.50 0.0 NA 7.3 1 m11.9 141.3 33.69 55.4 33.13 8.0 0
44. ### > compare( m11.9 , m11.10 , func=PSIS ) Some Pareto

k values are high (>0.5). Set pointwise=TRUE to inspect individual points. Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points. PSIS SE dPSIS dSE pPSIS weight m11.10 85.9 13.50 0.0 NA 7.3 1 m11.9 141.3 33.69 55.4 33.13 8.0 0 pPSIS is the “penalty”, the e ective number of parameters. How can a model with more parameters (m11.10) have fewer e ective parameters? m11.9 changes more when individual societies are dropped
45. ### -1 0 1 2 0 20 40 60 log population

(std) total tools Yap Trobriand Tonga Hawaii k <- PSIS( m11.10 , pointwise=TRUE )\$k plot( dat\$P , dat\$T , xlab="log population (std)" , ylab="total tools" , col=ifelse( dat\$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) , ylim=c(0,75) , cex=1+normalize(k) ) # set up the horizontal axis values to compute predictions at P_seq <- seq( from=-1.4 , to=3 , len=100 ) # predictions for C=1 (low contact) lambda <- link( m11.10 , data=data.frame( P=P_seq , C=1 ) ) lmu <- apply( lambda , 2 , mean ) lci <- apply( lambda , 2 , PI ) lines( P_seq , lmu , lty=2 , lwd=1.5 ) shade( lci , P_seq , xpd=TRUE , col=col.alpha(4,0.3) ) # predictions for C=2 (high contact) lambda <- link( m11.10 , data=data.frame( P=P_seq , C=2 ) ) lmu <- apply( lambda , 2 , mean ) lci <- apply( lambda , 2 , PI ) lines( P_seq , lmu , lty=1 , lwd=1.5 ) shade( lci , P_seq , xpd=TRUE , col=col.alpha(2,0.3)) Points scaled by leverage
46. ### -1 0 1 2 0 20 40 60 log population

(std) total tools Yap Trobriand Tonga Hawaii 0 50000 150000 250000 0 20 40 60 population total tools Tonga Hawaii log scale Natural scale
47. ### 0 50000 150000 250000 0 20 40 60 population total

tools Tonga Hawaii

Tonga
49. ### is model is wack Two immediate ways to improve the

model (1) Use a robust model: gamma-Poisson (neg-binomial) (2) A principled scienti c model 0 50000 150000 250000 0 20 40 60 population total tools Tonga Hawaii

53. ### ΔT = αPβ − γT change in tools innovation rate

diminishing returns (elasticity)
54. ### ΔT = αPβ − γT change in tools innovation rate

diminishing returns (elasticity) rate of loss
55. ### ΔT = α C Pβ C − γT diminishing returns

depend upon contact innovation depends upon contact
56. ### ΔT = α C Pβ C − γT Di erence

equation: Says how tools change, not expected number What is the equilibrium?
57. ### ΔT = α C Pβ C − γT 0 10

20 30 40 50 0 2 4 6 8 10 time Tools P = 1e4 P = 1e3 f <- function(a=0.02,b=0.5,g=0.2,P=1e4,t_max=50) { T <- rep(0,t_max) for ( i in 2:t_max ) T[i] <- T[i-1] + a*P^b - g*T[i-1] return(T) } plot( NULL , xlim=c(0,50) , ylim=c(0,10) , xlab="time" , ylab="Tools" ) T <- f(P=1e3) lines( 1:50 , T , lwd=3 , col=2 ) T <- f(P=1e4) lines( 1:50 , T , lwd=3 , col=2 )
58. ### ΔT = α C Pβ C − γT = 0

Solve for equilibrium T
59. ### ΔT = α C Pβ C − γT = 0

̂ T = α C Pβ C γ Solve for equilibrium T
60. ### ̂ T = α C Pβ C γ T i

∼ Poisson(λ i ) λ i = ̂ T
61. ### # innovation/loss model dat2 <- list( T=d\$total_tools, P=d\$population, C=d\$contact_id )

m11.11 <- ulam( alist( T ~ dpois( lambda ), lambda <- exp(a[C])*P^b[C]/g, a[C] ~ dnorm(1,1), b[C] ~ dexp(1), g ~ dexp(1) ), data=dat2 , chains=4 , cores=4 ) All parameters must be positive Two ways to do this (1) use exp() (2) use appropriate prior ̂ T = α C Pβ C γ
62. ### All parameters must be positive Two ways to do this

(1) use exp() (2) use appropriate prior # innovation/loss model dat2 <- list( T=d\$total_tools, P=d\$population, C=d\$contact_id ) m11.11 <- ulam( alist( T ~ dpois( lambda ), lambda <- exp(a[C])*P^b[C]/g, a[C] ~ dnorm(1,1), b[C] ~ dexp(1), g ~ dexp(1) ), data=dat2 , chains=4 , cores=4 ) ̂ T = α C Pβ C γ > precis(m11.11,2) mean sd 5.5% 94.5% n_eff Rhat4 a[1] 0.85 0.68 -0.26 1.90 698 1 a[2] 0.93 0.83 -0.39 2.31 902 1 b[1] 0.26 0.03 0.21 0.32 1149 1 b[2] 0.29 0.10 0.12 0.45 711 1 g 1.11 0.70 0.32 2.43 862 1
63. ### 0 50000 150000 250000 0 20 40 60 population total

tools Tonga Hawaii # innovation/loss model dat2 <- list( T=d\$total_tools, P=d\$population, C=d\$contact_id ) m11.11 <- ulam( alist( T ~ dpois( lambda ), lambda <- exp(a[C])*P^b[C]/g, a[C] ~ dnorm(1,1), b[C] ~ dexp(1), g ~ dexp(1) ), data=dat2 , chains=4 , cores=4 ) ̂ T = α C Pβ C γ
64. ### 0 50000 150000 250000 0 20 40 60 population total

tools Tonga Hawaii 0 50000 150000 250000 0 20 40 60 population total tools Tonga Hawaii Innovation/loss model Generalized linear model Still have to deal with location as confound
65. ### Count GLMs Distributions from constraints Maximum entropy priors: Binomial, Poisson,

and extensions More event types: Multinomial and categorical Robust regressions: Beta-binomial, gamma-Poisson
66. ### Course Schedule Week 1 Bayesian inference Chapters 1, 2, 3

Week 2 Linear models & Causal Inference Chapter 4 Week 3 Causes, Confounds & Colliders Chapters 5 & 6 Week 4 Over tting / MCMC Chapters 7, 8, 9 Week 5 Generalized Linear Models Chapters 10, 11 Week 6 Mixtures & ordered categories Chapters 11 & 12 Week 7 Multilevel models I Chapter 13 Week 8 Multilevel models II Chapter 14 Week 9 Measurement & Missingness Chapter 15 Week 10 Generalized Linear Madness Chapter 16 https://github.com/rmcelreath/stat_rethinking_2023

68. ### Simpson’s Pandora’s Box Simpson’s paradox: Reversal of an association when

groups are combined or separated No way to know which association (combined/separated) is correct without causal assumptions In nite evils unleashed Simpson 1951. e Interpretation of Interaction in Contingency Tables The container HELD every imaginable estimate. Pandora opens a spreadsheet Simpson’s paradox plagues us to this day.
69. ### Berkeley Paradox Unconditional on department: Women admitted at lower rate

Conditional on department: Women admitted slightly more Which is correct? No way to know without assumptions dept admit reject applications gender 1 A 512 313 825 male 2 A 89 19 108 female 3 B 353 207 560 male 4 B 17 8 25 female 5 C 120 205 325 male 6 C 202 391 593 female 7 D 138 279 417 male 8 D 131 244 375 female 9 E 53 138 191 male 10 E 94 299 393 female 11 F 22 351 373 male 12 F 24 317 341 female
70. ### Berkeley Paradox Which is correct? No way to know without

assumptions Mediator (department) Collider + confound (ability) dept admit reject applications gender 1 A 512 313 825 male 2 A 89 19 108 female 3 B 353 207 560 male 4 B 17 8 25 female 5 C 120 205 325 male 6 C 202 391 593 female 7 D 138 279 417 male 8 D 131 244 375 female 9 E 53 138 191 male 10 E 94 299 393 female 11 F 22 351 373 male 12 F 24 317 341 female
71. ### X Z Y e Pipe X Z Y e Fork

X Z Y e Collider X Z Y e Descendant A
72. ### Z = 1 Z = 0 X Z Y -3

-2 -1 0 1 2 3 4 -3 -2 -1 0 1 2 3 X Y cols <- c(4,2) N <- 300 Z <- rbern(N) X <- rnorm(N,2*Z-1) Y <- rnorm(N,2*Z-1) plot( X , Y , col=cols[Z+1] , lwd=3 ) abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3) abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3) abline(lm(Y~X),lwd=3)
73. ### Z = 1 Z = 0 cols <- c(4,2) N

<- 300 X <- rnorm(N) Z <- rbern(N,inv_logit(X)) Y <- rnorm(N,(2*Z-1)) plot( X , Y , col=cols[Z+1] , lwd=3 ) abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3) abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3) abline(lm(Y~X),lwd=3) X Z Y -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 X Y
74. ### Z = 1 Z = 0 cols <- c(4,2) N

<- 300 X <- rnorm(N) Y <- rnorm(N) Z <- rbern(N,inv_logit(2*X+2*Y-2)) plot( X , Y , col=cols[Z+1] , lwd=3 ) abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3) abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3) abline(lm(Y~X),lwd=3) X Z Y -2 -1 0 1 2 3 -2 -1 0 1 2 3 X Y
75. ### Non-linear haunting In event models, e ect reversal can arise

other ways Example: Base rate di erences X Y Z treatment outcome covariate
76. ### # base rate differences erasing effect of X N <-

1000 X <- rnorm(N) Z <- rbern(N) p <- inv_logit(X + ifelse(Z==1,-1,5)) Y <- rbern(N,p) mX <- quap( alist( Y ~ dbern(p), logit(p) <- a + b*X, a ~ dnorm(0,1), b ~ dnorm(0,1) ) , data=list(Y=Y,X=X) ) mXZ <- quap( alist( Y ~ dbern(p), logit(p) <- a + b[Z]*X, a ~ dnorm(0,1), b[Z] ~ dnorm(0,1) ) , data=list(Y=Y,X=X,Z=Z+1) ) X Y Z
77. ### X Y Z # base rate differences erasing effect of

X N <- 1000 X <- rnorm(N) Z <- rbern(N) p <- inv_logit(X + ifelse(Z==1,-1,5)) Y <- rbern(N,p) mX <- quap( alist( Y ~ dbern(p), logit(p) <- a + b*X, a ~ dnorm(0,1), b ~ dnorm(0,1) ) , data=list(Y=Y,X=X) ) mXZ <- quap( alist( Y ~ dbern(p), logit(p) <- a + b[Z]*X, a ~ dnorm(0,1), b[Z] ~ dnorm(0,1) ) , data=list(Y=Y,X=X,Z=Z+1) ) logit(p i ) = α + βX i
78. ### # base rate differences erasing effect of X N <-

1000 X <- rnorm(N) Z <- rbern(N) p <- inv_logit(X + ifelse(Z==1,-1,5)) Y <- rbern(N,p) mX <- quap( alist( Y ~ dbern(p), logit(p) <- a + b*X, a ~ dnorm(0,1), b ~ dnorm(0,1) ) , data=list(Y=Y,X=X) ) mXZ <- quap( alist( Y ~ dbern(p), logit(p) <- a + b[Z]*X, a ~ dnorm(0,1), b[Z] ~ dnorm(0,1) ) , data=list(Y=Y,X=X,Z=Z+1) ) X Y Z logit(p i ) = α + βX i logit(p i ) = α + β Z[i] X i
79. ### -2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8

1.0 treatment (X) outcome (Y) -2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8 1.0 treatment (X) outcome (Y) logit(p i ) = α + βX i logit(p i ) = α + β Z[i] X i Z = 1 Z = 2 Z collapsed
80. ### -2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8

1.0 treatment (X) outcome (Y) logit(p i ) = α + β Z[i] X i Z = 1 Z = 2 -0.5 0.0 0.5 1.0 0 1 2 3 4 beta[Z] Density
81. ### mX <- quap( alist( Y ~ dbern(p), logit(p) <- a

+ b*X, a ~ dnorm(0,1), b ~ dnorm(0,1) ) , data=list(Y=Y,X=X) ) mXZ <- quap( alist( Y ~ dbern(p), logit(p) <- a + b[Z]*X, a ~ dnorm(0,1), b[Z] ~ dnorm(0,1) ) , data=list(Y=Y,X=X,Z=Z+1) ) mXZ2 <- quap( alist( Y ~ dbern(p), logit(p) <- a[Z] + b[Z]*X, a[Z] ~ dnorm(0,1), b[Z] ~ dnorm(0,1) ) , data=list(Y=Y,X=X,Z=Z+1) ) logit(p i ) = α + βX i logit(p i ) = α + β Z[i] X i logit(p i ) = α Z[i] + β Z[i] X i
82. ### -2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8

1.0 treatment (X) outcome (Y) logit(p i ) = α + β Z[i] X i Z = 1 Z = 2 logit(p i ) = α Z[i] + β Z[i] X i -2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8 1.0 treatment (X) outcome (Y) Z = 1 Z = 2
83. ### -2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8

1.0 treatment (X) outcome (Y) logit(p i ) = α + β Z[i] X i Z = 1 Z = 2 logit(p i ) = α Z[i] + β Z[i] X i -2 -1 0 1 2 0.0 0.2 0.4 0.6 0.8 1.0 treatment (X) outcome (Y) Z = 1 Z = 2 -1.0 -0.5 0.0 0.5 1.0 1.5 0.0 1.0 2.0 3.0 beta[Z] Density -0.5 0.0 0.5 1.0 0 1 2 3 4 beta[Z] Density
84. ### logit(p i ) = α Z[i] + β Z[i] X

i Z = 1 Z = 2 -1.0 -0.5 0.0 0.5 1.0 1.5 0.0 1.0 2.0 3.0 beta[Z] Density
85. ### Simpson’s Pandora’s Box No paradox, because almost anything can produce

it People do not have intuitions about coe cient reversals Stop naming statistical paradoxes; start teaching scienti c logic Simpson 1951. e Interpretation of Interaction in Contingency Tables The container HELD every imaginable estimate. Pandora opens a spreadsheet Simpson’s paradox plagues us to this day.