
# Statistical Rethinking 2023 - Lecture 10

## Richard McElreath

February 01, 2023

## Transcript

1. Statistical Rethinking
Lecture 10: Counts & Hidden Confounds
2023

2. Generalized Linear Models
Linear Models: Expected value is an additive combination of parameters
Generalized Linear Models: Expected value is some function of an additive combination of parameters

Y_i ∼ Normal(μ_i, σ)
μ_i = α + β_X X_i + β_Z Z_i

Y_i ∼ Bernoulli(p_i)
f(p_i) = α + β_X X_i + β_Z Z_i

3. logit(p_i) = α + β x_i
Generalized Linear Models: Expected value is some function of an additive combination of parameters
Uniform changes in predictor do not produce uniform changes in prediction
All predictor variables interact, moderate one another
Influences predictions & uncertainty of predictions
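A quick numeric sketch of that point (mine, not from the slides): equal steps on the logit scale map to unequal steps on the probability scale.

# equal steps on the linear (logit) scale...
library(rethinking)
x <- seq( -4 , 4 , by=2 )
# ...give unequal steps on the probability scale
round( inv_logit(x) , 2 )  # 0.02 0.12 0.50 0.88 0.98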

4. [DAG: G (gender) → A; G → D (department) → A]

5. [DAG: G (gender) → A; G → D (department) → A; u (ability) → D, u → A]

6. [DAG: G, D, A, u]
set.seed(12)
N <- 2000 # number of applicants
# even gender distribution
G <- sample( 1:2 , size=N , replace=TRUE )
# sample ability, high (1) to average (0)
u <- rbern(N,0.1)
# gender 1 tends to apply to department 1, 2 to 2
# and G=1 with greater ability tend to apply to 2 as well
D <- rbern( N , ifelse( G==1 , u*1 , 0.75 ) ) + 1
# matrix of acceptance rates [dept,gender]
p_u0 <- matrix( c(0.1,0.1,0.1,0.3) , nrow=2 )
p_u1 <- matrix( c(0.3,0.3,0.5,0.5) , nrow=2 )
p_u <- list( p_u0 , p_u1 )
# simulate acceptance
p <- sapply( 1:N , function(i) p_u[[1+u[i]]][D[i],G[i]] )
A <- rbern( N , p )

7. (Same simulation code as slide 6; DAG: G, D, A, u)

8. (Same simulation code as slide 6; DAG: G, D, A, u)

9. (Same simulation code as slide 6; DAG: G, D, A, u)
> table(G,D,u)
, , u = 0
   D
G     1   2
  1 908   0
  2 220 645
, , u = 1
   D
G     1   2
  1   0 114
  2  28  85

10. (Same simulation code as slide 6; DAG: G, D, A, u)
> p_u0
     [,1] [,2]
[1,]  0.1  0.1
[2,]  0.1  0.3
> p_u1
     [,1] [,2]
[1,]  0.3  0.5
[2,]  0.3  0.5
Department 2 discriminates against gender 1

11. dat_sim <- list( A=A , D=D , G=G )
# total effect gender
m1 <- ulam(
alist(
A ~ bernoulli(p),
logit(p) <- a[G],
a[G] ~ normal(0,1)
), data=dat_sim , chains=4 , cores=4 )
# direct effects - now confounded!
m2 <- ulam(
alist(
A ~ bernoulli(p),
logit(p) <- a[G,D],
matrix[G,D]:a ~ normal(0,1)
), data=dat_sim , chains=4 , cores=4 )
> precis(m1,depth=3)
     mean   sd  5.5% 94.5% n_eff Rhat4
a[1] -1.82 0.09 -1.96 -1.67  1483     1
a[2] -0.89 0.07 -1.00 -0.78  1562     1
> precis(m2,depth=3)
       mean   sd  5.5% 94.5% n_eff Rhat4
a[1,1] -2.05 0.11 -2.23 -1.89  2482     1
a[1,2] -0.66 0.18 -0.96 -0.38  2369     1
a[2,1] -1.77 0.19 -2.06 -1.47  2278     1
a[2,2] -0.68 0.08 -0.81 -0.56  2210     1
direct effect confounded

12. post2 <- extract.samples(m2)
dens( inv_logit(post2$a[,1,1]) , lwd=3 , col=4 , xlim=c(0,0.5) )
dens( inv_logit(post2$a[,2,1]) , lwd=3 , col=4 , lty=2 , add=TRUE )
dens( inv_logit(post2$a[,1,2]) , lwd=3 , col=2 , add=TRUE )
dens( inv_logit(post2$a[,2,2]) , lwd=3 , col=2 , lty=2 , add=TRUE )
> precis(m1,depth=3)
     mean   sd  5.5% 94.5% n_eff Rhat4
a[1] -1.82 0.09 -1.96 -1.67  1483     1
a[2] -0.89 0.07 -1.00 -0.78  1562     1
direct effect confounded
[Figure: posterior densities of acceptance probability for D1,G1; D1,G2; D2,G1; D2,G2]

13. You guessed it: Collider bias
Stratifying by D opens non-causal path through u
Can estimate total causal effect of G, but isn't what we want
Cannot estimate direct effect of D or G
Worst place in the world, Collider Bias country

14. You guessed it: Collider bias
More intuitive explanation:
High ability G1s apply to discriminatory department anyway
G1s in that department are higher ability on average than G2s
High ability compensates for discrimination
[DAG: G, D, A, u]
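A small check of that intuition, using the simulation from slide 6 (my sketch, not lecture code):

# among applicants to the discriminatory department (D=2),
# gender 1 is all high ability, gender 2 is mostly average
mean( u[ G==1 & D==2 ] )  # 1.0: only high-ability G1s apply to D2
mean( u[ G==2 & D==2 ] )  # ~0.1: G2s apply to D2 regardless of ability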

15. https://arxiv.org/abs/2207.13665

16. Citation networks
Citation networks of members of NAS
Women associated with lower citation rates
[DAG: G (gender) → C (citations)]
Lerman et al 2022 Gendered citation patterns among the scientific elite

17. Membership
Elections to NAS
Women associated with 3–15 times higher election rate, controlling for citations
Card et al 2022 Gender Gaps at the Academies
[DAG: G (gender) → M (NAS member); C (citations) → M; G → C]

18. [DAG: G (gender), C (citations), M (NAS member), q (quality, hidden) → C, q → M]

19. [DAG: G, C, M, q (quality, hidden)]
Restrict sample to NAS members, examine citations
If men less likely to be elected, then must have higher q,C to compensate

20. [DAG: G, C, M, q (quality, hidden)]
Control for citations, examine elections to NAS
G is "treatment". C is a post-treatment variable!
If women less likely to be cited (bias), then women more likely to be elected because they have higher q than indicated by C

21. No causes in; No causes out
Hyped papers with vague estimands, policy design through collider bias? We can do better
Strong assumptions required
Qualitative data useful
[DAG: G, C, M, q]

22. Sensitivity analysis
What are the implications of what we don't know?
Assume confound exists, model its consequences for different strengths/kinds of influence
How strong must the confound be to change conclusions?
[DAG: G, D, A, u]

23. Sensitivity analysis
A_i ∼ Bernoulli(p_i)
logit(p_i) = α[G_i, D_i] + β_{G[i]} u_i
[DAG: G, D, A, u]

24. Sensitivity analysis
A_i ∼ Bernoulli(p_i)
logit(p_i) = α[G_i, D_i] + β_{G[i]} u_i
(D_i = 2) ∼ Bernoulli(q_i)
logit(q_i) = δ[G_i] + γ_{G[i]} u_i
[DAG: G, D, A, u]

25. A_i ∼ Bernoulli(p_i)
logit(p_i) = α[G_i, D_i] + β_{G[i]} u_i
(D_i = 2) ∼ Bernoulli(q_i)
logit(q_i) = δ[G_i] + γ_{G[i]} u_i
u_j ∼ Normal(0,1)

datl <- dat_sim
datl$D2 <- ifelse(datl$D==2,1,0)
datl$N <- length(datl$D)
# assumed confound strengths, fixed as data
datl$b <- c(1,1)
datl$g <- c(1,0)
mGDu <- ulam(
    alist(
        # A model
        A ~ bernoulli(p),
        logit(p) <- a[G,D] + b[G]*u[i],
        matrix[G,D]:a ~ normal(0,1),
        # D model
        D2 ~ bernoulli(q),
        logit(q) <- delta[G] + g[G]*u[i],
        delta[G] ~ normal(0,1),
        # declare unobserved u
        vector[N]:u ~ normal(0,1)
    ), data=datl , chains=4 , cores=4 )

26. [Figure: posterior densities of acceptance probability for D1,G1; D1,G2; D2,G1; D2,G2, ignoring the confound vs assuming the confound]

27. Sensitivity analysis
What are the implications of what we don't know?
Somewhere between pure simulation and pure analysis
Vary confound strength over range and show how results change –or– vary other effects and estimate confound strength
Confounds persist — don't pretend
[DAG: G, D, A, u]
[Figure: posterior densities under the assumed confound]
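One way to implement "vary confound strength over range", sketched under assumptions (my code, not the lecture's): re-fit the slide 25 model at several fixed values of b, the assumed effect of u on acceptance, and track one estimate of interest. The grid values are illustrative.

b_grid <- c(0.5, 1, 2)  # illustrative assumed strengths
G1D2_mean <- sapply( b_grid , function(bv) {
    datl$b <- c(bv,bv)  # same assumed strength for both genders
    m <- ulam(
        alist(
            A ~ bernoulli(p),
            logit(p) <- a[G,D] + b[G]*u[i],
            matrix[G,D]:a ~ normal(0,1),
            D2 ~ bernoulli(q),
            logit(q) <- delta[G] + g[G]*u[i],
            delta[G] ~ normal(0,1),
            vector[N]:u ~ normal(0,1)
        ), data=datl , chains=4 , cores=4 )
    post <- extract.samples(m)
    mean( inv_logit(post$a[,1,2]) )  # G1 acceptance in D2
} )
# how does the estimate move as the assumed confound strengthens?
plot( b_grid , G1D2_mean , type="b" ,
    xlab="assumed confound strength b" , ylab="mean P(accept) for G1 in D2" )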

28. More parameters (2006) than observations (2000)!
(2000 u values + 4 α + 2 δ = 2006; b and γ are fixed as data)
A_i ∼ Bernoulli(p_i)
logit(p_i) = α[G_i, D_i] + β_{G[i]} u_i
(D_i = 2) ∼ Bernoulli(q_i)
logit(q_i) = δ[G_i] + γ_{G[i]} u_i
u_j ∼ Normal(0,1)

29. Charlie Chaplin, Modern Times (1936)

30. PAUSE

31. Oceanic Technology
How is technological complexity related to population size? To social structure?
data(Kline)
Influence of population size and contact on total tools
Kline & Boyd 2010 Population size predicts technological complexity in Oceania

32. Oceanic Technology
data(Kline)
      culture population contact total_tools mean_TU
1    Malekula       1100     low          13     3.2
2     Tikopia       1500     low          22     4.7
3  Santa Cruz       3600     low          24     4.0
4         Yap       4791    high          43     5.0
5    Lau Fiji       7400    high          33     5.0
6   Trobriand       8000    high          19     4.0
7       Chuuk       9200    high          40     3.8
8       Manus      13000     low          28     6.6
9       Tonga      17500    high          55     5.4
10     Hawaii     275000     low          71     6.6
Kline & Boyd 2010 Population size predicts technological complexity in Oceania

33. [Diagram: Population and Contact drive Innovations, which add Tools; Loss removes Tools]

34. [DAG: P (population), C (contact), T (tools), L (location)]

35. [DAG: P (population), C (contact), T (tools), L (location)]
Also want to stratify by C, to study interaction

36. Modeling tools
Tool count is not binomial: No maximum
Poisson distribution: Very high maximum and very low probability of each success
Here: Many many possible technologies, very few realized in any one place
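A quick simulation of that limit (my illustration, not from the slides): a binomial with many trials and a tiny success probability is indistinguishable from a Poisson with the same mean.

n <- 1e5                     # many possible technologies
p <- 30/n                    # each very unlikely to be realized
y <- rbinom( 1e4 , n , p )   # binomial counts
c( mean(y) , var(y) )        # both about 30, the Poisson signature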

37. Poisson distribution takes shape from expected value
Must be positive
Exponential scaling can be surprising!
Y_i ∼ Poisson(λ_i)
log(λ_i) = α + β x_i
λ_i = exp(α + β x_i)

38. Poisson (poison) priors
Exponential scaling can be surprising
α ∼ Normal(0,10)
log(λ_i) = α

39. Poisson (poison) priors
Exponential scaling can be surprising
α ∼ Normal(0,10)
log(λ_i) = α
[Figure: implied prior density of mean number of tools]

40. Poisson (poison) priors
Exponential scaling can be surprising
α ∼ Normal(0,10)
α ∼ Normal(3,0.5)
log(λ_i) = α
[Figure: implied prior densities of mean number of tools under both priors]
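To see the surprise directly, a prior predictive sketch (mine, under the two priors above):

library(rethinking)
a_flat <- rnorm( 1e4 , 0 , 10 )   # "flat" prior on the log scale
a_reg  <- rnorm( 1e4 , 3 , 0.5 )  # regularizing prior
dens( exp(a_reg) , xlim=c(0,100) , xlab="mean number of tools" )
dens( exp(a_flat) , add=TRUE , col=2 )
# the flat prior implies a lognormal with mean exp(0 + 10^2/2): absurdly large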

41. Poisson priors
log(λ_i) = α + β x_i
α ∼ Normal(0,1), β ∼ Normal(0,10)
α ∼ Normal(3,0.5), β ∼ Normal(0,0.2)
[Figure: prior predictive curves of expected count vs x value under both sets of priors]
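A sketch of the curves behind those panels (assumed reconstruction, not the lecture's exact code):

N <- 50
a <- rnorm( N , 3 , 0.5 )
b <- rnorm( N , 0 , 0.2 )
plot( NULL , xlim=c(-2,2) , ylim=c(0,100) ,
    xlab="x value" , ylab="expected count" )
# each curve is one draw from the joint prior
for ( i in 1:N ) curve( exp( a[i] + b[i]*x ) , add=TRUE , col=grau() )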

42. data(Kline)
d <- Kline
d$P <- scale( log(d$population) )
d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )
dat <- list(
    T = d$total_tools ,
    P = d$P ,
    C = d$contact_id )
# intercept only
m11.9 <- ulam(
alist(
T ~ dpois( lambda ),
log(lambda) <- a,
a ~ dnorm( 3 , 0.5 )
), data=dat , chains=4 , log_lik=TRUE )
# interaction model
m11.10 <- ulam(
alist(
T ~ dpois( lambda ),
log(lambda) <- a[C] + b[C]*P,
a[C] ~ dnorm( 3 , 0.5 ),
b[C] ~ dnorm( 0 , 0.2 )
), data=dat , chains=4 , log_lik=TRUE )
compare( m11.9 , m11.10 , func=PSIS )
Y_i ∼ Poisson(λ_i)
log(λ_i) = α_{C[i]} + β_{C[i]} log(P_i)
α_j ∼ Normal(3,0.5)
β_j ∼ Normal(0,0.2)

43. (Same code and model as slide 42)
> compare( m11.9 , m11.10 , func=PSIS )
Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
        PSIS    SE dPSIS   dSE pPSIS weight
m11.10  85.9 13.50   0.0    NA   7.3      1
m11.9  141.3 33.69  55.4 33.13   8.0      0

44. > compare( m11.9 , m11.10 , func=PSIS )
Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
        PSIS    SE dPSIS   dSE pPSIS weight
m11.10  85.9 13.50   0.0    NA   7.3      1
m11.9  141.3 33.69  55.4 33.13   8.0      0
pPSIS is the "penalty", the effective number of parameters.
How can a model with more parameters (m11.10) have fewer effective parameters?
m11.9 changes more when individual societies are dropped
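One way to see that claim in the samples (my sketch): compare the pointwise Pareto k values of the two models; the intercept-only model has the more extreme values.

k9  <- PSIS( m11.9  , pointwise=TRUE )$k
k10 <- PSIS( m11.10 , pointwise=TRUE )$k
plot( k9 , k10 , xlab="Pareto k (m11.9)" , ylab="Pareto k (m11.10)" ,
    lwd=3 , col=4 )
abline( a=0 , b=1 , lty=2 )  # points below this line have larger k in m11.9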

45. [Figure: total tools vs log population (std); Yap, Trobriand, Tonga, Hawaii labeled; points scaled by leverage]
k <- PSIS( m11.10 , pointwise=TRUE )$k
plot( dat$P , dat$T , xlab="log population (std)" ,
    ylab="total tools" ,
    col=ifelse( dat$C==1 , 4 , 2 ) , lwd=4+4*normalize(k) ,
    ylim=c(0,75) , cex=1+normalize(k) )
# set up the horizontal axis values to compute predictions at
P_seq <- seq( from=-1.4 , to=3 , len=100 )
# predictions for C=1 (low contact)
lambda <- link( m11.10 , data=data.frame( P=P_seq , C=1 ) )
lmu <- apply( lambda , 2 , mean )
lci <- apply( lambda , 2 , PI )
lines( P_seq , lmu , lty=2 , lwd=1.5 )
shade( lci , P_seq , xpd=TRUE , col=col.alpha(4,0.3) )
# predictions for C=2 (high contact)
lambda <- link( m11.10 , data=data.frame( P=P_seq , C=2 ) )
lmu <- apply( lambda , 2 , mean )
lci <- apply( lambda , 2 , PI )
lines( P_seq , lmu , lty=1 , lwd=1.5 )
shade( lci , P_seq , xpd=TRUE , col=col.alpha(2,0.3))

46. [Figure: total tools vs population, on the log scale (std) and on the natural scale; Yap, Trobriand, Tonga, Hawaii labeled]

47. [Figure: total tools vs population, natural scale; Tonga and Hawaii labeled]

48. [Figure: total tools vs population, natural scale; Tonga labeled]

49. This model is wack
Two immediate ways to improve the model:
(1) Use a robust model: gamma-Poisson (neg-binomial)
(2) A principled scientific model
[Figure: total tools vs population, natural scale; Tonga and Hawaii labeled]
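For option (1), a hedged sketch of the obvious swap (mine, not shown in the lecture): replace the Poisson likelihood in m11.10 with a gamma-Poisson and give the dispersion a prior.

m11.10g <- ulam(
    alist(
        T ~ dgampois( lambda , phi ),   # gamma-Poisson (neg-binomial)
        log(lambda) <- a[C] + b[C]*P,
        a[C] ~ dnorm( 3 , 0.5 ),
        b[C] ~ dnorm( 0 , 0.2 ),
        phi ~ dexp(1)                   # dispersion parameter
    ), data=dat , chains=4 , cores=4 )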

50. ΔT = α P^β − γ T

51. ΔT = α P^β − γ T
ΔT: change in tools

52. ΔT = α P^β − γ T
α: innovation rate

53. ΔT = α P^β − γ T
β: diminishing returns (elasticity)

54. ΔT = α P^β − γ T
γ: rate of loss

55. ΔT = α_C P^(β_C) − γ T
innovation (α_C) depends upon contact
diminishing returns (β_C) depend upon contact

56. ΔT = α_C P^(β_C) − γ T
Difference equation: Says how tools change, not expected number
What is the equilibrium?

57. ΔT = α_C P^(β_C) − γ T
[Figure: simulated tool counts over time for P = 1e3 and P = 1e4]
f <- function(a=0.02,b=0.5,g=0.2,P=1e4,t_max=50) {
T <- rep(0,t_max)
for ( i in 2:t_max )
T[i] <- T[i-1] + a*P^b - g*T[i-1]
return(T)
}
plot( NULL , xlim=c(0,50) , ylim=c(0,10) ,
xlab="time" , ylab="Tools" )
T <- f(P=1e3)
lines( 1:50 , T , lwd=3 , col=2 )
T <- f(P=1e4)
lines( 1:50 , T , lwd=3 , col=2 )

58. ΔT = α_C P^(β_C) − γ T = 0
Solve for equilibrium T

59. ΔT = α_C P^(β_C) − γ T = 0
Solve for equilibrium T:
T̂ = α_C P^(β_C) / γ
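A quick numeric check (mine), using the default values from the slide 57 simulation:

a <- 0.02 ; b <- 0.5 ; g <- 0.2
a * 1e4^b / g   # 10 tools at equilibrium for P = 1e4
a * 1e3^b / g   # about 3.2 tools for P = 1e3
# matches the asymptotes of the simulated curves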

60. T̂ = α_C P^(β_C) / γ
T_i ∼ Poisson(λ_i)
λ_i = T̂_i

61. # innovation/loss model
dat2 <- list( T=d$total_tools, P=d$population,
    C=d$contact_id )
m11.11 <- ulam(
    alist(
        T ~ dpois( lambda ),
        lambda <- exp(a[C])*P^b[C]/g,
        a[C] ~ dnorm(1,1),
        b[C] ~ dexp(1),
        g ~ dexp(1)
    ), data=dat2 , chains=4 , cores=4 )
All parameters must be positive
Two ways to do this
(1) use exp()
(2) use appropriate prior
T̂ = α_C P^(β_C) / γ

62. All parameters must be positive
Two ways to do this
(1) use exp()
(2) use appropriate prior
(Same code and model as slide 61)
T̂ = α_C P^(β_C) / γ
> precis(m11.11,2)
     mean   sd  5.5% 94.5% n_eff Rhat4
a[1] 0.85 0.68 -0.26  1.90   698     1
a[2] 0.93 0.83 -0.39  2.31   902     1
b[1] 0.26 0.03  0.21  0.32  1149     1
b[2] 0.29 0.10  0.12  0.45   711     1
g    1.11 0.70  0.32  2.43   862     1
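Rough arithmetic (mine) on those posterior means, for a C=1 society of 10,000 people:

exp(0.85) * 1e4^0.26 / 1.11   # about 23 expected tools at equilibrium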

63. [Figure: posterior predictions of total tools vs population, natural scale; Tonga and Hawaii labeled]
(Same code and model as slide 61)

64. [Figure: Innovation/loss model vs Generalized linear model, total tools vs population on the natural scale; Tonga and Hawaii labeled]
Still have to deal with location as confound

65. Count GLMs
Distributions from constraints
Maximum entropy distributions: Binomial, Poisson, and extensions
More event types: Multinomial and categorical
Robust regressions: Beta-binomial, gamma-Poisson

66. Course Schedule
Week 1 Bayesian inference Chapters 1, 2, 3
Week 2 Linear models & Causal Inference Chapter 4
Week 3 Causes, Confounds & Colliders Chapters 5 & 6
Week 4 Overfitting / MCMC Chapters 7, 8, 9
Week 5 Generalized Linear Models Chapters 10, 11
Week 6 Mixtures & ordered categories Chapters 11 & 12
Week 7 Multilevel models I Chapter 13
Week 8 Multilevel models II Chapter 14
Week 9 Measurement & Missingness Chapter 15
Week 10 Generalized Linear Madness Chapter 16
https://github.com/rmcelreath/stat_rethinking_2023

67. BONUS

68. Simpson's Pandora's Box
Reversal of association when groups are combined or separated
No way to know which association (combined/separated) is correct without causal assumptions
Infinite evils unleashed
Simpson 1951. The Interpretation of Interaction in Contingency Tables
[Image caption: "The container HELD every imaginable estimate. ... plagues us to this day."]

69. Unconditional on department:
Conditional on department:
Which is correct? No way to know without assumptions
   dept admit reject applications gender
1     A   512    313          825   male
2     A    89     19          108 female
3     B   353    207          560   male
4     B    17      8           25 female
5     C   120    205          325   male
6     C   202    391          593 female
7     D   138    279          417   male
8     D   131    244          375 female
9     E    53    138          191   male
10    E    94    299          393 female
11    F    22    351          373   male
12    F    24    317          341 female

70. Which is correct? No way to know without assumptions
Mediator (department)
Collider + confound (ability)
(Same admissions table as slide 69)

71. [Figure: the four elemental relations: The Pipe (X → Z → Y), The Fork (X ← Z → Y), The Collider (X → Z ← Y), The Descendant (X → Z ← Y with Z → A)]

72. The Fork: X ← Z → Y
[Figure: X vs Y colored by Z, with within-group (Z = 0, Z = 1) and overall regression lines]
cols <- c(4,2)
N <- 300
Z <- rbern(N)
X <- rnorm(N,2*Z-1)
Y <- rnorm(N,2*Z-1)
plot( X , Y , col=cols[Z+1] , lwd=3 )
abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3)
abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3)
abline(lm(Y~X),lwd=3)

73. The Pipe: X → Z → Y
cols <- c(4,2)
N <- 300
X <- rnorm(N)
Z <- rbern(N,inv_logit(X))
Y <- rnorm(N,(2*Z-1))
plot( X , Y , col=cols[Z+1] , lwd=3 )
abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3)
abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3)
abline(lm(Y~X),lwd=3)
[Figure: X vs Y colored by Z, with within-group and overall regression lines]

74. The Collider: X → Z ← Y
cols <- c(4,2)
N <- 300
X <- rnorm(N)
Y <- rnorm(N)
Z <- rbern(N,inv_logit(2*X+2*Y-2))
plot( X , Y , col=cols[Z+1] , lwd=3 )
abline(lm(Y[Z==1]~X[Z==1]),col=2,lwd=3)
abline(lm(Y[Z==0]~X[Z==0]),col=4,lwd=3)
abline(lm(Y~X),lwd=3)
[Figure: X vs Y colored by Z, with within-group and overall regression lines]

75. Non-linear haunting
In event models, effect reversal can arise other ways
Example: Base rate differences
[DAG: X (treatment) → Y (outcome) ← Z (covariate)]

76. # base rate differences erasing effect of X
N <- 1000
X <- rnorm(N)
Z <- rbern(N)
p <- inv_logit(X + ifelse(Z==1,-1,5))
Y <- rbern(N,p)
mX <- quap(
alist(
Y ~ dbern(p),
logit(p) <- a + b*X,
a ~ dnorm(0,1),
b ~ dnorm(0,1)
) , data=list(Y=Y,X=X) )
mXZ <- quap(
alist(
Y ~ dbern(p),
logit(p) <- a + b[Z]*X,
a ~ dnorm(0,1),
b[Z] ~ dnorm(0,1)
) , data=list(Y=Y,X=X,Z=Z+1) )
[DAG: X → Y ← Z]

77. [DAG: X → Y ← Z]
(Same code as slide 76)
logit(p_i) = α + β X_i

78. (Same code as slide 76)
[DAG: X → Y ← Z]
logit(p_i) = α + β X_i
logit(p_i) = α + β_{Z[i]} X_i

79. [Figure: outcome (Y) vs treatment (X), two panels: Z collapsed, and stratified by Z = 1, Z = 2]
logit(p_i) = α + β X_i
logit(p_i) = α + β_{Z[i]} X_i

80. [Figure: outcome (Y) vs treatment (X) stratified by Z = 1, Z = 2, and posterior density of beta[Z]]
logit(p_i) = α + β_{Z[i]} X_i

81. mX <- quap(
alist(
Y ~ dbern(p),
logit(p) <- a + b*X,
a ~ dnorm(0,1),
b ~ dnorm(0,1)
) , data=list(Y=Y,X=X) )
mXZ <- quap(
alist(
Y ~ dbern(p),
logit(p) <- a + b[Z]*X,
a ~ dnorm(0,1),
b[Z] ~ dnorm(0,1)
) , data=list(Y=Y,X=X,Z=Z+1) )
mXZ2 <- quap(
alist(
Y ~ dbern(p),
logit(p) <- a[Z] + b[Z]*X,
a[Z] ~ dnorm(0,1),
b[Z] ~ dnorm(0,1)
) , data=list(Y=Y,X=X,Z=Z+1) )
logit(p_i) = α + β X_i
logit(p_i) = α + β_{Z[i]} X_i
logit(p_i) = α_{Z[i]} + β_{Z[i]} X_i

82. [Figure: outcome (Y) vs treatment (X) for Z = 1, Z = 2, under both models]
logit(p_i) = α + β_{Z[i]} X_i
logit(p_i) = α_{Z[i]} + β_{Z[i]} X_i

83. [Figure: outcome (Y) vs treatment (X) and posterior densities of beta[Z], under both models]
logit(p_i) = α + β_{Z[i]} X_i
logit(p_i) = α_{Z[i]} + β_{Z[i]} X_i

84. logit(p_i) = α_{Z[i]} + β_{Z[i]} X_i
[Figure: posterior density of beta[Z] for Z = 1 and Z = 2]

85. Simpson’s Pandora’s Box