Richard McElreath
January 18, 2023

# Statistical Rethinking 2023 - Lecture 06


## Transcript

1. Statistical Rethinking 2023

2. Avoid Being Clever At All Costs
Being clever: unreliable, opaque
Given a causal model, can use logic
to derive implications
Others can use same logic to verify
Better than clever

3. The four elemental confounds:
Pipe: X → Z → Y
Fork: X ← Z → Y
Collider: X → Z ← Y
Descendant: A is a descendant of Z

4. Pipe (X → Z → Y): X and Y associated, unless stratified by Z
Fork (X ← Z → Y): X and Y associated, unless stratified by Z
Collider (X → Z ← Y): X and Y not associated, unless stratified by Z
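The collider rule is easy to check by simulation. Below is a minimal Python sketch (the lecture's own code is in R); the sample size, seed, and the crude stratification on the sign of Z are choices made here for illustration, not from the lecture.

```python
# Collider rule: X and Y are NOT associated, unless you stratify by
# their common effect Z. Here X and Y are independent by construction.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(size=n)
Y = rng.normal(size=n)      # independent of X
Z = rng.normal(X + Y)       # collider: caused by both X and Y

r_all = np.corrcoef(X, Y)[0, 1]               # near zero
sel = Z > 0                                    # crude stratification: one slice of Z
r_strat = np.corrcoef(X[sel], Y[sel])[0, 1]    # negative: the collider is opened
```

Within a slice of Z, knowing X tells you something about Y (if Z is large but X is small, Y was probably large), which is exactly the selection-induced association the slide warns about.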

5. X (treatment) → Y (outcome), with U a confound: X ← U → Y

6. X (treatment) → Y (outcome), U a confound. RANDOMIZE! R → X cuts the influence of U on the treatment.

7. X (treatment) → Y (outcome), U a confound. Randomize? R → X

8. Causal Thinking
In an experiment, we cut the causes of the treatment.
We randomize (we try, at least).
So how does causal inference without randomization ever work?
Is there a statistical procedure that mimics randomization?
Without randomization: X ← U → Y, X → Y
With randomization: do(X), leaving only X → Y and U → Y

9. Causal Thinking
Is there a statistical procedure that mimics randomization?
P(Y|do(X)) = P(Y|?)
do(X) means intervene on X
Can analyze the causal model to find out.
Without randomization: X ← U → Y, X → Y. With randomization: do(X).

10. Example: Simple Confound
X ← U → Y, X → Y

11. Example: Simple Confound
Non-causal path: X ← U → Y
Close the fork! Condition on U

12. Example: Simple Confound
Non-causal path: X ← U → Y
Close the fork! Condition on U
P(Y|do(X)) = Σ_U P(Y|X, U) P(U) = E_U [ P(Y|X, U) ]
“The distribution of Y, stratified by X and U, averaged over the distribution of U.”

13. The causal effect of X on Y is not (in general) the coefficient relating X to Y.
It is the distribution of Y when we change X, averaged over the distributions of the control variables (here U).
P(Y|do(X)) = Σ_U P(Y|X, U) P(U) = E_U [ P(Y|X, U) ]
“The distribution of Y, stratified by X and U, averaged over the distribution of U.”
(DAG: X ← U → Y, X → Y)
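The adjustment formula can be checked numerically with binary variables. This is a Python sketch (the lecture's code is in R), and every probability below is invented for illustration; the point is only that averaging P(Y|X,U) over the marginal P(U) differs from the ordinary conditional P(Y|X).

```python
# Adjustment formula P(Y|do(X)) = sum_U P(Y|X,U) P(U), binary case.
# Structural model: U -> X, U -> Y, X -> Y. Probabilities are made up.
p_U1 = 0.5                        # P(U = 1)
p_X1_given_U = {0: 0.2, 1: 0.8}   # P(X = 1 | U)
p_Y1_given_XU = {                 # P(Y = 1 | X, U)
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.3, (1, 1): 0.7,
}

def p_Y1_do_X(x):
    """Intervention: average P(Y=1 | X=x, U) over the MARGINAL P(U)."""
    return sum(p_Y1_given_XU[(x, u)] * (p_U1 if u == 1 else 1 - p_U1)
               for u in (0, 1))

def p_Y1_given_X(x):
    """Observation: average over P(U | X=x) instead -- confounded."""
    joint = {u: (p_X1_given_U[u] if x == 1 else 1 - p_X1_given_U[u])
                * (p_U1 if u == 1 else 1 - p_U1) for u in (0, 1)}
    z = joint[0] + joint[1]
    return sum(p_Y1_given_XU[(x, u)] * joint[u] / z for u in (0, 1))

causal = p_Y1_do_X(1) - p_Y1_do_X(0)       # true average causal effect: 0.2
naive = p_Y1_given_X(1) - p_Y1_given_X(0)  # confounded contrast: larger
```

With these numbers the confounded contrast overstates the causal effect, because U pushes X and Y in the same direction.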

14. Marginal Effects Example
B = baboons, G = gazelles, C = cheetahs

15. (figure: the B, G, C DAG; cheetahs present)

16. (figures: cheetahs present vs cheetahs absent)
Causal effect of baboons depends upon distribution of cheetahs

17. do-calculus
For DAGs, the rules for finding P(Y|do(X)) are known as the do-calculus.
do-calculus says what is possible to say before picking functions.
Justifies graphical analysis.
Do calculus, not too much, mostly graphs.

18. do-calculus
do-calculus is worst case: additional assumptions can allow stronger inference.
do-calculus is best case: if inference is possible by do-calculus, it does not depend on special assumptions.
Judea Pearl, father of do-calculus (photo, 1966)

19. Backdoor Criterion
Backdoor Criterion is a shortcut to
applying (some) results of do-calculus
Can be performed with your eyeballs

20. Backdoor Criterion: Rule to find a set of variables to stratify by to yield P(Y|do(X))
(1) Identify all paths connecting the treatment (X) to the outcome (Y)
(2) Paths with arrows entering X are backdoor paths (non-causal paths)
(3) Find adjustment set that closes/blocks all backdoor paths
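Steps (1) and (2) are mechanical enough to sketch in code. Below is a small Python illustration (not the lecture's code); the example DAG and its edge list are hypothetical, chosen so that one backdoor path exists: X → Y plus X ← Z ← U → Y.

```python
# Enumerate all paths between treatment X and outcome Y, and flag as a
# backdoor path any path whose first edge points INTO X.
edges = [("X", "Y"), ("Z", "X"), ("U", "Z"), ("U", "Y")]  # hypothetical DAG

def neighbors(node):
    """Adjacent nodes, ignoring edge direction (paths ignore arrows)."""
    out = []
    for a, b in edges:
        if a == node:
            out.append(b)
        if b == node:
            out.append(a)
    return out

def all_paths(start, goal, visited=None):
    """All simple paths from start to goal, walking edges either way."""
    visited = visited or [start]
    if start == goal:
        yield visited
        return
    for nxt in neighbors(start):
        if nxt not in visited:
            yield from all_paths(nxt, goal, visited + [nxt])

def is_backdoor(path):
    """Backdoor: the first edge on the path points into the treatment."""
    return (path[1], path[0]) in edges

paths = list(all_paths("X", "Y"))
backdoors = [p for p in paths if len(p) > 2 and is_backdoor(p)]
```

For this edge list there are two paths, X → Y (causal) and X ← Z ← U → Y, and only the second is a backdoor. Step (3), choosing an adjustment set, still requires classifying each non-treatment node on a backdoor path as a fork, pipe, or collider.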

21. (1) Identify all paths connecting the
treatment (X) to the outcome (Y)

22. (2) Paths with arrows entering X are
backdoor paths (confounding paths)

23. (3) Find a set of control variables that
close/block all backdoor paths
Block the pipe: X ⫫ U | Z
Z “knows” all of the association
between X,Y that is due to U

24. (3) Find a set of control variables that close/block all backdoor paths
P(Y|do(X)) = Σ_z P(Y|X, Z) P(Z = z)
Y_i ∼ Normal(μ_i, σ)
μ_i = α + β_X X_i + β_Z Z_i
Block the pipe: X ⫫ U | Z

25. # simulate confounded Y
N <- 200
b_XY <- 0
b_UY <- -1
b_UZ <- -1
b_ZX <- 1
set.seed(10)
U <- rbern(N)
Z <- rnorm(N,b_UZ*U)
X <- rnorm(N,b_ZX*Z)
Y <- rnorm(N,b_XY*X+b_UY*U)
d <- list(Y=Y,X=X,Z=Z)

26. # ignore U,Z
m_YX <- quap(
alist(
Y ~ dnorm( mu , sigma ),
mu <- a + b_XY*X,
a ~ dnorm( 0 , 1 ),
b_XY ~ dnorm( 0 , 1 ),
sigma ~ dexp( 1 )
), data=d )
# stratify by Z
m_YXZ <- quap(
alist(
Y ~ dnorm( mu , sigma ),
mu <- a + b_XY*X + b_Z*Z,
a ~ dnorm( 0 , 1 ),
c(b_XY,b_Z) ~ dnorm( 0 , 1 ),
sigma ~ dexp( 1 )
), data=d )
post <- extract.samples(m_YX)
post2 <- extract.samples(m_YXZ)
dens(post$b_XY,lwd=3,col=1,xlab="posterior b_XY",xlim=c(-0.3,0.3))
dens(post2$b_XY,lwd=3,col=2,add=TRUE) # second density, presumably added the same way
(figure: posterior densities of b_XY; Y|X,Z concentrated near the true value 0, Y|X shifted away)

27. (same models and plot as the previous slide)
> precis(m_YXZ)
        mean   sd  5.5% 94.5%
a      -0.32 0.09 -0.47 -0.18
b_XY   -0.01 0.08 -0.13  0.11
b_Z     0.24 0.11  0.06  0.42
sigma   1.18 0.06  1.08  1.27
Coefficient on Z means nothing. “Table 2 Fallacy”

28. DAG with treatment X, outcome Y, and covariates Z, B, A, C.
List all the paths connecting X and Y.
Which need to be closed to estimate the effect of X on Y?

29. (figure: the DAG again, with the estimand P(Y|do(X)))
30. (figure: the paths connecting X and Y, highlighted one at a time)

31. Causal path, open: X → Y

32. (figure: the paths again, next path highlighted)

33. Backdoor path, open. Close with C.

34. (figure: the paths again; adjustment set so far: C)


36. Backdoor path, open. Close with Z.

37. (figure: the paths again; adjustment set so far: C, Z)

38. Backdoor path, opened by Z (a collider on this path). A or B to close.

39. (figure: the paths again; adjustment set so far: C, Z, A or B)


41. Backdoor path, open. Close with A or Z.


43. Adjustment set: C, Z, and either A or B (B is better choice)

44. www.dagitty.net

45. Nodes: G = grandparent education, P = parent education, C = child education, U = unobserved confound

46. P is a mediator
Pipe: G → P → C

47. P is a collider
Pipe: G → P → C
Fork: C ← U → P

48. Can estimate the total effect of G on C. Cannot estimate the direct effect.
Total effect: C_i ∼ Normal(μ_i, σ), μ_i = α + β_G G_i
Direct effect (confounded by U through collider P): C_i ∼ Normal(μ_i, σ), μ_i = α + β_G G_i + β_P P_i
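The contrast between the two models can be simulated. This is a hedged Python sketch (the lecture simulates in R), with made-up coefficient values chosen so the direct effect of G on C is exactly zero: regressing C on G alone recovers the total effect, while adding the collider P opens the path through the unobserved U.

```python
# G -> P -> C with unobserved U -> P and U -> C (direct G -> C effect set to 0).
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
bGP, bGC, bPC, bU = 1.0, 0.0, 1.0, 2.0   # illustration values; bGC = 0

U = rng.normal(size=n)                    # unobserved confound of P and C
G = rng.normal(size=n)
P = rng.normal(bGP * G + bU * U)
C = rng.normal(bGC * G + bPC * P + bU * U)

def slope(y, *xs):
    """OLS slope on the first predictor (intercept included)."""
    M = np.column_stack([np.ones_like(y)] + list(xs))
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    return beta[1]

total = slope(C, G)      # ~ bGC + bGP * bPC = 1.0 (total effect, identifiable)
direct = slope(C, G, P)  # biased far from the true direct effect of 0
```

Conditioning on P makes G and U dependent, so the "direct effect" estimate is pulled strongly negative even though the true direct effect is zero.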

49. Backdoor Criterion
do-calculus is more than backdoors.
Full Luxury Bayes: use all variables, but in separate sub-models instead of a single regression.
do-calculus is less demanding: finds the relevant variables; saves us having to make some assumptions; not always a regression.

50. PAUSE

51. “Control” variable: Variable introduced to an analysis so that a causal estimate is possible
Common wrong heuristics for choosing control variables:
Any variables not highly collinear
Any pre-treatment measurement (baseline)

52. X → Y
Cinelli, Forney, Pearl 2021, A Crash Course in Good and Bad Controls

53. X → Y, with X ← u → Z ← v → Y (u and v unobserved)
Cinelli, Forney, Pearl 2021, A Crash Course in Good and Bad Controls

54. Labels: X = health of person 1, Y = health of person 2, u = hobbies of person 1, v = hobbies of person 2, Z = whether they are friends
Cinelli, Forney, Pearl 2021, A Crash Course in Good and Bad Controls


57. (1) List the paths:
X → Y
X ← u → Z ← v → Y

58. (2) Find backdoors:
X → Y: frontdoor & open
X ← u → Z ← v → Y: backdoor & closed


60. (3) Close backdoors: the backdoor path is already closed (Z is a collider on it), so there is nothing to do.

61. What happens if you stratify by Z? It opens the backdoor path.
Z could be a pre-treatment variable.
It is not safe to always control for pre-treatment measurements.
(labels: health, hobbies, friends, as before)
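The "bad pre-treatment control" can be simulated directly. A hedged Python sketch (the lecture's simulations are in R) with invented coefficients: Z is a collider of two unobserved causes u and v, so the backdoor X ← u → Z ← v → Y is closed until you condition on Z.

```python
# X <- u -> Z <- v -> Y, with X -> Y absent (true effect of X is zero).
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
u = rng.normal(size=n)
v = rng.normal(size=n)
X = rng.normal(u)            # treatment, caused by u
Z = rng.normal(u + v)        # collider, measured pre-treatment
Y = rng.normal(v)            # outcome; X has no effect on Y

def slope(y, *xs):
    """OLS slope on the first predictor (intercept included)."""
    M = np.column_stack([np.ones(len(y))] + list(xs))
    return np.linalg.lstsq(M, y, rcond=None)[0][1]

b_plain = slope(Y, X)        # ~ 0: backdoor closed, estimate unbiased
b_adjusted = slope(Y, X, Z)  # pulled away from zero: collider opened
```

Adding the innocent-looking pre-treatment variable Z manufactures a spurious association between X and Y.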

62. X → Z → Y, with u → Z and u → Y (u unobserved)

63. Labels: X = win lottery, Z = happiness, Y = lifespan; u = contextual confounds

64. Paths:
X → Z → Y
X → Z ← u → Y
No backdoor, no need to control for Z

65. Simulation of the same DAG (X → Z → Y; u → Z, u → Y):
f <- function(n=100,bXZ=1,bZY=1) {
X <- rnorm(n)
u <- rnorm(n)
Z <- rnorm(n, bXZ*X + u)
Y <- rnorm(n, bZY*Z + u )
bX <- coef( lm(Y ~ X) )['X']
bXZ <- coef( lm(Y ~ X + Z) )['X']
return( c(bX,bXZ) )
}
sim <- mcreplicate( 1e4 , f() , mc.cores=8 )
dens( sim[1,] , lwd=3 , xlab="posterior mean" )
dens( sim[2,] , lwd=3 , col=2 , add=TRUE )

66. (same simulation; figure: densities of the two estimates)
Y ~ X: correct
Y ~ X + Z: wrong

67. Change bZY to zero:
f <- function(n=100,bXZ=1,bZY=1) {
  X <- rnorm(n)
  u <- rnorm(n)
  Z <- rnorm(n, bXZ*X + u)
  Y <- rnorm(n, bZY*Z + u )
  bX <- coef( lm(Y ~ X) )['X']
  bXZ <- coef( lm(Y ~ X + Z) )['X']
  return( c(bX,bXZ) )
}
sim <- mcreplicate( 1e4 , f(bZY=0) , mc.cores=8 )
dens( sim[1,] , lwd=3 , xlab="posterior mean" )
dens( sim[2,] , lwd=3 , col=2 , add=TRUE )
(figure: Y ~ X correct; Y ~ X + Z wrong)

68. Paths: X → Z → Y; X → Z ← u → Y.
No backdoor, no need to control for Z.
Controlling for Z biases the treatment estimate for X: it opens the biasing path through u.
Can estimate the effect of X; cannot estimate the mediation effect through Z.
(win lottery → happiness → lifespan)

69. (same DAG: win lottery → happiness → lifespan)
Montgomery et al. 2018, How Conditioning on Posttreatment Variables Can Ruin Your Experiment
Regression with confounds; regression with post-treatment variables.

70. X → Z ← Y: Do not touch the collider!

71. Colliders are not always so obvious. (DAG: X, Y, Z, u)

72. (labels: education, values, income, family)

73. X → Y → Z: case-control bias (selection on outcome)

74. X = education, Y = occupation, Z = income. Case-control bias (selection on outcome).

75. X → Y → Z, simulated:
f <- function(n=100,bXY=1,bYZ=1) {
X <- rnorm(n)
Y <- rnorm(n, bXY*X )
Z <- rnorm(n, bYZ*Y )
bX <- coef( lm(Y ~ X) )['X']
bXZ <- coef( lm(Y ~ X + Z) )['X']
return( c(bX,bXZ) )
}
sim <- mcreplicate( 1e4 , f() , mc.cores=8 )
dens( sim[1,] , lwd=3 , xlab="posterior mean" )
dens( sim[2,] , lwd=3 , col=2 , add=TRUE )
(figure: Y ~ X correct; Y ~ X + Z wrong)
Case-control bias (selection on outcome)

76. Z → X → Y: “Precision parasite”
No backdoors, but it is still not good to condition on Z.

77. “Precision parasite” (Z → X → Y), simulated:
f <- function(n=100,bZX=1,bXY=1) {
Z <- rnorm(n)
X <- rnorm(n, bZX*Z )
Y <- rnorm(n, bXY*X )
bX <- coef( lm(Y ~ X) )['X']
bXZ <- coef( lm(Y ~ X + Z) )['X']
return( c(bX,bXZ) )
}
sim <- mcreplicate( 1e4 , f(n=50) , mc.cores=8 )
dens( sim[1,] , lwd=3 , xlab="posterior mean" )
dens( sim[2,] , lwd=3 , col=2 , add=TRUE )
(figure: Y ~ X correct; Y ~ X + Z wrong — conditioning on Z widens the sampling distribution without removing any bias)

78. “Bias amplification” (Z → X ← u → Y, X → Y)
X and Y confounded by u.
Something truly awful happens when you also condition on Z.

79.
f <- function(n=100,bZX=1,bXY=1) {
Z <- rnorm(n)
u <- rnorm(n)
X <- rnorm(n, bZX*Z + u )
Y <- rnorm(n, bXY*X + u )
bX <- coef( lm(Y ~ X) )['X']
bXZ <- coef( lm(Y ~ X + Z) )['X']
return( c(bX,bXZ) )
}
sim <- mcreplicate( 1e4 , f(bXY=0) , mc.cores=8 )
dens( sim[1,] , lwd=3 , xlab="posterior mean" )
dens( sim[2,] , lwd=3 , col=2 , add=TRUE )
(figure: Y ~ X biased; Y ~ X + Z even more biased; true value is zero)

80. (same figure: Y ~ X biased; Y ~ X + Z more bias; true value is zero)
WHY? Covariation of X and Y requires variation in their causes.
Within each level of Z, there is less variation in X.
So the confound u is relatively more important within each level of Z.

81. Illustration:
n <- 1000
Z <- rbern(n)
u <- rnorm(n)
X <- rnorm(n, 7*Z + u )
Y <- rnorm(n, 0*X + u )
(figure: Y against X, in clusters Z = 0 and Z = 1; the true slope is 0, but within each cluster the confound u induces a positive association)

82. (labels: education, occupation, income; u = regional/cultural factors)

83. “Control” variable: Variable introduced to an analysis so that a causal estimate is possible.
Adding control variables can be worse than omitting them.
Make assumptions explicit.
MODEL ALL THE THINGS

84. Course Schedule
Week 1 Bayesian inference Chapters 1, 2, 3
Week 2 Linear models & Causal Inference Chapter 4
Week 3 Causes, Confounds & Colliders Chapters 5 & 6
Week 4 Overfitting / MCMC Chapters 7, 8, 9
Week 5 Generalized Linear Models Chapters 10, 11
Week 6 Integers & Other Monsters Chapters 11 & 12
Week 7 Multilevel models I Chapter 13
Week 8 Multilevel models II Chapter 14
Week 9 Measurement & Missingness Chapter 15
Week 10 Generalized Linear Madness Chapter 16
https://github.com/rmcelreath/stat_rethinking_2023

85. BONUS

86. TABLE 2 — ESTIMATED PROBIT MODELS FOR THE USE OF A SCREEN

| | Preliminaries (1) | Preliminaries, blind (2) | Finals, blind (3) |
| --- | --- | --- | --- |
| (Proportion female)_{t-1} | 2.744 (3.265) [0.006] | 3.120 (3.271) [0.004] | 0.490 (1.163) [0.011] |
| (Proportion of orchestra personnel with <6 years tenure)_{t-1} | −26.46 (7.314) [−0.058] | −28.13 (8.459) [−0.039] | −9.467 (2.787) [−0.207] |
| “Big Five” orchestra | | 0.367 (0.452) [0.001] | |
| pseudo R² | 0.178 | 0.193 | 0.050 |
| Number of observations | 294 | 294 | 434 |

87. Table 2 Fallacy
Not all coefficients are causal effects.
A statistical model designed to identify X → Y will not also identify the effects of the control variables.
Table 2 is dangerous.
Westreich & Greenland 2013, The Table 2 Fallacy
(the same probit table; Notes: The dependent variable is 1 if the orchestra adopts a screen, 0 otherwise. Huber standard errors (with orchestra random effects) are in parentheses. All specifications include a constant. Changes in probabilities are in brackets.)

88. DAG: X = HIV, Y = stroke, S = smoking, A = age
Westreich & Greenland 2013, The Table 2 Fallacy

89.–94. Use Backdoor Criterion
(figures: stepping through the paths between X and Y one at a time; the backdoor paths run through S and A)

95. Y_i ∼ Normal(μ_i, σ)
μ_i = α + β_X X_i + β_S S_i + β_A A_i
(DAG: A, S, X, Y)

96. X, unconditional: confounded by A and S.

97. Coefficient for X: the effect of X on Y (still must marginalize!)
Unconditional: confounded by A and S. Conditional on A and S: identified.

98. S, unconditional: the effect of S is confounded by A.

99. Coefficient for S: the direct effect of S on Y (conditional on A and X); unconditional, it is confounded by A.

100. A, unconditional: the total causal effect of A on Y flows through all paths.

101. Coefficient for A: the direct effect of A on Y (conditional on X and S).
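The different meanings of the coefficients in the single regression can be demonstrated by simulation. A hedged Python sketch (the lecture's code is in R) of the slide's DAG, with invented coefficients: in Y ~ X + S + A the coefficient on A recovers only its direct effect, while the total effect of A also flows through S and X.

```python
# DAG: A -> S, A -> X, A -> Y, S -> X, S -> Y, X -> Y (illustrative values).
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
bAS, bAX, bAY = 1.0, 1.0, 0.5   # effects of A (age)
bSX, bSY = 1.0, 1.0             # effects of S (smoking)
bXY = 1.0                       # effect of X (treatment) on Y

A = rng.normal(size=n)
S = rng.normal(bAS * A)
X = rng.normal(bAX * A + bSX * S)
Y = rng.normal(bAY * A + bSY * S + bXY * X)

def coefs(y, *xs):
    """OLS slopes (intercept included, then dropped)."""
    M = np.column_stack([np.ones(len(y))] + list(xs))
    return np.linalg.lstsq(M, y, rcond=None)[0][1:]

b_X, b_S, b_A = coefs(Y, X, S, A)  # full model: b_A ~ direct effect 0.5
(total_A,) = coefs(Y, A)           # total effect of A: 0.5 + 1*1 + 2*1 = 3.5
```

The same table cell would be read very differently depending on which estimand the reader assumes, which is exactly the Table 2 Fallacy.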

102. The same DAG (X = HIV, Y = stroke, S = smoking, A = age) with an added unobserved confound u.

103. Table 2 Fallacy
Not all coefficients are created equal, so do not present them as equal.
Options:
Do not present control coefficients.
Give an explicit interpretation of each.
No interpretation without causal representation.
(DAG: A, S, X, Y, with unobserved u)