Slide 1

Slide 1 text

Statistical Rethinking 2023, Lecture 6: Good & Bad Controls

Slide 2

Slide 2 text

Avoid Being Clever At All Costs
Being clever: unreliable, opaque
Given a causal model, can use logic to derive implications
Others can use same logic to verify & challenge your work
Better than clever

Slide 3

Slide 3 text

The Pipe: X → Z → Y
The Fork: X ← Z → Y
The Collider: X → Z ← Y
The Descendant: A is a descendant of Z

Slide 4

Slide 4 text

The Pipe: X and Y associated unless stratify by Z
The Fork: X and Y associated unless stratify by Z
The Collider: X and Y not associated unless stratify by Z

Slide 5

Slide 5 text

X Y U treatment outcome confound

Slide 6

Slide 6 text

X Y U treatment outcome confound RANDOMIZE! R

Slide 7

Slide 7 text

X Y U treatment outcome confound randomize? R

Slide 8

Slide 8 text

Causal Thinking
In an experiment, we cut causes of the treatment. We randomize (we try at least)
So how does causal inference without randomization ever work? Is there a statistical procedure that mimics randomization?
[Diagrams: X, Y, U without randomization; with randomization, do(X)]

Slide 9

Slide 9 text

Causal Thinking
Is there a statistical procedure that mimics randomization?
P(Y|do(X)) = P(Y|?)
do(X) means intervene on X
Can analyze causal model to find answer (if it exists)
[Diagrams: X, Y, U without randomization; with randomization, do(X)]

Slide 10

Slide 10 text

Example: Simple Confound X Y U

Slide 11

Slide 11 text

Example: Simple Confound
Non-causal path X ← U → Y. Close the fork! Condition on U.

Slide 12

Slide 12 text

Example: Simple Confound
Non-causal path X ← U → Y. Close the fork! Condition on U.
P(Y|do(X)) = ∑_U P(Y|X,U) P(U) = E_U[ P(Y|X,U) ]
“The distribution of Y, stratified by X and U, averaged over the distribution of U.”

Slide 13

Slide 13 text

The causal effect of X on Y is not (in general) the coefficient relating X to Y
It is the distribution of Y when we change X, averaged over the distributions of the control variables (here U)
P(Y|do(X)) = ∑_U P(Y|X,U) P(U) = E_U[ P(Y|X,U) ]
“The distribution of Y, stratified by X and U, averaged over the distribution of U.”
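A minimal simulation sketch of this marginalization (my own illustration, not lecture code; the coefficient values and the use of lm as the stratified summary are assumptions): stratify the outcome by X and a binary confound U, then average the stratified predictions over the distribution of U.

# hypothetical sketch: E[Y|do(X=x)] by stratifying by U and averaging over P(U)
library(rethinking)
set.seed(1)
N <- 1e4
U <- rbern(N)                   # binary confound
X <- rnorm(N, 1*U)              # treatment influenced by U
Y <- rnorm(N, 0.5*X + 1*U)      # outcome; true effect of X is 0.5
m <- lm(Y ~ X*U)                # stratified summary of P(Y|X,U)
p_U <- mean(U)                  # empirical P(U=1)
EY_do <- function(x) {
  # sum over U of E[Y|X=x,U=u] * P(U=u)
  (1 - p_U) * predict(m, data.frame(X=x, U=0)) +
        p_U * predict(m, data.frame(X=x, U=1))
}
EY_do(1) - EY_do(0)             # approx 0.5, the causal effect of X

In this linear example the contrast equals the X coefficient, but the same averaging recipe is what generalizes when the stratified model is not linear.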

Slide 14

Slide 14 text

Marginal Effects Example: baboons (B), gazelles (G), cheetahs (C)

Slide 15

Slide 15 text

B G C cheetahs present

Slide 16

Slide 16 text

Cheetahs present / cheetahs absent: the causal effect of baboons depends upon the distribution of cheetahs
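A hypothetical toy simulation of this idea (variable roles and coefficients are my assumptions, not lecture code): the effect of baboons B on gazelles G differs by whether cheetahs C are present, and the marginal effect averages over the distribution of C.

# hypothetical sketch: effect of B on G depends on C; marginal effect averages over P(C)
set.seed(2)
N <- 1e4
C <- rbinom(N, 1, 0.5)             # cheetahs present (1) or absent (0)
B <- rnorm(N)                      # baboon density
G <- rnorm(N, B*(1 - C))           # baboons affect gazelles only when cheetahs absent
coef( lm(G[C==1] ~ B[C==1]) )[2]   # ~0 : cheetahs present
coef( lm(G[C==0] ~ B[C==0]) )[2]   # ~1 : cheetahs absent
coef( lm(G ~ B) )['B']             # ~0.5 : marginal effect, averaged over P(C)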

Slide 17

Slide 17 text

do-calculus
For DAGs, rules for finding P(Y|do(X)) known as do-calculus
do-calculus says what is possible to say before picking functions
Justifies graphical analysis
Do calculus, not too much, mostly graphs

Slide 18

Slide 18 text

do-calculus
do-calculus is worst case: additional assumptions often allow stronger inference
do-calculus is best case: if inference possible by do-calculus, does not depend on special assumptions
Judea Pearl, father of do-calculus (1966)

Slide 19

Slide 19 text

Backdoor Criterion Backdoor Criterion is a shortcut to applying (some) results of do-calculus Can be performed with your eyeballs

Slide 20

Slide 20 text

Backdoor Criterion: Rule to find a set of variables to stratify by to yield P(Y|do(X))
(1) Identify all paths connecting the treatment (X) to the outcome (Y)
(2) Paths with arrows entering X are backdoor paths (non-causal paths)
(3) Find adjustment set that closes/blocks all backdoor paths

Slide 21

Slide 21 text

(1) Identify all paths connecting the treatment (X) to the outcome (Y)

Slide 22

Slide 22 text

(2) Paths with arrows entering X are backdoor paths (confounding paths)

Slide 23

Slide 23 text

(3) Find a set of control variables that close/block all backdoor paths Block the pipe: X ⫫ U | Z Z “knows” all of the association between X,Y that is due to U

Slide 24

Slide 24 text

(3) Find a set of control variables that close/block all backdoor paths
P(Y|do(X)) = ∑_Z P(Y|X, Z=z) P(Z=z)
Y_i ∼ Normal(μ_i, σ)
μ_i = α + β_X X_i + β_Z Z_i
Block the pipe: X ⫫ U | Z

Slide 25

Slide 25 text

# simulate confounded Y
N <- 200
b_XY <- 0
b_UY <- -1
b_UZ <- -1
b_ZX <- 1
set.seed(10)
U <- rbern(N)
Z <- rnorm(N, b_UZ*U)
X <- rnorm(N, b_ZX*Z)
Y <- rnorm(N, b_XY*X + b_UY*U)
d <- list(Y=Y, X=X, Z=Z)

Slide 26

Slide 26 text

# ignore U,Z
m_YX <- quap(
  alist(
    Y ~ dnorm( mu , sigma ),
    mu <- a + b_XY*X,
    a ~ dnorm( 0 , 1 ),
    b_XY ~ dnorm( 0 , 1 ),
    sigma ~ dexp( 1 )
  ), data=d )

# stratify by Z
m_YXZ <- quap(
  alist(
    Y ~ dnorm( mu , sigma ),
    mu <- a + b_XY*X + b_Z*Z,
    a ~ dnorm( 0 , 1 ),
    c(b_XY,b_Z) ~ dnorm( 0 , 1 ),
    sigma ~ dexp( 1 )
  ), data=d )

post <- extract.samples(m_YX)
post2 <- extract.samples(m_YXZ)
dens( post$b_XY , lwd=3 , col=1 , xlab="posterior b_XY" , xlim=c(-0.3,0.3) )
dens( post2$b_XY , lwd=3 , col=2 , add=TRUE )

[Plot: posterior densities of b_XY for Y|X and Y|X,Z]

Slide 27

Slide 27 text

# ignore U,Z
m_YX <- quap(
  alist(
    Y ~ dnorm( mu , sigma ),
    mu <- a + b_XY*X,
    a ~ dnorm( 0 , 1 ),
    b_XY ~ dnorm( 0 , 1 ),
    sigma ~ dexp( 1 )
  ), data=d )

# stratify by Z
m_YXZ <- quap(
  alist(
    Y ~ dnorm( mu , sigma ),
    mu <- a + b_XY*X + b_Z*Z,
    a ~ dnorm( 0 , 1 ),
    c(b_XY,b_Z) ~ dnorm( 0 , 1 ),
    sigma ~ dexp( 1 )
  ), data=d )

post <- extract.samples(m_YX)
post2 <- extract.samples(m_YXZ)
dens( post$b_XY , lwd=3 , col=1 , xlab="posterior b_XY" , xlim=c(-0.3,0.3) )
dens( post2$b_XY , lwd=3 , col=2 , add=TRUE )

> precis(m_YXZ)
       mean   sd  5.5% 94.5%
a     -0.32 0.09 -0.47 -0.18
b_XY  -0.01 0.08 -0.13  0.11
b_Z    0.24 0.11  0.06  0.42
sigma  1.18 0.06  1.08  1.27

Coefficient on Z means nothing. “Table 2 Fallacy”
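One way to see "still must marginalize" in code (a sketch I am adding, not lecture code; it assumes the rethinking link() helper and the m_YXZ fit above): simulate the expected outcome under do(X) for two values of X, averaging predictions over the observed distribution of Z. In this linear model the contrast simply reproduces b_XY, but the averaging is what carries over to nonlinear models.

# hypothetical sketch: marginal contrast E[Y|do(X=1)] - E[Y|do(X=0)] from m_YXZ,
# averaging over the observed values of Z
n_obs <- length(d$Z)
mu0 <- link( m_YXZ , data=list( X=rep(0,n_obs) , Z=d$Z ) )
mu1 <- link( m_YXZ , data=list( X=rep(1,n_obs) , Z=d$Z ) )
contrast <- rowMeans(mu1) - rowMeans(mu0)   # one value per posterior sample
dens( contrast , lwd=3 , xlab="E[Y|do(X=1)] - E[Y|do(X=0)]" )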

Slide 28

Slide 28 text

List all the paths connecting X and Y. Which need to be closed to estimate the effect of X on Y? [Diagram: DAG over X, Y, Z, A, B, C]

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

P(Y|do(X)) [Diagram: the example DAG over X, Y, Z, A, B, C]

Slide 31

Slide 31 text

[Diagram: six copies of the DAG, one for each path connecting X and Y]

Slide 32

Slide 32 text

Causal path, open [Diagram: the DAG with the causal path highlighted]

Slide 33

Slide 33 text

[Diagram: six copies of the DAG, one for each path connecting X and Y]

Slide 34

Slide 34 text

Backdoor path, open. Close with C. [Diagram: the DAG with this backdoor path highlighted]

Slide 35

Slide 35 text

[Diagram: the six paths; adjustment set so far: C]

Slide 36

Slide 36 text

[Diagram: the six paths; adjustment set so far: C]

Slide 37

Slide 37 text

Backdoor path, open. Close with Z. [Diagram: the DAG with this backdoor path highlighted]

Slide 38

Slide 38 text

[Diagram: the six paths; adjustment set so far: C, Z]

Slide 39

Slide 39 text

Backdoor path, opened by Z. A or B to close. [Diagram: the DAG with this backdoor path highlighted]

Slide 40

Slide 40 text

[Diagram: the six paths; adjustment set so far: C, Z, and A or B]

Slide 41

Slide 41 text

[Diagram: the six paths; adjustment set so far: C, Z, and A or B]

Slide 42

Slide 42 text

Backdoor path, open. Close with A or Z. [Diagram: the DAG with this backdoor path highlighted]

Slide 43

Slide 43 text

[Diagram: the six paths; adjustment set so far: C, Z, and A or B]

Slide 44

Slide 44 text

Minimum adjustment set: C, Z, and either A or B (B is the better choice) [Diagram: the paths with the chosen adjustment set marked]

Slide 45

Slide 45 text

www.dagitty.net
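The same analysis can be automated with the dagitty R package. The edge list below is my reading of the lecture diagram, so treat it as an assumption; given those edges, adjustmentSets() returns the two minimal sets named on the previous slide.

# hypothetical check with dagitty, assuming edges A->X, A->Z, B->Z, B->Y,
# C->X, C->Y, Z->X, Z->Y, X->Y (my reading of the lecture DAG)
library(dagitty)
g <- dagitty("dag{
  A -> X ; A -> Z ; B -> Z ; B -> Y ;
  C -> X ; C -> Y ; Z -> X ; Z -> Y ; X -> Y
}")
paths( g , "X" , "Y" )$paths                     # list all paths between X and Y
adjustmentSets( g , exposure="X" , outcome="Y" )
# expected: { A, C, Z } and { B, C, Z }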

Slide 46

Slide 46 text

G (grandparent education), P (parent education), C (child education), U (unobserved confound)

Slide 47

Slide 47 text

P is a mediator. Pipe: G → P → C

Slide 48

Slide 48 text

P is a collider. Pipe: G → P → C. Fork: C ← U → P

Slide 49

Slide 49 text

Can estimate total effect of G on C. Cannot estimate direct effect.
Total effect:  C_i ∼ Normal(μ_i, σ),  μ_i = α + β_G G_i
Direct effect (confounded):  C_i ∼ Normal(μ_i, σ),  μ_i = α + β_G G_i + β_P P_i
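A small simulation sketch of this point (my own coefficient values, not lecture code): with an unobserved U influencing both P and C, regressing on G alone recovers the total effect, while adding the collider P biases the estimate of the direct effect.

# hypothetical sketch: total effect of G identified, direct effect confounded by U
set.seed(3)
N <- 1000
U <- rnorm(N)                     # unobserved confound of P and C
G <- rnorm(N)
P <- rnorm(N, 1*G + 1*U)
C <- rnorm(N, 0*G + 1*P + 1*U)    # true direct effect of G set to zero
coef( lm(C ~ G) )['G']            # ~1 : total effect of G (all through P), correct
coef( lm(C ~ G + P) )['G']        # biased away from 0 : stratifying by P opens G -> P <- U -> C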

Slide 50

Slide 50 text

Backdoor Criterion
do-calc more than backdoors & adjustment sets
Full Luxury Bayes: use all variables, but in separate sub-models instead of single regression
do-calc less demanding: finds relevant variables; saves us having to make some assumptions; not always a regression

Slide 51

Slide 51 text

PAUSE

Slide 52

Slide 52 text

Good & Bad Controls
“Control” variable: variable introduced to an analysis so that a causal estimate is possible
Common wrong heuristics for choosing control variables:
Anything in the spreadsheet, YOLO!
Any variables not highly collinear
Any pre-treatment measurement (baseline)

Slide 53

Slide 53 text

Cinelli, Forney & Pearl 2021, A Crash Course in Good and Bad Controls [Diagram: X and Y]

Slide 54

Slide 54 text

X Y u v Z Cinelli, Forney, Pearl 2021 A Crash Course in Good and Bad Controls unobserved

Slide 55

Slide 55 text

X Y u v Z Cinelli, Forney, Pearl 2021 A Crash Course in Good and Bad Controls Health person 1 Health person 2 Hobbies person 1 Hobbies person 2 Friends

Slide 56

Slide 56 text

X Y u v Z (1) List the paths

Slide 57

Slide 57 text

X Y u v Z (1) List the paths X → Y

Slide 58

Slide 58 text

X Y u v Z (1) List the paths X → Y X ← u → Z ← v → Y

Slide 59

Slide 59 text

(1) List the paths
X → Y : frontdoor & open
X ← u → Z ← v → Y : backdoor & closed
(2) Find backdoors

Slide 60

Slide 60 text

(1) List the paths
X → Y : frontdoor & open
X ← u → Z ← v → Y : backdoor & closed
(2) Find backdoors

Slide 61

Slide 61 text

(1) List the paths
X → Y : frontdoor & open
X ← u → Z ← v → Y : backdoor & closed
(2) Find backdoors
(3) Close backdoors

Slide 62

Slide 62 text

What happens if you stratify by Z? Opens the backdoor path
Z could be a pre-treatment variable
Not safe to always control pre-treatment measurements
[Labels: health person 1, health person 2, hobbies person 1, hobbies person 2, friends]
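A quick simulation sketch of this warning (coefficients are my own assumptions, not lecture code): Z is a pre-treatment collider of two unobserved variables u and v, so the unadjusted regression is fine, but stratifying by Z manufactures a spurious association between X and Y.

# hypothetical sketch: stratifying by the pre-treatment collider Z opens u -> Z <- v
set.seed(4)
N <- 1000
u <- rnorm(N)
v <- rnorm(N)
Z <- rnorm(N, u + v)              # pre-treatment collider of u and v
X <- rnorm(N, u)
Y <- rnorm(N, 0*X + v)            # true effect of X on Y set to zero
coef( lm(Y ~ X) )['X']            # ~0 : correct
coef( lm(Y ~ X + Z) )['X']        # biased away from zero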

Slide 63

Slide 63 text

X Y Z u

Slide 64

Slide 64 text

X (win lottery), Y (lifespan), Z (happiness), u (contextual confounds)

Slide 65

Slide 65 text

X Y Z u X → Z → Y X → Z ← u → Y No backdoor, no need to control for Z

Slide 66

Slide 66 text

f <- function(n=100,bXZ=1,bZY=1) {
  X <- rnorm(n)
  u <- rnorm(n)
  Z <- rnorm(n, bXZ*X + u)
  Y <- rnorm(n, bZY*Z + u )
  bX <- coef( lm(Y ~ X) )['X']
  bXZ <- coef( lm(Y ~ X + Z) )['X']
  return( c(bX,bXZ) )
}
sim <- mcreplicate( 1e4 , f() , mc.cores=8 )
dens( sim[1,] , lwd=3 , xlab="posterior mean" )
dens( sim[2,] , lwd=3 , col=2 , add=TRUE )

Slide 67

Slide 67 text

f <- function(n=100,bXZ=1,bZY=1) {
  X <- rnorm(n)
  u <- rnorm(n)
  Z <- rnorm(n, bXZ*X + u)
  Y <- rnorm(n, bZY*Z + u )
  bX <- coef( lm(Y ~ X) )['X']
  bXZ <- coef( lm(Y ~ X + Z) )['X']
  return( c(bX,bXZ) )
}
sim <- mcreplicate( 1e4 , f() , mc.cores=8 )
dens( sim[1,] , lwd=3 , xlab="posterior mean" )
dens( sim[2,] , lwd=3 , col=2 , add=TRUE )
[Plot: sampling distributions of the X coefficient; Y ~ X correct, Y ~ X + Z wrong]

Slide 68

Slide 68 text

Change bZY to zero
f <- function(n=100,bXZ=1,bZY=1) {
  X <- rnorm(n)
  u <- rnorm(n)
  Z <- rnorm(n, bXZ*X + u)
  Y <- rnorm(n, bZY*Z + u )
  bX <- coef( lm(Y ~ X) )['X']
  bXZ <- coef( lm(Y ~ X + Z) )['X']
  return( c(bX,bXZ) )
}
sim <- mcreplicate( 1e4 , f(bZY=0) , mc.cores=8 )
dens( sim[1,] , lwd=3 , xlab="posterior mean" )
dens( sim[2,] , lwd=3 , col=2 , add=TRUE )
[Plot: Y ~ X correct, Y ~ X + Z wrong]

Slide 69

Slide 69 text

X → Z → Y
X → Z ← u → Y
No backdoor, no need to control for Z
Controlling for Z biases treatment estimate X
Controlling for Z opens biasing path through u
Can estimate effect of X; cannot estimate mediation effect Z
[Labels: win lottery (X), happiness (Z), lifespan (Y)]

Slide 70

Slide 70 text

Montgomery et al. 2018, How Conditioning on Posttreatment Variables Can Ruin Your Experiment
Regression with confounds / Regression with post-treatment variables
[Labels: win lottery (X), happiness (Z), lifespan (Y)]

Slide 71

Slide 71 text

X Y Z Do not touch the collider!

Slide 72

Slide 72 text

X Y Z u Colliders not always so obvious

Slide 73

Slide 73 text

X Y Z u education values income family

Slide 74

Slide 74 text

X Y Z Case-control bias (selection on outcome)

Slide 75

Slide 75 text

X Y Z Education Occupation Income Case-control bias (selection on outcome)

Slide 76

Slide 76 text

Case-control bias (selection on outcome)
f <- function(n=100,bXY=1,bYZ=1) {
  X <- rnorm(n)
  Y <- rnorm(n, bXY*X )
  Z <- rnorm(n, bYZ*Y )
  bX <- coef( lm(Y ~ X) )['X']
  bXZ <- coef( lm(Y ~ X + Z) )['X']
  return( c(bX,bXZ) )
}
sim <- mcreplicate( 1e4 , f() , mc.cores=8 )
dens( sim[1,] , lwd=3 , xlab="posterior mean" )
dens( sim[2,] , lwd=3 , col=2 , add=TRUE )
[Plot: Y ~ X correct, Y ~ X + Z wrong]

Slide 77

Slide 77 text

X Y Z “Precision parasite” No backdoors But still not good to condition on Z

Slide 78

Slide 78 text

“Precision parasite”
f <- function(n=100,bZX=1,bXY=1) {
  Z <- rnorm(n)
  X <- rnorm(n, bZX*Z )
  Y <- rnorm(n, bXY*X )
  bX <- coef( lm(Y ~ X) )['X']
  bXZ <- coef( lm(Y ~ X + Z) )['X']
  return( c(bX,bXZ) )
}
sim <- mcreplicate( 1e4 , f(n=50) , mc.cores=8 )
dens( sim[1,] , lwd=3 , xlab="posterior mean" )
dens( sim[2,] , lwd=3 , col=2 , add=TRUE )
[Plot: Y ~ X correct, Y ~ X + Z wrong]

Slide 79

Slide 79 text

“Bias amplification”
X and Y confounded by u
Something truly awful happens when we add Z

Slide 80

Slide 80 text

f <- function(n=100,bZX=1,bXY=1) {
  Z <- rnorm(n)
  u <- rnorm(n)
  X <- rnorm(n, bZX*Z + u )
  Y <- rnorm(n, bXY*X + u )
  bX <- coef( lm(Y ~ X) )['X']
  bXZ <- coef( lm(Y ~ X + Z) )['X']
  return( c(bX,bXZ) )
}
sim <- mcreplicate( 1e4 , f(bXY=0) , mc.cores=8 )
dens( sim[1,] , lwd=3 , xlab="posterior mean" )
dens( sim[2,] , lwd=3 , col=2 , add=TRUE )
[Plot: Y ~ X biased, Y ~ X + Z more bias; true value is zero]

Slide 81

Slide 81 text

[Plot: Y ~ X biased, Y ~ X + Z more bias; true value is zero]
WHY? Covariation of X & Y requires variation in their causes
Within each level of Z, less variation in X
Confound u relatively more important within each Z

Slide 82

Slide 82 text

n <- 1000
Z <- rbern(n)
u <- rnorm(n)
X <- rnorm(n, 7*Z + u )
Y <- rnorm(n, 0*X + u )
[Plot: Y against X, points grouped by Z = 0 and Z = 1]

Slide 83

Slide 83 text

X Y Z u education occupation income regional/cultural factors

Slide 84

Slide 84 text

Good & Bad Controls “Control” variable: Variable introduced to an analysis so that a causal estimate is possible Heuristics fail — adding control variables can be worse than omitting Make assumptions explicit MODEL ALL THE THINGS

Slide 85

Slide 85 text

Course Schedule
Week 1: Bayesian inference (Chapters 1, 2, 3)
Week 2: Linear models & Causal Inference (Chapter 4)
Week 3: Causes, Confounds & Colliders (Chapters 5 & 6)
Week 4: Overfitting / MCMC (Chapters 7, 8, 9)
Week 5: Generalized Linear Models (Chapters 10, 11)
Week 6: Integers & Other Monsters (Chapters 11 & 12)
Week 7: Multilevel models I (Chapter 13)
Week 8: Multilevel models II (Chapter 14)
Week 9: Measurement & Missingness (Chapter 15)
Week 10: Generalized Linear Madness (Chapter 16)
https://github.com/rmcelreath/stat_rethinking_2023

Slide 86

Slide 86 text

No content

Slide 87

Slide 87 text

BONUS

Slide 88

Slide 88 text

[Screenshot of a published regression table, reconstructed below; column headers in the original: "Finals blind", "Preliminaries blind"]
TABLE 2: ESTIMATED PROBIT MODELS FOR THE USE OF A SCREEN
                                              (1)       (2)       (3)
(Proportion female)_{t-1}                    2.744     3.120     0.490
                                            (3.265)   (3.271)   (1.163)
                                            [0.006]   [0.004]   [0.011]
(Proportion of orchestra personnel          -26.46    -28.13    -9.467
  with <6 years tenure)_{t-1}               (7.314)   (8.459)   (2.787)
                                            [-0.058]  [-0.039]  [-0.207]
"Big Five" orchestra                                   0.367
                                                      (0.452)
                                                      [0.001]
pseudo R2                                    0.178     0.193     0.050
Number of observations                        294       294       434

Slide 89

Slide 89 text

Table 2 Fallacy
Not all coefficients are causal effects
Statistical model designed to identify X → Y will not also identify effects of control variables
Table 2 is dangerous
Westreich & Greenland 2013, The Table 2 Fallacy
[Screenshot of the same published probit table as the previous slide. Table notes: the dependent variable is 1 if the orchestra adopts a screen, 0 otherwise; Huber standard errors (with orchestra random effects) in parentheses; all specifications include a constant; changes in probabilities in brackets.]

Slide 90

Slide 90 text

A (age), X (HIV), S (smoking), Y (stroke)
Westreich & Greenland 2013, The Table 2 Fallacy

Slide 91

Slide 91 text

No content

Slide 92

Slide 92 text

Use Backdoor Criterion A X Y S

Slide 93

Slide 93 text

Use Backdoor Criterion A X Y S X Y

Slide 94

Slide 94 text

Use Backdoor Criterion A X Y S X Y X Y S

Slide 95

Slide 95 text

Use Backdoor Criterion A X Y S X Y X Y S A X Y

Slide 96

Slide 96 text

Use Backdoor Criterion A X Y S X Y X Y S A X Y A X Y S

Slide 97

Slide 97 text

Use Backdoor Criterion A X Y S X Y X Y S A X Y A X Y S

Slide 98

Slide 98 text

Y_i ∼ Normal(μ_i, σ)
μ_i = α + β_X X_i + β_S S_i + β_A A_i
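A concrete sketch of fitting this model (not lecture code; the simulated data just follow my reading of the lecture DAG, with standardized variables and arbitrary coefficients):

# hypothetical data consistent with the DAG: A -> S, A -> X, A -> Y, S -> X, S -> Y, X -> Y
library(rethinking)
set.seed(5)
N <- 500
A <- rnorm(N)
S <- rnorm(N, 0.5*A)
X <- rnorm(N, 0.5*A + 0.5*S)
Y <- rnorm(N, 0.3*X + 0.3*S + 0.3*A)
dat <- list( Y=Y , X=X , S=S , A=A )

m_t2 <- quap(
  alist(
    Y ~ dnorm( mu , sigma ),
    mu <- a + bX*X + bS*S + bA*A,
    a ~ dnorm( 0 , 1 ),
    c(bX,bS,bA) ~ dnorm( 0 , 1 ),
    sigma ~ dexp( 1 )
  ), data=dat )
precis(m_t2)   # bX: effect of X on Y; bS and bA: direct effects only (see the next slides)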

Slide 99

Slide 99 text

Unconditional X: confounded by A and S

Slide 100

Slide 100 text

Coefficient for X: Effect of X on Y (still must marginalize!)
Unconditional: confounded by A and S
Conditional on A and S

Slide 101

Slide 101 text

Unconditional S: effect of S confounded by A

Slide 102

Slide 102 text

Coefficient for S: Direct effect of S on Y
Unconditional: effect of S confounded by A
Conditional on A and X

Slide 103

Slide 103 text

Unconditional A: total causal effect of A on Y flows through all paths

Slide 104

Slide 104 text

Coefficient for A: Direct effect of A on Y
Unconditional: total causal effect of A on Y flows through all paths
Conditional on X and S

Slide 105

Slide 105 text

A X Y S Stroke HIV Smoking Age u unobserved confound

Slide 106

Slide 106 text

Table 2 Fallacy
Not all coefficients created equal, so do not present them as equal
Options: do not present control coefficients; give explicit interpretation of each
No interpretation without causal representation

Slide 107

Slide 107 text

No content