
Causal Inference 2022 Week 5

Will Lowe
March 10, 2022

  1. PLAN
     Last week: adjustment
     Matching (rather than adjustment)
     Space, the final frontier
     Support problems
     Some poetry about matching
     The propensity score
     Weighting (rather than adjustment or selection)
     Belt and suspenders approaches
  2. RECAP: LAST WEEK
     We need to block the 'backdoor paths' between X and Y to identify the causal effect of X on Y.
     [DAG: X → Y with confounders G (near X) and H (near Y), plus F and disturbances єX, єY]
     → Nearer Y, e.g. control for things like H
     → Nearer X, e.g. control for things like G
     Adjust existing observations
     X̃i = Xi − E[Xi | Gi, Hi, ...]
     Ỹi = Yi − E[Yi | Gi, Hi, ...]
     by subtracting out variation due to confounders (a sketch of this step follows below)
     → This is what regression models do
     We want good models of Y(0) and Y(1), so regression works best for variables nearer Y
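To make the residualization step concrete, here is a minimal sketch in Python. The data-generating process (confounder strength, effect size of 2.0) is invented for illustration; the mechanics are the standard Frisch-Waugh-Lovell partialling-out that multiple regression performs internally.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Invented data: G confounds X and Y; the true effect of X on Y is 2.0.
N = 5000
G = rng.normal(size=N)
X = 0.8 * G + rng.normal(size=N)
Y = 2.0 * X + 1.5 * G + rng.normal(size=N)

# Residualize X and Y on the confounder: X-tilde and Y-tilde.
Gm = G.reshape(-1, 1)
X_tilde = X - LinearRegression().fit(Gm, X).predict(Gm)
Y_tilde = Y - LinearRegression().fit(Gm, Y).predict(Gm)

# Regressing residual on residual recovers the causal effect,
# exactly as multiple regression would (Frisch-Waugh-Lovell).
effect = (X_tilde * Y_tilde).sum() / (X_tilde ** 2).sum()
print(f"estimated effect: {effect:.2f}  (truth: 2.0)")
```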
  3. THE PURSUIT OF BALANCE
     [Two DAGs over X, G, Y with disturbances єX, єY: one observational (G affects X), one experimental (no arrow into X)]
     What hath randomization wrought?
     → The distribution of confounders (and everything else) is approximately balanced across values of X
     In an observational study they are probably not.
     Could we select (rather than adjust) the observations to be just like observations we would get from an experiment?
     → Why, yes. Yes we can.
  4. MATCHING
     [DAG: G → X → Y with G → Y, matching indicator M, disturbances єX, єY]
     G is gender and X is treatment, and treatment is more common for one gender: P(X = 1 | G = m) ≠ P(X = 1 | G = f). By Bayes' rule,
     P(G = m | X = 1) = P(X = 1 | G = m) P(G = m) /
                        [P(X = 1 | G = m) P(G = m) + P(X = 1 | G = f) P(G = f)]
     so gender is imbalanced across treatment groups.
     To estimate the ATT (sketched below):
     → Set Mi = 0 for all observations
     → For each case where Xi = 1, find an Xi = 0 case with the same value of G
     → Set Mi = 1 for each such pair
     This new variable M = f(G, X) is a collider. But let's condition on it (by subsetting). Now X and G are still dependent overall, but X ⊥⊥ G | M = 1.
     So the matched subset is as if there were no G → X, e.g. in an experiment.
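A minimal sketch of this recipe in Python. The gender shares, treatment probabilities, and effect size are invented for illustration, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented example: binary confounder G (gender), treatment X more
# likely when G = 1, outcome Y affected by both. The true ATT is 2.0.
N = 1000
G = rng.binomial(1, 0.5, N)
X = rng.binomial(1, np.where(G == 1, 0.7, 0.3))
Y = 2.0 * X + 1.5 * G + rng.normal(0, 1, N)

# One-to-one exact matching on G for the ATT.
M = np.zeros(N, dtype=bool)      # M marks matched observations
used = np.zeros(N, dtype=bool)   # controls already taken by a pair
for i in np.flatnonzero(X == 1):
    candidates = np.flatnonzero((X == 0) & (G == G[i]) & ~used)
    if candidates.size:          # no candidate -> no support, drop the unit
        j = candidates[0]
        used[j] = True
        M[i] = M[j] = True

# Conditioning on M = 1 (subsetting): X is independent of G in the
# matched data, so a simple difference of means estimates the ATT.
att = Y[M & (X == 1)].mean() - Y[M & (X == 0)].mean()
print(f"ATT estimate: {att:.2f}  (truth: 2.0)")
```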
  5. MATCHING DETAILS
     We described one-to-one exact matching for the ATT
     → For the ATC, match from control cases
     → The ATE is a harder search problem
     Balance is an empirical property, so it can be checked
     → Many rounds of balance checking
     → Unlike adding covariates to a regression, it's harder to 'overfit' here
     Successful matching approximates a randomized experiment
     → No precision gains from control (King & Nielsen, 2019)
     Standard errors are tricky (Abadie & Imbens, 2008, 2016)
     Exact matches are hard to find
     → The more covariates there are, the harder it gets, because high-dimensional spaces are weird and empty
  6. EMPTY SPACE
     Every new covariate adds a dimension to the confounder space. Covariate value combinations increase faster than you collect data (see the cell-count sketch below).
     → Support disappears rapidly
     → Interpolation gets more like extrapolation
     This is the curse of dimensionality (Bellman, 1957). Solutions:
     → Work in areas of common support
     → Assume functional form, e.g. linearity
     [Figure: covariate spaces of increasing dimension: x1 (D = 1); x1, x2 (D = 2); x1, x2, x3 (D = 3)]
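A quick way to see the support problem. The sample size and the number of bins per covariate are arbitrary choices for illustration:

```python
# With k bins per covariate there are k**D covariate-value cells, so the
# expected number of observations per cell collapses as D grows.
N, k = 10_000, 5
for D in (1, 2, 5, 10):
    cells = k ** D
    print(f"D={D:2d}  cells={cells:>10,}  expected obs per cell={N / cells:12.6f}")
```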
  7. WEIRD SPACE
     Consider D normally distributed covariates for matching. For convenience:
     → zero mean
     → uncorrelated (and therefore independent)
     What sorts of covariate profiles do we expect to see in a random sample? Intuition says:
     → Lots of observations in the 'middle'
     But it turns out this depends on D. The probability density always peaks in the middle, just as for D = 1, but it's spread out across a space of increasingly large D, so the mass ends up far from the middle: the typical distance from the origin grows with D (simulated below)
     → c.f. 'concentration of measure'
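A small simulation of this point (dimensions and sample size chosen arbitrarily): distances of standard normal draws from the origin concentrate near √D, so the 'middle' empties out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Distance from the origin of D independent standard normal covariates.
# The density always peaks at the origin, but the probability *mass*
# sits at distance about sqrt(D): concentration of measure.
for D in (1, 2, 10, 100):
    Z = rng.standard_normal((10_000, D))
    dist = np.linalg.norm(Z, axis=1)
    near = (dist < 1).mean()   # share of draws within distance 1 of the origin
    print(f"D={D:3d}  mean distance={dist.mean():6.2f}  "
          f"sqrt(D)={D ** 0.5:6.2f}  near the middle={near:.3f}")
```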
  8. 8

  9. THE IMPORTANCE OF SUPPORT
     What you can't buy with data you buy with assumptions (futures trading comes with risks; see your statistician for details).
     Matching sometimes changes the subject, e.g. from the ATT to "the ATT I can get empirical support for".
     Upside:
     → Robustness to modeling choices
     Downside:
     → No match, no support for the counterfactual
  10. RELAXING THE SEARCH CRITERIA
     Nearest neighbour matching (Ho et al., 2007)
     → Match to cases with nearby ages
     → 'Calipers' restrict maximum match distances
     Coarsened exact matching (Iacus et al., 2012)
     → Match exactly, but on age ranges rather than exact ages (sketched below)
     Entropy matching (Hainmueller, 2012)
     → Exactly match only particular moments, e.g. just mean and variance
     Propensity score matching (King & Nielsen, 2019; Rosenbaum & Rubin, 1983)
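A minimal sketch of the coarsening idea behind CEM, using a hypothetical age covariate and an invented band width; real implementations also weight the retained strata rather than just pruning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: treatment probability rises with age.
age = rng.integers(18, 80, 500)
X = rng.binomial(1, np.clip(age / 100, 0.1, 0.9))

# Coarsen age into bands, then require exact matches on the band only.
band = (age // 10) * 10   # 20-29 -> 20, 30-39 -> 30, ... (width is arbitrary)

# Keep only bands containing both treated and control units:
# strata without both have no within-stratum counterfactual.
ok = [b for b in np.unique(band)
      if (X[band == b] == 1).any() and (X[band == b] == 0).any()]
keep = np.isin(band, ok)
print(f"retained {keep.sum()} of {len(X)} units in {len(ok)} matched bands")
```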
  11. SOLUTION (TO CONFOUNDING)
     Nach dem Aufstand des 17. Juni
     Ließ der Sekretär des Schriftstellerverbands
     In der Stalinallee Flugblätter verteilen
     Auf denen zu lesen war, daß das Volk
     Das Vertrauen der Regierung verscherzt habe
     Und es nur durch verdoppelte Arbeit
     Zurückerobern könne. Wäre es da
     Nicht doch einfacher, die Regierung
     Löste das Volk auf und
     Wählte ein anderes?
     Bertolt Brecht (1953)

     After the uprising of the 17th of June
     The Secretary of the Writers' Union
     Had leaflets distributed on the Stalinallee
     Stating that the people
     Had forfeited the confidence of the government
     And could only win it back
     By redoubled efforts. Would it not be simpler
     for the government
     To dissolve the people
     And elect another?
  12. PROPENSITY SCORES
     [Two DAGs over X, G, H, Y with disturbances єX, єY: observational and experimental]
     Many confounders G and H? Could we find something lower dimensional to condition on?
     The only variation in G and H that matters is variation that moves X; specifically, variation that affects the propensity score
     ei = E[Xi | Gi, Hi] = P(Xi = 1 | Gi, Hi)
     which is a single number.
  13. PROPENSITY SCORES
     [DAG: X, G, H, Y with єX, єY; e sits on the backdoor paths]
     The propensity score e is on all backdoor paths between X and Y. Conditioning on e is equivalent to
     → conditioning on G and H with regression or stratification
     → balancing G and H with matching (Ho et al., 2007)
     The propensity score is the (coarsest possible) balancing score, meaning X ⊥⊥ G, H | e (but we read that already from the graph).
     What's the catch? We don't actually observe it. Instead, estimate it with a logistic regression of X on G and H (a sketch follows below)
     → It's harder to overfit this model
     → Using êi can be more efficient than using ei itself (Hirano et al., 2003)
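A minimal sketch of estimating ê by logistic regression. The confounders and coefficients are invented; the large C setting just switches off sklearn's default L2 penalty so this is a plain logistic fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Invented confounders G, H driving treatment X.
N = 2000
G = rng.normal(size=N)
H = rng.normal(size=N)
e_true = 1 / (1 + np.exp(-(0.8 * G - 0.5 * H)))
X = rng.binomial(1, e_true)

# e-hat = P(X = 1 | G, H) from a logistic regression of X on G and H.
GH = np.column_stack([G, H])
fit = LogisticRegression(C=1e6).fit(GH, X)   # large C: effectively unpenalized
e_hat = fit.predict_proba(GH)[:, 1]
print(f"correlation of e-hat with the true e: "
      f"{np.corrcoef(e_hat, e_true)[0, 1]:.3f}")
```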
  14. USES FOR THE PROPENSITY SCORE
     As a covariate: treat it as a low-dimensional backdoor-path-closing covariate
     → Could be inefficient if …
     As a matching variable: to estimate the ATT (sketched below)
     → Set Mi = 0 for all observations
     → For each case i where Xi = 1, find a case j where Xj = 0 with the same (or similar) value of e
     → Set Mi = 1 for each such pair
     As a weight: see the next slides
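A sketch of matching on the score, with replacement and no caliper. The helper name ps_match_att is made up for this example, the data are invented, and e is treated as known for brevity.

```python
import numpy as np

def ps_match_att(Y, X, e):
    """ATT by one-nearest-neighbour matching on the propensity score,
    with replacement (no caliper, so every treated unit gets a match)."""
    treated = np.flatnonzero(X == 1)
    controls = np.flatnonzero(X == 0)
    # index of the control whose score is closest to each treated unit's
    nearest = controls[np.abs(e[treated][:, None] - e[controls]).argmin(axis=1)]
    return (Y[treated] - Y[nearest]).mean()

rng = np.random.default_rng(0)
e = rng.uniform(0.2, 0.8, 2000)                   # known scores, for brevity
X = rng.binomial(1, e)
Y = 1.0 * X + 3.0 * e + rng.normal(0, 1, 2000)    # e stands in for the confounders
print(f"ATT estimate: {ps_match_att(Y, X, e):.2f}  (truth: 1.0)")
```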
  15. AS A WEIGHT
     [DAG: X, G, Y with єX, єY]
     P(Y(x)) = Σg P(Y | X = x, G = g) P(G = g)
     From which the ATE ∆ = E[Y(1) − Y(0)] is
     ∆ = Σg ∆g P(G = g)
     where the causal effect for subgroup g is
     ∆g = E[Y | X = 1, G = g] − E[Y | X = 0, G = g]
     With a little rearrangement
     P(Y(x)) = Σg P(Y | X = x, G = g) P(G = g) = Σg P(Y, X = x, G = g) / P(X = x | G = g)
     so the two probabilities we need for ∆ are
     P(Y(1)) = Σg P(Y, X = 1, G = g) / e
     P(Y(0)) = Σg P(Y, X = 0, G = g) / (1 − e)
     and our observed data is just (Xi, Yi, Gi) ∼ P(X, Y, G), 1 ≤ i ≤ N, so we can get ∆ by weighting our observations.
  16. PROPENSITY SCORE WEIGHTING
     [DAG: X, G, Y with єX, єY]
     G is gender and X is treatment, as before, so by Bayes' rule P(G = m | X = 1) ≠ P(G = m): gender is imbalanced across treatment groups.
     Example: N cases, some in treatment, the rest in control.
     The IPW estimate of ∆ = E[Y(1)] − E[Y(0)] is
     ∆̂ = (1/N) Σi Xi Yi / ei − (1/N) Σi (1 − Xi) Yi / (1 − ei)
     If there were no imbalance then everyone would get the same weight e:
     ∆̂ = (1/N) Σi Xi Yi / e − (1/N) Σi (1 − Xi) Yi / (1 − e)
     Look closely, and you'll see this is an ordinary difference of means: with e equal to the share treated, each weighted sum is just that group's mean. (A simulation follows below.)
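A simulation of the weighting estimator, with all numbers invented: using the (here known) propensity score, the IPW average recovers the ATE where the raw difference of means does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: G confounds X and Y; the true ATE is 1.0.
N = 100_000
G = rng.binomial(1, 0.5, N)
e = np.where(G == 1, 0.7, 0.3)     # propensity score, known here
X = rng.binomial(1, e)
Y = 1.0 * X + 2.0 * G + rng.normal(0, 1, N)

# IPW: (1/N) sum X_i Y_i / e_i  -  (1/N) sum (1 - X_i) Y_i / (1 - e_i)
ate_ipw = (X * Y / e).mean() - ((1 - X) * Y / (1 - e)).mean()
naive = Y[X == 1].mean() - Y[X == 0].mean()
print(f"naive difference of means: {naive:.2f}   "
      f"IPW: {ate_ipw:.2f}  (truth: 1.0)")
```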
  17. PROPENSITY SCORE WEIGHTING
     [DAG: X, G, Y with єX, єY]
     G is gender and X is treatment, as before. But there is imbalance, so
     ∆̂ = (1/N) Σi Xi Yi / ei − (1/N) Σi (1 − Xi) Yi / (1 − ei)
     is not a difference in means. Consider just the treatment group average, summing over its men and then its women:
     Ȳ(1) = (1/N) Σ{treated men} Yi / em + (1/N) Σ{treated women} Yi / ef
     where em and ef are the gender-specific propensity scores. Equivalently, this is a weighted average of the male- and female-specific Y averages (Horvitz & Thompson, 1952). (A numerical check follows below.)
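A numerical check of that equivalence, with invented group sizes and scores: the Horvitz-Thompson sum over treated units equals the reweighted combination of the gender-specific means, exactly, by rearrangement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: gender-specific propensity scores e_m and e_f.
N, e_m, e_f = 10_000, 0.7, 0.3
male = rng.binomial(1, 0.4, N).astype(bool)
e = np.where(male, e_m, e_f)
X = rng.binomial(1, e)
Y = rng.normal(1.0, 1.0, N) + 0.5 * male

t = X == 1
ht = (Y[t] / e[t]).sum() / N          # Horvitz-Thompson sum over the treated
m, f = t & male, t & ~male            # treated men, treated women
combo = (Y[m].mean() * m.sum() / (e_m * N)
         + Y[f].mean() * f.sum() / (e_f * N))
print(np.isclose(ht, combo))          # True: identical by rearrangement
```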
  18. APPLY ALL THE THINGS
     ¿Por qué no los dos? (Why not both?)
     We use all the data!
     → Standard errors are easier (hint: when in doubt, bootstrap)
     Complications: propensity scores can be zero
     → No support
     → No evidence that the counterfactual exists
     Propensity scores can be extreme
     → Unstable weights and high-variance estimation
     → Lots of stabilization schemes available (one is sketched below)
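One such scheme, as a hedged sketch: the helper name, the clip thresholds, and the use of the marginal treatment share are common choices for illustration, not the only options.

```python
import numpy as np

def stabilized_weights(X, e, clip=(0.01, 0.99)):
    """Trim extreme propensity scores, then scale by the marginal
    treatment share so the weights average to roughly one."""
    e = np.clip(e, *clip)              # guard against e near 0 or 1
    p = X.mean()                       # marginal P(X = 1)
    return np.where(X == 1, p / e, (1 - p) / (1 - e))
```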
  19. WHY NOT BOTH?
     [DAG: X, G, H, Y with єX, єY]
     Regression can
     → predict Y better (so also the potential outcomes)
     → add precision by conditioning on non-confounding causes of Y
     Propensity scores can
     → deal with confounding nearer X
     So weight the regression. The weights for the ATE are
     → X = 1 weight: 1/e
     → X = 0 weight: 1/(1 − e)
     and for the ATT
     → X = 1 weight: 1
     → X = 0 weight: e/(1 − e)
     When the method is double robust, only one model need be correct (Kang & Schafer, 2007). (A sketch follows below.)
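A sketch of one double-robust construction, the AIPW estimator; the models and data are invented, and Kang & Schafer (2007) compare several such strategies.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Invented data with one confounder G; the true ATE is 1.0.
N = 20_000
G = rng.normal(size=N)
X = rng.binomial(1, 1 / (1 + np.exp(-G)))
Y = 1.0 * X + G + rng.normal(0, 1, N)

Gm = G.reshape(-1, 1)
# Model 1: the propensity score. Model 2: the two outcome regressions.
e = LogisticRegression(C=1e6).fit(Gm, X).predict_proba(Gm)[:, 1]
mu1 = LinearRegression().fit(Gm[X == 1], Y[X == 1]).predict(Gm)
mu0 = LinearRegression().fit(Gm[X == 0], Y[X == 0]).predict(Gm)

# AIPW: consistent if either the propensity model or the outcome
# model is correctly specified (here both happen to be).
psi1 = mu1 + X * (Y - mu1) / e
psi0 = mu0 + (1 - X) * (Y - mu0) / (1 - e)
print(f"double-robust ATE estimate: {(psi1 - psi0).mean():.2f}  (truth: 1.0)")
```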
  20. PLAN
     Last week: adjustment
     Matching (rather than adjustment)
     Space, the final frontier
     Support problems
     Some poetry about matching
     The propensity score
     Weighting (rather than adjustment or selection)
     Belt and suspenders approaches
  21. REFERENCES
     Abadie, A., & Imbens, G. W. (2008). On the failure of the bootstrap for matching estimators. Econometrica, 76(6), 1537–1557.
     Abadie, A., & Imbens, G. W. (2016). Matching on the estimated propensity score. Econometrica, 84(2), 781–807.
     Bellman, R. (1957). Dynamic programming. Princeton University Press.
     Hainmueller, J. (2012). Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1), 25–46.
     Hirano, K., Imbens, G. W., & Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71(4), 1161–1189.
     Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15(3), 199–236.
     Horvitz, D. G., & Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260), 663–685.
     Iacus, S. M., King, G., & Porro, G. (2012). Causal inference without balance checking: Coarsened exact matching. Political Analysis, 20(1), 1–24.
  22. REFERENCES
     Kang, J. D. Y., & Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22(4), 523–539.
     King, G., & Nielsen, R. (2019). Why propensity scores should not be used for matching. Political Analysis, 27(4), 435–454.
     Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.