Causal Inference 2022 Week 11

CAUSAL INFERENCE Sensitivity analysis 2022-05-02

SENSITIVITY ANALYSIS 1
→ Collider bias: lol, nope! (but see Ding & Miratrix, 2015)
→ Mediator-outcome confounding: previous weeks
→ Exclusion restriction violations? (Conley et al., 2012; van Kippersluis & Rietveld, 2018)
→ Confounding...
The state of the art:
→ Omitted variable bias in coefficients
→ Omitted variable bias in R²
→ Plots, plots, plots

EXAMPLE 2
The effect of harm from Janjaweed attacks on preferences
→ Treatment: directly harmed (D)
→ Outcome: peace index (Y)
→ Covariates: age, occupation, past voting record, household size, gender, village fixed effects → summarised as X

Model 1 (standard errors in parentheses)
(intercept)        1.082 (0.315) ***
directly harmed    0.097 (0.023) ***
age               −0.002 (0.001) *
occup: farmer     −0.040 (0.029)
occup: herder      0.014 (0.032)
pastvoted         −0.048 (0.024) *
hhsize             0.001 (0.002)
female            −0.232 (0.024) ***
fixed effects      village
R²                 0.512
*** p < 0.001; ** p < 0.01; * p < 0.05
Source: Cinelli and Hazlett (2020)

UNMEASURED CONFOUNDING 3
[Figure: causal graph with nodes D, X, Y, Z and error terms ε_D, ε_Y]
Possible Zs:
→ village centrality
→ asset types
→ village accessibility
Frisch–Waugh–Lovell: construct the residuals from each sub-model
$r(D) = D - \hat D \rightarrow \epsilon_D$
$r(Y) = Y - \hat Y \rightarrow \epsilon_Y$
and fit the following linear model
$r(Y) = r(D)\,\beta_D + \epsilon$
Problem:
$\hat D = E[D \mid X] \neq E[D \mid X, Z]$
$\hat Y = E[Y \mid X] \neq E[Y \mid X, Z]$
So this won't work.
→ This is omitted variable bias

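OMITTED VARIABLES 4
In Cinelli and Hazlett's notation, what we want to estimate is
$Y = D\hat\tau + X\hat\beta + Z\hat\gamma + \hat\epsilon$
but what we actually estimate is
$Y = D\hat\tau_{\text{res}} + X\hat\beta_{\text{res}} + \hat\epsilon_{\text{res}}$
so the difference in estimates is
$\text{bias} = \hat\tau_{\text{res}} - \hat\tau$
What is $\hat\tau_{\text{res}}$ as a function of $\hat\tau$?
For any two variables D and Y:
$\tau_D = \dfrac{\operatorname{cov}(D, Y)}{\operatorname{var}(D)}$
Now, controlling for X:
$\tau_D = \dfrac{\operatorname{cov}(r(D \mid X),\, r(Y \mid X))}{\operatorname{var}(r(D \mid X))}$
where we've just replaced D and Y by their residualized versions.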

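OMITTED VARIABLES FORMULA 5
Then
$$
\begin{aligned}
\hat\tau_{\text{res}} &= \frac{\operatorname{cov}(r(D\mid X),\, r(Y\mid X))}{\operatorname{var}(r(D\mid X))} \\
&= \frac{\operatorname{cov}(r(D\mid X),\, \hat\tau\, r(D\mid X) + \hat\gamma\, r(Z\mid X))}{\operatorname{var}(r(D\mid X))} \\
&= \hat\tau + \frac{\operatorname{cov}(r(D\mid X),\, r(Z\mid X))}{\operatorname{var}(r(D\mid X))}\,\hat\gamma \\
&= \hat\tau + \hat\delta\,\hat\gamma
\end{aligned}
$$
The second step substitutes $r(Y\mid X) = \hat\tau\, r(D\mid X) + \hat\gamma\, r(Z\mid X) + \hat\epsilon$. The residual $\hat\epsilon$ is uncorrelated with $r(D\mid X)$, so it drops out. Here $\hat\delta$ is the coefficient on D from fitting the linear model $Z = \beta_0 + D\beta_D + X\beta_X + \epsilon_Z$.
This may be surprising: the relationship Z → D need not be linear.
[Figure: causal graph of D, X, Y, Z, ε_D, ε_Y, with τ, δ and γ labelling the D–Y, Z–D and Z–Y relationships]
→ δ is a measure of 'imbalance'
→ γ is a measure of the (not necessarily causal) 'impact' of Z on Y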
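The identity $\hat\tau_{\text{res}} = \hat\tau + \hat\delta\hat\gamma$ is an in-sample algebraic property of least squares, so it can be checked numerically on any data set. The minimal sketch below uses simulated data with one observed covariate and a deliberately nonlinear Z → D relationship. The data-generating process and names are hypothetical, and `ols` is a small hand-rolled solver, not a library call.

```python
import random


def ols(y, cols):
    """OLS coefficients of y on [1, *cols], solved via the normal equations."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] for i in range(k)]


rng = random.Random(3)
n = 500
X = [rng.gauss(0, 1) for _ in range(n)]
Z = [rng.gauss(0, 1) for _ in range(n)]
# Z affects D nonlinearly; the identity below still holds exactly
D = [0.5 * x + z + z ** 2 + rng.gauss(0, 1) for x, z in zip(X, Z)]
Y = [2.0 * d + x - 1.5 * z + rng.gauss(0, 1) for d, x, z in zip(D, X, Z)]

_, tau, _, gamma = ols(Y, [D, X, Z])  # long regression: tau-hat and gamma-hat
tau_res = ols(Y, [D, X])[1]           # short regression with Z omitted
delta = ols(Z, [D, X])[1]             # linear imbalance of Z across D, given X
print(f"tau_res = {tau_res:.6f}   tau + delta * gamma = {tau + delta * gamma:.6f}")
```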

BIAS IS IMBALANCE TIMES IMPACT 6
The only way in which Z's relationship to D enters the bias is captured by its 'linear imbalance', parameterized by δ̂. In other words, the linear regression of Z on D and X need not reflect the correct expected value of Z; rather, it serves to capture the aspects of the relationship between Z and D that affect the bias. (Cinelli & Hazlett, 2020)
This is why we might match or weight, rather than try to model Z's effect on Y.
For binary Z it's reasonably straightforward to plot the two components.
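Suppose Z and D are both binary and there are no other covariates (a simplifying assumption of this sketch). The imbalance δ̂ is then the difference in the share of Z = 1 between treated and untreated units, and the impact γ̂ is Z's coefficient in the long regression. Their product is exactly the bias. The data are simulated and hypothetical, and `ols` is a small hand-rolled least-squares solver.

```python
import random


def ols(y, cols):
    """OLS coefficients of y on [1, *cols], solved via the normal equations."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] for i in range(k)]


def share_z(z, d, level):
    """Proportion of units with Z = 1 among those with D == level."""
    group = [zi for zi, di in zip(z, d) if di == level]
    return sum(group) / len(group)


rng = random.Random(7)
n = 4000
Z = [1 if rng.random() < 0.4 else 0 for _ in range(n)]
D = [1 if rng.random() < 0.3 + 0.4 * z else 0 for z in Z]
Y = [0.5 * d + 1.2 * z + rng.gauss(0, 1) for d, z in zip(D, Z)]

delta = share_z(Z, D, 1) - share_z(Z, D, 0)  # imbalance
_, tau, gamma = ols(Y, [D, Z])              # impact gamma-hat, long-regression tau-hat
tau_res = ols(Y, [D])[1]                    # short regression without Z
print(f"imbalance {delta:.3f} x impact {gamma:.3f} = {delta * gamma:.3f}")
print(f"tau_res - tau = {tau_res - tau:.3f}")
```

Plotting δ̂ against γ̂ for candidate binary confounders gives the two-component picture described on the slide.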

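LIMITATIONS 7
→ Hard to interpret if Z is not binary
→ What about multiple Zs?
→ How much 'robustness' is enough?
→ Robustness compared to what?
→ Can we get sensitivity for more than a point estimate?
Solution: switch from coefficients to partial R²s
→ Multiple Zs are fine
→ Only worry about how all the missing variables generate bias in aggregate
The 'coefficient of determination' is symmetrical in bivariate regressions:
$$R^2_{Z\sim D} = \frac{\operatorname{var}(\hat Z)}{\operatorname{var}(Z)} = 1 - \frac{\operatorname{var}(Z \mid D)}{\operatorname{var}(Z)} = \operatorname{cor}(Z, \hat Z)^2 = \operatorname{cor}(Z, D)^2$$
and so is partial R²:
$$R^2_{Z\sim D\mid X} = 1 - \frac{\operatorname{var}(Z \mid D, X)}{\operatorname{var}(Z \mid X)} = \operatorname{cor}(r(Z\mid X),\, r(D\mid X))^2 = R^2_{D\sim Z\mid X}$$
(here $\operatorname{var}(Z \mid \cdot)$ is the variance of the residual from regressing Z on the conditioning variables).
So we don't need to worry about the 'direction' of δ.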
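A minimal numerical check of the symmetry, on hypothetical simulated data with one covariate X. `ols`, `residuals` and `partial_r2` are small hand-rolled helpers, not library functions. The partial R² of Z on D given X, the partial R² of D on Z given X, and the squared correlation of the two residualized variables all coincide.

```python
import random


def ols(y, cols):
    """OLS coefficients of y on [1, *cols], solved via the normal equations."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))]
         for a in range(k)]
    for col in range(k):  # Gauss-Jordan with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] for i in range(k)]


def residuals(y, cols):
    """y minus its fitted values from an OLS regression on [1, *cols]."""
    b = ols(y, cols)
    return [yi - b[0] - sum(bj * c[i] for bj, c in zip(b[1:], cols))
            for i, yi in enumerate(y)]


def ss(v):
    """Sum of squared deviations from the mean."""
    mu = sum(v) / len(v)
    return sum((vi - mu) ** 2 for vi in v)


def partial_r2(a, b, controls):
    """Partial R^2 of a on b given controls: 1 - SSR(a | b, controls) / SSR(a | controls)."""
    return 1 - ss(residuals(a, [b] + controls)) / ss(residuals(a, controls))


def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cross = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    return cross / (ss(u) * ss(v)) ** 0.5


rng = random.Random(5)
n = 1000
X = [rng.gauss(0, 1) for _ in range(n)]
D = [0.7 * x + rng.gauss(0, 1) for x in X]
Z = [0.4 * d - 0.2 * x + rng.gauss(0, 1) for d, x in zip(D, X)]

r2_z_on_d = partial_r2(Z, D, [X])  # R^2_{Z~D|X}
r2_d_on_z = partial_r2(D, Z, [X])  # R^2_{D~Z|X}
r2_corr = corr(residuals(Z, [X]), residuals(D, [X])) ** 2
print(f"{r2_z_on_d:.4f}  {r2_d_on_z:.4f}  {r2_corr:.4f}")
```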

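FINALLY, A GOOD USE FOR R-SQUARED 8
The relevant quantities are now:
→ How much of the residual variance in Y is explained by Z, controlling for everything else: $R^2_{Y\sim Z\mid X,D}$
→ How much of the residual variance in D is explained by Z, controlling for everything else: $R^2_{D\sim Z\mid X}$
There can be lots of Zs working in concert.
→ Their scales are irrelevant now that we work with variance explained, a.k.a. 'explanatory power'
[Figure: causal graph of D, X, Y, Z, ε_D, ε_Y, with τ on the D–Y relationship and $R^2_{D\sim Z\mid X}$, $R^2_{Y\sim Z\mid X,D}$ on the Z–D and Z–Y relationships]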
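A quick check of the claim that scales become irrelevant, as a sketch on hypothetical simulated data. Rescaling Z (say, from kilometres to metres) divides its coefficient γ̂ by 1000 but leaves $R^2_{Y\sim Z\mid X,D}$ unchanged. The helpers are hand-rolled, not library functions.

```python
import random


def ols(y, cols):
    """OLS coefficients of y on [1, *cols], solved via the normal equations."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))]
         for a in range(k)]
    for col in range(k):  # Gauss-Jordan with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] for i in range(k)]


def residuals(y, cols):
    """y minus its fitted values from an OLS regression on [1, *cols]."""
    b = ols(y, cols)
    return [yi - b[0] - sum(bj * c[i] for bj, c in zip(b[1:], cols))
            for i, yi in enumerate(y)]


def ss(v):
    """Sum of squared deviations from the mean."""
    mu = sum(v) / len(v)
    return sum((vi - mu) ** 2 for vi in v)


def partial_r2(a, b, controls):
    """Partial R^2 of a on b given controls."""
    return 1 - ss(residuals(a, [b] + controls)) / ss(residuals(a, controls))


rng = random.Random(9)
n = 1000
X = [rng.gauss(0, 1) for _ in range(n)]
Z = [rng.gauss(0, 1) for _ in range(n)]  # e.g. a distance in km (hypothetical)
D = [0.6 * z + 0.3 * x + rng.gauss(0, 1) for z, x in zip(Z, X)]
Y = [0.5 * d + 0.8 * z + 0.2 * x + rng.gauss(0, 1) for d, z, x in zip(D, Z, X)]
Z_m = [1000 * z for z in Z]  # the same confounder measured in metres

gamma_km = ols(Y, [D, X, Z])[3]
gamma_m = ols(Y, [D, X, Z_m])[3]
r2_km = partial_r2(Y, Z, [X, D])  # R^2_{Y~Z|X,D}
r2_m = partial_r2(Y, Z_m, [X, D])
print(f"gamma: {gamma_km:.4f} vs {gamma_m:.7f};  partial R2: {r2_km:.4f} vs {r2_m:.4f}")
```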

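GRAPHICAL REPORTING 9
Robustness value: $RV_q$ is the common value $R^2_{Y\sim Z\mid X,D} = R^2_{D\sim Z\mid X} = RV_q$
→ how strong Z has to be, explaining equal shares of the residual variance of D and Y, to reduce $\hat\tau$ by a proportion q (that is, by 100q%)
→ $RV_1$ is the value needed to reduce $\hat\tau$ to zero (about …)
→ it lies on the bottom-left to top-right diagonal in the diagram
→ the maximum effect that a Z not more than k times as strong as 'Female' would have on $\hat\tau$
→ upper bounds on the effects of multiple Zs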
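A minimal sketch of the robustness value and the bias bound, using the formulas as I understand them from Cinelli and Hazlett (2020):

* $RV_q = \tfrac12\big(\sqrt{f_q^4 + 4f_q^2} - f_q^2\big)$, where $f_q = q\,|t_{\hat\tau}|/\sqrt{df}$.
* $|\text{bias}| \le \operatorname{se}(\hat\tau_{\text{res}})\sqrt{\frac{R^2_{Y\sim Z\mid X,D}\,R^2_{D\sim Z\mid X}}{1-R^2_{D\sim Z\mid X}}\,df}$.

The estimate and standard error come from the Darfur table. The residual degrees of freedom are not given on the slides, so `dof` is a hypothetical value, and the printed numbers will not exactly match the original analysis. Only the point-estimate version of RV is sketched here. The function names are illustrative, not a package API.

```python
import math


def robustness_value(t_stat, dof, q=1.0):
    """RV_q: the equal partial R^2 of Z with D and with Y that would
    reduce |tau_hat| by 100*q percent."""
    f_q = q * abs(t_stat) / math.sqrt(dof)  # q times the partial Cohen's f of D with Y
    return 0.5 * (math.sqrt(f_q ** 4 + 4 * f_q ** 2) - f_q ** 2)


def bias_bound(se, dof, r2_yz_dx, r2_dz_x):
    """Largest |tau_res - tau| for a confounder with these partial R^2 values."""
    return se * math.sqrt(r2_yz_dx * r2_dz_x / (1 - r2_dz_x) * dof)


tau_hat, se = 0.097, 0.023  # 'directly harmed' row of the Darfur table
dof = 800                   # hypothetical residual degrees of freedom
t_stat = tau_hat / se

rv_1 = robustness_value(t_stat, dof)           # brings tau_hat to zero
rv_half = robustness_value(t_stat, dof, 0.5)   # halves tau_hat
print(f"RV_1 = {rv_1:.3f}, RV_0.5 = {rv_half:.3f}")
# A confounder exactly as strong as RV_1 on both axes removes all of tau_hat
print(bias_bound(se, dof, rv_1, rv_1), tau_hat)
```

RV_q solves $RV^2/(1-RV) = f_q^2$, which is why plugging it into the bound returns exactly $q\,|\hat\tau|$. Points on the diagonal of the contour plot correspond to this equal-strength case.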

TABULAR REPORTING 10
→ Unmeasured confounders with equal effects on D and Y would have to explain …% of the residual variance of both to reduce $\hat\tau = 0.097$ to zero
→ or …% to make $\hat\tau$ not statistically distinguishable from zero (at the 5% level)
→ if confounders explained 100% of the residual variance of Y, they would need to explain at least …% of the residual variance of D to reduce $\hat\tau$ to zero
→ the footnote shows the two relevant quantities for a Z like 'Female'. Notice that …% < …%, so even in the worst-case scenario above, $\hat\tau$ would not be reduced to zero
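The extreme scenario (Z explains 100% of Y's residual variance) has a closed form. Z then needs $R^2_{D\sim Z\mid X} \ge R^2_{Y\sim D\mid X} = t^2/(t^2 + df)$ to remove $\hat\tau$ entirely. This follows from the bias bound in Cinelli and Hazlett (2020) with $R^2_{Y\sim Z\mid X,D} = 1$. The sketch below uses the table's estimate and standard error with a hypothetical `dof`, so its percentages are illustrative only.

```python
import math


def partial_r2_treatment(t_stat, dof):
    """Partial R^2 of D with Y given X, recovered from D's t-statistic."""
    return t_stat ** 2 / (t_stat ** 2 + dof)


def bias_bound(se, dof, r2_yz_dx, r2_dz_x):
    """Largest |tau_res - tau| for a confounder with these partial R^2 values."""
    return se * math.sqrt(r2_yz_dx * r2_dz_x / (1 - r2_dz_x) * dof)


tau_hat, se = 0.097, 0.023  # from the Darfur table
dof = 800                   # hypothetical residual degrees of freedom
r2_yd = partial_r2_treatment(tau_hat / se, dof)

# Worst case: Z explains all of Y's residual variance (R^2_{Y~Z|X,D} = 1)
print(f"Z must explain at least {100 * r2_yd:.1f}% of D's residual variance")
print(bias_bound(se, dof, 1.0, r2_yd), tau_hat)
```

The benchmark comparison works the same way. If a 'Female'-like Z has $R^2_{D\sim Z\mid X}$ below `r2_yd`, then even a confounder that explains all of Y's residual variance cannot bring $\hat\tau$ to zero.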

ASIDE: FUNCTIONAL FORM 11
Sensitivity to misspecification comes for free! This is also a test of the consequences of being wrong about functional form: an omitted nonlinear term or interaction is just another omitted Z.

SENSITIVITY ANALYSIS 12
What thresholds are reasonable for sensitivity testing?
→ Terrible question: this question makes (almost) no sense
[The] transition from a qualitative to a quantitative discussion about unobserved confounding can often be enlightening.
A sensitivity analysis raises the bar for the sceptic of a causal estimate – not just any criticism can invalidate the research conclusions. The hypothesized confounder now must meet certain standards of strength; otherwise it cannot logically account for all the observed association.
[It] also raises the bar for defending a causal interpretation of an estimate – proponents must articulate how confounders with certain strengths can be ruled out.

REFERENCES 13
Cinelli, C., & Hazlett, C. (2020). Making sense of sensitivity: Extending omitted variable bias. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(1), 39–67.
Conley, T. G., Hansen, C. B., & Rossi, P. E. (2012). Plausibly exogenous. The Review of Economics and Statistics, 94(1), 260–272.
Ding, P., & Miratrix, L. W. (2015). To adjust or not to adjust? Sensitivity analysis of M-bias and butterfly-bias. Journal of Causal Inference, 3(1), 41–57.
van Kippersluis, H., & Rietveld, C. A. (2018). Beyond plausibly exogenous. The Econometrics Journal, 21(3), 316–331.

Will Lowe
