
Causal Inference 2022 Week 3

Will Lowe
February 27, 2022


EXPERIMENTS VS THE REST?

An old but popular view:
Randomization (and randomized controlled trials) are the gold standard for causal inference. Everything else is
→ at best a quasi-experiment
→ at worst 'mere description', and possibly... mush

To which the critics reply: RCTs lack external validity; the world is terribly complicated and interconnected (Cartwright, 2007). James Heathers is grumpy about animal study headlines.

EXPERIMENTS VS THE REST?

What we'll argue here:
Randomization and RCTs are great, but as soon as they go wrong
→ we'll need all the tools from observational causal inference to fix them

RCTs can lack external validity
→ that's part of why we like them
→ we'll still need all the tools from observational causal inference to generalize them

Say what you like about the anti-RCT crowd...

EXPERIMENTS AND CAUSAL EFFECTS

Why so serious (about experiments)? An operational equivalence:
→ the change in Y when you step into a system to change X
→ the change in Y when you randomize X in a (large enough) experiment

Some types:
→ Lab experiments
→ Field experiments
→ 'Natural' experiments
in rough order of
→ how seriously people take them
→ how hard they are to analyze

A FIELD EXPERIMENT

Gerber et al. (2008) tried to get eligible voters in New Haven to actually vote in a primary election by sending them postcards.

Past attempts:
→ telephone calls
→ personal visits

Each of four postcard messages was a randomized treatment X for tens of thousands of households:
→ Voting is your civic duty
→ You are being studied (by us)
→ We know whether you voted last time
→ And now your neighbours know too...

WHAT HATH RANDOMIZATION WROUGHT?

Demographics (D) make it both more likely that you'll get a GOTV mailing, e.g. a postcard (P), and more likely that you'll turn out to vote (T).

(DAG: D → P, D → T, P → T, with noise terms єP, єD, єT)

→ Average T at different values of P is not informative about the effect of P: T(0), T(1) are NOT independent of P

Randomizing P breaks the connection between D and P (because it is an intervention that replaces єP with єR). D is now 'noise'.

(DAG after randomization: єR → P, D → T, P → T; the D → P arrow is gone)

→ Average T at different values of P is informative about the effect of P: T(0), T(1) ⊥⊥ P
(see the simulation sketch below)

WHAT HATH RANDOMIZATION WROUGHT?

Randomizing P:
→ identifies the effect of P on T
→ generates approximate balance in D
→ generates approximate balance in D even if it is unobserved
→ vastly simplifies uncertainty computations, e.g. confidence intervals for the effect of P on T
→ allows effect estimation (remember, things like ATEs are population-specific)

But precision still matters:
→ variation in T is a function of єD and єT
→ variation in T conditional on D is only a function of єT
→ variation in causal effect estimates depends on variation in the potential outcomes
→ so we should* condition on D when estimating the effect of P on T (see the regression sketch below)

* statistically speaking

(DAG after randomization: єR → P, D → T, P → T, with noise terms єD, єT)

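A sketch of the precision point on simulated data (statsmodels and all the numbers here are assumptions of the example, not the course material): adjusting for a prognostic D leaves the estimated effect of P essentially unchanged but shrinks its standard error.

```python
# Sketch: in a randomized experiment, conditioning on a prognostic covariate D
# does not change what is identified, but it reduces the standard error of the
# estimated effect of P on T.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5_000
D = rng.normal(size=n)                        # prognostic covariate
P = rng.binomial(1, 0.5, n)                   # randomized treatment
T = 0.5 * P + 1.0 * D + rng.normal(size=n)    # outcome; assumed effect of P is 0.5

unadjusted = sm.OLS(T, sm.add_constant(P)).fit()
adjusted = sm.OLS(T, sm.add_constant(np.column_stack([P, D]))).fit()

print("unadjusted estimate, SE:", unadjusted.params[1].round(3), unadjusted.bse[1].round(3))
print("adjusted estimate, SE:  ", adjusted.params[1].round(3), adjusted.bse[1].round(3))
```
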
CONTROL: BLOCKING AND CONDITIONING

Gerber et al. (2008) do a bit of both. They block using the postal route (it's a bit unclear from the paper) and statistically control for a set of known predictors of voting in primaries: turnout history in previous primary and general elections, gender, number of registered voters in the household, and age.

Either way:
→ more precision in estimating T(0) and/or T(1), hence more precision for the ATE
→ not always (Freedman, 2008), but mostly (Lin, 2013)

The smaller the experiment, the more preferable blocking is (see the blocking sketch below)
→ it removes chance imbalance between P and T

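A small blocking sketch under assumed block sizes (two hypothetical postal routes) and an assumed 8-point effect: randomize within each block, then combine within-block differences in means, weighted by block size.

```python
# Sketch of blocked randomization: exactly half of each block is treated,
# and the block estimates are combined with block-size weights.
import numpy as np

rng = np.random.default_rng(2)
blocks = {"low_turnout_route": 400, "high_turnout_route": 600}   # hypothetical blocks
estimates, weights = [], []

for name, size in blocks.items():
    base = 0.2 if name == "low_turnout_route" else 0.5           # block-specific baseline turnout
    P = np.zeros(size, dtype=int)
    P[rng.permutation(size)[: size // 2]] = 1                    # exactly half treated per block
    T = rng.binomial(1, base + 0.08 * P)                         # assumed 8-point effect
    estimates.append(T[P == 1].mean() - T[P == 0].mean())
    weights.append(size)

print(f"block-weighted ATE estimate: {np.average(estimates, weights=weights):.3f}")
```
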
NOT QUITE RANDOMIZATION

If єP were somehow observed, D would still be connected to P, but we could
→ partition the variation in P between D and єP
→ isolate the variation in T due to єP
A 'natural' experiment...

In general we are always hunting for independent noise like єP to leverage:
→ randomize P to generate it
→ control for D to isolate it
→ model P = f(A, єP), where A is an instrument, to infer it

(DAGs: D → P, D → T, P → T with noise terms єP, єD, єT; and the same graph with an instrument A → P and noise єA)

THINGS FALL APART

In many experimental situations, people don't – or can't – 'comply' with their treatment assignments. Call the assignment A.

Two-sided non-compliance: anything can happen in the A → P relationship. For binary A and P, all four (A, P) combinations can occur with positive probability.

One-sided non-compliance: nobody is treated despite being assigned not to be, i.e. nobody in the control group gets treated, so the A = 0, P = 1 cell is empty (see the cross-tab sketch below). This is a form of monotonicity assumption:
→ assignment never discourages treatment (although it might not encourage it either)

COMPLIANCE

For much policy work, expect one-sided non-compliance:
→ you can change a law; not everyone will follow it
→ when it would be unethical to coerce, you 'encourage': invitations, coupons, cheques in the mail

In the vote experiment:
→ people might accidentally toss the postcard as junk mail
→ or deliberately toss it as junk mail
→ the postal service could lose it

TROUBLE WITH ONE-SIDEDNESS

Non-compliance breaks our experiment → we are not randomizing P any more. So what to do with one-sided non-compliance?
→ If we never observe it, we can't do much!
→ What if we observe both A and P?

(DAG: A → P, D → P, D → T, P → T, with noise terms єA, єP, єD, єT)

Some natural options:
1. Compare A = 1 to A = 0
2. Compare P = 1 to A = 0 (definitely untreated)
3. Compare P = 1 to everyone else

None of these are good!
→ Option 1 successfully answers a different question. Maybe we like that question!
→ Options 2 and 3 recreate an observational study: if there are common causes of not taking treatment that also affect outcomes, the comparison is confounded.

TROUBLE WITH ONE-SIDEDNESS

(DAG: A → P, D → P, D → T, P → T, with noise terms єA, єP, єD, єT)

Our treatment variable P is now a potential outcome of assignment:
→ P(A=1) = 1 if assigned and treated
→ P(A=1) = 0 if assigned but not treated
One-sided non-compliance implies
→ P(A=0) = 0

We can now define two types of subject (Angrist et al., 1996):
Complier: P(A=1) = 1 and P(A=0) = 0
Never-taker: P(A=1) = 0 and P(A=0) = 0

We can't know for sure who is in which group
→ we only see one of P(A=1) and P(A=0)
But we can see the consequences...

RECONSIDERING THE OPTIONS

1. Compare A = 1 to A = 0 → (Compliers + Never-takers) vs (Compliers + Never-takers)
2. Compare P = 1 to A = 0 → Compliers vs (Compliers + Never-takers)
3. Compare P = 1 to everyone else → Compliers vs (Compliers + Never-takers)

It's not hard to imagine that Compliers are not really comparable to Never-takers (the sketch below shows how far the three comparisons can diverge)
→ it's hard to learn too much about them
→ but not impossible (Marbach & Hangartner, 2020)

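A sketch comparing the three options on simulated data where compliance is confounded by a latent D that also raises turnout (all numbers are assumed; the true effect of treatment is 0.10): option 1 estimates the smaller ITT, while options 2 and 3 are biased upward.

```python
# Sketch: under one-sided non-compliance with confounded compliance,
# the three natural comparisons give three different answers.
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
D = rng.binomial(1, 0.5, n)                     # confounds compliance and turnout
A = rng.binomial(1, 0.5, n)                     # randomized assignment
complier = rng.binomial(1, 0.4 + 0.4 * D)       # D makes compliance more likely
P = A * complier                                # one-sided non-compliance
T = rng.binomial(1, 0.2 + 0.3 * D + 0.10 * P)   # assumed true effect of treatment: 0.10

diff = lambda t1, t0: t1.mean() - t0.mean()
print("1. A=1 vs A=0 (ITT):     ", round(diff(T[A == 1], T[A == 0]), 3))
print("2. P=1 vs A=0:           ", round(diff(T[P == 1], T[A == 0]), 3))
print("3. P=1 vs everyone else: ", round(diff(T[P == 1], T[P == 0]), 3))
```
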
ESTIMATION

It's useful here to define some new causal effects.

Intent to treat: two estimable effects of A, one on T and one on P:
ITT = E[T(A=1) − T(A=0)]
ρ_c = E[P(A=1) − P(A=0)] = P(Complier)

Compliers:
ITT_c = E[T(A=1) − T(A=0) | P(A=1) = 1]
which we can estimate because
ITT = ITT_c ρ_c + ITT_a ρ_a + ITT_d ρ_d + ITT_n ρ_n = ITT_c ρ_c
(under one-sided non-compliance there are no Always-takers or Defiers, so ρ_a = ρ_d = 0, and under the exclusion restriction assignment does nothing to Never-takers, so ITT_n = 0)

With an exclusion restriction
→ A does not affect T except through P
the complier average treatment effect is ITT / ρ_c (see the sketch below).

(DAG: A → P, D → P, D → T, P → T, with noise terms єA, єP, єD, єT)

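A sketch of the complier average treatment effect as ITT / ρ_c, regenerating the same simulated one-sided non-compliance data as the sketch above (assumed numbers; the effect among compliers is 0.10).

```python
# Sketch: estimate the ITT (effect of assignment on turnout), the compliance
# rate rho_c (effect of assignment on treatment receipt), and their ratio,
# the complier average treatment effect.
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
D = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.5, n)
complier = rng.binomial(1, 0.4 + 0.4 * D)
P = A * complier
T = rng.binomial(1, 0.2 + 0.3 * D + 0.10 * P)

itt = T[A == 1].mean() - T[A == 0].mean()       # effect of A on T
rho_c = P[A == 1].mean() - P[A == 0].mean()     # effect of A on P = share of compliers
print(f"ITT = {itt:.3f}, rho_c = {rho_c:.3f}, CACE = ITT / rho_c = {itt / rho_c:.3f}")
```
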
BOTH SIDES!

With two-sided non-compliance there are four types:
Always-taker: P(A=1) = 1 and P(A=0) = 1
Complier: P(A=1) = 1 and P(A=0) = 0
Defier: P(A=1) = 0 and P(A=0) = 1
Never-taker: P(A=1) = 0 and P(A=0) = 0

We don't know the proportions of each type, so it seems like anything can happen. The usual fix
→ monotonicity: there are no Defiers

(DAG: A → P, D → P, D → T, P → T, with noise terms єA, єP, єD, єT)

Representing IV assumptions:
→ being an instrument is representable in the graph
→ monotonicity is not, because it's a functional form restriction

EXTERNAL VALIDITY

'External validity' asks the question of generalizability:

"To what populations, settings, treatment variables, and measurement variables can this effect be generalized?" (Shadish et al., 2002)

"An experiment is said to have 'external validity' if the distribution of outcomes realized by a treatment group is the same as the distribution of outcomes that would be realized in an actual program." (Manski, 2007)

"Extrapolation across studies requires some understanding of the reasons for the differences." (Cox, 1958)

EXTERNAL VALIDITY

Average treatment effects are averages (duh)
→ over individuals
→ over subgroups, e.g. ATE = ρ ATT + (1 − ρ) ATC, where ρ is the probability of treatment

Subgroups: in linear models, subgroup effects are estimated with interactions
T = β0 + P βP + D βD + (P × D) βPD + є
ATE(D) = βP + D βPD

(DAG: D → P, D → T, P → T, with noise terms єP, єD, єT)

ATE = Σ_d ATE(d) P(D = d)
(a sketch of this two-step calculation follows below)

Average treatment effects are averages!

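A sketch of the two steps on simulated data (statsmodels and the effect sizes of 0.05 for D = 0 and 0.15 for D = 1 are assumptions of the example): estimate ATE(d) from the interaction model, then average over P(D = d).

```python
# Sketch: fit T on P, D and P*D, read off ATE(d) = bP + d * bPD, and
# average the subgroup effects over the observed distribution of D.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 50_000
D = rng.binomial(1, 0.3, n)
P = rng.binomial(1, 0.5, n)                                 # randomized treatment
T = 0.2 + 0.05 * P + 0.1 * D + 0.10 * P * D + rng.normal(0, 0.1, n)

X = sm.add_constant(np.column_stack([P, D, P * D]))
beta = sm.OLS(T, X).fit().params                            # [b0, bP, bD, bPD]

ate_by_d = {d: beta[1] + d * beta[3] for d in (0, 1)}       # ATE(d)
p_d = {d: float(np.mean(D == d)) for d in (0, 1)}           # P(D = d)
ate = sum(ate_by_d[d] * p_d[d] for d in (0, 1))
print({d: round(v, 3) for d, v in ate_by_d.items()}, "ATE:", round(ate, 3))
```
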
EXTERNAL VALIDITY

We sometimes hear dark warnings about generalizing effects to new populations
→ this can work when the effect is constant
→ but usually not when the effect differs by group, because group distributions may also differ
How do we transport them?

James Heathers is still grumpy about animal study headlines.

EFFECT TRANSPORTATION

Populations can differ by
→ their propensity to be treated (by P)
→ their distribution of subgroups (by D)

If we are learning about the causal effect of P then we actually don't need to worry about the distribution of P in the new population
→ the causal effect is conditional on P by definition

"Differences in propensity to receive treatment do not matter for transportability of causal effects. What matters are potential effect-modifiers." (Cinelli & Bareinboim)

Trouble comes when the distribution of effect modifiers changes: P(D) ≠ P(D*)
→ happily, we can measure that

So, to infer the effect in the new population (Bareinboim & Pearl, 2016):
→ gather up subgroup effects, e.g. ATE(D)
→ measure the new subgroup distribution P(D*)
→ reconstruct a new ATE by weighting ATE(D) by P(D*) (see the sketch below)

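A sketch of the reweighting step, with assumed subgroup effects and assumed study/target distributions of D: the transported ATE simply re-averages ATE(d) under P(D*).

```python
# Sketch: transport an average effect by re-averaging subgroup effects
# under the target population's subgroup distribution.
ate_by_d = {0: 0.05, 1: 0.15}     # subgroup effects estimated in the study population
p_d_study = {0: 0.7, 1: 0.3}      # P(D) in the study population
p_d_target = {0: 0.2, 1: 0.8}     # P(D*) measured in the target population

ate_study = sum(ate_by_d[d] * p_d_study[d] for d in ate_by_d)
ate_target = sum(ate_by_d[d] * p_d_target[d] for d in ate_by_d)
print(f"ATE in study population:   {ate_study:.3f}")
print(f"ATE transported to target: {ate_target:.3f}")
```
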
UNREPRESENTATIVENESS FTW

The implication: generalization failures are due to changes in population composition. Consequently, representativeness is less helpful than 'coverage': lots of data on the subgroups of interest.
→ Rothman et al. (2013) 'Why representativeness should be avoided'
→ Harrell, 'Implications of interactions in treatment comparisons'

THE WILD FRONTIER

Experiments with SUTVA violations and other practical problems
→ spillover (Aronow et al.)
→ bookmark: https://egap.org/methods-guides/

Experiments as optimizations
→ optimal experimental design (Smucker et al., 2018)
→ blocking and randomization trade-offs (Kasy, 2016)

Adaptive experiments, a.k.a. bandits
→ batch experimentation (Offer-Westort et al., 2021)
→ tracking moving effects (mostly in industrial applications)

REFERENCES

Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association.
Aronow, P. M., Eckles, D., Samii, C., & Zonszein, S. Spillover effects in experimental data. arXiv preprint.
Bareinboim, E., & Pearl, J. (2016). Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences.
Cartwright, N. (2007). Hunting causes and using them: Approaches in philosophy and economics. Cambridge University Press.
Cinelli, C., & Bareinboim, E. Generalizability in causal inference. Technical report, University of California at Riverside.
Cox, D. R. (1958). Some problems connected with statistical inference. The Annals of Mathematical Statistics.
Freedman, D. A. (2008). Randomization does not justify logistic regression. Statistical Science.
Gerber, A. S., Green, D. P., & Larimer, C. W. (2008). Social pressure and voter turnout: Evidence from a large-scale field experiment. American Political Science Review.
Kasy, M. (2016). Why experimenters might not always want to randomize, and what they could do instead. Political Analysis.
Lin, W. (2013). Agnostic notes on regression adjustments to experimental data: Reexamining Freedman's critique. The Annals of Applied Statistics.
Manski, C. F. (2007). Identification for Prediction and Decision. Harvard University Press.
Marbach, M., & Hangartner, D. (2020). Profiling compliers and noncompliers for instrumental-variable analysis. Political Analysis.
Offer-Westort, M., Coppock, A., & Green, D. P. (2021). Adaptive experimental design: Prospects and applications in political science. American Journal of Political Science.
Rothman, K. J., Gallacher, J. E., & Hatch, E. E. (2013). Why representativeness should be avoided. International Journal of Epidemiology.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.
Smucker, B., Krzywinski, M., & Altman, N. (2018). Optimal experimental design. Nature Methods.