Causal Inference 2022 Week 12

Will Lowe
May 09, 2022

Transcript

  1. MANIPULABILITY

    No causation without manipulation (Holland)

    If you can’t see how you’d manipulate it, it can’t be a cause. This has some unintuitive consequences. For Holland:
    → Gender and race don’t cause anything
    → The relevant counterfactuals don’t exist

    Sometimes this can prompt a helpful reconceptualisation:
    → ‘effect of race’ vs ‘effect of racism’

    Unpack into manipulable components: the variable “sex” earns the label “cause” by virtue of having responders such as “hormone content” or “height” which are gender dependent (Pearl)

    Unmanipulable features used to be called ‘immutable’ or ‘essential’, contrasted with ‘mutable’ or ‘accidental’ features
    → e.g. Aristotle (Categories; Metaphysics)
  3. NAMING AND NECESSITY

    Agan and Starr:
    → fictitious job applications
    → employers in NJ and NYC
    → varying race and felony convictions
    → measured the callback rates

    [Figure: Callback rates before legislation (with and without the box)]
    [Figure: Callback rates for employers that removed the box to comply with legislation]
  4. MORE TROUBLE

    If race is the treatment, is everything post-treatment?

    Moreover, race is unstable as a variable (Sen & Wasow)
  5. COMPOSITE TREATMENTS

    For Sen and Wasow the ‘problem’ with race as a variable is that it is an ill-defined or at least composite treatment

    Many examples of these:
    → Democracy as a bundle of institutions
    → Obesity as a bundle of habits and measurements
    → Gender as a bundle of biological and social features
  6. TESTOSTERONE?

    Sapienza et al. ‘Gender differences in financial risk aversion and career choices are affected by testosterone’

    Coates et al. ‘Second-to-fourth digit ratio predicts success among high-frequency financial traders’
  7. OR GENDER IDENTITY

    We can also think of gender as an aspect of social identity

    Turns out, that’s really individually variable: you can prime it (Willer et al.)

    Different components of ‘gender’ cause different outcomes
    → We don’t need to find an essence (which is fortunate)

    D’Acunto ‘Identity, overconfidence, and investment decisions’
  8. MEASUREMENT MODELS

    $Q_1 \ldots Q_K$ (e.g., $K$ survey questions) measure a $C$ (e.g., ideology, support, neuroticism, math skills) if

    $$P(Q_1 \ldots Q_K \mid C) = \prod_{k=1}^{K} P(Q_k \mid C)$$

    This is often called local independence

    We generate a measurement most straightforwardly by asserting a prior over $C$, $P(C)$, applying Bayes theorem

    $$P(C_i \mid Q_{i1} \ldots Q_{iK}) = \frac{\prod_{k=1}^{K} P(Q_{ik} \mid C_i)\, P(C_i)}{P(Q_{i1} \ldots Q_{iK})}$$

    and using the posterior expectation of $C_i$

    Different choices of $P(Q_k \mid C)$ and $P(C)$ give different measurement models

    Causal structures like this make measurement assumptions true

    [Diagram: C → Q_1, Q_2, Q_3, each Q with its own error term є]
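As a rough illustration of the Bayes step on this slide, here is a minimal sketch for binary items and a two-valued $C$. The item probabilities, the prior, and all function names are hypothetical choices made for the example, not values from the lecture.

```python
from math import prod

# Hypothetical setup: C takes two values, and there are three yes/no items.
C_VALUES = [0.0, 1.0]
PRIOR = [0.5, 0.5]                      # P(C = c)
ITEM_PROBS = [                          # ITEM_PROBS[k][c] = P(Q_k = 1 | C = c)
    [0.2, 0.8],
    [0.3, 0.7],
    [0.1, 0.6],
]

def posterior_over_c(answers, item_probs, prior):
    """P(C | Q_1..Q_K) for one respondent, assuming local independence."""
    unnormalised = []
    for c, p_c in enumerate(prior):
        # Local independence: the likelihood is a product over items.
        likelihood = prod(
            item_probs[k][c] if q == 1 else 1.0 - item_probs[k][c]
            for k, q in enumerate(answers)
        )
        unnormalised.append(likelihood * p_c)
    evidence = sum(unnormalised)        # P(Q_1..Q_K)
    return [u / evidence for u in unnormalised]

def posterior_expectation(answers, item_probs, prior, c_values):
    """The measurement: the posterior mean of C given the answers."""
    post = posterior_over_c(answers, item_probs, prior)
    return sum(p * v for p, v in zip(post, c_values))

score = posterior_expectation([1, 1, 0], ITEM_PROBS, PRIOR, C_VALUES)
```

Swapping in a different `ITEM_PROBS` table or `PRIOR` corresponds to choosing a different measurement model in the sense of the slide.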
  9. MEASUREMENT FAILURES

    When items are associated even conditional on $C$

    [Diagram: C and three Q items]

    we might ask why:
    → $C$ is not what we think it is
    → $Q$ measures more than $C$, e.g. the ‘double-barreled’ question
    → $Q$ depends on context, a.k.a. differential item functioning

    [Diagram: C, three Q items with error terms є, and S]

    But we know that’s not the only reason variables are associated
    → What are the others?
  10. MEASUREMENT FAILURES

    [Diagram: C and three Q items with error terms є]

    Association may also indicate
    → one $Q$ cues or primes another $Q$

    [Diagram: C, three Q items with error terms є, and S]

    → “Only fit our model to people who responded to all the questions”
    → Normally a software problem: under local independence, Bayes theorem will deal just fine...
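One way to read the last point: under local independence an unanswered item simply drops out of the likelihood product, so there is no need to discard respondents with incomplete answers. The sketch below assumes binary items, a two-valued $C$, and that the missingness itself says nothing about $C$; the numbers and names are hypothetical.

```python
from math import prod

PRIOR = [0.5, 0.5]                      # P(C = c), hypothetical
ITEM_PROBS = [                          # ITEM_PROBS[k][c] = P(Q_k = 1 | C = c)
    [0.2, 0.8],
    [0.3, 0.7],
    [0.1, 0.6],
]

def posterior_with_missing(answers, item_probs, prior):
    """P(C | answered items); None marks an unanswered item."""
    unnormalised = []
    for c, p_c in enumerate(prior):
        # Skip unanswered items: summing P(Q_k | C) over both answers gives 1,
        # so a missing item contributes a factor of 1 to the product.
        likelihood = prod(
            item_probs[k][c] if q == 1 else 1.0 - item_probs[k][c]
            for k, q in enumerate(answers)
            if q is not None
        )
        unnormalised.append(likelihood * p_c)
    evidence = sum(unnormalised)
    return [u / evidence for u in unnormalised]

partial = posterior_with_missing([1, None, 0], ITEM_PROBS, PRIOR)
nothing_answered = posterior_with_missing([None, None, None], ITEM_PROBS, PRIOR)
```

With no items answered the posterior falls back to the prior, which is the behaviour one would hope for.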
  11. MISSING DATA

    Missingness can concern
    → which case is observed, e.g. in matching
    → which variable value is observed

    Example:
    → Age ($A$), Gender ($G$), Obesity ($O$)
    → Reported ($R = 1$)

    $$O^* = \begin{cases} O & \text{if } R = 1 \\ \text{missing} & \text{otherwise} \end{cases}$$

    If $R$ is (or can be made) independent of the variables with missing data (here, $O$) then we can progress (Mohan & Pearl)

    First problem

    [Diagram: G → O ← A; O → O∗ ← R]

    Age and gender cause obesity, but it’s not always reported ($R$)
    → Can we just use the cases we observe to learn about $P(O, G, A)$? Here yes, because $O, G, A \perp\!\!\!\perp R$
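  12. QUICK PROOF

    First problem

    [Diagram: G → O ← A; O → O∗ ← R]

    How does that independence help?

    $$\begin{aligned} P(O, G, A) &= P(O, G, A \mid R = 1) && \text{because } O, G, A \perp\!\!\!\perp R \\ &= P(O^*, G, A \mid R = 1) && \text{by the def. of } O^* \end{aligned}$$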
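The proof can be checked informally by simulation. This is a sketch under made-up parameters (the age, gender, obesity and reporting probabilities are all hypothetical): reporting is drawn independently of everything else, and the cell frequencies among reported cases are compared with those among all cases.

```python
import random
from collections import Counter

def simulate_first_problem(n=100_000, p_report=0.7, seed=0):
    """Simulate A, G -> O with R independent of (O, G, A)."""
    rng = random.Random(seed)
    everyone, reported = Counter(), Counter()
    for _ in range(n):
        a = int(rng.random() < 0.4)                       # older
        g = int(rng.random() < 0.5)
        o = int(rng.random() < 0.2 + 0.2 * a + 0.1 * g)   # obesity depends on A, G
        r = int(rng.random() < p_report)                  # R ignores O, G, A
        everyone[(o, g, a)] += 1
        if r == 1:
            reported[(o, g, a)] += 1
    n_reported = sum(reported.values())
    p_all = {cell: everyone[cell] / n for cell in everyone}
    p_reported = {cell: reported[cell] / n_reported for cell in everyone}
    return p_all, p_reported

p_all, p_reported = simulate_first_problem()
max_gap = max(abs(p_all[cell] - p_reported[cell]) for cell in p_all)
```

Here `max_gap` should shrink towards zero as `n` grows, which is the simulation counterpart of $P(O, G, A) = P(O^*, G, A \mid R = 1)$.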
  13. MISSING DATA

    Second problem

    [Diagram: G → O ← A; A → R; O → O∗ ← R]

    Age makes people less likely to report obesity
    → Can we just use the cases we observe to learn about $P(O, G, A)$? Here no, because $O, G, A \not\perp\!\!\!\perp R$

    Happily it is the case that $O, G \perp\!\!\!\perp R \mid A$, because $A$ d-separates $R$ from $O$ and $G$

    How does that conditional independence help?

    $$\begin{aligned} P(O, G, A) &= P(O, G \mid A)\, P(A) && \text{conditional prob.} \\ &= P(O, G \mid A, R = 1)\, P(A) && O, G \perp\!\!\!\perp R \mid A \\ &= P(O^*, G \mid A, R = 1)\, P(A) && \text{by the def. of } O^* \end{aligned}$$

    → Compute $P(A)$
    → Compute $P(O^*, G \mid A, R = 1)$ where $O$ is observed

    then combine them for whatever your purposes are
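The two-step recipe can be verified exactly on a small discrete example. The sketch below uses hypothetical probabilities and treats $P(A)$ as known, since $A$ is fully observed; it compares the reported-cases-only distribution and the adjusted one against the true joint.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_A = {0: 0.6, 1: 0.4}            # P(A): 0 = younger, 1 = older
P_G = {0: 0.5, 1: 0.5}            # P(G)
P_R1_GIVEN_A = {0: 0.9, 1: 0.5}   # P(R = 1 | A): older people report less

def p_obese(g, a):
    """P(O = 1 | G = g, A = a)."""
    return 0.2 + 0.1 * g + 0.2 * a

def joint(o, g, a):
    """True P(O = o, G = g, A = a)."""
    p = p_obese(g, a)
    return P_A[a] * P_G[g] * (p if o == 1 else 1.0 - p)

CELLS = list(product([0, 1], repeat=3))   # all (o, g, a) combinations

def complete_case():
    """P(O, G, A | R = 1): what the reported cases alone suggest."""
    weights = {(o, g, a): joint(o, g, a) * P_R1_GIVEN_A[a] for o, g, a in CELLS}
    total = sum(weights.values())          # P(R = 1)
    return {cell: w / total for cell, w in weights.items()}

def adjusted():
    """P(O*, G | A, R = 1) P(A): stratify on A, then reweight by P(A)."""
    result = {}
    for o, g, a in CELLS:
        reported_in_stratum = sum(         # P(A = a, R = 1)
            joint(o2, g2, a) * P_R1_GIVEN_A[a]
            for o2, g2 in product([0, 1], repeat=2)
        )
        conditional = joint(o, g, a) * P_R1_GIVEN_A[a] / reported_in_stratum
        result[(o, g, a)] = conditional * P_A[a]
    return result

truth = {cell: joint(*cell) for cell in CELLS}
naive_gap = max(abs(complete_case()[c] - truth[c]) for c in CELLS)
adjusted_gap = max(abs(adjusted()[c] - truth[c]) for c in CELLS)
```

With these numbers the complete-case distribution over-represents younger people, while the adjusted version matches the true joint up to floating-point error.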
  14. MISSING DATA

    Third problem

    [Diagram: G → O ← A; O → R; O → O∗ ← R]

    Obese people are less likely to report obesity
    → Can we just use the cases we observe to learn about $P(O, G, A)$? No. There is nothing to d-separate $R$ from the rest

    In the terminology of Little & Rubin and Rubin:
    → Our first problem is ‘Missing completely at random’ (MCAR)
    → Our second problem is ‘Missing at Random’ (MAR)
    → Our third problem is ‘Missing not at Random’ (MNAR)

    Most previous work prays for MCAR or tries to turn MNAR into MAR and uses multiple imputation
    → See Pepinsky for a cautionary note on multiple imputation
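To see why stratifying on $A$ no longer helps in the third problem, here is a small exact calculation. It is a sketch with hypothetical probabilities, and it leaves out $G$ for brevity: reporting depends on $O$ itself, so the obesity rate among reported cases is too low within every age group.

```python
# Hypothetical numbers, chosen only for illustration.
P_A = {0: 0.6, 1: 0.4}             # P(A): 0 = younger, 1 = older
P_O1_GIVEN_A = {0: 0.25, 1: 0.45}  # P(O = 1 | A)
P_R1_GIVEN_O = {0: 0.9, 1: 0.5}    # P(R = 1 | O): obese people report less

def reported_obesity_rate(a):
    """P(O = 1 | A = a, R = 1), i.e. the rate among reported cases."""
    obese_and_reported = P_O1_GIVEN_A[a] * P_R1_GIVEN_O[1]
    not_obese_and_reported = (1.0 - P_O1_GIVEN_A[a]) * P_R1_GIVEN_O[0]
    return obese_and_reported / (obese_and_reported + not_obese_and_reported)

# True marginal obesity rate P(O = 1).
true_rate = sum(P_A[a] * P_O1_GIVEN_A[a] for a in P_A)

# The stratify-on-A recipe that worked when only A affected R.
stratified_rate = sum(P_A[a] * reported_obesity_rate(a) for a in P_A)

# Bias remains inside each age group, not just in the aggregate.
within_group_bias = {a: reported_obesity_rate(a) - P_O1_GIVEN_A[a] for a in P_A}
```

Because the bias appears inside each stratum, no reweighting by $P(A)$ can remove it; the missingness mechanism itself would have to be modelled.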
  15. PRACTICAL IMPLICATIONS

    What to do about missing data?
    → Use the fully observed cases, a.k.a. listwise deletion
    → Build a model conditioned on $G$ and $A$ and use it to impute missing values of $O$, a.k.a. multiple imputation
    → Model the missingness mechanism, e.g. $O \to R$ in the third problem

    Which DAG holds determines which of these works and which is biased

    Real data plus data scientist
  18. REFERENCES

    Agan, A., & Starr, S. Ban the box, criminal records, and racial discrimination: A field experiment. The Quarterly Journal of Economics.

    Coates, J. M., Gurnell, M., & Rustichini, A. Second-to-fourth digit ratio predicts success among high-frequency financial traders. Proceedings of the National Academy of Sciences.

    D’Acunto, F. Identity, overconfidence, and investment decisions. SSRN Electronic Journal.

    Holland, P. W. Statistics and causal inference. Journal of the American Statistical Association.

    Little, R. J. A., & Rubin, D. B. Statistical analysis with missing data. John Wiley & Sons.

    Mohan, K., & Pearl, J. Graphical models for processing missing data. Journal of the American Statistical Association.

    Pearl, J. Does obesity shorten life? Or is it the soda? On non-manipulable causes. Journal of Causal Inference.

    Pepinsky, T. A note on listwise deletion versus multiple imputation.

    Rubin, D. B. Inference and missing data. Biometrika.
  19. REFERENCES

    Sapienza, P., Zingales, L., & Maestripieri, D. Gender differences in financial risk aversion and career choices are affected by testosterone. Proceedings of the National Academy of Sciences.

    Sen, M., & Wasow, O. Race as a bundle of sticks: Designs that estimate effects of seemingly immutable characteristics. Annual Review of Political Science.

    Willer, R., Rogalin, C. L., Conlon, B., & Wojnowicz, M. T. Overdoing gender: A test of the masculine overcompensation thesis. American Journal of Sociology.