Causal Inference 2022 Week 7

Will Lowe
March 28, 2022

Transcript

  1. PLAN

    → Why care about collider bias?
    → ‘Explaining away’ in probabilities (and in your head)
    → Album charts
    → Selection on the dependent variable
    → Learning in (and from) social networks
    → Administrative data problems
  2. COLLIDERS EVERYWHERE

    The social sciences have huge numbers of names for only apparently distinct forms of bias (Hernán et al.).

    [Figure: DAG with nodes X, Z, Y]
    → Confounding bias
    → (Only for economists: selection bias / endogeneity)

    [Figure: DAG with nodes X, Z, Y]
    → Bad controls (Angrist & Pischke)
    → Post-treatment bias

    [Figure: DAG with nodes X, Z, Y]
    → Sample selection bias
    → Ascertainment bias
    → Truncation bias
    → Selecting on the dependent variable
    → Attrition bias
  3. EXPLAINING AWAY

    [Figure: DAG with nodes R, I, B]

    Physically, the brightness B of a surface is a combination of
    → surface illumination (I, larger when not in shadow)
    → intrinsic reflectance (R, larger for lighter colours)

    So what to conclude about reflectance?

    $P(R \mid B, I) \propto \underbrace{P(B \mid I, R)}_{\text{mechanism}} \, \underbrace{P(I, R)}_{\text{environment}}$

    → $B_A = B_B = b$
    → $I_B < I_A$ because surface B is in shadow

    Conditional on this information,

    $P(R_A \mid B_A = b, I_A = \text{high}) < P(R_B \mid B_B = b, I_B = \text{low})$
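
The proportionality above can be worked through by direct enumeration. The sketch below is only an illustration: the probability tables are hypothetical, and it assumes illumination and reflectance are independent a priori, so that P(I, R) = P(I) P(R). It compares the posterior probability of a light surface for two surfaces with the same observed brightness b, one fully lit and one in shadow.

```python
# Hypothetical numbers, chosen only to illustrate explaining away.
P_I = {"low": 0.5, "high": 0.5}      # P(I): surface in shadow vs. lit
P_R = {"dark": 0.5, "light": 0.5}    # P(R): intrinsic reflectance
# Mechanism P(B = b | I, R): probability of seeing one particular
# mid-grey brightness b for each combination of causes.
P_B_GIVEN = {
    ("low", "dark"): 0.05,
    ("low", "light"): 0.60,
    ("high", "dark"): 0.60,
    ("high", "light"): 0.05,
}


def posterior_light(illumination):
    """Return P(R = light | B = b, I = illumination) via Bayes' theorem."""
    joint = {
        r: P_B_GIVEN[(illumination, r)] * P_I[illumination] * P_R[r]
        for r in P_R
    }
    return joint["light"] / sum(joint.values())


p_lit = posterior_light("high")    # surface A: brightness b, fully lit
p_shadow = posterior_light("low")  # surface B: brightness b, in shadow
print(f"P(light | b, lit)    = {p_lit:.3f}")
print(f"P(light | b, shadow) = {p_shadow:.3f}")
```

With these made-up tables, knowing that a surface is in shadow "explains away" its low illumination, so the same brightness points to a lighter surface.
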
  4. THREE WAYS TO CONDITION ON A COLLIDER

    (Elwert & Winship)

    C is the collider
    → Conditioning on C generates non-causal association between its causes
    → Conditioning on any consequence of C generates non-causal association between C’s causes

    [Figure: scatterplot of Mathematics and Reading/Writing scores, with points marked by Admission (No/Yes)]
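
A small simulation can make the admissions picture concrete. This is a sketch under stated assumptions, not the data behind the slide: the two scores are drawn as independent standard normals, and the admission rule (sum of scores above 1.0) is hypothetical. Admission is the collider, and conditioning on it induces a negative association between its two causes.

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)
n = 20000
maths = [rng.gauss(0, 1) for _ in range(n)]
verbal = [rng.gauss(0, 1) for _ in range(n)]  # independent of maths

# Hypothetical admission rule: admitted when the combined score is high.
admitted = [m + v > 1.0 for m, v in zip(maths, verbal)]

r_all = corr(maths, verbal)
r_admitted = corr(
    [m for m, a in zip(maths, admitted) if a],
    [v for v, a in zip(verbal, admitted) if a],
)
print(f"everyone:      r = {r_all:+.2f}")
print(f"admitted only: r = {r_admitted:+.2f}")
```

The scores are unrelated in the full population but clearly negatively correlated among the admitted, even though neither score affects the other.
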
  5. EVEN IF YOU’RE NOT LOOKING

    A reminder of our old friend from an earlier week.

    [Figure: DAG with nodes X, Y, Z, $\epsilon_Y$]
  6. EVEN IF YOU’RE NOT LOOKING

    A reminder of our old friend from an earlier week.

    [Figure: DAG with nodes X, Y, Z, $\epsilon_Y$, S]

    Recall: $\epsilon_Y$ represents all the other causes of Y (except Z) that don’t also cause X.

    Controlling for Z was good
    → until we selected on S

    $\epsilon_Y$ is
    → Unconditionally balanced across values of X (and Z)
    → Unbalanced, conditional on S

    In regression speak
    → “The errors are correlated with independent variables”

    We care because (OLS) model residuals $r = Y - \hat{Y}$ are always uncorrelated with X:

    $Y_i = E[Y_i \mid X_i, Z_i] + \epsilon_i$

    so when r no longer tracks $\epsilon$ we’re going to be wrong about the expectation (and causation).
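
The claim that selection unbalances the errors can be checked numerically. The sketch below simplifies the slide's setup by leaving Z out, and every number in it is an assumption: a hypothetical true effect of 2.0 and a selection node S that keeps units with Y above 1.0. It compares the OLS slope and the correlation between X and the errors, before and after selection.

```python
import random


def cov(a, b):
    """Sample covariance (dividing by n) of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n


rng = random.Random(2022)
n = 20000
TRUE_EFFECT = 2.0  # hypothetical causal effect of X on Y

x = [rng.gauss(0, 1) for _ in range(n)]
eps = [rng.gauss(0, 1) for _ in range(n)]  # other causes of Y, independent of X
y = [TRUE_EFFECT * xi + ei for xi, ei in zip(x, eps)]

# Hypothetical selection node S: only units with a large outcome are kept.
keep = [yi > 1.0 for yi in y]
x_sel = [v for v, k in zip(x, keep) if k]
eps_sel = [v for v, k in zip(eps, keep) if k]
y_sel = [v for v, k in zip(y, keep) if k]

slope_all = cov(x, y) / cov(x, x)
slope_sel = cov(x_sel, y_sel) / cov(x_sel, x_sel)
corr_all = cov(x, eps) / (cov(x, x) * cov(eps, eps)) ** 0.5
corr_sel = cov(x_sel, eps_sel) / (cov(x_sel, x_sel) * cov(eps_sel, eps_sel)) ** 0.5

print(f"full sample:     slope = {slope_all:.2f}, corr(X, eps) = {corr_all:+.2f}")
print(f"selected sample: slope = {slope_sel:.2f}, corr(X, eps) = {corr_sel:+.2f}")
```

In the full sample the slope sits near the assumed effect and the errors are balanced across X. In the selected sample the errors are negatively correlated with X, and the slope is pulled well below the true value.
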
  7. THE NATURE OF SELECTION

    Conditioning involves at least one of:
    → making C an explanatory variable in a regression model
    → selecting and analyzing data where C = k (or C > k)

    In the latter case C either is, or drives, a sample selection node.

    Administrative data is almost always the result of a selection process
    → Where are the selection nodes?
    → What is selected on and what comes along with it as correlates?

    This is evolutionary theory’s distinction between ‘selection for’ and ‘selection of’ (Sober).
  8. AND WHEN THERE IS ALSO AN EFFECT

    Why are albums on Rolling Stone’s Best Albums (R = 1) less likely to top the Billboard charts (B = 1)? (Elwert & Winship)

    → Rolling Stone’s best albums
    → albums from the Billboard charts
    → S is a sample selection indicator

    → causal association between B and R
    → non-causal association between B and R due to conditioning on S
    → B = 1 ‘explains away’ R = 1 (and vice versa)

    The sign of the final association depends on all three causal effects.

    When B → S and R → S share a sign, the non-causal association added to B → R is negative; otherwise it is positive.
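
The sign argument can be checked with exact arithmetic on a toy population. All probabilities below are hypothetical, and the selection rule (an album enters the data if it appears on either list) is an assumption made for illustration. The sketch compares the association between B and R in the whole population with the association in the selected sample.

```python
from itertools import product

# Hypothetical numbers, for illustration only.
P_B1 = 0.10                        # P(B = 1): album topped the Billboard chart
P_R1_GIVEN_B = {0: 0.05, 1: 0.25}  # P(R = 1 | B): positive effect of B on R


def in_sample(b, r):
    """P(S = 1 | B, R): an album is in the data if it is on either list."""
    return 1.0 if (b == 1 or r == 1) else 0.0


def risk_difference(selection=None):
    """P(R=1 | B=1) - P(R=1 | B=0), optionally within the selected sample."""
    weight = {}
    for b, r in product((0, 1), repeat=2):
        p_b = P_B1 if b else 1.0 - P_B1
        p_r = P_R1_GIVEN_B[b] if r else 1.0 - P_R1_GIVEN_B[b]
        p_s = 1.0 if selection is None else selection(b, r)
        weight[(b, r)] = p_b * p_r * p_s
    p_r1_b1 = weight[(1, 1)] / (weight[(1, 0)] + weight[(1, 1)])
    p_r1_b0 = weight[(0, 1)] / (weight[(0, 0)] + weight[(0, 1)])
    return p_r1_b1 - p_r1_b0


rd_population = risk_difference()
rd_selected = risk_difference(in_sample)
print(f"all albums:      {rd_population:+.2f}")
print(f"selected albums: {rd_selected:+.2f}")
```

Here both B and R raise the chance of selection, so the added non-causal association is negative. In this example it is large enough to flip the sign of a positive causal association.
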
  9. FRIENDS WITHOUT BENEFITS

    (Christakis & Fowler)

    → Why do your friends seem to have more friends than you?
    → Do fat friends make you fat?
    → Is smoking contagious?

    You just suck? No, a randomly chosen friend of yours just has higher network centrality than a randomly chosen person.

    → Not everyone is on a social network
    → Not everyone is friends with everyone else

    This can mean trouble for causal inference.
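
The point about friends having more friends can be verified on a tiny network. The five-person friendship graph below is hypothetical. "A randomly chosen friend" is taken here to mean a random end of a random friendship, which is one of several possible definitions.

```python
from collections import defaultdict

# A small hypothetical friendship network (undirected edges).
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

friends = defaultdict(set)
for u, v in EDGES:
    friends[u].add(v)
    friends[v].add(u)

degree = {person: len(fs) for person, fs in friends.items()}

# Average number of friends of a randomly chosen person.
mean_degree = sum(degree.values()) / len(degree)

# Average number of friends of a friend, taken over every (person, friend)
# pair, so that well-connected people are counted once per friendship.
friend_degrees = [degree[f] for fs in friends.values() for f in fs]
mean_friend_degree = sum(friend_degrees) / len(friend_degrees)

print(f"mean friends per person: {mean_degree:.1f}")
print(f"mean friends per friend: {mean_friend_degree:.1f}")
```

People with many friends show up in many friend lists, so the friend-level average exceeds the person-level average whenever degrees differ.
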
  10. ANTISOCIAL NETWORKS

    Nothing about conflict remains local in space (or time; Ghobarah et al.).

    Optional treatments, e.g. information on how to apply for a business loan
    → Treatment assignment: we tell you how to apply for a loan
    → Not treatment assignment: you tell your friends or family members and they apply too

    Now some of your ‘untreated’ friends are treated
    → This ‘non-compliance’ is a SUTVA violation
    → Typically, treatment assignment functions have to model spillover (see, e.g. Aronow et al.)
  11. IDENTIFYING PEER PRESSURE

    Example from Shalizi and Thomas (see also the blog precis).

    Your friend Joey jumped off a bridge, why would you jump too?
    → Joey inspires you: social contagion or influence
    → Joey infects you with a parasite which suppresses fear of falling: actual contagion
    → You’re friends because you both like to jump off bridges: manifest homophily
    → You’re friends because you both like roller-coasters, and have a common risk-seeking propensity: latent homophily
    → Because you’re both on it when it starts collapsing and that’s the only way off: external causation

    How about... not?
  12. CONTAGION VS HOMOPHILY

    You (i in blue) and Joey (j in black) have
    → current observed behaviour ($Y_{it}$, $Y_{jt}$)
    → past observed behaviour ($Y_{i,t-1}$, $Y_{j,t-1}$)
    → unobserved characteristics ($U_i$, $U_j$)
    → friendship indicator, e.g. $F_{ij}$, being Facebook friends

    Identify just the causal effect of Joey jumping $Y_{j,t-1}$ on whether you jump $Y_{it}$ from social network data.

    (Non-parametric) causal inference is not possible.

    [Figure: DAG with nodes $Y_{it}$, $Y_{jt}$, $Y_{i,t-1}$, $Y_{j,t-1}$, $U_i$, $U_j$, $F_{ij}$ (simplified from Shalizi & Thomas)]
  13. ASIDE: M-BIAS

    The basic structural problem we have here is similar to m-bias in regression.

    Z is correlated with X and correlated with Y
    → Nevertheless, conditioning on Z will wreck identification

    [Figure: DAG with nodes X, Z, Y, U, V]

    Conditioning on Z stops it confounding
    → but adds some collider bias

    Who knows how big each is?

    [Figure: DAG with nodes X, Z, Y, U, V]
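
M-bias is easy to reproduce in simulation. The sketch below assumes one particular linear version of the M structure (U → X, U → Z ← V, V → Y, plus X → Y) with unit coefficients and a hypothetical true effect of 1.0. None of these values come from the slides. It compares the coefficient on X with and without Z in the regression.

```python
import random


def cov(a, b):
    """Sample covariance (dividing by n) of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n


rng = random.Random(14)
n = 20000
TRUE_EFFECT = 1.0  # hypothetical causal effect of X on Y

u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved
v = [rng.gauss(0, 1) for _ in range(n)]  # unobserved
x = [ui + rng.gauss(0, 1) for ui in u]                   # U -> X
z = [ui + vi + rng.gauss(0, 1) for ui, vi in zip(u, v)]  # U -> Z <- V
y = [TRUE_EFFECT * xi + vi + rng.gauss(0, 1) for xi, vi in zip(x, v)]  # X -> Y <- V

# Simple regression of Y on X alone.
b_unadjusted = cov(x, y) / cov(x, x)

# Coefficient on X in the regression of Y on X and Z, from the
# two-regressor normal equations.
sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
sxy, szy = cov(x, y), cov(z, y)
b_adjusted = (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)

print(f"Y ~ X:     {b_unadjusted:.2f}")
print(f"Y ~ X + Z: {b_adjusted:.2f}")
```

Z is correlated with both X and Y, yet leaving it out recovers the assumed effect while including it does not. Under these particular coefficients the adjusted estimate settles near 0.8.
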
  14. POST TREATMENT BIAS

    The role of suspect race in stylized Stop-question-frisk
    → Race (R)
    → Stopped by police (S)
    → Use of force (F)
    → Questioning (Q) which generates a report a.k.a. our data set

    [Figure: DAG with nodes R, S, F]
  16. POST TREATMENT BIAS

    [Figure: DAG with nodes R, S, Q, F]

    The causal effect of race on use of force

    → Should we condition on S? Why (or why not)?
    → Do we condition on S?
    → Could we not condition on S if we wanted to?

    → Q measures S: $P(Q = 1 \mid S = 1)$
    → If it measures it well, then it’s almost as good as observing S directly

    [Figure: measurement model with S, indicators $Q_1$, $Q_2$, $Q_3$ and errors $\epsilon_1$, $\epsilon_2$, $\epsilon_3$]

    Use Bayes’ Theorem to compute $E[S \mid Q_1, Q_2, Q_3]$.
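
For a binary S, the expectation E[S | Q1, Q2, Q3] is just the posterior probability that S = 1. The sketch below computes it under assumptions that are not in the slides: the three indicators are conditionally independent given S, and the prior, sensitivity and false-positive rate are hypothetical values.

```python
PRIOR_S = 0.3         # hypothetical P(S = 1)
SENSITIVITY = 0.9     # hypothetical P(Q_k = 1 | S = 1)
FALSE_POSITIVE = 0.1  # hypothetical P(Q_k = 1 | S = 0)


def likelihood(qs, p_one):
    """P(Q_1..Q_K = qs | S) when each Q_k = 1 independently with prob p_one."""
    result = 1.0
    for q in qs:
        result *= p_one if q == 1 else 1.0 - p_one
    return result


def expected_s(qs):
    """E[S | Q_1, ..., Q_K] = P(S = 1 | Q_1, ..., Q_K) for binary S."""
    numerator = PRIOR_S * likelihood(qs, SENSITIVITY)
    denominator = numerator + (1.0 - PRIOR_S) * likelihood(qs, FALSE_POSITIVE)
    return numerator / denominator


for pattern in [(1, 1, 1), (1, 1, 0), (0, 0, 0)]:
    print(pattern, round(expected_s(pattern), 4))
```

When the indicators measure S well and agree with each other, the posterior is close to 0 or 1, which is the sense in which good measurement is almost as good as observing S directly.
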
  17. POST TREATMENT BIAS: POLICING

    → Behaviour (B) causes stops (S) and the use of force (F), conditional on S
    → Unmeasured animosity (A) causes stops and the use of force (F), conditional on S

    [Figure: DAG with nodes R, S, Q, F, A, B]

    Where is the collider bias?

    → An important estimand that’s harder than expected to define and estimate
    → Direct experimentation would be unethical
    → Multiple paths to an undesirable outcome (mediation)
    → Not everything can be measured (IV)
    → Data available only on part of the process (conditioning)
    → Sample selection built into the data generation process
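
One way to see where collider bias can enter is to work through a toy version of this structure exactly. Everything in the sketch is hypothetical: R and A are binary and independent in the population, both raise the probability of a stop S, and both raise the probability of force F among those stopped. Behaviour (B) is left out for brevity. Because the data only contain stops, every comparison is conditional on the collider S.

```python
# All numbers are hypothetical and chosen only for illustration.
P_A1 = 0.3           # P(A = 1): unmeasured animosity, independent of R
DIRECT_EFFECT = 0.1  # effect of R on force among the stopped, holding A fixed


def p_stop(r, a):
    """P(S = 1 | R = r, A = a): race and animosity both raise stop rates."""
    return 0.1 + 0.3 * r + 0.4 * a


def p_force(r, a):
    """P(F = 1 | S = 1, R = r, A = a)."""
    return 0.1 + DIRECT_EFFECT * r + 0.3 * a


def force_rate_among_stopped(r):
    """P(F = 1 | S = 1, R = r), averaging over A among those stopped."""
    weights = {a: (P_A1 if a else 1.0 - P_A1) * p_stop(r, a) for a in (0, 1)}
    total = sum(weights.values())
    return sum(w / total * p_force(r, a) for a, w in weights.items())


naive_difference = force_rate_among_stopped(1) - force_rate_among_stopped(0)
print(f"effect holding A fixed:      {DIRECT_EFFECT:.3f}")
print(f"naive gap in the stop data:  {naive_difference:.3f}")
```

Among the stopped, R and A become negatively associated, so in this example the naive comparison in the stop records understates the effect that was built in.
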
  19. POST TREATMENT BIAS: HEALTH STATISTICS

    → Infection (R) causes hospitalization (S), but so do smoking (B) and diet (A)
    → All of these cause intensive care (F)

    [Figure: DAG with nodes R, S, Q, F, A, B (Griffith et al.)]

    → An important estimand that’s harder than expected to define and estimate
    → Direct experimentation would be unethical
    → Multiple paths to an undesirable outcome (mediation)
    → Not everything can be measured (IV)
    → Data available only on part of the process (conditioning)
    → Sample selection built into the data generation process
  20. MORE COLLISIONS

    There is only one good cartoon about collider bias, and this is it.

    [Figure: cartoon]

    → Why would it be rational to respond poorly?
  21. REFERENCES

    Angrist, J. D., & Pischke, J.-S. Mostly harmless econometrics: An empiricist’s companion. Princeton University Press.
    Aronow, P. M., Eckles, D., Samii, C., & Zonszein, S. Spillover effects in experimental data (arXiv preprint).
    Christakis, N. A., & Fowler, J. H. The spread of obesity in a large social network over 32 years. New England Journal of Medicine.
    Elwert, F., & Winship, C. Endogenous selection bias: The problem of conditioning on a collider variable. Annual Review of Sociology.
    Ghobarah, H. A., Huth, P., & Russett, B. Civil wars kill and maim people – long after the shooting stops. The American Political Science Review.
    Griffith, G., Morris, T. T., Tudball, M., Herbert, A., Mancano, G., Pike, L., Sharp, G. C., Palmer, T. M., Davey Smith, G., Tilling, K., Zuccolo, L., Davies, N. M., & Hemani, G. Collider bias undermines our understanding of COVID-19 disease risk and severity (preprint).
    Hernán, M. A., Hernández-Díaz, S., & Robins, J. M. A structural approach to selection bias. Epidemiology.
  22. REFERENCES

    Shalizi, C. R., & Thomas, A. C. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research.
    Sober, E. The nature of selection. Chicago University Press.
    Sober, E. Natural selection, causality, and laws: What Fodor and Piattelli-Palmarini got wrong. Philosophy of Science.