Will Lowe
March 28, 2022

# Causal Inference 2022 Week 7

## Transcript

1. ### CAUSAL INFERENCE: Collider Bias

Will Lowe, Hertie School Data Science Lab, 2022-04-05
2. ### PLAN

- Why care about collider bias?
- ‘Explaining away’ in probabilities (and in your head)
- Album charts
- Selection on the dependent variable
- Learning in (and from) social networks
- Administrative data problems
3. ### COLLIDERS EVERYWHERE

The social sciences have huge numbers of names for only apparently distinct forms of bias (Hernán et al.):

- Z as a confounder of X and Y: confounding bias (only for economists: selection bias / endogeneity)
- Z as a mediator between X and Y: bad controls (Angrist & Pischke), post-treatment bias
- Z as a collider: sample selection bias, ascertainment bias, truncation bias, selecting on the dependent variable, attrition bias

5. ### EXPLAINING AWAY

Physically, the brightness B of a surface is a combination of
- surface illumination (I, larger when not in shadow)
- intrinsic reflectance (R, larger for lighter colours)

so B is a collider: R → B ← I.

So what to conclude about reflectance?

$$P(R \mid B, I) \propto \underbrace{P(B \mid I, R)}_{\text{mechanism}}\; \underbrace{P(I, R)}_{\text{environment}}$$

For two surfaces, labelled A and B:
- $B_A = B_B = b$
- $I_B < I_A$, because surface B is in shadow

Conditional on this information,

$$P(R_A \mid B_A = b, I_A = \text{high}) < P(R_B \mid B_B = b, I_B = \text{low})$$
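The explaining-away inequality can be checked numerically. The sketch below is a hypothetical discrete version of the model: three equally likely reflectance levels that are independent of illumination, and brightness observed as $R \times I$ plus Gaussian noise. The levels, the multiplicative mechanism, the noise scale, and the function names are all illustrative assumptions, not part of the slide.

```python
import math

# Illustrative assumptions: three reflectance levels, equally likely a priori
# and independent of illumination, so P(I, R) = P(I) P(R).
REFLECTANCES = [0.2, 0.5, 0.8]
NOISE_SD = 0.05  # assumed sd of Gaussian noise on observed brightness


def likelihood(b, r, i, sd=NOISE_SD):
    """Unnormalized P(B = b | I = i, R = r): Gaussian around the assumed mechanism B = R * I."""
    return math.exp(-((b - r * i) ** 2) / (2 * sd ** 2))


def posterior_reflectance(b, i):
    """P(R | B = b, I = i) via Bayes' rule; P(I = i) cancels on normalization."""
    prior = 1 / len(REFLECTANCES)
    weights = [likelihood(b, r, i) * prior for r in REFLECTANCES]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean_reflectance(b, i):
    return sum(r * p for r, p in zip(REFLECTANCES, posterior_reflectance(b, i)))


b = 0.4
lit, shadow = 1.0, 0.5
print("E[R | B = b, lit]    =", round(posterior_mean_reflectance(b, lit), 3))
print("E[R | B = b, shadow] =", round(posterior_mean_reflectance(b, shadow), 3))
```

Both surfaces have the same brightness, but the one in shadow is inferred to be lighter: low illumination "explains away" its dimness.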
6. ### THREE WAYS TO CONDITION ON A COLLIDER

(Elwert & Winship) C is the collider:
- Conditioning on C generates non-causal association between its causes
- Conditioning on any consequence of C generates non-causal association between C’s causes

[Figure: Mathematics vs. Reading/Writing scores, by Admission (No/Yes)]
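A quick simulation makes the admissions picture concrete. As assumptions for illustration only, mathematics and reading/writing scores are independent standard normals, the collider C is their sum, admission means C > 1, and a hypothetical later score is a noisy consequence of C. Selecting on C, or on its consequence, induces a negative correlation between two scores that are independent in the full population.

```python
import random


def corr(xs, ys):
    """Pearson correlation, computed by hand to stay dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=40_000, threshold=1.0, seed=42):
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        maths, reading = rng.gauss(0, 1), rng.gauss(0, 1)  # independent causes
        c = maths + reading                 # collider: caused by both scores
        descendant = c + rng.gauss(0, 1)    # a noisy consequence of the collider
        people.append((maths, reading, c, descendant))

    def corr_where(keep):
        rows = [p for p in people if keep(p)]
        return corr([p[0] for p in rows], [p[1] for p in rows])

    return {
        "everyone": corr_where(lambda p: True),
        "admitted (C > k)": corr_where(lambda p: p[2] > threshold),
        "selected on a consequence of C": corr_where(lambda p: p[3] > threshold),
    }


for label, r in simulate().items():
    print(f"{label:32s} corr(maths, reading) = {r:+.2f}")
```

Selecting on the noisy consequence produces a weaker, but still clearly negative, association than selecting on C itself.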
8. ### EVEN IF YOU’RE NOT LOOKING

A reminder of our old friend from an earlier week: the graph over X, Y, Z and $\epsilon_Y$, now with a selection node S.

Recall: $\epsilon_Y$ represents all the other causes of Y (except Z) that don’t also cause X.

Controlling for Z was good, until we selected on S. Then $\epsilon_Y$ is
- unconditionally balanced across values of X (and Z)
- unbalanced, conditional on S

In regression speak: “the errors are correlated with the independent variables”.

We care because (OLS) model residuals $r = Y - \hat{Y}$ are always uncorrelated with X, whereas the model is

$$Y_i = E[Y_i \mid X_i, Z_i] + \epsilon_i$$

so when r no longer tracks $\epsilon$, we are going to be wrong about the expectation (and causation).
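The regression version of the problem can be simulated. This sketch assumes linear Gaussian structural equations with a true effect of X on Y of 0.5, and a selection rule in which S depends on X and $\epsilon_Y$; that rule is one hypothetical way of wiring S into the graph. Controlling for Z recovers the effect in the full data but not among the selected units.

```python
import random


def cov(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def ols_two(y, x1, x2):
    """Slopes (b1, b2) from regressing y on x1 and x2 with an intercept (2x2 normal equations)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n=20_000, effect=0.5, seed=7):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)             # confounder of X and Y
        x = z + rng.gauss(0, 1)         # treatment
        eps_y = rng.gauss(0, 1)         # other causes of Y, independent of X and Z
        y = effect * x + z + eps_y
        selected = x + eps_y > 0        # assumed selection rule: S depends on X and eps_Y
        rows.append((x, z, y, selected))
    return rows


def effect_of_x(rows):
    """Coefficient on X from regressing Y on X and Z."""
    xs, zs, ys = zip(*[(x, z, y) for x, z, y, _ in rows])
    return ols_two(ys, xs, zs)[0]


rows = simulate()
full = effect_of_x(rows)
selected_only = effect_of_x([r for r in rows if r[3]])
print(f"controlling for Z, full data:     {full:.2f}")
print(f"controlling for Z, selected on S: {selected_only:.2f}")
```

Within the selected units, X and $\epsilon_Y$ become negatively associated, so the OLS residuals no longer track $\epsilon$ and the coefficient on X is pulled well below 0.5.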
9. ### THE NATURE OF SELECTION

Conditioning involves at least one of:
- making C an explanatory variable in a regression model
- selecting and analyzing data where $C = k$ (or $C > k$)

In the latter case, C either is, or drives, a sample selection node.

Administrative data is almost always the result of a selection process:
- Where are the selection nodes?
- What is selected on, and what comes along with it as correlates?

This is evolutionary theory’s distinction between ‘selection for’ and ‘selection of’ (Sober, *The nature of selection*; Sober, “Natural selection, causality, and laws”).
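The first route, putting a collider into a regression, can be sketched the same way. The setup is an assumption for illustration: X has no effect on Y, and C is caused by both (C = X + Y + noise). Regressing Y on X alone gives a slope near zero; adding C as a control manufactures a negative slope, about −0.5 under these particular equations.

```python
import random


def cov(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def slope_alone(y, x):
    """Slope from regressing y on x with an intercept."""
    return cov(x, y) / cov(x, x)


def slope_with_control(y, x, c):
    """Coefficient on x from regressing y on x and c with an intercept (2x2 normal equations)."""
    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    sxy, scy = cov(x, y), cov(c, y)
    return (scc * sxy - sxc * scy) / (sxx * scc - sxc ** 2)


rng = random.Random(3)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]                   # no effect of X on Y
c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]  # collider of X and Y

print(f"Y on X:     {slope_alone(y, x):+.2f}")
print(f"Y on X + C: {slope_with_control(y, x, c):+.2f}")
```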
10. ### AND WHEN THERE IS ALSO AN EFFECT

Why are albums on Rolling Stone’s Best Albums list (R = 1) less likely to top the Billboard charts (B = 1)? (Elwert & Winship)

The data combine
- Rolling Stone’s best albums
- albums from the Billboard charts

so S, a sample selection indicator, is a collider between R and B.

The association between B and R in the sample is then made up of
- a causal association between B and R
- a non-causal association between B and R due to conditioning on S: B = 1 ‘explains away’ R = 1 (and vice versa)

The sign of the final association depends on all three causal effects. When B → S and R → S share a sign, conditioning on S adds a negative non-causal association to B → R; otherwise, a positive one.
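Because all the variables here are binary, the album example can be worked exactly rather than simulated. The probabilities below are invented for illustration; the structural assumptions are a positive effect B → R and a sample that contains an album if it is on either list, so S = 1 when B = 1 or R = 1.

```python
from itertools import product

# Hypothetical numbers, chosen only to illustrate the mechanism.
P_B = 0.05                        # P(B = 1): the album tops the Billboard chart
P_R_GIVEN_B = {0: 0.02, 1: 0.10}  # P(R = 1 | B): positive causal effect B -> R


def joint():
    """Yield (b, r, probability) for every combination of the two indicators."""
    for b, r in product((0, 1), repeat=2):
        p_b = P_B if b else 1 - P_B
        p_r = P_R_GIVEN_B[b] if r else 1 - P_R_GIVEN_B[b]
        yield b, r, p_b * p_r


def p_top_chart(r_value, sample_only):
    """P(B = 1 | R = r_value), optionally within the sample S = 1, where S = 1 if B = 1 or R = 1."""
    num = den = 0.0
    for b, r, p in joint():
        if r != r_value or (sample_only and not (b or r)):
            continue
        den += p
        num += p * b
    return num / den


for sample_only in (False, True):
    label = "sample (S = 1)" if sample_only else "all albums"
    print(f"{label:15s} P(B=1 | R=1) = {p_top_chart(1, sample_only):.3f}, "
          f"P(B=1 | R=0) = {p_top_chart(0, sample_only):.3f}")
```

Across all albums, Rolling Stone picks are more likely to top the charts (about 0.21 vs 0.05). In the combined sample the comparison reverses (about 0.21 vs 1.00), because every album not picked by Rolling Stone got into the sample only by topping the charts.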
11. ### FRIENDS WITHOUT BENEFITS

(Christakis & Fowler)
- Why do your friends seem to have more friends than you?
- Do fat friends make you fat?
- Is smoking contagious?

You just suck? No: a randomly chosen friend of yours just has higher network centrality than a randomly chosen person.

- Not everyone is on a social network
- Not everyone is friends with everyone else

This can mean trouble for causal inference.
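The friendship paradox itself needs no causal story, only arithmetic on network degrees. The five-person network below is hypothetical, with person 0 as a hub. Comparing each person's number of friends with the average number of friends their friends have shows the asymmetry.

```python
from collections import defaultdict

# Hypothetical undirected friendship network: person 0 is a hub.
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]


def friends_of(edges):
    nbrs = defaultdict(set)
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs


nbrs = friends_of(EDGES)
people = sorted(nbrs)
degree = {p: len(nbrs[p]) for p in people}

mean_degree = sum(degree.values()) / len(people)
# For each person, the average number of friends their friends have.
friends_avg = {p: sum(degree[f] for f in nbrs[p]) / degree[p] for p in people}
mean_friends_degree = sum(friends_avg.values()) / len(people)
outnumbered = sum(friends_avg[p] > degree[p] for p in people)

print(f"average number of friends:              {mean_degree:.1f}")
print(f"average of friends' number of friends:  {mean_friends_degree:.1f}")
print(f"people whose friends have more friends: {outnumbered} of {len(people)}")
```

Only the hub has more friends than its friends do; everyone else is outnumbered, because well-connected people appear in many friend lists.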
12. ### ANTISOCIAL NETWORKS

Nothing about conflict remains local in space (or time; Ghobarah et al.).

Optional treatments, e.g. information on how to apply for a business loan:
- Treatment assignment: we tell you how to apply for a loan
- Not treatment assignment: you tell your friends or family members, and they apply too

Now some of your ‘untreated’ friends are treated:
- This ‘non-compliance’ is a SUTVA violation
- Typically, treatment assignment functions have to model spillover (see, e.g., Aronow et al.)
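A small simulation shows how spillover distorts the naive treated-versus-untreated comparison. Everything here is an assumption for illustration: people come in friend pairs, treatment (loan information) is assigned independently with probability 0.5, a treated person passes it on to their friend with probability 0.6, and informed people apply with probability 0.5 versus 0.1 otherwise. The second contrast compares treated people only with untreated people whose friend was also untreated, a crude way of modelling exposure rather than assignment.

```python
import random

P_TREAT = 0.5             # assumed Bernoulli assignment probability
P_SHARE = 0.6             # assumed chance a treated person tells their friend
P_APPLY_INFORMED = 0.5    # assumed application rate with the information
P_APPLY_UNINFORMED = 0.1  # assumed application rate without it


def simulate(n_pairs=50_000, seed=11):
    """Return (own_treated, friend_treated, applied) for every person in n_pairs friend pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_pairs):
        treated = (rng.random() < P_TREAT, rng.random() < P_TREAT)
        for me, friend in ((0, 1), (1, 0)):
            told_by_friend = treated[friend] and rng.random() < P_SHARE  # spillover
            informed = treated[me] or told_by_friend
            p_apply = P_APPLY_INFORMED if informed else P_APPLY_UNINFORMED
            rows.append((treated[me], treated[friend], rng.random() < p_apply))
    return rows


def apply_rate(rows, keep):
    outcomes = [applied for own, friend, applied in rows if keep(own, friend)]
    return sum(outcomes) / len(outcomes)


rows = simulate()
treated_rate = apply_rate(rows, lambda own, friend: own)
naive = treated_rate - apply_rate(rows, lambda own, friend: not own)
exposure = treated_rate - apply_rate(rows, lambda own, friend: not own and not friend)
print(f"naive treated vs untreated:                 {naive:.2f}")
print(f"treated vs untreated with untreated friend: {exposure:.2f}")
```

In expectation the naive contrast is 0.28, because 30% of the "controls" are informed through a friend, while the exposure-based contrast is 0.40, the effect of the information without spillover.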
13. ### IDENTIFYING PEER PRESSURE

Example from Shalizi and Thomas (and its blog précis). Your friend Joey jumped off a bridge; why would you jump too?
- Joey inspires you: social contagion or influence
- Joey infects you with a parasite which suppresses fear of falling: actual contagion
- You’re friends because you both like to jump off bridges: manifest homophily
- You’re friends because you both like roller-coasters, and have a common risk-seeking propensity: latent homophily
- Because you’re both on it when it starts collapsing and that’s the only way off: external causation

How about... not?
14. ### CONTAGION VS HOMOPHILY

You ($i$, in blue) and Joey ($j$, in black) have
- current observed behaviour ($Y_{it}, Y_{jt}$)
- past observed behaviour ($Y_{i,t-1}, Y_{j,t-1}$)
- unobserved characteristics ($U_i, U_j$)
- a friendship indicator, e.g. $F_{ij}$, being Facebook friends

Identify just the causal effect of Joey jumping ($Y_{j,t-1}$) on whether you jump ($Y_{it}$) from social network data.

(Non-parametric) causal inference is not possible.

[Figure: graph over $Y_{it}$, $Y_{jt}$, $Y_{i,t-1}$, $Y_{j,t-1}$, $U_i$, $U_j$, $F_{ij}$ (simplified from Shalizi & Thomas)]
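The structural problem can be illustrated with a simulation in which there is no contagion at all. The assumptions are hypothetical and only loosely follow the simplified graph, not Shalizi and Thomas's own setup: each person's behaviour in every period is driven only by their own latent trait U plus noise, and a pair become friends (F = 1) only when their traits are similar. Among friends, Joey's past behaviour still predicts yours, even after controlling for your own past behaviour, because F is a collider between $U_i$ and $U_j$.

```python
import random


def cov(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def coef_on_first(y, x1, x2):
    """Coefficient on x1 from regressing y on x1 and x2 with an intercept."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


def simulate(n_pairs=40_000, tie_width=0.5, seed=5):
    rng = random.Random(seed)
    rows = []
    for _ in range(n_pairs):
        u_i, u_j = rng.gauss(0, 1), rng.gauss(0, 1)  # latent traits, independent a priori
        friends = abs(u_i - u_j) < tie_width         # F: caused by both traits (homophily)
        y_i_past = u_i + rng.gauss(0, 1)             # behaviour depends only on own trait...
        y_j_past = u_j + rng.gauss(0, 1)
        y_i_now = u_i + rng.gauss(0, 1)              # ...so there is no contagion from j to i
        rows.append((friends, y_i_now, y_j_past, y_i_past))
    return rows


def joey_coefficient(rows):
    """Coefficient on Y_{j,t-1} from regressing Y_{it} on Y_{j,t-1} and Y_{i,t-1}."""
    _, y_now, y_j_past, y_i_past = zip(*rows)
    return coef_on_first(y_now, y_j_past, y_i_past)


rows = simulate()
print(f"all pairs:    {joey_coefficient(rows):+.2f}")
print(f"friends only: {joey_coefficient([r for r in rows if r[0]]):+.2f}")
```

Controlling for your own past behaviour does not remove the spurious "influence", because $Y_{i,t-1}$ is only a noisy proxy for $U_i$.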
15. ### ASIDE: M-BIAS

The basic structural problem we have here is similar to M-bias in regression.

[Figure: M-graph over X, Z, Y, U, V]

Z is correlated with X and correlated with Y
- Nevertheless, conditioning on Z will wreck identification

[Figure: the same variables X, Z, Y, U, V, with Z also confounding X and Y]

Conditioning on Z stops it confounding
- but adds some collider bias

Who knows how big each is?
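M-bias can be reproduced with a linear simulation. The structural equations are assumptions for illustration: U causes X and Z, V causes Z and Y, and X has a true effect of 1 on Y. Z is correlated with both X and Y, yet adding it as a control moves the estimate away from the truth, to roughly 0.8 under these particular equations.

```python
import random


def cov(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)


def coef_on_first(y, x1, x2):
    """Coefficient on x1 from regressing y on x1 and x2 with an intercept."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


rng = random.Random(2022)
n, effect = 50_000, 1.0
u = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
x = [ui + rng.gauss(0, 1) for ui in u]                            # U -> X
z = [ui + vi + rng.gauss(0, 1) for ui, vi in zip(u, v)]           # U -> Z <- V: Z is a collider
y = [effect * xi + vi + rng.gauss(0, 1) for xi, vi in zip(x, v)]  # X -> Y <- V

unadjusted = cov(x, y) / cov(x, x)
adjusted = coef_on_first(y, x, z)
print(f"Y on X:     {unadjusted:.2f}")
print(f"Y on X + Z: {adjusted:.2f}")
```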
17. ### POST TREATMENT BIAS

The role of suspect race in stylized Stop-question-frisk:
- Race (R)
- Stopped by police (S)
- Use of force (F)
- Questioning (Q), which generates a report, a.k.a. our data set

[Figure: graph over R, S, F, Q]
18. ### POST TREATMENT BIAS

The causal effect of race on use of force:
- Should we condition on S? Why (or why, or why not?)
- Do we condition on S?
- Could we not condition on S if we wanted to?

- Q measures S: $P(Q \mid S)$
- If it measures it well, then it’s almost as good as observing S directly

[Figure: S with three measures $Q_1, Q_2, Q_3$, each with its own error $\epsilon$]

Use Bayes’ Theorem to compute $E[S \mid Q_1, Q_2, Q_3]$.
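The Bayes' Theorem step can be sketched for a binary S measured by three binary indicators. The prior, sensitivities, and specificities below are made-up values, and conditional independence of the $Q_k$ given S (independent errors $\epsilon$) is an assumption. Because S is binary, $E[S \mid Q_1, Q_2, Q_3] = P(S = 1 \mid Q_1, Q_2, Q_3)$.

```python
from itertools import product
from math import prod

PRIOR_S = 0.3                     # hypothetical P(S = 1)
SENSITIVITY = (0.90, 0.80, 0.85)  # hypothetical P(Q_k = 1 | S = 1)
SPECIFICITY = (0.95, 0.90, 0.90)  # hypothetical P(Q_k = 0 | S = 0)


def expected_s(q):
    """E[S | Q_1, Q_2, Q_3] = P(S = 1 | Q) by Bayes' Theorem, assuming Q_k independent given S."""
    like_1 = prod(se if qk else 1 - se for qk, se in zip(q, SENSITIVITY))
    like_0 = prod(1 - sp if qk else sp for qk, sp in zip(q, SPECIFICITY))
    return PRIOR_S * like_1 / (PRIOR_S * like_1 + (1 - PRIOR_S) * like_0)


for q in product((0, 1), repeat=3):
    print(q, f"{expected_s(q):.3f}")
```

When all three indicators agree, the posterior is close to 0 or 1, which is the sense in which good measurement is almost as good as observing S directly.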
19. ### POST TREATMENT BIAS: POLICING

- Behaviour (B) causes stops (S) and the use of force (F), conditional on S
- Unmeasured animosity (A) causes stops and the use of force (F), conditional on S

[Figure: graph over R, S, Q, F, A, B]

Where is the collider bias?

- An important estimand that’s harder than expected to define and estimate
- Direct experimentation would be unethical
- Multiple paths to an undesirable outcome (mediation)
- Not everything can be measured (IV)
- Data available only on part of the process (conditioning)
- Sample selection built into the data generation process
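A simulation shows how selection into the data (stops) can distort the race comparison. This is a deliberately simplified, hypothetical version of the graph: behaviour B is left out, race R and unmeasured animosity A both raise the chance of a stop S, A raises use of force F, and R has an assumed direct effect of 0.2 on F. Force is generated for every encounter so that the unselected comparison can be computed; comparing force by race among stopped people only, which is all a stop report records, mixes that effect with the association between R and A that conditioning on S creates.

```python
import random

DIRECT_EFFECT = 0.2  # assumed effect of R on F, holding A fixed


def simulate(n=100_000, seed=99):
    """Return (race, stopped, force) for n hypothetical encounters."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        r = rng.random() < 0.5                           # race indicator (R)
        a = rng.gauss(0, 1)                              # unmeasured animosity (A)
        stopped = r + a + rng.gauss(0, 1) > 1            # S: caused by R and A
        force = DIRECT_EFFECT * r + a + rng.gauss(0, 1)  # F: caused by R and A
        rows.append((r, stopped, force))
    return rows


def race_gap(rows):
    """Mean force for R = 1 minus mean force for R = 0."""
    f1 = [f for r, _, f in rows if r]
    f0 = [f for r, _, f in rows if not r]
    return sum(f1) / len(f1) - sum(f0) / len(f0)


rows = simulate()
print(f"all encounters (not observable): {race_gap(rows):+.2f}")
print(f"stopped only (the report):       {race_gap([x for x in rows if x[1]]):+.2f}")
```

Under these assumptions the gap even changes sign in the stopped sample: R = 1 people need less animosity to be stopped, so those who are stopped face lower average A.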
21. ### POST TREATMENT BIAS: HEALTH STATISTICS

- Infection (R) causes hospitalization (S), but so do smoking (B) and diet (A)
- All of these cause intensive care (F)

[Figure: graph over R, S, Q, F, A, B (Griffith et al.)]

- An important estimand that’s harder than expected to define and estimate
- Direct experimentation would be unethical
- Multiple paths to an undesirable outcome (mediation)
- Not everything can be measured (IV)
- Data available only on part of the process (conditioning)
- Sample selection built into the data generation process
22. ### MORE COLLISIONS

There is only one good cartoon about collider bias, and this is it.
- Why would it be rational to respond poorly?
23. ### REFERENCES

- Angrist, J. D., & Pischke, J.-S. Mostly harmless econometrics: An empiricist’s companion. Princeton University Press.
- Aronow, P. M., Eckles, D., Samii, C., & Zonszein, S. Spillover effects in experimental data (arXiv preprint).
- Christakis, N. A., & Fowler, J. H. The spread of obesity in a large social network over years. New England Journal of Medicine.
- Elwert, F., & Winship, C. Endogenous selection bias: The problem of conditioning on a collider variable. Annual Review of Sociology.
- Ghobarah, H. A., Huth, P., & Russett, B. Civil wars kill and maim people – long after the shooting stops. The American Political Science Review.
- Griffith, G., Morris, T. T., Tudball, M., Herbert, A., Mancano, G., Pike, L., Sharp, G. C., Palmer, T. M., Davey Smith, G., Tilling, K., Zuccolo, L., Davies, N. M., & Hemani, G. Collider bias undermines our understanding of COVID-19 disease risk and severity (preprint).
- Hernán, M. A., Hernández-Díaz, S., & Robins, J. M. A structural approach to selection bias. Epidemiology.
- Shalizi, C. R., & Thomas, A. C. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods & Research.
- Sober, E. The nature of selection. University of Chicago Press.
- Sober, E. Natural selection, causality, and laws: What Fodor and Piattelli-Palmarini got wrong. Philosophy of Science.