
Causal Inference 2022 Week 9

Will Lowe
April 11, 2022


  1. PLAN 1

    Three kinds of fairness
    What can be fair ...and with respect to what?
    Traditional classifier performance measures
    The intuitive but ineffectual strategy
    Case study and a fundamental problem
    Counterfactual fairness
    Open questions
  2. THREE KINDS OF FAIRNESS 2

    Fairness motivated mathematics from the start. See al-Khwārizmī's hit textbook
    → The compendious book on calculation by completion and balancing
    A chapter is devoted to establishing 'fair division' in inheritance problems
    al-Jabr (algebra) is all about maintaining equalities

    A mechanism, e.g. an allocation, is fair with respect to characteristic A if

        Y(a) = Y(a′)

    [Image: Muḥammad ibn Mūsā al-Khwārizmī + al-Jabr]
  3. THREE KINDS OF FAIRNESS 3

    "The world is cruel, and the only morality in a cruel world is chance. It's not about what I want, it's about what's fair!" — Harvey Dent

    Fairness ≈ equal probabilities. A mechanism is fair with respect to characteristic A if

        P(Y | A = a) = P(Y | A = a′)

    Effectiveness of multilateral UN operations in civil wars (Doyle & Sambanis). Reexamined by King and Zeng, with a response (Sambanis & Doyle)
  5. WHAT CAN BE FAIR? 7

    → Mechanisms, rules, procedures, decision procedures
    → Allocations, enforcement, outcomes, decisions

    People and organisations have rules and make decisions
    → Decisions are made according to, mostly according to, or despite the rules
    → Rules may be internally inconsistent and require balancing or weighting (looking at you, lawyers)

    We won't have much to say about implementation issues here... [gestures in the direction of all Hertie]
  6. WHAT CAN BE FAIR 8

    MACHINE LEARNING: It is often argued that these issues are made worse by the presence of 'algorithmic' or ML decision-making tools
    → All explicit decision-making processes are algorithms
    → Removing the human element helps theorizing (even if it hinders other things)
    → Some of the best work on fairness currently happens in Computer Science departments, in the field of algorithmic fairness (Barocas et al.)

    [Image: Automated decision-making systems, German style (fax machine not shown)]
  7. FAIRNESS WITH RESPECT TO WHAT? 9

    Protected characteristics
    → Variables, e.g. gender, race, etc.
    → Measurable on the individual level
    → Often aggregated to groups

    Units: broadly we can define fairness for
    → individuals
    → groups

    Components
    → Outcomes, e.g. Ŷi
    → Implementation, e.g. conditioning variables, internal rules

    Variables
    → Y the outcome, e.g. loan-worthiness, recidivism
    → Ŷ a prediction of Y, e.g. probability (or amount) of eventual loan repayment, whether caught committing another crime
    → X a non-protected characteristic, e.g. criminal record
    → A a protected characteristic
    → U non-protected but unobserved characteristics that might also predict Y

    Ŷ is a function of X, A, or both. Often thresholded at τ to make a decision.
  8. TRADITIONAL PERFORMANCE MEASURES 10

    Classifiers make probabilistic predictions

        P̂ = E[Y | X, ...] = P(Y = 1 | X, ...)

    which are converted to decisions by comparing to a threshold τ:

        Ŷ = 1 if P̂ > τ, and 0 otherwise

    Performance
        P(Y = 1 | Ŷ = 1)   (Precision)
        P(Ŷ = 1 | Y = 1)   (Recall)

    Precision and recall are related by Bayes theorem, obvs.

        P(Ŷ = 1 | Y = 1) = P(Y = 1 | Ŷ = 1) P(Ŷ = 1) / P(Y = 1)

    Sometimes this can be surprisingly useful (King & Lowe)

    Calibration: a probability estimator is calibrated when

        P(Y = 1 | P̂ = p) = p

    so Y = 1 in a fraction p of the cases where P̂ = p (for all p). Calibrated classifiers don't have to be any good, they have to know how good they are.
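The Bayes-theorem identity linking precision and recall holds exactly in-sample, which is easy to verify numerically. A minimal sketch in Python (all data simulated; the predictor is calibrated by construction, since Y is drawn with probability P̂):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: predicted probabilities P-hat, and outcomes Y drawn
# so that P(Y=1 | P-hat = p) = p, i.e. perfect calibration by design.
p_hat = rng.uniform(size=10_000)
y = rng.binomial(1, p_hat)

tau = 0.5
y_hat = (p_hat > tau).astype(int)        # threshold to get a decision

precision = (y[y_hat == 1] == 1).mean()  # P(Y=1 | Yhat=1)
recall = (y_hat[y == 1] == 1).mean()     # P(Yhat=1 | Y=1)

# The identity from the slide, checked on the sample frequencies:
recall_via_bayes = precision * y_hat.mean() / y.mean()
assert abs(recall - recall_via_bayes) < 1e-9

# Crude calibration check: among cases with P-hat near 0.7,
# Y = 1 should occur roughly 70% of the time.
in_bin = (p_hat > 0.65) & (p_hat < 0.75)
print(round(y[in_bin].mean(), 2))        # close to 0.70 by construction
```

The identity is exact on empirical frequencies because both sides reduce to the same ratio of counts, n(Y=1, Ŷ=1) / n(Y=1).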
  9. DESIDERATA 12

    Calibration by group

        P(Y = 1 | P̂ = p, A = 0) = P(Y = 1 | P̂ = p, A = 1)

    No requirement that P(Y = 1 | P̂ = p) = p
    → Equal (mis)calibration across levels of A
    Motivation:
    → P̂ should mean the same thing across groups

    Precision: relatedly, we could also ask for equal precision

        P(Y = 1 | Ŷ = 1, A = 0) = P(Y = 1 | Ŷ = 1, A = 1)

    Error rates: equal false positive rate

        P(Ŷ = 1 | Y = 0, A = 0) = P(Ŷ = 1 | Y = 0, A = 1)

    Equal false negative rate

        P(Ŷ = 0 | Y = 1, A = 0) = P(Ŷ = 0 | Y = 1, A = 1)

    Motivation: classification errors should not vary across groups

    Note
    → Equal calibration is (basically) about precision
    → Equal error rates are (basically) about recall
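The group-wise error-rate desiderata are straightforward to compute from predictions. A small sketch (function name and toy data are mine, for illustration only):

```python
import numpy as np

def group_error_rates(y, y_hat, a):
    """False positive and false negative rates within each level of A."""
    rates = {}
    for g in np.unique(a):
        yg, yhg = y[a == g], y_hat[a == g]
        fpr = (yhg[yg == 0] == 1).mean()  # P(Yhat=1 | Y=0, A=g)
        fnr = (yhg[yg == 1] == 0).mean()  # P(Yhat=0 | Y=1, A=g)
        rates[g] = (fpr, fnr)
    return rates

# Tiny made-up example: same labels per group, different mistakes
y     = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y_hat = np.array([0, 1, 1, 1, 0, 0, 1, 0])
a     = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(group_error_rates(y, y_hat, a))
# group 0: FPR 0.5, FNR 0.0; group 1: FPR 0.0, FNR 0.5
```

Here neither equal-FPR nor equal-FNR holds, even though overall accuracy is identical across groups — which is exactly why the per-group decomposition matters.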
  10. CASE STUDY 13

    ProPublica (Larson et al.) noted that a commercial recidivism prediction tool, COMPAS, had quite different error rates by race
  11. A FUNDAMENTAL PROBLEM 15

    When recidivism prevalence differs, i.e.

        P(Y = 1 | A = a) ≠ P(Y = 1 | A = a′)

    then provably we cannot have equal calibration and both error rates equal (Chouldechova; Kleinberg et al.)

        "if an instrument satisfies predictive parity – that is, if the PPV [i.e. precision] is the same across groups – but the prevalence differs between groups, the instrument cannot achieve equal false positive and false negative rates across those groups." (Chouldechova)

    This should not surprise us
    → Recall and precision are related through Bayes theorem
    → Re-weighting depends on prevalence P(Y = 1)!

    Statistical reconstructions of intuitive notions of fairness seem incomplete and/or inconsistent

    Responses
    → Nihilism: fairness is incoherent, so choose your poison
    → Optimism: there is a coherent notion of fairness but we haven't got it yet
    → Causal inference! (Kusner, Loftus, et al.)
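The quoted result follows from rearranging Bayes theorem into an identity connecting prevalence p, precision (PPV), and the two error rates: FPR = p/(1−p) · (1−PPV)/PPV · (1−FNR). Fixing PPV and FNR across groups then forces FPR to move with prevalence, which a two-line calculation makes concrete (the numbers below are arbitrary illustrations):

```python
def fpr(prevalence, ppv, fnr):
    """False positive rate implied by prevalence, precision, and FNR."""
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)

# Two groups with the same precision (0.7) and same FNR (0.2),
# but different base rates of recidivism:
print(fpr(0.3, 0.7, 0.2))  # ≈ 0.147
print(fpr(0.5, 0.7, 0.2))  # ≈ 0.343
```

With prevalence 0.3 vs 0.5, equalising precision and FNR leaves the higher-prevalence group with more than twice the false positive rate — no classifier design can avoid this.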

  12. COUNTERFACTUAL FAIRNESS

    Ŷ is counterfactually fair with respect to A if

        P(Ŷ(A=a) | Ai = a) = P(Ŷ(A=a′) | Ai = a)

    Ŷi is fair if it would not have been different had Ai taken a different value

    This has a number of interesting properties:
    → individual-oriented: Ai = a (not group-oriented: A = a)
    → exactly half outcome-oriented: Ŷ(Ai=a)
    → Sometimes it is necessary for a prediction to condition on A to be fair with respect to it

    With apologies to Simon Munzert who loves this meme more than any of us

  13. COUNTERFACTUAL FAIRNESS

    An example from Kusner, Loftus, et al.
    → Race (A)
    → Car choice (X)
    → Speeding tendency (U)
    → Accidents (Y)

    No causal effect of A on Y, but some association. Consider predicting accidents using one of

        Ŷ = β0 + XβX          (Model 1)
        Ŷ = β0 + XβX + AβA    (Model 2)

    Using Model 1 is counterfactually unfair
    → Holding U constant, but changing A, changes X which changes Ŷ

    Using Model 2 is counterfactually fair
    → Holding U constant, but changing A, still changes X, but this doesn't change Ŷ: with A in the model, its coefficient offsets the change transmitted through X
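A quick way to see the contrast between the two models is to simulate a linear version of this graph (the coefficients and noise terms below are my assumptions, not from the slides), flip A for everyone while holding U fixed, and compare how each model's prediction moves:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Structural model: A -> X, U -> X, U -> Y; no A -> Y effect.
a = rng.binomial(1, 0.5, n).astype(float)    # protected characteristic A
u = rng.normal(size=n)                       # unobserved tendency U
x = 1.0 * a + 2.0 * u + rng.normal(size=n)   # car choice X
y = 3.0 * u + rng.normal(size=n)             # accidents Y

def ols(features, y):
    """Least-squares fit with an intercept; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones(len(y))] + features)
    return np.linalg.lstsq(X, y, rcond=None)[0]

b1 = ols([x], y)      # Model 1: Y ~ X
b2 = ols([x, a], y)   # Model 2: Y ~ X + A

# Counterfactual: flip A, holding U (and X's noise) fixed.
# X shifts by the structural A-coefficient (1.0) times the change in A.
delta = 1 - 2 * a                 # a -> 1 - a
x_cf = x + 1.0 * delta

change_m1 = b1[1] * (x_cf - x)
change_m2 = (b2[1] * x_cf + b2[2] * (a + delta)) - (b2[1] * x + b2[2] * a)

print(np.abs(change_m1).mean())   # clearly non-zero: Model 1 is unfair
print(np.abs(change_m2).mean())   # ~0: A's coefficient offsets the shift
```

In the population, Model 2's fitted coefficients satisfy βA = −βX · 1.0 (the A-coefficient absorbs exactly the association A induces through X), so the counterfactual prediction change is zero up to sampling error.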
  14. COUNTERFACTUAL FAIRNESS 18

    We are now (imho) at the state of the art:
    → Group-based fairness
    → Individual-based fairness
    → Counterfactual individual-based fairness

    Maybe this approach is general. I hope so, but I'm biased...

    Implications: according to Kusner et al. a sufficient (but not necessary) condition for fairness is
    → "Conditioning on non-children of A will always be fair"
    Does not hold for some other definitions...

    Aside:
    → Counterfactually defined fairness is not entirely new
    → We met it with mediation analysis

    In the previous lecture we thought about proving discrimination by establishing a Natural Direct Effect (NDE)
  15. OPEN QUESTIONS 19

    "The central question in any employment-discrimination case is whether the employer would have taken the same action had the employee been of a different race (age, sex, religion, national origin, etc.) and everything else had remained the same." (Carson v. Bethlehem Steel Corp.)

    Everything else? Lots of things mediate the effects of Gender!

    G → J → Y
    → Gender (G)
    → Choice of Job type (J)
    → Outcome (Y)

    Kusner et al. aren't clear on the solution... Maybe there are as many types of bias / discrimination as there are distinguishable causal effects?
    → Yikes...
  16. PLAN 20

    Three kinds of fairness
    What can be fair ...and with respect to what?
    Traditional classifier performance measures
    The intuitive but ineffectual strategy
    Case study and a fundamental problem
    Counterfactual fairness
    Open questions
  17. REFERENCES 21

    Barocas, S., Hardt, M., & Narayanan, A. Fairness and machine learning. fairmlbook.org.
    Carson v. Bethlehem Steel Corp.
    Chouldechova, A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments.
    Doyle, M. W., & Sambanis, N. International peacebuilding: A theoretical and quantitative analysis. American Political Science Review.
    Hedden, B. On statistical criteria of algorithmic fairness. Philosophy & Public Affairs.
    King, G., & Lowe, W. An automated information extraction tool for international conflict data with performance as good as human coders: A rare events evaluation design. International Organization.
    King, G., & Zeng, L. When can history be our guide? The pitfalls of counterfactual inference. International Studies Quarterly.
    Kleinberg, J., Mullainathan, S., & Raghavan, M. Inherent trade-offs in the fair determination of risk scores.
  18. REFERENCES 22

    Kusner, M. J., Loftus, J., Russell, C., & Silva, R. Counterfactual fairness. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in Neural Information Processing Systems. Curran Associates, Inc.
    Kusner, M. J., Loftus, J. R., Russell, C., & Silva, R. Counterfactual fairness.
    Larson, J., Mattu, S., Kirchner, L., & Angwin, J. How we analyzed the COMPAS recidivism algorithm. ProPublica.
    Sambanis, N., & Doyle, M. W. No easy choices: Estimating the effects of United Nations peacekeeping (Response to King and Zeng). International Studies Quarterly.