Slide 1

Measures and Mismeasures of Algorithmic Fairness
Manojit Nandi, Senior Data Scientist, J.P. Morgan Chase
@mnandi92

Slide 2

About Me (According to Google Cloud Vision API)
● Dancer? Aerial dancer and circus acrobat.
● Entertaining? Hopefully.
● Fun? Most of the time.
● Girl?!?

Slide 3

What is Algorithmic Fairness?

Slide 4

Algorithmic Fairness
● Algorithmic fairness is a growing field of research that aims to mitigate the harm that unwarranted bias/discrimination in machine learning causes to people.
● Primarily focused on mathematical formalisms of fairness and on developing solutions for these formalisms.
● IMPORTANT: Fairness is inherently a social and ethical concept.
Source: Fairness and Abstraction in Sociotechnical Systems; Selbst, boyd, Friedler, Venkatasubramanian & Vertesi (2018)

Slide 5

BuT mAtH cAn’T bE rAcist!!
● No one is sincerely arguing that mathematics or computer science is inherently discriminatory.
● However, the way people apply mathematical models or algorithms to real-world problems can reinforce societal inequalities.

Slide 6

Fairness, Accountability, Transparency (FAT*) ML
● Interdisciplinary research area that focuses on creating machine-learning systems that work toward goals such as fairness and justice.
● Many open-source libraries (FairTest, themis-ml, AI Fairness 360) have been developed based on this research.
● The ACM FAT* 2019 conference was held in Atlanta, GA back in January.
Photo credit: Moritz Hardt

Slide 7

Algorithmic Fairness in Popular Media

Slide 8

Legal Regulations
In the United States, many industries have legal regulations to prevent disparate impact against vulnerable populations.
● Education (Education Amendments Act)
● Employment (Civil Rights Act)
● Credit (Equal Credit Opportunity Act)
● Housing (Fair Housing Act)

Slide 9

Types of Algorithmic Biases
Kate Crawford (Microsoft Research), Hanna Wallach (Microsoft Research), Solon Barocas (Cornell University), Aaron Shapiro (Microsoft Research)

Slide 10

Bias in Allocation
● The most commonly researched family of algorithmic fairness problems (and the reason the mathematical definitions were invented).
● Algorithmic idea: How do models perform in binary classification problems across different groups?
● Fundamental idea: When allocating finite resources (credit loans, gainful employment), we often favor the privileged class over the more vulnerable.
Source: Reuters News

Slide 11

Bias in Representation
● Focused on how harmful labels/representations are propagated.
● Often related to language and computer vision problems.
● Harder to quantify error compared to bias-in-allocation problems.

Slide 12

● Concerned with algorithms promoting harmful stereotypes and failing to recognize certain groups. Example: Snapchat filters.

Slide 13

Weaponization of Machine Learning
● As data scientists, we are often not taught to think about how our models could be used inappropriately.
● With the increasing use of AI in high-stakes situations, we must be careful not to harm vulnerable populations.
Source: Why Stanford Researchers Tried to Create a “Gaydar” Machine; New York Times

Slide 14

Types of Fairness Measures
Sam Corbett-Davies (Stanford University), Sharad Goel (Stanford University)

Slide 15

“21 Definitions of Algorithmic Fairness”
● There are more than 30 different mathematical definitions of fairness in the academic literature.
● There isn’t one true definition of fairness.
● These definitions can be grouped into three families:
○ Anti-Classification
○ Classification Parity
○ Calibration
Pictured: Princeton CS professor Arvind Narayanan

Slide 16

Anti-Classification
● Heuristic: Algorithmic decisions “ignore” protected attributes. (Individual Fairness)
● In addition to excluding protected attributes, one must also be concerned about learning proxy features (see the sketch below).
● Useful for defining the loss function of fairness-aware models.
(Diagram: “unprotected” features leading to the same outcome)
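Not from the talk, but as a minimal sketch of the anti-classification idea under assumed data: drop the protected columns before training, then check whether the remaining features act as proxies. All column names here (`income`, `zip_code`, `gender`, `approved`) are hypothetical.

```python
import pandas as pd

# Hypothetical dataset: column names and values are made up for illustration.
df = pd.DataFrame({
    "income":   [40_000, 85_000, 52_000, 61_000],
    "zip_code": [10001, 94105, 60601, 73301],
    "gender":   ["F", "M", "F", "M"],
    "approved": [0, 1, 1, 0],
})

PROTECTED = ["gender"]
TARGET = "approved"

# 1. Anti-classification step: the model never sees the protected columns.
X = df.drop(columns=PROTECTED + [TARGET])
y = df[TARGET]

# 2. Proxy check: flag remaining features that still encode the protected
#    attribute (here via correlation with a 0/1 encoding of gender).
z = (df["gender"] == "F").astype(int)
proxy_scores = X.corrwith(z).abs().sort_values(ascending=False)
print(proxy_scores)  # high values suggest a proxy feature, e.g. zip_code
```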

Slide 17

Fairness-Aware Algorithms
● Given a set of features X, labels Y, and protected characteristics Z, we want a model that learns to predict the labels Y but does not “accidentally” learn to predict the protected characteristics Z.
● This constrained optimization can be viewed as akin to regularization; it is sometimes referred to as the accuracy-fairness trade-off (a simplified sketch of this view follows below).
Source: Towards Fairness in ML with Adversarial Networks (GoDataDriven)
(Diagram: is it a good classifier? is it learning the protected attributes?)
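The cited post implements this with adversarial networks; as a simpler, hedged sketch of the regularization view mentioned above, the code below penalizes the covariance between the model's predictions and the protected attribute. Everything here (the synthetic data, the penalty weight `lam`) is an illustrative assumption, not the talk's method.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_fair_logreg(X, y, z, lam=10.0, lr=0.1, epochs=2000):
    """Logistic regression with a simple fairness penalty.

    The penalty lam * cov(p, z)^2 discourages the predicted probabilities p
    from co-varying with the protected attribute z (0/1). lam = 0 recovers
    plain logistic regression; larger lam trades accuracy for fairness.
    """
    n, d = X.shape
    w = np.zeros(d)
    zc = z - z.mean()                           # centered protected attribute
    for _ in range(epochs):
        p = sigmoid(X @ w)
        cov = np.mean((p - p.mean()) * zc)      # covariance between p and z
        grad_ce = X.T @ (p - y) / n             # cross-entropy gradient
        grad_fair = 2 * lam * cov * (X.T @ (zc * p * (1 - p))) / n  # penalty gradient
        w -= lr * (grad_ce + grad_fair)
    return w

# Tiny synthetic example (all values hypothetical).
rng = np.random.default_rng(0)
n = 500
z = rng.integers(0, 2, size=n)                  # protected attribute
X = np.column_stack([rng.normal(size=n), z + rng.normal(scale=0.5, size=n)])
y = (X[:, 0] + 0.5 * z + rng.normal(scale=0.5, size=n) > 0).astype(float)

w_fair = fit_fair_logreg(X, y, z.astype(float), lam=25.0)
```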

Slide 18

Dangers of Anti-Classification Measures
● By “removing” protected features, we ignore the underlying processes that affect different demographics.
● These fairness metrics are focused on making outcomes equal.
● DANGER! Sometimes making outcomes equal adversely impacts a vulnerable demographic.
Source: Corbett-Davies & Goel (2019)

Slide 19

Classification Parity
● Given some traditional classification measure (accuracy, false positive rate), is that measure equal across different protected groups? (Group Fairness)
● Most commonly used to audit algorithms from a legal perspective (a sketch of a per-group audit follows below).
Source: Gender Shades, Buolamwini & Gebru (2018)
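As a hedged illustration of a classification-parity audit (not code from the talk), the sketch below computes accuracy and false positive rate separately for each protected group; the arrays are hypothetical.

```python
import numpy as np

def group_metrics(y_true, y_pred, groups):
    """Accuracy and false positive rate for each protected group."""
    out = {}
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        acc = np.mean(yt == yp)
        negatives = yt == 0
        fpr = np.mean(yp[negatives] == 1) if negatives.any() else float("nan")
        out[g] = {"accuracy": acc, "fpr": fpr, "n": int(m.sum())}
    return out

# Hypothetical predictions from some classifier.
y_true = np.array([0, 1, 0, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 0, 1, 0])
groups = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])
print(group_metrics(y_true, y_pred, groups))
```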

Slide 20

Demographic Parity
● Demographic parity looks at the proportion of positive outcomes within each protected attribute group.
● Demographic parity is used to audit models for disparate impact (the 80% rule; see the sketch below).
● DANGER! Satisfying the immediate constraint may have negative long-term consequences.
Source: Delayed Impact of Fair Machine Learning, Liu et al. (2018)
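A minimal sketch of an 80%-rule check, assuming hypothetical model decisions and group labels; this is illustrative, not a legal test.

```python
import numpy as np

def disparate_impact_ratio(y_pred, groups, privileged):
    """Ratio of positive-outcome rates: unprivileged vs. privileged group.

    Under the 80% rule of thumb, a ratio below 0.8 is treated as
    evidence of disparate impact.
    """
    priv = groups == privileged
    rate_priv = y_pred[priv].mean()
    rate_unpriv = y_pred[~priv].mean()
    return rate_unpriv / rate_priv

# Hypothetical model decisions (1 = loan approved).
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
groups = np.array(["priv"] * 5 + ["unpriv"] * 5)
ratio = disparate_impact_ratio(y_pred, groups, privileged="priv")
print(f"ratio = {ratio:.2f}, passes 80% rule: {ratio >= 0.8}")
```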

Slide 21

Parity of False Positive Rates
● As the name suggests, this measure looks at the false positive rate across different protected groups.
● Sometimes called “Equal Opportunity”.
● Since FPR = FP / (FP + TN), it is possible to “improve” the false positive rate simply by increasing the number of true negatives, without flagging fewer people incorrectly (see the arithmetic below).
● DANGER! If we don’t take societal factors into consideration, we may end up harming vulnerable populations.
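A toy arithmetic illustration (hypothetical counts) of how FPR parity can be gamed by inflating true negatives:

```python
fp, tn = 30, 70
print(fp / (fp + tn))   # 0.30

# Correctly labeling many more clearly-negative cases ("increasing true
# negatives") shrinks the FPR even though the same 30 people are still
# wrongly flagged.
tn = 270
print(fp / (fp + tn))   # 0.10
```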

Slide 22

Calibration
● In the case of risk assessments (recidivism, child protective services), we use a scoring function s(x) to estimate the true risk to the individual.
● We define some threshold t and make a decision when s(x) > t.
● Example: Child Protective Services (CPS) assigns a risk score (1-20) to a child and intervenes if the perceived risk to the child is high enough.

Slide 23

Statistical Calibration
● Heuristic: Two individuals with the same risk score s have the same likelihood of receiving the outcome.
● A risk score of 10 should mean the same thing for a white individual as it does for a black individual (a sketch of a by-group calibration check follows below).
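As a hedged sketch of how one might check calibration by group (not from the talk): bin the risk scores and compare the observed outcome rate per bin across groups. The data here is synthetic, and the 1-20 score range simply mirrors the CPS example above.

```python
import numpy as np

def calibration_by_group(scores, outcomes, groups, n_bins=10):
    """Observed outcome rate within each score bin, split by group.

    A well-calibrated score means these rates line up across groups:
    the same score implies the same likelihood of the outcome.
    """
    bins = np.linspace(scores.min(), scores.max(), n_bins + 1)
    table = {}
    for g in np.unique(groups):
        m = groups == g
        idx = np.clip(np.digitize(scores[m], bins) - 1, 0, n_bins - 1)
        rates = [outcomes[m][idx == b].mean() if (idx == b).any() else np.nan
                 for b in range(n_bins)]
        table[g] = rates
    return bins, table

# Synthetic risk scores (1-20 scale) and outcomes, purely for illustration.
rng = np.random.default_rng(1)
scores = rng.integers(1, 21, size=1000).astype(float)
outcomes = (rng.random(1000) < scores / 25).astype(int)
groups = rng.choice(["group_1", "group_2"], size=1000)
bins, table = calibration_by_group(scores, outcomes, groups, n_bins=5)
print(table)
```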

Slide 24

Debate about Northpointe’s COMPAS
● COMPAS is used to assign a recidivism risk score to prisoners.
● ProPublica claim: Black defendants have higher false positive rates.
● Northpointe defense: Risk scores are well-calibrated across groups.

Slide 25

Datasheets, Model Cards, and Checklists

Slide 26

Datasheets for Datasets
● Taking inspiration from safety standards in other industries, such as automobile testing and clinical drug trials, Gebru et al. (2017) propose standards for documenting datasets.
● Documentation questions include:
○ How was the data collected? Over what time frame?
○ Why was the dataset created? Who funded its creation?
○ Does the data contain any sensitive information?
○ How was the dataset pre-processed/cleaned?
○ If the data relates to people, were they informed about its intended use?
● What makes for a good dataset?

Slide 27

Model Cards for Model Reporting
● Google researchers propose a standard for documenting deployed models.
● Sections include:
○ Intended Use
○ Factors (evaluation across demographic groups)
○ Ethical Concerns
○ Caveats and Recommendations
● More transparent model reporting allows users to better understand when they should (or should not) use your model.
Mitchell et al. (2019)

Slide 28

Deon: Ethical Checklist for Data Science
● Deon (by DrivenData) is an ethics checklist for data projects, with sections on:
○ Data Collection
○ Data Storage
○ Analysis
○ Modeling
○ Deployment
● The CLI tool creates a Markdown file in your repo with this checklist.

Slide 29

AI Now Institute
● New York University research institute that focuses on understanding the societal and cultural impact of AI and machine learning.
● Hosts an annual symposium on ethics, organizing, and accountability.
● Recently produced a report on the diversity crisis in AI and how it affects the development of technical systems.

Slide 30

Papers Referenced
1. The Measures and Mismeasures of Fairness: A Critical Review of Fair Machine Learning; https://5harad.com/papers/fair-ml.pdf
2. The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition; https://ironholds.org/resources/papers/agr_paper.pdf
3. Delayed Impact of Fair Machine Learning; https://arxiv.org/pdf/1803.04383.pdf
4. Datasheets for Datasets; https://arxiv.org/pdf/1803.09010.pdf
5. Model Cards for Model Reporting; https://arxiv.org/pdf/1810.03993.pdf
6. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification; http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
7. Fairness and Abstraction in Sociotechnical Systems; https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3265913
8. Discriminating Systems: Gender, Race, and Power in AI; https://ainowinstitute.org/discriminatingsystems.pdf