Within the last few years, researchers have come to understand that machine learning systems may display discriminatory behavior with regard to certain protected characteristics, such as gender or race. To combat these harmful behaviors, the research community has created multiple definitions of fairness to enable equity in machine learning algorithms. In this talk, I will cover these different definitions of algorithmic fairness and discuss both the strengths and limitations of these formalizations. In addition, I will cover other best practices to better mitigate the unintended bias of data products.
Measures and mismeasures
of algorithmic fairness
Senior Data Scientist,
J.P. Morgan Chase
About Me (According to Google Cloud Vision API)
● Dancer? Aerial dancer.
● Entertaining? Hopefully.
● Fun? Most of the time.
What is Algorithmic Fairness?
● Algorithmic fairness is a growing field of research that aims to mitigate the effects of unwarranted bias/discrimination on people in automated decision-making.
● Primarily focused on mathematical formalisms of fairness and on developing solutions that satisfy these formalizations.
● IMPORTANT: Fairness is inherently
a social and ethical concept.
Source: Fairness and Abstraction in
Socio-technical Systems; Selbst, boyd,
Friedler, Venkatasubramanian & Vertesi
BuT mAtH cAn’T bE rAcist!!
● No one is sincerely arguing that mathematics or computer science is inherently racist.
● However, the way people apply mathematical models or algorithms to real-world problems can reinforce existing biases and discrimination.
Fairness, Accountability, Transparency (FAT*) ML
● Interdisciplinary research area that
focuses on creating
machine-learning systems that
work towards goals such as
fairness and justice.
● Many open-source libraries (FairTest, themis-ml, AI Fairness 360) have been developed based on this research.
● ACM FAT* 2019 Conference held in
Atlanta, GA back in January.
Photo credits: Moritz Hardt
Algorithmic Fairness in Popular Media
In the United States, many industries have legal regulations to prevent disparate impact against protected classes:
● Education (Education Amendments Act)
● Employment (Civil Rights Act)
● Credit (Equal Credit Opportunity Act)
● Housing (Fair Housing Act)
Types of Algorithmic Biases
Kate Crawford (Microsoft Research), Hanna Wallach (Microsoft Research), Solon Barocas (Cornell University), Aaron Shapiro (Microsoft Research)
Bias in Allocation
● Most commonly researched family of algorithmic fairness problems (and the reason much of the math covered later exists).
● Algorithmic Idea: How do models perform in binary classification problems across different groups?
● Fundamental Idea: When allocating finite resources (credit, loans, gainful employment), we often favor the privileged class over the more vulnerable.
Source: Reuters News
Bias in Representation
● Focused on looking at how harmful
labels/representations are propagated.
● Often related to language and
computer vision problems.
● Harder to quantify error compared to
bias in allocation problems.
● Concerned with algorithms that reinforce stereotypes and with lack of representation.
Weaponization of Machine Learning
● As data scientists, we are often not taught to think about how models could be used inappropriately.
● With the increasing usage of AI in high-stakes situations, we must be careful not to harm vulnerable populations.
Source: Why Stanford Researchers Tried to Create a “Gaydar” Machine; New York Times
Types of Fairness Measures
“21 Definitions of Algorithmic Fairness”
● There are more than 30 different mathematical definitions of fairness in the academic literature.
● There isn’t one true definition of fairness.
● These definitions can be grouped together into three families:
○ Anti-Classification
○ Classification Parity
○ Calibration
Pictured: Princeton CS professor Arvind Narayanan
Anti-Classification
● Heuristic: Algorithmic decisions “ignore” protected attributes. (Individual Fairness)
● In addition to excluding protected attributes, one must also be concerned about learning proxy features (a naive version of this approach is sketched after this slide).
● Useful for defining the loss function of fairness-aware models.
[Diagram: reaching the same outcome from “unprotected” proxy features]
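A minimal sketch of the naive anti-classification approach, assuming a pandas DataFrame with hypothetical, already-numeric columns `race` (protected attribute), `zip_code` (a possible proxy), and `defaulted` (the label); simply dropping the protected column does not remove the proxy signal.

```python
# Naive anti-classification sketch: "remove" the protected attribute before
# training. All column names (race, zip_code, defaulted) are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def train_without_protected(df: pd.DataFrame,
                            protected=("race",),
                            label="defaulted"):
    # Exclude the protected attribute(s) and the label from the feature set.
    X = df.drop(columns=list(protected) + [label])
    y = df[label]
    model = LogisticRegression(max_iter=1000).fit(X, y)
    # Caveat from the slide: proxy features such as zip_code remain in X,
    # so the model can still pick up the protected attribute indirectly.
    return model
```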
● Given a set of features X, labels Y, and protected characteristics Z, we want to create a model that learns to predict the labels Y, but also doesn’t “accidentally” learn to predict the protected characteristics Z.
● Can view this constrained optimization as akin to regularization. Sometimes referred to as the accuracy-fairness trade-off (a rough sketch follows this slide).
Source: Towards Fairness in ML with Adversarial Networks (GoDataDriven)
[Diagram: is it a good classifier? Is it learning protected attributes?]
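A rough PyTorch sketch of the adversarial setup described above (not the exact GoDataDriven implementation): a classifier is trained to predict Y while an adversary tries to recover the protected attribute Z from the classifier's output, and a hypothetical weight `lam` controls the accuracy-fairness trade-off. Layer widths and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Classifier predicts label Y from 20 made-up input features; the adversary
# tries to predict the protected attribute Z from the classifier's logit.
clf = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))
adv = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_clf = torch.optim.Adam(clf.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lam = 1.0  # hypothetical accuracy-fairness trade-off weight

def training_step(X, y, z):
    """One step; X is (batch, 20), y and z are (batch, 1) float tensors."""
    # 1) Update the adversary: can it recover Z from the classifier's output?
    opt_adv.zero_grad()
    adv_loss = bce(adv(clf(X).detach()), z)
    adv_loss.backward()
    opt_adv.step()

    # 2) Update the classifier: predict Y well *and* fool the adversary.
    opt_clf.zero_grad()
    logits = clf(X)
    loss = bce(logits, y) - lam * bce(adv(logits), z)
    loss.backward()
    opt_clf.step()
    return loss.item()
```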
Dangers of Anti-Classification Measures
● By “removing” protected features, we ignore the underlying processes that affect different demographics.
● Fairness metrics are focused on making outcomes equal across groups.
● DANGER! Sometimes making outcomes equal adversely impacts a vulnerable group.
Source: Corbett-Davies, Goel (2019)
Classification Parity
● Given some traditional performance measure (accuracy, false positive rate), is our measure equal across different protected groups?
● Most commonly used to audit algorithms from a legal perspective.
Source: Gender Shades,
Buolamwini & Gebru (2018)
Demographic Parity
● Demographic Parity looks at the proportion of positive outcomes by protected attribute group (a small check is sketched after this slide).
● Demographic Parity is used to audit models for disparate impact.
● DANGER! Satisfying the immediate constraint may have negative long-term consequences.
Source: Delayed Impact of Fair Machine Learning, Liu et al. (2018)
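A small sketch of a demographic parity check, assuming NumPy arrays of binary predictions and group labels; the metric is simply the gap between groups' positive-prediction rates. The toy data below is made up for illustration.

```python
import numpy as np

def demographic_parity_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Largest gap in positive-prediction rates between any two groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return float(max(rates) - min(rates))

# Toy example: group "a" receives a positive outcome 80% of the time,
# group "b" only 20% of the time, so the gap is 0.6.
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
group = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])
print(demographic_parity_gap(y_pred, group))  # ~0.6
```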
Parity of False Positive Rates
● As the name suggests, this measure looks at the false positive rate across different protected groups.
● Sometimes called “Equal Opportunity.”
● It’s possible to improve the false positive rate simply by increasing the number of true negatives (see the sketch after this slide).
● DANGER! If we don’t take societal factors into consideration, we may end up harming vulnerable populations.
(Ignore the number of false positives, just increase the number of true negatives!)
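A similar sketch for parity of false positive rates, again assuming NumPy arrays of binary labels, predictions, and group membership; because the FPR is computed over each group's true negatives, adding true negatives can "improve" the rate without reducing harmful false positives.

```python
import numpy as np

def false_positive_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """FPR = FP / (FP + TN): positive-prediction rate among true negatives."""
    negatives = y_true == 0  # assumes each group has at least one negative
    return float((y_pred[negatives] == 1).mean())

def fpr_gap(y_true, y_pred, group) -> float:
    """Largest gap in false positive rates between any two groups."""
    fprs = [false_positive_rate(y_true[group == g], y_pred[group == g])
            for g in np.unique(group)]
    return max(fprs) - min(fprs)
```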
Calibration
● In the case of risk assessment (recidivism, child protective services), we use a scoring function s(x) to estimate the true underlying risk.
● We define some threshold t and make a decision when s(x) > t (a small group-wise check is sketched after this slide).
● Example: Child Protective Services (CPS) assigns a risk score (1-20) to a child. CPS intervenes if the perceived risk to the child is above the threshold.
● Heuristic: Two individuals with the same risk score s have the same likelihood
of receiving the outcome.
● A risk score of 10 should mean the same thing for a white individual as it does
for a black individual.
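A sketch of a group-wise calibration check under the slide's setup (a 1-20 risk score, a binary outcome, and a group label): within each score bucket, observed outcome rates should be similar across groups. The threshold value t=15 in the decision rule is purely illustrative.

```python
import numpy as np
import pandas as pd

def calibration_table(scores, outcomes, group):
    """Observed outcome rate per (score, group) cell; calibration by group
    means the rates in each score row are roughly equal across the columns."""
    df = pd.DataFrame({"score": scores, "outcome": outcomes, "group": group})
    return df.pivot_table(index="score", columns="group",
                          values="outcome", aggfunc="mean")

# Decision rule from the slide: act when the risk score exceeds a threshold t.
def decide(scores, t=15):  # t = 15 is an arbitrary illustrative threshold
    return (np.asarray(scores) > t).astype(int)
```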
Debate about Northpointe’s COMPAS
● COMPAS is used to assign a recidivism
risk score to prisoners.
● ProPublica Claim: Black defendants have
higher false positive rates.
● Northpointe Defense: Risk scores are
well-calibrated by groups.
Datasheets, Model Cards, and Checklists
Datasheets for Datasets
● Taking inspiration from safety standards in other industries, such as automobile testing and clinical drug trials, Gebru et al. (2018) propose standards for documenting datasets.
● Documentation questions include:
○ How was the data collected? Over what time frame?
○ Why was the dataset created? Who funded its creation?
○ How was the dataset pre-processed/cleaned?
○ If data relates to people, were they informed about
the intended use of the data?
● What makes for a good dataset?
Model Cards for Model Reporting
● Google researchers propose a standard for documenting deployed models.
● Sections include:
○ Intended Use
○ Factors (evaluation amongst relevant demographic groups)
○ Ethical Concerns
○ Caveats and Recommendations.
● More transparent model reporting
will allow users to better
understand when they should (or
should not) use your model.
Mitchell et al. (2019)
Deon: Ethical Checklist for Data Science
● Deon (by DrivenData) is an ethics
checklist for data projects.
○ Data Collection
○ Data Storage
● CLI tool creates a Markdown file in your repo with this checklist.
AI Now Institute
● New York University research institute that focuses on understanding the societal and cultural impact of AI and related technologies.
● Hosts an annual symposium on Ethics, Organizing, and Accountability.
● Recently produced a report on the diversity crisis in AI and how it affects the development of AI systems.
References
1. The Measures and Mismeasures of Fairness: A Critical Review of Fair Machine Learning
2. The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition
3. Delayed Impact of Fair Machine Learning; https://arxiv.org/pdf/1803.04383.pdf
4. Datasheets for Datasets; https://arxiv.org/pdf/1803.09010.pdf
5. Model Cards for Model Reporting; https://arxiv.org/pdf/1810.03993.pdf
6. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
7. Fairness and Abstraction in Sociotechnical Systems
8. Discriminating Systems: Gender, Race, and Power in AI