Within the last few years, researchers have come to understand that machine learning systems may display discriminatory behavior with regard to certain protected characteristics, such as gender or race. To combat these harmful behaviors, the research community has created multiple definitions of fairness to enable equity in machine learning algorithms. In this talk, I will cover these different definitions of algorithmic fairness and discuss both the strengths and limitations of these formalizations. In addition, I will cover other best practices to better mitigate the unintended bias of data products.
Measures and mismeasures
of algorithmic fairness
Senior Data Scientist,
J.P. Morgan Chase
About Me (According to Google Cloud Vision API)
● Dancer? Aerial dancer.
● Entertaining? Hopefully.
● Fun? Most of the time.
What is Algorithmic Fairness?
● Algorithmic fairness is a growing field of research that aims to mitigate the effects of unwarranted bias/discrimination on people in automated decision-making.
● Primarily focused on mathematical formalisms of fairness and on developing solutions that satisfy these formalizations.
● IMPORTANT: Fairness is inherently
a social and ethical concept.
Source: Fairness and Abstraction in
Socio-technical Systems; Selbst, boyd,
Friedler, Venkatasubramanian & Vertesi
BuT mAtH cAn’T bE rAcist!!
● No one is sincerely arguing that mathematics or computer science is inherently racist.
● However, the way people apply mathematical models or algorithms to real-world problems can reinforce existing biases and discrimination.
Fairness, Accountability, Transparency (FAT*) ML
● Interdisciplinary research area that
focuses on creating
machine-learning systems that
work towards goals such as
fairness and justice.
● Many open-source libraries (FairTest, themis-ml, AI Fairness 360) have been developed based on this research.
● ACM FAT* 2019 Conference held in
Atlanta, GA back in January.
Photo credits: Moritz Hardt
Algorithmic Fairness in Popular Media
In the United States, many industries have legal regulations to prevent disparate impact against protected classes:
● Education (Education Amendments Act)
● Employment (Civil Rights Act)
● Credit (Equal Credit Opportunity Act)
● Housing (Fair Housing Act)
Types of Algorithmic Biases
Kate Crawford (Microsoft Research), Hanna Wallach (Microsoft Research), Solon Barocas (Cornell University), Aaron Shapiro (Microsoft Research)
Bias in Allocation
● Most commonly researched family of algorithmic fairness problems (and the reason much of the math covered later exists).
● Algorithmic Idea: How do models perform in binary classification problems across different groups?
● Fundamental Idea: When allocating finite resources (credit, loans, gainful employment), we often favor the privileged class over the more vulnerable.
Source: Reuters News
Bias in Representation
● Focused on looking at how harmful
labels/representations are propagated.
● Often related to language and
computer vision problems.
● Harder to quantify error compared to
bias in allocation problems.
● Concerned with algorithms that reinforce stereotypes and with lack of representation.
Weaponization of Machine Learning
● As data scientists, we are often not taught to think about how models could be used inappropriately.
● With the increasing usage of AI in high-stakes situations, we must be careful not to harm vulnerable populations.
Source: Why Stanford Researchers Tried to Create a “Gaydar” Machine; New York Times
Types of Fairness Measures
“21 Definitions of Algorithmic Fairness”
● There are more than 30 different mathematical definitions of fairness in the academic literature.
● There isn’t one true definition of fairness.
● These definitions can be grouped together into three families:
○ Anti-Classification
○ Classification Parity
○ Calibration
Pictured: Princeton CS professor Arvind Narayanan
Anti-Classification
● Heuristic: Algorithmic decisions “ignore” protected attributes. (Individual Fairness)
● In addition to excluding protected attributes, one must also be concerned about learning proxy features (a naive version of this approach is sketched after this slide).
● Useful for defining the loss function of fairness-aware models.
[Diagram: reaching the same outcome from “unprotected” proxy features]
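A minimal sketch of the naive anti-classification approach, assuming a pandas DataFrame with hypothetical, already-numeric columns `race` (protected attribute), `zip_code` (a possible proxy), and `defaulted` (the label); simply dropping the protected column does not remove the proxy signal.

```python
# Naive anti-classification sketch: "remove" the protected attribute before
# training. All column names (race, zip_code, defaulted) are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def train_without_protected(df: pd.DataFrame,
                            protected=("race",),
                            label="defaulted"):
    # Exclude the protected attribute(s) and the label from the feature set.
    X = df.drop(columns=list(protected) + [label])
    y = df[label]
    model = LogisticRegression(max_iter=1000).fit(X, y)
    # Caveat from the slide: proxy features such as zip_code remain in X,
    # so the model can still pick up the protected attribute indirectly.
    return model
```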
● Given a set of features X, labels Y, and protected characteristics Z, we want to create a model that learns to predict the labels Y, but also doesn’t “accidentally” learn to predict the protected characteristics Z.
● Can view this constrained optimization as akin to regularization. Sometimes referred to as the accuracy-fairness trade-off (a rough sketch follows this slide).
Source: Towards Fairness in ML with Adversarial Networks (GoDataDriven)
[Diagram: is it a good classifier? Is it learning protected attributes?]
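A rough PyTorch sketch of the adversarial setup described above (not the exact GoDataDriven implementation): a classifier is trained to predict Y while an adversary tries to recover the protected attribute Z from the classifier's output, and a hypothetical weight `lam` controls the accuracy-fairness trade-off. Layer widths and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Classifier predicts label Y from 20 made-up input features; the adversary
# tries to predict the protected attribute Z from the classifier's logit.
clf = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))
adv = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_clf = torch.optim.Adam(clf.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lam = 1.0  # hypothetical accuracy-fairness trade-off weight

def training_step(X, y, z):
    """One step; X is (batch, 20), y and z are (batch, 1) float tensors."""
    # 1) Update the adversary: can it recover Z from the classifier's output?
    opt_adv.zero_grad()
    adv_loss = bce(adv(clf(X).detach()), z)
    adv_loss.backward()
    opt_adv.step()

    # 2) Update the classifier: predict Y well *and* fool the adversary.
    opt_clf.zero_grad()
    logits = clf(X)
    loss = bce(logits, y) - lam * bce(adv(logits), z)
    loss.backward()
    opt_clf.step()
    return loss.item()
```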
Dangers of Anti-Classification Measures
● By “removing” protected features, we ignore the underlying processes that affect different demographics.
● Fairness metrics are focused on making outcomes equal across groups.
● DANGER! Sometimes making outcomes equal adversely impacts a vulnerable group.
Source: Corbett-Davies, Goel (2019)
Classification Parity
● Given some traditional performance measure (accuracy, false positive rate), is our measure equal across different protected groups?
● Most commonly used to audit algorithms from a legal perspective.
Source: Gender Shades,
Buolamwini & Gebru (2018)
Demographic Parity
● Demographic Parity looks at the proportion of positive outcomes by protected attribute group (a small check is sketched after this slide).
● Demographic Parity is used to audit models for disparate impact.
● DANGER! Satisfying the immediate constraint may have negative long-term consequences.
Source: Delayed Impact of Fair Machine Learning, Liu et al. (2018)
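A small sketch of a demographic parity check, assuming NumPy arrays of binary predictions and group labels; the metric is simply the gap between groups' positive-prediction rates. The toy data below is made up for illustration.

```python
import numpy as np

def demographic_parity_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Largest gap in positive-prediction rates between any two groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return float(max(rates) - min(rates))

# Toy example: group "a" receives a positive outcome 80% of the time,
# group "b" only 20% of the time, so the gap is 0.6.
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
group = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])
print(demographic_parity_gap(y_pred, group))  # ~0.6
```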
Parity of False Positive Rates
● As the name suggests, this measure looks at the false positive rate across different protected groups.
● Sometimes called “Equal Opportunity.”
● It’s possible to improve the false positive rate simply by increasing the number of true negatives (see the sketch after this slide).
● DANGER! If we don’t take societal factors into consideration, we may end up harming vulnerable populations.
(Ignore the number of false positives, just increase the number of true negatives!)
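A similar sketch for parity of false positive rates, again assuming NumPy arrays of binary labels, predictions, and group membership; because the FPR is computed over each group's true negatives, adding true negatives can "improve" the rate without reducing harmful false positives.

```python
import numpy as np

def false_positive_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """FPR = FP / (FP + TN): positive-prediction rate among true negatives."""
    negatives = y_true == 0  # assumes each group has at least one negative
    return float((y_pred[negatives] == 1).mean())

def fpr_gap(y_true, y_pred, group) -> float:
    """Largest gap in false positive rates between any two groups."""
    fprs = [false_positive_rate(y_true[group == g], y_pred[group == g])
            for g in np.unique(group)]
    return max(fprs) - min(fprs)
```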
Calibration
● In the case of risk assessment (recidivism, child protective services), we use a scoring function s(x) to estimate the true underlying risk.
● We define some threshold t and make a decision when s(x) > t (a small group-wise check is sketched after this slide).
● Example: Child Protective Services (CPS) assigns a risk score (1-20) to a child. CPS intervenes if the perceived risk to the child is above the threshold.
● Heuristic: Two individuals with the same risk score s have the same likelihood
of receiving the outcome.
● A risk score of 10 should mean the same thing for a white individual as it does
for a black individual.
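A sketch of a group-wise calibration check under the slide's setup (a 1-20 risk score, a binary outcome, and a group label): within each score bucket, observed outcome rates should be similar across groups. The threshold value t=15 in the decision rule is purely illustrative.

```python
import numpy as np
import pandas as pd

def calibration_table(scores, outcomes, group):
    """Observed outcome rate per (score, group) cell; calibration by group
    means the rates in each score row are roughly equal across the columns."""
    df = pd.DataFrame({"score": scores, "outcome": outcomes, "group": group})
    return df.pivot_table(index="score", columns="group",
                          values="outcome", aggfunc="mean")

# Decision rule from the slide: act when the risk score exceeds a threshold t.
def decide(scores, t=15):  # t = 15 is an arbitrary illustrative threshold
    return (np.asarray(scores) > t).astype(int)
```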
Debate about Northpointe’s COMPAS
● COMPAS is used to assign a recidivism
risk score to prisoners.
● ProPublica Claim: Black defendants have
higher false positive rates.
● Northpointe Defense: Risk scores are
well-calibrated by groups.
Datasheets, Model Cards, and Checklists
Datasheets for Datasets
● Taking inspiration from safety standards in other industries, such as automobile testing and clinical drug trials, Gebru et al. (2018) propose standards for documenting datasets.
● Documentation questions include:
○ How was the data collected? Over what time frame?
○ Why was the dataset created? Who funded its creation?
○ How was the dataset pre-processed/cleaned?
○ If data relates to people, were they informed about
the intended use of the data?
● What makes for a good dataset?
Model Cards for Model Reporting
● Google researchers propose a standard for documenting deployed models.
● Sections include:
○ Intended Use
○ Factors (evaluation amongst relevant demographic groups)
○ Ethical Concerns
○ Caveats and Recommendations.
● More transparent model reporting
will allow users to better
understand when they should (or
should not) use your model.
Mitchell et al. (2019)
Deon: Ethical Checklist for Data Science
● Deon (by DrivenData) is an ethics
checklist for data projects.
○ Data Collection
○ Data Storage
● CLI tool creates a Markdown file in your repo with this checklist.
AI Now Institute
● New York University research institute that focuses on understanding the societal and cultural impact of AI and related technologies.
● Hosts an annual symposium on Ethics, Organizing, and Accountability.
● Recently produced a report on the diversity crisis in AI and how it affects the development of AI systems.
References
1. The Measures and Mismeasures of Fairness: A Critical Review of Fair Machine Learning
2. The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition
3. Delayed Impact of Fair Machine Learning; https://arxiv.org/pdf/1803.04383.pdf
4. Datasheets for Datasets; https://arxiv.org/pdf/1803.09010.pdf
5. Model Cards for Model Reporting; https://arxiv.org/pdf/1810.03993.pdf
6. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
7. Fairness and Abstraction in Sociotechnical Systems
8. Discriminating Systems: Gender, Race, and Power in AI