
Florian Pfisterer - Fairness in automated decision making

Decisions derived from automated systems, e.g. machine learning models, increasingly affect our lives. Ensuring that those systems behave fairly and, for example, do not discriminate against minorities is an important endeavour. In this talk, I would like to give a brief introduction to the field of algorithmic fairness. This includes harms that might arise from the use of biased ML models, some intuition regarding how (un)fairness can be measured, and approaches towards mitigating biases in such systems.

MunichDataGeeks

April 26, 2023

Transcript

  1. About Me
     • PhD Statistics at LMU Munich
       ◦ Automated Machine Learning
       ◦ Fairness and Ethics in AI
     • Currently: Figuring out what's next
     • First Datageeks Meetup: April 2015

  2. Outline
     • What is automated decision making (ADM)?
     • What makes a decision making system unfair?
       ◦ Which types of harms occur?
       ◦ What are sources of bias?
       ◦ How can we detect unfair systems?
     • How can we prevent unfair systems?

  3. Automated decision making (ADM)
     [Diagram: Data → Model → Decision]
     A model can be any system that produces a decision based on data.
     Example: a set of business rules, Logistic Regression, Deep Neural Networks
     Example data row: Name: Joe, Age: 33, Income: 50000, Job: Mgr.
     ADMs automate decisions in many domains, e.g. credit checks, fraud detection, setting insurance premiums, hiring decisions, ...

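     To make this concrete, here is a minimal sketch (my own illustration, not from the slides) of a rule-based ADM applied to the example data row above:

         from dataclasses import dataclass

         @dataclass
         class Applicant:
             name: str
             age: int
             income: int
             job: str

         def credit_decision(applicant: Applicant) -> str:
             """A toy rule-based ADM: any system that maps data to a decision."""
             if applicant.income >= 40_000 and applicant.age >= 21:
                 return "approve"
             return "deny"

         print(credit_decision(Applicant("Joe", 33, 50_000, "Mgr.")))  # approve
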
  4. Harms - Individuals
     • Allocation: Extending or withholding opportunities, resources or information
     • Stereotyping: The system reinforces stereotypes
     • Quality-of-service: The system does not work equally well for all groups
     • Representation: The system over- or under-represents certain groups
     • Denigration: The system is actively offensive or derogatory
     • Procedural: The system makes decisions that violate social norms
     Only affected individuals experience this!
     (Weerts H., An Introduction to Algorithmic Fairness, 2022)

  5. Types of Bias - Historical Bias
     Data reflects how things were in the past - we might not want to perpetuate all of it!
     • Texts often reflect historical inequalities
     • Poor areas have higher police presence → more arrests / reoffenders
     Models can pick up biases and perpetuate them into the future!

  6. Types of Bias - Representation Bias
     Data is often not representative of the whole population we care about
     • Collecting data from underrepresented groups is often neglected as it is expensive
     • Data often does not exist: women were often not included in studies → gender medicine
     http://gendershades.org/

  7. Types of Bias - Other
     • Measurement Bias: Differences in how a given variable is measured across sub-populations
       ◦ e.g. data quality differs between hospitals
     • Model Bias: Biases introduced during modeling, e.g. due to under-specified models
       ◦ e.g. models only learn the prediction mechanism of the larger group
     • Feedback Loops: Model decisions shape the data collected in the future
       ◦ Can, e.g., lead to representation bias if sub-populations are systematically excluded

  8. Example - Loan Application Process
     [Diagram: Loan Applicant → Credit Risk Scoring → Denied / Repayment Process → Historical Data]
     What kinds of biases might occur here?

  9. Example - Loan Application Process
     • Historical Bias: Some groups might not have been able to pay back loans in the past, but this has changed
     • Feedback Loops: We do not learn anything about rejected applicants! This extends indefinitely into the future!
     • Model Bias: The model might not pick up important differences between, e.g., genders
     • Representation Bias: Insufficient data about some groups, leading to higher uncertainty

 10. Approaches to measuring fairness
     Two perspectives:
     • Individual Fairness: "Similar people should be treated similarly."
     • Group Fairness: "On average, different groups of people should be treated equally."

 11. Approaches to measuring fairness
     Two perspectives:
     • Individual Fairness: "Similar people should be treated similarly."
       ◦ What are similar people? What is similar treatment?
     • Group Fairness: "On average, different groups of people should be treated equally."
       ◦ Are the groups comparable?

 12. Approaches to measuring fairness
     Goal: Fairness for legally protected groups (age, sex, disability, ethnic origin, ...)
     We introduce two concepts:
     • Sensitive attribute A: describes which group an individual belongs to, either 0 or 1.
     • Decision D: an ADM produces a decision D, which can be 👎 or 👍. Decisions should fit the true outcome Y (also 👎 or 👍).
     Example - Credit risk assessment: An individual with characteristics X = x, A = 1 (female) receives the decision D = 👎 but would have paid back a loan (Y = 👍). Our system made an error!

 13. Auditing models for potential harms
     • Statistical Parity (Treatment Equality): "Decisions should be independent of the sensitive attribute"
       P(D = 👍 | A = 1) = P(D = 👍 | A = 0)
     • Equality of Opportunity: "The chance to deservedly obtain a favourable outcome is independent of the sensitive attribute"
       P(D = 👍 | A = 1, Y = 👍) = P(D = 👍 | A = 0, Y = 👍)
     (Image: Angus Maguire, Interaction Institute for Social Change)

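     A minimal sketch (not part of the slides) of how both criteria can be checked empirically, assuming binary arrays d (decision), y (true outcome) and a (sensitive attribute), with 👍 encoded as 1 and 👎 as 0:

         import numpy as np

         # Hypothetical toy data: 1 = favourable (👍), 0 = unfavourable (👎)
         d = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # decisions D
         y = np.array([1, 1, 1, 0, 0, 1, 1, 1])  # true outcomes Y
         a = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # sensitive attribute A

         def positive_rate(d, mask):
             """P(D = 1) within the sub-population selected by mask."""
             return d[mask].mean()

         # Statistical parity: compare P(D = 1 | A = a) across groups
         sp_gap = abs(positive_rate(d, a == 0) - positive_rate(d, a == 1))

         # Equality of opportunity: compare P(D = 1 | A = a, Y = 1) across groups,
         # i.e. the true positive rates
         eo_gap = abs(positive_rate(d, (a == 0) & (y == 1))
                      - positive_rate(d, (a == 1) & (y == 1)))

         print(f"statistical parity difference: {sp_gap:.2f}")
         print(f"equal opportunity difference:  {eo_gap:.2f}")
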
 14. Auditing models for potential harms
     1. Bias-preserving fairness metrics: define fairness based on errors (or lack thereof) between the true outcome Y and the decision D.
        Examples:
        • Equality of opportunity: True positive rates across groups should be equal!
        • Accuracy equality: Accuracy across groups should be equal!
     2. Bias-transforming fairness metrics: define fairness based only on the decision D.
        Examples:
        • Statistical parity: Positive rates should be equal across groups
        • Conditional statistical parity: Positive rates should be equal given some condition
          "Acceptance at fire departments should be equal given a minimum height requirement"
     Choosing a fairness definition requires an ethical judgement!

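     Both families of metrics can be computed per group, for example with Fairlearn's MetricFrame (Fairlearn is listed under Resources). A hedged sketch with placeholder data:

         import numpy as np
         from sklearn.metrics import accuracy_score
         from fairlearn.metrics import MetricFrame, true_positive_rate, selection_rate

         # Placeholder arrays; in practice these come from a representative evaluation dataset
         y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0])
         y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 0])
         group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

         mf = MetricFrame(
             metrics={
                 "true_positive_rate": true_positive_rate,  # bias-preserving (uses Y and D)
                 "accuracy": accuracy_score,                # bias-preserving (uses Y and D)
                 "selection_rate": selection_rate,          # bias-transforming (uses D only)
             },
             y_true=y_true,
             y_pred=y_pred,
             sensitive_features=group,
         )

         print(mf.by_group)      # metric values per group
         print(mf.difference())  # absolute differences between groups
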
 15. Auditing models for potential harms
     Example:
         A     D   Y
         poor  👎  👍
         rich  👍  👍
         rich  👍  👎
         poor  👍  👍
     • True positive rate parity (D = 👍 among Y = 👍): rich: 1/1, poor: 1/2
     • Statistical parity (D = 👍): rich: 2/2, poor: 1/2
     Metric: absolute difference between groups, |ϕ_{A=0} − ϕ_{A=1}|
     • Fairness needs to be evaluated on a representative dataset

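     The same numbers can be reproduced with a few lines of pandas (a sketch, not from the slides; 👍 is encoded as 1 and 👎 as 0):

         import pandas as pd

         # Toy example from the slide
         df = pd.DataFrame({
             "A": ["poor", "rich", "rich", "poor"],
             "D": [0, 1, 1, 1],
             "Y": [1, 1, 0, 1],
         })

         # Statistical parity: P(D = 1) per group
         positive_rate = df.groupby("A")["D"].mean()

         # True positive rate: P(D = 1 | Y = 1) per group
         tpr = df[df["Y"] == 1].groupby("A")["D"].mean()

         print(positive_rate)  # poor: 0.5, rich: 1.0
         print(tpr)            # poor: 0.5, rich: 1.0
         print("statistical parity difference:", abs(positive_rate["poor"] - positive_rate["rich"]))
         print("TPR difference:               ", abs(tpr["poor"] - tpr["rich"]))
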
 16. Auditing models for potential harms
     Fairness metrics reduce many important considerations into a single number.
     They can not guarantee that a system is fair.

 17. No fairness through unawareness!
     Naive idea: remove the protected attribute!
     • Without removal: the model directly uses race as a feature.
     • After removal: the model picks up information about race through the proxy variable ZIP code.

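     A small illustration of the proxy problem with synthetic data (my own sketch; the variable names and numbers are made up): the sensitive attribute is never used as a feature, yet the model's decisions still differ between groups because ZIP code encodes group membership.

         import numpy as np
         import pandas as pd
         from sklearn.linear_model import LogisticRegression

         rng = np.random.default_rng(0)
         n = 5000

         # Synthetic data: ZIP code is strongly associated with the sensitive group,
         # and the historical outcome depends on the group -> ZIP acts as a proxy.
         group = rng.integers(0, 2, n)                               # sensitive attribute
         zip_code = np.where(rng.random(n) < 0.9, group, 1 - group)  # 90% aligned with group
         income = rng.normal(50 + 10 * group, 5, n)                  # income in thousands
         label = (rng.random(n) < 0.3 + 0.4 * group).astype(int)     # biased historical outcome

         X = pd.DataFrame({"zip_code": zip_code, "income": income})  # 'group' is NOT a feature
         pred = LogisticRegression().fit(X, label).predict(X)

         # Positive decision rates still differ by group
         for g in (0, 1):
             print(f"group {g}: positive rate {pred[group == g].mean():.2f}")
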
 18. Remedies - Algorithmic Solutions
     Several 'technical' fixes have been proposed:
     • Preprocessing: change the data so that the resulting models are fairer.
       Example: Balance distributions across different groups
     • Fair models: learn models that take a fairness criterion into account.
       Example: Linear model with fairness constraints
     • Postprocessing: adapt decisions to satisfy fairness metrics.
       Example: Accept more people from the disadvantaged group

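     As an illustration of the 'fair models' and 'postprocessing' ideas, a minimal sketch using Fairlearn (listed under Resources); the synthetic data and parameter choices are placeholders, not from the talk:

         import numpy as np
         from sklearn.linear_model import LogisticRegression
         from fairlearn.reductions import ExponentiatedGradient, DemographicParity
         from fairlearn.postprocessing import ThresholdOptimizer
         from fairlearn.metrics import demographic_parity_difference

         rng = np.random.default_rng(1)
         n = 2000
         sensitive = rng.integers(0, 2, n)                          # placeholder sensitive attribute
         X = rng.normal(size=(n, 3)) + sensitive[:, None]           # features correlated with the group
         y = (rng.random(n) < 0.35 + 0.3 * sensitive).astype(int)   # biased labels

         # 'Fair model' (in-processing): fit a classifier under a statistical parity constraint
         mitigator = ExponentiatedGradient(LogisticRegression(), constraints=DemographicParity())
         mitigator.fit(X, y, sensitive_features=sensitive)
         pred_inproc = mitigator.predict(X)

         # Postprocessing: choose group-specific thresholds on top of an unconstrained model
         postproc = ThresholdOptimizer(
             estimator=LogisticRegression(),
             constraints="demographic_parity",
             predict_method="predict_proba",
         )
         postproc.fit(X, y, sensitive_features=sensitive)
         pred_post = postproc.predict(X, sensitive_features=sensitive)

         for name, pred in [("in-processing", pred_inproc), ("postprocessing", pred_post)]:
             gap = demographic_parity_difference(y, pred, sensitive_features=sensitive)
             print(f"{name}: demographic parity difference = {gap:.3f}")
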
 19. Remedies - Algorithmic Solutions
     Technical solutions can only hope to 'fix' the symptoms but do not address root causes!

 20. Remedies - Recourse
     • Problem formulation: deciding which problems to prioritize using ADM systems is important.
       Example: Detecting welfare fraud vs. identifying underserved cases
     • Accountability & Recourse:
       ◦ Automated systems will make errors - developers need to ensure that humans responsible for addressing errors exist and that they can resolve such errors.
       ◦ Affected individuals need access to an explanation of how the decision was made and what steps can be taken to address unfavourable decisions.
     • Documentation: errors often result from using data and models beyond their intended purpose - Data Sheets and Model Cards help to document intended use and important caveats.

 21. Summary
     Ensuring fair decisions can be difficult:
     • Organizational support: To succeed, fairness needs to be embedded at the product development and engineering level. This requires creating awareness in engineering teams that develop and maintain ADM systems.
     • Diverse perspectives: Harms can occur in many different forms. Considering diverse perspectives and involving stakeholders during system development is essential.
     • Fairness metrics can be a useful tool to diagnose bias, but to understand what they mean, they need to be grounded in real-world quantities.

 22. Resources
     Books & Articles
     • Fairness and Machine Learning - Limitations and Opportunities (Barocas et al., 2019)
     • An Introduction to Algorithmic Fairness (Weerts, 2021), with online notes: https://hildeweerts.github.io/responsiblemachinelearning/
     • Algorithmic Fairness: Choices, Assumptions, and Definitions (Mitchell et al., 2021)
     • Why Fairness Cannot Be Automated: Bridging the Gap Between EU Non-Discrimination Law and AI (Wachter, 2021)
     Software
     • Fairlearn (Python): https://fairlearn.org/
     • aif360 (Python, R): https://aif360.mybluemix.net/
     • fairmodels (R): https://fairmodels.drwhy.ai/
     • mlr3fairness (R): https://github.com/mlr-org/mlr3fairness

 23. Harms - ADM provider
     • Legal: Our systems should not discriminate, e.g. Article 21 of the EU Charter of Fundamental Rights
     • Public Image: Increased scrutiny on ADM-based products by media and consumer advocacy groups
     • Ethical: We want our decision making to reflect our ethical values
     • Regulatory: Demonstrate non-discrimination to regulatory bodies

 24. Remedies - Documentation
     Data Sheets and Model Cards: Harm often stems from using datasets or models beyond their intended use; this can be prevented by better documentation!
     • Dataset docs: include information on the dataset, how it was collected, etc.
       Example: Datasheets for Datasets (Gebru et al., 2018)
     • Model docs: information on the data used, intended use and target demographic.
       Example: Model Cards for Model Reporting (Mitchell et al., 2019)

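     As a sketch of what such documentation might capture (the structure and field names below are my own illustration, loosely inspired by the Model Cards paper, not a prescribed format):

         # A minimal, illustrative 'model card' as a plain Python dictionary
         model_card = {
             "model": "credit_risk_scoring_v1",
             "intended_use": "Pre-screening of consumer loan applications; final decision made by a human",
             "out_of_scope_use": ["Mortgages", "Employment decisions"],
             "training_data": "Historical loan repayments (see the accompanying datasheet)",
             "evaluation": {
                 "overall_accuracy": None,   # fill in from a representative test set
                 "per_group_metrics": None,  # e.g. true positive rate and selection rate per sensitive group
             },
             "caveats": [
                 "Rejected applicants are never observed -> possible feedback loops",
                 "Little data on some groups -> higher uncertainty for those groups",
             ],
         }
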
 25. Discussion
     • Can simple business rules be unfair?
     • Does fairness matter for less consequential decisions?
     • Reconciling fairness and the financial bottom line?

 26. Fairness in European Law
     Disclaimer: I only know a few things about law and fairness! This might be wrong!
     1. The current legal situation in the EU is unclear; it is likely that this will be shaped by case law in the different member states.
     2. Error-based metrics are widely used to assess fairness. Whether they are sufficient depends on whether we can assume that the data is 'fair'.
     3. Wachter et al. (2021) argue that EU law might require a form of the conditional statistical parity introduced earlier in this talk.

 27. Algorithmic Fairness
     ADMs should behave and treat people fairly: without unjust treatment on the grounds of sensitive characteristics.
     Sensitive characteristics: e.g., legally protected groups (age, sex, disability, ethnic origin, race, ...)

 28. Auditing models for potential harms
     1. Students who might succeed are admitted equally often among students from poor and rich households.
        Students are selected according to their ability (measured via grades).
        Argument: Resources should be made available to those with the highest chance of success.
     2. Students from poor and rich households are admitted equally often.
        Does not take ability into account.
        Argument: Poor students have less access to tutoring - it is not fair to base admission on grades only. The construct 'grades' does not adequately measure ability.
     Choosing a fairness definition requires an ethical judgement!

 29. Auditing models for potential harms
     1. Students with good grades are admitted equally often among students from poor and rich households.
        This defines fairness based on errors (or lack thereof) between the true outcome Y (= will succeed) and the decision D.
        Examples:
        • Equality of opportunity: True positive rates across groups should be equal!
        • Accuracy equality: Accuracy across groups should be equal!
     2. Students from poor and rich households are admitted equally often.
        This defines fairness based only on the decision D.
        Examples:
        • Statistical parity: Positive rates should be equal across groups
        • Conditional statistical parity: Positive rates should be equal given some condition
          "Acceptance at fire departments should be equal given a minimum height requirement"
     Choosing a fairness definition requires an ethical judgement!