Slide 1

Fairness in Automated Decision Making Florian Pfisterer @ Datageeks Munich 30.03.2022

Slide 2

About Me
● PhD Statistics at LMU Munich
  ○ Automated Machine Learning
  ○ Fairness and Ethics in AI
● Currently: Figuring out what's next
● First Datageeks Meetup: April 2015

Slide 3

Outline
● What is automated decision making (ADM)?
● What makes a decision making system unfair?
  ○ Which types of harms occur?
  ○ What are sources of bias?
  ○ How can we detect unfair systems?
● How can we prevent unfair systems?

Slide 4

Automated decision making (ADM)
[Diagram: Data → Model → Decision]
A model can be any system that produces a decision based on data.
Examples: a set of business rules, logistic regression, deep neural networks.
Example input data: Name = Joe, Age = 33, Income = 50000, Job = Mgr.
ADMs automate decisions in many domains, e.g. credit checks, fraud detection, setting insurance premiums, hiring decisions, ...

Slide 5

Fairness in ADM - Why should we care?

Slide 6

Hiring
www.pnas.org/content/117/23/12592
Healthcare
https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
http://web.br.de/interaktiv/ki-bewerbung/en/
Work
https://algorithmwatch.org/en/austrias-employment-agency-ams-rolls-out-discriminatory-algorithm/

Slide 7

Harms - Individuals
● Allocation: Extending or withholding opportunities, resources or information
● Stereotyping: The system reinforces stereotypes
● Quality-of-service: The system does not work equally well for all groups
● Representation: The system over- or under-represents certain groups
● Denigration: The system is actively offensive or derogatory
● Procedural: The system makes decisions that violate social norms
(Weerts H., An Introduction to Algorithmic Fairness, 2022)
Only affected individuals experience this!

Slide 8

Types of Biases - Where do harms come from?

Slide 9

Types of Bias - Historical Bias
Data reflects how things were in the past - we might not want to perpetuate all of it!
● Texts often reflect historical inequalities
● Poor areas have higher police presence → more arrests / reoffenders
Models can pick up these biases and perpetuate them into the future!

Slide 10

Types of Bias - Representation Bias
Data is often not representative of the whole population we care about.
● Collecting data from underrepresented groups is often neglected as it is expensive
● Data often does not exist: women were often not included in studies → gender medicine
http://gendershades.org/

Slide 11

Types of Bias - Other
● Measurement Bias: Differences in how a given variable is measured across sub-populations
  ○ e.g. data quality differs between hospitals
● Model Bias: Biases introduced during modeling, e.g. due to under-specified models
  ○ e.g. models only learn the prediction mechanism of the larger class
● Feedback Loops: Model decisions shape the data collected in the future
  ○ can, e.g., lead to representation bias if sub-populations are systematically excluded

Slide 12

Example - Loan Application Process
[Diagram: Loan Applicant → Credit Risk Scoring → Denied / Repayment Process → Historical Data]
What kinds of biases might occur here?

Slide 13

Example - Loan Application Process
[Diagram: Loan Applicant → Credit Risk Scoring → Denied / Repayment Process → Historical Data]
● Historical Bias: Some groups might not have been able to pay back loans in the past, but this has changed.
● Feedback Loops: We do not learn anything about rejected applicants! This extends indefinitely into the future!
● Model Bias: The model might not pick up important differences between, e.g., genders.
● Representation Bias: Insufficient data about some groups, leading to higher uncertainty.

Slide 14

Diagnosing Bias

Slide 15

Approaches to measuring fairness - Two Perspectives
● Individual Fairness: "Similar people should be treated similarly."
● Group Fairness: "On average, different groups of people should be treated equally."

Slide 16

Approaches to measuring fairness - Two Perspectives
● Individual Fairness: "Similar people should be treated similarly." → What are similar people? What is similar treatment?
● Group Fairness: "On average, different groups of people should be treated equally." → Are the groups comparable?

Slide 17

Approaches to measuring fairness
Goal: Fairness for legally protected groups (age, sex, disability, ethnic origin, ...)
We introduce two concepts:
● Sensitive attribute A: describes which group an individual belongs to, either 0 or 1.
● Decision D: an ADM produces a decision D, which can be 👎 or 👍. Decisions should fit the true outcome Y (also 👎 or 👍).
Example - Credit risk assessment: An individual with characteristics X = x and A = 1 (female) receives the decision D = 👎, but she would have paid back a loan (Y = 👍). Our system made an error!

Slide 18

Auditing models for potential harms
● Statistical Parity (Treatment Equality): "Decisions should be independent of the sensitive attribute."
  P(D=👍 | A=1) = P(D=👍 | A=0)
● Equality of Opportunity: "The chance to deservedly obtain a favourable outcome is independent of the sensitive attribute."
  P(D=👍 | A=1, Y=👍) = P(D=👍 | A=0, Y=👍)
(Illustration: Angus Maguire, Interaction Institute for Social Change)
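Both criteria can be estimated directly from a labelled evaluation set. A minimal sketch, assuming A, D and Y are available as 0/1 columns of a pandas DataFrame (the column and function names are illustrative, not from any particular library; Fairlearn and aif360, listed under Resources, ship ready-made equivalents):

```python
import pandas as pd

def statistical_parity_diff(df: pd.DataFrame) -> float:
    """Absolute difference in positive decision rates P(D=1 | A=a) between the two groups."""
    rates = df.groupby("A")["D"].mean()
    return abs(rates.loc[1] - rates.loc[0])

def equal_opportunity_diff(df: pd.DataFrame) -> float:
    """Absolute difference in true positive rates P(D=1 | A=a, Y=1) between the two groups."""
    deserving = df[df["Y"] == 1]  # only individuals whose true outcome is favourable
    rates = deserving.groupby("A")["D"].mean()
    return abs(rates.loc[1] - rates.loc[0])
```

A value of 0 corresponds to exact parity; in practice small deviations are usually tolerated.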

Slide 19

Auditing models for potential harms
1. Bias-preserving fairness metrics: define fairness based on errors (or the lack thereof) between the true outcome Y and the decision D.
   Examples:
   ● Equality of opportunity: true positive rates across groups should be equal!
   ● Accuracy equality: accuracy across groups should be equal!
2. Bias-transforming fairness metrics: define fairness based only on the decision D.
   Examples:
   ● Statistical parity: positive rates should be equal across groups
   ● Conditional statistical parity: positive rates should be equal given some condition ("Acceptance at fire departments should be equal given a minimum height requirement")
Choosing a fairness definition requires an ethical judgement!
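To make the conditional variant concrete, a small sketch of the fire-department example (the column names, the 165 cm threshold and the data are invented for illustration):

```python
import pandas as pd

# hypothetical applicant data: sensitive attribute, conditioning variable, decision
applicants = pd.DataFrame({
    "sex":       ["f", "f", "m", "m", "m", "f"],
    "height_cm": [172, 160, 180, 170, 163, 168],
    "hired":     [1,   0,   1,   1,   0,   1],
})

# conditional statistical parity: compare acceptance rates only among
# applicants who satisfy the minimum height requirement
eligible = applicants[applicants["height_cm"] >= 165]
print(eligible.groupby("sex")["hired"].mean())  # equal rates among eligible applicants
```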

Slide 20

Auditing models for potential harms
Example:
A    | D  | Y
poor | 👎 | 👍
rich | 👍 | 👍
rich | 👍 | 👎
poor | 👍 | 👍
● True positive rate parity (D = 👍 among those with Y = 👍): rich: 1/1, poor: 1/2
● Statistical parity (D = 👍): rich: 2/2, poor: 1/2
Metric: absolute difference between groups, |ɸ_{A=0} - ɸ_{A=1}|
● Fairness needs to be evaluated on a representative dataset
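The same example, worked through in code as a sketch (👍/👎 encoded as 1/0):

```python
import pandas as pd

# the four individuals from the table above (A = group, D = decision, Y = true outcome)
data = pd.DataFrame({
    "A": ["poor", "rich", "rich", "poor"],
    "D": [0, 1, 1, 1],
    "Y": [1, 1, 0, 1],
})

# statistical parity: positive decision rate per group -> rich: 2/2 = 1.0, poor: 1/2 = 0.5
positive_rate = data.groupby("A")["D"].mean()

# true positive rate parity: decision rate among those with Y = 1 -> rich: 1/1 = 1.0, poor: 1/2 = 0.5
tpr = data[data["Y"] == 1].groupby("A")["D"].mean()

# reported metric: absolute difference between the two groups
print(abs(positive_rate["rich"] - positive_rate["poor"]))  # 0.5
print(abs(tpr["rich"] - tpr["poor"]))                      # 0.5
```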

Slide 21

Auditing models for potential harms
Fairness metrics reduce many important considerations into a single number. They cannot guarantee that a system is fair.

Slide 22

Dealing with Biases

Slide 23

No fairness through unawareness!
Naive idea: simply remove the protected attribute!
● With the attribute: the model directly uses race as a feature.
● Without it: the model still picks up information about race through the proxy variable ZIP code.
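A small synthetic sketch of why this fails (all numbers and column names are invented): the protected attribute is dropped, but a correlated proxy remains, so group-dependent patterns in the historical labels are reproduced anyway.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

group = rng.integers(0, 2, n)                                # protected attribute (0/1)
zip_code = np.where(rng.random(n) < 0.9, group, 1 - group)   # proxy: 90% aligned with group
income = rng.normal(50, 10, n)                               # legitimate feature, independent of group
# historically biased labels: group 1 was approved less often at the same income
label = (income - 10 * group + rng.normal(0, 5, n) > 45).astype(int)

# 'fairness through unawareness': train without the protected attribute
X = pd.DataFrame({"zip_code": zip_code, "income": income})
model = LogisticRegression().fit(X, label)

# approval rates per group still differ, because ZIP code stands in for group membership
pred = pd.Series(model.predict(X), name="approved")
print(pred.groupby(group).mean())
```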

Slide 24

Remedies - Algorithmic Solutions
Several 'technical' fixes have been proposed:
● Preprocessing: change the data so that the resulting models are fairer.
  Example: balance distributions across different groups
● Fair models: learn models that take a fairness criterion into account.
  Example: linear model with fairness constraints
● Postprocessing: adapt decisions to satisfy fairness metrics.
  Example: accept more people from the disadvantaged group
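As one possible concrete instance of the 'fair models' and 'postprocessing' routes, a sketch using Fairlearn (listed under Resources); the synthetic data is purely illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from fairlearn.postprocessing import ThresholdOptimizer

# toy training data: three features, a binary label and a binary sensitive attribute
rng = np.random.default_rng(1)
n = 2000
sensitive = rng.integers(0, 2, n)
X = pd.DataFrame(rng.normal(size=(n, 3)) + sensitive[:, None], columns=["x1", "x2", "x3"])
y = (X["x1"] + rng.normal(0, 1, n) > 0.5).astype(int)

# 'Fair model': fit a classifier subject to a statistical parity constraint
mitigator = ExponentiatedGradient(LogisticRegression(), constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=sensitive)
fair_pred = mitigator.predict(X)

# 'Postprocessing': pick group-specific thresholds on top of an unconstrained model
postproc = ThresholdOptimizer(estimator=LogisticRegression(), constraints="demographic_parity")
postproc.fit(X, y, sensitive_features=sensitive)
adjusted_pred = postproc.predict(X, sensitive_features=sensitive)
```

The fairness metrics from the auditing slides can then be compared between unconstrained and mitigated predictions, e.g. with Fairlearn's MetricFrame.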

Slide 25

Remedies - Algorithmic Solutions
Technical solutions can only hope to 'fix' the symptoms - they do not address the root causes!

Slide 26

Remedies - Recourse
● Problem formulation: Deciding which problems to prioritize when using ADM systems is important.
  Example: detecting welfare fraud vs. identifying underserved cases
● Accountability & Recourse
  ○ Automated systems will make errors - developers need to ensure that humans responsible for addressing errors exist and that they can actually resolve such errors.
  ○ Affected individuals need access to an explanation of how the decision was made and of what steps can be taken to address unfavourable decisions.
● Documentation: Errors often result from using data and models beyond their intended purpose - Data Sheets and Model Cards help document intended use and important caveats.

Slide 27

Summary
Ensuring fair decisions can be difficult:
● Organizational support: To succeed, fairness needs to be embedded at the product development and engineering level. This requires creating awareness in the engineering teams that develop and maintain ADM systems.
● Diverse perspectives: Harms can occur in many different forms. Considering diverse perspectives and involving stakeholders during system development is essential.
● Fairness metrics can be a useful tool to diagnose bias, but to understand what they mean, they need to be grounded in real-world quantities.

Slide 28

Thank you
Get in contact: www.linkedin.com/in/pfistfl/ | [email protected]
Illustrations: @drawdespiteitall

Slide 29

Resources
Books & Articles
● Fairness and Machine Learning - Limitations and Opportunities (Barocas et al., 2019)
● An Introduction to Algorithmic Fairness (Weerts, 2021), with online notes: https://hildeweerts.github.io/responsiblemachinelearning/
● Algorithmic Fairness: Choices, Assumptions, and Definitions (Mitchell et al., 2021)
● Why Fairness Cannot Be Automated: Bridging the Gap Between EU Non-Discrimination Law and AI (Wachter, 2021)
Software
● Fairlearn (Python): https://fairlearn.org/
● aif360 (Python, R): https://aif360.mybluemix.net/
● fairmodels (R): https://fairmodels.drwhy.ai/
● mlr3fairness (R): https://github.com/mlr-org/mlr3fairness

Slide 30

Discussion

Slide 31

Harms - ADM provider
● Legal: Our systems should not discriminate, e.g. Article 21 of the EU Charter of Fundamental Rights
● Public image: Increased scrutiny of ADM-based products by media and consumer advocacy groups
● Ethical: We want our decision making to reflect our ethical values
● Regulatory: Demonstrate non-discrimination to regulatory bodies

Slide 32

Remedies - Documentation
Data Sheets and Model Cards: Harm often stems from using datasets or models beyond their intended use - this can be prevented by better documentation!
● Dataset docs: Include information on the dataset, how it was collected, etc.
  Example: Datasheets for Datasets (Gebru et al., 2018)
● Model docs: Information on the data used, the intended use, and the target demographic.
  Example: Model Cards for Model Reporting (Mitchell et al., 2019)

Slide 33

Discussion
● Can simple business rules be unfair?
● Does fairness matter for less consequential decisions?
● How do we reconcile fairness with the financial bottom line?

Slide 34

Discussion

Slide 35

Fairness in European Law
Disclaimer: I know only a few things about law and fairness - this might be wrong!
1. The current legal situation in the EU is unclear; it is likely that it will be shaped by case law in the different member states.
2. Error-based metrics are widely used to assess fairness. Whether they are sufficient depends on whether we can assume that the data is 'fair'.
3. Wachter et al. (2021) argue that EU law might require a form of the conditional statistical parity introduced today.

Slide 36

Algorithmic Fairness
ADMs should behave and treat people fairly: without unjust treatment on the grounds of sensitive characteristics.
Sensitive characteristics: e.g., legally protected groups (age, sex, disability, ethnic origin, race, ...)

Slide 37

Auditing models for potential harms
1. Students who might succeed are admitted equally often among students from poor and rich households.
   Students are selected according to their ability (measured via grades).
   Argument: Resources should be made available to those with the highest chance of success.
2. Students from poor and rich households are admitted equally often.
   Does not take ability into account.
   Argument: Poor students have less access to tutoring - it is not fair to base admission on grades only. The construct 'grades' does not adequately measure ability.
Choosing a fairness definition requires an ethical judgement!

Slide 38

Auditing models for potential harms
1. Students with good grades are admitted equally often among students from poor and rich households.
   This defines fairness based on errors (or the lack thereof) between a true outcome Y (= will succeed) and the decision D.
   Examples:
   ● Equality of opportunity: true positive rates across groups should be equal!
   ● Accuracy equality: accuracy across groups should be equal!
2. Students from poor and rich households are admitted equally often.
   This defines fairness based only on the decision D.
   Examples:
   ● Statistical parity: positive rates should be equal across groups
   ● Conditional statistical parity: positive rates should be equal given some condition ("Acceptance at fire departments should be equal given a minimum height requirement")
Choosing a fairness definition requires an ethical judgement!

Slide 39

A practical example