
Florian Pfisterer - Fairness in automated decision making

Decisions derived from automated systems, e.g. machine learning models, increasingly affect our lives. Ensuring that those systems behave fairly and, for example, do not discriminate against minorities is an important endeavour. In this talk, I would like to give a brief introduction to the field of algorithmic fairness. This includes harms that might arise from the use of biased ML models, some intuition regarding how (un)fairness can be measured, and approaches towards mitigating biases in such systems.

MunichDataGeeks

April 26, 2023

Transcript

  1. About Me
     • PhD Statistics at LMU Munich
       ◦ Automated Machine Learning
       ◦ Fairness and Ethics in AI
     • Currently: Figuring out what's next
     • First Datageeks Meetup: April 2015

  2. Outline
     • What is automated decision making (ADM)?
     • What makes a decision making system unfair?
       ◦ Which types of harms occur?
       ◦ What are sources of bias?
       ◦ How can we detect unfair systems?
     • How can we prevent unfair systems?

  3. Automated decision making (ADM)
     [Diagram: Data → Model → Decision]
     A model can be any system that produces a decision based on data.
     Example: a set of business rules, Logistic Regression, Deep Neural Networks
     Example data row: Name: Joe, Age: 33, Income: 50000, Job: Mgr.
     ADMs automate decisions in many domains, e.g. credit checks, fraud detection, setting insurance premiums, hiring decisions, ...

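     To make this concrete, here is a minimal sketch (my own illustration, not from the slides) of a rule-based ADM applied to the example data row above:

         from dataclasses import dataclass

         @dataclass
         class Applicant:
             name: str
             age: int
             income: int
             job: str

         def credit_decision(applicant: Applicant) -> str:
             """A toy rule-based ADM: any system that maps data to a decision."""
             if applicant.income >= 40_000 and applicant.age >= 21:
                 return "approve"
             return "deny"

         print(credit_decision(Applicant("Joe", 33, 50_000, "Mgr.")))  # approve
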
  4. Harms - Individuals
     • Allocation: Extending or withholding opportunities, resources or information
     • Stereotyping: The system reinforces stereotypes
     • Quality-of-service: The system does not work equally well for all groups
     • Representation: The system over- or under-represents certain groups
     • Denigration: The system is actively offensive or derogatory
     • Procedural: The system makes decisions that violate social norms
     Only affected individuals experience this!
     (Weerts H., An Introduction to Algorithmic Fairness, 2022)

  5. Types of Bias - Historical Bias
     Data reflects how things were in the past - we might not want to perpetuate all of it!
     • Texts often reflect historical inequalities
     • Poor areas have higher police presence → more arrests / reoffenders
     Models can pick up biases and perpetuate them into the future!

  6. Types of Bias - Representation Bias
     Data is often not representative of the whole population we care about
     • Collecting data from underrepresented groups is often neglected as it is expensive
     • Data often does not exist: women were often not included in studies → gender medicine
     http://gendershades.org/

  7. Types of Bias - Other
     • Measurement Bias: Differences in how a given variable is measured across sub-populations
       ◦ e.g. data quality differs between hospitals
     • Model Bias: Biases introduced during modeling, e.g. due to under-specified models
       ◦ e.g. models only learn the prediction mechanism of the larger group
     • Feedback Loops: Model decisions shape the data collected in the future
       ◦ Can, e.g., lead to representation bias if sub-populations are systematically excluded

  8. Example - Loan Application Process
     [Diagram: Loan Applicant → Credit Risk Scoring → Denied / Repayment Process → Historical Data]
     What kinds of biases might occur here?

  9. Example - Loan Application Process
     • Historical Bias: Some groups might not have been able to pay back loans in the past, but this has changed
     • Feedback Loops: We do not learn anything about rejected applicants! This extends indefinitely into the future!
     • Model Bias: The model might not pick up important differences between, e.g., genders
     • Representation Bias: Insufficient data about some groups, leading to higher uncertainty

 10. Approaches to measuring fairness
     Two perspectives:
     • Individual Fairness: "Similar people should be treated similarly."
     • Group Fairness: "On average, different groups of people should be treated equally."

 11. Approaches to measuring fairness
     Two perspectives:
     • Individual Fairness: "Similar people should be treated similarly."
       ◦ What are similar people? What is similar treatment?
     • Group Fairness: "On average, different groups of people should be treated equally."
       ◦ Are the groups comparable?

 12. Approaches to measuring fairness
     Goal: Fairness for legally protected groups (age, sex, disability, ethnic origin, ...)
     We introduce two concepts:
     • Sensitive attribute A: describes which group an individual belongs to, either 0 or 1.
     • Decision D: an ADM produces a decision D, which can be 👎 or 👍. Decisions should fit the true outcome Y (also 👎 or 👍).
     Example - Credit risk assessment: An individual with characteristics X = x, A = 1 (female) receives the decision D = 👎 but would have paid back a loan (Y = 👍). Our system made an error!

 13. Auditing models for potential harms
     • Statistical Parity (Treatment Equality): "Decisions should be independent of the sensitive attribute"
       P(D = 👍 | A = 1) = P(D = 👍 | A = 0)
     • Equality of Opportunity: "The chance to deservedly obtain a favourable outcome is independent of the sensitive attribute"
       P(D = 👍 | A = 1, Y = 👍) = P(D = 👍 | A = 0, Y = 👍)
     (Image: Angus Maguire, Interaction Institute for Social Change)

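     A minimal sketch (not part of the slides) of how both criteria can be checked empirically, assuming binary arrays d (decision), y (true outcome) and a (sensitive attribute), with 👍 encoded as 1 and 👎 as 0:

         import numpy as np

         # Hypothetical toy data: 1 = favourable (👍), 0 = unfavourable (👎)
         d = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # decisions D
         y = np.array([1, 1, 1, 0, 0, 1, 1, 1])  # true outcomes Y
         a = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # sensitive attribute A

         def positive_rate(d, mask):
             """P(D = 1) within the sub-population selected by mask."""
             return d[mask].mean()

         # Statistical parity: compare P(D = 1 | A = a) across groups
         sp_gap = abs(positive_rate(d, a == 0) - positive_rate(d, a == 1))

         # Equality of opportunity: compare P(D = 1 | A = a, Y = 1) across groups,
         # i.e. the true positive rates
         eo_gap = abs(positive_rate(d, (a == 0) & (y == 1))
                      - positive_rate(d, (a == 1) & (y == 1)))

         print(f"statistical parity difference: {sp_gap:.2f}")
         print(f"equal opportunity difference:  {eo_gap:.2f}")
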
 14. Auditing models for potential harms
     1. Bias-preserving fairness metrics: define fairness based on errors (or lack thereof) between the true outcome Y and the decision D.
        Examples:
        • Equality of opportunity: True positive rates across groups should be equal!
        • Accuracy equality: Accuracy across groups should be equal!
     2. Bias-transforming fairness metrics: define fairness based only on the decision D.
        Examples:
        • Statistical parity: Positive rates should be equal across groups
        • Conditional statistical parity: Positive rates should be equal given some condition
          "Acceptance at fire departments should be equal given a minimum height requirement"
     Choosing a fairness definition requires an ethical judgement!

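     Both families of metrics can be computed per group, for example with Fairlearn's MetricFrame (Fairlearn is listed under Resources). A hedged sketch with placeholder data:

         import numpy as np
         from sklearn.metrics import accuracy_score
         from fairlearn.metrics import MetricFrame, true_positive_rate, selection_rate

         # Placeholder arrays; in practice these come from a representative evaluation dataset
         y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0])
         y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 0])
         group = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

         mf = MetricFrame(
             metrics={
                 "true_positive_rate": true_positive_rate,  # bias-preserving (uses Y and D)
                 "accuracy": accuracy_score,                # bias-preserving (uses Y and D)
                 "selection_rate": selection_rate,          # bias-transforming (uses D only)
             },
             y_true=y_true,
             y_pred=y_pred,
             sensitive_features=group,
         )

         print(mf.by_group)      # metric values per group
         print(mf.difference())  # absolute differences between groups
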
 15. Auditing models for potential harms
     Example:
         A     D   Y
         poor  👎  👍
         rich  👍  👍
         rich  👍  👎
         poor  👍  👍
     • True positive rate parity (D = 👍 among Y = 👍): rich: 1/1, poor: 1/2
     • Statistical parity (D = 👍): rich: 2/2, poor: 1/2
     Metric: absolute difference between groups, |ϕ_{A=0} − ϕ_{A=1}|
     • Fairness needs to be evaluated on a representative dataset

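     The same numbers can be reproduced with a few lines of pandas (a sketch, not from the slides; 👍 is encoded as 1 and 👎 as 0):

         import pandas as pd

         # Toy example from the slide
         df = pd.DataFrame({
             "A": ["poor", "rich", "rich", "poor"],
             "D": [0, 1, 1, 1],
             "Y": [1, 1, 0, 1],
         })

         # Statistical parity: P(D = 1) per group
         positive_rate = df.groupby("A")["D"].mean()

         # True positive rate: P(D = 1 | Y = 1) per group
         tpr = df[df["Y"] == 1].groupby("A")["D"].mean()

         print(positive_rate)  # poor: 0.5, rich: 1.0
         print(tpr)            # poor: 0.5, rich: 1.0
         print("statistical parity difference:", abs(positive_rate["poor"] - positive_rate["rich"]))
         print("TPR difference:               ", abs(tpr["poor"] - tpr["rich"]))
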
 16. Auditing models for potential harms
     Fairness metrics reduce many important considerations into a single number.
     They can not guarantee that a system is fair.

 17. No fairness through unawareness!
     Naive idea: remove the protected attribute!
     • Without removal: the model directly uses race as a feature.
     • After removal: the model picks up information about race through the proxy variable ZIP code.

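     A small illustration of the proxy problem with synthetic data (my own sketch; the variable names and numbers are made up): the sensitive attribute is never used as a feature, yet the model's decisions still differ between groups because ZIP code encodes group membership.

         import numpy as np
         import pandas as pd
         from sklearn.linear_model import LogisticRegression

         rng = np.random.default_rng(0)
         n = 5000

         # Synthetic data: ZIP code is strongly associated with the sensitive group,
         # and the historical outcome depends on the group -> ZIP acts as a proxy.
         group = rng.integers(0, 2, n)                               # sensitive attribute
         zip_code = np.where(rng.random(n) < 0.9, group, 1 - group)  # 90% aligned with group
         income = rng.normal(50 + 10 * group, 5, n)                  # income in thousands
         label = (rng.random(n) < 0.3 + 0.4 * group).astype(int)     # biased historical outcome

         X = pd.DataFrame({"zip_code": zip_code, "income": income})  # 'group' is NOT a feature
         pred = LogisticRegression().fit(X, label).predict(X)

         # Positive decision rates still differ by group
         for g in (0, 1):
             print(f"group {g}: positive rate {pred[group == g].mean():.2f}")
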
 18. Remedies - Algorithmic Solutions
     Several 'technical' fixes have been proposed:
     • Preprocessing: change the data so that the resulting models are fairer.
       Example: Balance distributions across different groups
     • Fair models: learn models that take a fairness criterion into account.
       Example: Linear model with fairness constraints
     • Postprocessing: adapt decisions to satisfy fairness metrics.
       Example: Accept more people from the disadvantaged group

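     As an illustration of the 'fair models' and 'postprocessing' ideas, a minimal sketch using Fairlearn (listed under Resources); the synthetic data and parameter choices are placeholders, not from the talk:

         import numpy as np
         from sklearn.linear_model import LogisticRegression
         from fairlearn.reductions import ExponentiatedGradient, DemographicParity
         from fairlearn.postprocessing import ThresholdOptimizer
         from fairlearn.metrics import demographic_parity_difference

         rng = np.random.default_rng(1)
         n = 2000
         sensitive = rng.integers(0, 2, n)                          # placeholder sensitive attribute
         X = rng.normal(size=(n, 3)) + sensitive[:, None]           # features correlated with the group
         y = (rng.random(n) < 0.35 + 0.3 * sensitive).astype(int)   # biased labels

         # 'Fair model' (in-processing): fit a classifier under a statistical parity constraint
         mitigator = ExponentiatedGradient(LogisticRegression(), constraints=DemographicParity())
         mitigator.fit(X, y, sensitive_features=sensitive)
         pred_inproc = mitigator.predict(X)

         # Postprocessing: choose group-specific thresholds on top of an unconstrained model
         postproc = ThresholdOptimizer(
             estimator=LogisticRegression(),
             constraints="demographic_parity",
             predict_method="predict_proba",
         )
         postproc.fit(X, y, sensitive_features=sensitive)
         pred_post = postproc.predict(X, sensitive_features=sensitive)

         for name, pred in [("in-processing", pred_inproc), ("postprocessing", pred_post)]:
             gap = demographic_parity_difference(y, pred, sensitive_features=sensitive)
             print(f"{name}: demographic parity difference = {gap:.3f}")
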
 19. Remedies - Algorithmic Solutions
     Technical solutions can only hope to 'fix' the symptoms but do not address root causes!

 20. Remedies - Recourse
     • Problem formulation: deciding which problems to prioritize using ADM systems is important.
       Example: Detecting welfare fraud vs. identifying underserved cases
     • Accountability & Recourse:
       ◦ Automated systems will make errors - developers need to ensure that humans responsible for addressing errors exist and that they can resolve such errors.
       ◦ Affected individuals need access to an explanation of how the decision was made and what steps can be taken to address unfavourable decisions.
     • Documentation: errors often result from using data and models beyond their intended purpose - Data Sheets and Model Cards help to document intended use and important caveats.

 21. Summary
     Ensuring fair decisions can be difficult:
     • Organizational support: To succeed, fairness needs to be embedded at the product development and engineering level. This requires creating awareness in engineering teams that develop and maintain ADM systems.
     • Diverse perspectives: Harms can occur in many different forms. Considering diverse perspectives and involving stakeholders during system development is essential.
     • Fairness metrics can be a useful tool to diagnose bias, but to understand what they mean, they need to be grounded in real-world quantities.

 22. Resources
     Books & Articles
     • Fairness and Machine Learning - Limitations and Opportunities (Barocas et al., 2019)
     • An Introduction to Algorithmic Fairness (Weerts, 2021), with online notes: https://hildeweerts.github.io/responsiblemachinelearning/
     • Algorithmic Fairness: Choices, Assumptions, and Definitions (Mitchell et al., 2021)
     • Why Fairness Cannot Be Automated: Bridging the Gap Between EU Non-Discrimination Law and AI (Wachter, 2021)
     Software
     • Fairlearn (Python): https://fairlearn.org/
     • aif360 (Python, R): https://aif360.mybluemix.net/
     • fairmodels (R): https://fairmodels.drwhy.ai/
     • mlr3fairness (R): https://github.com/mlr-org/mlr3fairness

 23. Harms - ADM provider
     • Legal: Our systems should not discriminate, e.g. Article 21 of the EU Charter of Fundamental Rights
     • Public Image: Increased scrutiny on ADM-based products by media and consumer advocacy groups
     • Ethical: We want our decision making to reflect our ethical values
     • Regulatory: Demonstrate non-discrimination to regulatory bodies

 24. Remedies - Documentation
     Data Sheets and Model Cards: Harm often stems from using datasets or models beyond their intended use; this can be prevented by better documentation!
     • Dataset docs: include information on the dataset, how it was collected, etc.
       Example: Datasheets for Datasets (Gebru et al., 2018)
     • Model docs: information on the data used, intended use and target demographic.
       Example: Model Cards for Model Reporting (Mitchell et al., 2019)

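     As a sketch of what such documentation might capture (the structure and field names below are my own illustration, loosely inspired by the Model Cards paper, not a prescribed format):

         # A minimal, illustrative 'model card' as a plain Python dictionary
         model_card = {
             "model": "credit_risk_scoring_v1",
             "intended_use": "Pre-screening of consumer loan applications; final decision made by a human",
             "out_of_scope_use": ["Mortgages", "Employment decisions"],
             "training_data": "Historical loan repayments (see the accompanying datasheet)",
             "evaluation": {
                 "overall_accuracy": None,   # fill in from a representative test set
                 "per_group_metrics": None,  # e.g. true positive rate and selection rate per sensitive group
             },
             "caveats": [
                 "Rejected applicants are never observed -> possible feedback loops",
                 "Little data on some groups -> higher uncertainty for those groups",
             ],
         }
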
 25. Discussion
     • Can simple business rules be unfair?
     • Does fairness matter for less consequential decisions?
     • Reconciling fairness and the financial bottom line?

 26. Fairness in European Law
     Disclaimer: I only know a few things about law and fairness! This might be wrong!
     1. The current legal situation in the EU is unclear; it is likely that this will be shaped by case law in the different member states.
     2. Error-based metrics are widely used to assess fairness. Whether they are sufficient depends on whether we can assume that the data is 'fair'.
     3. Wachter et al. (2021) argue that EU law might require a form of the conditional statistical parity introduced earlier in this talk.

 27. Algorithmic Fairness
     ADMs should behave and treat people fairly: without unjust treatment on the grounds of sensitive characteristics.
     Sensitive characteristics: e.g., legally protected groups (age, sex, disability, ethnic origin, race, ...)

 28. Auditing models for potential harms
     1. Students who might succeed are admitted equally often among students from poor and rich households.
        Students are selected according to their ability (measured via grades).
        Argument: Resources should be made available to those with the highest chance of success.
     2. Students from poor and rich households are admitted equally often.
        Does not take ability into account.
        Argument: Poor students have less access to tutoring - it is not fair to base admission on grades only. The construct 'grades' does not adequately measure ability.
     Choosing a fairness definition requires an ethical judgement!

 29. Auditing models for potential harms
     1. Students with good grades are admitted equally often among students from poor and rich households.
        This defines fairness based on errors (or lack thereof) between the true outcome Y (= will succeed) and the decision D.
        Examples:
        • Equality of opportunity: True positive rates across groups should be equal!
        • Accuracy equality: Accuracy across groups should be equal!
     2. Students from poor and rich households are admitted equally often.
        This defines fairness based only on the decision D.
        Examples:
        • Statistical parity: Positive rates should be equal across groups
        • Conditional statistical parity: Positive rates should be equal given some condition
          "Acceptance at fire departments should be equal given a minimum height requirement"
     Choosing a fairness definition requires an ethical judgement!