
Manojit Nandi - Measures and Mismeasures of algorithmic fairness

Within the last few years, researchers have come to understand that machine learning systems may display discriminatory behavior with regard to certain protected characteristics, such as gender or race. To combat these harmful behaviors, we have created multiple definitions of fairness to enable equity in machine learning algorithms. In this talk, I will cover these different definitions of algorithmic fairness and discuss both the strengths and limitations of these formalizations. In addition, I will cover other best practices to better mitigate the unintended bias of data products.

https://us.pycon.org/2019/schedule/presentation/226/

PyCon 2019

May 04, 2019

Transcript

  1. Measures and mismeasures
    of algorithmic fairness
    Manojit Nandi
    Senior Data Scientist,
    J.P. Morgan Chase
    @mnandi92

  2. About Me (According to Google Cloud Vision API)
    ● Dancer? Aerial dancer and
    circus acrobat.
    ● Entertaining? Hopefully.
    ● Fun? Most of the time.
    ● Girl?!?

  3. What is Algorithmic Fairness?

  4. Algorithmic Fairness
    ● Algorithmic Fairness is a growing
    field of research that aims to
    mitigate the effects of unwarranted
    bias/discrimination on people in
    machine learning.
    ● Primarily focused on mathematical
    formalisms of fairness and
    developing solutions for these
    formalisms.
    ● IMPORTANT: Fairness is inherently
    a social and ethical concept.
    Source: Fairness and Abstraction in
    Socio-technical Systems; Selbst, boyd,
    Friedler, Venkatasubramanian & Vertesi
    (2018)

  5. BuT mAtH cAn’T bE rAcist!!
    ● No one is sincerely arguing that
    mathematics or computer science is
    inherently discriminatory.
    ● However, the way people apply
    mathematical models or algorithms to
    real-world problems can reinforce
    societal inequalities.

  6. Fairness, Accountability, Transparency (FAT*) ML
    ● Interdisciplinary research area that
    focuses on creating
    machine-learning systems that
    work towards goals, such as
    fairness and justice.
    ● Many open-source libraries
    (FairTest, themis-ml, AI Fairness 360)
    have been developed based on this research.
    ● ACM FAT* 2019 Conference held in
    Atlanta, GA back in January.
    Photo credits: Moritz Hardt

  7. Algorithmic Fairness in Popular Media

  8. Legal Regulations
    In the United States, many industries have legal
    regulations to prevent disparate impact against
    vulnerable populations.
    ● Education (Education Amendments Act)
    ● Employment (Civil Rights Act)
    ● Credit (Equal Credit Opportunity Act)
    ● Housing (Fair Housing Act)

  9. Types of Algorithmic Biases
    Kate Crawford (Microsoft Research), Hanna Wallach (Microsoft Research),
    Solon Barocas (Cornell University), Aaron Shapiro (Microsoft Research)

  10. Bias in Allocation
    ● Most commonly researched family
    of algorithmic fairness problems
    (why we invented the math
    definitions).
    ● Algorithmic Idea: How do models
    perform in binary classification
    problems across different groups?
    ● Fundamental Idea: When
    allocating finite resources (credit
    loans, gainful employment), we
    often favor the privileged class
    over the more vulnerable.
    Source: Reuters News

  11. Bias in representation
    ● Focused on looking at how harmful
    labels/representations are propagated.
    ● Often related to language and
    computer vision problems.
    ● Harder to quantify error compared to
    bias in allocation problems.

  12. ● Concerned with algorithms
    promoting harmful
    stereotypes and lack of
    recognition.
    Example: Snapchat filters.

  13. Weaponization of Machine Learning
    ● As data scientists, we are often not
    taught to think about how models
    could be used inappropriately.
    ● With the increasing usage of AI in
    high-stakes situations, we must be
    careful not to harm vulnerable
    populations.
    Source: Why Stanford Researchers Tried to Create
    a “Gaydar” Machine; New York Times

  14. Types of Fairness Measures
    Sam Corbett-Davies
    Stanford University
    Sharad Goel
    Stanford University

  15. “21 Definitions of Algorithmic Fairness”
    ● There are more than 30 different
    mathematical definitions of fairness in
    the academic literature.
    ● There isn’t one true definition of
    fairness.
    ● These definitions can be grouped
    together into three families:
    ○ Anti-Classification
    ○ Classification Parity
    ○ Calibration
    Pictured: Princeton CS
    professor, Arvind Narayanan

  16. Anti-Classification
    ● Heuristic: Algorithmic decisions “ignore”
    protected attributes. (Individual Fairness)
    ● In addition to excluding protected
    attributes, one must also be concerned
    about learning proxy features (see the
    proxy check sketched below).
    ● Useful for defining loss function of
    fairness-aware models.
    Diagram: individuals with the same “unprotected” features receive the same outcome.
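
    A quick way to probe for proxy features is to check whether the protected attribute can be predicted from the remaining features alone. This is a minimal sketch with scikit-learn; the file applications.csv and its column names are hypothetical, and the features are assumed to be numeric.

    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    # Hypothetical lending data with a binary protected attribute.
    df = pd.read_csv("applications.csv")
    X = df.drop(columns=["approved", "gender"])   # protected column dropped
    z = (df["gender"] == "female").astype(int)    # protected attribute Z

    # If Z is predictable from the remaining features (AUC well above 0.5),
    # the "unprotected" features act as proxies, and dropping the column
    # alone does not achieve anti-classification.
    auc = cross_val_score(GradientBoostingClassifier(), X, z,
                          cv=5, scoring="roc_auc").mean()
    print(f"Protected attribute recoverable with AUC = {auc:.2f}")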

  17. Fairness-Aware Algorithms
    ● Given a set of features X, labels Y,
    and protected characteristics Z, we
    want to create a model that learns
    to predict the labels Y, but also
    doesn’t “accidentally” learn to
    predict the protected characteristics
    Z.
    ● Can view this constrained
    optimization as akin to
    regularization. Sometimes referred
    to as the accuracy-fairness trade-off
    (a minimal sketch follows below).
    Diagram labels: “Is it a good classifier?” and “Is it learning the protected attributes?” (the two competing loss terms).
    Source: Towards Fairness in ML with
    Adversarial Networks (GoDataDriven)
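
    A minimal sketch of that adversarial set-up in PyTorch (an illustration, not the referenced post's implementation; the layer sizes, the fairness weight lam, and the shape of the data are assumptions):

    import torch
    import torch.nn as nn

    n_features, lam = 20, 1.0   # lam weights the fairness penalty (illustrative)
    clf = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))
    adv = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
    opt_clf = torch.optim.Adam(clf.parameters(), lr=1e-3)
    opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    def training_step(X, y, z):
        """One step; X, y, z are float tensors, y and z in {0, 1}, shape (batch, 1)."""
        # 1) Train the adversary to recover the protected attribute Z
        #    from the classifier's score.
        score = clf(X).detach()
        loss_adv = bce(adv(score), z)
        opt_adv.zero_grad()
        loss_adv.backward()
        opt_adv.step()

        # 2) Train the classifier to predict Y while hurting the adversary:
        #    subtracting the adversary's loss penalizes any signal about Z
        #    that leaks into the score.
        score = clf(X)
        loss_clf = bce(score, y) - lam * bce(adv(score), z)
        opt_clf.zero_grad()
        loss_clf.backward()
        opt_clf.step()

    Raising lam trades classification accuracy for fairness, which is the accuracy-fairness trade-off named above.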

  18. Dangers of Anti-Classification Measures
    ● By “removing” protected features, we
    ignore the underlying processes that
    affect different demographics.
    ● Fairness metrics are focused on making
    outcomes equal.
    ● DANGER! Sometimes making outcomes
    equal adversely impacts a vulnerable
    demographic.
    Source: Corbett-Davies, Goel (2019)

  19. Classification Parity
    ● Given some traditional
    classification measure
    (accuracy, false positive rate),
    is our measure equal across
    different protected groups?
    (Group Fairness)
    ● Most commonly used to audit
    algorithms from a legal
    perspective.
    Source: Gender Shades,
    Buolamwini & Gebru (2018)

  20. Demographic Parity
    ● Demographic Parity looks at the
    proportion of positive outcomes by
    protected attribute group.
    ● Demographic Parity is used to
    audit models for disparate impact
    (80% rule; a quick check is
    sketched below).
    ● DANGER! Satisfying the immediate
    constraint may have negative
    long-term consequences.
    Source: Delayed Impact of Fair Machine Learning,
    Liu et al. (2018)
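
    A rough sketch of a disparate-impact audit under the 80% rule, on a toy table of model decisions (the column names and values are made up):

    import pandas as pd

    # Toy table of model decisions per applicant.
    preds = pd.DataFrame({
        "gender":   ["f", "f", "f", "m", "m", "m", "m", "m"],
        "approved": [0, 1, 0, 1, 1, 0, 1, 1],
    })

    def disparate_impact_ratios(df, group_col, pred_col):
        """Positive-outcome rate per group, relative to the most favored group."""
        rates = df.groupby(group_col)[pred_col].mean()   # P(decision = 1 | group)
        return rates / rates.max()

    ratios = disparate_impact_ratios(preds, "gender", "approved")
    print(ratios)   # any group with a ratio below 0.8 fails the 80% rule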

  21. Parity of False Positive Rates
    ● As the name suggests, this measure looks
    at the false positive rate across different
    protected groups (a per-group audit is
    sketched below).
    ● Sometimes called “Equal Opportunity”
    ● It’s possible to improve the false positive
    rate simply by increasing the number of
    true negatives.
    ● DANGER! If we don’t take into
    consideration societal factors, we may
    end up harming vulnerable populations.
    Diagram: FPR = FP / (FP + TN); increasing the
    number of true negatives shrinks the FPR without
    reducing the number of false positives.
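
    A sketch of a per-group false-positive-rate audit on toy arrays (the labels, predictions, and group memberships are made up):

    import numpy as np
    import pandas as pd

    def fpr_by_group(y_true, y_pred, group):
        """False positive rate FP / (FP + TN), computed per protected group."""
        fprs = {}
        for g in np.unique(group):
            negatives = (group == g) & (y_true == 0)   # actual negatives in this group (TN + FP)
            fprs[g] = y_pred[negatives].mean() if negatives.any() else np.nan
        return pd.Series(fprs, name="false_positive_rate")

    y_true = np.array([0, 0, 1, 0, 0, 1, 0, 1])
    y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 0])
    group  = np.array(["a", "a", "a", "b", "b", "b", "b", "b"])
    print(fpr_by_group(y_true, y_pred, group))

    Note the denominator: a group's FPR can shrink just because it has many true negatives, which is exactly the danger the slide calls out.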

  22. Calibration
    ● In the case of risk assessment (recidivism, child
    protective services), we use a scoring
    function s(x) to estimate the individual’s true
    risk.
    ● We define some threshold t to make a
    decision when s(x) > t.
    ● Example: Child Protective Services (CPS)
    assigns a risk score (1-20) to a child. CPS
    intervenes if the perceived risk to the child is
    high enough.

  23. Statistical Calibration
    ● Heuristic: Two individuals with the same risk score s have the same likelihood
    of receiving the outcome.
    ● A risk score of 10 should mean the same thing for a white individual as it does
    for a black individual (a per-group check is sketched below).
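
    A sketch of such a per-group calibration check on toy data (the score bins, column names, and values are made up):

    import pandas as pd

    # Toy risk scores, observed outcomes, and group labels.
    scores = pd.DataFrame({
        "risk_score": [2, 5, 9, 9, 15, 15, 18, 18],
        "outcome":    [0, 0, 1, 0, 1, 1, 1, 1],
        "group":      ["a", "b", "a", "b", "a", "b", "a", "b"],
    })

    # Within each score bin, the observed outcome rate should be roughly
    # equal across groups if the score is calibrated by group.
    scores["score_bin"] = pd.cut(scores["risk_score"], bins=[0, 5, 10, 15, 20])
    calibration = (
        scores.groupby(["score_bin", "group"], observed=True)["outcome"]
              .mean()
              .unstack("group")   # rows: score bins, columns: groups
    )
    print(calibration)            # large gaps within a row suggest miscalibration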

  24. Debate about Northpointe’s COMPAS
    ● COMPAS is used to assign a recidivism
    risk score to prisoners.
    ● ProPublica Claim: Black defendants have
    higher false positive rates.
    ● Northpointe Defense: Risk scores are
    well-calibrated by groups.

  25. Datasheets, Model Cards, and Checklists

  26. Datasheets for Data Sets
    ● Taking inspiration from safety standards in
    other industries, such as automobile testing
    and clinical drug trials, Gebru et al. (2017)
    propose standards for documenting datasets.
    ● Documentation questions include:
    ○ How was the data collected? Over what time frame?
    ○ Why was the dataset created? Who funded its
    creation?
    ○ Does the data contain any sensitive information?
    ○ How was the dataset pre-processed/cleaned?
    ○ If data relates to people, were they informed about
    the intended use of the data?
    ● What makes for a good dataset?

  27. Model Cards for Model Reporting
    ● Google researchers propose a
    standard for documenting deployed
    models.
    ● Sections include:
    ○ Intended Use
    ○ Factors (evaluation amongst
    demographic groups)
    ○ Ethical Concerns
    ○ Caveats and Recommendations.
    ● More transparent model reporting
    will allow users to better
    understand when they should (or
    should not) use your model.
    Mitchell et al. (2019)

  28. Deon: Ethical Checklist for Data Science
    ● Deon (by DrivenData) is an ethics
    checklist for data projects.
    ○ Data Collection
    ○ Data Storage
    ○ Analysis
    ○ Modeling
    ○ Deployment
    ● The CLI tool creates a Markdown file in
    your repo with this checklist.

  29. AI Now Institute
    ● New York University research
    institute that focuses on
    understanding the societal
    and cultural impact of AI and
    machine learning.
    ● Hosts an annual symposium
    on Ethics, Organizing, and
    Accountability.
    ● Recently produced a report on the
    diversity crisis in AI and how it
    affects the development of
    technical systems.

  30. Papers Referenced
    1. The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine
    Learning; https://5harad.com/papers/fair-ml.pdf
    2. The Misgendering Machines: Trans/HCI Implications of Automatic Gender
    Recognition; https://ironholds.org/resources/papers/agr_paper.pdf
    3. Delayed Impact of Fair Machine Learning; https://arxiv.org/pdf/1803.04383.pdf
    4. Datasheets for Datasets; https://arxiv.org/pdf/1803.09010.pdf
    5. Model Cards for Model Reporting; https://arxiv.org/pdf/1810.03993.pdf
    6. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender
    Classification; http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf
    7. Fairness and Abstraction in Sociotechnical Systems;
    https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3265913
    8. Discriminating Systems: Gender, Race, and Power in AI;
    https://ainowinstitute.org/discriminatingsystems.pdf
