Slide 1

Slide 1 text

Fair Machine Guidance to Enhance Fair Decision Making in Biased People

Mingzhe Yang (The University of Tokyo), Hiromi Arai (RIKEN AIP), Naomi Yamashita (Kyoto University), Yukino Baba (The University of Tokyo)

CHI 2024

Slide 2

Slide 2 text

Fair personnel evaluation is challenging for humans

• People judge others unfairly based on their race or gender [1]
• A study of lectures aimed at addressing these biases revealed the following results [2]:
  Lecture on how to address biases
  - Participants started to understand the importance of gender fairness
  - No change in the number of women selected when selecting mentors

[1] M. Bertrand and S. Mullainathan. Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review 94, 2004.
[2] Edward H. Chang, Katherine L. Milkman, Dena M. Gromet, Robert W. Rebele, Cade Massey, Angela L. Duckworth, and Adam M. Grant. The Mixed Effects of Online Diversity Training. Proceedings of the National Academy of Sciences 116, 16 (2019), 7778–7783.

Slide 3

Slide 3 text

Research Question: How does fair machine guidance impact human evaluation processes?

Fair machine guidance (FMG): AI guides humans to fair evaluations
1. Estimate the evaluations the user would make when evaluating fairly
2. Guide people to be closer to a fair model

[Diagram: User; Fairness-aware ML (training a model to enable fair outcomes); Fair model]

Slide 4

Slide 4 text

Overview of fair machine guidance

1. Collect evaluations from humans
2. Train a model to simulate human evaluations
   - Standard ML: Unfair Model
   - Fairness-aware ML [3]: Fair Model
3. Provide teaching materials on how to make fair decisions

Human evaluations (example profiles):
• Age: 21, Gender: Male, Race: White, Workclass: Private, Education: Bachelors, #years of education: 10, Marital status: Never-married, Relationship: Unmarried, Occupation: Transport-moving, Working time: 30h/week, Native country: the U.S.
• Age: 47, Gender: Female, Race: Asian, Workclass: Private, Education: Masters, #years of education: 14, Marital status: Never-married, Relationship: Not-in-family, Occupation: Tech-support, Working time: 42h/week, Native country: India
• Age: 31, Gender: Male, Race: Black, Workclass: Private, Education: Bachelors, #years of education: 12, Marital status: Never-married, Relationship: Unmarried, Occupation: High school teacher, Working time: 45h/week, Native country: the U.S.

Teaching materials (as shown to the user):

We will offer advice to help you make fairer judgments. This advice is provided by "Fair AI," which simulates what your judgment would look like if it were fair.

Your judgment tendency. In previous questions, you predicted that 20% of Whites and 19% of non-Whites would have a HIGH INCOME. The closer the two values are, the fairer your decisions are. Be fair in your decisions regarding race. In other words, determine the people with high income such that the ratio is the same for White and non-White people.

Example of an appropriate response. You predicted that the person below would have a LOW INCOME. To be fair, you should have predicted a HIGH INCOME.
Age: 50, Gender: Male, Race: Asian, Workclass: Self-employed, Education: Professional school, #years of education: 15, Marital status: Married, Relationship: Husband, Occupation: Professional specialty, Working time: 50h/week, Native country: Philippines

Your criteria vs. fair criteria.
[Figure: the profile above shown twice, side by side, under "Your criteria" (left) and "Fair criteria" (right); attributes are colored blue for HIGH INCOME and red for LOW INCOME]
The left column of the figure shows your decision criteria, as estimated from your answers using AI. You tend to predict a high income when the information is blue (or when the value of blue information is high). You tend to predict low income when the information is red (or when the value of red information is high).
The right column of the figure shows fair decision criteria, as estimated by Fair AI. Your decision will be fairer if you follow these criteria. To be fair, you should predict a high income when the information is blue (or when the value of blue information is high). To be fair, you should predict a low income when the information is red (or when the value of red information is high).

[3] Agarwal, Alekh, et al. "A Reductions Approach to Fair Classification." International Conference on Machine Learning. PMLR, 2018.
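The "judgment tendency" and "example of an appropriate response" parts of the teaching materials can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not give the exact formula for the unfairness score, so the sketch assumes it is the gap between the groups' HIGH-decision rates, and the function names and the boolean encoding of decisions are hypothetical.

```python
from collections import defaultdict


def positive_rates(decisions, groups):
    """Share of HIGH (True) decisions per group, e.g. White vs. non-White."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_high, n_total]
    for decision, group in zip(decisions, groups):
        counts[group][0] += int(decision)
        counts[group][1] += 1
    return {g: high / total for g, (high, total) in counts.items()}


def unfairness_score(decisions, groups):
    """ASSUMPTION: unfairness = largest gap between group rates."""
    rates = positive_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


def corrective_examples(user_decisions, fair_decisions):
    """Indices where the user's decision differs from the fair model's."""
    return [
        i
        for i, (user, fair) in enumerate(zip(user_decisions, fair_decisions))
        if user != fair
    ]


# Hypothetical toy data: True means "HIGH INCOME".
decisions = [True, False, False, False]
groups = ["White", "White", "non-White", "non-White"]
print(positive_rates(decisions, groups))
print(unfairness_score(decisions, groups))
print(corrective_examples([False, True], [True, True]))
```

With the rates quoted on the slide (20% and 19%), this score would be about 0.01, matching the message that closer values mean fairer decisions.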

Slide 5

Slide 5 text

Teaching materials highlight fair criteria and user criteria

[Figure: the "Your criteria vs. fair criteria" panel of the teaching materials, with callouts]
• User's criteria
• User's evaluation was biased against this attribute
• Focusing on this attribute makes the evaluation fair
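One way to picture the side-by-side comparison is as two sets of per-attribute weights, one estimated from the user's answers and one from the fair model. The sketch below is an assumption, not the authors' method: the slides do not say how the criteria are estimated, so the linear-weight representation, the weight values, and the function names are all hypothetical. Positive weights are shown as blue (pushes toward HIGH INCOME) and negative weights as red (pushes toward LOW INCOME), following the color legend in the teaching materials.

```python
def color(weight):
    """Map a weight to the legend: blue = HIGH INCOME, red = LOW INCOME."""
    if weight > 0:
        return "blue"
    if weight < 0:
        return "red"
    return "neutral"


def compare_criteria(user_weights, fair_weights, top_k=2):
    """Return the attributes where user and fair criteria differ the most.

    ASSUMPTION: criteria are per-attribute weights of two linear models.
    """
    shared = user_weights.keys() & fair_weights.keys()
    ranked = sorted(
        shared,
        key=lambda a: (-abs(user_weights[a] - fair_weights[a]), a),
    )
    return [(a, color(user_weights[a]), color(fair_weights[a])) for a in ranked[:top_k]]


# Hypothetical weights, for illustration only.
user = {"Race": -0.8, "Education": 0.3, "Working time": 0.1}
fair = {"Race": 0.0, "Education": 0.9, "Working time": 0.1}
for attribute, yours, fair_color in compare_criteria(user, fair):
    print(f"{attribute}: your criteria = {yours}, fair criteria = {fair_color}")
```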

Slide 6

Slide 6 text

Two personal assessment tasks

1. Income (racial fairness)
   Q. "Is the person's income high or low?"
   Example profile: Age: 47, Gender: Female, Race: Asian, Workclass: Private, Education: Masters, #years of education: 14, Marital status: Never-married, Relationship: Not-in-family, Occupation: Tech-support, Working time: 42h/week, Native country: India
2. Credit (gender fairness)
   Q. "Is the person's credit risk high or low?"
   Example profile: Age: 31, Gender: Male, Race: Black, Occupation: High school teacher, Housing: Rent, Saving accounts: Moderate, Checking account: Little, Credit amount: $4,300, Duration: 48 months, Purpose: Car

(*) We asked participants to be as fair as possible in their decisions

Slide 7

Slide 7 text

Two experiment conditions

• Baseline: Bias feedback (BF). Shows the unfairness score only; no highlighted criteria.
• Ours: Fair machine guidance (FMG). Shows the unfairness score and the highlighted criteria.

[Figure: example screen for each condition. The FMG example is the teaching material from Slide 4; the BF example shows only part of that material.]

Slide 8

Slide 8 text

Experiments with biased people

Flow: pre-test (N=459), then screening (N=99), then mini-test and treatment (5x), then post-test & surveys

Treatment conditions:
• Bias feedback: Income N=37, Credit N=13
• Fair machine guidance: Income N=39, Credit N=10

• Only participants with a high unfairness score proceed to the treatment phase
• Measured the improvement of unfairness and the impact on evaluations

[Figure: the three-step FMG pipeline from Slide 4 (1. human evaluations, 2. unfair model via standard ML and fair model via fair ML, 3. teaching materials), shown as the treatment]
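The screening step can be sketched as a simple filter on pre-test unfairness scores. The slide does not state the cutoff, so the threshold, participant IDs, and scores below are hypothetical placeholders; only the group sizes are taken from the slide, and the last lines check that they add up to the screened N=99.

```python
def screen(pre_test_scores, threshold):
    """Keep participants whose pre-test unfairness score is at least `threshold`.

    ASSUMPTION: the actual cutoff used in the study is not given on the slide.
    """
    return sorted(pid for pid, score in pre_test_scores.items() if score >= threshold)


# Hypothetical participants and scores, for illustration only.
scores = {"p1": 0.05, "p2": 0.30, "p3": 0.12}
print(screen(scores, threshold=0.10))

# Group sizes as reported on the slide.
condition_sizes = {
    "Bias feedback": {"Income": 37, "Credit": 13},
    "Fair machine guidance": {"Income": 39, "Credit": 10},
}
total_assigned = sum(n for tasks in condition_sizes.values() for n in tasks.values())
print(total_assigned)
```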

Slide 9

Slide 9 text

Overview of our findings

The improvement of fairness
1. Many participants in FMG reduced their unfairness

Motivation to correct bias
2. Fair machine guidance provided opportunities to reconsider fairness

Adjustment of evaluation criteria
3. Fair machine guidance encouraged participants to adjust their evaluation criteria
4. Even those who did not trust and follow AI showed changes in their evaluation

Slide 10

Slide 10 text

Findings 1: FMG reduced people's own unfairness

• Many participants became fairer in both FMG and BF
• But there were differences in the process leading to their evaluations

[Figure legend] Improved: pre unfairness > post unfairness. Worsened: pre unfairness < post unfairness.
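The legend's rule translates directly into a comparison of each participant's pre-test and post-test unfairness. In the minimal sketch below, the "unchanged" label for equal scores is an assumption (the slide defines only the two strict cases), and the score pairs are hypothetical.

```python
from collections import Counter


def outcome(pre_unfairness, post_unfairness):
    """Label one participant by the rule in the figure legend."""
    if pre_unfairness > post_unfairness:
        return "improved"
    if pre_unfairness < post_unfairness:
        return "worsened"
    return "unchanged"  # ASSUMPTION: the slide does not define the tie case


def summarize(pairs):
    """Count outcomes over (pre, post) unfairness pairs."""
    return Counter(outcome(pre, post) for pre, post in pairs)


# Hypothetical (pre, post) scores, for illustration only.
print(summarize([(0.30, 0.10), (0.20, 0.25), (0.15, 0.05)]))
```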

Slide 11

Slide 11 text

Findings 2: FMG motivates people to reconsider fairness

FMG provided participants with more opportunities to reconsider fairness than BF

Q. Did these tasks make you reconsider the fairness of your own decisions and those required by society?

Slide 12

Slide 12 text

Findings 2: FMG motivates people to reconsider fairness

BF did not motivate participants to reconsider fairness due to their confidence in their own sense of fairness
“I am fair because I made a comprehensive decision. The AI guidance appeared to provide superfluous information for decision making” (P45, Credit, BF)

FMG provided a motivation to revise their own fairness
“I did not intend to apportion income by gender; however, I was reminded that this was the basis of my thinking and felt that I had to revise it” (P3, Income, FMG)

Slide 13

Slide 13 text

Findings 3: FMG prompts people to change their criteria

Participants in fair machine guidance changed their criteria

(*) We asked participants to report the attributes they used in their evaluation in both the pre- and post-test

Slide 14

Slide 14 text

Findings 3: FMG prompts people to change their criteria

Showing fair criteria made them realize the value of diverse perspectives
“I felt that it is important to evaluate people from a range of perspectives, rather than based on a single piece of information, such as gender, age, or race” (P21, Income, FMG)

Slide 15

Slide 15 text

Findings 4: Some people rejected AI, yet gained insight

Some participants who distrusted and rejected the AI's guidance still gained new insights
“I was not persuaded by the hints presented (and did not follow them). I felt that (for me) there was a tendency to judge one’s ability to pay based on their occupation.” (P15, Income, FMG)

Slide 16

Slide 16 text

Takeaways

• We investigated how fair AI can guide humans to fair evaluations
• Fair machine guidance encouraged participants to reconsider fairness and to adjust their criteria
• We emphasize the need for AI systems aimed at reducing biases to stimulate critical engagement and self-reflection among users

[Figure: teaching materials example, as on Slide 4]