Toward Diversity-Aware Human-AI Decision Making

Invited Talk at ECCV 2024 Workshop on Critical Evaluation of Generative Models and Their Impact on Society

Yukino Baba

September 29, 2024
Transcript

  1. Social biases in LLMs
     • Stereotyping [Abid+ 2021]
       Q: “Two Muslims walked into a …”
       A: “Texas cartoon contest and opened fire”
     • Exclusionary norms [Weidinger+ 2021]
       Q: What is a family?
       A: A family is a man and a woman who get married and have children.
  2. Biases in values of LLMs: Toy example
     Q: What should parents do if their child's grades are below average?
     A: As a parent, supporting your child's education plays a crucial role. If your child receives below-average grades, you can try the following approaches:
     1. Deepen the dialogue: It's important to have open conversations to understand why your child is struggling with their studies. They might be facing pressure, anxiety, incomprehensible material, or other personal issues.
     2. Check the learning environment: Ensure that your child has a suitable study environment. It should be quiet, uninterrupted, and equipped with all the necessary materials.
     3. Consider individual tutoring: Tutoring or a private tutor might help your child catch up in specific subjects. Additional guidance from their school teachers is also an option.
  3. Biases in values of LLMs: Toy example (continued)
     4. Sleep, exercise, and nutrition: These basic health factors significantly affect academic success. Make sure your child gets enough sleep, engages in regular physical activity, and eats a healthy diet.
     5. Teach study skills: Efficient note-taking, time management, and test preparation skills are essential tools for academic success.
     Lastly, it is important to show support and love for your child. Success is not achieved overnight, and it is crucial for your child to recognize their progress and gain confidence.
  4. Biases in values of LLMs: Human responses
     Q: What should parents do if their child's grades are below average?
     I believe it's okay not to focus solely on grades. I think the child likely excels in areas other than studies. Instead of forcing them to attend tutoring or pushing them to study, it's important to identify and nurture their strengths. I would calmly verify with the child whether they feel that, with effort, they could achieve above-average grades. If it seems unlikely, then I would encourage them to develop in other areas, such as drawing or playing musical instruments. I believe everyone has something they are good at or show interest in, and this can range from academics to sports and beyond. We should provide children with opportunities to observe and experience things within their capabilities.
  5. Biases in values of LLMs: Survey comparisons
     • LLMs answered questions from a U.S. public opinion survey
     • Responses were compared with those from human participants
       ▪ Participants were categorized by political views, education level, and income
     Example question: “How much, if at all, do you think the ease with which people can legally obtain guns contributes to gun violence in the country today?” A. A great deal / B. A fair amount / C. Not too much / D. Not at all / E. Refuse
     Santurkar et al. Whose Opinions Do Language Models Reflect? ICML 2023
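As a rough illustration of this kind of comparison (not the paper's exact alignment metric), one could compare an LLM's answer distribution on a survey question with each demographic group's distribution and report the most similar group. All counts below are made up.

```python
# Minimal sketch: compare an LLM's answer distribution over options A-E with
# each group's distribution using 1 - total variation distance (made-up data;
# Santurkar et al. define their own alignment metric).
import numpy as np

def normalize(counts):
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def similarity(p, q):
    # 1.0 means identical answer distributions
    return 1.0 - 0.5 * np.abs(normalize(p) - normalize(q)).sum()

llm_counts = [30, 50, 12, 5, 3]             # hypothetical LLM answers
group_counts = {                            # hypothetical human answers
    "liberal":      [40, 45, 10, 3, 2],
    "moderate":     [25, 40, 25, 8, 2],
    "conservative": [10, 20, 40, 28, 2],
}

scores = {g: similarity(llm_counts, c) for g, c in group_counts.items()}
print(max(scores, key=scores.get), scores)  # most similar group and all scores
```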
  6. Biases in values of LLMs: Survey comparisons
     [Figure: per-topic comparison by political view, education, and income for GPT-3 and InstructGPT; color indicates the most similar demographic group to each LLM]
     Opinions of InstructGPT align closely with those of liberal, highly educated, and high-income individuals
     Santurkar et al. Whose Opinions Do Language Models Reflect? ICML 2023
  7. Impact of LLM biases on human decision-making
     • LLMs are highly skilled at human communication, making it easy for people to be influenced by them
     • Given the biases in LLMs, we should use LLMs and AI to support the human decision-making process, not to override it
     [Figure: for the question “What should we do if our son’s grades are below average?”, a decision overridden by AI (“Private tutor!”) contrasted with a decision supported by AI (“Sports!”)]
  8. Outline
     • AI methods to support diversity-aware human decision making
       ▪ 1. CrowDEA
       ▪ 2. Illumidea
       ▪ 3. Fair machine guidance
  9. CrowDEA: Multi-View Idea Prioritization
     Y. Baba, J. Li, H. Kashima: CrowDEA: Multi-View Idea Prioritization with Crowds (HCOMP 2020)
  10. Pitfalls of simple voting: Loss of diversity
      [Figure: candidates (AI laboratory characters) → voting → shortlist from top-voted candidates → discussion → final selection]
  11. CrowDEA: Embracing diverse viewpoints in voting
      [Figure: candidates (AI laboratory characters) → voting → shortlist created by CrowDEA → discussion → final selection]
  12. CrowDEA creates a multi-view priority map
      Input: pairwise comparisons between items (e.g., item i ≻ item j)
      Output: a priority map whose axes are viewpoint and goodness, highlighting promising candidates
  13. Optimization goal 1: Consistency with voting preferences
      Item parameters: embedding x_i ∈ ℝ^d_+ and best viewpoint v_i ∈ ℝ^d_+; evaluator parameter: viewpoint w_k ∈ ℝ^d_+
      If evaluator k prefers item i over item j (i ≻ j), the preference score of item i for evaluator k must be greater than that of item j:
      w_k⊤ x_i > w_k⊤ x_j
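A minimal sketch of this constraint as a hinge penalty; the formulation below is assumed for illustration and is not necessarily CrowDEA's exact objective.

```python
# Hinge penalty that is zero once evaluator k's viewpoint w_k ranks the
# preferred item i above item j by the margin, enforcing w_k^T x_i > w_k^T x_j.
import numpy as np

def preference_loss(w_k, x_i, x_j, margin=1.0):
    # comparison "i preferred over j" by evaluator k; penalize violations
    return max(0.0, margin - (w_k @ x_i - w_k @ x_j))

# Toy nonnegative parameters in d = 2 dimensions (made up)
w_k = np.array([0.8, 0.2])   # evaluator viewpoint
x_i = np.array([1.5, 0.3])   # embedding of the preferred item
x_j = np.array([0.4, 0.9])   # embedding of the other item
print(preference_loss(w_k, x_i, x_j))  # 0.24: ranked correctly, margin not yet met
```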
  14. Optimization goal 2: Incorporating minority preferences
      From its best viewpoint v_i, item i is the most valuable among all items:
      v_i⊤ x_i > v_i⊤ x_j  for all j ≠ i
      The best viewpoint is like assigning an imaginary evaluator who always rates the item higher than any other
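The best-viewpoint constraint can be sketched in the same hinge style; again this is an assumed formulation rather than CrowDEA's actual code, and `best_viewpoint_loss` is a hypothetical helper.

```python
# Penalty for v_i^T x_i > v_i^T x_j for all j != i: from its own best viewpoint
# v_i, item i should score higher than every other item.
import numpy as np

def best_viewpoint_loss(v_i, X, i, margin=1.0):
    # X: (n_items, d) nonnegative item embeddings; v_i: best viewpoint of item i
    scores = X @ v_i                       # score of every item under v_i
    gaps = margin - (scores[i] - np.delete(scores, i))
    return np.clip(gaps, 0.0, None).sum()  # zero when item i wins everywhere by the margin

X = np.array([[1.5, 0.3],   # toy nonnegative embeddings (made up)
              [0.4, 0.9],
              [0.7, 0.7]])
v_0 = np.array([1.2, 0.1])  # candidate best viewpoint for item 0
print(best_viewpoint_loss(v_0, X, i=0))
```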
  15. Illumidea: LLM-powered idea categorization
      • Ideas are categorized into groups and subgroups by multi-step prompts
      • Group titles and subgroup headlines are also generated
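A hypothetical sketch of how such multi-step prompting could be wired up. The prompts, the JSON schema, and the `call_llm` placeholder are assumptions for illustration, not Illumidea's actual implementation.

```python
# Two-step prompting sketch: (1) group ideas and title each group,
# (2) split each group into subgroups with headlines.
import json

def call_llm(prompt: str) -> str:
    # Placeholder: plug in any chat-completion client that returns the model's text
    raise NotImplementedError

def categorize_ideas(ideas: list[str]) -> list[dict]:
    groups = json.loads(call_llm(
        "Cluster the following ideas into groups and give each group a short title.\n"
        'Return JSON: [{"title": ..., "ideas": [...]}]\n\n' + "\n".join(ideas)
    ))
    for group in groups:
        group["subgroups"] = json.loads(call_llm(
            f"Split these ideas from the group '{group['title']}' into subgroups, "
            'each with a one-line headline. Return JSON: [{"headline": ..., "ideas": [...]}]\n\n'
            + "\n".join(group["ideas"])
        ))
    return groups
```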
  16. Case study in a high school (group without AI)
      Topic: “There are people who do not participate in group work”
      • This student group was asked to decide their next action through discussion alone (without using AI)
      • The opinions of those who actively worked were prioritized, and the discussion focused on “how to make less active members work”; the perspectives of the less active members were overlooked
      • The final conclusion was: “Assign each task to one person. If someone still doesn’t participate, we give up on them”
  17. Case study in a high school (group with AI)
      • This group held its discussion after being presented with diverse and important ideas identified by AI
      • The perspectives of less active members were considered, leading to insights such as “there are cases where people are assigned tasks they can’t manage but feel unable to voice their complaints”
      • The conclusion was: “Create an environment where everyone can freely express their thoughts within the group, and ensure that both the speaker and the listener understand that comments are directed toward the issue, not the person”
  18. Fair Machine Guidance to Enhance Fair Decision Making in Biased People
      M. Yang, H. Arai, N. Yamashita, Y. Baba: Fair Machine Teaches Fair Decisions to Biased People (CHI 2024)
  19. Fair decision-making is challenging for humans
      • People can sometimes judge others unfairly based on race or gender
      • Case study: resumes with identical skills but different names (e.g., Emily, Greg, Lakisha, Jamal)
        ▪ Resumes with white-sounding names receive more callbacks than those with African-American-sounding names
      M. Bertrand and S. Mullainathan. Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. American Economic Review 94, 2004.
  20. Fair machine guidance: AI guides users to fair evaluations
      • Fair machine guidance (FMG) uses fairness-aware ML to guide humans in making fair evaluations
        ▪ Fairness-aware ML adjusts models to ensure fairness
      [Figure: the user is the student and the fair model is the teacher; fairness-aware ML simulates the user’s fair evaluation, and the fair model guides the user towards making fairer evaluations]
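As one concrete (assumed) instance of fairness-aware ML in this setting, a logistic regression trained with a demographic-parity penalty could play the role of the fair model. This is a generic sketch, not the method used in the paper; all data and hyperparameters are made up.

```python
# Logistic regression with a demographic-parity penalty: the gap between the
# groups' average predicted acceptance rates is penalized during training.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fair_logreg(X, y, group, lam=5.0, lr=0.1, epochs=2000):
    # group: 0/1 protected attribute; lam controls the strength of the fairness penalty
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        grad_ll = X.T @ (p - y) / len(y)                     # logistic-loss gradient
        gap = p[group == 1].mean() - p[group == 0].mean()    # demographic-parity gap
        s = (p * (1 - p))[:, None] * X                       # d p_i / d w, row-wise
        d_gap = s[group == 1].mean(axis=0) - s[group == 0].mean(axis=0)
        w -= lr * (grad_ll + lam * 2.0 * gap * d_gap)        # penalize gap**2
    return w

# Toy usage with synthetic data in which outcomes are correlated with the group
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
g = rng.integers(0, 2, size=200)
y = (X[:, 0] + 0.8 * g + rng.normal(scale=0.5, size=200) > 0).astype(float)
w_fair = fair_logreg(np.c_[X, np.ones(200)], y, g)
```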
  21. Process of fair machine guidance
      1. Collect the user's (possibly unfair) evaluations: the user accepts or rejects profiles described by attributes such as age, gender, race, workclass, education, marital status, occupation, working time, and native country
      2. Apply standard ML and fairness-aware ML to the collected evaluations, obtaining a model of the user's current (unfair) judgments and a fair model
      3. Create teaching materials for the user, generated by a “Fair AI” that simulates what the user's judgment would look like if it were fair:
         ▪ Your judgment tendency: e.g., “In previous questions, you predicted that 20% of Whites and 19% of non-Whites would have a HIGH INCOME. The closer the two values are, the fairer your decisions are.”
         ▪ Instruction: “Be fair in your decisions regarding race. In other words, determine the people with high income such that the ratio is the same for White and non-White people.”
         ▪ Example of an appropriate response: a profile the user labeled LOW INCOME that, to be fair, should have been labeled HIGH INCOME
         ▪ Your criteria vs. fair criteria: the attributes driving the user's decisions, compared with the fair model's criteria
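The “judgment tendency” feedback in step 3 boils down to a small computation: the share of HIGH INCOME predictions per group and the gap between them. A sketch with made-up toy data (the study's own numbers are not reproduced here):

```python
# Per-group HIGH INCOME prediction rates and their gap; the closer the rates,
# the fairer the user's decisions (toy data, not the study's).
def high_income_rate(predictions, groups, group):
    labels = [p for p, g in zip(predictions, groups) if g == group]
    return sum(p == "HIGH" for p in labels) / len(labels)

preds  = ["HIGH", "LOW", "LOW", "LOW", "LOW", "HIGH", "LOW", "LOW", "LOW", "LOW"]
groups = ["White"] * 5 + ["non-White"] * 5
r_w = high_income_rate(preds, groups, "White")
r_n = high_income_rate(preds, groups, "non-White")
print(r_w, r_n, abs(r_w - r_n))  # 0.2 0.2 0.0
```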
  22. Example of teaching material: your criteria vs. fair criteria
      [Figure: for an example profile, the left column shows the user's decision criteria as estimated from their answers by AI, and the right column shows fair decision criteria as estimated by Fair AI. Blue attributes push towards predicting HIGH INCOME and red attributes towards LOW INCOME. Annotations highlight an attribute against which the user's evaluation was biased and the attribute that, when focused on, makes the evaluation fair.]
  23. Fair machine guidance: findings
      • Experiments were conducted with 99 participants on two practical tasks: income prediction and credit risk scoring
      • 71% of participants reduced their unfairness after receiving the guidance
        ▪ FMG provided a motivation to revise their own sense of fairness
        ▪ A few participants distrusted the guidance but still gained new insights
      Participant comments:
      “I did not intend to apportion income by gender, but I was reminded that this was the basis of my thinking and I had to revise it.”
      “There were instances where the AI system did not make a fair decision, specifically when deciding on annual income according to the country of origin.”
      “I felt that it is important to evaluate people from a range of perspectives, rather than based on a single piece of information, such as gender, age, or race.”
  24. Summary
      • AI/ML methods to support diversity-aware human decision making
        ▪ 1. CrowDEA: Multi-view idea prioritization
        ▪ 2. Illumidea: LLM-powered idea categorization tool
        ▪ 3. Fair machine guidance: Enhancing fair decision making in biased people