Slide 1

Slide 1 text

Towards Socially Unbiased Generative Artificial Intelligence
Professor Danushka Bollegala
University of Liverpool

Slide 2

Slide 2 text

Is AI Socially Biased?
Images created by DALL-E (OpenAI)
https:// article/wxdawn/the-ai-that-draws-what-you-type-is-very-racist-shocking-no-one

Slide 3

Slide 3 text

Is AI Socially Biased?
Images generated by DALL-E (OpenAI)

Slide 4

Slide 4 text

Is AI Socially Biased?
Images generated by Stable Diffusion (Stability.AI): “Janitor”, “Assertive Firefighter”
https:// find-stable-diffusion-amplifies-stereotypes/

Slide 5

Slide 5 text

Is AI Socially Biased?
Tweets by Steven Piantadosi (short Dec 4, 2022

Slide 6

Slide 6 text

Machine Learning 101
[Diagram labels: Model, Raw Data, Algorithm, Human Labels, Training Data, Inference]

Slide 7

Slide 7 text

Types of biases
Baeza-Yates 2018

Slide 8

Slide 8 text

The Wisdom of a Few (rather than the Crowd)
• 50% of all articles in Wikipedia (at the start) were written by 0.04% (ca. 2,000) of its editors.
• Only 4% of all active users at Amazon write product reviews.
• 50% of all websites are in English, whereas only 5% of the world's population are native English speakers (13% if non-native speakers are included).
• Only 7% of Facebook users produce 50% of the posts.
• The 0.05% most popular people attract (are followed by) 50% of Twitter users.
• Zipf’s Principle of Least Effort: many people do only a little, while a few people do a lot.
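Note: the "few do a lot" pattern can be illustrated with a small sketch. It assumes, purely for illustration, that the k-th most active contributor produces output proportional to 1/k^s (a Zipf-like rank distribution). The function name, the exponent, and the population size are hypothetical and are not taken from the slide's sources.

```python
def fraction_producing_half(num_contributors, exponent=1.0):
    """Smallest fraction of contributors (most active first) whose combined
    output reaches at least half of the total, under an ASSUMED Zipf-like
    model where the contributor at rank k produces 1 / k**exponent."""
    if num_contributors < 1:
        raise ValueError("need at least one contributor")
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_contributors + 1)]
    half_total = sum(weights) / 2
    running = 0.0
    for count, weight in enumerate(weights, start=1):
        running += weight
        if running >= half_total:
            return count / num_contributors
    return 1.0  # not reached in practice; kept as a safe fallback


if __name__ == "__main__":
    # Hypothetical community of 10,000 contributors.
    share = fraction_producing_half(10_000)
    print(f"{share:.2%} of contributors produce half of the content")
```

Under this assumed model a very small share of contributors accounts for half of the output, which matches the direction of the figures on the slide. The exact percentages depend on the exponent and the population size.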

Slide 9

Slide 9 text

Gender Bias
Accumulated fraction of women’s biographies in Wikipedia
Baeza-Yates 2018

Slide 10

Slide 10 text

But it is not OK for AI to be biased!
• Legal argument
  • Title VII of the Civil Rights Act of 1964 in the US
    • Prohibits employment discrimination due to race, religion, gender and ethnicity
  • EU Charter of Fundamental Rights, Title III (Equality), Article 21 Non-discrimination
    • 1. Any discrimination based on any ground such as sex, race, colour, ethnic or social origin, genetic features, language, religion or belief, political or any other opinion, membership of a national minority, property, birth, disability, age or sexual orientation shall be prohibited.
    • 2. Within the scope of application of the Treaties and without prejudice to any of their specific provisions, any discrimination on grounds of nationality shall be prohibited.
• Commercial argument
  • Your customers will lose trust in your AI-based service
• Moral argument
  • Come on! Why should we let humans be discriminated against by AI?

Slide 11

Slide 11 text

Can we measure gender bias in LLMs?
• “She is a nurse” / “He is a nurse”: likelihood 0.9 / 0.4
• If an LLM assigns a higher likelihood score to one sentence than to the other, it is considered to prefer one gender over the other, and hence to be gender biased.
• “She/He is a nurse” likelihood:
  • She: (0.9+0.8+0.6+0.7) / 4
  • He: (0.8+0.9+0.5+0.6) / 4
  • Diff: 0.2
• All Unmasked Likelihood (AUL) score = percentage of sentence pairs where the male version has a higher likelihood than the female version
Kaneko et al. [AAAI 2022]
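Note: a minimal sketch of the pairwise comparison described on this slide, assuming the per-token likelihoods have already been obtained from a model. The function names and the second sentence pair are hypothetical. The sentence score is the plain average of likelihoods shown on the slide, and the published measure may differ in detail.

```python
from statistics import mean


def sentence_score(token_likelihoods):
    """Average the per-token likelihoods of one sentence, as on the slide."""
    return mean(token_likelihoods)


def male_preference_percentage(pairs):
    """Percentage of (male, female) sentence pairs in which the male version
    receives the higher average likelihood. A value near 50 would suggest
    no systematic preference under this simplified scoring."""
    if not pairs:
        raise ValueError("need at least one sentence pair")
    male_preferred = sum(
        1 for male, female in pairs
        if sentence_score(male) > sentence_score(female)
    )
    return 100.0 * male_preferred / len(pairs)


# Token likelihoods for "She/He is a nurse", copied from the slide.
she = [0.9, 0.8, 0.6, 0.7]
he = [0.8, 0.9, 0.5, 0.6]

# Second pair is made up purely to give the percentage something to count.
pairs = [
    (he, she),
    ([0.7, 0.6, 0.8], [0.5, 0.6, 0.7]),
]

if __name__ == "__main__":
    print("She:", sentence_score(she))
    print("He:", sentence_score(he))
    print("Male version preferred in", male_preference_percentage(pairs), "% of pairs")
```

For the slide's pair the sums are 3.0 and 2.8 (a difference of 0.2), so the averages come out at about 0.75 for "She" and 0.70 for "He". The female version is therefore preferred in that pair.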

Slide 12

Slide 12 text

Multi-lingual Bias Evaluation
• Social biases are not limited to English. However, compared to bias evaluation datasets annotated for English, datasets available for other languages are limited.
• Annotating datasets eliciting social biases for each language is costly and time consuming, and it might even be difficult to recruit annotators for this purpose.
• We proposed a multi-lingual bias evaluation measure using existing parallel translation data.
Kaneko et al. [NAACL 2022]

Slide 13

Slide 13 text

Which bias evaluation measure?
• Different intrinsic bias evaluation measures have been proposed, which use different evaluation datasets and criteria
  • CrowS-Pairs [Nangia et al.], StereoSet [Nadeem et al.], AUL/AULA [Kaneko et al.], Template-based Scores (TBS) [Kurita et al.]
• How can we know which intrinsic bias evaluation measure is most appropriate?
Kaneko et al. [EACL 2023]

Slide 14

Slide 14 text

Bias controlling in LLMs
Figure 2: Average output probabilities for “[MASK] is a/an [Occupation]” produced by the bias-controlled BERT and ALBERT PLMs fine-tuned with different r on the news dataset. Panels: (a) r = 1.0, (b) r = 0.7, (c) r = 0.5.
Example: [MASK] doesn’t have time for the family due to work obligations.

Slide 15

Slide 15 text

Gender-biases in LLMs

Measure   BERT                    ALBERT
          news    book    HA      news    book    HA
TBS       0.14    0.09    -       0.25    0.14    -
SSS       0.22    0.22    0.45    0.31    0.22    0.53
CPS       0.30    0.27    0.57    0.37    0.22    0.48
AUL       0.37    0.32    0.68    0.55    0.36    0.56
AULA      0.42    0.34    0.71    0.60    0.42    0.57

Table 1: Pearson correlation between the biased PLM order and each bias score. News and book denote the corpus used for biasing. HA is the AUC value of the method using human annotation.

• For BERT, the proposed method induces the same order among measures (i.e. AULA > AUL > CPS > SSS) as HA does, on both news and book.
• For ALBERT, only the rankings of SSS and CPS differ between the proposed method and HA.
• These results show that the proposed method and the existing method that uses human annotations rank the intrinsic gender bias evaluation measures in almost the same order.
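Note: the kind of correlation reported in Table 1 can be sketched as follows, assuming a set of models with a known bias ordering and one bias score per model from each measure. All numbers below are invented for illustration and are not results from the paper. The names measure_a and measure_b are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Hypothetical setup: five models ordered from least (1) to most (5) biased.
bias_order = [1, 2, 3, 4, 5]

# Made-up scores that two evaluation measures assign to those five models.
measure_a = [48.0, 51.0, 55.0, 58.0, 63.0]  # tracks the known order closely
measure_b = [52.0, 49.0, 55.0, 50.0, 54.0]  # tracks it only loosely

if __name__ == "__main__":
    for name, scores in [("measure_a", measure_a), ("measure_b", measure_b)]:
        print(name, round(pearson(bias_order, scores), 2))
```

A measure whose scores rise steadily with the known bias order gets a correlation close to 1, so in this toy setup measure_a would be ranked above measure_b.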

Slide 16

Slide 16 text

Debiasing isn’t enough!
• Intrinsic bias scores:
  • Evaluate social biases in LLMs in their own right, independently of any downstream tasks
• Extrinsic bias scores:
  • Evaluate social biases in LLMs when they are applied to solve a specific downstream task such as Natural Language Inference (NLI), Semantic Textual Similarity (STS) or predicting occupations from biographies (BiasBios).
• The correlation between intrinsic and extrinsic bias scores is weak
Kaneko et al. [COLING 2022]

Slide 17

Slide 17 text

Intrinsic vs. Extrinsic Bias Scores
[Figure panels: (a) SSS, (b) CPS, (c) AULA, (d) BiasBios, (e) STS-bias, (f) NLI-bias. Each panel covers bert-bu, bert-lu, bert-bc, bert-lc, roberta-b, roberta-l and albert-b; legend: AT, CDA, DO.]
Figure 1: Differences between the bias scores of original vs. debiased MLMs. Negative values indicate that the debiased MLM has a lower bias than its original (non-debiased) version.

Slide 18

Slide 18 text

We are not done… (yet)
• Debiased LLMs, when fine-tuned for downstream tasks, can sometimes relearn the social biases! [Kaneko et al. COLING 2022]
• Sometimes when we combine (meta-embed) debiased embeddings, they become biased again! [Kaneko et al. EMNLP 2022]
• When we debias LLMs, we lose performance on downstream tasks
• Evaluating social biases across languages and cultures is hard (and no annotated datasets are available)
  • Methods for automatically evaluating multilingual biases [Kaneko et al. EACL 2023]

Slide 19

Slide 19 text

AI learns from its mistakes (unlike humans)

Slide 20

Slide 20 text

Questions
Danushka Bollegala
https:// [email protected] @Bollegala
Thank You