
Open Data Science Conference 2023 -- Invited Talk

Generative AI (GAI) systems such as ChatGPT have revolutionised the way we interact with AI systems. These models can provide precise and detailed answers to our information needs, expressed in the form of brief text-based prompts. However, some of the responses generated by GAI systems can contain harmful social biases such as gender or racial biases. Detecting and mitigating such biased responses is an important step towards establishing user trust in GAI. In this talk, I will describe the latest developments in methodologies that can be used to detect social biases in texts generated by GAI systems. First, I will describe methods that can be used to detect social biases expressed not only in English but in other languages as well, with minimal human intervention. This is particularly important when scaling social bias evaluation to many languages. Second, I will describe methods that can be used to mitigate the identified social biases in large-scale language models. Experiments show that although some of the social biases can be identified and mitigated with high accuracy, the existing techniques are not perfect and indirect associations remain in generative NLP models. Finally, I will describe ongoing work in the NLP community to address these shortcomings and develop not only accurate but also trustworthy AI systems for the future.

Danushka Bollegala

June 14, 2023

Transcript

  1. Towards Socially Unbiased Generative Artificial Intelligence

     Professor Danushka Bollegala, University of Liverpool
  2. Is AI Socially Biased? Images created by DALL-E (OpenAI)

     https://www.vice.com/en/article/wxdawn/the-ai-that-draws-what-you-type-is-very-racist-shocking-no-one
  3. Is AI Socially Biased? Images generated by Stable Diffusion (Stability.AI): "Janitor", "Assertive Firefighter"

     https://techpolicy.press/researchers-find-stable-diffusion-amplifies-stereotypes/
  4. The Wisdom of a Few (not the Crowd)

     • 50% of all articles in Wikipedia (at the start) were written by 0.04% (ca. 2,000) of its editors.
     • Only 4% of all active users at Amazon write product reviews.
     • 50% of all websites are in English, whereas only 5% of the world's population are native English speakers (13% if we add non-native speakers).
     • Only 7% of Facebook users produce 50% of the posts.
     • The 0.05% most popular people attract (are followed by) 50% of Twitter users.
     • Zipf's Principle of Least Effort: many people do only a little, while a few people do a lot.
  5. But it is not OK for AI to be biased!

     • Legal argument
       • Title VII of the Civil Rights Act of 1964 in the US prohibits employment discrimination due to race, religion, gender and ethnicity.
       • EU Charter of Fundamental Rights, Title III (Equality), Article 21 Non-discrimination:
         • 1. Any discrimination based on any ground such as sex, race, colour, ethnic or social origin, genetic features, language, religion or belief, political or any other opinion, membership of a national minority, property, birth, disability, age or sexual orientation shall be prohibited.
         • 2. Within the scope of application of the Treaties and without prejudice to any of their specific provisions, any discrimination on grounds of nationality shall be prohibited.
     • Commercial argument
       • Your customers will lose trust in your AI-based service.
     • Moral argument
       • Come on! Why should we let humans be discriminated against by AI?
  6. Can we measure gender bias in LLMs?

     • Example: the LLM assigns likelihood 0.9 to "She is a nurse" but only 0.4 to "He is a nurse".
     • If an LLM assigns a higher likelihood score to one sentence than the other, it is considered to be preferring one gender over the other, and hence gender biased.
     • The likelihood of a sentence is the average of its per-token likelihoods, e.g. She: (0.9+0.8+0.6+0.7)/4 = 0.75 vs. He: (0.8+0.9+0.5+0.6)/4 = 0.70.
     • All Unmasked Likelihood (AUL) score = percentage of sentence pairs where the male version has a higher likelihood than the female version. Kaneko et al. [AAAI 2022]
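To make this concrete, here is a minimal sketch of an AUL-style comparison using the HuggingFace transformers library: score each unmasked sentence by its average per-token (pseudo) log-likelihood under a masked LM, then count how often the male version wins. The model checkpoint and the two toy sentence pairs are illustrative assumptions, and the scoring is a simplification of the exact AUL formulation in Kaneko et al. [AAAI 2022].

```python
# Minimal sketch of an AUL-style bias score: average per-token (pseudo) log-
# likelihood of each unmasked sentence under a masked LM, then count how often
# the male version scores higher. Model name and sentence pairs are illustrative.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "bert-base-uncased"  # assumption: any masked LM checkpoint could be used here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def avg_token_log_likelihood(sentence: str) -> float:
    """Average log-probability the model assigns to each token of the unmasked sentence."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits                      # (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    ids = enc["input_ids"][0]
    token_scores = [log_probs[0, i, tid].item() for i, tid in enumerate(ids)]
    return sum(token_scores[1:-1]) / (len(ids) - 2)       # drop [CLS] and [SEP]

pairs = [  # (male version, female version) -- toy examples only
    ("He is a nurse.", "She is a nurse."),
    ("He is a doctor.", "She is a doctor."),
]
male_preferred = sum(avg_token_log_likelihood(m) > avg_token_log_likelihood(f)
                     for m, f in pairs)
print(f"{100 * male_preferred / len(pairs):.0f}% of pairs prefer the male version "
      "(50% would indicate no preference)")
```

AULA additionally weights each token's likelihood by its attention weight, but the comparison logic is the same: a score close to 50% indicates no systematic gender preference.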
  7. Multi-lingual Bias Evaluation

     • Social biases are not limited to English. However, compared to the bias evaluation datasets annotated for English, the datasets available for other languages are limited.
     • Annotating datasets that elicit social biases for each language is costly and time consuming, and it might even be difficult to recruit annotators for this purpose.
     • We proposed a multi-lingual bias evaluation measure that uses existing parallel translation data. Kaneko et al. [NAACL 2022]
  8. Which bias evaluation measure?

     • Different intrinsic bias evaluation measures have been proposed, each using different evaluation datasets and criteria:
       • CrowS-Pairs [Nangia et al.], StereoSet [Nadeem et al.], AUL/AULA [Kaneko et al.], Template-Based Scores (TBS) [Kurita et al.]
     • How can we know which intrinsic bias evaluation measure is most appropriate? Kaneko et al. [EACL 2023]
  9. Bias controlling in LLMs

     [Figure: average output probabilities for "[MASK] is a/an [Occupation]" produced by the bias-controlled BERT and ALBERT PLMs fine-tuned with different r on the news dataset; panels (a) r = 1.0, (b) r = 0.7, (c) r = 0.5. Example template: "[MASK] doesn't have time for the family due to work obligations."]
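As a sketch of what such template-based probing looks like in practice, the snippet below fills the "[MASK] is a/an [Occupation]" template with a masked LM via the HuggingFace fill-mask pipeline and compares the probabilities of "he" and "she" at the masked position. The model checkpoint and the occupation list are hypothetical placeholders, not the setup used for the figure above.

```python
# Minimal sketch of template-based probing: compare P(he) vs. P(she) at the
# masked position of "[MASK] is a/an [Occupation]". Model and occupations are
# illustrative placeholders.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for occupation in ["nurse", "engineer", "janitor", "firefighter"]:
    article = "an" if occupation[0] in "aeiou" else "a"
    prompt = f"[MASK] is {article} {occupation}."
    # Restrict the fill-mask predictions to the two pronouns of interest.
    results = fill(prompt, targets=["he", "she"])
    scores = {r["token_str"].strip(): r["score"] for r in results}
    print(f"{occupation:12s} P(he)={scores.get('he', 0.0):.3f} "
          f"P(she)={scores.get('she', 0.0):.3f}")
```

Probabilities heavily skewed towards one pronoun across many occupations are the kind of pattern that the bias-controlled fine-tuning (the different values of r above) is intended to adjust.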
  10. Gender-biases in LLMs

     Pearson correlation between the biased-PLM order and each bias score. "News" and "book" denote the corpus used for biasing; HA is the AUC value of the method that uses human annotation.

                     BERT                     ALBERT
     Measure    news   book   HA         news   book   HA
     TBS        0.14   0.09   -          0.25   0.14   -
     SSS        0.22   0.22   0.45       0.31   0.22   0.53
     CPS        0.30   0.27   0.57       0.37   0.22   0.48
     AUL        0.37   0.32   0.68       0.55   0.36   0.56
     AULA       0.42   0.34   0.71       0.60   0.42   0.57

     • For BERT, the proposed method induces the same order among measures (i.e. AULA > AUL > CPS > SSS) as HA does, on both news and book.
     • For ALBERT, only the rankings of SSS and CPS differ between the proposed method and HA.
     • These results show that the proposed method and the existing method that uses human annotations rank the intrinsic gender bias evaluation measures in almost the same order.
  11. Debiasing isn't enough!

     • Intrinsic bias scores: evaluate social biases in LLMs in their own right, independently of any downstream task.
     • Extrinsic bias scores: evaluate social biases in LLMs when they are applied to solve a specific downstream task such as Natural Language Inference (NLI), Semantic Textual Similarity (STS), or predicting occupations from biographies (BiasBios).
     • The correlation between intrinsic and extrinsic bias scores is weak (see the sketch below). Kaneko et al. [COLING 2022]
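A minimal sketch of how that weak correlation can be quantified: compute one intrinsic and one extrinsic bias score per model and measure their Pearson correlation. The model names follow the labels used on the next slide, but the score values are made-up placeholders, not results from Kaneko et al. [COLING 2022].

```python
# Minimal sketch: correlate an intrinsic bias score with an extrinsic one
# across a set of models. All numbers below are hypothetical placeholders.
from scipy.stats import pearsonr

models    = ["bert-bu", "bert-lu", "roberta-b", "roberta-l", "albert-b"]
intrinsic = [0.58, 0.62, 0.55, 0.60, 0.53]   # e.g. AULA-style scores (hypothetical)
extrinsic = [0.12, 0.05, 0.14, 0.08, 0.11]   # e.g. BiasBios gap scores (hypothetical)

r, p_value = pearsonr(intrinsic, extrinsic)
print(f"Pearson r = {r:.2f} (p = {p_value:.2f})")
# A small |r| across many models would mean that ranking models by an intrinsic
# measure tells us little about their downstream (extrinsic) bias.
```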
  12. Intrinsic vs. Extrinsic Bias Scores

     [Figure: differences between the bias scores of original vs. debiased MLMs (bert-bu, bert-lu, bert-bc, bert-lc, roberta-b, roberta-l, albert-b) under three debiasing methods (AT, CDA, DO), shown for panels (a) SSS, (b) CPS, (c) AULA, (d) BiasBios, (e) STS-bias and (f) NLI-bias. Negative values indicate that the debiased MLM has a lower bias than its original (non-debiased) version.]
  13. We are not done… (yet)

     • Debiased LLMs, when fine-tuned for downstream tasks, can sometimes relearn the social biases! [Kaneko et al. COLING 2022]
     • Sometimes when we combine (meta-embed) debiased embeddings, they become biased again! [Kaneko et al. EMNLP 2022]
     • When we debias LLMs, we lose performance on downstream tasks.
     • Evaluating social biases across languages and cultures is hard (and no annotated datasets are available).
     • Methods for automatically evaluating multilingual biases [Kaneko et al. EACL 2023]