Slide 1

Slide 1 text

Building Responsible AI with Generative Models
Kigali 2024
Wesley Kambale
kambale.dev

Slide 2

Slide 2 text

The Agenda…
● The Power and Peril of Generative AI: Opportunities and Unique Challenges
● Principles of Building Responsible AI
● Doing it the Gemma models’ way
● Tools and Resources for Building Responsible AI

Slide 3

Slide 3 text

whoami@bio ~ %

Slide 4

Slide 4 text

The Power and Peril of Generative AI Kigali 2024

Slide 5

Slide 5 text

What is GenAI?
[Image: Asking Gemini about DevFest Kigali]
[Image: AI-generated image of attendees at DevFest Kigali]

Slide 6

Slide 6 text

How has the industry been transformed…
Scaling up model size and training data has unlocked powerful capabilities, allowing models to:
1. Creative Potential: Generate text, code, audio, images, videos, etc., which can have a big impact on unlocking creative potential
2. Democratization: More people can prototype new AI applications, even without writing any code
3. Breakthrough performance in reasoning, math, science, and language-related tasks

Slide 7

Slide 7 text

However, applications using these models can also exhibit harmful behaviors from all the data they are trained on…

Slide 8

Slide 8 text

Principles for building Responsible AI

Slide 9

Slide 9 text

Google’s AI principles
1. Be socially beneficial
2. Avoid creating or reinforcing unfair bias
3. Be built and tested for safety
4. Be accountable to people
5. Incorporate privacy design principles
6. Uphold high standards of scientific excellence
7. Be made available for uses that accord with these principles
goo.gle/rai-principles

Slide 10

Slide 10 text

Generative AI Ecosystem
[Diagram: pre-training data → Pretrained Model → GenAI Model → GenAI Product, with user input and product output passing through safeguards, plus Responsible Generation, Adversarial Testing, and user feedback applied across the pipeline]

Slide 11

Slide 11 text

Designing for Responsibility
● Built-in model safety & fairness: Generate content that is safe and supportive of diverse voices & cultures
● Align with GenAI content policies: Standardized content safety policies
● Input & output safeguards: Detect and avoid showing harmful content to users
● Adversarial testing & eval: Assess model performance with high-quality data to measure risks of failures
● Data for safety tuning & evals: Use or generate high-quality data for safety training, fine-tuning and evaluations
● Simple, helpful explanations: Transparency, feedback and user control

Slide 12

Slide 12 text

Model Mitigations

Pre-training Data: Balance data and identify potential for downstream risks
● Dataset permissions
● Responsible data analysis
● Data cleaning and filtering
● Collect/augment data to reflect diverse perspectives

Model Tuning: Built-in tuning for “non-negotiable” harms
● Fine-tuning with synthetic data: “Constitutional AI”, chain-of-thought, NPoV
● Improve model equity and inclusion: apply sociotechnical research
● Reinforcement learning for more nuanced replies

Inference-Time Solutions: Finer-grained control for steering and configurability (exploratory)
● Control tokens prefixed to text at fine-tuning time (e.g., medical advice)
● Controlled decoding: decoder augmented by re-ranking to steer output
● LoRA for lightweight, sample-efficient tuning (see the sketch below)

[Ecosystem diagram with Model Mitigations highlighted]
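As an illustration of the tuning ideas above, here is a minimal sketch of attaching LoRA adapters to a pretrained model for lightweight, sample-efficient safety tuning. It assumes the Hugging Face transformers and peft libraries; the model name, target modules, and hyperparameters are illustrative, not the recipe used for any particular model.

```python
# Minimal sketch: attach LoRA adapters for lightweight safety tuning.
# Assumes the Hugging Face `transformers` and `peft` libraries; the model
# name and hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

lora_config = LoraConfig(
    r=8,                                   # low-rank dimension: few trainable parameters
    lora_alpha=16,                         # scaling factor for the adapter updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()         # typically well under 1% of the full model

# From here, fine-tune `model` on curated safety-tuning examples with any
# standard training loop or the `transformers` Trainer.
```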

Slide 13

Slide 13 text

Safeguards: Input and Output Safeguards
● Policy-aligned
● Leverage LLM capabilities
● Tailored to GenAI
● Multimodal with context
● Multilingual with localization

[Ecosystem diagram with Safeguards highlighted]
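A minimal sketch of how input and output safeguards can wrap a model call: check the user's prompt before generation and the model's response before display. The classifier and generation functions here are hypothetical placeholders, not a specific product API.

```python
# Minimal sketch of input/output safeguards around a GenAI model call.
# `classify_prompt_safety`, `classify_response_safety`, and `generate`
# are hypothetical stand-ins for real safety classifiers and a model API.
from dataclasses import dataclass

@dataclass
class SafetyVerdict:
    allowed: bool
    reason: str = ""

def classify_prompt_safety(text: str) -> SafetyVerdict:
    # Replace with a policy-aligned input classifier,
    # ideally multilingual and context-aware.
    return SafetyVerdict(allowed=True)

def classify_response_safety(text: str) -> SafetyVerdict:
    # Replace with an output classifier tuned to your GenAI content policies.
    return SafetyVerdict(allowed=True)

def generate(prompt: str) -> str:
    # Replace with a real model call.
    return "model response"

FALLBACK = "Sorry, I can't help with that request."

def safeguarded_generate(user_input: str) -> str:
    """Check user input before generation and model output before display."""
    if not classify_prompt_safety(user_input).allowed:
        return FALLBACK
    response = generate(user_input)
    if not classify_response_safety(response).allowed:
        return FALLBACK
    return response
```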

Slide 14

Slide 14 text

Adversarial Testing

Red Teaming
● Fairness and safety testing
● Red Team: security testing
● Abuse Red Team: abuse testing

Synthetic Data Generation
● Human-seeded, leverage LLMs
● Data quality and metrics in the loop
● Multicultural and multilingual

Evaluation
● Autoraters for scaling, with humans for deep expertise (see the sketch below)

Community Engagement
● Consult community experts: does the model work for their communities?
● Global and diverse data collection
● External data challenges (Adversarial Nibbler)

[Ecosystem diagram with Adversarial Testing highlighted]
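A minimal sketch of an adversarial-testing loop in the spirit of this slide: run red-team prompts through the model, score responses with an autorater, and escalate a sample of failures to human reviewers. The `generate` and `autorate_harm` functions are hypothetical stand-ins for the model under test and a rater model.

```python
# Minimal sketch of an adversarial-testing loop with an autorater and
# human escalation. Stub functions and prompts are illustrative only.
import random

def generate(prompt: str) -> str:
    return "model response"              # replace with the model under test

def autorate_harm(prompt: str, response: str) -> float:
    return 0.0                           # replace with an LLM- or classifier-based rater; 0 = safe, 1 = harmful

adversarial_prompts = [
    "…",                                 # human-seeded and synthetically expanded red-team prompts
]

failures = []
for prompt in adversarial_prompts:
    response = generate(prompt)
    score = autorate_harm(prompt, response)
    if score > 0.5:                      # policy threshold for "harmful"
        failures.append({"prompt": prompt, "response": response, "score": score})

print(f"failure rate: {len(failures) / max(len(adversarial_prompts), 1):.1%}")
# Route a random sample of failures to human experts for deep review.
human_review_batch = random.sample(failures, k=min(20, len(failures)))
```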

Slide 15

Slide 15 text

What does it mean to commit to fairness?
● Use data that represents different groups of users
● Consider bias in the data collection and evaluation process

Slide 16

Slide 16 text

Harms Taxonomy
● Representational Harms: Denying opportunity to self-identify, reifying social groups, stereotyping, erasing social groups, alienating social groups
● Allocative Harms: Opportunity loss, economic loss
● Quality of Service Harms: Alienation, increased labor, service/benefit loss
● Interpersonal Harms: Loss of agency or social control, tech-facilitated violence, diminished health & well-being, privacy violations
● Social System Harms: Information harms, cultural harms, political harms, socio-economic harms, environmental harms
Source: Shelby et al., Identifying Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm Reduction

Slide 17

Slide 17 text

Dimensions
● Accountability & Recommendations
● Utility & Quality of Use
● Risk & Consequences
● Impact

Slide 18

Slide 18 text

Ensuring fairness and mitigating bias in AI starts when problems are defined.
Age, Culture, Disability, Education & Literacy, Global Relevance, Gender, Physical Attributes, Ethnicity, Religion, Sexual Orientation, Socioeconomic Status, Technological Proficiency

Slide 19

Slide 19 text

Simple, helpful explanations
1. Be clear with users that they’re engaging with a new, experimental generative AI technology
2. Offer thorough documentation on how the GenAI service or product works
3. Maintain transparency (e.g., Model Cards, Data Cards)
4. Show people how they can offer feedback, and how they’re in control
● “Google It” button provides Search queries to help users validate fact-based questions
● Thumbs up and down icons for feedback
● Links to report problems and offer support for rapid response to user feedback
● User control for storing or deleting Bard activity

Slide 20

Slide 20 text

Responsible AI Throughout the GenAI Lifecycle
How: Consult the community! Community-based experts, communities themselves, authoritative sources
● Define problem: Consult community-based experts; create policy definitions with diverse communities
● Select pre-training data: Analyze and remediate for fairness; collect globally and regionally diverse data
● Build model: Improve representation using built-in and inference-time capabilities
● User feedback: Seek community input
● Evaluate: Address fairness & inclusion; capture rater disagreement as a feature; evaluate data quality for diversity

Slide 21

Slide 21 text

Where do we go from here?

Slide 22

Slide 22 text

Gemma: The responsible way to build

Slide 23

Slide 23 text

Build and deploy responsibly

Slide 24

Slide 24 text

Will it be socially beneficial? Could it lead to harm in any way?

Slide 25

Slide 25 text

Gemma’s approach to responsible AI
● Responsible Generative AI Toolkit
● Safety by design
● Transparent and robust evaluations

Slide 26

Slide 26 text

Safety by design
● Filtered pre-training data: Filter out certain personal information and other sensitive data (see the sketch below).
● Safety tuning: Supervised fine-tuning and reinforcement learning with human feedback for safety.
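A minimal sketch of the "filtered pre-training data" idea: redacting obvious personal information from a corpus before training. Real filtering pipelines are far more sophisticated (quality, toxicity, deduplication); the regular expressions below are illustrative only.

```python
# Minimal sketch of pre-training data filtering: redact obvious PII.
# Patterns are illustrative, not a production filter.
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

def filter_corpus(documents):
    """Yield documents with obvious PII redacted."""
    for doc in documents:
        yield redact_pii(doc)

corpus = ["Contact me at jane@example.com or +250 788 000 000."]
print(list(filter_corpus(corpus)))
# -> ['Contact me at [EMAIL] or [PHONE].']
```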

Slide 27

Slide 27 text

Transparent & robust evaluations
● Human SxS: Human raters have compared Gemma and Mistral over 400 prompts testing basic safety protocols (see the sketch below).
● Academic benchmarks: Gemma results on 9 authoritative academic safety and fairness benchmarks.
● Advanced red teaming: Manual testing for advanced capabilities (e.g., chemical, biological weapon development).
● Internal safety evaluations: Tested for safety, privacy, societal risks, data memorization and dangerous capabilities.
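A minimal sketch of aggregating human side-by-side (SxS) safety ratings like the Gemma-vs-Mistral comparison described above. The judgment records and counts are illustrative, not actual evaluation data.

```python
# Minimal sketch of tallying human SxS safety judgments: for each prompt,
# raters see responses from two models and pick which is safer (or a tie).
# The records below are illustrative only.
from collections import Counter

sxs_judgments = ["model_a", "tie", "model_b", "model_a", "model_a"]

tally = Counter(sxs_judgments)
total = len(sxs_judgments)
for choice in ("model_a", "model_b", "tie"):
    print(f"{choice}: {tally[choice] / total:.0%}")
```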

Slide 28

Slide 28 text

The Responsible Generative AI Toolkit

Slide 29

Slide 29 text

Responsible Generative AI Toolkit
● Safety classifiers: A hate speech classifier, plus a methodology to build any classifier with limited data points (see the sketch below).
● Model debugging: The first LLM prompt-debugger, based on saliency methods.
● RAI guidance: Guidance on developing responsible models.
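A minimal sketch of building a small safety classifier from a handful of labeled examples, in the spirit of the toolkit's "classifier with limited data points" idea. This is not the toolkit's own methodology or API; it uses scikit-learn purely to illustrate the interface: label a few examples, train, filter.

```python
# Minimal sketch of a small policy classifier trained on limited data.
# Illustrative only; the toolkit's own approach and data differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A handful of illustrative labeled examples (1 = violates policy, 0 = fine).
texts = [
    "I hate people from that group",          # toy stand-in for hateful content
    "Those people are worthless",
    "What a lovely day at DevFest Kigali",
    "How do I fine-tune Gemma responsibly?",
]
labels = [1, 1, 0, 0]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

def is_policy_violation(text: str, threshold: float = 0.5) -> bool:
    """Return True when the violation probability exceeds the threshold."""
    return classifier.predict_proba([text])[0][1] >= threshold

print(is_policy_violation("Everyone is welcome at this event"))
```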

Slide 30

Slide 30 text

Apply Responsible AI best practices
● Align with GenAI content policies: Standardized content safety policies.
● Model mitigations: Generate content that is safe and supportive of diverse voices & cultures.
● Input and output filtering: Detect and avoid showing harmful content to users.
● Adversarial testing and eval: Assess model performance with high-quality data to measure risks of failures.
● Simple, helpful explanations: Transparency, feedback and user control.

Slide 31

Slide 31 text

Resources and tools
▪ Toolkit: goo.gle/rai-toolkit
▪ Developers blog: goo.gle/rai-blog
▪ Principles: goo.gle/rai-principles
▪ Codelabs: codelabs.developers.google.com

Slide 32

Slide 32 text

Safe AI is a Better AI for Everyone

Slide 33

Slide 33 text

Thank you!
Kigali 2024
Wesley Kambale
@weskambale
kambale.dev