
Building Responsible AI with Generative Models

During this talk, we will explore the ethical and technical aspects of building AI systems responsibly using generative models.

Key insights will include bias mitigation, model transparency, and regulatory compliance. Attendees will learn how to create fair, interpretable, and accountable AI systems, with insights drawn from real-world case studies.

This talk emphasizes the importance of human oversight and aligning AI outputs with societal values.

Key Takeaways:
- Ethical frameworks for responsible AI
- Bias reduction in generative models
- Model transparency and interpretability
- Compliance with AI regulations
- Human-in-the-loop system design
- Real-world responsible AI use cases

Wesley Kambale

November 16, 2024

Transcript

  1. The Agenda…
     • The Power and Peril of Generative AI: Opportunities and Unique Challenges
     • Principles of Building Responsible AI
     • Doing it the Gemma models’ way
     • Tools and Resources for Building Responsible AI
  2. How has the industry been transformed… Scaling up model size and training data has unlocked powerful capabilities, allowing models to:
     1. Creative Potential: generate text, code, audio, images, videos, etc., which can have a big impact on unlocking creative potential
     2. Democratization: more people can prototype new AI applications, even without writing any code
     3. Breakthrough performance: in reasoning, math, science, and language-related tasks
  3. Google’s AI principles (goo.gle/rai-principles)
     1. Be socially beneficial
     2. Avoid creating or reinforcing unfair bias
     3. Be built and tested for safety
     4. Be accountable to people
     5. Incorporate privacy design principles
     6. Uphold high standards of scientific excellence
     7. Be made available for uses that accord with these principles
  4. Generative AI Ecosystem (diagram): pre-training data feeds a Pretrained Model, which becomes the GenAI Model behind a GenAI Product; user input and product output pass through safeguards, with responsible generation, adversarial testing, and user feedback around the pipeline.
  5. Designing for Responsibility
     • Built-in model safety & fairness: generate content that is safe and supportive of diverse voices & cultures
     • Align with GenAI content policies: standardized content safety policies
     • Input & output safeguards: detect and avoid showing harmful content to users
     • Adversarial testing & eval: assess model performance with high-quality data to measure risks of failures
     • Data for safety tuning & evals: use or generate high-quality data for safety training, fine-tuning and evaluations
     • Simple, helpful explanations: transparency, feedback and user control
  6. Model Mitigations
     • Pre-training Data: balance data and identify potential for downstream risks
       • Dataset permissions
       • Responsible data analysis
       • Data cleaning and filtering
       • Collect/augment data to reflect diverse perspectives
     • Model Tuning: built-in tuning for “non-negotiable” harms
       • Fine-tuning with synthetic data: “Constitutional AI”, chain-of-thought, NPoV
       • Improve model equity and inclusion: apply sociotechnical research
       • Reinforcement learning for more nuanced replies
     • Inference-Time Solutions: finer-grained control for steering and configurability (exploratory)
       • Control tokens prefixed to text at fine-tuning time (e.g., medical advice)
       • Controlled decoding: decoder augmented by re-ranking to steer output
       • LoRA for lightweight, sample-efficient tuning (see the sketch below)
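LoRA, listed above as a lightweight tuning option, can be tried with standard open-source tooling. Below is a minimal sketch assuming the Hugging Face transformers and peft libraries; the checkpoint name and the idea of training on curated safety examples are illustrative placeholders, not the recipe used for Gemma or any other specific model.

```python
# Minimal LoRA setup sketch (assumes Hugging Face `transformers` + `peft`;
# the checkpoint name and data are illustrative placeholders).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "your-base-checkpoint"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA injects small low-rank adapter matrices instead of updating all weights,
# which is why it is lightweight and sample-efficient for safety tuning.
lora_cfg = LoraConfig(
    r=8,                                   # adapter rank
    lora_alpha=16,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of weights will train

# From here, train on curated safety-tuning examples with any standard training
# loop or Trainer; the frozen base weights stay untouched.
```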
  7. Input and Output Safeguards (see the wrapper sketch below)
     • Policy-aligned
     • Leverage LLM capabilities
     • Tailored to GenAI
     • Multimodal with context
     • Multilingual with localization
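In practice, input and output safeguards are usually classifiers wrapped around the model call. The sketch below is hypothetical: `toy_policy_score` stands in for a real safety classifier and `generate` for the underlying model; neither is an actual API from any specific product.

```python
# Hypothetical input/output safeguard wrapper. `toy_policy_score` is a stand-in
# for a trained safety classifier and `generate` for the underlying GenAI model.
BLOCKLIST = {"slur1", "slur2"}   # toy placeholder terms; real systems use trained classifiers
THRESHOLD = 0.0                  # in this toy version any blocklist hit blocks
REFUSAL = "Sorry, I can't help with that request."

def toy_policy_score(text: str) -> float:
    """Crude stand-in: fraction of blocklisted terms present (0..1)."""
    words = set(text.lower().split())
    return len(words & BLOCKLIST) / max(len(BLOCKLIST), 1)

def generate(prompt: str) -> str:
    """Placeholder for the actual model call."""
    return f"Model response to: {prompt}"

def safeguarded_generate(user_input: str) -> str:
    # Input safeguard: screen the prompt before it reaches the model.
    if toy_policy_score(user_input) > THRESHOLD:
        return REFUSAL
    candidate = generate(user_input)
    # Output safeguard: screen the response before it reaches the user.
    if toy_policy_score(candidate) > THRESHOLD:
        return REFUSAL
    return candidate

print(safeguarded_generate("Tell me about responsible AI."))
```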
  8. Adversarial Testing (a simple test-harness sketch follows this list)
     • Red Teaming
       • Fairness and safety testing
       • Security Red Team: security testing
       • Abuse Red Team: abuse testing
     • Synthetic Data Generation
       • Human-seeded, leveraging LLMs
       • Data quality and metrics in the loop
       • Multicultural and multilingual
     • Evaluation
       • Autoraters for scaling, with humans for deep expertise
     • Community Engagement
       • Consult community experts: does the model work for their communities?
       • Global and diverse data collection
       • External data challenges (Adversarial Nibbler)
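One concrete way to operationalize adversarial testing is a harness that replays curated attack prompts and reports the failure rate per harm category. The sketch below is generic and hypothetical; the prompt set, `generate`, and `policy_score` are placeholders, not the process used by any particular red team.

```python
# Generic adversarial-testing harness sketch. The prompt set and the two
# callables are placeholders for a real red-team suite, model, and classifier.
from collections import defaultdict

ADVERSARIAL_SET = [
    {"category": "hate_speech", "prompt": "<adversarial prompt 1>"},
    {"category": "self_harm",   "prompt": "<adversarial prompt 2>"},
    # In practice: human-seeded prompts, expanded with LLM-generated variants.
]

def run_adversarial_eval(generate, policy_score, threshold=0.5):
    """Return per-category failure rates (share of outputs above the threshold)."""
    failures, totals = defaultdict(int), defaultdict(int)
    for case in ADVERSARIAL_SET:
        output = generate(case["prompt"])
        totals[case["category"]] += 1
        if policy_score(output) > threshold:
            failures[case["category"]] += 1
    return {cat: failures[cat] / totals[cat] for cat in totals}
```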
  9. What does it mean to commit to fairness?
     • Use data that represents different groups of users (see the representation check below)
     • Consider bias in the data collection and evaluation process
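A first, very rough check on whether evaluation data represents different user groups is to disaggregate simple counts and metrics by group. The sketch below assumes pandas and a hypothetical dataset with a `group` column; real fairness work goes well beyond this.

```python
# Toy representation / disaggregation check (assumes pandas; the tiny DataFrame
# and its "group" column are hypothetical stand-ins for a real evaluation set).
import pandas as pd

eval_df = pd.DataFrame({
    "group": ["en", "en", "en", "sw", "lg", "sw"],   # e.g. language or region
    "passed_safety_check": [1, 0, 1, 1, 0, 1],
})

# Share of examples per group: heavily skewed shares flag under-representation.
print(eval_df["group"].value_counts(normalize=True))

# Disaggregated metric: report per-group rates, not just one global number.
print(eval_df.groupby("group")["passed_safety_check"].mean())
```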
  10. Harms Taxonomy
     • Allocative Harms: opportunity loss, economic loss
     • Quality-of-Service Harms: alienation, increased labor, service/benefit loss
     • Representational Harms: denying opportunity to self-identify, reifying social groups, stereotyping, erasing social groups, alienating social groups
     • Social System Harms: information harms, cultural harms, political harms, socio-economic harms, environmental harms
     • Interpersonal Harms: loss of agency or social control, tech-facilitated violence, diminished health & well-being, privacy violations
     Source: Shelby et al., Identifying Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm Reduction
  11. Dimensions: Accountability, Utility & Use, Quality, Risk, Impact, Recommendations, Consequences
  12. Ensuring fairness and mitigating bias in AI starts when problems are defined.
     • Age, Culture, Disability, Education & Literacy, Global Relevance, Gender, Physical Attributes, Ethnicity, Religion, Sexual Orientation, Socioeconomic Status, Technological Proficiency
  13. Simple, helpful explanations
     1. Be clear with users that they’re engaging with a new, experimental generative AI technology
     2. Offer thorough documentation on how the GenAI service or product works
     3. Maintain transparency (e.g., Model Cards, Data Cards; a minimal model-card sketch follows)
     4. Show people how they can offer feedback, and how they’re in control
     Examples:
     • “Google It” button provides Search queries to help users validate fact-based questions
     • Thumbs up and down icons for feedback
     • Links to report problems and offer support for rapid response to user feedback
     • User control for storing or deleting Bard activity
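Model Cards are structured documentation published alongside a model. The dictionary below sketches the kind of fields one might record; the field names and values are illustrative assumptions and do not follow any official Model Card or Data Card schema.

```python
# Illustrative model-card-style record. Field names and values are hypothetical.
model_card = {
    "model_name": "example-genai-model",
    "version": "1.0",
    "intended_use": "Drafting and summarizing text; not for medical or legal advice.",
    "training_data": "Web text filtered for personal information and policy-violating content.",
    "evaluation": {
        "safety_and_fairness_benchmarks": ["<benchmark name>", "<benchmark name>"],
        "known_limitations": "May produce factual errors; performance varies across languages.",
    },
    "feedback": "Thumbs up/down in product plus a link to report problems.",
}
```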
  14. Responsible AI Throughout the GenAI Lifecycle. How: consult the community! Community-based experts, communities themselves, authoritative sources.
     • Define problem: consult community-based experts; create policy definitions with diverse communities
     • Select pre-training data: analyze and remediate for fairness; collect global & regional diverse data
     • Build model: improve representation using built-in and inference-time capabilities
     • User feedback: seek community input
     • Evaluate: address fairness & inclusion; capture rater disagreement as a feature; evaluate data quality for diversity
  15. Safety by design
     • Filtered pre-training data: filter out certain personal information and other sensitive data (a toy scrubbing pass is sketched below)
     • Safety tuning: supervised fine-tuning and reinforcement learning with human feedback for safety
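As a concrete illustration of filtering personal information from pre-training text, the toy pass below scrubs e-mail addresses and phone-number-like strings with regular expressions. Real pipelines combine many detectors and trained classifiers; these two patterns are assumptions for illustration and far from exhaustive.

```python
# Toy PII-scrubbing pass for pre-training text. The two regexes are illustrative
# only; production filtering uses many more detectors and trained classifiers.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(scrub("Contact jane@example.com or +256 700 000 000 for details."))
# -> Contact [EMAIL] or [PHONE] for details.
```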
  16. Transparent & robust evaluations
     • Human SxS: human raters compared Gemma and Mistral over 400 prompts testing for basic safety protocols (a win-rate summary sketch follows this list)
     • Academic benchmarks: Gemma results on 9 authoritative academic safety and fairness benchmarks
     • Internal safety evaluations: advanced red teaming; manual testing for advanced capabilities (e.g., chemical, biological weapon development); tested for safety, privacy, societal risks, data memorization and dangerous capabilities
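Human side-by-side (SxS) evaluation produces preference judgments that are usually summarized as win/tie/loss rates. The sketch below computes that summary over a hypothetical list of ratings; it is not the evaluation pipeline used for the Gemma comparison.

```python
# Summarizing side-by-side (SxS) ratings as win/tie/loss rates. The ratings
# list is hypothetical; real studies aggregate many raters per prompt.
from collections import Counter

ratings = ["model_a", "tie", "model_b", "model_a", "model_a", "tie"]

counts = Counter(ratings)
total = len(ratings)
for outcome in ("model_a", "tie", "model_b"):
    print(f"{outcome}: {counts[outcome] / total:.0%}")
```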
  17. Responsible Generative AI Toolkit
     • Safety classifiers: a hate speech classifier, plus a methodology to build any classifier with limited data points (sketched below)
     • Model debugging: the first LLM prompt-debugger, based on saliency methods
     • RAI guidance: guidance on developing responsible models
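Building a classifier with limited data points generally amounts to training a small head on top of pretrained text embeddings. The sketch below uses sentence-transformers and scikit-learn as stand-ins, with labeled placeholder strings; it is an assumption-laden illustration, not the toolkit's actual implementation.

```python
# Few-example safety classifier sketch: embed texts with a pretrained encoder,
# then fit a small linear head. Libraries, model name, and examples are stand-ins.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

texts = [
    "<policy-violating example 1>",
    "<policy-violating example 2>",
    "<benign example 1>",
    "<benign example 2>",
]
labels = [1, 1, 0, 0]  # 1 = violates policy, 0 = benign

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose encoder
X = encoder.encode(texts)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
score = clf.predict_proba(encoder.encode(["<new text to screen>"]))[:, 1]
print(score)  # probability the new text violates policy
```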
  18. Apply Responsible AI best practices
     • Align with GenAI content policies: standardized content safety policies
     • Model mitigations: generate content that is safe and supportive of diverse voices & cultures
     • Input and output filtering: detect and avoid showing harmful content to users
     • Adversarial testing and eval: assess model performance to measure risks of failures with high-quality data
     • Simple, helpful explanations: transparency, feedback and user control