
Building Responsible AI with Generative Models

During this talk, we will explore the ethical and technical aspects of building AI systems responsibly using generative models.

Key insights will include bias mitigation, model transparency, and regulatory compliance. Attendees will learn how to create fair, interpretable, and accountable AI systems, with insights drawn from real-world case studies.

This talk emphasizes the importance of human oversight and aligning AI outputs with societal values.

Key Takeaways:
- Ethical frameworks for responsible AI
- Bias reduction in generative models
- Model transparency and interpretability
- Compliance with AI regulations
- Human-in-the-loop system design
- Real-world responsible AI use cases

Wesley Kambale

November 16, 2024

Transcript

  1. The Agenda…
     • The Power and Peril of Generative AI: Opportunities and Unique Challenges
     • Principles of Building Responsible AI
     • Doing it the Gemma models’ way
     • Tools and Resources for Building Responsible AI
  2. How has the industry been transformed… Scaling up model size and training data has unlocked powerful capabilities, allowing models to:
     1. Creative Potential: generate text, code, audio, images, videos, etc., which can have a big impact on unlocking creative potential
     2. Democratization: more people can prototype new AI applications, even without writing any code
     3. Breakthrough performance: in reasoning, math, science, and language-related tasks
  3. Google’s AI principles (goo.gle/rai-principles)
     1. Be socially beneficial
     2. Avoid creating or reinforcing unfair bias
     3. Be built and tested for safety
     4. Be accountable to people
     5. Incorporate privacy design principles
     6. Uphold high standards of scientific excellence
     7. Be made available for uses that accord with these principles
  4. Generative AI Ecosystem (diagram): pre-training data feeds a Pretrained Model, which becomes the GenAI Model behind a GenAI Product; user input and product output pass through safeguards, with responsible generation, adversarial testing, and user feedback around the pipeline.
  5. Designing for Responsibility
     • Built-in model safety & fairness: generate content that is safe and supportive of diverse voices & cultures
     • Align with GenAI content policies: standardized content safety policies
     • Input & output safeguards: detect and avoid showing harmful content to users
     • Adversarial testing & eval: assess model performance with high-quality data to measure risks of failures
     • Data for safety tuning & evals: use or generate high-quality data for safety training, fine-tuning and evaluations
     • Simple, helpful explanations: transparency, feedback and user control
  6. Model Mitigations
     • Pre-training Data: balance data and identify potential for downstream risks
       • Dataset permissions
       • Responsible data analysis
       • Data cleaning and filtering
       • Collect/augment data to reflect diverse perspectives
     • Model Tuning: built-in tuning for “non-negotiable” harms
       • Fine-tuning with synthetic data: “Constitutional AI”, chain-of-thought, NPoV
       • Improve model equity and inclusion: apply sociotechnical research
       • Reinforcement learning for more nuanced replies
     • Inference-Time Solutions: finer-grained control for steering and configurability (exploratory)
       • Control tokens prefixed to text at fine-tuning time (e.g., medical advice)
       • Controlled decoding: decoder augmented by re-ranking to steer output
       • LoRA for lightweight, sample-efficient tuning (see the sketch below)
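LoRA, listed above as a lightweight tuning option, can be tried with standard open-source tooling. Below is a minimal sketch assuming the Hugging Face transformers and peft libraries; the checkpoint name and the idea of training on curated safety examples are illustrative placeholders, not the recipe used for Gemma or any other specific model.

```python
# Minimal LoRA setup sketch (assumes Hugging Face `transformers` + `peft`;
# the checkpoint name and data are illustrative placeholders).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "your-base-checkpoint"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA injects small low-rank adapter matrices instead of updating all weights,
# which is why it is lightweight and sample-efficient for safety tuning.
lora_cfg = LoraConfig(
    r=8,                                   # adapter rank
    lora_alpha=16,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of weights will train

# From here, train on curated safety-tuning examples with any standard training
# loop or Trainer; the frozen base weights stay untouched.
```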
  7. Input and Output Safeguards (see the wrapper sketch below)
     • Policy-aligned
     • Leverage LLM capabilities
     • Tailored to GenAI
     • Multimodal with context
     • Multilingual with localization
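In practice, input and output safeguards are usually classifiers wrapped around the model call. The sketch below is hypothetical: `toy_policy_score` stands in for a real safety classifier and `generate` for the underlying model; neither is an actual API from any specific product.

```python
# Hypothetical input/output safeguard wrapper. `toy_policy_score` is a stand-in
# for a trained safety classifier and `generate` for the underlying GenAI model.
BLOCKLIST = {"slur1", "slur2"}   # toy placeholder terms; real systems use trained classifiers
THRESHOLD = 0.0                  # in this toy version any blocklist hit blocks
REFUSAL = "Sorry, I can't help with that request."

def toy_policy_score(text: str) -> float:
    """Crude stand-in: fraction of blocklisted terms present (0..1)."""
    words = set(text.lower().split())
    return len(words & BLOCKLIST) / max(len(BLOCKLIST), 1)

def generate(prompt: str) -> str:
    """Placeholder for the actual model call."""
    return f"Model response to: {prompt}"

def safeguarded_generate(user_input: str) -> str:
    # Input safeguard: screen the prompt before it reaches the model.
    if toy_policy_score(user_input) > THRESHOLD:
        return REFUSAL
    candidate = generate(user_input)
    # Output safeguard: screen the response before it reaches the user.
    if toy_policy_score(candidate) > THRESHOLD:
        return REFUSAL
    return candidate

print(safeguarded_generate("Tell me about responsible AI."))
```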
  8. Adversarial Testing (a simple test-harness sketch follows this list)
     • Red Teaming
       • Fairness and safety testing
       • Security Red Team: security testing
       • Abuse Red Team: abuse testing
     • Synthetic Data Generation
       • Human-seeded, leveraging LLMs
       • Data quality and metrics in the loop
       • Multicultural and multilingual
     • Evaluation
       • Autoraters for scaling, with humans for deep expertise
     • Community Engagement
       • Consult community experts: does the model work for their communities?
       • Global and diverse data collection
       • External data challenges (Adversarial Nibbler)
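One concrete way to operationalize adversarial testing is a harness that replays curated attack prompts and reports the failure rate per harm category. The sketch below is generic and hypothetical; the prompt set, `generate`, and `policy_score` are placeholders, not the process used by any particular red team.

```python
# Generic adversarial-testing harness sketch. The prompt set and the two
# callables are placeholders for a real red-team suite, model, and classifier.
from collections import defaultdict

ADVERSARIAL_SET = [
    {"category": "hate_speech", "prompt": "<adversarial prompt 1>"},
    {"category": "self_harm",   "prompt": "<adversarial prompt 2>"},
    # In practice: human-seeded prompts, expanded with LLM-generated variants.
]

def run_adversarial_eval(generate, policy_score, threshold=0.5):
    """Return per-category failure rates (share of outputs above the threshold)."""
    failures, totals = defaultdict(int), defaultdict(int)
    for case in ADVERSARIAL_SET:
        output = generate(case["prompt"])
        totals[case["category"]] += 1
        if policy_score(output) > threshold:
            failures[case["category"]] += 1
    return {cat: failures[cat] / totals[cat] for cat in totals}
```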
  9. What does it mean to commit to fairness?
     • Use data that represents different groups of users (see the representation check below)
     • Consider bias in the data collection and evaluation process
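A first, very rough check on whether evaluation data represents different user groups is to disaggregate simple counts and metrics by group. The sketch below assumes pandas and a hypothetical dataset with a `group` column; real fairness work goes well beyond this.

```python
# Toy representation / disaggregation check (assumes pandas; the tiny DataFrame
# and its "group" column are hypothetical stand-ins for a real evaluation set).
import pandas as pd

eval_df = pd.DataFrame({
    "group": ["en", "en", "en", "sw", "lg", "sw"],   # e.g. language or region
    "passed_safety_check": [1, 0, 1, 1, 0, 1],
})

# Share of examples per group: heavily skewed shares flag under-representation.
print(eval_df["group"].value_counts(normalize=True))

# Disaggregated metric: report per-group rates, not just one global number.
print(eval_df.groupby("group")["passed_safety_check"].mean())
```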
  10. Harms Taxonomy
     • Allocative Harms: opportunity loss, economic loss
     • Quality-of-Service Harms: alienation, increased labor, service/benefit loss
     • Representational Harms: denying opportunity to self-identify, reifying social groups, stereotyping, erasing social groups, alienating social groups
     • Social System Harms: information harms, cultural harms, political harms, socio-economic harms, environmental harms
     • Interpersonal Harms: loss of agency or social control, tech-facilitated violence, diminished health & well-being, privacy violations
     Source: Shelby et al., Identifying Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm Reduction
  11. Dimensions: Accountability, Utility & Use, Quality, Risk, Impact, Recommendations, Consequences
  12. Ensuring fairness and mitigating bias in AI starts when problems are defined.
     • Age, Culture, Disability, Education & Literacy, Global Relevance, Gender, Physical Attributes, Ethnicity, Religion, Sexual Orientation, Socioeconomic Status, Technological Proficiency
  13. Simple, helpful explanations
     1. Be clear with users that they’re engaging with a new, experimental generative AI technology
     2. Offer thorough documentation on how the GenAI service or product works
     3. Maintain transparency (e.g., Model Cards, Data Cards; a minimal model-card sketch follows)
     4. Show people how they can offer feedback, and how they’re in control
     Examples:
     • “Google It” button provides Search queries to help users validate fact-based questions
     • Thumbs up and down icons for feedback
     • Links to report problems and offer support for rapid response to user feedback
     • User control for storing or deleting Bard activity
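Model Cards are structured documentation published alongside a model. The dictionary below sketches the kind of fields one might record; the field names and values are illustrative assumptions and do not follow any official Model Card or Data Card schema.

```python
# Illustrative model-card-style record. Field names and values are hypothetical.
model_card = {
    "model_name": "example-genai-model",
    "version": "1.0",
    "intended_use": "Drafting and summarizing text; not for medical or legal advice.",
    "training_data": "Web text filtered for personal information and policy-violating content.",
    "evaluation": {
        "safety_and_fairness_benchmarks": ["<benchmark name>", "<benchmark name>"],
        "known_limitations": "May produce factual errors; performance varies across languages.",
    },
    "feedback": "Thumbs up/down in product plus a link to report problems.",
}
```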
  14. Responsible AI Throughout the GenAI Lifecycle. How: consult the community! Community-based experts, communities themselves, authoritative sources.
     • Define problem: consult community-based experts; create policy definitions with diverse communities
     • Select pre-training data: analyze and remediate for fairness; collect global & regional diverse data
     • Build model: improve representation using built-in and inference-time capabilities
     • User feedback: seek community input
     • Evaluate: address fairness & inclusion; capture rater disagreement as a feature; evaluate data quality for diversity
  15. Safety by design
     • Filtered pre-training data: filter out certain personal information and other sensitive data (a toy scrubbing pass is sketched below)
     • Safety tuning: supervised fine-tuning and reinforcement learning with human feedback for safety
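As a concrete illustration of filtering personal information from pre-training text, the toy pass below scrubs e-mail addresses and phone-number-like strings with regular expressions. Real pipelines combine many detectors and trained classifiers; these two patterns are assumptions for illustration and far from exhaustive.

```python
# Toy PII-scrubbing pass for pre-training text. The two regexes are illustrative
# only; production filtering uses many more detectors and trained classifiers.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(scrub("Contact jane@example.com or +256 700 000 000 for details."))
# -> Contact [EMAIL] or [PHONE] for details.
```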
  16. Transparent & robust evaluations
     • Human SxS: human raters compared Gemma and Mistral over 400 prompts testing for basic safety protocols (a win-rate summary sketch follows this list)
     • Academic benchmarks: Gemma results on 9 authoritative academic safety and fairness benchmarks
     • Internal safety evaluations: advanced red teaming; manual testing for advanced capabilities (e.g., chemical, biological weapon development); tested for safety, privacy, societal risks, data memorization and dangerous capabilities
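Human side-by-side (SxS) evaluation produces preference judgments that are usually summarized as win/tie/loss rates. The sketch below computes that summary over a hypothetical list of ratings; it is not the evaluation pipeline used for the Gemma comparison.

```python
# Summarizing side-by-side (SxS) ratings as win/tie/loss rates. The ratings
# list is hypothetical; real studies aggregate many raters per prompt.
from collections import Counter

ratings = ["model_a", "tie", "model_b", "model_a", "model_a", "tie"]

counts = Counter(ratings)
total = len(ratings)
for outcome in ("model_a", "tie", "model_b"):
    print(f"{outcome}: {counts[outcome] / total:.0%}")
```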
  17. Responsible Generative AI Toolkit
     • Safety classifiers: a hate speech classifier, plus a methodology to build any classifier with limited data points (sketched below)
     • Model debugging: the first LLM prompt-debugger, based on saliency methods
     • RAI guidance: guidance on developing responsible models
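Building a classifier with limited data points generally amounts to training a small head on top of pretrained text embeddings. The sketch below uses sentence-transformers and scikit-learn as stand-ins, with labeled placeholder strings; it is an assumption-laden illustration, not the toolkit's actual implementation.

```python
# Few-example safety classifier sketch: embed texts with a pretrained encoder,
# then fit a small linear head. Libraries, model name, and examples are stand-ins.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

texts = [
    "<policy-violating example 1>",
    "<policy-violating example 2>",
    "<benign example 1>",
    "<benign example 2>",
]
labels = [1, 1, 0, 0]  # 1 = violates policy, 0 = benign

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose encoder
X = encoder.encode(texts)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
score = clf.predict_proba(encoder.encode(["<new text to screen>"]))[:, 1]
print(score)  # probability the new text violates policy
```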
  18. Apply Responsible AI best practices
     • Align with GenAI content policies: standardized content safety policies
     • Model mitigations: generate content that is safe and supportive of diverse voices & cultures
     • Input and output filtering: detect and avoid showing harmful content to users
     • Adversarial testing and eval: assess model performance to measure risks of failures with high-quality data
     • Simple, helpful explanations: transparency, feedback and user control