Training on Model Testing

etalab-ia
November 08, 2022

Training on testing machine learning models, led by Jean-Marie John-Mathews, co-founder of Giskard.

Transcript

  1. Quality Assurance for AI
    Etalab - Programme 10%

  2. Quality is essential to all industries
    🏦 Finance
    Audit
    🍒 Food
    Nutri-score
    🏭 Manufacturing
    ISO 9000
    💾 Software
    Test-Driven Development

  3. But the AI industry is struggling with incidents
    ⚖ Ethics 🏦 Business 🛡 Security

  4. The shift: AI leaders & regulators are pushing for quality
    “In the end, I only trust thorough testing. You can try to explain to me why a jet engine is reliable. But in the end, trust will come from looking at crash statistics.”
    Yann LeCun, Chief AI Scientist
    Invented Convolutional Neural Networks

    “Providers of high-risk AI systems shall put a quality management system in place to ensure compliance.”
    Article 17, European AI Act
    Proposed in 2021 for approval in 2023

  5. AI quality is correlated with revenue
    6% share of sales revenue
    AI quality penalty risk in the upcoming EU AI regulation

  6. Quality cures growing pains of AI teams
    📔 Prototype
    Production
    🚀 Deployment
    Data Scientists spend months going back and forth with business stakeholders to validate model quality
    3-6 months to evaluate an AI prototype
    ML Engineers become firefighters after users report bugs which overwhelm Data Scientists with maintenance
    1,342 AI incidents reported by the AI Incident DB
    ML Engineers struggle to implement AI model tests which cover all quality criteria and test cases
    87% of AI models never make it into production
    Sources: VentureBeat, AI Incident Database

  7. AI quality brings unique technical challenges
    ML quality ≠ data quality
    ✖ Clean data and bad model
    ✖ Errors are caused by data that did not exist when the model was created
    ML testing ≠ software testing
    ✖ ML cannot be broken down into unit components
    ✖ Probabilistic
    ✖ Constant updating
    Quality criteria to consider
    ✖ Business: performance, robustness
    ✖ Tech: security, reliability
    ✖ Ethics: fairness, privacy
    ✖ Environment: carbon impact, efficiency

  8. Behavioral Testing for NLP (Ribeiro et al., 2020)

  9. “Capabilities”, a solution to address NLP Testing
    Capabilities
    ✓ Vocabulary (important words or word types for the task)
    ✓ Taxonomy (synonyms, antonyms, etc.)
    ✓ Robustness (to typos, irrelevant changes, etc.)
    ✓ NER (appropriately understanding named entities)
    ✓ Fairness
    ✓ Temporal (understanding order of events)
    ✓ Negation
    ✓ Logic (ability to handle symmetry, consistency, and conjunctions)

  10. How to generate capabilities tests?
    Test types
    1. Minimum Functionality
    - Simple test cases designed to target a specific behavior
    - Ex: when “extraordinary” is in the sentence, sentiment should be positive
    2. Invariance
    - Perturbations that should not change the output of the model
    - Ex: changing a location name, introducing typos
    3. Directional Expectation
    - Perturbations to the input with known expected results
    - Ex: adding negative phrases and checking that sentiment polarity does not increase
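The three test types above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the session: `sentiment_score` is a hypothetical keyword-based stand-in for a real sentiment model, and the helper names `mft_passes`, `invariance_passes`, and `directional_passes` are assumptions made for this example.

```python
# Hypothetical toy model: a keyword lexicon standing in for a real classifier.
POSITIVE = {"extraordinary", "great", "love"}
NEGATIVE = {"terrible", "awful", "hate"}


def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]; higher means more positive."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def mft_passes(text: str, expect_positive: bool = True) -> bool:
    """Minimum Functionality: a simple case with a known expected label."""
    score = sentiment_score(text)
    return score > 0 if expect_positive else score < 0


def invariance_passes(original: str, perturbed: str, tol: float = 1e-9) -> bool:
    """Invariance: the perturbation should leave the score unchanged."""
    return abs(sentiment_score(original) - sentiment_score(perturbed)) <= tol


def directional_passes(original: str, perturbed: str) -> bool:
    """Directional Expectation: adding a negative phrase must not raise the score."""
    return sentiment_score(perturbed) <= sentiment_score(original)


results = {
    "mft": mft_passes("The crew was extraordinary."),
    "invariance": invariance_passes(
        "I love the flight to Paris.", "I love the flight to Lyon."
    ),
    "directional": directional_passes(
        "The food was great.", "The food was great. The service was terrible."
    ),
}
```

With a real model, `sentiment_score` would be replaced by a call to the model's prediction function; the three checks themselves would stay the same.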
    Test case generation
    ✓ Templates help to generate edge cases
    - Ex: I {NEGATION} {POS_VERB} the {THING}.
    where
    - {NEGATION} = {didn’t, can’t say I, ...}
    - {POS_VERB} = {love, like, ...}
    - {THING} = {food, flight, service, ...}
    ✓ Smart templates with suggestions

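Template expansion of the kind shown on this slide can be approximated with the Python standard library. The sketch below is an assumption-laden illustration: `expand_template` is a hypothetical helper, not an API from any testing library, and the filler lists contain only the values visible on the slide (the lists there are truncated with "...").

```python
import re
from itertools import product


def expand_template(template: str, fillers: dict) -> list:
    """Fill every {SLOT} in the template with each combination of filler values."""
    # Collect slot names in order of appearance, dropping repeats.
    slots = list(dict.fromkeys(re.findall(r"\{(\w+)\}", template)))
    combos = product(*(fillers[slot] for slot in slots))
    return [template.format(**dict(zip(slots, combo))) for combo in combos]


template = "I {NEGATION} {POS_VERB} the {THING}."
fillers = {
    "NEGATION": ["didn't", "can't say I"],
    "POS_VERB": ["love", "like"],
    "THING": ["food", "flight", "service"],
}

# One test sentence per combination of filler values.
cases = expand_template(template, fillers)
```

Each generated sentence negates a positive verb, so the whole batch can be fed to a minimum functionality test with a single expected label.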

  11. Interactive session

  12. No ML bias in production
    Deliver ML products, better & faster
    Website | GitHub | Email
