
Training on Testing Machine Learning Models

etalab-ia
November 08, 2022

Training on testing machine learning models, presented by Jean-Marie John-Mathews, co-founder of Giskard.


Transcript

  1. Quality Assurance for AI (Etalab, Programme 10%)

  2. Quality is essential to all industries

    🏦 Finance: Audit
    🍒 Food: Nutri-Score
    🏭 Manufacturing: ISO 9000
    💾 Software: Test-Driven Development
  3. But the AI industry is struggling with incidents

    ⚖ Ethics
    🏦 Business
    🛡 Security
  4. The shift: AI leaders & regulators are pushing for quality

    "In the end, I only trust thorough testing. You can try to explain to me why a jet engine is reliable. But in the end, trust will come from looking at crash statistics." (Yann LeCun, Chief AI Scientist, inventor of convolutional neural networks)

    "Providers of high-risk AI systems shall put a quality management system in place to ensure compliance." (Article 17, European AI Act, proposed in 2021 for approval in 2023)
  5. AI quality is correlated with revenue

    6% of sales revenue: the AI quality penalty risk in the upcoming EU AI regulation
  6. Quality cures growing pains of AI teams

    📔 Prototype ⚙ Production 🚀 Deployment

    - Data Scientists spend months going back and forth with business stakeholders to validate model quality (3-6 months to evaluate an AI prototype)
    - ML Engineers become firefighters after users report bugs, which overwhelms Data Scientists with maintenance (1,342 AI incidents reported in the AI Incident Database)
    - ML Engineers struggle to implement AI model tests that cover all quality criteria and test cases (87% of AI models never make it into production)

    Sources: VentureBeat, AI Incident Database
  7. AI quality brings unique technical challenges

    ML quality ≠ data quality
    ✖ Clean data can still produce a bad model
    ✖ Errors are caused by data that did not exist when the model was created

    ML testing ≠ software testing
    ✖ ML cannot be broken down into unit components
    ✖ Probabilistic
    ✖ Constant updating

    Quality criteria to consider
    ✖ Business: performance, robustness
    ✖ Tech: security, reliability
    ✖ Ethics: fairness, privacy
    ✖ Environment: carbon impact, efficiency
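Because model behavior is probabilistic, an ML test typically asserts an aggregate metric against a threshold rather than an exact output, unlike a classical unit test. A minimal sketch, assuming a hypothetical `model` callable and evaluation data (this is illustrative, not Giskard's API):

```python
# Minimal sketch: an ML quality gate asserts an aggregate metric
# against a threshold instead of an exact output.
# `model`, the data, and the 0.9 default are illustrative placeholders.

def accuracy(model, inputs, labels):
    """Fraction of examples the model predicts correctly."""
    predictions = [model(x) for x in inputs]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def passes_accuracy_gate(model, inputs, labels, threshold=0.9):
    """Probabilistic pass/fail: the test tolerates some errors."""
    return accuracy(model, inputs, labels) >= threshold
```

A gate like this also has to be re-run whenever the model or the data distribution changes, which is the "constant updating" point above.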
  8. Behavioral Testing for NLP (Ribeiro et al., 2020)

  9. "Capabilities": a solution to address NLP testing

    ✓ Vocabulary (important words or word types for the task)
    ✓ Taxonomy (synonyms, antonyms, etc.)
    ✓ Robustness (to typos, irrelevant changes, etc.)
    ✓ NER (appropriately understanding named entities)
    ✓ Fairness
    ✓ Temporal (understanding order of events)
    ✓ Negation
    ✓ Logic (ability to handle symmetry, consistency, and conjunctions)
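As an illustration, the Robustness capability can be phrased as an invariance check: a small typo should not change the prediction. A sketch, where `model` is a hypothetical classifier returning a label and the typo generator is an assumption of ours, not taken from Ribeiro et al.:

```python
# Hypothetical sketch of a Robustness capability check:
# predictions should be invariant to a small synthetic typo.

def add_typo(text):
    """Perturbation: swap two adjacent characters mid-string."""
    if len(text) < 4:
        return text
    i = len(text) // 2
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def typo_invariance_rate(model, texts):
    """Fraction of inputs whose prediction survives the typo."""
    unchanged = sum(model(t) == model(add_typo(t)) for t in texts)
    return unchanged / len(texts)
```

A behavioral test would then assert that this rate stays above some threshold (say 0.95) on a representative sample.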
  10. How to generate capability tests?

    Test types
    1. Minimum Functionality Test: simple test cases designed to target a specific behavior. Ex: when "extraordinary" is in the sentence, sentiment should be positive.
    2. Invariance: perturbations that should not change the output of the model. Ex: changing a location name, introducing typos.
    3. Directional Expectation: perturbations to the input with known expected results. Ex: adding negative phrases and checking that sentiment polarity does not increase.

    Test case generation
    ✓ Templates help to generate edge cases. Ex: "I {NEGATION} {POS_VERB} the {THING}." where {NEGATION} = {didn't, can't say I, ...}, {POS_VERB} = {love, like, ...}, {THING} = {food, flight, service, ...}
    ✓ Smart templates with suggestions
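The template above expands into concrete test cases by taking the cartesian product of its slot fillers. A sketch using exactly the slide's example fillers, where each generated sentence would be expected to carry negative sentiment (a Minimum Functionality Test):

```python
from itertools import product

# Slot fillers taken from the slide's example template.
NEGATION = ["didn't", "can't say I"]
POS_VERB = ["love", "like"]
THING = ["food", "flight", "service"]

def expand_template():
    """One test case per combination of slot fillers: 2 * 2 * 3 = 12."""
    return [
        f"I {neg} {verb} the {thing}."
        for neg, verb, thing in product(NEGATION, POS_VERB, THING)
    ]

cases = expand_template()
```

"Smart templates" then extend this idea by suggesting additional fillers (e.g. from a language model) instead of hand-written lists.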
  11. Interactive session

  12. No ML bias in production. Deliver ML products, better & faster.

    Website | GitHub | Email