Training on Model Testing

Quality Assurance for AI Etalab - Programme 10%

Quality is essential to all industries
🏦 Finance: Audit
🍒 Food: Nutri-score
🏭 Manufacturing: ISO 9000
💾 Software: Test-Driven Development

But the AI industry is struggling with incidents
⚖ Ethics
🏦 Business
🛡 Security

The shift: AI leaders & regulators are pushing for quality

“In the end, I only trust thorough testing. You can try to explain to me why a jet engine is reliable. But in the end, trust will come from looking at crash statistics.”
Yann LeCun, Chief AI Scientist, inventor of Convolutional Neural Networks

“Providers of high-risk AI systems shall put a quality management system in place to ensure compliance.”
Article 17, European AI Act (proposed in 2021 for approval in 2023)

AI quality is correlated with revenue
6% share of sales revenue: the AI quality penalty risk in the upcoming AI regulation

Quality cures growing pains of AI teams
📔 Prototype | ⚙ Production | 🚀 Deployment
- Data Scientists spend months going back and forth with business stakeholders to validate model quality: 3-6 months to evaluate an AI prototype
- ML Engineers become firefighters after users report bugs, which overwhelm Data Scientists with maintenance: 1,342 AI incidents reported by the AI Incident DB
- ML Engineers struggle to implement AI model tests that cover all quality criteria and test cases: 87% of AI models never make it into production
Sources: VentureBeat, AI Incident Database

AI quality brings unique technical challenges

ML quality ≠ data quality
✖ Clean data and bad model
✖ Errors are caused by data that did not exist when the model was created

ML testing ≠ software testing
✖ ML cannot be broken down into unit components
✖ Probabilistic
✖ Constant updating

Quality Criteria to consider
✖ Business: performance, robustness
✖ Tech: security, reliability
✖ Ethics: fairness, privacy
✖ Environment: carbon impact, efficiency

Behavioral Testing for NLP (Ribeiro et al., 2020)

“Capabilities”, a solution to address NLP Testing

Capabilities
✓ Vocabulary (important words or word types for the task)
✓ Taxonomy (synonyms, antonyms, etc.)
✓ Robustness (to typos, irrelevant changes, etc.)
✓ NER (appropriately understanding named entities)
✓ Fairness
✓ Temporal (understanding order of events)
✓ Negation
✓ Logic (ability to handle symmetry, consistency, and conjunctions)
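As a rough illustration of the Robustness capability, the sketch below swaps two adjacent characters to simulate a typo and measures how often a classifier's prediction changes. Everything here is an assumption made for illustration: `add_typo`, `failure_rate`, and the keyword-based `toy_predict` are hypothetical helpers, not part of any library or of the original deck.

```python
import random
from typing import Callable

def add_typo(text: str, rng: random.Random) -> str:
    """Simulate a typo by swapping two adjacent characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def failure_rate(
    predict: Callable[[str], str],
    texts: list[str],
    rng: random.Random,
) -> float:
    """Share of texts whose prediction changes after a single typo."""
    changed = sum(predict(t) != predict(add_typo(t, rng)) for t in texts)
    return changed / len(texts)

def toy_predict(text: str) -> str:
    """Hypothetical keyword classifier standing in for a real model."""
    return "positive" if "great" in text.lower() else "negative"

texts = ["The flight was great.", "Great food!", "The service was slow."]
rate = failure_rate(toy_predict, texts, random.Random(0))
print(f"Robustness failure rate: {rate:.0%}")
```

A keyword model like this one is brittle by construction, so a typo that lands inside the keyword flips the prediction. A real test suite would run many perturbations per sentence rather than one.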

How to generate capabilities tests?

Test types
1. Minimum Functionality
- Simple test cases designed to target a specific behavior
- Ex: When “extraordinary” is in the sentence, sentiment should be positive
2. Invariance
- Perturbations that should not change the output of the model
- Ex: Changing location name, typos
3. Directional Expectation
- Perturbations to the input with known expected results
- Ex: adding negative phrases and checking that sentiment polarity does not increase

Test cases generation
✓ Templates help to generate edge cases
- Ex: I {NEGATION} {POS_VERB} the {THING}. where
- {NEGATION} = {didn’t, can’t say I, ...}
- {POS_VERB} = {love, like, ...}
- {THING} = {food, flight, service, ...}
✓ Smart templates with suggestions
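The three test types and the template example can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the tooling used in the session: `toy_sentiment` is a hypothetical lexicon-based scorer standing in for a real model, `expand` is a hypothetical helper, and the word lists and the Invariance and Directional Expectation sentences are invented for the example.

```python
import re
from itertools import product

# Hypothetical toy model, standing in for a real sentiment classifier.
POSITIVE = {"love", "like", "extraordinary"}
NEGATIVE = {"awful", "terrible"}
NEGATIONS = ("didn't", "can't say i")

def toy_sentiment(text: str) -> int:
    """Return a crude polarity score: > 0 positive, < 0 negative."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z']+", lowered)
    positive = sum(token in POSITIVE for token in tokens)
    if any(negation in lowered for negation in NEGATIONS):
        positive = -positive  # a negation flips the positive words
    return positive - sum(token in NEGATIVE for token in tokens)

def expand(template: str, **slots: list[str]) -> list[str]:
    """Fill a template with every combination of slot values."""
    names = list(slots)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*slots.values())
    ]

cases = expand(
    "I {NEGATION} {POS_VERB} the {THING}.",
    NEGATION=["didn't", "can't say I"],
    POS_VERB=["love", "like"],
    THING=["food", "flight", "service"],
)

# 1. Minimum Functionality: a negated positive verb should read as negative.
mft_failures = [case for case in cases if toy_sentiment(case) >= 0]

# 2. Invariance: swapping the location name should not change the score.
inv_pairs = [("I love the food in Paris.", "I love the food in Lyon.")]
inv_failures = [
    (a, b) for a, b in inv_pairs if toy_sentiment(a) != toy_sentiment(b)
]

# 3. Directional Expectation: a negative phrase must not raise the score.
base = "I like the service."
dir_failures = [
    suffix
    for suffix in (" It was awful.", " The staff were terrible.")
    if toy_sentiment(base + suffix) > toy_sentiment(base)
]

print(f"{len(cases)} generated cases")
print(f"MFT failures: {len(mft_failures)}")
print(f"INV failures: {len(inv_failures)}")
print(f"DIR failures: {len(dir_failures)}")
```

Each test type reduces to a list of failing cases, so a failure rate per capability is simply the length of that list divided by the number of cases generated.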

Interactive session

No ML bias in production
Deliver ML products, better & faster
Website | GitHub | Email

etalab-ia
