quality

“In the end, I only trust thorough testing. You can try to explain to me why a jet engine is reliable. But in the end, trust will come from looking at crash statistics.”
Yann LeCun, Chief AI Scientist, inventor of Convolutional Neural Networks

“Providers of high-risk AI systems shall put a quality management system in place to ensure compliance.”
Article 17, European AI Act (proposed in 2021 for approval in 2023)
⚙ Production 🚀 Deployment
- Data Scientists spend months going back and forth with business stakeholders to validate model quality: 3-6 months to evaluate an AI prototype
- ML Engineers become firefighters after users report bugs, which overwhelm Data Scientists with maintenance: 1,342 AI incidents reported by the AI Incident DB
- ML Engineers struggle to implement AI model tests that cover all quality criteria and test cases: 87% of AI models never make it into production
Sources: VentureBeat, AI Incident Database
data quality
✖ Clean data and bad model
✖ Errors are caused by data that did not exist when the model was created

ML testing ≠ software testing
✖ ML cannot be broken down into unit components
✖ Probabilistic
✖ Constant updating

Quality Criteria to consider
✖ Business: performance, robustness
✖ Tech: security, reliability
✖ Ethics: fairness, privacy
✖ Environment: carbon impact, efficiency
(important words or word types for the task)
✓ Taxonomy (synonyms, antonyms, etc.)
✓ Robustness (to typos, irrelevant changes, etc.)
✓ NER (appropriately understanding named entities)
✓ Fairness
✓ Temporal (understanding order of events)
✓ Negation
✓ Logic (ability to handle symmetry, consistency, and conjunctions)
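As a minimal sketch of how one capability from the list above, Robustness to typos, could be probed: perturb each sentence with a small typo and record the cases where the prediction changes. The names `add_typo`, `toy_sentiment`, and `robustness_failures` are hypothetical and introduced only for illustration; `toy_sentiment` is a keyword lookup standing in for a real model, not an actual classifier.

```python
import random


def add_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters at a random position (a simple typo)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in model: keyword lookup, not a real classifier."""
    positive = {"love", "great", "extraordinary"}
    words = text.lower().replace(".", "").split()
    return "positive" if any(w in positive for w in words) else "negative"


def robustness_failures(model, sentences, n_perturbations=5, seed=0):
    """Return (original, perturbed) pairs where the prediction changed."""
    rng = random.Random(seed)  # fixed seed so the test run is reproducible
    failures = []
    for sentence in sentences:
        expected = model(sentence)
        for _ in range(n_perturbations):
            perturbed = add_typo(sentence, rng)
            if model(perturbed) != expected:
                failures.append((sentence, perturbed))
    return failures


if __name__ == "__main__":
    sentences = ["I love the food.", "The flight was great.", "The service was slow."]
    for original, perturbed in robustness_failures(toy_sentiment, sentences):
        print(f"prediction changed: {original!r} -> {perturbed!r}")
```

A keyword model like this one is expected to fail whenever the typo lands inside a sentiment word, which is the kind of weakness such a test is meant to surface. The same loop can be pointed at a real model by passing any callable that maps a string to a label.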
- Simple test cases designed to target a specific behavior
- Ex: When “extraordinary” is in the sentence, sentiment should be positive
2. Invariance
- Perturbations that should not change the output of the model
- Ex: Changing location name, typos
3. Directional Expectation
- Perturbations to the input with known expected results
- Ex: Adding negative phrases and checking that sentiment polarity does not increase

Test case generation
✓ Templates help to generate edge cases
- Ex: I {NEGATION} {POS_VERB} the {THING}. where
  - {NEGATION} = {didn’t, can’t say I, ...}
  - {POS_VERB} = {love, like, ...}
  - {THING} = {food, flight, service, ...}
✓ Smart templates with suggestions: