apidays Paris 2024 - Evaluation as a Tool for Regulatory Compliance Scratching the AI Regulation Surface - Carlos Muñoz Ferrandis, Alinia AI

Evaluation as a tool for regulatory compliance: Scratching the AI regulation surface
Carlos Muñoz Ferrandis, co-founder & COO

Increase of AI policy and regulatory initiatives, worldwide

1000+ policy initiatives reported by governments in 70+ jurisdictions in the OECD database (May 2023).
USA 2024: total number of AI-related regulations grew by 56.3%.
AI mentioned in legislative proceedings: 1,247 (2022), 2,175 (2023).
Between 2022 and 2023, AI incidents reported globally increased by 1278%.

Data extracted from OECD AI and AI Index Report 2024

Regulations are coming.

Main challenges...
how to interpret regulation?
how to measure compliance?
how to anticipate and mitigate risk?
how to report compliance?

...Challenge for whom?
Market Authorities

EU AI Act control & risk mitigation tooling

Categories: Prohibited Practices, High Risk AI Systems, General Purpose AI

Tooling                     High Risk AI Systems     General Purpose AI
Evaluation & Red teaming    Art 9, 15, 17            Art 53.1
Guardrails & Monitoring     Art 9, 15, 17            Art 55.1
Documentation               Art 11, 13, Annex IV     Art 53, 55

*Similarities with Digital Operational Resilience Act (arts 9, 10, 25)

Let's scratch the regulatory surface a bit...

EU AI Act
Prohibited AI systems: “behavioral distortion”
High-risk AI systems: “robustness”, “perform consistently for their intended purpose”

Digital Operational Resilience Act
Robustness, resilience, reliability of ICT systems

How do we measure and monitor unclear requirements?

Define criterion “Robustness”
Define metrics/rubrics
Train “Evaluator models”
Run evals, monitor at scale

Close your eyes and pray

Evaluation at scale is a need... and a science.
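
The four steps on this slide (criterion, metric/rubric, evaluator, run) can be sketched in Python as follows. This is a minimal illustration, not the speaker's method or Alinia AI's tooling. It assumes "robustness" is read as output consistency under harmless input perturbations, and it uses a plain deterministic function in place of a trained evaluator model. Every name here (Criterion, perturb, consistency_score, run_evals, toy_system) and the 0.9 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Criterion:
    """Step 1: a named criterion with an assumed pass threshold."""
    name: str
    description: str
    threshold: float  # minimum acceptable mean score (assumed policy choice)


def perturb(text: str) -> List[str]:
    """Surface-level variants that should not change the intended answer."""
    return [text.upper(), "  " + text + "  ", text + "!!"]


def consistency_score(system: Callable[[str], str], prompt: str) -> float:
    """Step 2: rubric. Fraction of variants whose output matches the baseline."""
    baseline = system(prompt)
    variants = perturb(prompt)
    same = sum(1 for v in variants if system(v) == baseline)
    return same / len(variants)


def run_evals(system: Callable[[str], str],
              prompts: List[str],
              criterion: Criterion) -> Dict[str, object]:
    """Step 4: score every prompt and aggregate against the threshold.

    Step 3 (a trained evaluator model) is replaced here by the
    deterministic consistency_score function, purely for illustration.
    """
    scores = [consistency_score(system, p) for p in prompts]
    mean = sum(scores) / len(scores)
    return {
        "criterion": criterion.name,
        "per_prompt": scores,
        "mean_score": mean,
        "passed": mean >= criterion.threshold,
    }


def toy_system(prompt: str) -> str:
    """Stand-in for the system under test: a case-sensitive keyword router."""
    return "refund" if "refund" in prompt else "other"


robustness = Criterion(
    name="robustness",
    description="performs consistently for its intended purpose",
    threshold=0.9,
)
report = run_evals(
    toy_system,
    ["I want a refund", "What are your opening hours"],
    robustness,
)
print(report)
```

The toy system is deliberately case-sensitive, so the upper-cased variant of the first prompt changes its output and lowers the score. In practice the perturbation set, the rubric, and the threshold would each need to be justified against the specific regulatory requirement being interpreted.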

What is currently missing?
Standardizing benchmarks for regulatory compliance
Variety of open benchmarks + not so good datasets
Focus on defining criteria + metrics to measure
Transversal criteria vs industry-specific criteria
“Cards” everywhere
Industry-specific Gen AI evals + guardrails

Safe & controlled deployment of gen AI

Thank you!
