on specific properties (accuracy, relevance, toxicity, etc.).

Examples: LLM-as-a-judge, reward models, small classifiers

Advantages
✅ This is how models are used
✅ Can handle complex tasks
✅ Provide direct feedback

Limitations
❌ Hidden biases (e.g., length, tone)
❌ Quality validation needed
❌ Costly at scale

Clémentine Fourrier and The Hugging Face Community, "LLM Evaluation Guidebook", 2024.
EQ-Bench by Samuel J. Paech
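
As a rough illustration of the LLM-as-a-judge setup and the length bias listed above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the prompt wording, the 1 to 5 scale, the `Score: <n>` reply format, and the names `PROMPT`, `parse_score`, `evaluate`, and `toy_length_biased_judge`. The toy judge is a stand-in for a real model call and is deliberately biased toward longer answers, so two correct answers receive different scores.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical judge prompt; real prompts are usually more detailed.
PROMPT = (
    "Rate the answer for accuracy on a scale of 1 to 5.\n"
    "Question: {question}\nAnswer: {answer}\n"
    "Reply with 'Score: <n>'."
)

def parse_score(reply: str) -> Optional[int]:
    """Extract a 1-5 score from the judge's reply, or None if absent."""
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None

def evaluate(
    samples: List[Tuple[str, str]],
    judge: Callable[[str], str],
) -> List[Optional[int]]:
    """Score each (question, answer) pair with the given judge callable."""
    scores = []
    for question, answer in samples:
        reply = judge(PROMPT.format(question=question, answer=answer))
        scores.append(parse_score(reply))
    return scores

def toy_length_biased_judge(prompt: str) -> str:
    """Stand-in for a model call: scores grow with answer length only."""
    answer = prompt.split("Answer: ", 1)[1].split("\n", 1)[0]
    return f"Score: {min(5, 1 + len(answer.split()) // 3)}"

# Two correct answers that differ only in verbosity.
samples = [
    ("What is 2 + 2?", "4"),
    ("What is 2 + 2?",
     "The answer is four, which follows from basic arithmetic rules."),
]
scores = evaluate(samples, toy_length_biased_judge)
print(scores)
```

Swapping `toy_length_biased_judge` for a function that calls an actual model keeps the rest of the harness unchanged. Comparing scores on answers that are equally correct but differ in length or tone is one simple way to validate a judge before relying on it.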