AIRiskAware
AI Governance Glossary
Governance Practice

What Is Model Evaluation?

Model Evaluation is the systematic testing of an AI model's capabilities, limitations, and risks, using benchmarks, structured tests, and adversarial probing.

Definition

Model Evaluationthe systematic testing of an AI model's capabilities, limitations, and risks, using benchmarks, structured tests, and adversarial probing.

Model evaluation (often shortened to "evals") is how teams find out what a model can and cannot do before relying on it — covering accuracy on benchmarks, behaviour on edge cases, safety properties, and resistance to misuse. As frontier models have grown more capable, structured evaluation has become central to both developer safety frameworks and deployer due diligence.

Source: NIST AI RMF; frontier model safety frameworks

Plain-language explanation

Model evaluation (often shortened to "evals") is how teams find out what a model can and cannot do before relying on it — covering accuracy on benchmarks, behaviour on edge cases, safety properties, and resistance to misuse. As frontier models have grown more capable, structured evaluation has become central to both developer safety frameworks and deployer due diligence.

Primary source: NIST AI RMF; frontier model safety frameworks

Related terms

AI Red Teaming AI Safety Model Card Robustness

See where you stand on AI governance

Take the free 7-question maturity assessment and get a personalised action plan.

Free assessment — 3 minutes →