Governance Practice

What Is Model Evaluation?

Model Evaluation is the systematic testing of an AI model's capabilities, limitations, and risks, using benchmarks, structured tests, and adversarial probing.

Definition

Model Evaluation — the systematic testing of an AI model's capabilities, limitations, and risks, using benchmarks, structured tests, and adversarial probing.

Model evaluation (often shortened to "evals") is how teams find out what a model can and cannot do before relying on it — covering accuracy on benchmarks, behaviour on edge cases, safety properties, and resistance to misuse. As frontier models have grown more capable, structured evaluation has become central to both developer safety frameworks and deployer due diligence.

Source: NIST AI RMF; frontier model safety frameworks

Plain-language explanation

Primary source: NIST AI RMF; frontier model safety frameworks

See where you stand on AI governance

Take the free 7-question maturity assessment and get a personalised action plan.

Free assessment — 3 minutes →

What Is Model Evaluation?

Plain-language explanation

Related terms