What Is Model Evaluation?
Model Evaluation is the systematic testing of an AI model's capabilities, limitations, and risks, using benchmarks, structured tests, and adversarial probing.
Source: NIST AI RMF; frontier model safety frameworks
Plain-language explanation
Model evaluation (often shortened to "evals") is how teams find out what a model can and cannot do before relying on it. It covers accuracy on benchmarks, behaviour on edge cases, safety properties, and resistance to misuse. As frontier models have grown more capable, structured evaluation has become central to both developer safety frameworks and deployer due diligence.
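As a rough illustration of the benchmark-accuracy part of an eval, the sketch below scores a model against a small set of prompt and expected-answer pairs. Everything in it is hypothetical: `evaluate`, `toy_model`, and the three test cases are invented for this example and do not come from NIST AI RMF or any specific safety framework. Real evals use much larger test sets and more careful scoring than exact string matching.

```python
from typing import Callable


def evaluate(model: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases where the model's answer matches the expected one."""
    if not cases:
        raise ValueError("at least one test case is required")
    correct = sum(
        1
        for prompt, expected in cases
        # Case-insensitive exact match: a deliberately simple scoring rule.
        if model(prompt).strip().lower() == expected.strip().lower()
    )
    return correct / len(cases)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model: answers from a fixed lookup table."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")


# Two ordinary cases plus one edge case the toy model does not handle.
cases = [
    ("2 + 2", "4"),
    ("capital of France", "paris"),
    ("10 / 0", "undefined"),
]

score = evaluate(toy_model, cases)
print(f"accuracy: {score:.2f}")
```

The same loop shape extends to other kinds of checks, such as safety or misuse tests, by swapping in different cases and a different scoring rule.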