Governance Concept

What Is a Benchmark?

A benchmark is a standardised dataset or task used to measure and compare the performance of AI models on a defined capability.

Definition

Benchmark: a standardised dataset or task used to measure and compare the performance of AI models on a defined capability.

Benchmarks make model comparison possible, but they are an imperfect proxy for real-world performance: models can be tuned to score well on a benchmark without being correspondingly better in practice, and benchmark data can leak into training sets. Treating benchmark scores as the whole story is a common governance mistake.
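To make the leakage risk concrete, the following is a minimal sketch of one way a reviewer might check whether benchmark items also appear in training data, by looking for shared word n-grams. The function names (`ngrams`, `contamination_rate`), the n-gram length of 5, and the sample texts are all assumptions made for illustration; they do not come from any particular benchmark or tool, and real contamination audits typically use more robust text normalisation and matching.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contamination_rate(benchmark_items: list[str], training_docs: list[str], n: int = 5) -> float:
    """Fraction of benchmark items sharing at least one n-gram with the training docs.

    The n-gram length (default 5) is an illustrative assumption, not a standard.
    """
    if not benchmark_items:
        return 0.0
    training_ngrams: set[tuple[str, ...]] = set()
    for doc in training_docs:
        training_ngrams |= ngrams(doc, n)
    # An item is flagged if any of its n-grams also occurs in the training data.
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & training_ngrams)
    return flagged / len(benchmark_items)


# Hypothetical sample data, invented for this sketch.
benchmark_items = [
    "what is the capital of france and when was it founded",
    "explain why the sky appears blue during the day",
]
training_docs = [
    "quiz archive: what is the capital of france and when was it founded? answer: paris",
    "a recipe for sourdough bread with a long fermentation",
]

rate = contamination_rate(benchmark_items, training_docs)
print(f"{rate:.0%} of benchmark items overlap with training data")
```

A non-zero rate does not prove that a model memorised the answers, but it is a signal that the reported benchmark score may overstate real-world performance and deserves a closer look.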

Source: Machine-learning practice

Plain-language explanation

Primary source: Machine-learning practice

Related insights

Assessing AI Capability and Frontier Model Risk: What Enterprise Buyers Actually Need to Evaluate

11 min · Frontier AI

