What Is a Benchmark?
A benchmark is a standardised dataset or task used to measure and compare the performance of AI models on a defined capability.
Source: Machine-learning practice
Plain-language explanation
Benchmarks make model comparison possible, but they are an imperfect proxy for real-world performance: models can be tuned to score well on a benchmark without being correspondingly better in practice, and benchmark data can leak into training sets. Treating benchmark scores as the whole story is a common governance mistake.
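To make the leakage problem concrete, here is a minimal Python sketch built on simplifying assumptions. It scores a model on a fixed benchmark by exact-match accuracy and flags benchmark prompts that appear verbatim in training text. The names `accuracy`, `contaminated`, `normalise` and `toy_model`, and all of the data, are hypothetical. Exact matching is a deliberate simplification; real contamination audits generally use fuzzier checks such as n-gram overlap.

```python
from typing import Callable, Sequence

# Hypothetical benchmark item: (input prompt, reference answer).
BenchmarkItem = tuple[str, str]


def accuracy(model: Callable[[str], str], items: Sequence[BenchmarkItem]) -> float:
    """Fraction of benchmark items the model answers exactly right (exact-match scoring)."""
    if not items:
        raise ValueError("benchmark is empty")
    correct = sum(1 for prompt, answer in items if model(prompt).strip() == answer)
    return correct / len(items)


def normalise(text: str) -> str:
    # Lower-case and collapse whitespace so trivial formatting differences don't hide overlap.
    return " ".join(text.lower().split())


def contaminated(items: Sequence[BenchmarkItem], training_docs: Sequence[str]) -> list[int]:
    """Indices of items whose prompt appears verbatim (after normalising) in any training document."""
    docs = [normalise(d) for d in training_docs]
    return [i for i, (prompt, _) in enumerate(items) if any(normalise(prompt) in d for d in docs)]


# Illustrative, made-up data: a tiny benchmark, one training document, and a toy "model".
items = [("2 + 2 =", "4"), ("Capital of France?", "Paris"), ("3 * 3 =", "9")]
training = ["... lots of web text ... Capital of France? Paris ..."]


def toy_model(prompt: str) -> str:
    # Stands in for a real model; it only "knows" two answers.
    return {"2 + 2 =": "4", "Capital of France?": "Paris"}.get(prompt, "")


flagged = contaminated(items, training)  # item 1 leaked into training text
clean = [item for i, item in enumerate(items) if i not in set(flagged)]
print("score on all items:", accuracy(toy_model, items))    # 2 of 3 correct
print("score on clean items:", accuracy(toy_model, clean))  # 1 of 2 correct
```

Comparing the score on all items with the score on the unflagged subset is one simple way to see how much a headline number may depend on leaked data.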