What Is AI Alignment?
AI Alignment is the problem of ensuring an AI system pursues the goals its designers and society actually intend, rather than unintended proxies.
Source: AI safety research literature
Plain-language explanation
Alignment means making sure a capable system does what we mean, not just what we literally asked. Misalignment can be subtle: a model may optimise a metric in ways that defeat the metric's purpose, for example a recommender tuned for clicks that learns to promote misleading headlines. The problem becomes more consequential as systems grow more capable and autonomous. Alignment is a central concern in AI safety research and in the safety frameworks of frontier developers.