What Is Interpretability?

Interpretability is the degree to which a human can understand the internal mechanics or cause of an AI model's output.

Definition

Interpretability is about seeing inside the model: understanding, at a mechanistic level, why it behaves the way it does. It is related to but distinct from explainability, which usually means producing an after-the-fact, human-readable account of a single decision. Complex models such as deep neural networks tend to be powerful but hard to interpret, and that trade-off is a central tension in AI risk.
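To make the distinction concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than anything drawn from the NIST AI RMF: the feature names, the weights, the opaque_score stand-in for a black-box model, and the choice of a simple occlusion-style explanation are all assumptions made for the example.

```python
from typing import Callable, Dict

Features = Dict[str, float]

# Hypothetical weights for an illustrative scoring model (assumed values).
WEIGHTS: Features = {"income": 0.5, "debt": -0.8, "tenure": 0.25}
BIAS = 0.1


def linear_score(x: Features) -> float:
    """Interpretable model: the output is a weighted sum, so the
    mechanics can be read directly from WEIGHTS and BIAS."""
    return BIAS + sum(WEIGHTS[name] * x[name] for name in WEIGHTS)


def opaque_score(x: Features) -> float:
    """Stand-in for a black-box model. Assume a reviewer can call it
    but cannot inspect how the result is computed."""
    return max(0.0, x["income"] * x["tenure"] - x["debt"] ** 2)


def occlusion_explanation(
    model: Callable[[Features], float], x: Features, baseline: Features
) -> Features:
    """Post-hoc account of ONE decision: swap each feature for its
    baseline value and record how much the output changes."""
    full = model(x)
    return {name: full - model({**x, name: baseline[name]}) for name in x}


applicant = {"income": 2.0, "debt": 1.0, "tenure": 4.0}
zeros = {name: 0.0 for name in applicant}

# Interpretability: the model's internals are themselves the answer.
print("weights:", WEIGHTS)

# Explainability: an after-the-fact account of a single opaque decision.
print("explanation:", occlusion_explanation(opaque_score, applicant, zeros))
```

Reading WEIGHTS tells a reviewer how the first model treats every input, which is what interpretability asks for. The occlusion output describes only this one applicant's result from the second model and reveals nothing certain about its internals, which is closer to explainability as described above.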

Source: NIST AI RMF; academic literature
