
What Is Interpretability?

Interpretability is the degree to which a human can understand the internal mechanics or cause of an AI model's output.

Definition

Interpretability: the degree to which a human can understand the internal mechanics or cause of an AI model's output.

Source: NIST AI RMF; academic literature

Plain-language explanation

Interpretability is about seeing inside the model: understanding why it behaves the way it does at a mechanical level. It is related to but distinct from explainability, which more often means producing an after-the-fact, human-readable account of a single decision. Complex models such as deep neural networks tend to be powerful but hard to interpret, which is a central tension in AI risk.
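
As a rough illustration of what "seeing inside the model" can mean, the sketch below uses a toy linear scoring model whose output can be broken down exactly into what each input contributed. The feature names, weights, and applicant values are hypothetical and chosen only for illustration; they do not come from NIST AI RMF or any real system.

```python
# Hypothetical weights for a toy linear scoring model (illustrative only).
WEIGHTS = {"income_k": 2, "years_employed": 5, "missed_payments": -20}
BIAS = 10


def score(applicant):
    """Return the model's output: the bias plus the weighted sum of inputs."""
    return BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)


def contributions(applicant):
    """Return the exact amount each input added to or removed from the score."""
    return {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}


# A made-up applicant record, used only to exercise the functions above.
applicant = {"income_k": 40, "years_employed": 3, "missed_payments": 2}
total = score(applicant)
parts = contributions(applicant)

print("score:", total)
for name, amount in parts.items():
    print(f"  {name}: {amount:+d}")
```

Because the model is a plain weighted sum, the bias plus the listed contributions always equals the score, so a reviewer can trace any output back to its causes. A deep neural network generally offers no such direct decomposition, which is why it is harder to interpret.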


Related terms

Explainability, AI Transparency, Model Card
