What Is Interpretability?
Interpretability is the degree to which a human can understand the internal mechanics or cause of an AI model's output.
Source: NIST AI RMF; academic literature
Plain-language explanation
Interpretability is about seeing inside the model: understanding why it behaves the way it does at a mechanical level. It is related to, but distinct from, explainability, which is more often about producing an after-the-fact, human-readable account of a single decision. Complex models such as deep neural networks tend to be powerful but hard to interpret, which is a central tension in AI risk.