Governance Practice

What Is Data Leakage?

Data leakage is the unintended exposure of sensitive information through an AI system. The exposure can happen in two directions: into a model during training (where the data may later be reproduced), or out of a system at inference time through prompts, outputs, logs, or stored context.

Definition


Data leakage spans two related risks. The first is confidential or personal data that is absorbed into a model during training and later emitted in its outputs. The second is sensitive data exposed at inference through prompts, embeddings, logs, or a context window that holds more data than the task needs. Both risks make data handling, retention, and access control central to AI governance. (In a narrower modelling sense, "data leakage" also means information from the test set contaminating the training data, which inflates evaluation scores.)

Source: Machine-learning literature; NIST AI 100-1
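
As a rough illustration, here is a minimal Python sketch of two basic controls that map onto the two senses described above. The first is pattern-based redaction of text before it is logged, stored, or placed in a prompt. The second is an exact-duplicate check between training and test records. The function names (`redact`, `fingerprint`, `train_test_overlap`) and the regex patterns are hypothetical and chosen for illustration. They do not come from NIST AI 100-1 or from any particular tool. Regex redaction misses much sensitive data, and exact hashing misses near-duplicates, so treat this as a starting point rather than a complete safeguard.

```python
import hashlib
import re

# Hypothetical, illustrative patterns only: real deployments need far broader
# coverage (names, addresses, national IDs, API keys, and so on).
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # 13 to 16 digits, optionally separated by single spaces or dashes.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact(text: str) -> str:
    """Replace each pattern match with a typed placeholder before the text
    is logged, stored, or sent to a model (inference-side leakage)."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def fingerprint(record: str) -> str:
    """Hash a case- and whitespace-normalised record so trivial variants compare equal."""
    normalised = " ".join(record.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def train_test_overlap(train: list[str], test: list[str]) -> list[str]:
    """Return test records that also appear, after normalisation, in the
    training data (train/test contamination in the narrower sense)."""
    train_hashes = {fingerprint(r) for r in train}
    return [r for r in test if fingerprint(r) in train_hashes]
```

For example, `redact("Contact jane.doe@example.com, card 4111 1111 1111 1111.")` returns `"Contact [EMAIL], card [CARD]."`. `train_test_overlap(["The cat sat."], ["the  CAT sat.", "Birds fly"])` returns `["the  CAT sat."]`, which flags a test record that would inflate evaluation scores.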
