What Is Data Leakage?
Data leakage is the unintended exposure of sensitive information through an AI system. It can flow into a model during training (where it may later be reproduced) or out of a system at inference, through prompts, outputs, logs, or stored context.
Source: Machine-learning literature; NIST AI 100-1
Plain-language explanation
Data leakage spans two related risks: confidential or personal data absorbed into a model during training and later emitted, and sensitive data exposed at inference via prompts, embeddings, logs, or an oversized context window. (In a narrower modelling sense, "data leakage" also means test information contaminating the training data, which inflates evaluation scores.) Both risks make data handling, retention, and access control central to AI governance.
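As one illustration of the inference-side risk, sensitive values can be masked before a prompt is written to a log. The sketch below is a minimal example, not a complete control: the two regular expressions, the placeholder strings, and the `redact` and `log_prompt` helpers are all hypothetical and chosen for illustration, and a real deployment would likely need broader, tested detectors.

```python
import re

# Hypothetical patterns for illustration only; they catch simple cases
# and will miss many real-world formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def redact(text: str) -> str:
    """Replace email addresses and card-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = CARD.sub("[CARD]", text)
    return text


def log_prompt(prompt: str, sink: list) -> None:
    """Store only the redacted form of a prompt (sink stands in for a log store)."""
    sink.append(redact(prompt))


if __name__ == "__main__":
    logs: list = []
    log_prompt("Contact jane@example.com, card 4111 1111 1111 1111.", logs)
    print(logs[0])
```

Redacting at the point of logging limits what is retained, but it does not address data already absorbed during training, which calls for controls on the training data itself.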