What Is Training Data Governance?

Training data governance is the set of policies and controls applied to the data used to train AI models. It covers provenance, quality, representativeness, legality, and documentation.

Definition

Training data governance is foundational to AI governance because models inherit the properties of their data: its biases, gaps, errors, and legal encumbrances. Article 10 of the EU AI Act requires the training, validation, and testing data sets of high-risk AI systems to be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose. Key concerns include data provenance (where the data came from and whether it was lawfully obtained), copyright (whether protected content was used, and under what licence), personal data (whether processing complies with the GDPR), and representativeness (whether the data reflects the population the model will serve).

Source: EU AI Act, Article 10; GDPR
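
The concerns listed above can be turned into simple automated checks on dataset metadata. The following is a minimal sketch in Python, not a compliance tool: the `DatasetRecord` fields, the `governance_issues` and `underrepresented_groups` helpers, and the 5-percentage-point tolerance are all hypothetical names and values chosen for illustration, and neither the EU AI Act nor the GDPR prescribes them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical governance metadata for one training data source."""
    name: str
    source_url: str               # provenance: where the data came from
    license: str                  # copyright: licence identifier, "" if unknown
    contains_personal_data: bool  # personal data: triggers the GDPR check
    lawful_basis: str = ""        # GDPR lawful basis, "" if none recorded


def governance_issues(record):
    """Return a list of missing-documentation issues for one record."""
    issues = []
    if not record.source_url:
        issues.append("provenance: no source recorded")
    if not record.license:
        issues.append("copyright: no license recorded")
    if record.contains_personal_data and not record.lawful_basis:
        issues.append("personal data: no lawful basis recorded")
    return issues


def underrepresented_groups(labels, expected_shares, tolerance=0.05):
    """Return groups whose share of the data falls short of the expected
    share by more than `tolerance` (an assumed, illustrative threshold)."""
    counts = Counter(labels)
    total = len(labels)
    flagged = []
    for group, expected in expected_shares.items():
        actual = counts.get(group, 0) / total if total else 0.0
        if expected - actual > tolerance:
            flagged.append(group)
    return flagged


if __name__ == "__main__":
    record = DatasetRecord(
        name="forum_scrape",
        source_url="",
        license="",
        contains_personal_data=True,
    )
    for issue in governance_issues(record):
        print(issue)

    labels = ["a"] * 9 + ["b"]
    print(underrepresented_groups(labels, {"a": 0.5, "b": 0.5}))
```

Checks like these only confirm that documentation exists and that group proportions look plausible. Whether a source was in fact lawfully obtained, or whether a dataset is representative enough for its intended purpose, still needs human and legal review.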
