What Is Data Governance?
Data governance is the set of policies, processes, and accountabilities that determine how data is managed across its lifecycle. It is a prerequisite for AI governance: you cannot govern AI well without governing the data it uses.
Data governance vs AI governance
Data governance addresses how an organisation manages its data assets across their lifecycle: the policies, standards, and accountability structures for data. AI governance covers AI systems specifically: their development, deployment, oversight, and accountability. The two are distinct but deeply interconnected.
AI systems are fundamentally data systems: their behaviour is shaped by the data they are trained on and the inputs they receive, and their outputs are data in turn. Poor data governance decisions propagate directly into AI governance failures: poor-quality data produces poor-quality AI outputs; biased training data produces biased AI; data used without proper consent creates legal exposure.
Core elements of data governance
AI-specific data governance requirements
Training data provenance: often the least-addressed area. Organisations need to be able to demonstrate where training data came from, what legal basis or consent existed for its use in AI training, and whether the data is representative of the population the AI will serve.
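One way to make these provenance questions checkable is to record them per data source and flag the ones that cannot yet be answered. The sketch below is illustrative only: the `ProvenanceRecord` fields and the `provenance_gaps` helper are hypothetical names chosen for this example, not part of any standard or library, and a real record would likely need more detail (licences, retention periods, contacts).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance entry for one training data source."""
    source_name: str             # where the data came from
    collected_on: date           # when it was obtained
    legal_basis: str             # e.g. "consent" or "contract"; empty if unknown
    ai_training_permitted: bool  # does that basis cover use in AI training?
    population_covered: str      # who the data describes


def provenance_gaps(record: ProvenanceRecord) -> list[str]:
    """Return the provenance questions this record cannot yet answer."""
    gaps = []
    if not record.source_name.strip():
        gaps.append("origin unknown")
    if not record.legal_basis.strip():
        gaps.append("no documented legal basis")
    if not record.ai_training_permitted:
        gaps.append("use in AI training not covered")
    if not record.population_covered.strip():
        gaps.append("population covered not described")
    return gaps


# Example values are invented for illustration.
record = ProvenanceRecord(
    source_name="customer support tickets",
    collected_on=date(2023, 5, 1),
    legal_basis="",
    ai_training_permitted=False,
    population_covered="UK retail customers",
)
print(provenance_gaps(record))
```

A record with an empty gap list does not prove the data is lawful or representative; it only shows that the questions have documented answers that someone can then review.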
Bias monitoring: assessing whether training data is representative or contains systematic biases that will propagate into model outputs. This requires both technical analysis (for example, of demographic distribution) and domain knowledge about which biases affect the specific use case.
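The technical half of this check can be as simple as comparing each group's share of the training data with its share of the population the system will serve. The following is a minimal sketch under stated assumptions: the function name `representation_gaps`, the age bands, the reference shares, and the 5-percentage-point tolerance are all made up for illustration. Which attributes to compare, and what tolerance is acceptable, is the domain-knowledge half and cannot be read off the code.

```python
from collections import Counter


def representation_gaps(training_groups, reference_shares, tolerance=0.05):
    """Compare group shares in training data with reference population shares.

    Returns {group: (training_share, reference_share)} for every group whose
    two shares differ by more than `tolerance`.
    """
    counts = Counter(training_groups)
    total = sum(counts.values())
    gaps = {}
    for group, reference_share in reference_shares.items():
        training_share = counts.get(group, 0) / total if total else 0.0
        if abs(training_share - reference_share) > tolerance:
            gaps[group] = (training_share, reference_share)
    return gaps


# Invented example: one age band per training record.
training_groups = ["18-34"] * 70 + ["35-54"] * 25 + ["55+"] * 5
reference_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

flagged = representation_gaps(training_groups, reference_shares)
for group, (train, ref) in flagged.items():
    print(f"{group}: {train:.0%} of training data vs {ref:.0%} of population")
```

A matching distribution is necessary rather than sufficient: data can be proportionally representative and still encode biased labels or historical decisions.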
Training data version control: knowing which version of training data produced which version of a model. As datasets are updated and models retrained, maintaining this linkage is essential for incident investigation and regulatory compliance.
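A minimal sketch of this linkage, assuming the dataset fits in memory as a list of rows: fingerprint the data with a content hash and record that fingerprint against each model version. `dataset_fingerprint` and `LineageRegistry` are hypothetical names for this example. In practice the same role is usually played by a data versioning tool or model registry rather than hand-written code.

```python
import hashlib
import json


def dataset_fingerprint(rows):
    """Content hash of a dataset: same rows in the same order, same ID."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


class LineageRegistry:
    """Hypothetical in-memory record of which data trained which model."""

    def __init__(self):
        self._by_model = {}

    def record_training(self, model_version, rows):
        """Store the dataset fingerprint used to train `model_version`."""
        fingerprint = dataset_fingerprint(rows)
        self._by_model[model_version] = fingerprint
        return fingerprint

    def dataset_for(self, model_version):
        """Look up the dataset fingerprint behind a model version."""
        return self._by_model[model_version]

    def models_trained_on(self, fingerprint):
        """List every model version trained on a given dataset."""
        return sorted(m for m, f in self._by_model.items() if f == fingerprint)


# Invented example data: the dataset gains a row between two training runs.
registry = LineageRegistry()
v1_rows = [{"id": 1, "label": "approve"}, {"id": 2, "label": "decline"}]
v2_rows = v1_rows + [{"id": 3, "label": "approve"}]
registry.record_training("model-1.0", v1_rows)
registry.record_training("model-1.1", v2_rows)

print(registry.dataset_for("model-1.0") != registry.dataset_for("model-1.1"))
```

During an incident investigation, `models_trained_on` answers the reverse question: if a dataset version turns out to be flawed, which deployed models are affected.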