What an AI audit is, and what it is not
An AI audit is a systematic examination of an AI system across multiple dimensions: its training data and data governance practices, its technical performance, its fairness properties, its human oversight mechanisms, and its compliance with applicable regulatory requirements. It produces documented findings and, critically, documented decisions about how those findings will be addressed.
An AI audit is not a penetration test, a code review, or a performance benchmark, though any of these might be components of a broader audit program. It is not a vendor compliance certification. And it is not a one-time exercise: effective AI auditing is a recurring practice, not a checkbox.
The distinction matters because organisations frequently conduct technical reviews that they describe as audits but that miss the governance dimensions that create the most significant organisational and regulatory risk. A model that is technically accurate but produces discriminatory outcomes has passed a technical review and failed an audit.
When an AI audit is required
AI auditing is moving from best practice to legal requirement across multiple jurisdictions.
The EU AI Act requires conformity assessment for high-risk AI systems before deployment: a structured evaluation of compliance with the Act's technical and governance requirements. For most Annex III high-risk systems, this is a self-assessment. For biometric identification systems, third-party assessment by a notified body becomes mandatory where the provider has not fully applied harmonised standards or common specifications. Post-market monitoring, an ongoing audit-type activity, is required for all high-risk systems throughout their operational life.
New York City's Local Law 144 requires employers using automated employment decision tools to have annual bias audits conducted by independent auditors, and to publish a summary of the results. This is the first mandatory AI auditing requirement for private employers in the US, and similar legislation is being considered in other jurisdictions.
Beyond legal requirements, AI auditing is increasingly a due diligence expectation. Enterprise procurement processes, investor due diligence, and insurance underwriting for technology risk now commonly require documented evidence of AI system auditing.
The five domains of an AI audit
Domain 1: Data governance
Audit questions: Where did the training data come from? Does the organisation have appropriate rights to use this data for this purpose? How was the data collected, labelled, and processed? What quality controls were applied? How representative is the training data of the population the system will be applied to? What is the process for handling data quality issues identified post-deployment?
Common findings: Training data sourced from contexts where some demographic groups are underrepresented; labelling processes that encoded human biases into training data; insufficient documentation of data provenance; no process for incorporating performance feedback into data governance.
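The representativeness question above can be made concrete with a simple comparison of group shares. The following is a minimal sketch, not a prescribed method: the function name `representation_gaps`, the group labels, the reference shares, and the 0.8 threshold are all illustrative assumptions that an audit team would replace with its own definitions.

```python
from collections import Counter

def representation_gaps(training_groups, reference_shares, min_ratio=0.8):
    """Flag groups whose share of the training data is less than
    min_ratio times their share of the reference population.

    min_ratio=0.8 is an arbitrary illustrative threshold (assumption).
    """
    counts = Counter(training_groups)
    total = sum(counts.values())
    gaps = {}
    for group, ref_share in reference_shares.items():
        train_share = counts.get(group, 0) / total if total else 0.0
        # Ratio of training share to reference share; 1.0 means parity.
        if ref_share > 0 and train_share / ref_share < min_ratio:
            gaps[group] = round(train_share / ref_share, 3)
    return gaps

# Hypothetical data: group label per training record, and the share of
# each group in the population the system will be applied to.
training = ["a"] * 70 + ["b"] * 25 + ["c"] * 5
reference = {"a": 0.6, "b": 0.25, "c": 0.15}
print(representation_gaps(training, reference))  # {'c': 0.333}
```

A check like this only surfaces a gap. Whether the gap matters for the use case, and how the reference population is defined, remain audit judgments that should be documented.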
Domain 2: Model performance
Audit questions: What performance metrics were defined before development, and are they appropriate for the use case? How was performance measured, and against what test dataset? Has performance been measured in the actual deployment environment, not just in controlled test conditions? What is the process for detecting and responding to performance drift?
Common findings: Performance metrics defined after development to fit observed results; test datasets that do not represent deployment conditions; no monitoring for performance drift; performance measured only at deployment, not on an ongoing basis.
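As one illustration of ongoing drift monitoring, the sketch below compares accuracy in successive monitoring windows against a baseline fixed before deployment. The function name `drift_alerts`, the 0.05 tolerance, and the sample figures are assumptions for illustration. Real programmes typically track several metrics and may use statistical tests rather than a fixed tolerance.

```python
def drift_alerts(baseline_accuracy, window_accuracies, tolerance=0.05):
    """Return the indices of monitoring windows whose accuracy fell
    more than `tolerance` below the pre-deployment baseline.

    tolerance=0.05 is an illustrative value (assumption), not a standard.
    """
    return [
        i
        for i, acc in enumerate(window_accuracies)
        if baseline_accuracy - acc > tolerance
    ]

# Hypothetical monthly accuracy measured in the deployment environment.
monthly = [0.91, 0.90, 0.86, 0.84]
print(drift_alerts(0.92, monthly))  # [2, 3]
```

Fixing the baseline and tolerance before deployment, and recording them, also addresses the finding that metrics are sometimes defined after the fact to fit observed results.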
Domain 3: Fairness and bias
Audit questions: Has the system been tested for disparate impact across relevant protected characteristics? What fairness metrics were used, and are they appropriate for the use case? What disparities were identified, and how were they addressed? Is ongoing bias monitoring in place?
Common findings: No pre-deployment bias testing; bias testing conducted on aggregate data rather than stratified by relevant subgroups; disparities identified but not remediated; no ongoing monitoring after deployment.
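One widely used disparate impact measure is the impact ratio: each group's selection rate divided by the selection rate of the most-selected group. The sketch below computes it per subgroup rather than on aggregate data. The function name `impact_ratios`, the sample records, and the 0.8 flagging threshold (the "four-fifths" rule of thumb) are assumptions for illustration. The appropriate metric and threshold depend on the use case and jurisdiction.

```python
from collections import defaultdict

def impact_ratios(records):
    """records: iterable of (group, selected) pairs, selected being a bool.

    Returns {group: selection_rate / highest_selection_rate}.
    """
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        selected[group] += int(was_selected)
    rates = {g: selected[g] / totals[g] for g in totals}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no selections in any group; ratio is undefined")
    return {g: round(rate / best, 3) for g, rate in rates.items()}

# Hypothetical outcomes: 40 of 100 selected in group x, 24 of 100 in group y.
records = (
    [("x", True)] * 40 + [("x", False)] * 60
    + [("y", True)] * 24 + [("y", False)] * 76
)
ratios = impact_ratios(records)
print(ratios)  # {'x': 1.0, 'y': 0.6}

# Flag groups below the illustrative four-fifths threshold (assumption).
flagged = [g for g, r in ratios.items() if r < 0.8]
print(flagged)  # ['y']
```

A flagged ratio is a prompt for investigation and a documented response, not a finding of discrimination in itself.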
Domain 4: Human oversight
Audit questions: Who is accountable for this system's performance and compliance? What decisions does the system make or influence, and what human review mechanisms exist? Can affected individuals request human review of AI decisions? What triggers escalation from automated to human decision-making? Is there a process for suspending the system if serious risks are detected?
Common findings: No named accountability owner; human oversight mechanisms that exist on paper but are not used in practice; no mechanism for affected individuals to request review; no clear criteria for system suspension.
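Escalation criteria are easier to audit when they are written down as explicit rules rather than left to discretion. The sketch below shows one possible shape for such rules. The `Decision` fields, the function `escalation_reasons`, and the 0.9 confidence threshold are hypothetical and would need to reflect the organisation's own oversight policy.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    confidence: float               # model's confidence in its output
    adverse: bool                   # True if the outcome is unfavourable to the individual
    review_requested: bool = False  # affected individual asked for human review

def escalation_reasons(decision, min_confidence=0.9):
    """Return the reasons (possibly none) for routing a decision to a human.

    min_confidence=0.9 is an illustrative threshold (assumption).
    """
    reasons = []
    if decision.review_requested:
        reasons.append("review requested by affected individual")
    if decision.adverse and decision.confidence < min_confidence:
        reasons.append("low-confidence adverse decision")
    return reasons

print(escalation_reasons(Decision(confidence=0.95, adverse=True)))  # []
print(escalation_reasons(Decision(confidence=0.70, adverse=True)))
# ['low-confidence adverse decision']
```

Returning reasons rather than a bare yes or no leaves a record an auditor can use to check whether the oversight mechanism is actually being exercised in practice.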
Domain 5: Regulatory compliance
Audit questions: What regulatory obligations apply to this system in each jurisdiction where it is deployed? Has a conformity assessment been conducted where required? Is the system registered in applicable regulatory databases? Is technical documentation complete and current?
Common findings: Regulatory obligations not identified or mapped; no conformity assessment conducted for high-risk systems; technical documentation incomplete or outdated; regulatory status not reviewed when system is modified or deployed in new jurisdictions.
Who should conduct the audit
Effective AI auditing requires a combination of internal and external scrutiny. Internal audits, conducted by audit, compliance, or risk functions independent of the team responsible for the AI system, can assess processes, documentation, and operational controls effectively. They are valuable and should be conducted regularly.
External audits add independence, specialist expertise, and the credibility that internal review cannot provide. For high-risk systems, regulated deployment contexts, or situations where audit findings will be shared with regulators, investors, or enterprise customers, independent external review is appropriate.
The team conducting the audit should include: technical expertise sufficient to understand the model architecture and evaluate performance claims; data governance expertise to assess data provenance and quality; legal or compliance expertise to assess regulatory obligations; and domain expertise relevant to the use case.
What to do with audit findings
Audit findings require documented responses. For each material finding, the organisation must decide: accept the finding and document the rationale for accepting the associated risk; remediate the finding within a defined timeframe with documented ownership; or suspend or modify the system pending remediation.
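The three response options above can be enforced in a findings register so that no material finding is left without a documented decision. The following is a minimal sketch under assumed names: `Finding`, `Disposition`, and `response_gaps` are hypothetical, and a real register would likely live in a governance or ticketing tool rather than in code.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

class Disposition(Enum):
    ACCEPT = "accept"        # accept the risk, with documented rationale
    REMEDIATE = "remediate"  # fix within a defined timeframe, with an owner
    SUSPEND = "suspend"      # suspend or modify the system pending remediation

@dataclass
class Finding:
    summary: str
    disposition: Optional[Disposition] = None
    rationale: str = ""
    owner: str = ""
    due: Optional[date] = None

def response_gaps(finding):
    """List what is missing from the documented response to a finding."""
    gaps = []
    if finding.disposition is None:
        gaps.append("no decision recorded")
    elif finding.disposition is Disposition.ACCEPT:
        if not finding.rationale:
            gaps.append("risk accepted without documented rationale")
    elif finding.disposition is Disposition.REMEDIATE:
        if not finding.owner:
            gaps.append("no remediation owner")
        if finding.due is None:
            gaps.append("no remediation timeframe")
    return gaps

# Hypothetical findings for illustration.
filed_only = Finding("No ongoing bias monitoring")
planned = Finding(
    "Technical documentation outdated",
    disposition=Disposition.REMEDIATE,
    owner="Model risk lead",
    due=date(2030, 1, 31),
)
print(response_gaps(filed_only))  # ['no decision recorded']
print(response_gaps(planned))     # []
```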
Findings that are noted, filed, and not acted upon do not constitute governance. They constitute evidence of governance failure that may be more damaging in regulatory proceedings than having conducted no audit at all.