Technical Risk

What Is Model Inversion?

Model Inversion is an attack against a trained AI model that reconstructs sensitive training data by repeatedly querying the model and analysing its outputs.

Definition

Model inversion attacks exploit the fact that a trained model can "memorise" aspects of its training data. An attacker with query access can use the model's predictions, in particular its confidence scores, to reconstruct private information: for example, recovering a recognisable face image from a facial recognition model, or inferring medical attributes from a clinical prediction model. The attack is particularly relevant for AI systems trained on health, financial, or biometric data. Applying differential privacy during training can reduce inversion risk, usually at some cost to model accuracy.

Source: Fredrikson et al. (2015); NIST AI 100-2
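
To make the query-and-analyse loop concrete, here is a minimal toy sketch in Python. It is not the method from the cited sources: it assumes a small logistic regression as the victim model, a hypothetical `query` function that returns confidence scores, and an attacker who estimates gradients by finite differences. The data, the prototypes, and every name in it are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "private" training set: two classes scattered around two prototypes.
prototypes = np.array([[1.0, 0.0, 1.0, 0.0],
                       [0.0, 1.0, 0.0, 1.0]])
X = np.vstack([p + 0.1 * rng.standard_normal((50, 4)) for p in prototypes])
y = np.repeat([0, 1], 50)

# Victim model (assumed for this sketch): logistic regression fitted by gradient descent.
w, b = np.zeros(4), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * X.T @ (p - y) / len(y)
    b -= 0.5 * np.mean(p - y)


def query(x):
    """Black-box view available to the attacker: confidences for class 0 and class 1."""
    p1 = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return np.array([1.0 - p1, p1])


def invert(target, steps=100, lr=0.05, eps=1e-3):
    """Hill-climb an input towards high confidence for `target`, using queries only."""
    x = np.full(4, 0.5)  # uninformative starting guess
    for _ in range(steps):
        grad = np.zeros_like(x)
        for i in range(x.size):
            d = np.zeros_like(x)
            d[i] = eps
            # Central finite difference: two queries per feature.
            grad[i] = (query(x + d)[target] - query(x - d)[target]) / (2 * eps)
        # Normalised step so progress does not stall once the confidence saturates.
        x = np.clip(x + lr * grad / (np.linalg.norm(grad) + 1e-12), 0.0, 1.0)
    return x


recovered = invert(target=1)
print(np.round(recovered, 2))
```

The attacker never sees `X`, `w`, or `b`, only the outputs of `query`, yet on this toy data the recovered vector should sit close to the class-1 prototype. Real attacks target far larger models and typically recover a representative of a class rather than an exact training record.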
