What Is Model Inversion?
Model Inversion is an attack against a trained AI model that reconstructs sensitive training data by repeatedly querying the model and analysing its outputs.
Source: Fredrikson et al. (2015); NIST AI 100-2
Plain-language explanation
Model inversion attacks exploit the fact that trained models can "memorise" aspects of their training data. An attacker with query access to the model can use its predictions, particularly confidence scores, to reconstruct private information: for example, recovering a recognisable face image from a facial recognition model, or inferring a patient's medical attributes from a clinical prediction model. The attack is particularly relevant for AI systems trained on health, financial, or biometric data. Training with differential privacy can reduce, though not eliminate, inversion risk.
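To make the query-and-analyse loop concrete, here is a minimal sketch in Python with NumPy. It assumes a toy softmax classifier trained on synthetic data. The names `prototypes`, `query`, and `invert` are hypothetical and do not come from Fredrikson et al. or NIST AI 100-2. The attacker only calls `query`, estimates gradients of the target-class confidence by finite differences, and climbs towards an input the model is highly confident about. In this setup the result is a class-representative input, not an exact training record.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "private" training data: two classes clustered around
# secret prototypes (stand-ins for, say, per-person face features).
prototypes = np.array([[0.9, 0.1, 0.8, 0.2],
                       [0.1, 0.9, 0.2, 0.8]])
X = np.vstack([p + 0.05 * rng.standard_normal((50, 4)) for p in prototypes])
y = np.repeat([0, 1], 50)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Defender's side: fit softmax regression with plain gradient descent.
W = np.zeros((4, 2))
b = np.zeros(2)
Y = np.eye(2)[y]  # one-hot labels
for _ in range(500):
    P = softmax(X @ W + b)
    W -= 0.5 * X.T @ (P - Y) / len(X)
    b -= 0.5 * (P - Y).mean(axis=0)


def query(x):
    """The only interface the attacker sees: input in, confidence scores out."""
    return softmax(x @ W + b)


def invert(target, steps=200, lr=0.1, eps=1e-3):
    """Black-box inversion: ascend the target-class confidence using
    central finite differences computed purely from query() outputs."""
    x = np.full(4, 0.5)  # uninformative starting guess
    for _ in range(steps):
        grad = np.zeros_like(x)
        for i in range(len(x)):
            d = np.zeros_like(x)
            d[i] = eps
            grad[i] = (query(x + d)[target] - query(x - d)[target]) / (2 * eps)
        x = np.clip(x + lr * grad, 0.0, 1.0)  # stay inside the valid feature range
    return x


rec0 = invert(0)
rec1 = invert(1)
print("reconstruction for class 0:", np.round(rec0, 2))
print("reconstruction for class 1:", np.round(rec1, 2))
```

Each reconstruction should land much closer to its own class prototype than to the other one, even though the attacker never sees `X`, `W`, or `b`. Attacks on real systems target far larger models and often add image priors or generative models, so treat this only as an illustration of the principle.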