AIRiskAware

What Is Quantization?

Quantization is a technique that reduces the numerical precision of a model's parameters (for example, from 16-bit to 8- or 4-bit), shrinking its size and speeding up inference at some cost to accuracy.

Definition

Quantization: a technique that reduces the numerical precision of a model's parameters (for example, from 16-bit to 8- or 4-bit), shrinking its size and speeding up inference at some cost to accuracy.

Source: Machine-learning literature
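
To make the definition concrete, the following is a minimal sketch of one common scheme, symmetric quantization with a single shared scale, written in plain Python. The function names and the sample weights are illustrative assumptions, not taken from any particular library or model; production toolchains use more elaborate variants (per-channel scales, calibration data, and so on).

```python
def quantize_symmetric(weights, bits=8):
    """Map floats to signed integers of the given bit width using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]


def max_abs_error(original, restored):
    """Largest per-weight difference introduced by the round trip."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.41, 0.05, -1.27, 0.33, 0.90]

for bits in (8, 4):
    q, scale = quantize_symmetric(weights, bits)
    restored = dequantize(q, scale)
    print(f"{bits}-bit: ints={q} max error={max_abs_error(weights, restored):.4f}")
```

In this sketch the 4-bit version has fewer integer levels to choose from, so its round-trip error is larger than the 8-bit version's. That is the accuracy cost the definition refers to.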

Plain-language explanation

Quantization is widely used to run large models on limited hardware, including on-device. The governance point is that a quantized model can behave subtly differently from the original, so accuracy, safety, and bias testing should be repeated on the version actually deployed, not just the full-precision one.
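
One way to act on this is to run the same evaluation set through both versions and compare the results group by group. The sketch below is a hypothetical harness, not a standard tool: `compare_on_eval_set`, the group labels, and the two toy threshold "models" are all assumptions made for illustration. In practice the two callables would wrap the full-precision model and the quantized build that is actually shipped.

```python
from collections import defaultdict


def compare_on_eval_set(predict_original, predict_deployed, examples):
    """Compare two models on the same labelled examples.

    examples: iterable of (inputs, label, group) tuples, where group is any
    slice worth checking separately (hypothetical labels here).
    """
    stats = defaultdict(lambda: {"n": 0, "orig_ok": 0, "dep_ok": 0, "disagree": 0})
    for inputs, label, group in examples:
        a = predict_original(inputs)
        b = predict_deployed(inputs)
        s = stats[group]
        s["n"] += 1
        s["orig_ok"] += int(a == label)
        s["dep_ok"] += int(b == label)
        s["disagree"] += int(a != b)
    return {
        group: {
            "original_accuracy": s["orig_ok"] / s["n"],
            "deployed_accuracy": s["dep_ok"] / s["n"],
            "disagreement_rate": s["disagree"] / s["n"],
        }
        for group, s in stats.items()
    }


# Toy stand-ins: the "deployed" model has a slightly shifted decision threshold,
# mimicking the kind of small behavioural drift quantization can introduce.
def original_model(x):
    return int(x > 0.50)


def deployed_model(x):
    return int(x > 0.55)


# Hypothetical evaluation data: (input, label, group).
examples = [
    (0.52, 1, "A"), (0.90, 1, "A"), (0.10, 0, "A"),
    (0.53, 1, "B"), (0.54, 1, "B"), (0.20, 0, "B"), (0.80, 1, "B"),
]

report = compare_on_eval_set(original_model, deployed_model, examples)
for group, metrics in sorted(report.items()):
    print(group, metrics)
```

Reporting per group rather than only in aggregate matters here: in this toy data the deployed model loses more accuracy on group B than on group A, a gap that an overall average would blur.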

Related terms

Model Distillation, Inference (AI), Open-Weight Model, Fine-Tuning
