What Is Quantization?
Quantization is a technique that reduces the numerical precision of a model's parameters (for example, from 16-bit to 8- or 4-bit), shrinking its size and speeding up inference at some cost to accuracy.
Source: Machine-learning literature
Plain-language explanation
Quantization is widely used to run large models on limited hardware, including directly on a user's device. The governance point is that a quantized model can behave subtly differently from the original, so accuracy, safety, and bias testing should be repeated on the version actually deployed, not only on the full-precision one.
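To make the point concrete, here is a minimal sketch in plain Python, not a production method. It assumes the simplest scheme (symmetric 8-bit quantization with one scale for the whole weight list) and uses a toy linear scorer in place of a real model. The function names, the random weights, and the decision threshold are all hypothetical and chosen only for illustration.

```python
import random


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in quantized]


def linear_score(weights, features):
    """Toy stand-in for a model: a single dot product."""
    return sum(w * x for w, x in zip(weights, features))


def disagreement_rate(original, restored, inputs, threshold=0.0):
    """Fraction of inputs where the two weight sets give different yes/no decisions."""
    differing = sum(
        (linear_score(original, x) > threshold)
        != (linear_score(restored, x) > threshold)
        for x in inputs
    )
    return differing / len(inputs)


# Hypothetical data: random weights and random test inputs.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(64)]
inputs = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(1000)]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

max_error = max(abs(a - b) for a, b in zip(weights, restored))
rate = disagreement_rate(weights, restored, inputs)

print(f"scale={scale:.5f}  max rounding error={max_error:.5f}")
print(f"decision disagreement rate={rate:.3%}")
```

Each weight moves by at most half a quantization step, yet inputs that sit close to the decision threshold can still flip. Real toolchains typically use more refined schemes (per-channel scales, calibration data, 4-bit formats), but the same comparison between the original and the deployed version applies.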