1. Terminology#
This page explains common terms used throughout TAO Quant. It aims to be beginner-friendly and concise.
1.1. Core concepts#
Quantization: Converting a model's numeric values from high precision (e.g., FP32) to lower precision (e.g., INT8 or FP8) to reduce memory use and improve speed (see the sketch after this list).
PTQ (Post-Training Quantization): Quantize a pretrained model without fine-tuning; may require a small calibration dataset.
QAT (Quantization-Aware Training): Train (or fine-tune) the model with fake-quant operators to recover or retain accuracy at lower precision. QAT is not covered by the current TAO Quant release.
Weights: Learnable parameters of layers (e.g., kernels, matrices).
Activations: Intermediate outputs produced when the model processes inputs.
Dtype (Data type): Numeric precision or format, such as ``int8``, ``fp8_e4m3fn``, ``fp8_e5m2``, or ``native`` (use the original precision).
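A minimal sketch of the idea behind the Quantization and Dtype entries above, written in plain PyTorch rather than any TAO Quant API; the tensor shape and function names are illustrative only:

    import torch

    def quantize_int8(w: torch.Tensor):
        """Symmetric, per-tensor INT8 quantization (illustrative, not TAO Quant code)."""
        scale = w.abs().max() / 127.0                       # map the largest magnitude to 127
        q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
        return q, scale

    def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return q.float() * scale                            # approximate FP32 reconstruction

    w = torch.randn(64, 64)                                 # stand-in for an FP32 weight matrix
    q, scale = quantize_int8(w)
    error = (w - dequantize(q, scale)).abs().max()          # quantization error introduced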
1.2. Framework pieces#
Backend: A plug-in that implements the quantization steps for a specific library. TAO supports:
- ``torchao``: Weight-only PTQ; no calibration loop; ignores activation settings.
- ``modelopt``: Static PTQ with calibration; quantizes weights and activations.
Calibration: A short pass over representative data to compute ranges and scales for activations and weights (used by backends like ModelOpt).
Observer or Fake-quant: Modules inserted during quantization to measure ranges or simulate lower-precision behavior during inference and training.
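A rough sketch of how calibration and observers fit together conceptually; this is generic PyTorch, not the ModelOpt or TAO Quant API, and the class and function names are assumptions:

    import torch

    class MinMaxObserver:
        """Records the running min/max of tensors it sees (illustrative, not a real TAO/ModelOpt class)."""
        def __init__(self):
            self.min_val, self.max_val = float("inf"), float("-inf")

        def update(self, x: torch.Tensor):
            self.min_val = min(self.min_val, x.min().item())
            self.max_val = max(self.max_val, x.max().item())

        def scale(self) -> float:
            # Symmetric INT8 scale derived from the observed range.
            return max(abs(self.min_val), abs(self.max_val)) / 127.0

    def calibrate(model, data_loader, observer):
        """Short pass over representative data to collect activation ranges."""
        model.eval()
        with torch.no_grad():
            for batch in data_loader:
                observer.update(model(batch))
        return observer.scale()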
1.3. Configuration terms#
``quantize`` section: Where you specify backend, mode, default dtypes, per-layer rules, and paths.
``mode``:
- ``weight_only_ptq``: quantize only weights (e.g., TorchAO)
- ``static_ptq``: quantize weights and activations with calibration (e.g., ModelOpt)
Per-layer rules: a ``layers`` list with ``module_name`` patterns and optional ``weights`` and ``activations`` dtypes.
``skip_names``: Patterns to exclude modules from quantization.
1.4. Good to know#
- FP8 variants (``fp8_e4m3fn`` and ``fp8_e5m2``) are accepted; some backends treat them equivalently.
- Always validate accuracy after quantization; the representativeness of the calibration data matters.