7. API reference#
7.1. High-level API#
`nvidia_tao_pytorch.core.quantization.quantizer.ModelQuantizer`

- `__init__(cfg_like)`: Accepts a `TAOModelQuantizationConfig`, an OmegaConf `DictConfig`, or a `dict`.
- `prepare(model) -> nn.Module`: Prepare the model (backend-specific; often a no-op).
- `calibrate(model, dataloader)`: No-op unless the backend implements calibration (ModelOpt).
- `quantize(model=None) -> nn.Module`: Convert the prepared model to a quantized model.
- `quantize_model(model, calibration_loader=None) -> nn.Module`: End-to-end helper (prepare -> optional calibrate -> quantize).
- `save_model(model=None, path: str = '')`: Save an artifact; backends may override the format.
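The end-to-end flow of `quantize_model` can be sketched as follows. This is a simplified stand-in, not the TAO implementation: the real `ModelQuantizer` operates on `torch.nn.Module` instances and delegates each step to the configured backend.

```python
class QuantizerSketch:
    """Minimal illustration of the prepare -> calibrate -> quantize flow."""

    def __init__(self, cfg_like):
        # The real class accepts a TAOModelQuantizationConfig,
        # an OmegaConf DictConfig, or a dict.
        self.cfg = dict(cfg_like)

    def prepare(self, model):
        # Backend-specific preparation; often a no-op.
        return model

    def calibrate(self, model, dataloader):
        # No-op unless the backend implements calibration (e.g. ModelOpt).
        for _batch in dataloader:
            pass  # a real backend would run forward passes to collect statistics
        return model

    def quantize(self, model=None):
        # Convert the prepared model to its quantized form.
        return model

    def quantize_model(self, model, calibration_loader=None):
        # End-to-end helper: prepare -> optional calibrate -> quantize.
        prepared = self.prepare(model)
        if calibration_loader is not None:
            self.calibrate(prepared, calibration_loader)
        return self.quantize(prepared)
```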
7.2. Backends#
`torchao` backend

- Implements weight-only PTQ; activations are ignored.
- Accepts `weights.dtype` in `{int8, fp8_e4m3fn, fp8_e5m2}`.
- Saves the `state_dict` to `quantized_model_torchao.pth`.
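Conceptually, weight-only PTQ maps each float weight tensor to a low-precision representation plus a scale, leaving activations in their original dtype. The sketch below illustrates per-tensor symmetric int8 quantization on plain Python lists; it is a didactic simplification, not the torchao backend, which works on torch tensors with its own kernels.

```python
def quantize_weights_int8(weights):
    """Map float weights to int8 values plus a scale for dequantization."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # symmetric range: [-127*scale, 127*scale]
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]
```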
`modelopt` backend

- Implements static PTQ with calibration and both weight and activation dtypes.
- Saves a structured artifact to `quantized_model_modelopt.pth` (model weights at key `model_state_dict`).
7.3. Configuration schema (selected fields)#
- `backend`: `'torchao'` | `'modelopt'`.
- `mode`: `'weight_only_ptq'` | `'static_ptq'`.
- `algorithm`: `'minmax'` | `'entropy'` (ModelOpt).
- `default_layer_dtype`: `'int8'` | `'fp8_e4m3fn'` | `'fp8_e5m2'` | `'native'`.
- `default_activation_dtype`: same domain (ModelOpt).
- `layers[*].module_name`: string pattern (qualified name or class name).
- `layers[*].weights.dtype`: same domain as `default_layer_dtype`.
- `layers[*].activations.dtype`: same domain as `default_layer_dtype` (ModelOpt).
- `skip_names`: list of patterns to exclude.
- `model_path`: path to the trained checkpoint.
- `results_dir`: output directory.
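A hypothetical configuration using these fields, written as a plain `dict` (one of the accepted `cfg_like` forms). The patterns and paths are illustrative, not defaults from the library.

```python
# Illustrative config dict mirroring the schema above; all values are
# hypothetical examples, not shipped defaults.
cfg = {
    "backend": "modelopt",
    "mode": "static_ptq",
    "algorithm": "minmax",
    "default_layer_dtype": "int8",
    "default_activation_dtype": "int8",
    "layers": [
        {
            "module_name": "backbone.*",          # wildcard over qualified names
            "weights": {"dtype": "fp8_e4m3fn"},
            "activations": {"dtype": "fp8_e4m3fn"},
        },
    ],
    "skip_names": ["*.classifier"],               # exclude matching modules
    "model_path": "/workspace/checkpoints/model.pth",
    "results_dir": "/workspace/results",
}
```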
7.4. Pattern matching rules#
- Wildcards `*` and `?` are supported.
- Patterns are matched against the qualified module name first, falling back to the class name.
- Later layer entries override earlier ones; `skip_names` removes matches.
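These rules can be approximated with the standard-library `fnmatch` module. The helper below is an illustrative reimplementation, not the TAO source: `layer_entries`, `skip_names`, and the return convention are assumptions for the sketch.

```python
import fnmatch

def resolve_dtype(qualified_name, class_name, layer_entries, skip_names, default):
    """Approximate the documented matching rules for one module."""
    # skip_names removes matches entirely.
    if any(fnmatch.fnmatch(qualified_name, p) or fnmatch.fnmatch(class_name, p)
           for p in skip_names):
        return None  # excluded from quantization

    chosen = default
    for entry in layer_entries:  # later entries override earlier ones
        pat = entry["module_name"]
        # Match the qualified module name first, falling back to the class name.
        if fnmatch.fnmatch(qualified_name, pat) or fnmatch.fnmatch(class_name, pat):
            chosen = entry["weights"]["dtype"]
    return chosen
```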
7.5. Error handling#
- Unsupported dtypes raise a clear error listing the valid options.
- Backends validate that the configured `mode` is supported and raise an error on a mismatch.
- Missing or wrong-type configuration fields raise type or validation errors during prepare and quantize.
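The dtype check might look like the following. This is a hypothetical helper mirroring the documented behavior (an error that lists the valid options); the actual validation lives inside the TAO quantization backends.

```python
# Valid dtype names from the configuration schema above.
VALID_DTYPES = {"int8", "fp8_e4m3fn", "fp8_e5m2", "native"}

def check_dtype(dtype):
    """Raise a clear error listing valid options for an unsupported dtype."""
    if dtype not in VALID_DTYPES:
        raise ValueError(
            f"Unsupported dtype {dtype!r}; valid options are {sorted(VALID_DTYPES)}"
        )
    return dtype
```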