10. API Reference
10.1. High-Level API
nvidia_tao_pytorch.core.quantization.quantizer.ModelQuantizer
__init__(cfg_like): Accepts a TAOModelQuantizationConfig, an OmegaConf DictConfig, or a dict.
prepare(model) -> nn.Module: Prepare the model (backend-specific; often a no-op).
calibrate(model, dataloader): No-op unless the backend implements calibration (ModelOpt backends).
quantize(model=None) -> nn.Module: Convert the prepared model to a quantized model.
quantize_model(model, calibration_loader=None) -> nn.Module: End-to-end helper (prepare -> optional calibrate -> quantize).
save_model(model=None, path: str = ''): Save an artifact; backends may override the format.
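The order of operations in the end-to-end helper (prepare -> optional calibrate -> quantize) can be sketched with a stand-in class. This is a minimal illustration under stated assumptions, not the library's implementation: SketchQuantizer and its steps list are hypothetical, a plain dict stands in for the accepted config types, and the "model" can be any object.

```python
class SketchQuantizer:
    """Hypothetical stand-in that mirrors the method order described above."""

    def __init__(self, cfg_like):
        # Assumption: a plain dict stands in for the accepted config types.
        self.cfg = dict(cfg_like)
        self.steps = []       # records which stages ran, for illustration only
        self._model = None

    def prepare(self, model):
        # Backend-specific in the real API; here it only remembers the model.
        self.steps.append("prepare")
        self._model = model
        return model

    def calibrate(self, model, dataloader):
        # A real backend would run the model on each batch to collect statistics.
        for _batch in dataloader:
            pass
        self.steps.append("calibrate")

    def quantize(self, model=None):
        # Falls back to the model passed to prepare() when none is given.
        model = self._model if model is None else model
        self.steps.append("quantize")
        return model

    def quantize_model(self, model, calibration_loader=None):
        prepared = self.prepare(model)
        if calibration_loader is not None:
            self.calibrate(prepared, calibration_loader)
        return self.quantize(prepared)


# Without a calibration loader, the calibrate stage is skipped.
weight_only = SketchQuantizer({"backend": "torchao", "mode": "weight_only_ptq"})
weight_only.quantize_model(model=object())
print(weight_only.steps)  # ['prepare', 'quantize']
```

Passing any iterable as calibration_loader adds the calibrate stage between the other two.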
10.2. Backends
torchao backend
Implements weight-only PTQ. Ignores activations.
Accepts weights.dtype in {int8, fp8_e4m3fn, fp8_e5m2}.
Works with PyTorch models.
Saves state_dict to quantized_model_torchao.pth.
modelopt.pytorch backend
Implements static PTQ with calibration and both weight and activation dtypes.
Works with PyTorch models.
Saves structured artifact to quantized_model_modelopt.pytorch.pth (model weights at model_state_dict).
modelopt.onnx backend
Implements static PTQ with calibration for ONNX models.
Works exclusively with ONNX files specified via model_path.
Saves quantized ONNX model to quantized_model.onnx.
10.3. Configuration Schema (Selected Fields)
backend: 'torchao' | 'modelopt.pytorch' | 'modelopt.onnx'.
mode: 'weight_only_ptq' | 'static_ptq'.
algorithm: 'minmax' | 'max' | 'entropy' (ModelOpt backends).
default_layer_dtype: 'int8' | 'fp8_e4m3fn' | 'fp8_e5m2' | 'native'.
default_activation_dtype: Same domain (ModelOpt backends).
layers[*].module_name: String pattern (qualified name or class name for PyTorch; operator type for ONNX).
layers[*].weights.dtype: Same domain as above.
layers[*].activations.dtype: Same domain as above (ModelOpt backends).
skip_names: List of patterns to exclude.
model_path: Path to trained checkpoint (PyTorch) or ONNX file.
results_dir: Output directory.
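As an illustration, a configuration built from the fields above might look like the following Python dict. Only the field names and the allowed strings come from the schema. The paths, patterns, and module names are made up for the example, and the nesting of weights and activations under each layers entry is inferred from the dotted field names rather than confirmed by the schema.

```python
# Hypothetical example values; only the field names and the allowed
# strings are taken from the schema listed above.
quant_cfg = {
    "backend": "modelopt.pytorch",
    "mode": "static_ptq",
    "algorithm": "minmax",
    "default_layer_dtype": "int8",
    "default_activation_dtype": "int8",
    "layers": [
        # Matches by class name (assumed module class).
        {
            "module_name": "Conv2d",
            "weights": {"dtype": "int8"},
            "activations": {"dtype": "int8"},
        },
        # Matches by qualified name with a wildcard (assumed module path).
        {
            "module_name": "backbone.layer4.*",
            "weights": {"dtype": "fp8_e4m3fn"},
            "activations": {"dtype": "fp8_e4m3fn"},
        },
    ],
    "skip_names": ["head.*"],
    "model_path": "/path/to/checkpoint.pth",
    "results_dir": "/path/to/results",
}
```

Since the constructor is documented as accepting a dict, a structure like this could presumably be passed to ModelQuantizer directly. Whether every dtype combination shown here is valid for a given backend is not stated in this section.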
10.4. Pattern Matching Rules
Wildcards * and ? are supported.
Patterns are matched first against the qualified module name, then fall back to the class name.
Later layer entries override earlier ones; skip_names removes matches.
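The rules above can be sketched with Python's standard fnmatch module. This is an assumption-laden illustration, not the library's matcher: resolve_layer_entry is a hypothetical helper, skip_names is assumed to be checked against both the qualified name and the class name, and fnmatch also accepts [seq] patterns that the rules above do not mention.

```python
from fnmatch import fnmatchcase


def resolve_layer_entry(qualified_name, class_name, layers, skip_names):
    """Return the layer entry that applies to a module, or None (hypothetical helper)."""
    # skip_names removes matches (assumed to be tested against both names).
    for pattern in skip_names:
        if fnmatchcase(qualified_name, pattern) or fnmatchcase(class_name, pattern):
            return None

    chosen = None
    for entry in layers:
        pattern = entry["module_name"]
        # Qualified name is tried first; the class name is the fallback.
        if fnmatchcase(qualified_name, pattern) or fnmatchcase(class_name, pattern):
            chosen = entry  # a later matching entry overrides an earlier one
    return chosen


# Made-up module names and patterns for the example.
layers = [
    {"module_name": "Conv2d", "weights": {"dtype": "int8"}},
    {"module_name": "backbone.layer?.conv*", "weights": {"dtype": "fp8_e4m3fn"}},
]
skip = ["head.*"]

print(resolve_layer_entry("backbone.layer1.conv1", "Conv2d", layers, skip)["weights"]["dtype"])  # fp8_e4m3fn
print(resolve_layer_entry("neck.conv", "Conv2d", layers, skip)["weights"]["dtype"])  # int8
print(resolve_layer_entry("head.conv", "Conv2d", layers, skip))  # None
```

The first module matches both entries, so the later, more specific entry wins. The second matches only by class name. The third is excluded by skip_names before any layer entry is considered.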
10.5. Error Handling
Unsupported dtypes raise a clear error listing valid options.
Backends validate supported mode and error if mismatched.
Missing or wrong-type configuration fields raise type or validation errors during prepare and quantize.
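The kinds of checks described above could look roughly like the sketch below. It is a hypothetical validator, not the library's code: validate_config, the exception types, and the message wording are assumptions, and the backend-to-mode mapping is inferred from the backend descriptions in section 10.2.

```python
VALID_DTYPES = ("int8", "fp8_e4m3fn", "fp8_e5m2", "native")

# Inferred from section 10.2: torchao is weight-only, ModelOpt backends are static.
BACKEND_MODES = {
    "torchao": ("weight_only_ptq",),
    "modelopt.pytorch": ("static_ptq",),
    "modelopt.onnx": ("static_ptq",),
}


def validate_config(cfg):
    """Raise TypeError or ValueError for a bad config (hypothetical helper)."""
    if not isinstance(cfg, dict):
        raise TypeError(f"expected a dict config, got {type(cfg).__name__}")

    for field in ("backend", "mode"):
        if field not in cfg:
            raise ValueError(f"missing required field: {field!r}")

    backend, mode = cfg["backend"], cfg["mode"]
    if backend not in BACKEND_MODES:
        raise ValueError(
            f"unsupported backend {backend!r}; valid options: {sorted(BACKEND_MODES)}"
        )
    if mode not in BACKEND_MODES[backend]:
        raise ValueError(
            f"backend {backend!r} does not support mode {mode!r}; "
            f"valid options: {list(BACKEND_MODES[backend])}"
        )

    dtype = cfg.get("default_layer_dtype")
    if dtype is not None and dtype not in VALID_DTYPES:
        # The message lists the valid options, as the section describes.
        raise ValueError(
            f"unsupported dtype {dtype!r}; valid options: {list(VALID_DTYPES)}"
        )


try:
    validate_config({"backend": "torchao", "mode": "static_ptq"})
except ValueError as err:
    print(err)
```

Callers of the real API would presumably wrap prepare and quantize in a similar try/except, since that is where the section says such errors surface.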