10. API Reference

10.1. High-Level API

nvidia_tao_pytorch.core.quantization.quantizer.ModelQuantizer

__init__(cfg_like): Accepts a TAO ModelQuantizationConfig, OmegaConf DictConfig, or dict.
prepare(model) -> nn.Module: Prepare the model (backend-specific; often a no-op).
calibrate(model, dataloader): Runs calibration when the backend implements it (the ModelOpt backends); otherwise a no-op.
quantize(model=None) -> nn.Module: Convert prepared model to quantized model.
quantize_model(model, calibration_loader=None) -> nn.Module: End-to-end helper (prepare -> optional calibrate -> quantize).
save_model(model=None, path: str = ''): Save an artifact; backends may override the format.
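
The call order that quantize_model is described as performing (prepare -> optional calibrate -> quantize) can be sketched with a stand-in class. This is an illustration only: SketchQuantizer and everything inside it are hypothetical and do not reproduce the TAO implementation. Only the method names and the step order are taken from the list above.

```python
class SketchQuantizer:
    """Hypothetical stand-in that mirrors the documented method order only."""

    def __init__(self, cfg_like):
        # The documented class accepts a config object, DictConfig, or dict;
        # this sketch simply keeps a plain dict copy.
        self.cfg = dict(cfg_like)
        self.calls = []  # records the order in which the steps ran

    def prepare(self, model):
        self.calls.append("prepare")
        return model

    def calibrate(self, model, dataloader):
        self.calls.append("calibrate")
        for _batch in dataloader:
            pass  # a real backend would run forward passes here

    def quantize(self, model=None):
        self.calls.append("quantize")
        return model

    def quantize_model(self, model, calibration_loader=None):
        # prepare -> optional calibrate -> quantize
        prepared = self.prepare(model)
        if calibration_loader is not None:
            self.calibrate(prepared, calibration_loader)
        return self.quantize(prepared)


quantizer = SketchQuantizer({"backend": "modelopt.pytorch", "mode": "static_ptq"})
result = quantizer.quantize_model("placeholder-model", calibration_loader=[1, 2, 3])
print(quantizer.calls)
```

Passing no calibration_loader skips the calibrate step entirely, which matches the "optional calibrate" wording of the helper.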

torchao backend

Implements weight-only PTQ; activation settings are ignored. Accepts weights.dtype in {int8, fp8_e4m3fn, fp8_e5m2}.
Works with PyTorch models.
Saves state_dict to quantized_model_torchao.pth.
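
As a rough illustration of what weight-only quantization means numerically, the sketch below applies symmetric per-tensor int8 quantization to a short list of weights. The per-tensor symmetric scheme and both function names are assumptions chosen for brevity; the scheme the torchao backend actually applies may differ (for example in granularity).

```python
def quantize_int8_symmetric(weights):
    """Return (int8 codes, scale) for a list of float weights."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


weights = [0.4, -1.0, 0.25]
codes, scale = quantize_int8_symmetric(weights)
restored = dequantize(codes, scale)
print(codes, restored)
```

Only the weights are touched here; activations keep their original precision, which is the distinction between weight-only PTQ and static PTQ.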

modelopt.pytorch backend

Implements static PTQ with calibration and both weight and activation dtypes.
Works with PyTorch models.
Saves a structured artifact to quantized_model_modelopt.pytorch.pth (model weights are stored under the model_state_dict key).
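
Because the torchao backend saves a plain state_dict while the modelopt.pytorch backend nests the weights under model_state_dict, code that reads either artifact may want to normalize the two layouts. The helper below is a minimal sketch under that assumption; extract_state_dict is a hypothetical name, and the extra_metadata key in the example is a placeholder, since the other contents of the structured artifact are not specified here.

```python
def extract_state_dict(artifact):
    """Return the weights mapping from either documented artifact layout."""
    if not isinstance(artifact, dict):
        raise TypeError(f"expected a dict-like artifact, got {type(artifact).__name__}")
    if "model_state_dict" in artifact:
        # Structured layout (modelopt.pytorch): weights nested under a key.
        return artifact["model_state_dict"]
    # Plain layout (torchao): the artifact is already the state_dict.
    return artifact


# In practice the artifact would typically come from
# torch.load(path, map_location="cpu"); plain dicts stand in for it here.
plain = {"layer.weight": [1, 2, 3]}
structured = {"model_state_dict": {"layer.weight": [1, 2, 3]}, "extra_metadata": {}}

print(extract_state_dict(plain) == extract_state_dict(structured))
```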

modelopt.onnx backend

backend: 'torchao' | 'modelopt.pytorch' | 'modelopt.onnx'.
mode: 'weight_only_ptq' | 'static_ptq'.
algorithm: 'minmax' | 'max' | 'entropy' (ModelOpt backends).
default_layer_dtype: 'int8' | 'fp8_e4m3fn' | 'fp8_e5m2' | 'native'.
default_activation_dtype: Same domain as default_layer_dtype (ModelOpt backends).
layers[*].module_name: String pattern (qualified name or class name for PyTorch; operator type for ONNX).
layers[*].weights.dtype: Same domain as default_layer_dtype.
layers[*].activations.dtype: Same domain as default_layer_dtype (ModelOpt backends).
skip_names: List of patterns to exclude.
model_path: Path to trained checkpoint (PyTorch) or ONNX file.
results_dir: Output directory.
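
Since the constructor accepts a plain dict, the fields above can be assembled as follows. This is a sketch only: the nesting is inferred from the dotted field names, every value is illustrative (the module pattern, skip pattern, and paths are placeholders), and whether a particular combination is accepted depends on the chosen backend.

```python
quant_cfg = {
    "backend": "modelopt.pytorch",
    "mode": "static_ptq",
    "algorithm": "minmax",
    "default_layer_dtype": "int8",
    "default_activation_dtype": "int8",
    "layers": [
        {
            "module_name": "Conv2d",  # placeholder class-name pattern
            "weights": {"dtype": "int8"},
            "activations": {"dtype": "int8"},
        },
    ],
    "skip_names": ["classifier"],  # placeholder pattern to exclude
    "model_path": "/path/to/model.pth",  # placeholder checkpoint path
    "results_dir": "/path/to/results",  # placeholder output directory
}

# Collect every dtype mentioned in the config for a quick sanity check.
used_dtypes = {quant_cfg["default_layer_dtype"], quant_cfg["default_activation_dtype"]}
for layer in quant_cfg["layers"]:
    used_dtypes.add(layer["weights"]["dtype"])
    used_dtypes.add(layer["activations"]["dtype"])
print(sorted(used_dtypes))
```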

Unsupported dtypes raise a clear error listing valid options.
Backends validate supported mode and error if mismatched.
Missing or wrong-type configuration fields raise type or validation errors during prepare and quantize.
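
The error behavior described above can be approximated in user code before a config is handed to the quantizer. The sketch below is an assumption-laden illustration, not the library's validation logic: the function names, exception types, and message wording are hypothetical, and the backend-to-mode table covers only the two backends whose mode is stated in this section.

```python
VALID_DTYPES = ("int8", "fp8_e4m3fn", "fp8_e5m2", "native")

# Only the backends whose mode is documented in this section are listed.
DOCUMENTED_MODES = {
    "torchao": ("weight_only_ptq",),
    "modelopt.pytorch": ("static_ptq",),
}


def check_dtype(value):
    """Reject wrong-type or unsupported dtypes, listing the valid options."""
    if not isinstance(value, str):
        raise TypeError(f"dtype must be a string, got {type(value).__name__}")
    if value not in VALID_DTYPES:
        options = ", ".join(VALID_DTYPES)
        raise ValueError(f"Unsupported dtype {value!r}; valid options: {options}")
    return value


def check_mode(backend, mode):
    """Reject a mode that the given backend is not documented to support."""
    supported = DOCUMENTED_MODES.get(backend)
    if supported is None:
        raise ValueError(f"No documented mode for backend {backend!r}")
    if mode not in supported:
        raise ValueError(
            f"Backend {backend!r} does not support mode {mode!r}; "
            f"supported: {', '.join(supported)}"
        )
    return mode


check_dtype("fp8_e4m3fn")
check_mode("torchao", "weight_only_ptq")
```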