7. API reference#
7.1. High-level API#
`nvidia_tao_pytorch.core.quantization.quantizer.ModelQuantizer`

- `__init__(cfg_like)`: Accepts a `TAOModelQuantizationConfig`, an OmegaConf `DictConfig`, or a `dict`.
- `prepare(model) -> nn.Module`: Prepare the model (backend-specific; often a no-op).
- `calibrate(model, dataloader)`: No-op unless the backend implements calibration (ModelOpt).
- `quantize(model=None) -> nn.Module`: Convert the prepared model to a quantized model.
- `quantize_model(model, calibration_loader=None) -> nn.Module`: End-to-end helper (prepare -> optional calibrate -> quantize).
- `save_model(model=None, path: str = '')`: Save an artifact; backends may override the format.
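The end-to-end flow of `quantize_model` can be sketched as follows. This is a simplified stand-in, not the TAO implementation: the real `ModelQuantizer` operates on `torch.nn.Module` instances and delegates each step to the configured backend.

```python
class QuantizerSketch:
    """Minimal illustration of the prepare -> calibrate -> quantize flow."""

    def __init__(self, cfg_like):
        # The real class accepts a TAOModelQuantizationConfig,
        # an OmegaConf DictConfig, or a dict.
        self.cfg = dict(cfg_like)

    def prepare(self, model):
        # Backend-specific preparation; often a no-op.
        return model

    def calibrate(self, model, dataloader):
        # No-op unless the backend implements calibration (e.g. ModelOpt).
        for _batch in dataloader:
            pass  # a real backend would run forward passes to collect statistics
        return model

    def quantize(self, model=None):
        # Convert the prepared model to its quantized form.
        return model

    def quantize_model(self, model, calibration_loader=None):
        # End-to-end helper: prepare -> optional calibrate -> quantize.
        prepared = self.prepare(model)
        if calibration_loader is not None:
            self.calibrate(prepared, calibration_loader)
        return self.quantize(prepared)
```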
7.2. Backends#
`torchao` backend

- Implements weight-only PTQ; activations are ignored.
- Accepts `weights.dtype` in `{int8, fp8_e4m3fn, fp8_e5m2}`.
- Saves the `state_dict` to `quantized_model_torchao.pth`.
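Conceptually, weight-only PTQ maps each float weight tensor to a low-precision representation plus a scale, leaving activations in their original dtype. The sketch below illustrates per-tensor symmetric int8 quantization on plain Python lists; it is a didactic simplification, not the torchao backend, which works on torch tensors with its own kernels.

```python
def quantize_weights_int8(weights):
    """Map float weights to int8 values plus a scale for dequantization."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # symmetric range: [-127*scale, 127*scale]
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]
```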
`modelopt` backend

- Implements static PTQ with calibration and both weight and activation dtypes.
- Saves a structured artifact to `quantized_model_modelopt.pth` (model weights at key `model_state_dict`).
7.3. Configuration schema (selected fields)#
- `backend`: `'torchao'` | `'modelopt'`.
- `mode`: `'weight_only_ptq'` | `'static_ptq'`.
- `algorithm`: `'minmax'` | `'entropy'` (ModelOpt).
- `default_layer_dtype`: `'int8'` | `'fp8_e4m3fn'` | `'fp8_e5m2'` | `'native'`.
- `default_activation_dtype`: same domain (ModelOpt).
- `layers[*].module_name`: string pattern (qualified name or class name).
- `layers[*].weights.dtype`: same domain as `default_layer_dtype`.
- `layers[*].activations.dtype`: same domain as `default_layer_dtype` (ModelOpt).
- `skip_names`: list of patterns to exclude.
- `model_path`: path to the trained checkpoint.
- `results_dir`: output directory.
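A hypothetical configuration using these fields, written as a plain `dict` (one of the accepted `cfg_like` forms). The patterns and paths are illustrative, not defaults from the library.

```python
# Illustrative config dict mirroring the schema above; all values are
# hypothetical examples, not shipped defaults.
cfg = {
    "backend": "modelopt",
    "mode": "static_ptq",
    "algorithm": "minmax",
    "default_layer_dtype": "int8",
    "default_activation_dtype": "int8",
    "layers": [
        {
            "module_name": "backbone.*",          # wildcard over qualified names
            "weights": {"dtype": "fp8_e4m3fn"},
            "activations": {"dtype": "fp8_e4m3fn"},
        },
    ],
    "skip_names": ["*.classifier"],               # exclude matching modules
    "model_path": "/workspace/checkpoints/model.pth",
    "results_dir": "/workspace/results",
}
```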
7.4. Pattern matching rules#
- Wildcards `*` and `?` are supported.
- Patterns are matched against the qualified module name first, falling back to the class name.
- Later layer entries override earlier ones; `skip_names` removes matches.
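These rules can be approximated with the standard-library `fnmatch` module. The helper below is an illustrative reimplementation, not the TAO source: `layer_entries`, `skip_names`, and the return convention are assumptions for the sketch.

```python
import fnmatch

def resolve_dtype(qualified_name, class_name, layer_entries, skip_names, default):
    """Approximate the documented matching rules for one module."""
    # skip_names removes matches entirely.
    if any(fnmatch.fnmatch(qualified_name, p) or fnmatch.fnmatch(class_name, p)
           for p in skip_names):
        return None  # excluded from quantization

    chosen = default
    for entry in layer_entries:  # later entries override earlier ones
        pat = entry["module_name"]
        # Match the qualified module name first, falling back to the class name.
        if fnmatch.fnmatch(qualified_name, pat) or fnmatch.fnmatch(class_name, pat):
            chosen = entry["weights"]["dtype"]
    return chosen
```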
7.5. Error handling#
- Unsupported dtypes raise a clear error listing the valid options.
- Backends validate that the configured `mode` is supported and raise an error on a mismatch.
- Missing or wrong-type configuration fields raise type or validation errors during prepare and quantize.
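The dtype check might look like the following. This is a hypothetical helper mirroring the documented behavior (an error that lists the valid options); the actual validation lives inside the TAO quantization backends.

```python
# Valid dtype names from the configuration schema above.
VALID_DTYPES = {"int8", "fp8_e4m3fn", "fp8_e5m2", "native"}

def check_dtype(dtype):
    """Raise a clear error listing valid options for an unsupported dtype."""
    if dtype not in VALID_DTYPES:
        raise ValueError(
            f"Unsupported dtype {dtype!r}; valid options are {sorted(VALID_DTYPES)}"
        )
    return dtype
```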