7. API reference#

7.1. High-level API#

nvidia_tao_pytorch.core.quantization.quantizer.ModelQuantizer

  • __init__(cfg_like): Accepts a TAO ModelQuantizationConfig, an OmegaConf DictConfig, or a plain dict.

  • prepare(model) -> nn.Module: Prepares the model for quantization (backend-specific; may be a no-op).

  • calibrate(model, dataloader): Runs calibration when the backend supports it (ModelOpt); otherwise a no-op.

  • quantize(model=None) -> nn.Module: Converts the prepared model to its quantized form.

  • quantize_model(model, calibration_loader=None) -> nn.Module: End-to-end helper (prepare -> optional calibrate -> quantize).

  • save_model(model=None, path: str = ''): Saves the quantized artifact; backends may override the format.
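The call sequence above can be sketched with a minimal stand-in class. This is not the real ModelQuantizer (which lives in nvidia_tao_pytorch); it is a hypothetical stub that mirrors the documented interface so the prepare -> optional calibrate -> quantize order can be followed in isolation.

```python
class StubQuantizer:
    """Hypothetical stand-in mirroring the documented ModelQuantizer interface."""

    def __init__(self, cfg_like):
        self.cfg = dict(cfg_like)  # accepts a plain dict, per the reference
        self.calls = []            # records the call order for illustration

    def prepare(self, model):
        self.calls.append("prepare")
        return model               # backend-specific; may be a no-op

    def calibrate(self, model, dataloader):
        self.calls.append("calibrate")  # no-op unless the backend supports it

    def quantize(self, model=None):
        self.calls.append("quantize")
        return model

    def quantize_model(self, model, calibration_loader=None):
        # End-to-end helper: prepare -> optional calibrate -> quantize.
        model = self.prepare(model)
        if calibration_loader is not None:
            self.calibrate(model, calibration_loader)
        return self.quantize(model)

q = StubQuantizer({"backend": "torchao", "mode": "weight_only_ptq"})
q.quantize_model(model=object(), calibration_loader=[1, 2])
print(q.calls)  # -> ['prepare', 'calibrate', 'quantize']
```

With the real class, the same flow applies: skipping calibration_loader skips the calibrate step, which matches the weight-only torchao path.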

7.2. Backends#

torchao backend

  • Implements weight-only PTQ; activation settings are ignored. Accepts weights.dtype values in {int8, fp8_e4m3fn, fp8_e5m2}.

  • Saves state_dict to quantized_model_torchao.pth.

modelopt backend

  • Implements static PTQ with calibration, supporting both weight and activation dtypes.

  • Saves a structured artifact to quantized_model_modelopt.pth (model weights stored under the model_state_dict key).
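Because the two backends save different layouts (torchao a bare state_dict, modelopt a structured dict with the weights under model_state_dict), loading code may need to handle both. A small sketch, assuming only the documented keys:

```python
def load_weights(artifact):
    """Return the model weights from either backend's saved artifact.

    torchao saves the state_dict directly; modelopt nests it under
    'model_state_dict'. Only these documented layouts are assumed.
    """
    if "model_state_dict" in artifact:
        return artifact["model_state_dict"]  # modelopt structured artifact
    return artifact                          # torchao bare state_dict

# In practice the artifact would come from torch.load(path); a plain dict
# stands in here for illustration.
print(load_weights({"model_state_dict": {"fc.weight": [0.5]}}))  # -> {'fc.weight': [0.5]}
```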

7.3. Configuration schema (selected fields)#

  • backend: 'torchao' | 'modelopt'.

  • mode: 'weight_only_ptq' | 'static_ptq'.

  • algorithm: 'minmax' | 'entropy' (ModelOpt).

  • default_layer_dtype: 'int8' | 'fp8_e4m3fn' | 'fp8_e5m2' | 'native'.

  • default_activation_dtype: same domain as default_layer_dtype (ModelOpt only).

  • layers[*].module_name: string pattern (qualified name or class name).

  • layers[*].weights.dtype: same domain as above.

  • layers[*].activations.dtype: same domain as above (ModelOpt).

  • skip_names: list of patterns to exclude.

  • model_path: path to trained checkpoint.

  • results_dir: output directory.
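Putting the fields above together, a configuration might look like the following. This is a sketch expressed as a plain dict (one of the accepted cfg_like forms); the exact nesting is inferred from the schema listing and the paths are placeholders.

```python
# Example configuration consistent with the schema fields listed above.
cfg = {
    "backend": "modelopt",
    "mode": "static_ptq",
    "algorithm": "minmax",
    "default_layer_dtype": "int8",
    "default_activation_dtype": "int8",
    "layers": [
        {
            "module_name": "backbone.*",           # qualified-name pattern
            "weights": {"dtype": "fp8_e4m3fn"},
            "activations": {"dtype": "fp8_e4m3fn"},
        },
    ],
    "skip_names": ["*head*"],                      # exclude matching modules
    "model_path": "/path/to/checkpoint.pth",
    "results_dir": "/path/to/results",
}

# Every dtype in the config must come from the documented domain.
VALID = {"int8", "fp8_e4m3fn", "fp8_e5m2", "native"}
assert cfg["default_layer_dtype"] in VALID
assert cfg["layers"][0]["weights"]["dtype"] in VALID
```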

7.4. Pattern matching rules#

  • Wildcards * and ? are supported.

  • Patterns are matched first against the qualified module name; if that fails, against the class name.

  • Later layer entries override earlier ones; skip_names removes matches.
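The matching rules above can be sketched with Python's standard fnmatch module. The function name and the flat entry shape are illustrative, not the library's actual internals; only the documented semantics (wildcards, qualified-name-then-class-name matching, later-entry override, skip removal) are encoded.

```python
from fnmatch import fnmatch


def resolve_dtype(qualified_name, class_name, layer_entries, skip_names,
                  default="native"):
    """Illustrative resolver for the documented pattern-matching rules."""

    def matches(pattern):
        # Try the qualified module name first, then fall back to class name.
        return fnmatch(qualified_name, pattern) or fnmatch(class_name, pattern)

    # skip_names removes matches outright (None here means "not quantized").
    for pat in skip_names:
        if matches(pat):
            return None

    dtype = default
    for entry in layer_entries:        # later entries override earlier ones
        if matches(entry["module_name"]):
            dtype = entry["dtype"]
    return dtype


layers = [
    {"module_name": "backbone.*", "dtype": "int8"},
    {"module_name": "backbone.stem.*", "dtype": "fp8_e4m3fn"},  # later wins
]
print(resolve_dtype("backbone.stem.conv1", "Conv2d", layers, []))  # -> fp8_e4m3fn
print(resolve_dtype("head.cls", "Linear", layers, ["head.*"]))     # -> None
```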

7.5. Error handling#

  • Unsupported dtypes raise a clear error listing valid options.

  • Each backend validates that the requested mode is supported and raises an error on a mismatch.

  • Missing or wrong-type configuration fields raise type or validation errors during prepare and quantize.
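The dtype validation behavior can be sketched as follows. The function name and exact message are illustrative; the point is the documented contract that an unsupported dtype raises an error listing the valid options.

```python
# Valid dtype domain from the configuration schema above.
VALID_DTYPES = {"int8", "fp8_e4m3fn", "fp8_e5m2", "native"}


def check_dtype(dtype):
    """Raise a clear error listing valid options for an unsupported dtype."""
    if dtype not in VALID_DTYPES:
        raise ValueError(
            f"Unsupported dtype {dtype!r}; valid options: {sorted(VALID_DTYPES)}"
        )
    return dtype


check_dtype("int8")        # passes silently
try:
    check_dtype("int4")    # not in the domain
except ValueError as err:
    print(err)             # message names the offending dtype and the options
```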