7. API reference#
7.1. High-level API#
nvidia_tao_pytorch.core.quantization.quantizer.ModelQuantizer
- __init__(cfg_like): Accepts a TAOModelQuantizationConfig, OmegaConf DictConfig, or dict.
- prepare(model) -> nn.Module: Prepare the model (backend-specific; often a no-op).
- calibrate(model, dataloader): No-op unless the backend implements calibration (ModelOpt).
- quantize(model=None) -> nn.Module: Convert the prepared model to a quantized model.
- quantize_model(model, calibration_loader=None) -> nn.Module: End-to-end helper (prepare -> optional calibrate -> quantize).
- save_model(model=None, path: str = ''): Save an artifact; backends may override the format.
7.2. Backends#
torchao backend
- Implements weight-only PTQ; activation settings are ignored.
- Accepts weights.dtype in {int8, fp8_e4m3fn, fp8_e5m2}.
- Saves a state_dict to quantized_model_torchao.pth.
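To make "weight-only" concrete, here is a minimal numeric sketch of symmetric per-tensor int8 weight quantization. It illustrates the idea behind the torchao path, not its actual implementation:

```python
def quantize_weights_int8(weights):
    """Symmetric per-tensor int8 quantization of a flat list of weights.

    Illustrative sketch only: real backends operate on tensors and may
    quantize per-channel. Activations are untouched (weight-only PTQ).
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [x * scale for x in q]


q, scale = quantize_weights_int8([127.0, -63.25, 10.0])
print(q, scale)  # → [127, -63, 10] 1.0
```

The quantization error comes entirely from rounding w / scale to the nearest integer; activations still run in native precision.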
modelopt backend
- Implements static PTQ with calibration and both weight and activation dtypes.
- Saves a structured artifact to quantized_model_modelopt.pth (model weights at model_state_dict).
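Static PTQ needs a scale for activations, which is where calibration comes in. A self-contained sketch of the 'minmax' idea (illustrative only, not the ModelOpt implementation):

```python
def minmax_activation_scale(calibration_batches):
    """Sketch of 'minmax' calibration: track the observed activation range
    over calibration data and derive a symmetric int8 scale from it."""
    lo = min(min(batch) for batch in calibration_batches)
    hi = max(max(batch) for batch in calibration_batches)
    return max(abs(lo), abs(hi)) / 127.0


# Two batches of (flattened) activations observed during calibration
scale = minmax_activation_scale([[-2.0, 3.0], [1.5, -0.5]])
print(scale)  # 3.0 / 127 ≈ 0.0236
```

An 'entropy' calibrator would instead choose the range that minimizes information loss between the float and quantized distributions, which is more robust to outliers than the raw min/max.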
7.3. Configuration schema (selected fields)#
- backend: 'torchao' | 'modelopt'.
- mode: 'weight_only_ptq' | 'static_ptq'.
- algorithm: 'minmax' | 'entropy' (ModelOpt).
- default_layer_dtype: 'int8' | 'fp8_e4m3fn' | 'fp8_e5m2' | 'native'.
- default_activation_dtype: same domain (ModelOpt).
- layers[*].module_name: string pattern (qualified name or class name).
- layers[*].weights.dtype: same domain as default_layer_dtype.
- layers[*].activations.dtype: same domain as default_layer_dtype (ModelOpt).
- skip_names: list of patterns to exclude.
- model_path: path to the trained checkpoint.
- results_dir: output directory.
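Assembled into one file, the fields above might look like the following. The values are illustrative examples, not defaults:

```yaml
# Example quantization config (values illustrative)
backend: modelopt
mode: static_ptq
algorithm: minmax
default_layer_dtype: int8
default_activation_dtype: int8
layers:
  - module_name: "backbone.*"        # per-layer override by pattern
    weights:
      dtype: fp8_e4m3fn
    activations:
      dtype: fp8_e4m3fn
skip_names:
  - "*.head"                         # leave these modules unquantized
model_path: /path/to/trained_checkpoint.pth
results_dir: /path/to/results
```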
7.4. Pattern matching rules#
- Wildcards * and ? are supported.
- Patterns are matched against the qualified module name first, falling back to the class name.
- Later layer entries override earlier ones; skip_names removes matches.
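The rules above can be sketched with Python's fnmatch, which supports exactly the * and ? wildcards. The helper name and entry shapes are hypothetical; only the matching semantics follow the description:

```python
import fnmatch


def resolve_weight_dtype(qualified_name, class_name, layer_entries, skip_names,
                         default="native"):
    """Sketch of the matching rules: try the qualified module name, fall back
    to the class name; later entries override; skip_names excludes the module."""
    def matches(pattern):
        return (fnmatch.fnmatch(qualified_name, pattern)
                or fnmatch.fnmatch(class_name, pattern))

    # skip_names removes matches entirely
    if any(matches(p) for p in skip_names):
        return default

    dtype = default
    for entry in layer_entries:  # later entries override earlier ones
        if matches(entry["module_name"]):
            dtype = entry["weights"]["dtype"]
    return dtype


layers = [
    {"module_name": "backbone.*", "weights": {"dtype": "int8"}},
    {"module_name": "backbone.stem.*", "weights": {"dtype": "fp8_e4m3fn"}},
]
print(resolve_weight_dtype("backbone.stem.conv1", "Conv2d", layers, []))
# → 'fp8_e4m3fn' (the more specific, later entry wins)
```

Note that the "later overrides earlier" rule means specificity is determined by entry order, not by pattern length.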
7.5. Error handling#
- Unsupported dtypes raise a clear error listing the valid options.
- Backends validate the supported mode and error on a mismatch.
- Missing or wrong-typed configuration fields raise type or validation errors during prepare and quantize.
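A sketch of the dtype-validation behavior described above. The helper and its message format are hypothetical; only the contract (reject unknown dtypes, list the valid options) comes from the source:

```python
# Valid dtype domain from the configuration schema above
VALID_DTYPES = {"int8", "fp8_e4m3fn", "fp8_e5m2", "native"}


def validate_dtype(dtype):
    """Raise a clear error for unsupported dtypes, listing valid options."""
    if dtype not in VALID_DTYPES:
        raise ValueError(
            f"Unsupported dtype {dtype!r}; valid options: {sorted(VALID_DTYPES)}"
        )
    return dtype


try:
    validate_dtype("int4")
except ValueError as err:
    print(err)
```

Failing fast with the full list of valid options makes config typos (e.g. "int4" for "int8") immediately diagnosable rather than surfacing later as a backend mismatch.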