6. ModelOpt backend (static PTQ)#

6.1. Overview#

  • Static post-training quantization (PTQ) with an optional calibration loop over representative data (sketched below).

  • Quantizes both weights and activations (INT8/FP8).

  • Calibration algorithm selection via the algorithm option (see below).
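
A minimal sketch of the flow this backend wraps, assuming the nvidia-modelopt package (modelopt.torch.quantization); model and calib_loader are placeholders, and INT8_DEFAULT_CFG stands in for whatever configuration the options below resolve to:

import modelopt.torch.quantization as mtq
import torch

def forward_loop(model):
    # Run representative batches so ModelOpt can observe activation
    # ranges during calibration (batch formats are covered in 6.3).
    model.eval()
    with torch.no_grad():
        for batch in calib_loader:
            model(batch)

# Quantize weights and activations, calibrating with the loop above.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)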

6.2. Supported options#

  • mode: static_ptq.

  • algorithm: minmax (computes ranges from observed minimum/maximum values) or entropy (histogram-based entropy calibration). Defaults to minmax if unset.

  • weights.dtype: int8, fp8_e4m3fn, fp8_e5m2, or native (keep the original precision).

  • activations.dtype: int8, fp8_e4m3fn, fp8_e5m2, or native (keep the original precision).

  • default_layer_dtype / default_activation_dtype: currently ignored by this backend; specify dtypes per layer.

  • skip_names: module names to exclude from quantization (see the sketch after this list).
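
For illustration, exclusion in ModelOpt's own API is expressed by disabling quantizers for name patterns in the config dict. A sketch of the direct equivalent (not necessarily how the integration implements skip_names; the wildcard pattern is a placeholder):

import copy
import modelopt.torch.quantization as mtq

cfg = copy.deepcopy(mtq.INT8_DEFAULT_CFG)
# Disable all quantizers for modules whose names match the pattern,
# mirroring the effect of skip_names in the TAO config.
cfg["quant_cfg"]["*backbone.layer1*"] = {"enable": False}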

6.3. Calibration#

  • Provide a DataLoader via TAO’s evaluation configurations; the integration builds a forward loop and runs it during quantization.

  • Batches may be plain tensors, tuples (the first element is taken as the model input), or dicts with an input key, as sketched below.
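
A minimal sketch of the batch handling described above (dataloader and device are placeholders; the actual loop is built by the integration):

import torch

@torch.no_grad()
def forward_loop(model):
    model.eval()
    for batch in dataloader:
        if isinstance(batch, torch.Tensor):
            inputs = batch                  # plain tensor batch
        elif isinstance(batch, (tuple, list)):
            inputs = batch[0]               # first element is the input
        elif isinstance(batch, dict):
            inputs = batch["input"]         # dicts must carry an "input" key
        else:
            raise TypeError(f"Unsupported batch type: {type(batch)}")
        model(inputs.to(device))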

6.4. Example config#

quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
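  # NOTE: the two defaults below are currently ignored by this backend
  # (see 6.2); the per-layer dtypes under "layers" are what take effect.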
  default_layer_dtype: "int8"
  default_activation_dtype: "int8"
  layers:
    - module_name: "Conv2d"
      weights: { dtype: "int8" }
      activations: { dtype: "int8" }
    - module_name: "Linear"
      weights: { dtype: "int8" }
      activations: { dtype: "int8" }

6.5. Outputs#

  • The quantized model is saved in results_dir as quantized_model_modelopt.pth, a structured checkpoint; the model state dict is stored under the model_state_dict key (see the loading sketch below).
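
A minimal loading sketch (the path and the build_model helper are placeholders; depending on the ModelOpt version, restoring quantizer state may require preparing the model with ModelOpt before loading):

import torch

ckpt = torch.load("results_dir/quantized_model_modelopt.pth", map_location="cpu")
model = build_model()  # hypothetical helper: rebuild the same architecture
# Calibrated quantizer buffers (e.g., scales) may not exist on a vanilla
# model, so this sketch loads non-strictly.
model.load_state_dict(ckpt["model_state_dict"], strict=False)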

6.6. Notes#

  • In the PyTorch runtime, ModelOpt inserts fake-quant operations that simulate low-precision arithmetic, so in-framework speedups may be limited; the exported checkpoint, however, includes the calibrated scales.
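
A quick sanity check that fake-quant modules were actually inserted (model is a placeholder; exact class names vary across ModelOpt versions, so this matches loosely):

for name, module in model.named_modules():
    if "quantizer" in type(module).__name__.lower():
        print(name, "->", type(module).__name__)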