2. Getting started

This guide helps you run post-training quantization (PTQ) quickly with default settings.

2.1. Prerequisites

  • A trained model checkpoint (for example, a .pth file)

  • TAO Toolkit (PyTorch) installed

2.2. Minimal specification snippets

TorchAO (weight-only PTQ):

quantize:
  model_path: "/path/to/model.pth"
  results_dir: "/path/to/quantized_output"
  backend: "torchao"
  mode: "weight_only_ptq"
  default_layer_dtype: "int8"           # currently ignored by backends; set per-layer
  default_activation_dtype: "native"   # ignored by torchao
  layers:
    - module_name: "Linear"
      weights: { dtype: "int8" }

ModelOpt (static PTQ):

quantize:
  model_path: "/path/to/model.pth"
  results_dir: "/path/to/quantized_output"
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
  default_layer_dtype: "int8"           # currently ignored by backends; set per-layer
  default_activation_dtype: "int8"      # currently ignored by backends; set per-layer
  layers:
    - module_name: "Linear"
      weights: { dtype: "int8" }
      activations: { dtype: "int8" }

2.3. Run quantization

  • Classification: tao classification_pyt quantize -e <specification.yaml>

  • RT-DETR: tao rtdetr quantize -e <specification.yaml>

2.4. Use the quantized checkpoint

In your evaluation or inference specification, set evaluate.is_quantized: true or inference.is_quantized: true, and point the checkpoint path to the artifact produced in results_dir (for example, quantized_model_torchao.pth or quantized_model_modelopt.pth). For ModelOpt checkpoints, the loader expects the weights under the model_state_dict key. A minimal sketch follows.
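
For example, a minimal evaluation specification might look like the sketch below. The evaluate.checkpoint key is an assumption used here for illustration; confirm the exact key name in the specification reference for your task.

evaluate:
  is_quantized: true     # tell the loader to expect a quantized checkpoint
  # "checkpoint" is an assumed key name; point it at the artifact written to results_dir
  checkpoint: "/path/to/quantized_output/quantized_model_torchao.pth"

An inference specification follows the same pattern, with inference.is_quantized: true pointing at the same artifact.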