2. Getting started#
This guide helps you run post-training quantization (PTQ) quickly with default settings.
2.1. Prerequisites#
- A trained checkpoint
- TAO Toolkit (PyTorch) installed
2.2. Minimal specification snippets#
TorchAO (weight-only PTQ):
quantize:
  model_path: "/path/to/model.pth"
  results_dir: "/path/to/quantized_output"
  backend: "torchao"
  mode: "weight_only_ptq"
  default_layer_dtype: "int8"         # currently ignored by backends; set per-layer
  default_activation_dtype: "native"  # ignored by torchao
  layers:
    - module_name: "Linear"
      weights: { dtype: "int8" }
ModelOpt (static PTQ):
quantize:
  model_path: "/path/to/model.pth"
  results_dir: "/path/to/quantized_output"
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
  default_layer_dtype: "int8"       # currently ignored by backends; set per-layer
  default_activation_dtype: "int8"  # currently ignored by backends; set per-layer
  layers:
    - module_name: "Linear"
      weights: { dtype: "int8" }
      activations: { dtype: "int8" }
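The two modes differ in scope: weight-only PTQ quantizes only the weight tensors and leaves activations in their native dtype, while static PTQ also quantizes activations, using ranges collected by the selected calibration algorithm (here, minmax).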
2.3. Run quantization#
Classification:
tao classification_pyt quantize -e <specification.yaml>
RT-DETR:
tao rtdetr quantize -e <specification.yaml>
2.4. Use the quantized checkpoint#
Set evaluate.is_quantized: true or inference.is_quantized: true, and point the checkpoint to the artifact produced in results_dir (e.g., quantized_model_torchao.pth or quantized_model_modelopt.pth). For ModelOpt checkpoints, the loader expects the model state under the model_state_dict key.
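For example, a minimal evaluation spec might look like the sketch below. The checkpoint field name is an assumption for illustration; consult the task's evaluate schema for the exact key.

evaluate:
  is_quantized: true
  checkpoint: "/path/to/quantized_output/quantized_model_modelopt.pth"  # assumed key; points at the artifact in results_dir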