3. Configuration

Quantization is configured under the `quantize` section of your experiment specification. Dtype strings may be written as `fp8_*` or `float8_*` (the aliases map to the same values in backends that accept FP8).
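For example, these two layer entries are interchangeable spellings of the same FP8 dtype (illustrative snippet; the surrounding fields are placeholders):

```yaml
layers:
  - module_name: "Linear"
    weights: { dtype: "fp8_e4m3fn" }    # canonical spelling
  - module_name: "Linear"
    weights: { dtype: "float8_e4m3fn" } # float8_* alias, same value
```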
3.1. Top-level fields

- `backend`: `torchao` or `modelopt`.
- `mode`: `weight_only_ptq` (TorchAO) or `static_ptq` (ModelOpt).
- `algorithm`: calibration/optimisation algorithm for ModelOpt. Valid: `minmax`, `entropy` (see the backend documentation).
- `default_layer_dtype`: ignored by current backends; set `weights.dtype` per layer instead. Valid values remain: `int8`, `fp8_e4m3fn`, `fp8_e5m2`, `native`.
- `default_activation_dtype`: ignored by current backends; set `activations.dtype` per layer instead (ModelOpt only). Same valid options as above.
- `layers`: list of layer-wise configurations.
- `skip_names`: list of module name patterns to exclude from quantization.
- `model_path`: trained checkpoint path to quantize.
- `results_dir`: directory for quantized artifacts.
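A minimal skeleton showing all of these fields together (the paths are placeholders):

```yaml
quantize:
  backend: "torchao"             # or "modelopt"
  mode: "weight_only_ptq"        # "static_ptq" with the ModelOpt backend
  algorithm: "minmax"            # used by ModelOpt only
  default_layer_dtype: "native"
  default_activation_dtype: "native"
  layers: []                     # per-layer entries go here
  skip_names: []
  model_path: "/path/to/checkpoint"
  results_dir: "/path/to/results"
```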
3.2. Schema reference (auto-generated)

4. ModelQuantizationConfig Fields

| Field | value_type | description | default_value | valid_options |
|---|---|---|---|---|
| `backend` | categorical | The quantization backend to use | `torchao` | `modelopt`, `torchao` |
| `mode` | categorical | The quantization mode to use | `weight_only_ptq` | `static_ptq`, `weight_only_ptq` |
| `algorithm` | categorical | Calibration or optimisation algorithm name to pass to the backend configuration. For the 'modelopt' backend, this becomes the top-level 'algorithm' field | `minmax` | `minmax`, `entropy` |
| `default_layer_dtype` | categorical | Default data type for layers (currently ignored by backends; specify dtype per layer) | `native` | `int8`, `fp8_e4m3fn`, `fp8_e5m2`, `native` |
| `default_activation_dtype` | categorical | Default data type for activations (currently ignored by backends; specify dtype per layer) | `native` | `int8`, `fp8_e4m3fn`, `fp8_e5m2`, `native` |
| `layers` | list | List of per-module quantization configurations | `[]` | |
| `skip_names` | list | List of module or layer names or patterns to exclude from quantization | `[]` | |
| `model_path` | string | Path to the model to be quantized | | |
| `results_dir` | string | Path to where all the assets generated from a task are stored | | |
4.1. Layer entries

Each item accepts:

- `module_name`: qualified name or wildcard pattern; also matches module types (e.g., `Linear`, `Conv2d`).
- `weights`: `{ dtype: <int8|fp8_e4m3fn|fp8_e5m2|native> }`.
- `activations`: `{ dtype: <...> }` (ModelOpt only; ignored by TorchAO).
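Put together, a complete entry looks like this (sketch; the qualified module name is a placeholder):

```yaml
layers:
  - module_name: "encoder.layer1"   # hypothetical qualified module name
    weights: { dtype: "int8" }
    activations: { dtype: "int8" }  # ModelOpt only; TorchAO ignores this
```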
4.2. Pattern matching

Patterns are matched first against the qualified module name in the graph, then against the module class name. The wildcards `*` and `?` are supported.
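For instance (illustrative patterns; the qualified names depend on your model's graph):

```yaml
layers:
  - module_name: "backbone.*"   # every module whose qualified name starts with "backbone."
    weights: { dtype: "int8" }
  - module_name: "Conv?d"       # class-name match: Conv1d, Conv2d, Conv3d
    weights: { dtype: "int8" }
```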
4.3. Examples

Weight-only INT8 for all Linear layers, skipping the classifier head:

```yaml
quantize:
  backend: "torchao"
  mode: "weight_only_ptq"
  default_layer_dtype: "int8"
  default_activation_dtype: "native"
  layers:
    - module_name: "Linear"
      weights: { dtype: "int8" }
  skip_names: ["classifier.fc"]
```
Static PTQ INT8 for conv/linear layers with INT8 activations:

```yaml
quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
  default_layer_dtype: "int8"
  default_activation_dtype: "int8"
  layers:
    - module_name: "Conv2d"
      weights: { dtype: "int8" }
      activations: { dtype: "int8" }
    - module_name: "Linear"
      weights: { dtype: "int8" }
      activations: { dtype: "int8" }
```
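An FP8 weight-only variant follows the same shape, swapping the dtype (sketch; assumes your backend and hardware accept FP8):

```yaml
quantize:
  backend: "torchao"
  mode: "weight_only_ptq"
  layers:
    - module_name: "Linear"
      weights: { dtype: "fp8_e4m3fn" }   # or the "float8_e4m3fn" alias
  skip_names: ["classifier.fc"]
```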
4.4. Task-specific notes

- Classification: no extra dataset fields are needed beyond your usual evaluation/inference configuration.
- RT-DETR + ModelOpt: ensure a representative validation set is configured for calibration; the evaluation inputs are reused.