3. Configuration

Quantization is configured under the quantize section of your experiment specification. Dtype strings may be written as either fp8_* or float8_*; the two forms are aliases that map to the same values in backends that accept FP8.
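
For example, the two spellings are interchangeable wherever FP8 is accepted, as in this fragment of a layer entry:

weights: { dtype: "fp8_e4m3fn" }
# weights: { dtype: "float8_e4m3fn" }   # equivalent alias of the line above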

3.1. Top-level fields

  • backend: torchao or modelopt.

  • mode: weight_only_ptq (TorchAO) or static_ptq (ModelOpt).

  • algorithm: calibration/optimization algorithm for ModelOpt. Valid: minmax, entropy (see the backend documentation).

  • default_layer_dtype: ignored by current backends; set weights.dtype per layer. Valid values remain: int8, fp8_e4m3fn, fp8_e5m2, native.

  • default_activation_dtype: ignored by current backends; set activations.dtype per layer (ModelOpt only). Same valid options as above.

  • layers: list of layer-wise configurations.

  • skip_names: list of module name patterns to exclude from quantization.

  • model_path: trained checkpoint path to quantize.

  • results_dir: directory for quantized artifacts.
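
Putting these fields together, a minimal skeleton looks like the following (a sketch; the checkpoint and output paths are placeholders):

quantize:
  backend: "torchao"                  # or "modelopt"
  mode: "weight_only_ptq"             # "static_ptq" with the modelopt backend
  algorithm: "minmax"                 # used by modelopt only
  default_layer_dtype: "native"       # currently ignored; set dtype per layer
  default_activation_dtype: "native"  # currently ignored; set dtype per layer
  layers: []                          # per-module configurations go here
  skip_names: []                      # patterns to exclude from quantization
  model_path: "/path/to/checkpoint"
  results_dir: "/path/to/results"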

3.2. Schema reference (auto-generated)

4. ModelQuantizationConfig Fields

Field                    | value_type  | description                                                                              | default_value   | valid_options
backend                  | categorical | The quantization backend to use                                                          | torchao         | modelopt, torchao
mode                     | categorical | The quantization mode to use                                                             | weight_only_ptq | static_ptq, weight_only_ptq
algorithm                | categorical | Calibration or optimization algorithm name to pass to the backend configuration; for the ‘modelopt’ backend, this becomes the top-level ‘algorithm’ field | minmax | minmax, entropy
default_layer_dtype      | categorical | Default data type for layers (currently ignored by backends; specify dtype per layer)   | native          | int8, fp8_e4m3fn, fp8_e5m2, native
default_activation_dtype | categorical | Default data type for activations (currently ignored by backends; specify dtype per layer) | native       | int8, fp8_e4m3fn, fp8_e5m2, native
layers                   | list        | List of per-module quantization configurations                                           | []              |
skip_names               | list        | List of module or layer names or patterns to exclude from quantization                  | []              |
model_path               | string      | Path to the model to be quantized                                                        |                 |
results_dir              | string      | Path to where all the assets generated from a task are stored                           |                 |

4.1. Layer entries

Each item accepts:

  • module_name: Qualified name or wildcard pattern; also matches module types (e.g., Linear, Conv2d).

  • weights: { dtype: <int8|fp8_e4m3fn|fp8_e5m2|native> }.

  • activations: { dtype: <...> } (ModelOpt only; ignored by TorchAO).
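
A complete entry combining both fields might look like this (dtypes chosen for illustration; the activations block takes effect only with the modelopt backend):

layers:
  - module_name: "Linear"
    weights: { dtype: "fp8_e4m3fn" }
    activations: { dtype: "fp8_e5m2" }   # ignored by TorchAO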

4.2. Pattern matching

Patterns are matched first against the qualified module name in the graph, then against the module class name. Wildcards * and ? are supported.
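
For example (the qualified module paths here are hypothetical; adjust them to your model's graph):

layers:
  - module_name: "backbone.stage*.conv?"  # wildcard match on qualified names
    weights: { dtype: "int8" }
  - module_name: "Linear"                 # no qualified name matches, so this matches the class name
    weights: { dtype: "int8" }
skip_names: ["head.*"]                    # exclude modules under "head"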

4.3. Examples

Weight-only INT8 for all Linear layers, skipping the classifier head:

quantize:
  backend: "torchao"
  mode: "weight_only_ptq"
  default_layer_dtype: "int8"
  default_activation_dtype: "native"
  layers:
    - module_name: "Linear"
      weights: { dtype: "int8" }
  skip_names: ["classifier.fc"]

Static PTQ INT8 for conv/linear with INT8 activations:

quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
  default_layer_dtype: "int8"
  default_activation_dtype: "int8"
  layers:
    - module_name: "Conv2d"
      weights: { dtype: "int8" }
      activations: { dtype: "int8" }
    - module_name: "Linear"
      weights: { dtype: "int8" }
      activations: { dtype: "int8" }
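
Weight-only FP8 for Linear layers is configured the same way (a sketch; it assumes your hardware and TorchAO build accept FP8 dtypes):

quantize:
  backend: "torchao"
  mode: "weight_only_ptq"
  layers:
    - module_name: "Linear"
      weights: { dtype: "fp8_e4m3fn" }   # or the "float8_e4m3fn" alias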

4.4. Task-specific notes

  • Classification: No extra dataset fields are needed beyond your usual evaluation/inference configurations.

  • RT-DETR + ModelOpt: Ensure you have a representative validation set configured for calibration; evaluation inputs are reused.