3. Configuration

Quantization is configured under the `quantize` section of your experiment specification. Dtype strings may be written as `fp8_*` or `float8_*` (the aliases map to the same values in backends that accept FP8).
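For example, these two layer entries are interchangeable spellings of the same FP8 dtype (illustrative snippet; the surrounding fields are placeholders):

```yaml
layers:
  - module_name: "Linear"
    weights: { dtype: "fp8_e4m3fn" }    # canonical spelling
  - module_name: "Linear"
    weights: { dtype: "float8_e4m3fn" } # float8_* alias, same value
```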
3.1. Top-level fields

- `backend`: `torchao` or `modelopt`.
- `mode`: `weight_only_ptq` (TorchAO) or `static_ptq` (ModelOpt).
- `algorithm`: calibration/optimisation algorithm for ModelOpt. Valid: `minmax`, `entropy` (see the backend documentation).
- `default_layer_dtype`: ignored by current backends; set `weights.dtype` per layer instead. Valid values remain: `int8`, `fp8_e4m3fn`, `fp8_e5m2`, `native`.
- `default_activation_dtype`: ignored by current backends; set `activations.dtype` per layer instead (ModelOpt only). Same valid options as above.
- `layers`: list of layer-wise configurations.
- `skip_names`: list of module name patterns to exclude from quantization.
- `model_path`: trained checkpoint path to quantize.
- `results_dir`: directory for quantized artifacts.
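A minimal skeleton showing all of these fields together (the paths are placeholders):

```yaml
quantize:
  backend: "torchao"             # or "modelopt"
  mode: "weight_only_ptq"        # "static_ptq" with the ModelOpt backend
  algorithm: "minmax"            # used by ModelOpt only
  default_layer_dtype: "native"
  default_activation_dtype: "native"
  layers: []                     # per-layer entries go here
  skip_names: []
  model_path: "/path/to/checkpoint"
  results_dir: "/path/to/results"
```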
3.2. Schema reference (auto-generated)

4. ModelQuantizationConfig Fields

| Field | value_type | description | default_value | valid_options |
|---|---|---|---|---|
| `backend` | categorical | The quantization backend to use | `torchao` | `modelopt`, `torchao` |
| `mode` | categorical | The quantization mode to use | `weight_only_ptq` | `static_ptq`, `weight_only_ptq` |
| `algorithm` | categorical | Calibration or optimisation algorithm name to pass to the backend configuration. For the 'modelopt' backend, this becomes the top-level 'algorithm' field | `minmax` | `minmax`, `entropy` |
| `default_layer_dtype` | categorical | Default data type for layers (currently ignored by backends; specify dtype per layer) | `native` | `int8`, `fp8_e4m3fn`, `fp8_e5m2`, `native` |
| `default_activation_dtype` | categorical | Default data type for activations (currently ignored by backends; specify dtype per layer) | `native` | `int8`, `fp8_e4m3fn`, `fp8_e5m2`, `native` |
| `layers` | list | List of per-module quantization configurations | `[]` | |
| `skip_names` | list | List of module or layer names or patterns to exclude from quantization | `[]` | |
| `model_path` | string | Path to the model to be quantized | | |
| `results_dir` | string | Path to where all the assets generated from a task are stored | | |
4.1. Layer entries

Each item accepts:

- `module_name`: qualified name or wildcard pattern; also matches module types (e.g., `Linear`, `Conv2d`).
- `weights`: `{ dtype: <int8|fp8_e4m3fn|fp8_e5m2|native> }`.
- `activations`: `{ dtype: <...> }` (ModelOpt only; ignored by TorchAO).
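Put together, a complete entry looks like this (sketch; the qualified module name is a placeholder):

```yaml
layers:
  - module_name: "encoder.layer1"   # hypothetical qualified module name
    weights: { dtype: "int8" }
    activations: { dtype: "int8" }  # ModelOpt only; TorchAO ignores this
```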
4.2. Pattern matching

Patterns are matched first against the qualified module name in the graph, then against the module class name. The wildcards `*` and `?` are supported.
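For instance (illustrative patterns; the qualified names depend on your model's graph):

```yaml
layers:
  - module_name: "backbone.*"   # every module whose qualified name starts with "backbone."
    weights: { dtype: "int8" }
  - module_name: "Conv?d"       # class-name match: Conv1d, Conv2d, Conv3d
    weights: { dtype: "int8" }
```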
4.3. Examples

Weight-only INT8 for all Linear layers, skipping the classifier head:

```yaml
quantize:
  backend: "torchao"
  mode: "weight_only_ptq"
  default_layer_dtype: "int8"
  default_activation_dtype: "native"
  layers:
    - module_name: "Linear"
      weights: { dtype: "int8" }
  skip_names: ["classifier.fc"]
```
Static PTQ INT8 for conv/linear layers with INT8 activations:

```yaml
quantize:
  backend: "modelopt"
  mode: "static_ptq"
  algorithm: "minmax"
  default_layer_dtype: "int8"
  default_activation_dtype: "int8"
  layers:
    - module_name: "Conv2d"
      weights: { dtype: "int8" }
      activations: { dtype: "int8" }
    - module_name: "Linear"
      weights: { dtype: "int8" }
      activations: { dtype: "int8" }
```
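An FP8 weight-only variant follows the same shape, swapping the dtype (sketch; assumes your backend and hardware accept FP8):

```yaml
quantize:
  backend: "torchao"
  mode: "weight_only_ptq"
  layers:
    - module_name: "Linear"
      weights: { dtype: "fp8_e4m3fn" }   # or the "float8_e4m3fn" alias
  skip_names: ["classifier.fc"]
```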
4.4. Task-specific notes

- Classification: no extra dataset fields are needed beyond your usual evaluation/inference configuration.
- RT-DETR + ModelOpt: ensure a representative validation set is configured for calibration; the evaluation inputs are reused.