## ModelQuantizationConfig Fields

| Field | value_type | description | default_value | valid_options |
|---|---|---|---|---|
| | categorical | The quantization backend to use | torchao | modelopt, torchao |
| | categorical | The quantization mode to use | weight_only_ptq | static_ptq, weight_only_ptq |
| | categorical | Calibration or optimization algorithm name to pass to the backend configuration. For the `modelopt` backend, this becomes the top-level `algorithm` field | minmax | minmax, entropy |
| | categorical | Default data type for layers (currently ignored by backends; specify dtype per layer) | native | int8, fp8_e4m3fn, fp8_e5m2, native |
| | categorical | Default data type for activations (currently ignored by backends; specify dtype per layer) | native | int8, fp8_e4m3fn, fp8_e5m2, native |
| | list | List of per-module quantization configurations | [] | |
| | list | List of module or layer names or patterns to exclude from quantization | [] | |
| | string | Path to the model to be quantized | | |
| | string | Path to where all the assets generated from a task are stored | | |
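The fields above can be pictured as a plain configuration mapping plus a small validator for the categorical options. This is only an illustrative sketch: the table does not give the actual field names, so every key below (`backend`, `mode`, `algorithm`, and so on) is a hypothetical placeholder; only the default values and valid options come from the table.

```python
# Illustrative sketch of a quantization config as a plain dict.
# All key names are hypothetical -- the documented table omits them;
# the values are the documented defaults.
quantization_config = {
    "backend": "torchao",            # hypothetical key; valid: modelopt, torchao
    "mode": "weight_only_ptq",       # hypothetical key; valid: static_ptq, weight_only_ptq
    "algorithm": "minmax",           # hypothetical key; valid: minmax, entropy
    "layers_dtype": "native",        # hypothetical key; currently ignored by backends
    "activations_dtype": "native",   # hypothetical key; currently ignored by backends
    "per_module_configs": [],        # hypothetical key; per-module quantization configs
    "exclude": [],                   # hypothetical key; names/patterns to skip
    "model_path": "",                # hypothetical key; path to the model to quantize
    "output_path": "",               # hypothetical key; where task assets are stored
}

# Valid options for the categorical fields, taken from the table.
VALID_OPTIONS = {
    "backend": {"modelopt", "torchao"},
    "mode": {"static_ptq", "weight_only_ptq"},
    "algorithm": {"minmax", "entropy"},
    "layers_dtype": {"int8", "fp8_e4m3fn", "fp8_e5m2", "native"},
    "activations_dtype": {"int8", "fp8_e4m3fn", "fp8_e5m2", "native"},
}

def validate(cfg: dict) -> list[str]:
    """Return one error string per categorical field whose value is invalid."""
    return [
        f"{key}={cfg.get(key)!r} not in {sorted(allowed)}"
        for key, allowed in VALID_OPTIONS.items()
        if cfg.get(key) not in allowed
    ]

print(validate(quantization_config))  # -> []
```

With the defaults shown, `validate` returns an empty list; an out-of-range value such as `"backend": "gguf"` would produce one error entry.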