nemo_automodel.components.quantization.qat#

TorchAO Quantization-Aware Training (QAT) helpers for NeMo-AutoModel.

This module provides:

  • QATConfig: configuration class for QAT settings

  • Thin wrappers that instantiate and apply torchao QAT quantizers to models (prepare)

  • Helpers that toggle fake-quant on/off during training (for delayed fake-quant)

Module Contents#

Classes#

QATConfig

Configuration for Quantization-Aware Training (QAT).

Functions#

get_quantizer_mode

Return a short mode string for a known torchao QAT quantizer.

get_disable_fake_quant_fn

Return the disable fake-quant function for a given quantizer mode.

get_enable_fake_quant_fn

Return the enable fake-quant function for a given quantizer mode.

prepare_qat_model

Apply a torchao QAT quantizer to the given model.

Data#

API#

nemo_automodel.components.quantization.qat.logger#

'getLogger(...)'

class nemo_automodel.components.quantization.qat.QATConfig(
quantizer_type: Literal['int8_dynact_int4weight', 'int4_weight_only'] = 'int8_dynact_int4weight',
**quantizer_kwargs,
)#

Configuration for Quantization-Aware Training (QAT).

This config controls how QAT quantizers are instantiated and applied to models. QAT is enabled when this config is provided to from_pretrained/from_config.

.. attribute:: quantizer_type

Type of QAT quantizer to use.

  • "int8_dynact_int4weight": Int8 dynamic activation with Int4 weight quantization, using Int8DynActInt4WeightQATQuantizer from torchao. Good balance of accuracy and inference speed.

  • "int4_weight_only": Int4 weight-only quantization, using Int4WeightOnlyQATQuantizer from torchao. More aggressive compression; may have slightly lower accuracy.

Type:

Literal["int8_dynact_int4weight", "int4_weight_only"]

.. attribute:: **quantizer_kwargs

Additional keyword arguments forwarded directly to the torchao quantizer constructor (e.g. groupsize, padding_allowed, inner_k_tiles).

Initialization

quantizer_type: Literal['int8_dynact_int4weight', 'int4_weight_only']#

'int8_dynact_int4weight'

to_dict() → Dict[str, Any]#

Convert the config to a dictionary.

create_quantizer()#

Create and return the appropriate QAT quantizer based on config.

Returns:

A torchao QAT quantizer instance (Int8DynActInt4WeightQATQuantizer or Int4WeightOnlyQATQuantizer).

Raises:

ValueError – If quantizer_type is not recognized.
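As a rough illustration, the dispatch create_quantizer performs can be sketched as follows. The quantizer classes below are stand-ins for torchao's Int8DynActInt4WeightQATQuantizer and Int4WeightOnlyQATQuantizer, and the exact to_dict shape is an assumption, not the module's implementation.

```python
from typing import Any, Dict

class Int8DynActInt4WeightQATQuantizerStub:
    """Stand-in for torchao's Int8DynActInt4WeightQATQuantizer."""
    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

class Int4WeightOnlyQATQuantizerStub:
    """Stand-in for torchao's Int4WeightOnlyQATQuantizer."""
    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

# Maps the documented quantizer_type strings to quantizer classes.
_QUANTIZER_CLASSES = {
    "int8_dynact_int4weight": Int8DynActInt4WeightQATQuantizerStub,
    "int4_weight_only": Int4WeightOnlyQATQuantizerStub,
}

class QATConfigSketch:
    """Sketch of QATConfig's documented surface (to_dict, create_quantizer)."""

    def __init__(self, quantizer_type: str = "int8_dynact_int4weight", **quantizer_kwargs: Any) -> None:
        self.quantizer_type = quantizer_type
        self.quantizer_kwargs = quantizer_kwargs

    def to_dict(self) -> Dict[str, Any]:
        # Assumed shape: the type plus any forwarded kwargs.
        return {"quantizer_type": self.quantizer_type, **self.quantizer_kwargs}

    def create_quantizer(self):
        # Unknown quantizer_type raises ValueError, per the documented contract.
        try:
            cls = _QUANTIZER_CLASSES[self.quantizer_type]
        except KeyError:
            raise ValueError(f"Unknown quantizer_type: {self.quantizer_type!r}")
        # Extra kwargs (e.g. groupsize) are forwarded to the quantizer constructor.
        return cls(**self.quantizer_kwargs)

cfg = QATConfigSketch(quantizer_type="int4_weight_only", groupsize=128)
quantizer = cfg.create_quantizer()
```

With the real class, the same call pattern applies: construct the config, then call create_quantizer() to obtain a torchao quantizer instance.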

nemo_automodel.components.quantization.qat._QUANTIZER_TO_MODE#

None

nemo_automodel.components.quantization.qat._DISABLE_FN_BY_MODE#

None

nemo_automodel.components.quantization.qat._ENABLE_FN_BY_MODE#

None

nemo_automodel.components.quantization.qat.get_quantizer_mode(quantizer: object) → Optional[str]#

Return a short mode string for a known torchao QAT quantizer.

Returns None when the quantizer is unrecognized.

nemo_automodel.components.quantization.qat.get_disable_fake_quant_fn(
mode: str,
) → Optional[Callable]#

Return the disable fake-quant function for a given quantizer mode.

nemo_automodel.components.quantization.qat.get_enable_fake_quant_fn(
mode: str,
) → Optional[Callable]#

Return the enable fake-quant function for a given quantizer mode.
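A minimal sketch of the mode-keyed lookup these two helpers perform, assuming the module's _DISABLE_FN_BY_MODE and _ENABLE_FN_BY_MODE tables map mode strings to torchao callables. The table entries below are placeholders, not the real torchao functions.

```python
from typing import Callable, Optional

# Placeholder tables; in the real module these map mode strings to the
# torchao enable/disable fake-quant callables.
_DISABLE_FN_BY_MODE = {"int8_dynact_int4weight": lambda model: ("disabled", model)}
_ENABLE_FN_BY_MODE = {"int8_dynact_int4weight": lambda model: ("enabled", model)}

def get_disable_fake_quant_fn(mode: str) -> Optional[Callable]:
    # Unrecognized modes yield None rather than raising.
    return _DISABLE_FN_BY_MODE.get(mode)

def get_enable_fake_quant_fn(mode: str) -> Optional[Callable]:
    return _ENABLE_FN_BY_MODE.get(mode)
```

Callers should check for None before invoking the returned function, since an unrecognized mode is not an error.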

nemo_automodel.components.quantization.qat.prepare_qat_model(
model,
quantizer,
) → tuple[object, Optional[str]]#

Apply a torchao QAT quantizer to the given model.

Returns the (possibly wrapped) model and a mode string if recognized.
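The contract can be illustrated with stand-in objects: torchao QAT quantizers expose a prepare() step that swaps fake-quant modules into the model, and the helper reports a mode string only for recognized quantizer types. Everything below is a hypothetical sketch, not the module's implementation.

```python
from typing import Optional

class QuantizerStub:
    """Stand-in for a torchao QAT quantizer with a prepare() step."""
    def prepare(self, model):
        model.fake_quant_ready = True  # pretend fake-quant modules were swapped in
        return model

# Placeholder version of the module's quantizer-class-to-mode table.
_QUANTIZER_TO_MODE = {QuantizerStub: "int8_dynact_int4weight"}

def prepare_qat_model(model, quantizer):
    """Apply the quantizer and return (model, mode-or-None)."""
    model = quantizer.prepare(model)
    mode: Optional[str] = _QUANTIZER_TO_MODE.get(type(quantizer))
    return model, mode

class TinyModel:
    fake_quant_ready = False

model, mode = prepare_qat_model(TinyModel(), QuantizerStub())
```

In training code, the returned mode can then be passed to get_enable_fake_quant_fn / get_disable_fake_quant_fn to implement delayed fake-quant.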

nemo_automodel.components.quantization.qat.__all__#

['QATConfig', 'get_quantizer_mode', 'get_disable_fake_quant_fn', 'get_enable_fake_quant_fn', 'prepare_qat_model']