nemo_automodel.components.quantization.qat#

TorchAO Quantization-Aware Training (QAT) helpers for NeMo-AutoModel.

This module provides:

  • QATConfig: configuration class for QAT settings

  • Thin wrappers that instantiate and apply torchao QAT quantizers to models (prepare)

  • Helpers that toggle fake-quant on/off during training (for delayed fake-quant)

Module Contents#

Classes#

QATConfig

Configuration for Quantization-Aware Training (QAT).

Functions#

get_quantizer_mode

Return a short mode string for a known torchao QAT quantizer.

get_disable_fake_quant_fn

Return the disable fake-quant function for a given quantizer mode.

get_enable_fake_quant_fn

Return the enable fake-quant function for a given quantizer mode.

prepare_qat_model

Apply a torchao QAT quantizer to the given model.

Data#

API#

nemo_automodel.components.quantization.qat.logger#

'getLogger(...)'

class nemo_automodel.components.quantization.qat.QATConfig(
quantizer_type: Literal['int8_dynact_int4weight', 'int4_weight_only'] = 'int8_dynact_int4weight',
**quantizer_kwargs,
)#

Configuration for Quantization-Aware Training (QAT).

This config controls how QAT quantizers are instantiated and applied to models. QAT is enabled when this config is provided to from_pretrained/from_config.

.. attribute:: quantizer_type

Type of QAT quantizer to use.

  • "int8_dynact_int4weight": Int8 dynamic activation with Int4 weight quantization, using Int8DynActInt4WeightQATQuantizer from torchao. Good balance of accuracy and inference speed.

  • "int4_weight_only": Int4 weight-only quantization, using Int4WeightOnlyQATQuantizer from torchao. More aggressive compression; may have slightly lower accuracy.

Type:

Literal["int8_dynact_int4weight", "int4_weight_only"]

.. attribute:: **quantizer_kwargs

Additional keyword arguments forwarded directly to the torchao quantizer constructor (e.g. groupsize, padding_allowed, inner_k_tiles).

Initialization

quantizer_type: Literal['int8_dynact_int4weight', 'int4_weight_only']#

'int8_dynact_int4weight'

to_dict() → Dict[str, Any]#

Convert the config to a dictionary.

create_quantizer()#

Create and return the appropriate QAT quantizer based on config.

Returns:

A torchao QAT quantizer instance (Int8DynActInt4WeightQATQuantizer or Int4WeightOnlyQATQuantizer).

Raises:

ValueError – If quantizer_type is not recognized.
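As a rough illustration, the dispatch create_quantizer performs can be sketched as follows. The quantizer classes below are stand-ins for torchao's Int8DynActInt4WeightQATQuantizer and Int4WeightOnlyQATQuantizer, and the exact to_dict shape is an assumption, not the module's implementation.

```python
from typing import Any, Dict

class Int8DynActInt4WeightQATQuantizerStub:
    """Stand-in for torchao's Int8DynActInt4WeightQATQuantizer."""
    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

class Int4WeightOnlyQATQuantizerStub:
    """Stand-in for torchao's Int4WeightOnlyQATQuantizer."""
    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

# Maps the documented quantizer_type strings to quantizer classes.
_QUANTIZER_CLASSES = {
    "int8_dynact_int4weight": Int8DynActInt4WeightQATQuantizerStub,
    "int4_weight_only": Int4WeightOnlyQATQuantizerStub,
}

class QATConfigSketch:
    """Sketch of QATConfig's documented surface (to_dict, create_quantizer)."""

    def __init__(self, quantizer_type: str = "int8_dynact_int4weight", **quantizer_kwargs: Any) -> None:
        self.quantizer_type = quantizer_type
        self.quantizer_kwargs = quantizer_kwargs

    def to_dict(self) -> Dict[str, Any]:
        # Assumed shape: the type plus any forwarded kwargs.
        return {"quantizer_type": self.quantizer_type, **self.quantizer_kwargs}

    def create_quantizer(self):
        # Unknown quantizer_type raises ValueError, per the documented contract.
        try:
            cls = _QUANTIZER_CLASSES[self.quantizer_type]
        except KeyError:
            raise ValueError(f"Unknown quantizer_type: {self.quantizer_type!r}")
        # Extra kwargs (e.g. groupsize) are forwarded to the quantizer constructor.
        return cls(**self.quantizer_kwargs)

cfg = QATConfigSketch(quantizer_type="int4_weight_only", groupsize=128)
quantizer = cfg.create_quantizer()
```

With the real class, the same call pattern applies: construct the config, then call create_quantizer() to obtain a torchao quantizer instance.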

nemo_automodel.components.quantization.qat._QUANTIZER_TO_MODE#

None

nemo_automodel.components.quantization.qat._DISABLE_FN_BY_MODE#

None

nemo_automodel.components.quantization.qat._ENABLE_FN_BY_MODE#

None

nemo_automodel.components.quantization.qat.get_quantizer_mode(quantizer: object) → Optional[str]#

Return a short mode string for a known torchao QAT quantizer.

Returns None when the quantizer is unrecognized.

nemo_automodel.components.quantization.qat.get_disable_fake_quant_fn(
mode: str,
) → Optional[Callable]#

Return the disable fake-quant function for a given quantizer mode.

nemo_automodel.components.quantization.qat.get_enable_fake_quant_fn(
mode: str,
) → Optional[Callable]#

Return the enable fake-quant function for a given quantizer mode.
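A minimal sketch of the mode-keyed lookup these two helpers perform, assuming the module's _DISABLE_FN_BY_MODE and _ENABLE_FN_BY_MODE tables map mode strings to torchao callables. The table entries below are placeholders, not the real torchao functions.

```python
from typing import Callable, Optional

# Placeholder tables; in the real module these map mode strings to the
# torchao enable/disable fake-quant callables.
_DISABLE_FN_BY_MODE = {"int8_dynact_int4weight": lambda model: ("disabled", model)}
_ENABLE_FN_BY_MODE = {"int8_dynact_int4weight": lambda model: ("enabled", model)}

def get_disable_fake_quant_fn(mode: str) -> Optional[Callable]:
    # Unrecognized modes yield None rather than raising.
    return _DISABLE_FN_BY_MODE.get(mode)

def get_enable_fake_quant_fn(mode: str) -> Optional[Callable]:
    return _ENABLE_FN_BY_MODE.get(mode)
```

Callers should check for None before invoking the returned function, since an unrecognized mode is not an error.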

nemo_automodel.components.quantization.qat.prepare_qat_model(
model,
quantizer,
) → tuple[object, Optional[str]]#

Apply a torchao QAT quantizer to the given model.

Returns the (possibly wrapped) model and a mode string if recognized.
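The contract can be illustrated with stand-in objects: torchao QAT quantizers expose a prepare() step that swaps fake-quant modules into the model, and the helper reports a mode string only for recognized quantizer types. Everything below is a hypothetical sketch, not the module's implementation.

```python
from typing import Optional

class QuantizerStub:
    """Stand-in for a torchao QAT quantizer with a prepare() step."""
    def prepare(self, model):
        model.fake_quant_ready = True  # pretend fake-quant modules were swapped in
        return model

# Placeholder version of the module's quantizer-class-to-mode table.
_QUANTIZER_TO_MODE = {QuantizerStub: "int8_dynact_int4weight"}

def prepare_qat_model(model, quantizer):
    """Apply the quantizer and return (model, mode-or-None)."""
    model = quantizer.prepare(model)
    mode: Optional[str] = _QUANTIZER_TO_MODE.get(type(quantizer))
    return model, mode

class TinyModel:
    fake_quant_ready = False

model, mode = prepare_qat_model(TinyModel(), QuantizerStub())
```

In training code, the returned mode can then be passed to get_enable_fake_quant_fn / get_disable_fake_quant_fn to implement delayed fake-quant.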

nemo_automodel.components.quantization.qat.__all__#

['QATConfig', 'get_quantizer_mode', 'get_disable_fake_quant_fn', 'get_enable_fake_quant_fn', 'prepare_qat_model']