nemo_automodel.components.quantization.qat#
TorchAO Quantization-Aware Training (QAT) helpers for NeMo-AutoModel.
This module provides:
- QATConfig: configuration class for QAT settings
- Thin wrappers that instantiate and apply torchao QAT quantizers to models (prepare)
- Functions to toggle fake-quant on/off during training (for delayed fake-quant)
Module Contents#
Classes#
- QATConfig: Configuration for Quantization-Aware Training (QAT).
Functions#
- get_quantizer_mode: Return a short mode string for a known torchao QAT quantizer.
- get_disable_fake_quant_fn: Return the disable fake-quant function for a given quantizer mode.
- get_enable_fake_quant_fn: Return the enable fake-quant function for a given quantizer mode.
- prepare_qat_model: Apply a torchao QAT quantizer to the given model.
Data#
API#
- nemo_automodel.components.quantization.qat.logger#
getLogger(…)
- class nemo_automodel.components.quantization.qat.QATConfig(quantizer_type: Literal['int8_dynact_int4weight', 'int4_weight_only'] = 'int8_dynact_int4weight', **quantizer_kwargs)#
Configuration for Quantization-Aware Training (QAT).
This config controls how QAT quantizers are instantiated and applied to models. QAT is enabled when this config is provided to from_pretrained/from_config.
.. attribute:: quantizer_type
Type of QAT quantizer to use.
'int8_dynact_int4weight': Int8 dynamic activation with Int4 weight quantization. Uses Int8DynActInt4WeightQATQuantizer from torchao. Good balance of accuracy and inference speed.
'int4_weight_only': Int4 weight-only quantization. Uses Int4WeightOnlyQATQuantizer from torchao. More aggressive compression; may have slightly lower accuracy.
- Type:
Literal['int8_dynact_int4weight', 'int4_weight_only']
.. attribute:: **quantizer_kwargs
Additional keyword arguments forwarded directly to the torchao quantizer constructor (e.g. groupsize, padding_allowed, inner_k_tiles).
Initialization
- quantizer_type: Literal['int8_dynact_int4weight', 'int4_weight_only']#
'int8_dynact_int4weight'
- to_dict() Dict[str, Any]#
Convert the config to a dictionary.
- create_quantizer()#
Create and return the appropriate QAT quantizer based on config.
- Returns:
A torchao QAT quantizer instance (Int8DynActInt4WeightQATQuantizer or Int4WeightOnlyQATQuantizer).
- Raises:
ValueError: If quantizer_type is not recognized.
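As a sketch of how this construction and dispatch could behave, here is a hypothetical stand-alone reimplementation. The stub classes below stand in for torchao's Int8DynActInt4WeightQATQuantizer and Int4WeightOnlyQATQuantizer; the structure is illustrative, not the actual nemo_automodel code.

```python
# Hypothetical sketch of QATConfig-style dispatch; stub classes stand in
# for the torchao quantizer classes.
from dataclasses import dataclass, field
from typing import Any, Dict


class StubInt8DynActInt4WeightQATQuantizer:
    def __init__(self, **kwargs):
        self.kwargs = kwargs  # e.g. groupsize, padding_allowed


class StubInt4WeightOnlyQATQuantizer:
    def __init__(self, **kwargs):
        self.kwargs = kwargs  # e.g. groupsize, inner_k_tiles


_QUANTIZER_CLASSES = {
    "int8_dynact_int4weight": StubInt8DynActInt4WeightQATQuantizer,
    "int4_weight_only": StubInt4WeightOnlyQATQuantizer,
}


@dataclass
class QATConfigSketch:
    quantizer_type: str = "int8_dynact_int4weight"
    quantizer_kwargs: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        return {"quantizer_type": self.quantizer_type, **self.quantizer_kwargs}

    def create_quantizer(self):
        try:
            cls = _QUANTIZER_CLASSES[self.quantizer_type]
        except KeyError:
            raise ValueError(f"Unknown quantizer_type: {self.quantizer_type!r}")
        # Extra kwargs are forwarded directly to the quantizer constructor.
        return cls(**self.quantizer_kwargs)


cfg = QATConfigSketch("int8_dynact_int4weight", {"groupsize": 32})
quantizer = cfg.create_quantizer()
```

An unrecognized quantizer_type raises ValueError at create_quantizer() time, mirroring the documented behavior above.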
- nemo_automodel.components.quantization.qat._QUANTIZER_TO_MODE#
None
- nemo_automodel.components.quantization.qat._DISABLE_FN_BY_MODE#
None
- nemo_automodel.components.quantization.qat._ENABLE_FN_BY_MODE#
None
- nemo_automodel.components.quantization.qat.get_quantizer_mode(quantizer: object) Optional[str]#
Return a short mode string for a known torchao QAT quantizer.
Returns None when the quantizer is unrecognized.
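The lookup can be pictured as a class-to-mode mapping. A minimal sketch with stub classes (the real mapping keys on the torchao quantizer classes):

```python
# Minimal sketch of the mode lookup; stub classes stand in for
# torchao's QAT quantizer classes.
from typing import Optional


class StubInt8DynActInt4WeightQATQuantizer:
    pass


class StubInt4WeightOnlyQATQuantizer:
    pass


_QUANTIZER_TO_MODE = {
    StubInt8DynActInt4WeightQATQuantizer: "int8_dynact_int4weight",
    StubInt4WeightOnlyQATQuantizer: "int4_weight_only",
}


def get_quantizer_mode(quantizer: object) -> Optional[str]:
    # Unknown quantizer types map to None rather than raising.
    return _QUANTIZER_TO_MODE.get(type(quantizer))
```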
- nemo_automodel.components.quantization.qat.get_disable_fake_quant_fn(mode: str)#
Return the disable fake-quant function for a given quantizer mode.
- nemo_automodel.components.quantization.qat.get_enable_fake_quant_fn(mode: str)#
Return the enable fake-quant function for a given quantizer mode.
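Delayed fake-quant typically means running the first few training steps with fake-quant disabled and switching it on later. A sketch of that pattern; the enable/disable callables here are placeholders for whatever these helpers return:

```python
# Sketch of delayed fake-quant: train with fake-quant off for a warmup
# period, then enable it. enable_fn/disable_fn are placeholder callables.
def train_with_delayed_fake_quant(model, steps, fake_quant_after,
                                  enable_fn, disable_fn):
    disable_fn(model)  # start with fake-quant off
    for step in range(steps):
        if step == fake_quant_after:
            enable_fn(model)  # switch fake-quant on mid-training
        # ... forward / backward / optimizer.step() would go here ...


events = []
train_with_delayed_fake_quant(
    model="m", steps=5, fake_quant_after=3,
    enable_fn=lambda m: events.append("enable"),
    disable_fn=lambda m: events.append("disable"),
)
```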
- nemo_automodel.components.quantization.qat.prepare_qat_model(model, quantizer)#
Apply a torchao QAT quantizer to the given model.
Returns the (possibly wrapped) model and a mode string if recognized.
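torchao QAT quantizers expose a prepare(model) method that swaps fake-quantized modules into the model. A sketch of the wrapper's shape, with a stub quantizer and an assumed mode table (illustrative only):

```python
# Sketch of prepare_qat_model: call the quantizer's prepare() and report a
# mode string when the quantizer is recognized. Stub types, illustrative only.
from typing import Optional, Tuple


class StubQuantizer:
    def prepare(self, model):
        # torchao QAT quantizers return the (possibly wrapped) model here.
        return {"wrapped": model}


_MODES = {StubQuantizer: "int8_dynact_int4weight"}


def prepare_qat_model(model, quantizer) -> Tuple[object, Optional[str]]:
    mode = _MODES.get(type(quantizer))  # None when unrecognized
    model = quantizer.prepare(model)
    return model, mode


prepared, mode = prepare_qat_model("net", StubQuantizer())
```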
- nemo_automodel.components.quantization.qat.__all__#
['QATConfig', 'get_quantizer_mode', 'get_disable_fake_quant_fn', 'get_enable_fake_quant_fn', 'prepar…