nemo_automodel.components.quantization.qat
TorchAO Quantization-Aware Training (QAT) helpers for NeMo-AutoModel.
This module provides:
- QATConfig: Configuration class for QAT settings
- Thin wrappers to instantiate and apply torchao QAT quantizers to models (prepare)
- Helpers to toggle fake quantization on and off during training (for delayed fake-quant)
Module Contents
Classes
Functions
Data
API
Configuration for Quantization-Aware Training (QAT).
This config controls how QAT quantizers are instantiated and applied to models. QAT is enabled when this config is provided to from_pretrained/from_config.
Create and return the appropriate QAT quantizer based on config.
Returns:
A torchao QAT quantizer instance (for example, Int8DynActInt4WeightQATQuantizer).
Raises:
ValueError: If quantizer_type is not recognized.
Convert config to dictionary.
Return the disable fake-quant function for a given quantizer mode.
Return the enable fake-quant function for a given quantizer mode.
Return a short mode string for a known torchao QAT quantizer.
Returns None when the quantizer is unrecognized.
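One way to picture the mode lookup is a table keyed by the quantizer's class name, with `None` as the fallback. The sketch below assumes that design; the class names, the `"example-mode-a"` string, and `sketch_get_mode` are hypothetical and do not reflect the module's actual identifiers or mode values.

```python
from typing import Dict, Optional


class ExampleQuantizerA:
    """Stand-in for a recognized quantizer class."""


class UnregisteredQuantizer:
    """Stand-in for a quantizer the lookup does not know about."""


# Hypothetical mapping from class name to a short mode string.
_MODE_BY_CLASS_NAME: Dict[str, str] = {"ExampleQuantizerA": "example-mode-a"}


def sketch_get_mode(quantizer: object) -> Optional[str]:
    # dict.get returns None for a missing key, matching the documented fallback.
    return _MODE_BY_CLASS_NAME.get(type(quantizer).__name__)
```

Keying on the class name rather than the class object would let such a table be defined without importing every quantizer class up front, which is one plausible reason to structure it this way.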
Apply a torchao QAT quantizer to the given model.
Returns the (possibly wrapped) model and a mode string if recognized.
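The prepare step can be sketched as a call to the quantizer's `prepare(model)` method followed by a mode lookup, returning both results as a tuple. torchao QAT quantizers expose a `prepare` method, but the rest of this example is an assumption: `ToyQuantizer`, `sketch_prepare`, the dictionary used as a "model", and the `"toy-mode"` string are hypothetical stand-ins so the code runs without torch or torchao.

```python
from typing import Any, Dict, Optional, Tuple


class ToyQuantizer:
    """Stand-in quantizer exposing a prepare(model) method."""

    def prepare(self, model: Dict[str, Any]) -> Dict[str, Any]:
        # Return a modified copy, mimicking a model that may come back wrapped.
        prepared = dict(model)
        prepared["fake_quant"] = True
        return prepared


# Hypothetical mode table; unrecognized quantizers map to None.
_MODE_BY_CLASS_NAME: Dict[str, str] = {"ToyQuantizer": "toy-mode"}


def sketch_prepare(model: Any, quantizer: Any) -> Tuple[Any, Optional[str]]:
    """Apply the quantizer and report its mode string when recognized."""
    prepared = quantizer.prepare(model)
    mode = _MODE_BY_CLASS_NAME.get(type(quantizer).__name__)
    return prepared, mode
```

Returning the mode alongside the model lets the caller later pick matching enable and disable functions without inspecting the quantizer again.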