nemo_automodel.components.quantization.fp8
Module Contents
Classes
Functions
Data
API
Configuration for FP8 quantization settings.
List of fully qualified names (FQNs) of modules to exclude from float8 training. nn.Linear modules with any dimension not divisible by 16 are always skipped due to hardware requirements. Example: ["attention.wq", "attention.wk", "attention.wv", "lm_head"]
Check whether the CUDA device has the required compute capability.
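The capability check can be pictured as a comparison of (major, minor) tuples. The following is a minimal sketch, not the module's implementation: the helper name `meets_capability` is hypothetical, and the (8, 9) threshold is an assumption based on FP8 tensor cores being available on Ada and Hopper class GPUs.

```python
from typing import Tuple

# Assumed minimum compute capability for hardware FP8 support (not taken from the source).
MIN_FP8_CAPABILITY: Tuple[int, int] = (8, 9)


def meets_capability(
    capability: Tuple[int, int],
    minimum: Tuple[int, int] = MIN_FP8_CAPABILITY,
) -> bool:
    """Return True when a (major, minor) capability is at least the required minimum."""
    # Tuples compare lexicographically, so (9, 0) >= (8, 9) and (8, 0) < (8, 9).
    return tuple(capability) >= tuple(minimum)
```

With PyTorch installed, the tuple would typically come from `torch.cuda.get_device_capability()`, for example `meets_capability(torch.cuda.get_device_capability())`.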
Filter function to exclude certain modules from FP8 conversion.
Parameters:
The module to check
Fully qualified name of the module
List of FQNs to filter out
Returns:
True if module should be converted to FP8, False otherwise
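The filtering rules described above (skip listed FQNs, skip linear layers whose dimensions are not divisible by 16) might look roughly like the sketch below. It uses a hypothetical `StandInLinear` class in place of `nn.Linear` so that it runs without PyTorch, and it assumes that an entry in the skip list matches when it appears as a substring of the module's FQN; the actual matching rule is not stated in this reference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StandInLinear:
    """Hypothetical stand-in for nn.Linear; only the two size attributes matter here."""
    in_features: int
    out_features: int


def should_convert(module: object, fqn: str, filter_fqns: Optional[List[str]] = None) -> bool:
    """Sketch of a module filter: True means the module is eligible for FP8 conversion."""
    in_features = getattr(module, "in_features", None)
    out_features = getattr(module, "out_features", None)
    if in_features is None or out_features is None:
        return False  # not a linear-like layer in this sketch
    if in_features % 16 != 0 or out_features % 16 != 0:
        return False  # dimensions must be divisible by 16
    # Assumption: a skip entry matches if it occurs anywhere in the FQN.
    return not any(skip in fqn for skip in (filter_fqns or []))
```

For example, `should_convert(StandInLinear(64, 64), "layers.0.attention.wq", ["attention.wq"])` is rejected by the skip list, while the same layer under `"layers.0.mlp.w1"` passes.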
Apply FP8 quantization to a PyTorch model using torchao.
This function can be called in two ways:
- With an FP8Config object: apply_fp8_to_model(model, config=fp8_config)
- With individual parameters: apply_fp8_to_model(model, filter_fqns=…, recipe_name=…, etc.)
Parameters:
The model to convert
FP8Config object containing all configuration. If provided, individual parameters are ignored.
List of module names to exclude from FP8 conversion
Recipe name for FP8 configuration ("tensorwise", "rowwise", etc.)
Whether to force recompute FP8 weight in backward pass
Whether to enable FSDP FP8 all-gather
Use emulation instead of hardware acceleration (for testing on older GPUs)
Whether FP8 quantization is enabled (only used when config is None)
Whether to precompute float8 scales dynamically
Returns: nn.Module
The model with FP8 linear layers (modified in-place)
Raises:
ImportError: If torchao is not installed
ValueError: If hardware doesn’t support FP8 and emulation is disabled
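The rule that a supplied config object takes precedence over individual parameters can be sketched as follows. This is an illustration only: `SketchFP8Config` and `resolve_settings` are hypothetical names, the field set is a reduced subset inferred from the parameter descriptions, and the default values are placeholders rather than the library's defaults.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchFP8Config:
    """Hypothetical, reduced stand-in for FP8Config (defaults are placeholders)."""
    enabled: bool = False
    recipe_name: Optional[str] = None
    filter_fqns: List[str] = field(default_factory=list)
    emulate: bool = False


def resolve_settings(
    config: Optional[SketchFP8Config] = None,
    *,
    enabled: bool = True,
    recipe_name: Optional[str] = None,
    filter_fqns: Optional[List[str]] = None,
    emulate: bool = False,
) -> SketchFP8Config:
    """Return the effective settings: the config object wins when it is provided."""
    if config is not None:
        return config  # individual keyword arguments are ignored in this case
    return SketchFP8Config(
        enabled=enabled,
        recipe_name=recipe_name,
        filter_fqns=list(filter_fqns or []),
        emulate=emulate,
    )
```

Resolving everything into a single settings object up front keeps the rest of the conversion path independent of which calling style was used.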
Build an FP8Config from a configuration dictionary.
Parameters:
Configuration dictionary for FP8 quantization.
Returns: FP8Config
FP8Config instance.
Create an FP8Config from a dictionary.
Parameters:
Dictionary containing FP8 configuration.
Returns: FP8Config
FP8Config instance.
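Constructing a config from a dictionary is a common dataclass pattern, sketched below with a hypothetical `SketchFP8Config`. The field names and defaults are placeholders, and rejecting unknown keys is a design choice of this sketch; this reference does not say how the real `FP8Config` treats unexpected entries.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class SketchFP8Config:
    """Hypothetical stand-in for FP8Config (fields and defaults are placeholders)."""
    enabled: bool = False
    recipe_name: Optional[str] = None
    filter_fqns: List[str] = field(default_factory=list)
    emulate: bool = False

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchFP8Config":
        """Build a config from a plain dictionary, rejecting keys that are not fields."""
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(data) - known)
        if unknown:
            raise ValueError(f"Unknown FP8 config keys: {unknown}")
        return cls(**data)
```

Keys that are absent from the dictionary simply fall back to the dataclass defaults, so a partial dictionary such as `{"enabled": True}` is enough to build a usable object.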
Verify that FP8 conversion was successful by counting converted modules.
Parameters:
The model to verify
Returns: dict
Dict with conversion statistics
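Counting converted modules amounts to walking the module tree and tallying layer types. The sketch below works on any iterable of (name, module) pairs, such as the output of `model.named_modules()` in PyTorch. The function name, the dictionary keys, and the use of the class name `Float8Linear` (torchao's converted linear layer) as the marker are assumptions of this example, and the two stand-in classes exist only so that it runs without PyTorch.

```python
from typing import Dict, Iterable, Tuple


def count_converted(
    named_modules: Iterable[Tuple[str, object]],
    converted_class_name: str = "Float8Linear",
) -> Dict[str, int]:
    """Tally linear-like modules and how many of them carry the converted class name."""
    linear = 0
    converted = 0
    for _name, module in named_modules:
        class_name = type(module).__name__
        if class_name == converted_class_name:
            converted += 1
            linear += 1
        elif class_name == "Linear":
            linear += 1
    return {"linear_modules": linear, "converted_modules": converted}


# Stand-in classes so the sketch runs without PyTorch or torchao.
class Linear:
    pass


class Float8Linear:
    pass


stats = count_converted(
    [("wq", Float8Linear()), ("wk", Float8Linear()), ("lm_head", Linear())]
)
```

A result in which `converted_modules` is zero would suggest that the conversion was skipped or that every layer was filtered out.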