nemo_automodel.components.quantization.fp8#
Module Contents#
Classes#
- FP8Config: Configuration for FP8 quantization settings.
Functions#
- _has_cuda_capability: Check if CUDA device has required compute capability.
- _module_filter_fn: Filter function to exclude certain modules from FP8 conversion.
- apply_fp8_to_model: Apply FP8 quantization to a PyTorch model using torchao.
- Verify that FP8 conversion was successful by counting converted modules.
Data#
- logger
API#
- nemo_automodel.components.quantization.fp8.logger#
'getLogger(…)'
- class nemo_automodel.components.quantization.fp8.FP8Config[source]#
Configuration for FP8 quantization settings.
- recipe_name: Optional[Literal['tensorwise', 'rowwise', 'rowwise_with_gw_hp']]#
None
FP8 recipe to use. If None, uses tensorwise scaling with manual configuration.
- enable_fsdp_float8_all_gather: bool#
False
Whether to enable float8 all-gather in FSDP, recommended for tensorwise scaling.
- precompute_float8_dynamic_scale_for_fsdp: bool#
False
Whether to precompute float8 scales dynamically for FSDP, recommended for tensorwise scaling.
- force_recompute_fp8_weight_in_bwd: bool#
False
Whether to force the recomputation of FP8 weights during backward pass.
- filter_fqns: List[str]#
'field(…)'
List of fully qualified names of modules to skip applying float8 training to. nn.Linear modules with any dim size not divisible by 16 are always skipped due to hardware requirements. Example: ["attention.wq", "attention.wk", "attention.wv", "lm_head"]
- emulate: bool#
False
If True, emulation is used instead of hardware-accelerated GEMM. This is intended for testing purposes only.
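As a usage sketch (the field names match the attributes documented above; the specific values are illustrative, not additional defaults), an FP8Config for tensorwise scaling with the FSDP float8 optimizations might look like this:

```python
from nemo_automodel.components.quantization.fp8 import FP8Config

# Illustrative configuration: tensorwise recipe with FSDP float8 all-gather
# enabled and the output head excluded from conversion.
fp8_cfg = FP8Config(
    recipe_name="tensorwise",
    enable_fsdp_float8_all_gather=True,
    precompute_float8_dynamic_scale_for_fsdp=True,
    filter_fqns=["lm_head"],
)
```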
- nemo_automodel.components.quantization.fp8._has_cuda_capability(major: int, minor: int) -> bool [source]#
Check if CUDA device has required compute capability.
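Such a check can be sketched with torch.cuda.get_device_capability; the helper below is a hypothetical re-implementation for illustration, not the module's code:

```python
import torch

def has_cuda_capability(major: int, minor: int) -> bool:
    # Hypothetical sketch: compare the current device's compute capability
    # against the requested (major, minor) pair, e.g. (8, 9) for FP8 support.
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability() >= (major, minor)
```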
- nemo_automodel.components.quantization.fp8._module_filter_fn(module, name, filter_fqns: List[str] = None)[source]#
Filter function to exclude certain modules from FP8 conversion.
- Parameters:
module – The module to check
name – Fully qualified name of the module
filter_fqns – List of FQNs to filter out
- Returns:
True if module should be converted to FP8, False otherwise
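A sketch of the filtering behavior described above, assuming substring matching on the FQN list and the divisibility-by-16 rule noted under filter_fqns; this is illustrative, not the module's exact implementation:

```python
import torch.nn as nn

def module_filter_fn(module, name, filter_fqns=None):
    # Skip modules whose fully qualified name matches an entry in filter_fqns
    # (substring matching is an assumption here).
    for fqn in filter_fqns or []:
        if fqn in name:
            return False
    # Skip nn.Linear layers with any dimension not divisible by 16,
    # per the hardware requirement noted in the FP8Config docs.
    if isinstance(module, nn.Linear):
        if module.in_features % 16 != 0 or module.out_features % 16 != 0:
            return False
    return True
```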
- nemo_automodel.components.quantization.fp8.apply_fp8_to_model(
- model: torch.nn.Module,
- filter_fqns: Optional[List[str]] = None,
- recipe_name: Optional[str] = None,
- force_recompute_fp8_weight_in_bwd: bool = False,
- enable_fsdp_float8_all_gather: bool = False,
- emulate: bool = False,
- ) -> torch.nn.Module[source]#
Apply FP8 quantization to a PyTorch model using torchao.
- Parameters:
model – The model to convert
filter_fqns – List of module names to exclude from FP8 conversion
recipe_name – Recipe name for FP8 configuration ("tensorwise", "rowwise", etc.)
force_recompute_fp8_weight_in_bwd – Whether to force recompute FP8 weight in backward pass
enable_fsdp_float8_all_gather – Whether to enable FSDP FP8 all-gather
emulate – Use emulation instead of hardware acceleration (for testing on older GPUs)
- Returns:
The model with FP8 linear layers (modified in-place)
- Raises:
ImportError – If torchao is not installed
ValueError – If hardware doesn’t support FP8 and emulation is disabled
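A minimal end-to-end sketch, assuming torchao is installed; the toy model and argument values are illustrative:

```python
import torch.nn as nn
from nemo_automodel.components.quantization.fp8 import apply_fp8_to_model

# Toy model: all linear dimensions are multiples of 16, so the layers are
# eligible for FP8 conversion.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
)

# Converts eligible nn.Linear layers in-place and returns the same model object.
# emulate=True allows running without FP8-capable hardware (for testing only).
model = apply_fp8_to_model(
    model,
    recipe_name="tensorwise",
    emulate=True,
)
```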