nemo_automodel.components.quantization.fp8#

Module Contents#

Classes#

FP8Config

Configuration for FP8 quantization settings.

Functions#

_has_cuda_capability

Check if CUDA device has required compute capability.

_module_filter_fn

Filter function to exclude certain modules from FP8 conversion.

apply_fp8_to_model

Apply FP8 quantization to a PyTorch model using torchao.

verify_fp8_conversion

Verify that FP8 conversion was successful by counting converted modules.

Data#

logger

API#

nemo_automodel.components.quantization.fp8.logger#

‘getLogger(…)’

class nemo_automodel.components.quantization.fp8.FP8Config[source]#

Configuration for FP8 quantization settings.

recipe_name: Optional[Literal['tensorwise', 'rowwise', 'rowwise_with_gw_hp']]#

None

FP8 recipe to use. If None, uses tensorwise scaling with manual configuration.

enable_fsdp_float8_all_gather: bool#

False

Whether to enable float8 all-gather in FSDP, recommended for tensorwise scaling.

precompute_float8_dynamic_scale_for_fsdp: bool#

False

Whether to precompute float8 scales dynamically for FSDP, recommended for tensorwise scaling.

force_recompute_fp8_weight_in_bwd: bool#

False

Whether to force recomputation of FP8 weights during the backward pass.

filter_fqns: List[str]#

‘field(…)’

List of fully qualified names of modules to skip when applying float8 training. nn.Linear modules with any dim size not divisible by 16 are always skipped due to hardware requirements. Example: ["attention.wq", "attention.wk", "attention.wv", "lm_head"]

emulate: bool#

False

If True, emulation is used instead of hardware-accelerated GEMM. This is intended for testing purposes only.

classmethod from_config_node(config_node)[source]#

Create FP8Config from a configuration node.

to_dict()[source]#

Convert the configuration to a plain dictionary.
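A minimal usage sketch, assuming FP8Config is a dataclass-style config constructed with keyword arguments (the field values below are illustrative, not recommendations):

```python
from nemo_automodel.components.quantization.fp8 import FP8Config

# Illustrative settings: rowwise scaling, skip the LM head, emulate FP8 on unsupported GPUs.
cfg = FP8Config(
    recipe_name="rowwise",
    filter_fqns=["lm_head"],
    emulate=True,
)

# Inspect the resulting settings as a plain dictionary.
print(cfg.to_dict())
```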
nemo_automodel.components.quantization.fp8._has_cuda_capability(major: int, minor: int) → bool[source]#

Check if CUDA device has required compute capability.
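The implementation is not reproduced here; a minimal sketch of such a check, assuming it is based on torch.cuda.get_device_capability, might look like:

```python
import torch

def has_cuda_capability(major: int, minor: int) -> bool:
    # No CUDA device means no FP8-capable hardware.
    if not torch.cuda.is_available():
        return False
    # Compare the current device's compute capability against the requirement.
    device_major, device_minor = torch.cuda.get_device_capability()
    return (device_major, device_minor) >= (major, minor)
```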

nemo_automodel.components.quantization.fp8._module_filter_fn(module, name, filter_fqns: List[str] = None)[source]#

Filter function to exclude certain modules from FP8 conversion.

Parameters:
  • module – The module to check

  • name – Fully qualified name of the module

  • filter_fqns – List of FQNs to filter out

Returns:

True if the module should be converted to FP8, False otherwise
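A hedged sketch of such a filter, assuming it checks the FQN list and the divisible-by-16 requirement noted under FP8Config.filter_fqns (the rules in the real implementation may differ):

```python
from typing import List, Optional
import torch.nn as nn

def module_filter_fn(module: nn.Module, name: str, filter_fqns: Optional[List[str]] = None) -> bool:
    # Skip modules whose fully qualified name matches one of the filtered FQNs.
    if filter_fqns and any(fqn in name for fqn in filter_fqns):
        return False
    # Only nn.Linear modules are candidates for FP8 conversion.
    if not isinstance(module, nn.Linear):
        return False
    # FP8 GEMM kernels require dimensions divisible by 16.
    if module.in_features % 16 != 0 or module.out_features % 16 != 0:
        return False
    return True
```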

nemo_automodel.components.quantization.fp8.apply_fp8_to_model(
model: torch.nn.Module,
filter_fqns: Optional[List[str]] = None,
recipe_name: Optional[str] = None,
force_recompute_fp8_weight_in_bwd: bool = False,
enable_fsdp_float8_all_gather: bool = False,
emulate: bool = False,
) → torch.nn.Module[source]#

Apply FP8 quantization to a PyTorch model using torchao.

Parameters:
  • model – The model to convert

  • filter_fqns – List of module names to exclude from FP8 conversion

  • recipe_name – Recipe name for FP8 configuration (“tensorwise”, “rowwise”, etc.)

  • force_recompute_fp8_weight_in_bwd – Whether to force recomputation of the FP8 weight in the backward pass

  • enable_fsdp_float8_all_gather – Whether to enable FSDP FP8 all-gather

  • emulate – Use emulation instead of hardware acceleration (for testing on older GPUs)

Returns:

The model with FP8 linear layers (modified in-place)

Raises:
  • ImportError – If torchao is not installed

  • ValueError – If hardware doesn’t support FP8 and emulation is disabled
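A minimal usage sketch; the toy model and argument values are illustrative only (emulate=True lets the example run on GPUs without native FP8 support):

```python
import torch.nn as nn
from nemo_automodel.components.quantization.fp8 import apply_fp8_to_model

# Toy model with FP8-friendly dimensions (multiples of 16).
model = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Linear(1024, 1024),
)

# Convert eligible linear layers to FP8 using the tensorwise recipe.
model = apply_fp8_to_model(model, recipe_name="tensorwise", emulate=True)
```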

nemo_automodel.components.quantization.fp8.verify_fp8_conversion(model: torch.nn.Module) → dict[source]#

Verify that FP8 conversion was successful by counting converted modules.

Parameters:

model – The model to verify

Returns:

Dict with conversion statistics
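A usage sketch continuing the example above; the exact keys of the returned statistics dictionary are not documented here, so the print is purely illustrative:

```python
from nemo_automodel.components.quantization.fp8 import verify_fp8_conversion

# Count how many modules were converted to FP8 in the model from the previous example.
stats = verify_fp8_conversion(model)
print(stats)
```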