nemo_automodel.components.quantization.fp8


Module Contents

Classes

| Name | Description |
| --- | --- |
| FP8Config | Configuration for FP8 quantization settings. |

Functions

| Name | Description |
| --- | --- |
| _has_cuda_capability | Check if CUDA device has required compute capability. |
| _module_filter_fn | Filter function to exclude certain modules from FP8 conversion. |
| apply_fp8_to_model | Apply FP8 quantization to a PyTorch model using torchao. |
| build_fp8_config | Build an FP8 config from configuration. |
| create_fp8_config_from_dict | Create an FP8Config from a dictionary. |
| verify_fp8_conversion | Verify that FP8 conversion was successful by counting converted modules. |

Data

HAVE_TORCHAO

logger

API

class nemo_automodel.components.quantization.fp8.FP8Config(
enabled: bool = False,
recipe_name: typing.Optional[typing.Literal['tensorwise', 'rowwise', 'rowwise_with_gw_hp']] = None,
enable_fsdp_float8_all_gather: bool = False,
precompute_float8_dynamic_scale_for_fsdp: bool = False,
force_recompute_fp8_weight_in_bwd: bool = False,
filter_fqns: typing.List[str] = None,
emulate: bool = False
)
Dataclass

Configuration for FP8 quantization settings.

filter_fqns
List[str] = filter_fqns or []

List of fully qualified names of modules to skip applying float8 training to. nn.Linear modules with any dim size not divisible by 16 are always skipped due to hardware requirements. Example: ["attention.wq", "attention.wk", "attention.wv", "lm_head"]
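The `filter_fqns or []` default shown above means that passing `None` yields an empty list. The following is a minimal sketch of that normalization, assuming it happens when the dataclass is initialized. `FP8ConfigSketch` is a hypothetical stand-in with a reduced set of fields, not the library class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FP8ConfigSketch:
    """Hypothetical stand-in for FP8Config (reduced field set, illustration only)."""

    enabled: bool = False
    recipe_name: Optional[str] = None
    filter_fqns: Optional[List[str]] = None
    emulate: bool = False

    def __post_init__(self):
        # Mirror the documented `filter_fqns or []`:
        # None (or an empty list) becomes a fresh empty list.
        self.filter_fqns = self.filter_fqns or []


default_cfg = FP8ConfigSketch()
custom_cfg = FP8ConfigSketch(enabled=True, filter_fqns=["attention.wq", "lm_head"])
```

Using `None` as the declared default and normalizing afterwards avoids sharing one mutable list between instances.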

nemo_automodel.components.quantization.fp8.FP8Config.to_dict()

nemo_automodel.components.quantization.fp8._has_cuda_capability(
major: int,
minor: int
) -> bool

Check if CUDA device has required compute capability.
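A capability check of this kind usually reduces to a tuple comparison. The sketch below shows only that comparison. `meets_capability` is a hypothetical name, and the device capability is passed in explicitly so the example runs without a GPU. In practice the tuple would presumably come from `torch.cuda.get_device_capability()`, which is an assumption about the implementation.

```python
from typing import Tuple


def meets_capability(device_capability: Tuple[int, int], major: int, minor: int) -> bool:
    """Return True if the device capability is at least (major, minor).

    Hypothetical helper for illustration; not the library's implementation.
    """
    # Tuples compare lexicographically: the major version is compared first,
    # and the minor version only breaks ties.
    return tuple(device_capability) >= (major, minor)


# Illustrative values only: a (9, 0) device satisfies an (8, 9) requirement,
# while an (8, 0) device does not.
print(meets_capability((9, 0), 8, 9))
print(meets_capability((8, 0), 8, 9))
```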

nemo_automodel.components.quantization.fp8._module_filter_fn(
module,
name,
filter_fqns: typing.List[str] = None
)

Filter function to exclude certain modules from FP8 conversion.

Parameters:

module

The module to check

name

Fully qualified name of the module

filter_fqns
List[str], defaults to None

List of FQNs to filter out

Returns:

True if module should be converted to FP8, False otherwise
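As a rough illustration of such a filter, the sketch below combines the two documented rules: skip modules named in `filter_fqns`, and skip linear layers whose dimensions are not divisible by 16. `LinearStub` and `should_convert` are hypothetical names, the stub replaces `nn.Linear` so the example runs without PyTorch, and substring matching on the name is an assumption.

```python
from typing import List, Optional


class LinearStub:
    """Hypothetical stand-in for nn.Linear carrying only the two dimensions."""

    def __init__(self, in_features: int, out_features: int):
        self.in_features = in_features
        self.out_features = out_features


def should_convert(module, name: str, filter_fqns: Optional[List[str]] = None) -> bool:
    """Return True if the module should be converted to FP8 (sketch only)."""
    # Only linear layers are candidates for conversion.
    if not isinstance(module, LinearStub):
        return False
    # Dimensions not divisible by 16 are always skipped.
    if module.in_features % 16 != 0 or module.out_features % 16 != 0:
        return False
    # Assumption: a filter entry matches when it occurs anywhere in the module name.
    return not any(fqn in name for fqn in (filter_fqns or []))


skip = ["attention.wq", "lm_head"]
print(should_convert(LinearStub(4096, 4096), "layers.0.attention.wq", skip))
print(should_convert(LinearStub(4096, 4096), "layers.0.mlp.w1", skip))
```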

nemo_automodel.components.quantization.fp8.apply_fp8_to_model(
model: torch.nn.Module,
config: typing.Optional[nemo_automodel.components.quantization.fp8.FP8Config] = None,
filter_fqns: typing.Optional[typing.List[str]] = None,
recipe_name: typing.Optional[str] = None,
force_recompute_fp8_weight_in_bwd: bool = False,
enable_fsdp_float8_all_gather: bool = False,
emulate: bool = False,
enabled: bool = True,
precompute_float8_dynamic_scale_for_fsdp: bool = False
) -> torch.nn.Module

Apply FP8 quantization to a PyTorch model using torchao.

This function can be called in two ways:

  1. With an FP8Config object: apply_fp8_to_model(model, config=fp8_config)
  2. With individual parameters: apply_fp8_to_model(model, filter_fqns=…, recipe_name=…, etc.)
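The two calling styles above might look as follows in practice. This is a sketch based only on the signatures documented on this page. It has not been run here, and it requires torchao plus either FP8-capable hardware or `emulate=True`. The wrapper function names and the choice of `"lm_head"` as a filtered module are illustrative.

```python
def convert_with_config(model):
    """Calling style 1: pass all settings through an FP8Config object."""
    # Imported lazily so this sketch can be loaded without nemo_automodel installed.
    from nemo_automodel.components.quantization.fp8 import FP8Config, apply_fp8_to_model

    fp8_config = FP8Config(
        enabled=True,
        recipe_name="tensorwise",
        filter_fqns=["lm_head"],  # illustrative choice of module to skip
    )
    # When `config` is provided, the individual keyword parameters are ignored.
    return apply_fp8_to_model(model, config=fp8_config)


def convert_with_kwargs(model):
    """Calling style 2: pass individual parameters instead of a config object."""
    from nemo_automodel.components.quantization.fp8 import apply_fp8_to_model

    return apply_fp8_to_model(
        model,
        filter_fqns=["lm_head"],
        recipe_name="rowwise",
        emulate=False,
    )
```

Either wrapper takes a `torch.nn.Module`. The model is modified in place and also returned, so the return value can be assigned or ignored.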

Parameters:

model
nn.Module

The model to convert

config
Optional[FP8Config], defaults to None

FP8Config object containing all configuration. If provided, individual parameters are ignored.

filter_fqns
Optional[List[str]], defaults to None

List of module names to exclude from FP8 conversion

recipe_name
Optional[str], defaults to None

Recipe name for FP8 configuration: "tensorwise", "rowwise", or "rowwise_with_gw_hp"

force_recompute_fp8_weight_in_bwd
bool, defaults to False

Whether to force recompute FP8 weight in backward pass

enable_fsdp_float8_all_gather
bool, defaults to False

Whether to enable FSDP FP8 all-gather

emulate
bool, defaults to False

Use emulation instead of hardware acceleration (for testing on older GPUs)

enabled
bool, defaults to True

Whether FP8 quantization is enabled (only used when config is None)

precompute_float8_dynamic_scale_for_fsdp
bool, defaults to False

Whether to precompute the dynamic float8 scales for FSDP

Returns: nn.Module

The model with FP8 linear layers (modified in-place)

Raises:

  • ImportError: If torchao is not installed
  • ValueError: If hardware doesn’t support FP8 and emulation is disabled

nemo_automodel.components.quantization.fp8.build_fp8_config(
cfg: typing.Optional[typing.Dict[str, typing.Any]]
) -> nemo_automodel.components.quantization.fp8.FP8Config

Build an FP8 config from configuration.

Parameters:

cfg
Optional[Dict[str, Any]]

Configuration dictionary for FP8 quantization.

Returns: FP8Config

FP8Config instance.

nemo_automodel.components.quantization.fp8.create_fp8_config_from_dict(
config_dict: typing.Dict[str, typing.Any]
) -> nemo_automodel.components.quantization.fp8.FP8Config

Create an FP8Config from a dictionary.

Parameters:

config_dict
Dict[str, Any]

Dictionary containing FP8 configuration.

Returns: FP8Config

FP8Config instance.
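One common way to turn a dictionary into a dataclass instance is sketched below. `TinyFP8Config` and `config_from_dict` are hypothetical stand-ins, and rejecting unknown keys is a design choice of this sketch. The library function may treat missing or extra keys differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class TinyFP8Config:
    """Hypothetical stand-in with a few of the FP8Config fields."""

    enabled: bool = False
    recipe_name: Optional[str] = None
    emulate: bool = False


def config_from_dict(config_dict: Dict[str, Any]) -> TinyFP8Config:
    """Build a config from a dict, rejecting keys that are not dataclass fields."""
    known = {f.name for f in fields(TinyFP8Config)}
    unknown = set(config_dict) - known
    if unknown:
        raise ValueError(f"Unknown FP8 options: {sorted(unknown)}")
    # Keys absent from the dict fall back to the dataclass defaults.
    return TinyFP8Config(**config_dict)


cfg = config_from_dict({"enabled": True, "recipe_name": "rowwise"})
print(cfg)
```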

nemo_automodel.components.quantization.fp8.verify_fp8_conversion(
model: torch.nn.Module
) -> dict

Verify that FP8 conversion was successful by counting converted modules.

Parameters:

model
nn.Module

The model to verify

Returns: dict

Dict with conversion statistics
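Counting converted modules can be sketched as a single pass over `(name, module)` pairs, such as those produced by `model.named_modules()`. The stub classes below replace `nn.Linear` and torchao's `Float8Linear` so the example runs without PyTorch. The function name and the dictionary keys are hypothetical, since the actual keys returned by `verify_fp8_conversion` are not listed on this page.

```python
from typing import Dict, Iterable, Tuple


class Linear:
    """Hypothetical stand-in for nn.Linear."""


class Float8Linear(Linear):
    """Hypothetical stand-in for the converted FP8 linear layer."""


def count_converted(named_modules: Iterable[Tuple[str, object]]) -> Dict[str, int]:
    """Count linear layers and how many of them were converted (sketch only)."""
    total_linear = 0
    converted = 0
    for _name, module in named_modules:
        if isinstance(module, Linear):
            total_linear += 1
            # The converted class is a subclass of the plain linear class here,
            # so it is checked separately inside the linear branch.
            if isinstance(module, Float8Linear):
                converted += 1
    return {
        "linear_modules": total_linear,
        "fp8_modules": converted,
        "unconverted_modules": total_linear - converted,
    }


modules = [
    ("layers.0.attention.wq", Float8Linear()),
    ("layers.0.mlp.w1", Float8Linear()),
    ("lm_head", Linear()),
    ("norm", object()),
]
stats = count_converted(modules)
print(stats)
```

A non-zero count of unconverted modules is not necessarily an error, because filtered modules and layers with dimensions not divisible by 16 are skipped on purpose.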

nemo_automodel.components.quantization.fp8.HAVE_TORCHAO = True
nemo_automodel.components.quantization.fp8.logger = logging.getLogger(__name__)