nemo_automodel.components.quantization.fp8

Module Contents

Classes

Name	Description
`FP8Config`	Configuration for FP8 quantization settings.

Functions

Name	Description
`_has_cuda_capability`	Check if CUDA device has required compute capability.
`_module_filter_fn`	Filter function to exclude certain modules from FP8 conversion.
`apply_fp8_to_model`	Apply FP8 quantization to a PyTorch model using torchao.
`build_fp8_config`	Build an FP8 config from configuration.
`create_fp8_config_from_dict`	Create an FP8Config from a dictionary.
`verify_fp8_conversion`	Verify that FP8 conversion was successful by counting converted modules.

Data

HAVE_TORCHAO

logger

API

class nemo_automodel.components.quantization.fp8.FP8Config(
    enabled: bool = False,
    recipe_name: typing.Optional[typing.Literal['tensorwise', 'rowwise', 'rowwise_with_gw_hp']] = None,
    enable_fsdp_float8_all_gather: bool = False,
    precompute_float8_dynamic_scale_for_fsdp: bool = False,
    force_recompute_fp8_weight_in_bwd: bool = False,
    filter_fqns: typing.List[str] = None,
    emulate: bool = False
)

Dataclass

Configuration for FP8 quantization settings.

filter_fqns

List[str] = filter_fqns or []

List of fully qualified names (FQNs) of modules to exclude from float8 training. nn.Linear modules with any dimension not divisible by 16 are always skipped because of hardware requirements. Example: ["attention.wq", "attention.wk", "attention.wv", "lm_head"]

nemo_automodel.components.quantization.fp8.FP8Config.to_dict()

nemo_automodel.components.quantization.fp8._has_cuda_capability(
    major: int,
    minor: int
) -> bool

Check if CUDA device has required compute capability.
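A minimal sketch of the kind of comparison such a check performs, assuming the device capability is available as a `(major, minor)` tuple (in PyTorch, `torch.cuda.get_device_capability()` returns one). The helper name `meets_capability` and the sample values are illustrative assumptions, not the module's implementation.

```python
from typing import Tuple


def meets_capability(device: Tuple[int, int], major: int, minor: int) -> bool:
    """Return True if the device capability is at least (major, minor)."""
    # Tuples compare lexicographically: major version first, then minor.
    return device >= (major, minor)


# Illustrative capability values only; they are not queried from real hardware.
print(meets_capability((9, 0), 8, 9))  # True
print(meets_capability((8, 0), 8, 9))  # False
```

Comparing tuples avoids the common mistake of checking `major` and `minor` independently, which would wrongly reject a (9, 0) device against an (8, 9) requirement.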

nemo_automodel.components.quantization.fp8._module_filter_fn(
    module,
    name,
    filter_fqns: typing.List[str] = None
)

Filter function to exclude certain modules from FP8 conversion.

Parameters:

module

The module to check

name

Fully qualified name of the module

filter_fqns

List[str]. Defaults to None

List of FQNs to filter out

Returns:

True if module should be converted to FP8, False otherwise
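The filtering rules described above can be sketched in plain Python. This is a hedged illustration, not the library's code: `should_convert` is a hypothetical helper, the layer is represented only by its weight shape, and matching an entry of `filter_fqns` as a substring of the module name is an assumption.

```python
from typing import List, Optional, Tuple


def should_convert(
    name: str,
    weight_shape: Optional[Tuple[int, int]],
    filter_fqns: Optional[List[str]] = None,
) -> bool:
    """Return True if the module named `name` should be converted to FP8.

    `weight_shape` is (out_features, in_features) for a linear layer,
    or None for any other module type.
    """
    # Assumption: only linear layers are candidates for conversion.
    if weight_shape is None:
        return False
    # Layers with any dimension not divisible by 16 are always skipped.
    if any(dim % 16 != 0 for dim in weight_shape):
        return False
    # Assumption: an entry matches when it is a substring of the module name.
    for fqn in filter_fqns or []:
        if fqn in name:
            return False
    return True


skip = ["attention.wq", "lm_head"]
print(should_convert("layers.0.attention.wq", (64, 64), skip))  # False
print(should_convert("layers.0.mlp.w1", (64, 128), skip))       # True
print(should_convert("layers.0.mlp.w2", (64, 100), skip))       # False
```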

nemo_automodel.components.quantization.fp8.apply_fp8_to_model(
    model: torch.nn.Module,
    config: typing.Optional[nemo_automodel.components.quantization.fp8.FP8Config] = None,
    filter_fqns: typing.Optional[typing.List[str]] = None,
    recipe_name: typing.Optional[str] = None,
    force_recompute_fp8_weight_in_bwd: bool = False,
    enable_fsdp_float8_all_gather: bool = False,
    emulate: bool = False,
    enabled: bool = True,
    precompute_float8_dynamic_scale_for_fsdp: bool = False
) -> torch.nn.Module

Apply FP8 quantization to a PyTorch model using torchao.

This function can be called in two ways:

With an FP8Config object: apply_fp8_to_model(model, config=fp8_config)
With individual parameters: apply_fp8_to_model(model, filter_fqns=…, recipe_name=…, etc.)

Parameters:

model

nn.Module

The model to convert

config

Optional[FP8Config]. Defaults to None

FP8Config object containing all configuration. If provided, individual parameters are ignored.

filter_fqns

Optional[List[str]]. Defaults to None

List of module names to exclude from FP8 conversion

recipe_name

Optional[str]. Defaults to None

Recipe name for FP8 configuration: "tensorwise", "rowwise", or "rowwise_with_gw_hp"

force_recompute_fp8_weight_in_bwd

bool. Defaults to False

Whether to force recomputation of the FP8 weight in the backward pass

enable_fsdp_float8_all_gather

bool. Defaults to False

Whether to enable FSDP FP8 all-gather

emulate

bool. Defaults to False

Use emulation instead of hardware acceleration (for testing on older GPUs)

enabled

bool. Defaults to True

Whether FP8 quantization is enabled (only used when config is None)

precompute_float8_dynamic_scale_for_fsdp

bool. Defaults to False

Whether to precompute float8 dynamic scales for FSDP

Returns: nn.Module

The model with FP8 linear layers (modified in place)

Raises:

ImportError: If torchao is not installed
ValueError: If hardware doesn’t support FP8 and emulation is disabled
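The precedence between the two calling styles (a config object overrides the individual parameters) can be sketched as follows. `SketchFP8Config` and `resolve_config` are hypothetical stand-ins that cover only a subset of the documented fields; they illustrate the rule and are not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchFP8Config:
    """Hypothetical stand-in with a subset of the documented FP8Config fields."""

    enabled: bool = False
    recipe_name: Optional[str] = None
    filter_fqns: List[str] = field(default_factory=list)
    emulate: bool = False


def resolve_config(
    config: Optional[SketchFP8Config] = None,
    filter_fqns: Optional[List[str]] = None,
    recipe_name: Optional[str] = None,
    emulate: bool = False,
    enabled: bool = True,
) -> SketchFP8Config:
    """Pick the effective settings for the two documented calling styles."""
    if config is not None:
        # Style 1: a config object wins; the individual arguments are ignored.
        return config
    # Style 2: build the settings from the individual arguments.
    return SketchFP8Config(
        enabled=enabled,
        recipe_name=recipe_name,
        filter_fqns=filter_fqns or [],
        emulate=emulate,
    )


cfg = SketchFP8Config(enabled=True, recipe_name="rowwise")
print(resolve_config(config=cfg, recipe_name="tensorwise").recipe_name)  # rowwise
print(resolve_config(recipe_name="tensorwise").recipe_name)              # tensorwise
```

Note that `enabled` defaults to True in the function signature but to False in the config object, so the two styles differ when neither sets it explicitly.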

nemo_automodel.components.quantization.fp8.build_fp8_config(
    cfg: typing.Optional[typing.Dict[str, typing.Any]]
) -> nemo_automodel.components.quantization.fp8.FP8Config

Build an FP8 config from configuration.

Parameters:

cfg

Optional[Dict[str, Any]]

Configuration dictionary for FP8 quantization.

Returns: FP8Config

FP8Config instance.

nemo_automodel.components.quantization.fp8.create_fp8_config_from_dict(
    config_dict: typing.Dict[str, typing.Any]
) -> nemo_automodel.components.quantization.fp8.FP8Config

Create an FP8Config from a dictionary.

Parameters:

config_dict

Dict[str, Any]

Dictionary containing FP8 configuration.

Returns: FP8Config

FP8Config instance.
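As a rough sketch of how a dictionary might map onto the config fields, the example below builds a stand-in dataclass that mirrors the documented `FP8Config` signature. `SketchFP8Config` and `config_from_dict` are hypothetical names, and dropping unrecognized keys is an assumption; the real function may validate or reject them.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class SketchFP8Config:
    """Hypothetical stand-in mirroring the documented FP8Config fields."""

    enabled: bool = False
    recipe_name: Optional[str] = None
    enable_fsdp_float8_all_gather: bool = False
    precompute_float8_dynamic_scale_for_fsdp: bool = False
    force_recompute_fp8_weight_in_bwd: bool = False
    filter_fqns: List[str] = field(default_factory=list)
    emulate: bool = False


def config_from_dict(config_dict: Dict[str, Any]) -> SketchFP8Config:
    """Build the stand-in config from a plain dictionary."""
    known = {f.name for f in fields(SketchFP8Config)}
    # Assumption: keys that are not config fields are dropped, not rejected.
    kwargs = {k: v for k, v in config_dict.items() if k in known}
    # Mirror the documented `filter_fqns or []` normalization.
    kwargs["filter_fqns"] = kwargs.get("filter_fqns") or []
    return SketchFP8Config(**kwargs)


cfg = config_from_dict(
    {"enabled": True, "recipe_name": "tensorwise", "filter_fqns": None, "extra": 1}
)
print(cfg.enabled, cfg.recipe_name, cfg.filter_fqns)  # True tensorwise []
```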

nemo_automodel.components.quantization.fp8.verify_fp8_conversion(
    model: torch.nn.Module
) -> dict

Verify that FP8 conversion was successful by counting converted modules.

Parameters:

model

nn.Module

The model to verify

Returns: dict

Dict with conversion statistics
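The layout of the returned dict is not documented here. The sketch below shows one plausible way to count converted modules, using stand-in classes instead of real PyTorch layers. The class names, the `count_conversions` helper, and the dict keys are all hypothetical; with a real model one would iterate over `model.modules()` instead of a list.

```python
from typing import Dict, Iterable


class Linear:
    """Stand-in for an unconverted linear layer (hypothetical)."""


class Float8Linear(Linear):
    """Stand-in for a linear layer that was converted to FP8 (hypothetical)."""


def count_conversions(modules: Iterable[object]) -> Dict[str, object]:
    """Count linear layers and how many of them were converted."""
    linear_total = 0
    fp8_total = 0
    for module in modules:
        if isinstance(module, Linear):
            linear_total += 1
            if isinstance(module, Float8Linear):
                fp8_total += 1
    # Hypothetical keys; the real function's dict layout may differ.
    return {
        "linear_total": linear_total,
        "fp8_total": fp8_total,
        "success": fp8_total > 0,
    }


stats = count_conversions([Float8Linear(), Float8Linear(), Linear(), object()])
print(stats)  # {'linear_total': 3, 'fp8_total': 2, 'success': True}
```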

nemo_automodel.components.quantization.fp8.HAVE_TORCHAO = True

nemo_automodel.components.quantization.fp8.logger = logging.getLogger(__name__)