
# nemo_automodel.components.quantization.fp8

## Module Contents

### Classes

| Name                                                                 | Description                                  |
| -------------------------------------------------------------------- | -------------------------------------------- |
| [`FP8Config`](#nemo_automodel-components-quantization-fp8-FP8Config) | Configuration for FP8 quantization settings. |

### Functions

| Name                                                                                                     | Description                                                              |
| -------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
| [`_has_cuda_capability`](#nemo_automodel-components-quantization-fp8-_has_cuda_capability)               | Check if CUDA device has required compute capability.                    |
| [`_module_filter_fn`](#nemo_automodel-components-quantization-fp8-_module_filter_fn)                     | Filter function to exclude certain modules from FP8 conversion.          |
| [`apply_fp8_to_model`](#nemo_automodel-components-quantization-fp8-apply_fp8_to_model)                   | Apply FP8 quantization to a PyTorch model using torchao.                 |
| [`build_fp8_config`](#nemo_automodel-components-quantization-fp8-build_fp8_config)                       | Build an FP8 config from configuration.                                  |
| [`create_fp8_config_from_dict`](#nemo_automodel-components-quantization-fp8-create_fp8_config_from_dict) | Create an FP8Config from a dictionary.                                   |
| [`verify_fp8_conversion`](#nemo_automodel-components-quantization-fp8-verify_fp8_conversion)             | Verify that FP8 conversion was successful by counting converted modules. |

### Data

[`HAVE_TORCHAO`](#nemo_automodel-components-quantization-fp8-HAVE_TORCHAO)

[`logger`](#nemo_automodel-components-quantization-fp8-logger)

### API

```python
class nemo_automodel.components.quantization.fp8.FP8Config(
    enabled: bool = False,
    recipe_name: typing.Optional[typing.Literal['tensorwise', 'rowwise', 'rowwise_with_gw_hp']] = None,
    enable_fsdp_float8_all_gather: bool = False,
    precompute_float8_dynamic_scale_for_fsdp: bool = False,
    force_recompute_fp8_weight_in_bwd: bool = False,
    filter_fqns: typing.List[str] = None,
    emulate: bool = False
)
```

Dataclass

Configuration for FP8 quantization settings.

* `filter_fqns`: List of fully qualified names (FQNs) of modules to exclude from float8 training. `nn.Linear` modules with any dimension not divisible by 16 are always skipped due to hardware requirements. Example: `["attention.wq", "attention.wk", "attention.wv", "lm_head"]`

```python
nemo_automodel.components.quantization.fp8.FP8Config.to_dict()
```
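
To make the field list concrete, the following is a minimal sketch built on a stand-in dataclass. `FP8ConfigSketch` is a hypothetical name that only mirrors the documented fields and defaults; it is not the library class, and the assumption that `to_dict()` returns a plain field-to-value mapping is not confirmed by this page.

```python
from dataclasses import asdict, dataclass
from typing import List, Literal, Optional


@dataclass
class FP8ConfigSketch:
    """Hypothetical stand-in that mirrors the documented FP8Config fields."""

    enabled: bool = False
    recipe_name: Optional[Literal["tensorwise", "rowwise", "rowwise_with_gw_hp"]] = None
    enable_fsdp_float8_all_gather: bool = False
    precompute_float8_dynamic_scale_for_fsdp: bool = False
    force_recompute_fp8_weight_in_bwd: bool = False
    filter_fqns: Optional[List[str]] = None
    emulate: bool = False

    def to_dict(self) -> dict:
        # Assumption: the real to_dict() yields a field-name -> value mapping.
        return asdict(self)


# Enable FP8 with the rowwise recipe and keep the output head in high precision.
cfg = FP8ConfigSketch(enabled=True, recipe_name="rowwise", filter_fqns=["lm_head"])
settings = cfg.to_dict()
```

With the real class, the same keyword arguments should apply, since they are taken from the signature above.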

```python
nemo_automodel.components.quantization.fp8._has_cuda_capability(
    major: int,
    minor: int
) -> bool
```

Check if CUDA device has required compute capability.
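
As an illustration of what such a check might look like, the sketch below compares a `(major, minor)` capability tuple against a required minimum. The helper name `meets_capability`, the "at least" semantics, and the `(8, 9)` threshold are assumptions made for this example; the page does not state how the private function is implemented.

```python
from typing import Tuple


def meets_capability(device_capability: Tuple[int, int], major: int, minor: int) -> bool:
    """Return True if the device capability is at least (major, minor)."""
    # Tuples compare lexicographically: major first, then minor.
    return tuple(device_capability) >= (major, minor)


# With PyTorch and a GPU available, the device tuple could be obtained from
# torch.cuda.get_device_capability(); literal tuples are used here so the
# sketch runs anywhere.
hopper_ok = meets_capability((9, 0), 8, 9)  # H100-class device
ampere_ok = meets_capability((8, 0), 8, 9)  # A100-class device
```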

```python
nemo_automodel.components.quantization.fp8._module_filter_fn(
    module,
    name,
    filter_fqns: typing.List[str] = None
)
```

Filter function to exclude certain modules from FP8 conversion.

**Parameters:**

* `module`: The module to check
* `name`: Fully qualified name of the module
* `filter_fqns`: List of FQNs to filter out

**Returns:**

`True` if the module should be converted to FP8, `False` otherwise
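
The filtering rules described on this page (skip listed FQNs, skip linear layers with a dimension not divisible by 16) can be sketched as follows. `LinearShape` and `should_convert` are hypothetical names, `LinearShape` stands in for `nn.Linear` so the example runs without PyTorch, and substring matching of FQNs is an assumption rather than confirmed behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinearShape:
    """Hypothetical stand-in for nn.Linear that only carries its dimensions."""

    in_features: int
    out_features: int


def should_convert(module, name: str, filter_fqns: Optional[List[str]] = None) -> bool:
    # Only linear-like modules are candidates in this sketch.
    if not isinstance(module, LinearShape):
        return False
    # Documented rule: any dimension not divisible by 16 is always skipped.
    if module.in_features % 16 != 0 or module.out_features % 16 != 0:
        return False
    # Assumption: a filter entry matches when it is a substring of the FQN.
    return not any(fqn in name for fqn in (filter_fqns or []))


skip = ["attention.wq", "lm_head"]
filtered = should_convert(LinearShape(4096, 4096), "layers.0.attention.wq", skip)
kept = should_convert(LinearShape(4096, 11008), "layers.0.mlp.w1", skip)
bad_shape = should_convert(LinearShape(4096, 1000), "layers.0.mlp.w2", skip)
not_linear = should_convert(object(), "layers.0.norm", skip)
```

Here only `kept` is `True`: the first module is excluded by name, the third by its shape, and the last because it is not a linear layer.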

```python
nemo_automodel.components.quantization.fp8.apply_fp8_to_model(
    model: torch.nn.Module,
    config: typing.Optional[nemo_automodel.components.quantization.fp8.FP8Config] = None,
    filter_fqns: typing.Optional[typing.List[str]] = None,
    recipe_name: typing.Optional[str] = None,
    force_recompute_fp8_weight_in_bwd: bool = False,
    enable_fsdp_float8_all_gather: bool = False,
    emulate: bool = False,
    enabled: bool = True,
    precompute_float8_dynamic_scale_for_fsdp: bool = False
) -> torch.nn.Module
```

Apply FP8 quantization to a PyTorch model using torchao.

This function can be called in two ways:

1. With an `FP8Config` object: `apply_fp8_to_model(model, config=fp8_config)`
2. With individual parameters: `apply_fp8_to_model(model, filter_fqns=..., recipe_name=..., etc.)`

**Parameters:**

* `model`: The model to convert
* `config`: `FP8Config` object containing all configuration. If provided, individual parameters are ignored.
* `filter_fqns`: List of module names to exclude from FP8 conversion
* `recipe_name`: Recipe name for FP8 configuration ("tensorwise", "rowwise", etc.)
* `force_recompute_fp8_weight_in_bwd`: Whether to force recompute FP8 weight in backward pass
* `enable_fsdp_float8_all_gather`: Whether to enable FSDP FP8 all-gather
* `emulate`: Use emulation instead of hardware acceleration (for testing on older GPUs)
* `enabled`: Whether FP8 quantization is enabled (only used when `config` is `None`)
* `precompute_float8_dynamic_scale_for_fsdp`: Whether to precompute float8 scales dynamically

**Returns:** `nn.Module`

The model with FP8 linear layers (modified in-place)

**Raises:**

* `ImportError`: If torchao is not installed
* `ValueError`: If hardware doesn't support FP8 and emulation is disabled
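
The precedence between `config` and the individual parameters can be illustrated with a small sketch. `resolve_settings` is a hypothetical helper that uses plain dictionaries in place of `FP8Config`; it only models the documented rule that a provided config causes individual arguments to be ignored, and says nothing about how the library performs the conversion itself.

```python
from typing import Any, Dict, Optional

# Defaults copied from the documented apply_fp8_to_model signature.
_DEFAULTS: Dict[str, Any] = {
    "filter_fqns": None,
    "recipe_name": None,
    "force_recompute_fp8_weight_in_bwd": False,
    "enable_fsdp_float8_all_gather": False,
    "emulate": False,
    "enabled": True,
    "precompute_float8_dynamic_scale_for_fsdp": False,
}


def resolve_settings(config: Optional[Dict[str, Any]] = None, **individual: Any) -> Dict[str, Any]:
    """Hypothetical helper: a config, when given, wins over individual arguments."""
    if config is not None:
        return dict(config)  # individual arguments are ignored
    unknown = set(individual) - set(_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")
    return {**_DEFAULTS, **individual}


# Calling style 1: the config takes precedence over recipe_name="tensorwise".
from_config = resolve_settings(
    config={"enabled": True, "recipe_name": "rowwise"}, recipe_name="tensorwise"
)
# Calling style 2: individual parameters fill in over the signature defaults.
from_kwargs = resolve_settings(recipe_name="tensorwise", emulate=True)
```

Note that the signatures on this page give `enabled` a default of `True` in `apply_fp8_to_model` but `False` in `FP8Config`, so the two calling styles start from different defaults.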

```python
nemo_automodel.components.quantization.fp8.build_fp8_config(
    cfg: typing.Optional[typing.Dict[str, typing.Any]]
) -> nemo_automodel.components.quantization.fp8.FP8Config
```

Build an FP8 config from configuration.

**Parameters:**

* `cfg`: Configuration dictionary for FP8 quantization.

**Returns:** `FP8Config`

FP8Config instance.

```python
nemo_automodel.components.quantization.fp8.create_fp8_config_from_dict(
    config_dict: typing.Dict[str, typing.Any]
) -> nemo_automodel.components.quantization.fp8.FP8Config
```

Create an FP8Config from a dictionary.

**Parameters:**

* `config_dict`: Dictionary containing FP8 configuration.

**Returns:** `FP8Config`

FP8Config instance.

```python
nemo_automodel.components.quantization.fp8.verify_fp8_conversion(
    model: torch.nn.Module
) -> dict
```

Verify that FP8 conversion was successful by counting converted modules.

**Parameters:**

* `model`: The model to verify

**Returns:** `dict`

Dict with conversion statistics
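
A rough sketch of counting converted modules is shown below. The classes `Linear` and `Float8Linear` are local stand-ins so the example runs without PyTorch or torchao, and the function name and the returned keys (`linear_count`, `fp8_count`, `success`) are hypothetical; the page does not specify the actual statistics returned.

```python
from typing import Dict, Iterable, Tuple


class Linear:
    """Stand-in for a regular linear layer."""


class Float8Linear(Linear):
    """Stand-in for a converted FP8 linear layer."""


def summarize_conversion(named_modules: Iterable[Tuple[str, object]]) -> Dict[str, object]:
    linear_count = 0
    fp8_count = 0
    for _name, module in named_modules:
        if isinstance(module, Linear):
            linear_count += 1
            # The converted class subclasses Linear here, so check it separately.
            if isinstance(module, Float8Linear):
                fp8_count += 1
    return {
        "linear_count": linear_count,
        "fp8_count": fp8_count,
        "success": fp8_count > 0,
    }


# With a real model, the pairs could come from model.named_modules().
modules = [
    ("layers.0.mlp.w1", Float8Linear()),
    ("layers.0.mlp.w2", Float8Linear()),
    ("lm_head", Linear()),
    ("layers.0.norm", object()),
]
stats = summarize_conversion(modules)
```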

```python
nemo_automodel.components.quantization.fp8.HAVE_TORCHAO = True
```

```python
nemo_automodel.components.quantization.fp8.logger = logging.getLogger(__name__)
```