# nemo_automodel.components.utils.compile_utils

## Module Contents

### Classes

| Name                                                                            | Description                      |
| ------------------------------------------------------------------------------- | -------------------------------- |
| [`CompileConfig`](#nemo_automodel-components-utils-compile_utils-CompileConfig) | Configuration for torch.compile. |

### Functions

| Name                                                                                                                        | Description                                                                                             |
| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| [`apply_flash_attention_compile_fix`](#nemo_automodel-components-utils-compile_utils-apply_flash_attention_compile_fix)     | Apply the Flash Attention + torch.compile compatibility fix.                                            |
| [`build_compile_config`](#nemo_automodel-components-utils-compile_utils-build_compile_config)                               | Build a compile config from configuration.                                                              |
| [`compile_model`](#nemo_automodel-components-utils-compile_utils-compile_model)                                             | Compile the model with Flash Attention compatibility.                                                   |
| [`configure_torch_dynamo`](#nemo_automodel-components-utils-compile_utils-configure_torch_dynamo)                           | Configure torch.\_dynamo settings for compilation.                                                      |
| [`create_compile_config_from_dict`](#nemo_automodel-components-utils-compile_utils-create_compile_config_from_dict)         | Create a CompileConfig from a dictionary.                                                               |
| [`enable_torch_dynamo_scalar_outputs`](#nemo_automodel-components-utils-compile_utils-enable_torch_dynamo_scalar_outputs)   | Enable torch.\_dynamo to capture scalar outputs for better Flash Attention + torch.compile compatibility. |
| [`patch_prepare_fa2_from_position_ids`](#nemo_automodel-components-utils-compile_utils-patch_prepare_fa2_from_position_ids) | Apply a simple targeted patch to fix the prepare\_fa2\_from\_position\_ids function for torch.compile compatibility. |

### Data

[`_FLASH_ATTENTION_FIX_APPLIED`](#nemo_automodel-components-utils-compile_utils-_FLASH_ATTENTION_FIX_APPLIED)

[`logger`](#nemo_automodel-components-utils-compile_utils-logger)

### API

```python
class nemo_automodel.components.utils.compile_utils.CompileConfig(
    enabled: bool = False,
    mode: str = 'default',
    fullgraph: bool = False,
    dynamic: bool = False,
    backend: typing.Optional[str] = None,
    options: typing.Optional[typing.Dict[str, typing.Any]] = None,
    dynamo_cache_size_limit: int = 256
)
```

Dataclass

Configuration for torch.compile.

```python
nemo_automodel.components.utils.compile_utils.CompileConfig.to_dict() -> typing.Dict[str, typing.Any]
```

Convert to dictionary.
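
To illustrate the documented fields and defaults, the sketch below defines a stand-in dataclass with the same constructor signature. `CompileConfigSketch` is a hypothetical name, and implementing `to_dict()` with `dataclasses.asdict` is an assumption; the library's own implementation may differ.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class CompileConfigSketch:
    """Stand-in mirroring the documented CompileConfig fields (not the library class)."""

    enabled: bool = False
    mode: str = "default"
    fullgraph: bool = False
    dynamic: bool = False
    backend: Optional[str] = None
    options: Optional[Dict[str, Any]] = None
    dynamo_cache_size_limit: int = 256

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: a plain mapping from field name to value.
        return asdict(self)


# Override two fields and keep the documented defaults for the rest.
cfg = CompileConfigSketch(enabled=True, mode="max-autotune")
as_dict = cfg.to_dict()

# A dict with exactly these keys can be fed back into the constructor.
restored = CompileConfigSketch(**as_dict)
```

If the real class behaves the same way, the dictionary form should be convenient for logging or for serializing the compile settings alongside a training configuration.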

```python
nemo_automodel.components.utils.compile_utils.apply_flash_attention_compile_fix()
```

Apply the Flash Attention + torch.compile compatibility fix.

This enables scalar output capture and patches the function that causes problems under torch.compile.
Note: This function handles Flash Attention compatibility only.
For dynamo configuration (cache size, etc.), call configure\_torch\_dynamo() separately.

```python
nemo_automodel.components.utils.compile_utils.build_compile_config(
    cfg: typing.Optional[typing.Dict[str, typing.Any]]
) -> nemo_automodel.components.utils.compile_utils.CompileConfig
```

Build a compile config from configuration.

**Parameters:**

`cfg`: Configuration dictionary for compilation. The signature also accepts `None`.

**Returns:** `CompileConfig`

CompileConfig instance.

```python
nemo_automodel.components.utils.compile_utils.compile_model(
    model: torch.nn.Module,
    config: nemo_automodel.components.utils.compile_utils.CompileConfig
) -> torch.nn.Module
```

Compile the model with Flash Attention compatibility.

**Parameters:**

`model`: The model to compile.

`config`: Compile configuration.

**Returns:** `torch.nn.Module`

The compiled model.
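
One plausible way to map the configuration onto `torch.compile` is sketched below. `torch_compile_kwargs` and `maybe_compile` are hypothetical helpers, not part of this module, and the sketch leaves out the Flash Attention fix. Dropping `backend=None` and choosing between `mode` and `options` follows `torch.compile`'s own rules (it rejects a call that sets both `mode` and `options`); whether `compile_model` resolves them the same way is an assumption.

```python
from typing import Any, Dict


def torch_compile_kwargs(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a CompileConfig-style dict into torch.compile keyword arguments."""
    kwargs: Dict[str, Any] = {
        "fullgraph": cfg.get("fullgraph", False),
        "dynamic": cfg.get("dynamic", False),
    }
    # Only pass a backend when one is set, so torch.compile keeps its default.
    if cfg.get("backend") is not None:
        kwargs["backend"] = cfg["backend"]
    # torch.compile accepts either `mode` or `options`, not both.
    if cfg.get("options"):
        kwargs["options"] = cfg["options"]
    else:
        kwargs["mode"] = cfg.get("mode", "default")
    return kwargs


def maybe_compile(model, cfg: Dict[str, Any]):
    """Return the model unchanged when compilation is disabled (assumed behavior)."""
    if not cfg.get("enabled", False):
        return model
    import torch  # imported lazily so the helper above runs without PyTorch

    return torch.compile(model, **torch_compile_kwargs(cfg))


mode_kwargs = torch_compile_kwargs({"enabled": True, "mode": "reduce-overhead", "backend": None})
options_kwargs = torch_compile_kwargs({"enabled": True, "backend": "inductor", "options": {"max_autotune": True}})
```

Keeping the keyword translation separate from the call to `torch.compile` makes the mapping easy to inspect or unit-test without a GPU or a PyTorch install.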

```python
nemo_automodel.components.utils.compile_utils.configure_torch_dynamo(
    cache_size_limit: int = 256,
    capture_scalar_outputs: bool = True
)
```

Configure torch.\_dynamo settings for compilation.

**Parameters:**

`cache_size_limit`: Cache size limit for dynamo compilation.

`capture_scalar_outputs`: Whether to capture scalar outputs for Flash Attention compatibility.
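
The two parameters share their names with settings on `torch._dynamo.config`. A minimal sketch, assuming the function simply assigns those two attributes: `apply_dynamo_settings` is a hypothetical helper, and it takes the config object as an argument so that it can be demonstrated on a stand-in namespace without PyTorch installed.

```python
from types import SimpleNamespace


def apply_dynamo_settings(
    dynamo_config,
    cache_size_limit: int = 256,
    capture_scalar_outputs: bool = True,
) -> None:
    """Assign the two documented settings on a torch._dynamo.config-like object."""
    dynamo_config.cache_size_limit = cache_size_limit
    dynamo_config.capture_scalar_outputs = capture_scalar_outputs


# Stand-in for torch._dynamo.config with arbitrary starting values.
fake_config = SimpleNamespace(cache_size_limit=8, capture_scalar_outputs=False)
apply_dynamo_settings(fake_config)

# With PyTorch installed, the real target would be:
#   import torch._dynamo
#   apply_dynamo_settings(torch._dynamo.config, cache_size_limit=256)
```

A higher cache size limit lets dynamo keep more compiled variants of a function before it stops recompiling, which can matter when input shapes vary between steps.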

```python
nemo_automodel.components.utils.compile_utils.create_compile_config_from_dict(
    config_dict: typing.Dict[str, typing.Any]
) -> nemo_automodel.components.utils.compile_utils.CompileConfig
```

Create a CompileConfig from a dictionary.

**Parameters:**

`config_dict`: Dictionary containing compile configuration.

**Returns:** `CompileConfig`

CompileConfig instance.

```python
nemo_automodel.components.utils.compile_utils.enable_torch_dynamo_scalar_outputs()
```

Enable torch.\_dynamo to capture scalar outputs for better Flash Attention + torch.compile compatibility.

```python
nemo_automodel.components.utils.compile_utils.patch_prepare_fa2_from_position_ids()
```

Apply a simple targeted patch that makes the prepare\_fa2\_from\_position\_ids function
compatible with torch.compile.

The patch targets the function's max\_length computation, which is the part that needs the fix.

```python
nemo_automodel.components.utils.compile_utils._FLASH_ATTENTION_FIX_APPLIED = apply_flash_attention_compile_fix()
```

```python
nemo_automodel.components.utils.compile_utils.logger = logging.getLogger(__name__)
```