# nemo_automodel.components.optim.dion

## Module Contents

### Classes

| Name                                                                           | Description                                                                         |
| ------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------- |
| [`_DionFamilyConfig`](#nemo_automodel-components-optim-dion-_DionFamilyConfig) | Structural type for the Dion-family optimizer configs that `build_dion_optimizer` reads. |

### Functions

| Name                                                                                     | Description                                                              |
| ---------------------------------------------------------------------------------------- | ------------------------------------------------------------------------ |
| [`_get_dion_mesh`](#nemo_automodel-components-optim-dion-_get_dion_mesh)                 | -                                                                        |
| [`_separate_param_groups`](#nemo_automodel-components-optim-dion-_separate_param_groups) | Separate model parameters into groups for Dion/Muon optimizers.          |
| [`build_dion_optimizer`](#nemo_automodel-components-optim-dion-build_dion_optimizer)     | Build the parameter groups and resolve the device mesh for a Dion-family optimizer. |
| [`is_dion_optimizer`](#nemo_automodel-components-optim-dion-is_dion_optimizer)           | Return whether an optimizer factory targets a Dion-family optimizer.     |

### Data

[`_import_error`](#nemo_automodel-components-optim-dion-_import_error)

[`logger`](#nemo_automodel-components-optim-dion-logger)

### API

```python
class nemo_automodel.components.optim.dion._DionFamilyConfig()
```

Bases: `Protocol`

Structural type for the Dion-family optimizer configs that `build_dion_optimizer` reads.
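As a rough illustration of what a structural type of this kind could look like, the sketch below models the required settings that `build_dion_optimizer` is documented to read (`lr`, `weight_decay`, `scalar_opt`, `scalar_betas`, `scalar_eps`). The names `DionFamilyConfigSketch`, `ExampleMuonConfig`, and `describe` are hypothetical, and the attribute types are assumptions borrowed from the `_separate_param_groups` signature; this is not the library's actual definition.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Tuple, runtime_checkable


@runtime_checkable
class DionFamilyConfigSketch(Protocol):
    """Hypothetical stand-in for the real protocol; attribute types are assumptions."""

    lr: float
    weight_decay: float
    scalar_opt: str
    scalar_betas: Optional[Tuple[float, float]]
    scalar_eps: Optional[float]


@dataclass
class ExampleMuonConfig:
    """Illustrative config. It never inherits from the protocol."""

    lr: float = 1e-3
    weight_decay: float = 0.01
    scalar_opt: str = "adamw"
    scalar_betas: Optional[Tuple[float, float]] = (0.9, 0.95)
    scalar_eps: Optional[float] = 1e-8
    scalar_lr: Optional[float] = None  # one of the optional settings


def describe(config: DionFamilyConfigSketch) -> str:
    # Optional settings may be absent, so read them with a default.
    scalar_lr = getattr(config, "scalar_lr", None)
    if scalar_lr is None:
        scalar_lr = config.lr
    return f"{config.scalar_opt} lr={config.lr} scalar_lr={scalar_lr}"


cfg = ExampleMuonConfig()
# Structural typing: the check passes because the attributes exist,
# not because of any inheritance relationship.
matches = isinstance(cfg, DionFamilyConfigSketch)
summary = describe(cfg)
```

Because the match is structural, any config object that exposes these attributes satisfies the type without subclassing it.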

```python
nemo_automodel.components.optim.dion._get_dion_mesh(
    device_mesh: typing.Any
) -> typing.Any
```

```python
nemo_automodel.components.optim.dion._separate_param_groups(
    model: torch.nn.Module,
    base_lr: float,
    scalar_opt: str,
    weight_decay: float,
    scalar_betas: tuple[float, float] | None = None,
    scalar_eps: float | None = None,
    scalar_lr: float | None = None,
    embed_lr: float | None = None,
    lm_head_lr: float | None = None
) -> list[dict[str, typing.Any]]
```

Separate model parameters into groups for Dion/Muon optimizers.

**Parameters:**

- `model`: The model to optimize.
- `base_lr`: Base learning rate for matrix params (Muon algorithm).
- `scalar_opt`: Optimizer algorithm for scalar params ("adamw" or "lion").
- `weight_decay`: Weight decay for vector params.
- `scalar_betas`: (beta1, beta2) for scalar optimizer.
- `scalar_eps`: Epsilon for scalar optimizer.
- `scalar_lr`: Learning rate for scalar (vector/bias) params. Defaults to `base_lr`.
- `embed_lr`: Learning rate for embedding params. Defaults to `scalar_lr` or `base_lr`.
- `lm_head_lr`: Learning rate for `lm_head`. Defaults to `base_lr / sqrt(d_in)`.
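The learning-rate defaults described above can be worked through with a small sketch. The helper `resolve_group_lrs` is hypothetical and is not part of the module. It assumes that `d_in` is the input dimension of the `lm_head` layer and that "defaults to `scalar_lr` or `base_lr`" means "use `scalar_lr` when it is set, otherwise `base_lr`".

```python
import math
from typing import Dict, Optional


def resolve_group_lrs(
    base_lr: float,
    d_in: int,  # assumed: input dimension of the lm_head layer
    scalar_lr: Optional[float] = None,
    embed_lr: Optional[float] = None,
    lm_head_lr: Optional[float] = None,
) -> Dict[str, float]:
    """Hypothetical helper that applies the documented defaults."""
    # Scalar (vector/bias) params fall back to the base learning rate.
    scalar = base_lr if scalar_lr is None else scalar_lr
    # Embedding params fall back to scalar_lr, which itself falls back to base_lr.
    embed = scalar if embed_lr is None else embed_lr
    # lm_head falls back to base_lr / sqrt(d_in).
    head = base_lr / math.sqrt(d_in) if lm_head_lr is None else lm_head_lr
    return {"matrix": base_lr, "scalar": scalar, "embed": embed, "lm_head": head}


lrs = resolve_group_lrs(base_lr=0.02, d_in=1024)
```

With `base_lr=0.02` and `d_in=1024`, the `lm_head` default works out to `0.02 / 32 = 0.000625`, while the scalar and embedding groups inherit `0.02`.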

```python
nemo_automodel.components.optim.dion.build_dion_optimizer(
    config: '_DionFamilyConfig',
    model: torch.nn.Module,
    device_mesh: typing.Optional[typing.Any] = None,
    mesh_kwarg: str | None = 'distributed_mesh'
) -> tuple[list[dict[str, typing.Any]], dict[str, typing.Any]]
```

Build the parameter groups and resolve the device mesh for a Dion-family
optimizer.

This does not instantiate the optimizer; it returns `(param_groups, mesh_kwargs)`
so the caller (a typed config in `nemo_automodel.components.optim.optimizer`)
can assemble its own constructor kwargs and instantiate the optimizer itself.
`mesh_kwargs` is a dict that maps `mesh_kwarg` to the resolved mesh (or is
empty when there is no mesh), ready to unpack into the optimizer constructor.

The parameter-grouping settings are read off `config`: `lr`,
`weight_decay`, `scalar_opt`, `scalar_betas`, `scalar_eps`
(required), and the optional `scalar_lr`, `embed_lr`, `lm_head_lr` and
`no_compile`.

**Parameters:**

- `config`: The Dion-family config (see `_DionFamilyConfig`) to read settings from.
- `model`: Model whose parameters are to be optimized.
- `device_mesh`: Optional DeviceMesh for FSDP/TP. When non-empty it is resolved to a 1-D Dion submesh.
- `mesh_kwarg`: Name of the constructor argument that receives the resolved mesh (`"distributed_mesh"` for Muon/Dion2/NorMuon, `"outer_shard_mesh"` for legacy Dion). Set to `None` to never include the mesh.

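**Returns:** `tuple[list[dict[str, Any]], dict[str, Any]]`

A `(param_groups, mesh_kwargs)` tuple: the per-group parameter dicts and a
dict that maps `mesh_kwarg` to the resolved mesh (empty when there is no mesh).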
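The `(param_groups, mesh_kwargs)` contract can be illustrated without the real library. In the sketch below, `FakeOptimizer` and `make_mesh_kwargs` are hypothetical stand-ins, and the mesh is a plain string placeholder. The sketch only shows how an empty-or-single-entry dict lets the caller unpack the mesh into whichever constructor argument the optimizer expects; it does not reproduce how the module resolves the mesh.

```python
from typing import Any, Dict, List, Optional


class FakeOptimizer:
    """Hypothetical optimizer that accepts its mesh as `distributed_mesh`."""

    def __init__(
        self,
        param_groups: List[Dict[str, Any]],
        distributed_mesh: Optional[Any] = None,
    ) -> None:
        self.param_groups = param_groups
        self.distributed_mesh = distributed_mesh


def make_mesh_kwargs(mesh: Optional[Any], mesh_kwarg: Optional[str]) -> Dict[str, Any]:
    """Hypothetical helper mirroring the documented shape of `mesh_kwargs`."""
    # Empty dict when there is no mesh or the caller opted out with None.
    if mesh is None or mesh_kwarg is None:
        return {}
    return {mesh_kwarg: mesh}


# Placeholder values standing in for real parameter groups and a real mesh.
param_groups = [{"params": [], "lr": 0.02}]
mesh_kwargs = make_mesh_kwargs("resolved-1d-mesh", "distributed_mesh")

# The caller assembles its own constructor kwargs and instantiates the optimizer.
optimizer = FakeOptimizer(param_groups, **mesh_kwargs)

# Without a mesh the same call site works unchanged.
optimizer_no_mesh = FakeOptimizer(param_groups, **make_mesh_kwargs(None, "distributed_mesh"))
```

Returning a dict rather than the bare mesh keeps the call site identical whether or not a mesh exists, and lets the argument name vary per optimizer.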

```python
nemo_automodel.components.optim.dion.is_dion_optimizer(
    optimizer_factory: typing.Any
) -> bool
```

Return whether an optimizer factory targets a Dion-family optimizer.

```python
nemo_automodel.components.optim.dion._import_error: Exception | None = None
```

```python
nemo_automodel.components.optim.dion.logger = logging.getLogger(__name__)
```