nemo_automodel.components.optim.dion

Module Contents

Classes

| Name | Description |
| --- | --- |
| _DionFamilyConfig | Structural type for the dion-family optimizer configs build_dion_optimizer reads. |

Functions

| Name | Description |
| --- | --- |
| _get_dion_mesh | - |
| _separate_param_groups | Separate model parameters into groups for Dion/Muon optimizers. |
| build_dion_optimizer | Build the parameter groups and resolve the device mesh for a Dion-family optimizer. |
| is_dion_optimizer | Return whether an optimizer factory targets a Dion-family optimizer. |

Data

_import_error

logger

API

class nemo_automodel.components.optim.dion._DionFamilyConfig()
Protocol

Structural type for the dion-family optimizer configs build_dion_optimizer reads.

lr
float
scalar_betas
tuple[float, float]
scalar_eps
float
scalar_opt
str
weight_decay
float
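
Because the class is a Protocol, a config only needs to expose the listed attributes; it does not have to inherit from anything. The sketch below illustrates that structural matching with a local mirror of the protocol. Both `DionFamilyConfigLike` and `MyMuonConfig` are hypothetical names introduced for this example, and the default values are arbitrary placeholders, not library defaults.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class DionFamilyConfigLike(Protocol):
    """Hypothetical local mirror of the attributes documented above."""

    lr: float
    scalar_betas: tuple[float, float]
    scalar_eps: float
    scalar_opt: str
    weight_decay: float


@dataclass
class MyMuonConfig:
    """Hypothetical config; note that it does not subclass the protocol."""

    lr: float = 1e-3
    scalar_betas: tuple[float, float] = (0.9, 0.95)
    scalar_eps: float = 1e-8
    scalar_opt: str = "adamw"
    weight_decay: float = 0.01


cfg = MyMuonConfig()
# isinstance on a runtime-checkable protocol only verifies that the
# attributes exist; it does not validate their types.
matches = isinstance(cfg, DionFamilyConfigLike)
```

A static type checker would perform the stricter check (attribute names and types) wherever such a config is passed to a function annotated with the protocol.
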
nemo_automodel.components.optim.dion._get_dion_mesh(
device_mesh: typing.Any
) -> typing.Any
nemo_automodel.components.optim.dion._separate_param_groups(
model: torch.nn.Module,
base_lr: float,
scalar_opt: str,
weight_decay: float,
scalar_betas: tuple[float, float] | None = None,
scalar_eps: float | None = None,
scalar_lr: float | None = None,
embed_lr: float | None = None,
lm_head_lr: float | None = None
) -> list[dict[str, typing.Any]]

Separate model parameters into groups for Dion/Muon optimizers.

Parameters:

model
nn.Module

The model to optimize.

base_lr
float

Base learning rate for matrix params (Muon algorithm).

scalar_opt
str

Optimizer algorithm for scalar params (“adamw” or “lion”).

weight_decay
float

Weight decay for vector params.

scalar_betas
tuple[float, float] | None. Defaults to None.

(beta1, beta2) for scalar optimizer.

scalar_eps
float | None. Defaults to None.

Epsilon for scalar optimizer.

scalar_lr
float | None. Defaults to None.

Learning rate for scalar (vector/bias) params. Defaults to base_lr.

embed_lr
float | None. Defaults to None.

Learning rate for embedding params. Defaults to scalar_lr or base_lr.

lm_head_lr
float | None. Defaults to None.

Learning rate for lm_head. Defaults to base_lr / sqrt(d_in).
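
The learning-rate fallbacks described above can be sketched in plain Python. This is an illustration of the documented defaults, not the library's implementation: `resolve_group_lrs` is a hypothetical helper, `d_in` is assumed to be the input dimension of the lm_head weight, and "scalar_lr or base_lr" is read as "scalar_lr if given, otherwise base_lr".

```python
import math
from typing import Optional


def resolve_group_lrs(
    base_lr: float,
    d_in: int,
    scalar_lr: Optional[float] = None,
    embed_lr: Optional[float] = None,
    lm_head_lr: Optional[float] = None,
) -> dict[str, float]:
    """Hypothetical helper applying the documented learning-rate defaults."""
    # scalar_lr falls back to base_lr.
    resolved_scalar = base_lr if scalar_lr is None else scalar_lr
    # embed_lr falls back to scalar_lr, which itself falls back to base_lr.
    resolved_embed = resolved_scalar if embed_lr is None else embed_lr
    # lm_head_lr falls back to base_lr / sqrt(d_in).
    resolved_head = base_lr / math.sqrt(d_in) if lm_head_lr is None else lm_head_lr
    return {
        "matrix": base_lr,
        "scalar": resolved_scalar,
        "embed": resolved_embed,
        "lm_head": resolved_head,
    }


# With only base_lr set, the lm_head rate is scaled down by sqrt(4096) = 64.
lrs = resolve_group_lrs(base_lr=0.02, d_in=4096)
```
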

nemo_automodel.components.optim.dion.build_dion_optimizer(
config: '_DionFamilyConfig',
model: torch.nn.Module,
device_mesh: typing.Optional[typing.Any] = None,
mesh_kwarg: str | None = 'distributed_mesh'
) -> tuple[list[dict[str, typing.Any]], dict[str, typing.Any]]

Build the parameter groups and resolve the device mesh for a Dion-family optimizer.

This does not instantiate the optimizer; it returns (param_groups, mesh_kwargs) so the caller (a typed config in nemo_automodel.components.optim.optimizer) can assemble its own constructor kwargs and instantiate the optimizer itself. mesh_kwargs is a dict that maps mesh_kwarg to the resolved mesh (or is empty when there is no mesh), ready to splat into the optimizer constructor.

The parameter-grouping settings are read from config: the required lr, weight_decay, scalar_opt, scalar_betas, and scalar_eps, plus the optional scalar_lr, embed_lr, lm_head_lr, and no_compile.

Parameters:

config
'_DionFamilyConfig'

The dion-family config (see _DionFamilyConfig) to read settings from.

model
nn.Module

Model whose parameters are to be optimized.

device_mesh
Optional[Any]. Defaults to None.

Optional DeviceMesh for FSDP/TP. When non-empty it is resolved to a 1-D Dion submesh.

mesh_kwarg
str | None. Defaults to 'distributed_mesh'.

Name of the constructor argument that receives the resolved mesh ("distributed_mesh" for Muon/Dion2/NorMuon, "outer_shard_mesh" for legacy Dion). Set to None to never include the mesh.

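Returns: tuple[list[dict[str, Any]], dict[str, Any]]

A (param_groups, mesh_kwargs) tuple: the per-group parameter dicts and the dict that maps mesh_kwarg to the resolved mesh (empty when there is no mesh).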
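
The mesh_kwargs contract is easiest to see from the caller's side. The sketch below is a stand-in, not the library's code: `make_mesh_kwargs` and `FakeOptimizer` are hypothetical, the mesh is a placeholder string rather than a real DeviceMesh, and the 1-D submesh resolution step is deliberately left out.

```python
from typing import Any, Optional


def make_mesh_kwargs(mesh: Optional[Any], mesh_kwarg: Optional[str]) -> dict[str, Any]:
    """Hypothetical helper mirroring the documented mesh_kwargs contract."""
    # No mesh, or mesh_kwarg=None, means the mesh is never passed along.
    if mesh is None or mesh_kwarg is None:
        return {}
    return {mesh_kwarg: mesh}


class FakeOptimizer:
    """Stand-in for a Dion-family optimizer class; not a real API."""

    def __init__(self, param_groups: list, lr: float, distributed_mesh: Any = None):
        self.param_groups = param_groups
        self.lr = lr
        self.distributed_mesh = distributed_mesh


# Placeholder values standing in for what build_dion_optimizer would return.
param_groups = [{"params": [], "lr": 0.02}]
mesh_kwargs = make_mesh_kwargs("mesh-placeholder", "distributed_mesh")

# The caller assembles its own kwargs and splats mesh_kwargs into the constructor.
opt = FakeOptimizer(param_groups, lr=0.02, **mesh_kwargs)
```

Returning a dict keyed by mesh_kwarg lets the same call site serve optimizers whose constructors name the mesh argument differently, and an empty dict leaves the constructor's own default in place.
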

nemo_automodel.components.optim.dion.is_dion_optimizer(
optimizer_factory: typing.Any
) -> bool

Return whether an optimizer factory targets a Dion-family optimizer.

nemo_automodel.components.optim.dion._import_error: Exception | None = None
nemo_automodel.components.optim.dion.logger = logging.getLogger(__name__)