nemo_automodel.components.optim.dion
Module Contents
Classes
Functions
Data
API
Structural type for the Dion-family optimizer configs that build_dion_optimizer reads.
Separate model parameters into groups for Dion/Muon optimizers.
Parameters:
The model to optimize.
Base learning rate for matrix params (Muon algorithm).
Optimizer algorithm for scalar params ("adamw" or "lion").
Weight decay for vector params.
(beta1, beta2) for scalar optimizer.
Epsilon for scalar optimizer.
Learning rate for scalar (vector/bias) params. Defaults to base_lr.
Learning rate for embedding params. Defaults to scalar_lr or base_lr.
Learning rate for lm_head. Defaults to base_lr / sqrt(d_in).
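The learning-rate defaults listed above can be sketched in plain Python. This is a minimal illustration, not the module's implementation: the helper name `resolve_group_lrs`, the returned keys, and the explicit `d_in` argument are hypothetical, and "defaults to scalar_lr or base_lr" is read here as "use scalar_lr when it is given, otherwise base_lr".

```python
import math
from typing import Dict, Optional


def resolve_group_lrs(
    base_lr: float,
    d_in: int,
    scalar_lr: Optional[float] = None,
    embed_lr: Optional[float] = None,
    lm_head_lr: Optional[float] = None,
) -> Dict[str, float]:
    """Hypothetical helper: apply the documented learning-rate defaults."""
    # Scalar (vector/bias) params fall back to the base learning rate.
    resolved_scalar = scalar_lr if scalar_lr is not None else base_lr
    # Embeddings fall back to scalar_lr when given, otherwise to base_lr.
    resolved_embed = embed_lr if embed_lr is not None else resolved_scalar
    # lm_head falls back to base_lr scaled by 1 / sqrt(d_in).
    resolved_head = lm_head_lr if lm_head_lr is not None else base_lr / math.sqrt(d_in)
    return {
        "matrix": base_lr,
        "scalar": resolved_scalar,
        "embed": resolved_embed,
        "lm_head": resolved_head,
    }


defaults = resolve_group_lrs(base_lr=0.02, d_in=1024)
with_scalar = resolve_group_lrs(base_lr=0.02, d_in=1024, scalar_lr=0.001)
```

With only `base_lr` set, every group except lm_head inherits it; setting `scalar_lr` alone also changes the embedding learning rate under the reading assumed here.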
Build the parameter groups and resolve the device mesh for a Dion-family optimizer.
This does not instantiate the optimizer; it returns (param_groups, mesh_kwargs) so that the caller (a typed config in
nemo_automodel.components.optim.optimizer) can assemble its own
constructor kwargs and instantiate the optimizer itself. mesh_kwargs is a
dict that maps mesh_kwarg to the resolved mesh (or is empty when there is
no mesh), ready to splat into the optimizer constructor.
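A caller might consume the returned pair roughly as follows. This is a sketch under stated assumptions: `make_mesh_kwargs` and `FakeOptimizer` are hypothetical stand-ins that only mirror the contract described above, and the string used as a mesh is a placeholder for a real DeviceMesh.

```python
from typing import Any, Dict, Optional


def make_mesh_kwargs(mesh_kwarg: Optional[str], mesh: Any) -> Dict[str, Any]:
    """Hypothetical helper mirroring the documented mesh_kwargs contract."""
    # Empty dict when there is no mesh, or when the caller passed None
    # as the keyword name to exclude the mesh entirely.
    if mesh_kwarg is None or mesh is None:
        return {}
    return {mesh_kwarg: mesh}


class FakeOptimizer:
    """Stand-in for a Dion-family optimizer constructor (illustration only)."""

    def __init__(self, param_groups, distributed_mesh=None, **kwargs):
        self.param_groups = param_groups
        self.distributed_mesh = distributed_mesh
        self.kwargs = kwargs


param_groups = [{"params": [], "lr": 0.02}]
mesh_kwargs = make_mesh_kwargs("distributed_mesh", "resolved-1d-mesh")

# The caller assembles its own constructor kwargs and splats mesh_kwargs in.
opt = FakeOptimizer(param_groups, **mesh_kwargs, weight_decay=0.01)
```

Returning a dict rather than the bare mesh lets the same call site serve optimizers whose constructors name the mesh argument differently, and an empty dict simply adds nothing to the call.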
The parameter-grouping settings are read from config: lr,
weight_decay, scalar_opt, scalar_betas, and scalar_eps are
required; scalar_lr, embed_lr, lm_head_lr, and
no_compile are optional.
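Reading those settings from a structurally typed config could look like the sketch below. The helper name `read_grouping_settings`, the use of None for unset optional fields, and the AttributeError on a missing required field are all assumptions made for illustration; the page does not state how build_dion_optimizer handles either case.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Field names taken from the description above.
REQUIRED = ("lr", "weight_decay", "scalar_opt", "scalar_betas", "scalar_eps")
OPTIONAL = ("scalar_lr", "embed_lr", "lm_head_lr", "no_compile")


def read_grouping_settings(config: Any) -> Dict[str, Any]:
    """Hypothetical helper: collect grouping settings from any config object."""
    missing = [name for name in REQUIRED if not hasattr(config, name)]
    if missing:
        # Assumed behavior: fail loudly when a required setting is absent.
        raise AttributeError(f"config is missing required settings: {missing}")
    settings = {name: getattr(config, name) for name in REQUIRED}
    for name in OPTIONAL:
        # Assumed default: None marks an optional setting as unset.
        settings[name] = getattr(config, name, None)
    return settings


config = SimpleNamespace(
    lr=0.02,
    weight_decay=0.01,
    scalar_opt="adamw",
    scalar_betas=(0.9, 0.95),
    scalar_eps=1e-8,
    embed_lr=0.001,
)
settings = read_grouping_settings(config)
```

Because only attribute names matter, any object that exposes these fields satisfies the helper, which is the point of a structural type.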
Parameters:
The Dion-family config (see _DionFamilyConfig) to read settings from.
Model whose parameters are to be optimized.
Optional DeviceMesh for FSDP/TP. When non-empty it is resolved to a 1-D Dion submesh.
Name of the constructor argument that receives the resolved
mesh ("distributed_mesh" for Muon/Dion2/NorMuon,
"outer_shard_mesh" for legacy Dion). Set to None to never
include the mesh.
Returns: list[dict[str, Any]]
A (param_groups, mesh_kwargs) tuple: the per-group parameter dicts and the dict of mesh keyword arguments for the optimizer constructor.
Return whether an optimizer factory targets a Dion-family optimizer.