nemo_automodel.components.optim.utils#
Module Contents#
Functions#
| Function | Description |
|---|---|
| `_separate_param_groups` | Separate model parameters into groups for Dion/Muon optimizers. |
| `build_dion_optimizer` | Build a Dion-family optimizer with parameter grouping. |
Data#
API#
- nemo_automodel.components.optim.utils._import_error: Exception | None#
None
- nemo_automodel.components.optim.utils.logger#
'getLogger(…)'
- nemo_automodel.components.optim.utils.is_dion_optimizer(cfg_opt) → bool#
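A minimal sketch of what a check like `is_dion_optimizer` could look like. This is an assumption about the implementation, not the actual nemo_automodel code: the `_target_` attribute name and the substring match against "dion"/"muon" are illustrative stand-ins.

```python
# Hypothetical sketch: treat a config as Dion-family if its target
# identifier mentions a known Dion/Muon optimizer name.
DION_FAMILY = ("dion", "muon")


def is_dion_optimizer_sketch(cfg_opt) -> bool:
    """Return True if the config appears to target a Dion/Muon optimizer."""
    # "_target_" is an assumed attribute holding the optimizer class path.
    target = str(getattr(cfg_opt, "_target_", "")).lower()
    return any(name in target for name in DION_FAMILY)
```

A caller would pass the optimizer config node and branch to `build_dion_optimizer` when this returns `True`.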
- nemo_automodel.components.optim.utils._separate_param_groups(
- model: torch.nn.Module,
- base_lr: float,
- scalar_opt: str,
- weight_decay: float,
- scalar_betas: tuple[float, float] | None = None,
- scalar_eps: float | None = None,
- scalar_lr: float | None = None,
- embed_lr: float | None = None,
- lm_head_lr: float | None = None,
- )
Separate model parameters into groups for Dion/Muon optimizers.
- Parameters:
model – The model to optimize.
base_lr – Base learning rate for matrix params (Muon algorithm).
scalar_opt – Optimizer algorithm for scalar params ("adamw" or "lion").
weight_decay – Weight decay for vector params.
scalar_betas – (beta1, beta2) for scalar optimizer.
scalar_eps – Epsilon for scalar optimizer.
scalar_lr – Learning rate for scalar (vector/bias) params. Defaults to base_lr.
embed_lr – Learning rate for embedding params. Defaults to scalar_lr or base_lr.
lm_head_lr – Learning rate for lm_head. Defaults to base_lr / sqrt(d_in).
- nemo_automodel.components.optim.utils._get_dion_mesh(distributed_mesh: Any) → Any#
- nemo_automodel.components.optim.utils.build_dion_optimizer(
- cfg_opt,
- model: torch.nn.Module,
- distributed_mesh: Optional[Any] = None,
- process_group: Optional[Any] = None,
- )
Build a Dion-family optimizer with parameter grouping.
- Parameters:
cfg_opt – ConfigNode for the optimizer.
model – Model whose parameters are to be optimized.
distributed_mesh – Optional DeviceMesh for FSDP/TP.
process_group – Optional ProcessGroup for DDP.
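To show how the builder fits together conceptually, here is a hedged sketch of the flow: validate the config, then construct the optimizer with the parameter groups and an optional device mesh. `DionStub`, the dict-based config, and the validation rule are all illustrative placeholders, not the real Dion or nemo_automodel APIs.

```python
class DionStub:
    """Placeholder standing in for a Dion/Muon optimizer class."""

    def __init__(self, param_groups, distributed_mesh=None):
        self.param_groups = param_groups
        self.distributed_mesh = distributed_mesh


def build_dion_optimizer_sketch(cfg_opt, param_groups, distributed_mesh=None):
    """Reject non-Dion configs, then build the optimizer from groups."""
    name = cfg_opt["name"].lower()
    if not any(k in name for k in ("dion", "muon")):
        raise ValueError(f"not a Dion-family optimizer: {cfg_opt['name']}")
    return DionStub(param_groups, distributed_mesh=distributed_mesh)
```

In the real function, the parameter groups would come from `_separate_param_groups` and the mesh from `_get_dion_mesh`, with the learning rates and scalar-optimizer settings read off `cfg_opt`.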