nemo_automodel.optim.scheduler#

Learning rate decay and weight decay increment functions.

Module Contents#

Classes#

OptimizerParamScheduler

Anneals learning rate and weight decay.

Data#

API#

nemo_automodel.optim.scheduler.logger#

'getLogger(…)'

class nemo_automodel.optim.scheduler.OptimizerParamScheduler(
optimizer: torch.optim.optimizer.Optimizer,
init_lr: float,
max_lr: float,
min_lr: float,
lr_warmup_steps: int,
lr_decay_steps: int,
lr_decay_style: str,
start_wd: float,
end_wd: float,
wd_incr_steps: int,
wd_incr_style: str,
use_checkpoint_opt_param_scheduler: Optional[bool] = True,
override_opt_param_scheduler: Optional[bool] = False,
wsd_decay_steps: Optional[int] = None,
lr_wsd_decay_style: Optional[str] = None,
)[source]#

Anneals learning rate and weight decay.

Parameters:
  • optimizer (Optimizer) – the optimizer to be used

  • init_lr (float) – initial learning rate

  • max_lr (float) – maximum learning rate

  • min_lr (float) – minimum learning rate

  • lr_warmup_steps (int) – number of warmup steps

  • lr_decay_steps (int) – number of decay steps

  • lr_decay_style (str) – decay style for learning rate

  • start_wd (float) – initial weight decay

  • end_wd (float) – final weight decay

  • wd_incr_steps (int) – number of weight decay increment steps

  • wd_incr_style (str) – weight decay increment style

  • use_checkpoint_opt_param_scheduler (bool, optional) – whether to use the checkpoint values for the optimizer param scheduler. Defaults to True.

  • override_opt_param_scheduler (bool, optional) – whether to override the optimizer param scheduler values with the class values. Defaults to False.

  • wsd_decay_steps (int, optional) – number of decay steps for the WSD (warmup-stable-decay) learning rate schedule. Defaults to None.

  • lr_wsd_decay_style (str, optional) – decay style for the learning rate during the WSD decay phase. Defaults to None.

Initialization

Constructor for OptimizerParamScheduler; it accepts the parameters listed above.

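A minimal usage sketch, assuming a standard torch optimizer; the schedule values and the style strings ("cosine", "linear") below are illustrative assumptions, not prescribed defaults.

```python
import torch

from nemo_automodel.optim.scheduler import OptimizerParamScheduler

model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

scheduler = OptimizerParamScheduler(
    optimizer=optimizer,
    init_lr=0.0,              # warmup starts from zero
    max_lr=1e-3,              # peak learning rate reached after warmup
    min_lr=1e-5,              # floor reached at the end of decay
    lr_warmup_steps=100,
    lr_decay_steps=1000,
    lr_decay_style="cosine",  # assumed style string
    start_wd=0.0,
    end_wd=0.1,
    wd_incr_steps=1000,
    wd_incr_style="linear",   # assumed style string
)
```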

get_wd() float[source]#

Compute the current weight decay value according to the weight decay increment schedule.

get_lr(param_group: dict) float[source]#

Learning rate decay functions from: https://openreview.net/pdf?id=BJYwwY9ll pg. 4.

Parameters:

param_group (dict) – parameter group from the optimizer.
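For orientation, here is a hedged sketch of what a linear-warmup plus cosine-decay schedule of this shape typically computes; the exact branches used by get_lr depend on lr_decay_style and the referenced paper, and are not reproduced here.

```python
import math

def cosine_lr_with_warmup(step, init_lr, max_lr, min_lr, warmup_steps, decay_steps):
    # Linear warmup from init_lr up to max_lr.
    if step <= warmup_steps:
        return init_lr + (max_lr - init_lr) * step / max(warmup_steps, 1)
    # Cosine anneal from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(decay_steps - warmup_steps, 1)
    progress = min(progress, 1.0)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)
```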

step(increment: int) None[source]#

Set the learning rate for all parameter groups.

Parameters:

increment (int) – number of steps to increment
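A hedged sketch of how step is typically called in a training loop, reusing the optimizer and scheduler from the constructor sketch above; the forward pass is a placeholder and increment is usually the number of optimizer steps taken since the last call (often 1).

```python
for step_idx in range(1000):
    loss = model(torch.randn(8, 16)).sum()  # placeholder forward pass
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step(increment=1)  # advance the LR/WD schedules by one optimizer step
```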

state_dict() dict[source]#

Return the state dict.

_check_and_set(cls_value: float, sd_value: float, name: str) float[source]#

Auxiliary function for checking the values in the checkpoint and setting them.

Parameters:
  • cls_value (float) – class value

  • sd_value (float) – checkpoint value

  • name (str) – name of the parameter

load_state_dict(state_dict: dict) None[source]#

Load the state dict.

Parameters:

state_dict (dict) – state dict to be loaded
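A minimal sketch of checkpointing the scheduler alongside the optimizer with state_dict and load_state_dict; the torch.save/torch.load usage and the file name are illustrative assumptions.

```python
import torch

# Save scheduler state alongside the optimizer state.
torch.save(
    {"optimizer": optimizer.state_dict(), "scheduler": scheduler.state_dict()},
    "checkpoint.pt",
)

# Later, restore both before resuming training.
ckpt = torch.load("checkpoint.pt")
optimizer.load_state_dict(ckpt["optimizer"])
scheduler.load_state_dict(ckpt["scheduler"])
```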