nemo_automodel.optim.scheduler#

Learning rate decay and weight decay increment functions.

Module Contents#

Classes#

OptimizerParamScheduler

Anneals learning rate and weight decay.

Data#

API#

nemo_automodel.optim.scheduler.logger#

'getLogger(…)'

class nemo_automodel.optim.scheduler.OptimizerParamScheduler(
optimizer: torch.optim.optimizer.Optimizer,
init_lr: float,
max_lr: float,
min_lr: float,
lr_warmup_steps: int,
lr_decay_steps: int,
lr_decay_style: str,
start_wd: float,
end_wd: float,
wd_incr_steps: int,
wd_incr_style: str,
use_checkpoint_opt_param_scheduler: Optional[bool] = True,
override_opt_param_scheduler: Optional[bool] = False,
wsd_decay_steps: Optional[int] = None,
lr_wsd_decay_style: Optional[str] = None,
)[source]#

Anneals learning rate and weight decay.

Parameters:
  • optimizer (Optimizer) – the optimizer to be used

  • init_lr (float) – initial learning rate

  • max_lr (float) – maximum learning rate

  • min_lr (float) – minimum learning rate

  • lr_warmup_steps (int) – number of warmup steps

  • lr_decay_steps (int) – number of decay steps

  • lr_decay_style (str) – decay style for learning rate

  • start_wd (float) – initial weight decay

  • end_wd (float) – final weight decay

  • wd_incr_steps (int) – number of weight decay increment steps

  • wd_incr_style (str) – weight decay increment style

  • use_checkpoint_opt_param_scheduler (bool, optional) – whether to use the checkpoint values for the optimizer param scheduler. Defaults to True.

  • override_opt_param_scheduler (bool, optional) – whether to override the optimizer param scheduler values with the class values. Defaults to False.

  • wsd_decay_steps (int, optional) – number of decay steps for the WSD (warmup-stable-decay) learning rate schedule. Defaults to None.

  • lr_wsd_decay_style (str, optional) – decay style for the learning rate during the WSD decay phase. Defaults to None.

Initialization

Constructor for OptimizerParamScheduler; it accepts the parameters listed above.

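A minimal usage sketch, assuming a standard torch optimizer; the schedule values and the style strings ("cosine", "linear") below are illustrative assumptions, not prescribed defaults.

```python
import torch

from nemo_automodel.optim.scheduler import OptimizerParamScheduler

model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)

scheduler = OptimizerParamScheduler(
    optimizer=optimizer,
    init_lr=0.0,              # warmup starts from zero
    max_lr=1e-3,              # peak learning rate reached after warmup
    min_lr=1e-5,              # floor reached at the end of decay
    lr_warmup_steps=100,
    lr_decay_steps=1000,
    lr_decay_style="cosine",  # assumed style string
    start_wd=0.0,
    end_wd=0.1,
    wd_incr_steps=1000,
    wd_incr_style="linear",   # assumed style string
)
```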

get_wd() float[source]#

Compute the current weight decay value according to the weight decay increment schedule.

get_lr(param_group: dict) float[source]#

Learning rate decay functions from: https://openreview.net/pdf?id=BJYwwY9ll pg. 4.

Parameters:

param_group (dict) – parameter group from the optimizer.
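For orientation, here is a hedged sketch of what a linear-warmup plus cosine-decay schedule of this shape typically computes; the exact branches used by get_lr depend on lr_decay_style and the referenced paper, and are not reproduced here.

```python
import math

def cosine_lr_with_warmup(step, init_lr, max_lr, min_lr, warmup_steps, decay_steps):
    # Linear warmup from init_lr up to max_lr.
    if step <= warmup_steps:
        return init_lr + (max_lr - init_lr) * step / max(warmup_steps, 1)
    # Cosine anneal from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(decay_steps - warmup_steps, 1)
    progress = min(progress, 1.0)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)
```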

step(increment: int) None[source]#

Set the learning rate for all parameter groups.

Parameters:

increment (int) – number of steps to increment
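A hedged sketch of how step is typically called in a training loop, reusing the optimizer and scheduler from the constructor sketch above; the forward pass is a placeholder and increment is usually the number of optimizer steps taken since the last call (often 1).

```python
for step_idx in range(1000):
    loss = model(torch.randn(8, 16)).sum()  # placeholder forward pass
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step(increment=1)  # advance the LR/WD schedules by one optimizer step
```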

state_dict() dict[source]#

Return the state dict.

_check_and_set(cls_value: float, sd_value: float, name: str) float[source]#

Auxiliary function for checking the values in the checkpoint and setting them.

Parameters:
  • cls_value (float) – class value

  • sd_value (float) – checkpoint value

  • name (str) – name of the parameter

load_state_dict(state_dict: dict) None[source]#

Load the state dict.

Parameters:

state_dict (dict) – state dict to be loaded
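A minimal sketch of checkpointing the scheduler alongside the optimizer with state_dict and load_state_dict; the torch.save/torch.load usage and the file name are illustrative assumptions.

```python
import torch

# Save scheduler state alongside the optimizer state.
torch.save(
    {"optimizer": optimizer.state_dict(), "scheduler": scheduler.state_dict()},
    "checkpoint.pt",
)

# Later, restore both before resuming training.
ckpt = torch.load("checkpoint.pt")
optimizer.load_state_dict(ckpt["optimizer"])
scheduler.load_state_dict(ckpt["scheduler"])
```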