nemo_automodel.components.optim.scheduler
Learning rate decay and weight decay increment functions.
Module Contents
Classes
OptimizerParamScheduler: Anneals learning rate and weight decay.
Data
API
- nemo_automodel.components.optim.scheduler.logger
getLogger(...)
- class nemo_automodel.components.optim.scheduler.OptimizerParamScheduler(
- optimizer: torch.optim.optimizer.Optimizer,
- init_lr: float,
- max_lr: float,
- min_lr: float,
- lr_warmup_steps: int,
- lr_decay_steps: int,
- lr_decay_style: str,
- start_wd: float,
- end_wd: float,
- wd_incr_steps: int,
- wd_incr_style: str,
- use_checkpoint_opt_param_scheduler: Optional[bool] = True,
- override_opt_param_scheduler: Optional[bool] = False,
- wsd_decay_steps: Optional[int] = None,
- lr_wsd_decay_style: Optional[str] = None,
)
Anneals learning rate and weight decay.
- Parameters:
optimizer (Optimizer) – the optimizer to be used
init_lr (float) – initial learning rate
max_lr (float) – maximum learning rate
min_lr (float) – minimum learning rate
lr_warmup_steps (int) – number of warmup steps
lr_decay_steps (int) – number of decay steps
lr_decay_style (str) – decay style for the learning rate
start_wd (float) – initial weight decay
end_wd (float) – final weight decay
wd_incr_steps (int) – number of weight decay increment steps
wd_incr_style (str) – weight decay increment style
use_checkpoint_opt_param_scheduler (bool, optional) – whether to use the checkpoint values for the optimizer param scheduler. Defaults to True.
override_opt_param_scheduler (bool, optional) – whether to override the optimizer param scheduler values with the class values. Defaults to False.
wsd_decay_steps (int, optional) – number of decay steps for the warmup-stable-decay (WSD) schedule. Defaults to None.
lr_wsd_decay_style (str, optional) – learning rate decay style used during the WSD decay steps. Defaults to None.
Initialization
Constructor for OptimizerParamScheduler.
The constructor takes the same parameters as listed above.
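A minimal construction sketch, assuming a standard `torch.optim` optimizer; the hyperparameter values and style names shown are illustrative assumptions, not documented defaults (check the implementation for the exact accepted style strings):

```python
import torch

from nemo_automodel.components.optim.scheduler import OptimizerParamScheduler

# Illustrative model and optimizer; any torch.optim.Optimizer should work.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

scheduler = OptimizerParamScheduler(
    optimizer=optimizer,
    init_lr=0.0,             # LR at the start of warmup
    max_lr=3e-4,             # LR reached at the end of warmup
    min_lr=3e-5,             # floor reached at the end of decay
    lr_warmup_steps=100,
    lr_decay_steps=1000,
    lr_decay_style="cosine",  # assumed style name
    start_wd=0.0,
    end_wd=0.01,
    wd_incr_steps=1000,
    wd_incr_style="linear",   # assumed style name
)
```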
- __repr__() → str
Return a string representation of the OptimizerParamScheduler.
- get_wd() → float
Weight decay increment function.
- get_lr(param_group: dict) → float
Learning rate decay functions from https://openreview.net/pdf?id=BJYwwY9ll, pg. 4.
- Parameters:
param_group (dict) – parameter group from the optimizer
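As a rough sketch of the cosine variant (assumed from the linked paper, not the exact implementation), the learning rate warms up linearly from `init_lr` to `max_lr` and then interpolates down to `min_lr`:

```python
import math


def cosine_lr(step: int, init_lr: float, max_lr: float, min_lr: float,
              warmup_steps: int, decay_steps: int) -> float:
    # Hedged sketch: the real get_lr() also supports other decay styles
    # (e.g. linear, WSD) and additional edge cases.
    if warmup_steps > 0 and step <= warmup_steps:
        # Linear warmup from init_lr to max_lr.
        return init_lr + (max_lr - init_lr) * step / warmup_steps
    if step > decay_steps:
        return min_lr
    decay_ratio = (step - warmup_steps) / (decay_steps - warmup_steps)
    coeff = 0.5 * (math.cos(math.pi * decay_ratio) + 1.0)
    return min_lr + coeff * (max_lr - min_lr)
```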
- step(increment: int) → None
Set the learning rate for all parameter groups (see the usage sketch below).
- Parameters:
increment (int) – number of steps to increment
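A hedged training-loop sketch, continuing the construction example above; the data and loss are dummies for illustration:

```python
import torch.nn.functional as F

x = torch.randn(8, 16)
y = torch.randn(8, 16)

for _ in range(1000):
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step(increment=1)  # advance the LR and WD schedules by one step
```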
- state_dict() → dict
Return the state dict.
- _check_and_set(cls_value: float, sd_value: float, name: str) → float
Auxiliary function for checking the values in the checkpoint and setting them.
- Parameters:
cls_value (float) – class value
sd_value (float) – checkpoint value
name (str) – name of the parameter
- load_state_dict(state_dict: dict) → None
Load the state dict.
- Parameters:
state_dict (dict) – state dict to be loaded
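A hedged checkpointing sketch showing how `state_dict()` and `load_state_dict()` pair up; the file name and dictionary keys are illustrative:

```python
# Save the scheduler state alongside the optimizer.
ckpt = {
    "optimizer": optimizer.state_dict(),
    "opt_param_scheduler": scheduler.state_dict(),
}
torch.save(ckpt, "checkpoint.pt")

# Later: restore both. Whether checkpoint or constructor values take precedence
# is governed by use_checkpoint_opt_param_scheduler / override_opt_param_scheduler.
restored = torch.load("checkpoint.pt")
optimizer.load_state_dict(restored["optimizer"])
scheduler.load_state_dict(restored["opt_param_scheduler"])
```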