nemo_automodel.components.optim.scheduler

Learning rate decay and weight decay increment functions.

Module Contents

Classes

Name | Description
OptimizerParamScheduler | Anneals learning rate and weight decay.

Data

_T

logger

API

class nemo_automodel.components.optim.scheduler.OptimizerParamScheduler(
optimizer: torch.optim.optimizer.Optimizer,
init_lr: float,
max_lr: float,
min_lr: float,
lr_warmup_steps: int,
lr_decay_steps: int,
lr_decay_style: str,
start_wd: float,
end_wd: float,
wd_incr_steps: int,
wd_incr_style: str,
use_checkpoint_opt_param_scheduler: typing.Optional[bool] = True,
override_opt_param_scheduler: typing.Optional[bool] = False,
wsd_decay_steps: typing.Optional[int] = None,
lr_wsd_decay_style: typing.Optional[str] = None
)

Anneals learning rate and weight decay.

Parameters:

optimizer
Optimizer

the optimizer to be used

init_lr
float

initial learning rate

max_lr
float

maximum learning rate

min_lr
float

minimum learning rate

lr_warmup_steps
int

number of warmup steps

lr_decay_steps
int

number of decay steps

lr_decay_style
str

decay style for learning rate

start_wd
float

initial weight decay

end_wd
float

final weight decay

wd_incr_steps
int

number of weight decay increment steps

wd_incr_style
str

weight decay increment style

use_checkpoint_opt_param_scheduler
bool, defaults to True

whether to use the checkpoint values for the optimizer param scheduler. Defaults to True.

override_opt_param_scheduler
bool, defaults to False

whether to override the optimizer param scheduler values with the class values. Defaults to False.

wsd_decay_steps
int, defaults to None

number of steps in the final decay phase of the WSD (warmup-stable-decay) learning rate schedule. Defaults to None.

lr_wsd_decay_style
str, defaults to None

decay style for the learning rate during the WSD (warmup-stable-decay) decay steps. Defaults to None.
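
To illustrate how `lr_decay_steps` and `wsd_decay_steps` might interact, here is a minimal sketch that assumes WSD means warmup-stable-decay: the learning rate holds at `max_lr` until the last `wsd_decay_steps` steps of the schedule and then anneals. The function name `wsd_coeff` and the linear anneal are hypothetical choices for illustration, not the library's implementation.

```python
def wsd_coeff(step: int, lr_decay_steps: int, wsd_decay_steps: int) -> float:
    """Hypothetical helper: fraction of (max_lr - min_lr) still applied at `step`.

    Assumes a stable phase followed by a linear anneal over the final
    `wsd_decay_steps` steps. Other anneal shapes are equally plausible.
    """
    anneal_start = lr_decay_steps - wsd_decay_steps
    if step <= anneal_start:
        # Stable phase: the full learning rate range is applied.
        return 1.0
    if step >= lr_decay_steps:
        # Schedule finished: only min_lr remains.
        return 0.0
    # Linear anneal from 1.0 down to 0.0 across the final phase.
    return 1.0 - (step - anneal_start) / wsd_decay_steps
```

Under these assumptions the learning rate at a given step would be `min_lr + wsd_coeff(...) * (max_lr - min_lr)`.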

max_lr = float(max_lr)

num_steps = 0

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.__repr__() -> str

Return a string representation of the OptimizerParamScheduler.

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler._check_and_set(
cls_value: nemo_automodel.components.optim.scheduler._T,
sd_value: nemo_automodel.components.optim.scheduler._T,
name: str
) -> nemo_automodel.components.optim.scheduler._T

Auxiliary function for checking the values in the checkpoint and setting them.

Parameters:

cls_value
_T

class value

sd_value
_T

checkpoint value

name
str

name of the parameter
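
The sketch below shows one plausible way such a check could combine the two constructor flags, `use_checkpoint_opt_param_scheduler` and `override_opt_param_scheduler`. The standalone function `check_and_set_sketch`, its keyword names, and the choice to raise `ValueError` on a mismatch are assumptions made for illustration; the actual method may behave differently.

```python
from typing import TypeVar

_T = TypeVar("_T")


def check_and_set_sketch(
    cls_value: _T,
    sd_value: _T,
    name: str,
    *,
    use_checkpoint: bool = True,
    override: bool = False,
) -> _T:
    """Hypothetical resolution of a class value against a checkpoint value."""
    if override:
        # Assumed: overriding always keeps the value passed to the constructor.
        return cls_value
    if not use_checkpoint and cls_value != sd_value:
        # Assumed: without permission to adopt checkpoint values, a mismatch is an error.
        raise ValueError(
            f"{name}: class value {cls_value!r} differs from checkpoint value {sd_value!r}"
        )
    # Otherwise the checkpoint value is used.
    return sd_value
```

With the default flags in this sketch, the checkpoint value wins, which matches the documented defaults of `True` and `False` for the two parameters.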

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.get_lr(
param_group: dict[str, typing.Any]
) -> float

Return the learning rate for param_group. The learning rate decay functions follow https://openreview.net/pdf?id=BJYwwY9ll, p. 4.
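
As a rough illustration of how `init_lr`, `max_lr`, `min_lr`, `lr_warmup_steps`, and `lr_decay_steps` could fit together, the sketch below implements linear warmup followed by cosine decay. It assumes a cosine decay style and ignores per-group settings; the function `sketch_lr` is hypothetical and is not the library's code.

```python
import math


def sketch_lr(
    step: int,
    init_lr: float,
    max_lr: float,
    min_lr: float,
    lr_warmup_steps: int,
    lr_decay_steps: int,
) -> float:
    """Hypothetical schedule: linear warmup, then cosine decay to min_lr."""
    if lr_warmup_steps > 0 and step <= lr_warmup_steps:
        # Warmup: move linearly from init_lr to max_lr.
        return init_lr + (max_lr - init_lr) * step / lr_warmup_steps
    if step >= lr_decay_steps:
        # Past the decay horizon the rate stays at its floor.
        return min_lr
    # Cosine decay: coeff goes from 1.0 (start of decay) to 0.0 (end of decay).
    progress = (step - lr_warmup_steps) / (lr_decay_steps - lr_warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)
```

For example, with 100 warmup steps and 1000 decay steps, this sketch reaches `max_lr` at step 100 and `min_lr` at step 1000.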

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.get_wd() -> float

Return the weight decay, computed by the weight decay increment functions.
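
A minimal sketch of a weight decay increment from `start_wd` to `end_wd` over `wd_incr_steps` follows. The style names `"constant"`, `"linear"`, and `"cosine"` are assumptions chosen for illustration, as is the function name `sketch_wd`; consult the source for the values `wd_incr_style` actually accepts.

```python
import math


def sketch_wd(
    step: int,
    start_wd: float,
    end_wd: float,
    wd_incr_steps: int,
    style: str = "linear",
) -> float:
    """Hypothetical weight decay ramp from start_wd to end_wd."""
    if step >= wd_incr_steps:
        # Ramp finished (this also covers wd_incr_steps == 0).
        return end_wd
    if style == "constant":
        # Assumed: a constant style simply reports the final value.
        return end_wd
    ratio = step / wd_incr_steps
    if style == "linear":
        coeff = ratio
    elif style == "cosine":
        # Rises smoothly from 0.0 at ratio 0 to 1.0 at ratio 1.
        coeff = 0.5 * (math.cos(math.pi * (1.0 - ratio)) + 1.0)
    else:
        raise ValueError(f"unsupported style in this sketch: {style!r}")
    return start_wd + coeff * (end_wd - start_wd)
```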

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.load_state_dict(
state_dict: dict[str, typing.Any]
) -> None

Load the state dict.

Parameters:

state_dict
dict

state dict to load

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.state_dict() -> dict[str, typing.Any]

Return the state dict.

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.step(
increment: int
) -> None

Set lr for all parameter groups.

Parameters:

increment
int

number of steps to increment
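
The following sketch shows the general pattern of advancing a step counter and writing new values into optimizer parameter groups. `TinyScheduler`, `lr_fn`, and `wd_fn` are hypothetical names, plain dicts stand in for a `torch.optim` optimizer's `param_groups`, and writing `weight_decay` alongside `lr` is an assumption based on the class description rather than on the method's docstring.

```python
from typing import Any, Callable


class TinyScheduler:
    """Hypothetical stand-in that mimics a step-driven parameter scheduler."""

    def __init__(
        self,
        param_groups: list[dict[str, Any]],
        lr_fn: Callable[[int], float],
        wd_fn: Callable[[int], float],
    ) -> None:
        self.param_groups = param_groups
        self.lr_fn = lr_fn
        self.wd_fn = wd_fn
        self.num_steps = 0  # mirrors the documented initial value

    def step(self, increment: int) -> None:
        # Advance the counter, then recompute and apply the scheduled values.
        self.num_steps += increment
        lr = self.lr_fn(self.num_steps)
        wd = self.wd_fn(self.num_steps)
        for group in self.param_groups:
            group["lr"] = lr
            group["weight_decay"] = wd  # assumed behavior


groups = [{"lr": 0.0, "weight_decay": 0.0}, {"lr": 0.0, "weight_decay": 0.0}]
sched = TinyScheduler(groups, lr_fn=lambda s: 1e-3 * min(s, 10) / 10, wd_fn=lambda s: 0.01)
sched.step(5)
```

Passing `increment` rather than always advancing by one lets a caller account for several consumed steps or samples in a single call.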

nemo_automodel.components.optim.scheduler._T = TypeVar('_T')
nemo_automodel.components.optim.scheduler.logger = logging.getLogger(__name__)