nemo_automodel.components.optim.scheduler

Learning rate decay and weight decay increment functions.

Module Contents

Classes

Name	Description
`OptimizerParamScheduler`	Anneals learning rate and weight decay.

Data

_T

logger

API

class nemo_automodel.components.optim.scheduler.OptimizerParamScheduler(
    optimizer: torch.optim.optimizer.Optimizer,
    init_lr: float,
    max_lr: float,
    min_lr: float,
    lr_warmup_steps: int,
    lr_decay_steps: int,
    lr_decay_style: str,
    start_wd: float,
    end_wd: float,
    wd_incr_steps: int,
    wd_incr_style: str,
    use_checkpoint_opt_param_scheduler: typing.Optional[bool] = True,
    override_opt_param_scheduler: typing.Optional[bool] = False,
    wsd_decay_steps: typing.Optional[int] = None,
    lr_wsd_decay_style: typing.Optional[str] = None
)

Anneals learning rate and weight decay.

Parameters:

optimizer

Optimizer

the optimizer to be used

init_lr

float

initial learning rate

max_lr

float

maximum learning rate

min_lr

float

minimum learning rate

lr_warmup_steps

int

number of warmup steps

lr_decay_steps

int

number of decay steps

lr_decay_style

str

decay style for learning rate

start_wd

float

initial weight decay

end_wd

float

final weight decay

wd_incr_steps

int

number of weight decay increment steps

wd_incr_style

str

weight decay increment style

use_checkpoint_opt_param_scheduler

bool, defaults to True

whether to use the checkpoint values for the optimizer param scheduler. Defaults to True.

override_opt_param_scheduler

bool, defaults to False

whether to override the optimizer param scheduler values with the class values. Defaults to False.

wsd_decay_steps

int, defaults to None

number of decay steps in the warmup-stable-decay (WSD) schedule. Defaults to None.

lr_wsd_decay_style

str, defaults to None

decay style for the learning rate during the WSD decay steps. Defaults to None.

max_lr = float(max_lr)

num_steps = 0
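
To make the constructor arguments concrete, here is a minimal sketch of a keyword-argument set. The numeric values, the style strings `"cosine"` and `"linear"`, and the helper `check_scheduler_kwargs` are all assumptions for illustration; this page does not list the accepted styles or the validation the class performs.

```python
# Hypothetical keyword arguments for OptimizerParamScheduler. The values and
# the style strings "cosine" and "linear" are assumptions, not documented here.
scheduler_kwargs = {
    "init_lr": 0.0,
    "max_lr": 3e-4,
    "min_lr": 3e-5,
    "lr_warmup_steps": 100,
    "lr_decay_steps": 1000,
    "lr_decay_style": "cosine",
    "start_wd": 0.01,
    "end_wd": 0.1,
    "wd_incr_steps": 1000,
    "wd_incr_style": "linear",
}


def check_scheduler_kwargs(kwargs: dict) -> None:
    """Plausible sanity checks (assumed, not confirmed by this reference)."""
    assert 0.0 <= kwargs["min_lr"] <= kwargs["max_lr"]
    assert kwargs["init_lr"] <= kwargs["max_lr"]
    assert 0 <= kwargs["lr_warmup_steps"] < kwargs["lr_decay_steps"]
    assert 0.0 <= kwargs["start_wd"] <= kwargs["end_wd"]


check_scheduler_kwargs(scheduler_kwargs)

# With an existing torch optimizer, the call would then look like:
# scheduler = OptimizerParamScheduler(optimizer=optimizer, **scheduler_kwargs)
```

Checking the arguments before construction keeps configuration mistakes, such as a warmup longer than the decay window, from surfacing only partway through training.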

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.__repr__() -> str

Return a string representation of the OptimizerParamScheduler.

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler._check_and_set(
    cls_value: nemo_automodel.components.optim.scheduler._T,
    sd_value: nemo_automodel.components.optim.scheduler._T,
    name: str
) -> nemo_automodel.components.optim.scheduler._T

Auxiliary function for checking the values in the checkpoint and setting them.

Parameters:

cls_value

class value

sd_value

checkpoint value

name

str

name of the parameter
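
The interaction between `use_checkpoint_opt_param_scheduler` and `override_opt_param_scheduler` can be sketched as a standalone function. The precedence shown (override first, then the checkpoint value, with a consistency check when checkpoint values are not used) is an assumption based on the parameter descriptions above. The free function `check_and_set` and its two flag arguments are hypothetical; the real method presumably reads the flags from the instance.

```python
from typing import TypeVar

_T = TypeVar("_T")


def check_and_set(
    cls_value: _T,
    sd_value: _T,
    name: str,
    use_checkpoint: bool = True,
    override: bool = False,
) -> _T:
    """Pick between the constructor value and the checkpoint value.

    Assumed precedence: override wins, otherwise the checkpoint value is
    returned. When checkpoint values are not used, the two must agree.
    """
    if override:
        # Ignore the checkpoint and keep the value given to the constructor.
        return cls_value
    if not use_checkpoint and cls_value != sd_value:
        raise ValueError(
            f"{name}: class value {cls_value!r} differs from "
            f"checkpoint value {sd_value!r}"
        )
    return sd_value
```

For example, `check_and_set(1e-3, 5e-4, "max_lr")` returns the checkpoint value `5e-4`, while passing `override=True` returns `1e-3`.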

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.get_lr(
    param_group: dict[str, typing.Any]
) -> float

Learning rate decay functions from: https://openreview.net/pdf?id=BJYwwY9ll pg. 4.
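
A learning rate schedule of this kind can be sketched in plain Python as a linear warmup followed by a decay toward `min_lr`. This is an illustration under assumptions, not the library's implementation: the function `sketch_lr`, the style names `"linear"` and `"cosine"`, and the exact boundary handling are hypothetical.

```python
import math


def sketch_lr(
    step: int,
    init_lr: float,
    max_lr: float,
    min_lr: float,
    warmup_steps: int,
    decay_steps: int,
    style: str = "cosine",
) -> float:
    """Linear warmup to max_lr, then decay to min_lr (illustrative only)."""
    if warmup_steps > 0 and step <= warmup_steps:
        # Warmup: interpolate linearly from init_lr to max_lr.
        return init_lr + (max_lr - init_lr) * step / warmup_steps
    if step > decay_steps:
        # Past the decay window the rate stays at its floor.
        return min_lr
    # Fraction of the decay window completed, in [0, 1].
    ratio = (step - warmup_steps) / (decay_steps - warmup_steps)
    if style == "linear":
        coeff = 1.0 - ratio
    elif style == "cosine":
        coeff = 0.5 * (math.cos(math.pi * ratio) + 1.0)
    else:
        raise ValueError(f"unsupported style in this sketch: {style}")
    return min_lr + coeff * (max_lr - min_lr)
```

With `init_lr=0.0`, `max_lr=1.0`, `min_lr=0.1`, 10 warmup steps, and 110 decay steps, the sketch reaches `max_lr` at step 10, passes the midpoint between `max_lr` and `min_lr` at step 60, and holds `min_lr` after step 110.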

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.get_wd() -> float

Weight decay increment functions.
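
The weight decay ramp from `start_wd` to `end_wd` over `wd_incr_steps` can be sketched the same way. The function `sketch_wd` and the style names `"constant"`, `"linear"`, and `"cosine"` are assumptions for illustration; this page does not enumerate the supported values of `wd_incr_style`.

```python
import math


def sketch_wd(
    step: int,
    start_wd: float,
    end_wd: float,
    incr_steps: int,
    style: str = "linear",
) -> float:
    """Grow weight decay from start_wd to end_wd (illustrative only)."""
    if style == "constant":
        # A constant schedule only makes sense when both ends coincide.
        assert start_wd == end_wd
        return end_wd
    if step >= incr_steps:
        # The ramp is finished; hold the final value.
        return end_wd
    ratio = step / incr_steps  # fraction of the ramp completed, in [0, 1)
    if style == "linear":
        coeff = ratio
    elif style == "cosine":
        # Rises from 0 to 1 along a half cosine.
        coeff = 0.5 * (math.cos(math.pi * (1.0 - ratio)) + 1.0)
    else:
        raise ValueError(f"unsupported style in this sketch: {style}")
    return start_wd + coeff * (end_wd - start_wd)
```

Unlike the learning rate, the value here increases over time, which is why the cosine coefficient is mirrored to run from 0 up to 1.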

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.load_state_dict(
    state_dict: dict[str, typing.Any]
) -> None

Load the state dict.

Parameters:

state_dict

dict

state dict to load

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.state_dict() -> dict[str, typing.Any]

Return the state dict.
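
The `state_dict` / `load_state_dict` pair follows the usual save-and-resume contract. The stand-in class below shows that contract without depending on the library; `ToyScheduler` and the keys it stores are hypothetical, since this page does not list the keys of the real state dict.

```python
from typing import Any


class ToyScheduler:
    """Minimal stand-in showing the state_dict / load_state_dict contract."""

    def __init__(self, max_lr: float, lr_warmup_steps: int) -> None:
        self.max_lr = float(max_lr)
        self.lr_warmup_steps = lr_warmup_steps
        self.num_steps = 0

    def step(self, increment: int) -> None:
        self.num_steps += increment

    def state_dict(self) -> dict[str, Any]:
        # Only plain Python values, so the dict is easy to serialize.
        return {
            "max_lr": self.max_lr,
            "lr_warmup_steps": self.lr_warmup_steps,
            "num_steps": self.num_steps,
        }

    def load_state_dict(self, state_dict: dict[str, Any]) -> None:
        self.max_lr = state_dict["max_lr"]
        self.lr_warmup_steps = state_dict["lr_warmup_steps"]
        self.num_steps = state_dict["num_steps"]


# Save after 250 steps, then restore into a freshly built scheduler.
trained = ToyScheduler(max_lr=3e-4, lr_warmup_steps=100)
trained.step(250)
saved = trained.state_dict()

resumed = ToyScheduler(max_lr=3e-4, lr_warmup_steps=100)
resumed.load_state_dict(saved)
```

After loading, `resumed.num_steps` equals 250, so the schedule continues from where the saved run stopped rather than restarting the warmup.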

nemo_automodel.components.optim.scheduler.OptimizerParamScheduler.step(
    increment: int
) -> None

Set the learning rate for all parameter groups.

Parameters:

increment

int

number of steps to increment
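
The role of `step` in a training loop can be sketched with stand-in classes: advance the step counter by `increment`, then write the new learning rate into every parameter group. `ToyOptimizer` and `ToyStepper` are hypothetical and use a warmup-only schedule; the only detail borrowed from torch is that an optimizer exposes `param_groups` as a list of dicts.

```python
class ToyOptimizer:
    """Stand-in exposing param_groups the way a torch optimizer does."""

    def __init__(self, lr: float) -> None:
        self.param_groups = [{"lr": lr}, {"lr": lr}]


class ToyStepper:
    """Warmup-only scheduler used to illustrate step()."""

    def __init__(self, optimizer: ToyOptimizer, max_lr: float, warmup_steps: int) -> None:
        self.optimizer = optimizer
        self.max_lr = max_lr
        self.warmup_steps = warmup_steps
        self.num_steps = 0

    def get_lr(self, param_group: dict) -> float:
        # Linear warmup, then hold at max_lr.
        return self.max_lr * min(1.0, self.num_steps / self.warmup_steps)

    def step(self, increment: int) -> None:
        # Advance the counter first, then refresh every parameter group.
        self.num_steps += increment
        for group in self.optimizer.param_groups:
            group["lr"] = self.get_lr(group)


optimizer = ToyOptimizer(lr=0.0)
scheduler = ToyStepper(optimizer, max_lr=1.0, warmup_steps=10)

history = []
for _ in range(5):
    # A real loop would run forward, backward, and optimizer.step() here.
    scheduler.step(increment=2)
    history.append(optimizer.param_groups[0]["lr"])
```

Passing an `increment` larger than 1 lets the caller count in units other than optimizer updates, for example samples consumed per iteration.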

nemo_automodel.components.optim.scheduler._T = TypeVar('_T')

nemo_automodel.components.optim.scheduler.logger = logging.getLogger(__name__)