nemo_automodel.training.step_scheduler#

Module Contents#

Classes#

StepScheduler

Scheduler for managing gradient accumulation and checkpointing steps.

API#

class nemo_automodel.training.step_scheduler.StepScheduler(
grad_acc_steps: int,
ckpt_every_steps: int,
dataloader: Optional[torch.utils.data.DataLoader],
val_every_steps: Optional[int] = None,
start_step: int = 0,
start_epoch: int = 0,
num_epochs: int = 10,
max_steps: Optional[int] = None,
)[source]#

Bases: torch.distributed.checkpoint.stateful.Stateful

Scheduler for managing gradient accumulation and checkpointing steps.

Initialization

Initialize the StepScheduler.

Parameters:
  • grad_acc_steps (int) – Number of steps for gradient accumulation.

  • ckpt_every_steps (int) – Frequency of checkpoint steps.

  • dataloader (Optional[DataLoader]) – The training dataloader to iterate over.

  • val_every_steps (Optional[int]) – Number of training steps between validation runs.

  • start_step (int) – Initial global step.

  • start_epoch (int) – Initial epoch.

  • num_epochs (int) – Total number of epochs.

  • max_steps (Optional[int]) – Maximum number of steps to run; iteration stops once this is reached.
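
A minimal construction sketch; the toy dataset and the step frequencies below are illustrative assumptions, not library defaults:

```python
import torch
from torch.utils.data import DataLoader

from nemo_automodel.training.step_scheduler import StepScheduler

# Hypothetical toy data; replace with your real training dataloader.
dataset = [{"input_ids": torch.randint(0, 100, (16,))} for _ in range(64)]
train_loader = DataLoader(dataset, batch_size=8)

scheduler = StepScheduler(
    grad_acc_steps=4,        # optimizer steps once every 4 micro-batches
    ckpt_every_steps=100,    # save a checkpoint every 100 steps
    dataloader=train_loader,
    val_every_steps=50,      # run validation every 50 training steps
    num_epochs=3,
)
```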

__iter__()[source]#

Iterates over the dataloader while keeping track of the step and epoch counters.

Raises:

StopIteration – If the dataloader is exhausted or max_steps is reached.

Yields:

dict – The next batch from the dataloader.
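
Combined with set_epoch and the epochs property documented below, the intended iteration pattern looks roughly like this; train_step is a placeholder for user code, not part of the library:

```python
# Sketch of the outer/inner loop around the scheduler.
def train_step(batch):
    ...  # placeholder for your forward/backward logic

for epoch in scheduler.epochs:
    scheduler.set_epoch(epoch)    # propagate the epoch to the dataloader
    for batch in scheduler:       # yields dict batches, advancing the counters
        train_step(batch)
```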

set_epoch(epoch: int)[source]#

Set the epoch for the dataloader.

property is_optim_step#

Returns whether the current step should trigger an optimizer step.

Returns:

True if the optimizer step should run.

Return type:

bool
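
A sketch of how this flag is typically consulted inside the inner loop; model, optimizer, and the loss computation are placeholders for your own training objects:

```python
# Inside the inner loop from the iteration sketch above.
loss = model(**batch).loss       # placeholder forward pass
loss.backward()                  # gradients accumulate across micro-batches
if scheduler.is_optim_step:      # True once per grad_acc_steps window
    optimizer.step()
    optimizer.zero_grad()
```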

property is_val_step#

Returns whether the current step should trigger validation.

property is_ckpt_step#

Returns whether the current step should trigger checkpoint saving.

Returns:

True if a checkpoint should be saved.

Return type:

bool
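
The validation and checkpoint flags are typically checked in the same loop; validate and save_checkpoint below are placeholder routines, not library functions:

```python
# Also inside the inner loop, after the optimizer-step guard.
if scheduler.is_val_step:        # every val_every_steps training steps
    validate(model, val_loader)  # placeholder validation routine
if scheduler.is_ckpt_step:       # every ckpt_every_steps training steps
    save_checkpoint(model, optimizer, scheduler.state_dict())  # placeholder
```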

property epochs#

Epoch iterator.

Yields:

int – successive epoch indices, starting from start_epoch.

state_dict()[source]#

Get the current state of the scheduler.

Returns:

Current state with 'step' and 'epoch' keys.

Return type:

dict

load_state_dict(s)[source]#

Load the scheduler state from a dictionary.

Parameters:

s (dict) – Dictionary containing 'step' and 'epoch' keys.
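
Because StepScheduler subclasses torch.distributed.checkpoint.stateful.Stateful, these two methods let its counters be saved and restored alongside the rest of the training state; a minimal sketch:

```python
# Capture the scheduler's progress for checkpointing.
state = scheduler.state_dict()    # e.g. {'step': 120, 'epoch': 1}

# ... later, after re-creating the scheduler on resume ...
scheduler.load_state_dict(state)  # continue from the saved step and epoch
```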