nemo_automodel.components.training.step_scheduler
Module Contents
Classes
Functions
Data
API
Bases: Stateful
Scheduler for managing gradient accumulation and checkpointing steps.
Epoch iterator.
Returns whether a checkpoint should be saved at this step.
Returns whether this step needs to run manual garbage collection.
Returns whether this is the last batch for this epoch.
Returns whether the current step is the final training step.
Training stops at whichever comes first: reaching max_steps or
exhausting the configured number of epochs (see __iter__ and
epochs). max_steps alone is therefore not enough to detect the
end — a small dataset can run out of epochs long before max_steps
is hit (e.g. max_steps=100 with only 60 steps’ worth of data). In
that case the last batch of the last epoch is the final step. Detect it
so the final checkpoint and consolidated export — which key off this
flag (see is_ckpt_step and the recipes’ is_final_checkpoint) —
are still written.
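The two stopping conditions described above can be sketched as a small standalone predicate. This is a minimal illustration, not the library's implementation: the function name `is_final_step`, its parameters, and the convention that `step` counts completed optimizer steps while `epoch` is zero-based are all assumptions made for the example.

```python
def is_final_step(step, max_steps, epoch, num_epochs, is_last_batch):
    """Return True when training ends at this step (hypothetical helper).

    Assumes `step` is the number of optimizer steps completed so far
    (including the current one) and `epoch` is zero-based.
    """
    # Case 1: the step budget has been used up.
    hit_max_steps = step >= max_steps
    # Case 2: the data ran out first, i.e. this is the last batch
    # of the last configured epoch.
    out_of_epochs = is_last_batch and (epoch + 1 >= num_epochs)
    return hit_max_steps or out_of_epochs


# max_steps=100 but only 60 steps' worth of data in a single epoch:
# the 60th step is final even though max_steps was never reached.
final_on_short_dataset = is_final_step(60, 100, 0, 1, True)
```

Under these assumptions, a check that only compares `step` against `max_steps` would miss the short-dataset case and skip the final checkpoint.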
Returns whether this step should log to remote services (WandB, MLflow, etc.).
Returns whether validation should run at this step.
Returns whether SIGTERM was received.
Iterates over dataloader while keeping track of counters.
Raises:
StopIteration: If the dataloader was exhausted or max_steps was reached.
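As a rough picture of iterating while tracking counters, the sketch below groups batches for gradient accumulation and stops once a step budget is reached. It is an assumption-laden illustration, not the scheduler's actual code: `iterate_with_counters`, `grad_acc_steps`, and the choice to yield a trailing partial group are all hypothetical.

```python
def iterate_with_counters(dataloader, grad_acc_steps, max_steps, start_step=0):
    """Yield (step, batches) pairs, one pair per optimizer step (illustrative only)."""
    step = start_step
    group = []
    for batch in dataloader:
        # Stop as soon as the step budget is used up.
        if step >= max_steps:
            return
        group.append(batch)
        # A full group of micro-batches makes up one optimizer step.
        if len(group) == grad_acc_steps:
            step += 1
            yield step, group
            group = []
    # Assumption: a leftover partial group still counts as one step.
    if group and step < max_steps:
        step += 1
        yield step, group


# Five micro-batches, two per optimizer step, generous step budget.
steps = list(iterate_with_counters(range(5), grad_acc_steps=2, max_steps=10))
```

Because this sketch is a generator, exhaustion of either the dataloader or the step budget surfaces to the caller as the end of iteration, which matches the StopIteration behavior described above.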
Load the scheduler state from a dictionary.
Parameters:
Dictionary containing ‘step’ and ‘epoch’.
Set the epoch for the sampler.
Get the current state of the scheduler.
Returns:
Current state with ‘step’ and ‘epoch’ keys.
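The save and restore round trip can be illustrated with a stand-in class that holds only the two counters named above. `CounterState` is a hypothetical name used for this sketch; the real scheduler carries more configuration and may validate its input differently.

```python
class CounterState:
    """Minimal stand-in exposing only 'step' and 'epoch' (hypothetical)."""

    def __init__(self):
        self.step = 0
        self.epoch = 0

    def state_dict(self):
        # Return a plain dictionary so it can be stored in a checkpoint.
        return {"step": self.step, "epoch": self.epoch}

    def load_state_dict(self, state):
        # Restore both counters from a previously saved dictionary.
        self.step = state["step"]
        self.epoch = state["epoch"]


saved = CounterState()
saved.step, saved.epoch = 42, 3

resumed = CounterState()
resumed.load_state_dict(saved.state_dict())
```

Restoring both counters lets a resumed run continue its step-based decisions (checkpointing, validation, stopping) from where the previous run left off.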
User-facing step scheduler configuration.
These fields correspond to the YAML-configurable parameters of the
training loop. Runtime-only values (dataloader, dp_size,
local_batch_size) are passed separately to build_step_scheduler.
Build the step scheduler.
Parameters:
The training dataloader.
The size of the data parallel group.
The size of the local batch.
Returns: StepScheduler
Configured StepScheduler.
Calculate the maximum number of steps.
Calculate the number of epochs from the maximum number of steps.
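One plausible form of the arithmetic behind these two helpers is sketched below. The function names, the `steps_per_epoch` parameter, and the rounding choice are assumptions for illustration; the library's own helpers may take different inputs or round differently.

```python
import math


def max_steps_from_epochs(num_epochs, steps_per_epoch):
    """Total optimizer steps when every epoch runs to completion (assumed relation)."""
    return num_epochs * steps_per_epoch


def epochs_from_max_steps(max_steps, steps_per_epoch):
    """Epochs needed to cover max_steps, rounding up so a partial epoch counts."""
    return math.ceil(max_steps / steps_per_epoch)


# With 60 steps per epoch, a 100-step budget spills into a second epoch.
epochs_needed = epochs_from_max_steps(100, 60)
```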