nemo_automodel.components.distributed.pipelining.autopipeline
Module Contents
Classes
Data
API
Orchestrates pipeline-parallel training on top of torch.distributed.pipelining.
Build the pipeline: validate -> init meta -> split -> schedule.
Reset pipeline stage infrastructure for a new sequence length.
VLM training batches can differ widely in sequence length from step to step (for example, image batches versus text-only batches). PyTorch's PipelineStage fixes its receive-buffer sizes on the first step, so a later step with a different sequence length fails with a shape-mismatch error.
Call this before every schedule.step() to update the stage shapes without running an expensive forward pass. It is a no-op when seq_len has not changed.
Parameters:
Sequence length of the upcoming batch (input_ids.shape[1]).
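The "no-op when seq_len has not changed" behavior can be illustrated with a small, self-contained sketch. This is not the library's implementation: StageShapeTracker, reset_for_seq_len, micro_batch_size and hidden_size are hypothetical names made up for this illustration, and no real pipeline stage or torch API is touched. It only shows the caching pattern of rebuilding shape metadata when the upcoming sequence length differs from the last one seen.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StageShapeTracker:
    """Hypothetical stand-in for the shape metadata a pipeline stage caches."""

    micro_batch_size: int
    hidden_size: int
    seq_len: Optional[int] = None  # last sequence length the shapes were built for
    rebuild_count: int = 0         # how many times shapes were actually rebuilt

    def reset_for_seq_len(self, seq_len: int) -> bool:
        """Rebuild cached shapes for seq_len; return False if nothing changed."""
        if seq_len == self.seq_len:
            return False  # same length as last step: skip the rebuild
        self.seq_len = seq_len
        self.rebuild_count += 1
        return True

    @property
    def recv_shape(self) -> Tuple[int, Optional[int], int]:
        """Activation shape the stage would expect to receive."""
        return (self.micro_batch_size, self.seq_len, self.hidden_size)


tracker = StageShapeTracker(micro_batch_size=2, hidden_size=8)

# Simulated per-step sequence lengths: text, text, image-heavy, text.
batch_seq_lens = [128, 128, 512, 128]

# In a real loop this call would come right before each schedule.step().
rebuilt = [tracker.reset_for_seq_len(n) for n in batch_seq_lens]
```

In this sketch the second step reuses the cached shape because its length matches the first, while every change in length triggers a rebuild, including the return to 128 after the 512-token batch.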
Runtime state produced by pipeline-parallel setup.