# nemo_automodel.components.distributed.pipelining.autopipeline

## Module Contents

### Classes

| Name                                                                                          | Description                                                                     |
| --------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
| [`AutoPipeline`](#nemo_automodel-components-distributed-pipelining-autopipeline-AutoPipeline) | Orchestrates pipeline-parallel training on top of torch.distributed.pipelining. |
| [`PipelineInfo`](#nemo_automodel-components-distributed-pipelining-autopipeline-PipelineInfo) | Runtime state produced by pipeline-parallel setup.                              |

### Data

[`logger`](#nemo_automodel-components-distributed-pipelining-autopipeline-logger)

### API

```python
class nemo_automodel.components.distributed.pipelining.autopipeline.AutoPipeline(
    world_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
    moe_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
    pp_axis_name: str = 'pp',
    dp_axis_names: tuple[str, ...] = ('dp',),
    cp_axis_name: typing.Optional[str] = None,
    tp_axis_name: typing.Optional[str] = None,
    ep_axis_name: typing.Optional[str] = None,
    ep_shard_axis_names: typing.Optional[tuple[str, ...]] = None,
    pp_schedule: typing.Optional[str] = '1f1b',
    pp_schedule_csv: typing.Optional[str] = None,
    pp_microbatch_size: int = 1,
    pp_batch_size: int = 1,
    layers_per_stage: typing.Optional[int] = None,
    round_virtual_stages_to_pp_multiple: typing.Optional[typing.Literal['up', 'down']] = None,
    module_fqns_per_model_part: typing.Optional[list[list[str]]] = None,
    patch_inner_model: bool = True,
    patch_causal_lm_model: bool = True,
    patch_stage_backward_maybe_with_nosync: bool = False,
    defer_fsdp_grad_sync: bool = True,
    device: typing.Optional[torch.device] = None,
    dtype: typing.Optional[torch.dtype] = None,
    scale_grads_in_schedule: bool = False,
    pp_seq_len: typing.Optional[int] = None
)
```

Orchestrates pipeline-parallel training on top of torch.distributed.pipelining.

```python
nemo_automodel.components.distributed.pipelining.autopipeline.AutoPipeline._count_parameters(
    module: torch.nn.Module,
    trainable_only: bool = False
) -> int
```

staticmethod

```python
nemo_automodel.components.distributed.pipelining.autopipeline.AutoPipeline.build(
    model: torch.nn.Module,
    loss_fn: typing.Optional[typing.Callable] = None,
    parallelize_fn: typing.Optional[nemo_automodel.components.distributed.pipelining.functional.ParallelizeFnProtocol] = None
)
```

Build the pipeline: validate -> init meta -> split -> schedule.

```python
nemo_automodel.components.distributed.pipelining.autopipeline.AutoPipeline.debug_summary() -> str
```

```python
nemo_automodel.components.distributed.pipelining.autopipeline.AutoPipeline.get_stage_param_counts(
    trainable_only: bool = False
) -> list[int]
```

```python
nemo_automodel.components.distributed.pipelining.autopipeline.AutoPipeline.get_total_param_count(
    trainable_only: bool = False
) -> int
```

```python
nemo_automodel.components.distributed.pipelining.autopipeline.AutoPipeline.list_stage_modules() -> list[list[str]]
```

```python
nemo_automodel.components.distributed.pipelining.autopipeline.AutoPipeline.log_debug_summary() -> None
```

```python
nemo_automodel.components.distributed.pipelining.autopipeline.AutoPipeline.pretty_print_stages(
    max_modules_per_stage: int = 16,
    trainable_only: bool = False
) -> str
```

```python
nemo_automodel.components.distributed.pipelining.autopipeline.AutoPipeline.update_seq_len(
    seq_len: int
) -> None
```

Reset pipeline stage infrastructure for a new sequence length.

VLM training batches can vary widely in sequence length from step to step
(for example, image batches vs. text-only batches). PyTorch's `PipelineStage`
fixes its receive-buffer sizes on the first step, so a later step with a
different sequence length fails with a shape-mismatch error.

Call this before every `schedule.step()` to update the stage shapes without
running an expensive forward pass. It is a no-op when `seq_len` has not changed.

**Parameters:**

`seq_len` (`int`): Sequence length of the upcoming batch (`input_ids.shape[1]`).
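
The call pattern described above can be sketched without a distributed setup. The example below is only an illustration: `StubPipeline` and `run_steps` are hypothetical stand-ins, not part of `nemo_automodel`, and the stub mimics nothing beyond the documented "no-op when the length is unchanged" behavior. Batches are plain nested lists, so `len(batch["input_ids"][0])` plays the role of `input_ids.shape[1]`.

```python
class StubPipeline:
    """Hypothetical stand-in for AutoPipeline (assumption, not the real class).

    It only mimics the documented no-op-when-unchanged behavior of
    update_seq_len; no pipeline stages are created.
    """

    def __init__(self) -> None:
        self._seq_len = None
        self.resets = 0  # number of times the stage shapes were "rebuilt"

    def update_seq_len(self, seq_len: int) -> None:
        if seq_len == self._seq_len:
            return  # documented no-op: sequence length has not changed
        self._seq_len = seq_len
        self.resets += 1  # stand-in for resetting the stage infrastructure


def run_steps(pipeline: StubPipeline, batches: list) -> int:
    """Call update_seq_len before each (omitted) schedule.step()."""
    for batch in batches:
        # Equivalent to input_ids.shape[1] when input_ids is a 2-D tensor.
        seq_len = len(batch["input_ids"][0])
        pipeline.update_seq_len(seq_len)
        # schedule.step(...) would run here in a real training loop.
    return pipeline.resets


batches = [
    {"input_ids": [[0] * 128]},  # text-only batch
    {"input_ids": [[0] * 128]},  # same length, so no reset
    {"input_ids": [[0] * 512]},  # longer batch, e.g. one containing images
]
resets = run_steps(StubPipeline(), batches)
```

In this sketch only the first and third batches trigger a reset, which is why calling the method on every step stays cheap when consecutive batches share a sequence length.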

```python
nemo_automodel.components.distributed.pipelining.autopipeline.AutoPipeline.visualize_current_schedule(
    filename: typing.Optional[str] = None
) -> None
```

```python
class nemo_automodel.components.distributed.pipelining.autopipeline.PipelineInfo(
    enabled: bool,
    schedule: typing.Optional[torch.distributed.pipelining.schedules._PipelineSchedule],
    has_first_stage: bool,
    has_last_stage: bool,
    model_parts: typing.Optional[list[torch.nn.Module]],
    stages: typing.Optional[list[torch.distributed.pipelining.stage.PipelineStage]]
)
```

Dataclass

Runtime state produced by pipeline-parallel setup.

```python
nemo_automodel.components.distributed.pipelining.autopipeline.logger = logging.getLogger(__name__)
```