nemo_automodel.components.distributed.pipelining.config


Pipeline parallel configuration class.

Design principles:

  • Device mesh (world_mesh, moe_mesh) is passed separately to from_pretrained/from_config
  • PipelineConfig contains scheduling, splitting, and runtime options
  • loss_fn is included here since it’s only used for pipelining
  • Axis names are inferred automatically from device_mesh in _instantiate_pipeline

Module Contents

Classes

Name | Description
PipelineConfig | Configuration for pipeline parallel training.

API

class nemo_automodel.components.distributed.pipelining.config.PipelineConfig(
pp_schedule: typing.Optional[str] = '1f1b',
pp_schedule_csv: typing.Optional[str] = None,
pp_microbatch_size: int = 1,
pp_batch_size: int = 1,
layers_per_stage: typing.Optional[int] = None,
round_virtual_stages_to_pp_multiple: typing.Optional[typing.Literal['up', 'down']] = None,
module_fqns_per_model_part: typing.Optional[typing.List[typing.List[str]]] = None,
patch_inner_model: bool = True,
patch_causal_lm_model: bool = True,
patch_stage_backward_maybe_with_nosync: bool = False,
dtype: typing.Optional[torch.dtype] = None,
scale_grads_in_schedule: bool = False,
loss_fn: typing.Optional[typing.Callable] = None,
pp_seq_len: typing.Optional[int] = None
)
Dataclass

Configuration for pipeline parallel training.

Note: The device mesh (world_mesh, moe_mesh) is passed separately in the from_pretrained/from_config method signature, not through this config. Pipeline parallelism is enabled when pp_size > 1. Axis names are inferred automatically from the device mesh structure.
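As a rough illustration of the signature above, the sketch below builds a config object and derives a microbatch count from it. To stay runnable without the library or torch installed, it uses a stand-in dataclass, `PipelineConfigSketch`, that copies the documented field names and defaults; the stand-in, the `dummy_loss` placeholder, and the `pp_batch_size // pp_microbatch_size` relationship are assumptions made for illustration, not confirmed library behavior.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Literal, Optional


@dataclass
class PipelineConfigSketch:
    """Stand-in mirroring the documented fields; not the library class."""

    pp_schedule: Optional[str] = "1f1b"
    pp_schedule_csv: Optional[str] = None
    pp_microbatch_size: int = 1
    pp_batch_size: int = 1
    layers_per_stage: Optional[int] = None
    round_virtual_stages_to_pp_multiple: Optional[Literal["up", "down"]] = None
    module_fqns_per_model_part: Optional[List[List[str]]] = None
    patch_inner_model: bool = True
    patch_causal_lm_model: bool = True
    patch_stage_backward_maybe_with_nosync: bool = False
    dtype: Optional[Any] = None  # torch.dtype in the documented signature
    scale_grads_in_schedule: bool = False
    loss_fn: Optional[Callable] = None
    pp_seq_len: Optional[int] = None


def dummy_loss(logits, labels):
    """Hypothetical placeholder; a real run would pass an actual loss callable."""
    return 0.0


# Only the options that differ from the defaults need to be spelled out.
config = PipelineConfigSketch(
    pp_batch_size=8,
    pp_microbatch_size=2,
    layers_per_stage=4,
    loss_fn=dummy_loss,
)

# Assumption: each batch is split into this many microbatches by the schedule.
n_microbatches = config.pp_batch_size // config.pp_microbatch_size
print(config.pp_schedule, n_microbatches)
```

With the real class, the same keyword arguments would presumably be passed to `PipelineConfig` directly, and the device mesh would still be supplied separately to from_pretrained/from_config.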

dtype: Optional[dtype] = None
layers_per_stage: Optional[int] = None
loss_fn: Optional[Callable] = None
module_fqns_per_model_part: Optional[List[List[str]]] = None
patch_causal_lm_model: bool = True
patch_inner_model: bool = True
patch_stage_backward_maybe_with_nosync: bool = False
pp_batch_size: int = 1
pp_microbatch_size: int = 1
pp_schedule: Optional[str] = '1f1b'
pp_schedule_csv: Optional[str] = None
pp_seq_len: Optional[int] = None
round_virtual_stages_to_pp_multiple: Optional[Literal['up', 'down']] = None
scale_grads_in_schedule: bool = False
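
The name round_virtual_stages_to_pp_multiple suggests that a virtual stage count can be adjusted to a multiple of the pipeline-parallel size. The sketch below shows one way such rounding could work; the helper `round_to_pp_multiple` and its arguments are hypothetical, inferred from the field name and its 'up'/'down' values, and the library's actual rule may differ.

```python
from typing import Literal, Optional


def round_to_pp_multiple(
    num_virtual_stages: int,
    pp_size: int,
    mode: Optional[Literal["up", "down"]],
) -> int:
    """Hypothetical helper: round a stage count to a multiple of pp_size."""
    if mode is None:
        # No rounding requested; return the count unchanged.
        return num_virtual_stages
    remainder = num_virtual_stages % pp_size
    if remainder == 0:
        # Already a multiple of pp_size.
        return num_virtual_stages
    if mode == "down":
        return num_virtual_stages - remainder
    if mode == "up":
        return num_virtual_stages + (pp_size - remainder)
    raise ValueError(f"mode must be 'up', 'down', or None, got {mode!r}")


print(round_to_pp_multiple(10, 4, "up"))
print(round_to_pp_multiple(10, 4, "down"))
```

Note that in this sketch rounding down a count smaller than pp_size yields 0, so a real implementation would likely need a guard for that case.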
nemo_automodel.components.distributed.pipelining.config.PipelineConfig.to_dict() -> typing.Dict[str, typing.Any]

Convert config to dictionary.
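
For intuition about what a dictionary conversion on a dataclass typically looks like, here is a minimal sketch. `MiniPipelineConfig` is a hypothetical three-field stand-in, and building the dictionary with `dataclasses.fields` is an assumption about one plausible approach, not a statement of how the library implements to_dict().

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class MiniPipelineConfig:
    """Hypothetical three-field stand-in; not the library class."""

    pp_schedule: Optional[str] = "1f1b"
    pp_microbatch_size: int = 1
    pp_batch_size: int = 1

    def to_dict(self) -> Dict[str, Any]:
        # Shallow conversion: one entry per dataclass field, values not copied.
        return {f.name: getattr(self, f.name) for f in fields(self)}


as_dict = MiniPipelineConfig(pp_batch_size=8, pp_microbatch_size=2).to_dict()
print(as_dict)
```

A shallow conversion like this leaves values such as a loss callable or a dtype object untouched, whereas `dataclasses.asdict` would recurse into nested dataclasses and deep-copy other values.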