nemo_automodel.components.distributed.config
Strategy-specific distributed training configuration classes.
Design principles:
- Size params (dp_size, dp_replicate_size, tp_size, pp_size, cp_size, ep_size) are grouped in ParallelismSizes.
- dp_replicate_size is FSDP2-only: passing it with a non-FSDP2 config raises an assertion error.
- Strategy-specific configs contain only the additional flags unique to each strategy.
- Managers become normal classes that accept (config, device_mesh).
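The first two principles can be illustrated with a minimal sketch. It assumes a stand-in class, ParallelismSizesSketch, that holds the six size fields named above, and a hypothetical helper, check_dp_replicate, that enforces the FSDP2-only rule. The defaults, the use of None to mean "not passed", and the "fsdp2" strategy string are all assumptions, not the library's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParallelismSizesSketch:
    # Hypothetical stand-in for ParallelismSizes: all size params live in one object.
    # Defaults are assumptions; None for dp_replicate_size means "not passed".
    dp_size: Optional[int] = None
    dp_replicate_size: Optional[int] = None
    tp_size: int = 1
    pp_size: int = 1
    cp_size: int = 1
    ep_size: int = 1


def check_dp_replicate(sizes: ParallelismSizesSketch, strategy: str) -> None:
    """Hypothetical helper enforcing the FSDP2-only rule for dp_replicate_size."""
    if strategy != "fsdp2":
        # Any explicitly passed dp_replicate_size is rejected for other strategies.
        assert sizes.dp_replicate_size is None, (
            f"dp_replicate_size is FSDP2-only, but strategy={strategy!r}"
        )
```

Keeping sizes in their own object means each strategy config only has to declare its own flags, which matches the third principle.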
API
Additional configuration for DDP distributed training.
Note: DDP does not support tensor parallelism, pipeline parallelism, or expert parallelism. Only dp_size is relevant, and it is inferred from world_size.
Convert the config to a dictionary.
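A minimal sketch of what a DDP-only config and the world_size inference could look like. DDPConfigSketch and infer_ddp_dp_size are hypothetical names, and the two flags are borrowed from PyTorch's DistributedDataParallel constructor purely for illustration; the real class's fields are not listed in this section.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class DDPConfigSketch:
    # Hypothetical DDP-only flags (names borrowed from torch DDP for illustration).
    find_unused_parameters: bool = False
    bucket_cap_mb: int = 25

    def to_dict(self) -> Dict[str, Any]:
        # All fields are plain values, so a recursive asdict() is sufficient here.
        return asdict(self)


def infer_ddp_dp_size(world_size: int, tp_size: int = 1, pp_size: int = 1, ep_size: int = 1) -> int:
    """Hypothetical helper: under DDP every rank is a data-parallel replica."""
    if (tp_size, pp_size, ep_size) != (1, 1, 1):
        raise ValueError("DDP does not support tensor, pipeline, or expert parallelism")
    return world_size
```

For example, infer_ddp_dp_size(8) returns 8, because with no other parallel dimensions the whole world is the data-parallel group.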
Resolved distributed topology and execution policies.
Create a resolved distributed setup from sizes and policy configs.
This function is intentionally forgiving about input types: it accepts a string for the strategy and plain dicts for the pipeline and MoE configs.
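The forgiving input handling can be sketched as follows, assuming hypothetical classes PipelineConfigSketch, MoEConfigSketch, and DistributedSetupSketch, plus a hypothetical make_setup function. Their field names are placeholders; the point is only that dicts (for example, parsed from YAML) are coerced into config objects and the strategy string is normalized.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class PipelineConfigSketch:
    # Hypothetical field; real pipeline options are not listed in this section.
    num_microbatches: int = 1


@dataclass
class MoEConfigSketch:
    # Hypothetical field standing in for the EP + FSDP settings.
    ep_shard_experts: bool = True


@dataclass
class DistributedSetupSketch:
    strategy: str
    sizes: Dict[str, int]
    pipeline: Optional[PipelineConfigSketch] = None
    moe: Optional[MoEConfigSketch] = None


def make_setup(
    strategy: str,
    sizes: Dict[str, int],
    pipeline: Union[PipelineConfigSketch, Dict[str, Any], None] = None,
    moe: Union[MoEConfigSketch, Dict[str, Any], None] = None,
) -> DistributedSetupSketch:
    # Coerce plain dicts into config objects; existing objects pass through untouched.
    if isinstance(pipeline, dict):
        pipeline = PipelineConfigSketch(**pipeline)
    if isinstance(moe, dict):
        moe = MoEConfigSketch(**moe)
    # Normalize the strategy name so "FSDP2" and "fsdp2" are treated the same.
    return DistributedSetupSketch(strategy.lower(), dict(sizes), pipeline, moe)
```

Accepting dicts at this boundary lets YAML-driven recipes skip constructing config objects by hand, while code that already has objects loses nothing.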
Additional configuration for FSDP2 distributed training.
Note: Size parameters (dp_size, dp_replicate_size, tp_size, pp_size, cp_size, ep_size)
are grouped separately in ParallelismSizes.
Convert the config to a shallow dictionary; policy objects are preserved as-is rather than converted recursively.
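The difference between a shallow conversion and dataclasses.asdict can be shown with a small sketch. FSDP2ConfigSketch and MixedPrecisionPolicySketch are hypothetical stand-ins; the assumption is only that the real to_dict maps field names to their values without recursing into nested policy objects.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict


@dataclass
class MixedPrecisionPolicySketch:
    # Hypothetical stand-in for a policy object held by the config.
    param_dtype: str = "bfloat16"


@dataclass
class FSDP2ConfigSketch:
    # Hypothetical FSDP2-only flags; not the real class's field list.
    reshard_after_forward: bool = True
    mp_policy: MixedPrecisionPolicySketch = field(default_factory=MixedPrecisionPolicySketch)

    def to_dict(self) -> Dict[str, Any]:
        # Shallow: one level of name -> value, so nested policy objects are kept as-is.
        return {f.name: getattr(self, f.name) for f in fields(self)}


cfg = FSDP2ConfigSketch()
shallow = cfg.to_dict()
deep = asdict(cfg)  # recursive: turns the nested policy into a plain dict copy
```

Here shallow["mp_policy"] is the very same object as cfg.mp_policy, whereas deep["mp_policy"] is {"param_dtype": "bfloat16"}. Preserving the object matters when the value is later handed to an API that expects a policy instance, not a dict.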
Additional configuration for MegatronFSDP distributed training.
Note: Size parameters (dp_size, tp_size, cp_size) are grouped separately in ParallelismSizes. MegatronFSDP does not support pp_size, dp_replicate_size, or ep_size.
Convert the config to a shallow dictionary; nested objects are preserved as-is rather than converted recursively.
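The MegatronFSDP restriction above can be expressed as a validation step. This is a minimal sketch with a hypothetical helper, validate_megatron_fsdp_sizes, that takes sizes as a plain dict; treating a value of None or 1 as "not used" is an assumption.

```python
from typing import Dict, Optional

# Size parameters MegatronFSDP does not support, per the note above.
_UNSUPPORTED = ("pp_size", "dp_replicate_size", "ep_size")


def validate_megatron_fsdp_sizes(sizes: Dict[str, Optional[int]]) -> None:
    """Hypothetical check: reject sizes MegatronFSDP cannot use."""
    # Assumption: a missing key, None, or 1 means the dimension is not in use.
    bad = [name for name in _UNSUPPORTED if sizes.get(name) not in (None, 1)]
    if bad:
        raise ValueError(f"MegatronFSDP does not support: {', '.join(bad)}")
```

For example, {"dp_size": 4, "tp_size": 2, "cp_size": 1} passes, while {"pp_size": 2, "ep_size": 4} raises a ValueError naming both fields.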
Configuration for MoE model parallelization (EP + FSDP settings).
Resolve a setup-level strategy name or config object.
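Strategy resolution can be sketched as a small registry lookup. The config classes, the registry, and the accepted spellings ("ddp", "fsdp2", "megatron_fsdp") are hypothetical; the sketch only shows the general pattern of passing config objects through unchanged and mapping names to default-constructed configs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class DDPStrategySketch:
    pass


@dataclass
class FSDP2StrategySketch:
    pass


@dataclass
class MegatronFSDPStrategySketch:
    pass


StrategyConfig = Union[DDPStrategySketch, FSDP2StrategySketch, MegatronFSDPStrategySketch]

# Hypothetical name -> class registry; the accepted spellings are assumptions.
_REGISTRY = {
    "ddp": DDPStrategySketch,
    "fsdp2": FSDP2StrategySketch,
    "megatron_fsdp": MegatronFSDPStrategySketch,
}


def resolve_strategy(strategy: Union[str, StrategyConfig]) -> StrategyConfig:
    # Config objects are already resolved, so return them unchanged.
    if isinstance(strategy, tuple(_REGISTRY.values())):
        return strategy
    if not isinstance(strategy, str):
        raise TypeError(f"expected a strategy name or config object, got {type(strategy).__name__}")
    key = strategy.strip().lower()
    if key not in _REGISTRY:
        raise ValueError(f"unknown strategy {strategy!r}; expected one of {sorted(_REGISTRY)}")
    # A bare name yields a default-constructed config for that strategy.
    return _REGISTRY[key]()
```

Failing loudly on unknown names, instead of silently falling back to a default strategy, keeps typos in a config file from launching a run with the wrong parallelism.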