# nemo_automodel.components.distributed.config

Strategy-specific distributed training configuration classes.

Design principles:

* Size parameters (dp\_size, dp\_replicate\_size, tp\_size, pp\_size, cp\_size, ep\_size)
  are grouped in `ParallelismSizes`.
* dp\_replicate\_size is FSDP2-only: passing it with a non-FSDP2 config raises an assertion error.
* Strategy-specific configs contain only the *additional* flags unique to each strategy.
* Managers become normal classes that accept (config, device\_mesh).
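
The FSDP2-only rule for dp\_replicate\_size in the list above can be illustrated with a small sketch. It uses stand-in dataclasses rather than the real classes, so `SizesSketch`, `FSDP2Sketch`, `DDPSketch`, `check_sizes`, and the `None` defaults are all assumptions made for illustration. Where the library performs the actual check is not shown on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SizesSketch:
    """Hypothetical stand-in for ParallelismSizes (field defaults are assumed)."""
    dp_size: Optional[int] = None
    dp_replicate_size: Optional[int] = None
    tp_size: int = 1


@dataclass
class FSDP2Sketch:
    """Hypothetical stand-in for FSDP2Config."""
    sequence_parallel: bool = False


@dataclass
class DDPSketch:
    """Hypothetical stand-in for DDPConfig."""
    find_unused_parameters: bool = False


def check_sizes(strategy_config: object, sizes: SizesSketch) -> None:
    """Reject dp_replicate_size unless the strategy is the FSDP2 stand-in."""
    if sizes.dp_replicate_size is not None:
        assert isinstance(strategy_config, FSDP2Sketch), (
            "dp_replicate_size is only supported with FSDP2"
        )


# Accepted: dp_replicate_size together with the FSDP2 stand-in.
check_sizes(FSDP2Sketch(), SizesSketch(dp_replicate_size=2))

# Rejected: dp_replicate_size together with a non-FSDP2 stand-in.
try:
    check_sizes(DDPSketch(), SizesSketch(dp_replicate_size=2))
    rejected = False
except AssertionError:
    rejected = True
```

Keeping the sizes in one object and the strategy flags in another lets a single check like this cover every strategy without duplicating size fields across the config classes.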

## Module Contents

### Classes

| Name                                                                                           | Description                                                       |
| ---------------------------------------------------------------------------------------------- | ----------------------------------------------------------------- |
| [`DDPConfig`](#nemo_automodel-components-distributed-config-DDPConfig)                         | Additional configuration for DDP distributed training.            |
| [`DistributedSetup`](#nemo_automodel-components-distributed-config-DistributedSetup)           | Resolved distributed topology and execution policies.             |
| [`FSDP2Config`](#nemo_automodel-components-distributed-config-FSDP2Config)                     | Additional configuration for FSDP2 distributed training.          |
| [`MegatronFSDPConfig`](#nemo_automodel-components-distributed-config-MegatronFSDPConfig)       | Additional configuration for MegatronFSDP distributed training.   |
| [`MoEParallelizerConfig`](#nemo_automodel-components-distributed-config-MoEParallelizerConfig) | Configuration for MoE model parallelization (EP + FSDP settings). |

### Functions

| Name                                                                                                 | Description                                           |
| ---------------------------------------------------------------------------------------------------- | ----------------------------------------------------- |
| [`_resolve_strategy_config`](#nemo_automodel-components-distributed-config-_resolve_strategy_config) | Resolve a setup-level strategy name or config object. |

### Data

[`ActivationCheckpointingMode`](#nemo_automodel-components-distributed-config-ActivationCheckpointingMode)

[`DistributedConfig`](#nemo_automodel-components-distributed-config-DistributedConfig)

[`DistributedStrategyConfig`](#nemo_automodel-components-distributed-config-DistributedStrategyConfig)

[`_STRATEGY_MAP`](#nemo_automodel-components-distributed-config-_STRATEGY_MAP)

[`_StrategyConfigClass`](#nemo_automodel-components-distributed-config-_StrategyConfigClass)

[`__all__`](#nemo_automodel-components-distributed-config-__all__)

### API

```python
class nemo_automodel.components.distributed.config.DDPConfig(
    activation_checkpointing: bool = False,
    broadcast_buffers: bool = False,
    find_unused_parameters: bool = False,
    static_graph: bool = False,
    bucket_cap_mb: typing.Optional[float] = None,
    gradient_as_bucket_view: bool = False,
    autocast_dtype: typing.Optional[torch.dtype] = None
)
```

Dataclass

Additional configuration for DDP distributed training.

Note: DDP does not support tensor parallelism, pipeline parallelism, or expert parallelism.
Only dp\_size is relevant (inferred from world\_size).

```python
nemo_automodel.components.distributed.config.DDPConfig.to_dict() -> typing.Dict[str, typing.Any]
```

Convert config to dictionary.

```python
class nemo_automodel.components.distributed.config.DistributedSetup(
    mesh_context: 'MeshContext',
    strategy_config: nemo_automodel.components.distributed.config.DistributedStrategyConfig | None = None,
    pipeline_config: 'PipelineConfig | None' = None,
    moe_parallel_config: 'MoEParallelizerConfig | None' = None,
    activation_checkpointing: nemo_automodel.components.distributed.config.ActivationCheckpointingMode = False
)
```

Dataclass

Resolved distributed topology and execution policies.

```python
nemo_automodel.components.distributed.config.DistributedSetup.build(
    strategy: str | nemo_automodel.components.distributed.config.DistributedStrategyConfig = 'fsdp2',
    parallelism_sizes: 'ParallelismSizes | None' = None,
    pipeline_config: 'PipelineConfig | dict | None' = None,
    moe_parallel_config: 'MoEParallelizerConfig | dict | None' = None,
    activation_checkpointing: nemo_automodel.components.distributed.config.ActivationCheckpointingMode = False,
    world_size: int | None = None
) -> 'DistributedSetup'
```

classmethod

Create a resolved distributed setup from sizes and policy configs.

This method is intentionally forgiving about input types: the strategy may be
given as a string, and the pipeline and MoE configs may be given as dicts.
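
One way to accept either a dict or a ready-made config object is a small coercion helper. The sketch below is an assumption about how such handling could look, not the library's code: `MoESketch` and `coerce` are hypothetical names, and `MoESketch` mirrors only two of the documented `MoEParallelizerConfig` fields.

```python
from dataclasses import dataclass
from typing import Optional, Type, TypeVar, Union

T = TypeVar("T")


@dataclass
class MoESketch:
    """Hypothetical stand-in with two fields from MoEParallelizerConfig."""
    ignore_router_for_ac: bool = True
    reshard_after_forward: bool = False


def coerce(value: Union[T, dict, None], cls: Type[T]) -> Optional[T]:
    """Accept None, a dict of keyword arguments, or a ready-made instance."""
    if value is None:
        return None
    if isinstance(value, dict):
        # Dict keys are passed through as constructor keyword arguments.
        return cls(**value)
    if isinstance(value, cls):
        return value
    raise TypeError(
        f"expected {cls.__name__}, dict, or None, got {type(value).__name__}"
    )


from_dict = coerce({"reshard_after_forward": True}, MoESketch)
from_obj = coerce(MoESketch(), MoESketch)
```

Accepting dicts is convenient when the values come from a parsed YAML or JSON file, since the caller does not need to import the config classes.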

```python
class nemo_automodel.components.distributed.config.FSDP2Config(
    sequence_parallel: bool = False,
    tp_plan: typing.Optional[dict] = None,
    patch_is_packed_sequence: bool = False,
    mp_policy: typing.Optional[torch.distributed.fsdp.MixedPrecisionPolicy] = (lambda: MixedPrecisionPoli...,
    offload_policy: typing.Optional[torch.distributed.fsdp.CPUOffloadPolicy] = None,
    autocast_dtype: typing.Optional[torch.dtype] = None,
    activation_checkpointing: nemo_automodel.components.distributed.config.ActivationCheckpointingMode = False,
    defer_fsdp_grad_sync: bool = True,
    reshard_after_forward: typing.Optional[bool] = None,
    enable_async_tensor_parallel: bool = False,
    enable_compile: bool = False,
    enable_fsdp2_prefetch: bool = False,
    fsdp2_backward_prefetch_depth: int = 2,
    fsdp2_forward_prefetch_depth: int = 1
)
```

Dataclass

Additional configuration for FSDP2 distributed training.

Note: Size parameters (dp\_size, dp\_replicate\_size, tp\_size, pp\_size, cp\_size, ep\_size)
are grouped separately in `ParallelismSizes`.

```python
nemo_automodel.components.distributed.config.FSDP2Config.__post_init__()
```

```python
nemo_automodel.components.distributed.config.FSDP2Config.to_dict() -> typing.Dict[str, typing.Any]
```

Convert config to dictionary (shallow, preserves policy objects).
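
The difference between a shallow conversion and the standard library's recursive `dataclasses.asdict` can be seen in a short sketch. `PolicySketch` and `ConfigSketch` are hypothetical stand-ins, and the `to_dict` body here is one plausible way to get a shallow result, not necessarily how `FSDP2Config.to_dict` is written.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict, Optional


@dataclass
class PolicySketch:
    """Hypothetical stand-in for a policy object such as MixedPrecisionPolicy."""
    param_dtype: str = "bfloat16"


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for a strategy config holding a policy object."""
    sequence_parallel: bool = False
    mp_policy: Optional[PolicySketch] = field(default_factory=PolicySketch)

    def to_dict(self) -> Dict[str, Any]:
        # Shallow: each value is the attribute itself, not a converted copy.
        return {f.name: getattr(self, f.name) for f in fields(self)}


cfg = ConfigSketch()
shallow = cfg.to_dict()
deep = asdict(cfg)

# The shallow dict still holds the original policy object.
same_object = shallow["mp_policy"] is cfg.mp_policy
# asdict() recursed into the nested dataclass and turned it into a plain dict.
flattened = isinstance(deep["mp_policy"], dict)
```

A shallow result matters when the dictionary is later unpacked into a call that expects real policy objects, which a recursively flattened dict would no longer provide.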

```python
class nemo_automodel.components.distributed.config.MegatronFSDPConfig(
    megatron_fsdp_unit_modules: typing.List[str] = (lambda: ['transformers.mod...,
    zero_dp_strategy: int = 3,
    init_fsdp_with_meta_device: bool = False,
    grad_reduce_in_fp32: bool = False,
    preserve_fp32_weights: bool = False,
    overlap_grad_reduce: bool = True,
    overlap_param_gather: bool = True,
    check_for_nan_in_grad: bool = True,
    average_in_collective: bool = False,
    disable_bucketing: bool = False,
    calculate_per_token_loss: bool = False,
    keep_fp8_transpose_cache: bool = False,
    nccl_ub: bool = False,
    fsdp_double_buffer: bool = False,
    activation_checkpointing: bool = False
)
```

Dataclass

Additional configuration for MegatronFSDP distributed training.

Note: Size parameters (dp\_size, tp\_size, cp\_size) are grouped separately in
`ParallelismSizes`. MegatronFSDP does not support pp\_size, dp\_replicate\_size,
or ep\_size.

```python
nemo_automodel.components.distributed.config.MegatronFSDPConfig.to_dict() -> typing.Dict[str, typing.Any]
```

Convert config to dictionary (shallow, preserves objects).

```python
class nemo_automodel.components.distributed.config.MoEParallelizerConfig(
    ignore_router_for_ac: bool = True,
    reshard_after_forward: bool = False,
    lm_head_precision: typing.Optional[typing.Union[str, torch.dtype]] = None,
    wrap_outer_model: bool = True,
    mp_policy: typing.Optional[torch.distributed.fsdp.MixedPrecisionPolicy] = None
)
```

Dataclass

Configuration for MoE model parallelization (EP + FSDP settings).

```python
nemo_automodel.components.distributed.config.MoEParallelizerConfig.to_dict() -> typing.Dict[str, typing.Any]
```

Convert config to dictionary.

```python
nemo_automodel.components.distributed.config._resolve_strategy_config(
    strategy: str | nemo_automodel.components.distributed.config.DistributedStrategyConfig,
    strategy_kwargs: typing.Any = {}
) -> nemo_automodel.components.distributed.config.DistributedStrategyConfig
```

Resolve a setup-level strategy name or config object.
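
Resolving a name or a config object typically comes down to a dictionary lookup with a pass-through for objects. The sketch below illustrates that idea under stated assumptions: `FSDP2Sketch`, `MegatronFSDPSketch`, `STRATEGY_MAP_SKETCH`, and `resolve_strategy` are hypothetical, the map mirrors only the two entries of `_STRATEGY_MAP` that are fully visible on this page, and the error type for unknown names is a guess.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class FSDP2Sketch:
    """Hypothetical stand-in for FSDP2Config."""
    sequence_parallel: bool = False


@dataclass
class MegatronFSDPSketch:
    """Hypothetical stand-in for MegatronFSDPConfig."""
    zero_dp_strategy: int = 3


# Only the two fully visible entries of the documented map are mirrored here.
STRATEGY_MAP_SKETCH: Dict[str, type] = {
    "fsdp2": FSDP2Sketch,
    "megatron_fsdp": MegatronFSDPSketch,
}


def resolve_strategy(strategy: Any, **strategy_kwargs: Any) -> Any:
    """Turn a strategy name into a config instance; pass objects through."""
    if isinstance(strategy, str):
        try:
            cls = STRATEGY_MAP_SKETCH[strategy]
        except KeyError:
            known = ", ".join(sorted(STRATEGY_MAP_SKETCH))
            raise ValueError(
                f"unknown strategy {strategy!r}; expected one of: {known}"
            ) from None
        return cls(**strategy_kwargs)
    # Already a config object. This sketch returns it unchanged and does not
    # apply strategy_kwargs to it.
    return strategy


by_name = resolve_strategy("fsdp2", sequence_parallel=True)
existing = MegatronFSDPSketch(zero_dp_strategy=2)
by_object = resolve_strategy(existing)
```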

```python
nemo_automodel.components.distributed.config.ActivationCheckpointingMode = Union[bool, Literal['selective']]
```
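
Because the alias admits both booleans and the string `'selective'`, a runtime check has to handle the two cases separately. The helper below is a hypothetical sketch (`ModeSketch` and `is_valid_mode` are not part of the module) of what such a check could look like.

```python
from typing import Literal, Union

# Local copy of the documented alias, so the sketch is self-contained.
ModeSketch = Union[bool, Literal["selective"]]


def is_valid_mode(value: object) -> bool:
    """Return True only for True, False, or the exact string 'selective'."""
    # The isinstance check comes first so that integers such as 1 or 0,
    # which compare equal to booleans, are not accepted.
    return isinstance(value, bool) or value == "selective"


checks = {
    "bool": is_valid_mode(True),
    "selective": is_valid_mode("selective"),
    "other_string": is_valid_mode("full"),
    "integer": is_valid_mode(1),
}
```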

```python
nemo_automodel.components.distributed.config.DistributedConfig = DistributedStrategyConfig
```

```python
nemo_automodel.components.distributed.config.DistributedStrategyConfig = Union['FSDP2Config', 'MegatronFSDPConfig', 'DDPConfig']
```

```python
nemo_automodel.components.distributed.config._STRATEGY_MAP: Dict[str, _StrategyConfigClass] = {'fsdp2': FSDP2Config, 'megatron_fsdp': MegatronFSDPConfig, 'megatron-fsdp': Meg...
```

```python
nemo_automodel.components.distributed.config._StrategyConfigClass = type[FSDP2Config] | type[MegatronFSDPConfig] | type[DDPConfig]
```

```python
nemo_automodel.components.distributed.config.__all__ = ['DDPConfig', 'DistributedSetup', 'DistributedStrategyConfig', 'FSDP2Config', 'M...
```