# nemo_automodel.components.distributed.megatron_fsdp

## Module Contents

### Classes

| Name                                                                                              | Description                                                                   |
| ------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| [`MegatronFSDPManager`](#nemo_automodel-components-distributed-megatron_fsdp-MegatronFSDPManager) | Manager for parallelizing models using MegatronFSDP with TP, DP, CP sharding. |

### Functions

| Name                                                                                                  | Description                                                           |
| ----------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| [`fully_shard_optimizer`](#nemo_automodel-components-distributed-megatron_fsdp-fully_shard_optimizer) | -                                                                     |
| [`maybe_shard_optimizer`](#nemo_automodel-components-distributed-megatron_fsdp-maybe_shard_optimizer) | Shard the optimizer with Megatron-FSDP when the strategy requires it. |

### Data

[`HAS_MEGATRON_FSDP`](#nemo_automodel-components-distributed-megatron_fsdp-HAS_MEGATRON_FSDP)

[`logger`](#nemo_automodel-components-distributed-megatron_fsdp-logger)

### API

```python
class nemo_automodel.components.distributed.megatron_fsdp.MegatronFSDPManager(
    config: nemo_automodel.components.distributed.config.MegatronFSDPConfig,
    device_mesh: torch.distributed.device_mesh.DeviceMesh
)
```

Manager for parallelizing models using MegatronFSDP with TP, DP, CP sharding.

This manager applies parallelization to the model using a prescribed
TP sharding plan. It supports mixed precision and various FSDP options.

The device mesh must be created externally and passed in.

**Parameters:**

- `config`: Configuration for MegatronFSDP distributed training.
- `device_mesh`: Device mesh for distributed operations.

```python
nemo_automodel.components.distributed.megatron_fsdp.MegatronFSDPManager.parallelize(
    model,
    optimizer = None
)
```

Parallelizes the given model using MegatronFSDP and TP sharding strategies.

**Parameters:**

- `model`: The model to be parallelized.
- `optimizer`: The optimizer for the model. If `None`, the caller must call
  `model.finish_grad_sync()` before `optimizer.step()`, and
  `model.install_optimized_model_weights()` and `model.zero_grad_buffer()`
  after `optimizer.zero_grad()`.

**Returns:**

A tuple `(parallelized_model, optimizer)`.
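When no optimizer is passed to `parallelize`, the manual calls described above have to be placed in the training loop by hand. The following is a minimal sketch of one such step, not code from the library: `train_step` and `loss_fn` are hypothetical names introduced for illustration, and the ordering follows a literal reading of the parameter description (in particular, `install_optimized_model_weights()` is placed after `optimizer.zero_grad()`, which also puts it after `optimizer.step()`).

```python
from typing import Any, Callable


def train_step(
    model: Any,
    optimizer: Any,
    batch: Any,
    loss_fn: Callable[[Any, Any], Any],
) -> Any:
    """Run one training step with the manual Megatron-FSDP hooks.

    Assumes `model` is the parallelized model returned by
    `MegatronFSDPManager.parallelize(model)` (called without an optimizer)
    and `optimizer` is an ordinary, unwrapped optimizer.
    `loss_fn` is a hypothetical helper that runs the forward pass.
    """
    loss = loss_fn(model, batch)  # forward pass
    loss.backward()               # backward pass

    # Documented requirement: finish gradient sync before stepping.
    model.finish_grad_sync()
    optimizer.step()

    # Documented requirement: these two calls follow optimizer.zero_grad().
    optimizer.zero_grad()
    model.install_optimized_model_weights()
    model.zero_grad_buffer()
    return loss
```

Passing the optimizer to `parallelize` instead should make these extra calls unnecessary, since the description only requires them when `optimizer` is `None`.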

```python
nemo_automodel.components.distributed.megatron_fsdp.fully_shard_optimizer(
    model: torch.nn.Module,
    optimizer: torch.optim.Optimizer,
    preproc_state_dict_for_dcp_ckpt: bool = True
) -> torch.optim.Optimizer
```

```python
nemo_automodel.components.distributed.megatron_fsdp.maybe_shard_optimizer(
    model_part: torch.nn.Module,
    optimizer: torch.optim.Optimizer,
    distributed_config: nemo_automodel.components.distributed.config.DistributedConfig | None,
    allow: bool = True
) -> torch.optim.Optimizer
```

Shard the optimizer with Megatron-FSDP when the strategy requires it.

Returns the optimizer unchanged unless `distributed_config` is a
`MegatronFSDPConfig` running in a distributed (world size > 1) job.

**Parameters:**

- `model_part`: The (already sharded) model part the optimizer belongs to.
- `optimizer`: The optimizer to (optionally) shard.
- `distributed_config`: Distributed strategy config; only triggers sharding
  when it is a `MegatronFSDPConfig`.
- `allow`: Guard for optimizers incompatible with Megatron-FSDP sharding
  (e.g. Dion); asserts when sharding would otherwise apply.
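The decision described above can be sketched in plain Python. This is an illustration of the documented behavior under stated assumptions, not the library's implementation: `MegatronFSDPConfigStandIn`, `maybe_shard_sketch`, and the explicit `world_size` and `shard_fn` parameters are all hypothetical. The real function takes no such arguments and how it determines the world size is not documented here.

```python
from typing import Any, Callable, Optional


class MegatronFSDPConfigStandIn:
    """Stand-in for the real MegatronFSDPConfig (illustration only)."""


def maybe_shard_sketch(
    model_part: Any,
    optimizer: Any,
    distributed_config: Optional[Any],
    world_size: int,                       # hypothetical: passed in explicitly here
    shard_fn: Callable[[Any, Any], Any],   # hypothetical: plays the role of fully_shard_optimizer
    allow: bool = True,
) -> Any:
    """Return `optimizer` unchanged unless Megatron-FSDP sharding applies."""
    needs_sharding = (
        isinstance(distributed_config, MegatronFSDPConfigStandIn)
        and world_size > 1
    )
    if not needs_sharding:
        # Any other strategy, or a single-process job: nothing to do.
        return optimizer

    # Sharding would apply; `allow=False` marks an incompatible optimizer.
    assert allow, "optimizer is incompatible with Megatron-FSDP sharding"
    return shard_fn(model_part, optimizer)
```

Note that `allow=False` only fails when sharding would actually happen; with any other config, or with a world size of 1, the optimizer is simply returned.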

```python
nemo_automodel.components.distributed.megatron_fsdp.HAS_MEGATRON_FSDP = True
```

```python
nemo_automodel.components.distributed.megatron_fsdp.logger = logging.getLogger(__name__)
```