
# nemo_automodel.components.moe.state_dict_utils

## Module Contents

### Functions

| Name                                                                                                                         | Description                                                          |
| ---------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| [`create_dtensor_from_local`](#nemo_automodel-components-moe-state_dict_utils-create_dtensor_from_local)                     | Create a DTensor from a local tensor for expert parallelism.         |
| [`get_expert_range_for_rank_from_mesh`](#nemo_automodel-components-moe-state_dict_utils-get_expert_range_for_rank_from_mesh) | Get the range of experts that should be loaded for the current rank. |
| [`get_expert_slice_for_rank`](#nemo_automodel-components-moe-state_dict_utils-get_expert_slice_for_rank)                     | Get the slice of experts present on the current rank for a DTensor.  |
| [`get_submesh`](#nemo_automodel-components-moe-state_dict_utils-get_submesh)                                                 | Access a submesh by dim names from the given mesh.                   |
| [`is_dtensor`](#nemo_automodel-components-moe-state_dict_utils-is_dtensor)                                                   | Check if a tensor is a DTensor.                                      |
| [`should_load_expert_for_rank`](#nemo_automodel-components-moe-state_dict_utils-should_load_expert_for_rank)                 | Check if a specific expert should be loaded on the current rank.     |
| [`split_experts_weights_dtensor_aware`](#nemo_automodel-components-moe-state_dict_utils-split_experts_weights_dtensor_aware) | Split expert weights, handling both regular tensors and DTensors.    |
| [`validate_dtensor_expert_sharding`](#nemo_automodel-components-moe-state_dict_utils-validate_dtensor_expert_sharding)       | Validate that a DTensor is properly sharded for expert parallelism.  |

### API

```python
nemo_automodel.components.moe.state_dict_utils.create_dtensor_from_local(
    local_tensor: torch.Tensor,
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh],
    rank: typing.Optional[int] = None
) -> torch.Tensor
```

Create a DTensor from a local tensor for expert parallelism.

**Parameters:**

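`local_tensor`: Local portion of the tensor on this rank.

`device_mesh`: Device mesh used to create the DTensor. If `None`, no DTensor is created.

`rank` (default `None`): Current rank, used for device placement.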

**Returns:** `torch.Tensor`

A DTensor if `device_mesh` is provided and DTensor support is available; otherwise `local_tensor` itself.
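In PyTorch, wrapping a per-rank shard is commonly done with `DTensor.from_local(local_tensor, device_mesh, [Shard(0)])`, and when no explicit `shape` is passed, PyTorch computes the global shape assuming even sharding across ranks. Whether this function uses exactly that call is an assumption. The arithmetic sketch below uses a hypothetical helper, `global_shape_from_local`, to show the global shape implied by even sharding along the expert dimension (dim 0).

```python
def global_shape_from_local(local_shape: tuple[int, ...], mesh_size: int) -> tuple[int, ...]:
    """Global shape of a tensor evenly sharded along dim 0 across `mesh_size` ranks.

    Hypothetical helper: assumes every rank holds the same number of experts.
    """
    if mesh_size < 1:
        raise ValueError("mesh_size must be >= 1")
    if not local_shape:
        raise ValueError("local tensor must have at least one dimension")
    # Only the sharded (expert) dimension grows; trailing dims are unchanged.
    return (local_shape[0] * mesh_size, *local_shape[1:])


# 4 ranks, each holding 2 experts of shape [1024, 512], give 8 experts globally.
print(global_shape_from_local((2, 1024, 512), mesh_size=4))  # (8, 1024, 512)
```

If ranks hold different numbers of experts, this even-sharding arithmetic no longer holds and the global shape must be supplied explicitly.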

```python
nemo_automodel.components.moe.state_dict_utils.get_expert_range_for_rank_from_mesh(
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh],
    n_experts: int
) -> tuple[int, int]
```

Get the range of experts that should be loaded for the current rank.

**Parameters:**

`device_mesh`: Device mesh for expert parallelism.

`n_experts`: Total number of experts.

**Returns:** `tuple[int, int]`

A tuple `(start_expert_id, end_expert_id)` describing the experts assigned to this rank.
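A common way to assign experts under expert parallelism is to split them into equal, contiguous blocks, one per EP rank. The sketch below assumes that scheme, a half-open `[start, end)` range, and that `n_experts` is divisible by the EP size; `expert_range_for_rank` is a hypothetical helper, not this module's implementation.

```python
def expert_range_for_rank(ep_rank: int, ep_size: int, n_experts: int) -> tuple[int, int]:
    """Half-open range [start, end) of experts owned by `ep_rank`.

    Hypothetical helper: assumes experts are split into equal, contiguous
    blocks, so `n_experts` must be divisible by `ep_size`.
    """
    if not 0 <= ep_rank < ep_size:
        raise ValueError(f"ep_rank {ep_rank} is outside [0, {ep_size})")
    if n_experts % ep_size != 0:
        raise ValueError(f"{n_experts} experts cannot be split evenly across {ep_size} ranks")
    per_rank = n_experts // ep_size
    start = ep_rank * per_rank
    return start, start + per_rank


# With a single EP rank, every expert is local.
print(expert_range_for_rank(0, 1, 64))  # (0, 64)
# 64 experts over 8 EP ranks: rank 3 owns experts 24..31.
print(expert_range_for_rank(3, 8, 64))  # (24, 32)
```

Contiguous blocks keep each rank's experts adjacent along dim 0, which matches how a tensor sharded along its first dimension is laid out.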

```python
nemo_automodel.components.moe.state_dict_utils.get_expert_slice_for_rank(
    experts_tensor: torch.Tensor,
    n_experts: int
) -> tuple[torch.Tensor, int, int]
```

Get the slice of experts present on the current rank for a DTensor.

For non-DTensors, returns the full tensor with start\_expert=0, end\_expert=n\_experts.
For DTensors sharded along the expert dimension (dim=0), returns only the local experts.

**Parameters:**

`experts_tensor`: Input tensor containing expert weights, shaped `[n_experts, ...]`.

`n_experts`: Total number of experts across all ranks.

**Returns:** `tuple[torch.Tensor, int, int]`

A tuple `(local_tensor, start_expert_id, end_expert_id)`.
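Since the non-DTensor case returns the full tensor with `start_expert=0` and `end_expert=n_experts`, the range is half-open and local index `i` corresponds to global expert `start_expert + i`. The sketch below shows that mapping; `local_experts_with_global_ids` is a hypothetical helper, and a plain list stands in for the local tensor indexed along dim 0.

```python
def local_experts_with_global_ids(local_experts, start_expert, end_expert):
    """Pair each local expert slice with its global expert ID.

    Hypothetical helper: `local_experts` stands in for the local tensor
    returned by `get_expert_slice_for_rank`, indexed along dim 0.
    """
    if len(local_experts) != end_expert - start_expert:
        raise ValueError("local expert count does not match the [start, end) range")
    # Local index i on this rank corresponds to global expert start_expert + i.
    return [(start_expert + i, expert) for i, expert in enumerate(local_experts)]


# A rank holding global experts 4..7 (half-open range [4, 8)).
local = ["w4", "w5", "w6", "w7"]
for expert_id, w in local_experts_with_global_ids(local, 4, 8):
    print(expert_id, w)
```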

```python
nemo_automodel.components.moe.state_dict_utils.get_submesh(
    device_mesh: torch.distributed.device_mesh.DeviceMesh,
    dims: tuple[str, ...]
) -> torch.distributed.device_mesh.DeviceMesh
```

Access a submesh by dim names from the given mesh.

```python
nemo_automodel.components.moe.state_dict_utils.is_dtensor(
    tensor: torch.Tensor
) -> bool
```

Check if a tensor is a DTensor.
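A DTensor check is typically an `isinstance` test against PyTorch's `DTensor` class. The sketch below, `is_dtensor_sketch`, is a hypothetical stand-in that also returns `False` when PyTorch or its DTensor module cannot be imported; it tries the public `torch.distributed.tensor` location first and falls back to the private `torch.distributed._tensor` module used by older releases.

```python
def is_dtensor_sketch(obj) -> bool:
    """Return True only if `obj` is a PyTorch DTensor.

    Sketch: returns False when torch or its DTensor module is unavailable.
    """
    try:
        # Public location in recent PyTorch releases.
        from torch.distributed.tensor import DTensor
    except ImportError:
        try:
            # Older releases exposed DTensor under a private module.
            from torch.distributed._tensor import DTensor
        except ImportError:
            return False
    return isinstance(obj, DTensor)


print(is_dtensor_sketch([1.0, 2.0]))  # False: a plain list is never a DTensor
```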

```python
nemo_automodel.components.moe.state_dict_utils.should_load_expert_for_rank(
    expert_id: int,
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh],
    n_experts: int
) -> bool
```

Check if a specific expert should be loaded on the current rank.

**Parameters:**

`expert_id`: The expert ID to check.

`device_mesh`: Device mesh for expert parallelism.

`n_experts`: Total number of experts.

**Returns:** `bool`

`True` if this expert should be loaded on the current rank, otherwise `False`.
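When loading a checkpoint, a per-expert check like this is usually applied to each state-dict key. The sketch below assumes the rank's experts form a half-open range `[start, end)` and that expert keys contain a hypothetical `.experts.<id>.` segment (Hugging Face-style naming); real checkpoint layouts may differ. Non-expert keys are always kept.

```python
import re

# Hypothetical key layout (Hugging Face-style); real checkpoints may differ.
EXPERT_KEY = re.compile(r"\.experts\.(\d+)\.")


def keys_to_load(keys, start_expert, end_expert):
    """Keep non-expert keys and expert keys whose ID falls in [start, end)."""
    selected = []
    for key in keys:
        match = EXPERT_KEY.search(key)
        # No expert ID in the key: shared weight, every rank loads it.
        if match is None or start_expert <= int(match.group(1)) < end_expert:
            selected.append(key)
    return selected


keys = [
    "layers.0.mlp.gate.weight",
    "layers.0.mlp.experts.0.up_proj.weight",
    "layers.0.mlp.experts.5.up_proj.weight",
]
print(keys_to_load(keys, 4, 8))
# ['layers.0.mlp.gate.weight', 'layers.0.mlp.experts.5.up_proj.weight']
```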

```python
nemo_automodel.components.moe.state_dict_utils.split_experts_weights_dtensor_aware(
    weight: torch.Tensor,
    n_experts: int
) -> tuple[list[torch.Tensor], list[int]]
```

Split expert weights, handling both regular tensors and DTensors.

For DTensors, only the experts present on the current rank are split.

**Parameters:**

`weight`: Expert weights tensor shaped `[n_experts, ...]` (regular tensor or DTensor).

`n_experts`: Total number of experts across all ranks.

**Returns:** `tuple[list[torch.Tensor], list[int]]`

A tuple `(split_weights, expert_ids)`.
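The split pairs each expert's slice along dim 0 with its global expert ID. In the sketch below, nested lists stand in for tensors and `split_expert_rows` is a hypothetical helper: `start_expert` is 0 for a full weight and the rank's first expert for a local shard, so only the experts actually held locally are returned.

```python
def split_expert_rows(weight, start_expert=0):
    """Split a [num_local_experts, ...] nested list into per-expert entries.

    Hypothetical stand-in for the tensor case: `start_expert` is 0 for a
    full (non-sharded) weight and the rank's first expert for a local shard.
    """
    split_weights = list(weight)  # one entry per expert along dim 0
    expert_ids = list(range(start_expert, start_expert + len(split_weights)))
    return split_weights, expert_ids


# A local shard holding experts 2 and 3, each a 2x2 matrix.
shard = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
weights, ids = split_expert_rows(shard, start_expert=2)
print(ids)  # [2, 3]
```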

```python
nemo_automodel.components.moe.state_dict_utils.validate_dtensor_expert_sharding(
    tensor: torch.Tensor,
    expected_experts: int,
    tensor_name: str = 'tensor'
) -> bool
```

Validate that a DTensor is properly sharded for expert parallelism.

**Parameters:**

`tensor`: Tensor to validate.

`expected_experts`: Expected total number of experts.

`tensor_name` (default `'tensor'`): Name used in error messages.

**Returns:** `bool`

`True` if the tensor is valid. Raises `ValueError` if it is not.
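Shape-level checks for expert-parallel sharding along dim 0 can be sketched in plain Python. The helper below, `validate_expert_sharding`, is hypothetical: it assumes the global expert dimension must equal `expected_experts` and that experts must divide evenly across shards. The library function may check different or additional properties, such as the DTensor placements.

```python
def validate_expert_sharding(global_shape, expected_experts, num_shards, tensor_name="tensor"):
    """Check shape-level invariants for expert-parallel sharding along dim 0.

    Hypothetical checks; the library function may validate different things.
    """
    if not global_shape or global_shape[0] != expected_experts:
        raise ValueError(
            f"{tensor_name}: expected {expected_experts} experts on dim 0, "
            f"got shape {tuple(global_shape)}"
        )
    if expected_experts % num_shards != 0:
        raise ValueError(
            f"{tensor_name}: {expected_experts} experts are not divisible "
            f"across {num_shards} shards"
        )
    return True


print(validate_expert_sharding((64, 1024, 512), 64, 8, "expert_weight"))  # True
```

Like the documented function, the sketch returns `True` on success and signals failure by raising `ValueError`, with `tensor_name` included in the message.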