# nemo_automodel.components.distributed.mesh

MeshContext dataclass, construction, and validation.

`MeshContext` is the single source of truth for distributed topology:
device meshes, parallelism sizes, and axis names.

Parallelism sizes (`tp_size`, `pp_size`, etc.) are derived at runtime
from the attached `DeviceMesh` objects via `@property`.  When no mesh
is present the properties return safe defaults (1 for sizes, `None` for
dp / hsdp).

All inputs and outputs are typed Python objects (dataclasses, enums, etc.).
YAML and dict parsing belong in the recipe layer; see
`nemo_automodel.recipes._dist_utils`.
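A minimal sketch of that split, assuming the recipe has already loaded its YAML into a plain `dict`: the dict becomes a typed object before any mesh code sees it. `SizesSketch` is a hypothetical stand-in that copies the field names and defaults of `ParallelismSizes`, and `parse_sizes` is also hypothetical; neither is the actual code in `nemo_automodel.recipes._dist_utils`.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class SizesSketch:
    """Hypothetical stand-in mirroring the ParallelismSizes fields and defaults."""
    dp_size: Optional[int] = None
    dp_replicate_size: Optional[int] = None
    tp_size: int = 1
    pp_size: int = 1
    cp_size: int = 1
    ep_size: int = 1


def parse_sizes(cfg: Mapping[str, Any]) -> SizesSketch:
    """Hypothetical recipe-layer parser: untyped dict in, typed dataclass out."""
    known = {f.name for f in fields(SizesSketch)}
    unknown = set(cfg) - known
    if unknown:
        # Fail loudly on typos instead of silently using a default.
        raise ValueError(f"unknown parallelism keys: {sorted(unknown)}")
    # Coerce YAML scalars to int, keeping explicit None ("infer this size").
    return SizesSketch(**{k: None if v is None else int(v) for k, v in cfg.items()})


sizes = parse_sizes({"tp_size": 2, "pp_size": 2})
print(sizes)
```

Rejecting unknown keys at the recipe boundary keeps a typo such as `tpsize` from quietly falling back to a default size.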

## Module Contents

### Classes

| Name                                                                               | Description                                                 |
| ---------------------------------------------------------------------------------- | ----------------------------------------------------------- |
| [`MeshAxisName`](#nemo_automodel-components-distributed-mesh-MeshAxisName)         | Canonical mesh axis names used by `DeviceMesh` and helpers. |
| [`MeshContext`](#nemo_automodel-components-distributed-mesh-MeshContext)           | Runtime distributed topology context.                       |
| [`ParallelismSizes`](#nemo_automodel-components-distributed-mesh-ParallelismSizes) | Build-time requested parallelism sizes.                     |

### Functions

| Name                                                                                                 | Description                                                               |
| ---------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| [`_get_axis_size`](#nemo_automodel-components-distributed-mesh-_get_axis_size)                       | Return the size of *axis* if present in *mesh*, else *default*.           |
| [`_optional_axis`](#nemo_automodel-components-distributed-mesh-_optional_axis)                       | Return *axis* if present in *mesh*, else `None`.                          |
| [`_validate_mesh_axis_names`](#nemo_automodel-components-distributed-mesh-_validate_mesh_axis_names) | Ensure every axis name in the attached meshes is a `MeshAxisName`.        |

### Data

[`_VALID_AXIS_NAMES`](#nemo_automodel-components-distributed-mesh-_VALID_AXIS_NAMES)

[`__all__`](#nemo_automodel-components-distributed-mesh-__all__)

### API

```python
class nemo_automodel.components.distributed.mesh.MeshAxisName
```

**Bases:** `str`, `enum.Enum`

Canonical mesh axis names used by `DeviceMesh` and helpers.

Inherits from `str`, so each member compares equal to a plain string and
can be used wherever one is expected; for example, `MeshAxisName.TP == "tp"`.
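The `str` mixin behaviour can be seen with a small stand-in enum. `AxisNameSketch` is hypothetical: only the `TP = "tp"` member is confirmed above, and the other member names and values are assumptions.

```python
from enum import Enum


class AxisNameSketch(str, Enum):
    """Hypothetical mirror of MeshAxisName; only TP = "tp" is documented."""
    TP = "tp"
    PP = "pp"  # assumed value
    CP = "cp"  # assumed value
    EP = "ep"  # assumed value


# The str mixin makes members compare equal to plain strings.
print(AxisNameSketch.TP == "tp")  # True

# Members also hash like their string values, so a set of members
# can be queried with plain strings.
valid = frozenset(AxisNameSketch)
print("tp" in valid, "bogus" in valid)  # True False

# .value yields the underlying plain string.
print(AxisNameSketch.TP.value)  # tp
```

The matching hash is what lets a `frozenset` of members, such as `_VALID_AXIS_NAMES` below, be queried with plain-string axis names.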

```python
class nemo_automodel.components.distributed.mesh.MeshContext(
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None,
    moe_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None
)
```

Dataclass

Runtime distributed topology context.

Parallelism sizes (`tp_size`, `pp_size`, etc.) are **not** stored as
fields; they are `@property` accessors that read directly from the
attached `DeviceMesh` / `moe_mesh`.  When no mesh is present the
properties return safe defaults (`1` for sizes, `None` for dp / hsdp).

All `DeviceMesh` objects passed in must use axis names from
`MeshAxisName`; a `ValueError` is raised on construction if
any unknown name is encountered.
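A minimal sketch of that construction-time check, assuming validation only inspects each mesh's `mesh_dim_names` (a real `DeviceMesh` attribute). `StubMesh` and `ContextSketch` are hypothetical stand-ins so the example runs without `torch`, the contents of `VALID_AXIS_NAMES` are assumed rather than copied from `MeshAxisName`, and the real error message may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed axis vocabulary for this sketch; the real set is frozenset(MeshAxisName).
VALID_AXIS_NAMES = frozenset({"dp_replicate", "dp_shard", "dp", "cp", "tp", "pp", "ep"})


@dataclass
class StubMesh:
    """Hypothetical DeviceMesh stand-in; only mesh_dim_names is modeled."""
    mesh_dim_names: Optional[Tuple[str, ...]]


@dataclass
class ContextSketch:
    device_mesh: Optional[StubMesh] = None
    moe_mesh: Optional[StubMesh] = None

    def __post_init__(self) -> None:
        # Check every attached mesh so a bad axis name fails at construction.
        for mesh in (self.device_mesh, self.moe_mesh):
            if mesh is None:
                continue
            for name in mesh.mesh_dim_names or ():
                if name not in VALID_AXIS_NAMES:
                    raise ValueError(f"Unknown mesh axis name {name!r}")


ok = ContextSketch(device_mesh=StubMesh(("pp", "dp_shard", "tp")))

try:
    ContextSketch(device_mesh=StubMesh(("tensor",)))  # "tensor" is not in the set
    rejected = False
except ValueError as err:
    rejected = True
    print(err)  # Unknown mesh axis name 'tensor'
```

Validating in `__post_init__` means an invalid context can never exist, so the size properties never have to handle unknown axis names.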

**Lifecycle:**

1. Recipes parse YAML to obtain sizes and strategy configs.
2. Sizes are passed to `build`, which constructs the `DeviceMesh`
   objects.
3. `MeshContext` is created with those meshes; axis names are
   validated automatically in `__post_init__`.

Alternatively, `from_meshes` constructs an instance directly from
`DeviceMesh` objects (used by `NeMoAutoModel.from_pretrained`).

**Properties:**

- Context-parallel degree (from `device_mesh`, default `1`).
- HSDP replication degree (from `device_mesh`, default `None`).
- DP shard degree (from `device_mesh`, default `1`).
- Data-parallel degree (from `device_mesh`, default `None`).
- Expert-parallel degree (from `moe_mesh`, default `1`).
- `True` when `pp_size > 1`.
- Pipeline-parallel degree (from `device_mesh`, default `1`).
- Tensor-parallel degree (from `device_mesh`, default `1`).
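The size properties share one pattern: read the axis from the relevant mesh if it exists, otherwise fall back to a default. A minimal sketch under assumptions: `StubMesh` is a hypothetical stand-in that mimics `DeviceMesh.mesh_dim_names` and `DeviceMesh.size(mesh_dim)` (which takes an integer dimension index), the axis names `"tp"`, `"pp"`, `"ep"`, and `"dp"` are assumed, and `ContextSketch` is not the actual `MeshContext` implementation.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Tuple


@dataclass
class StubMesh:
    """Hypothetical DeviceMesh stand-in: named dims with integer sizes."""
    mesh_dim_names: Tuple[str, ...]
    shape: Tuple[int, ...]

    def size(self, mesh_dim: Optional[int] = None) -> int:
        # Like DeviceMesh.size: total rank count when no dim index is given.
        return prod(self.shape) if mesh_dim is None else self.shape[mesh_dim]


def axis_size(mesh: Optional[StubMesh], axis: str, default: Optional[int] = 1) -> Optional[int]:
    """Size of `axis` if `mesh` has it, else `default` (the _get_axis_size contract)."""
    if mesh is None or axis not in mesh.mesh_dim_names:
        return default
    return mesh.size(mesh.mesh_dim_names.index(axis))


@dataclass
class ContextSketch:
    device_mesh: Optional[StubMesh] = None
    moe_mesh: Optional[StubMesh] = None

    @property
    def tp_size(self) -> int:
        return axis_size(self.device_mesh, "tp")

    @property
    def pp_size(self) -> int:
        return axis_size(self.device_mesh, "pp")

    @property
    def ep_size(self) -> int:
        # Expert parallelism is read from the separate MoE mesh.
        return axis_size(self.moe_mesh, "ep")

    @property
    def dp_size(self) -> Optional[int]:
        # DP-style sizes fall back to None instead of 1.
        return axis_size(self.device_mesh, "dp", default=None)


ctx = ContextSketch(device_mesh=StubMesh(("pp", "dp", "tp"), (2, 2, 2)))
print(ctx.tp_size, ctx.pp_size, ctx.ep_size, ctx.dp_size)  # 2 2 1 2
empty = ContextSketch()
print(empty.tp_size, empty.dp_size)  # 1 None
```

Because nothing is cached on the instance, the properties always reflect whichever meshes are currently attached.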

```python
nemo_automodel.components.distributed.mesh.MeshContext.__post_init__() -> None
```

Validates the axis names of the attached meshes and raises `ValueError` if any name is not a `MeshAxisName`.

```python
nemo_automodel.components.distributed.mesh.MeshContext._dp_axis_names() -> typing.Tuple[str, ...]
```

DP axis names for FSDP mesh slicing.

```python
nemo_automodel.components.distributed.mesh.MeshContext.build(
    strategy_config: nemo_automodel.components.distributed.config.DistributedStrategyConfig,
    parallelism_sizes: nemo_automodel.components.distributed.mesh.ParallelismSizes | None = None,
    world_size: int | None = None
) -> nemo_automodel.components.distributed.mesh.MeshContext
```

classmethod

Build a topology-only `MeshContext` from parallelism sizes.

**Parameters:**

- `strategy_config`: Already-instantiated distributed strategy config.
- `parallelism_sizes`: Requested data, tensor, pipeline, context, and expert
  parallelism sizes. If `None`, defaults to no parallelism, with DP
  inferred from `world_size`.
- `world_size`: Total process count. If `None`, inferred from the
  distributed environment.
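With no parallelism requested, every rank is a data-parallel replica, so DP equals `world_size`. A minimal sketch that generalizes this, assuming the common convention that DP takes the ranks left over after TP, PP, and CP (`dp = world_size / (tp * pp * cp)`); `infer_dp_size` is hypothetical, and how `build` actually treats `ep_size` and `dp_replicate_size` is not modeled.

```python
def infer_dp_size(world_size: int, tp_size: int = 1, pp_size: int = 1, cp_size: int = 1) -> int:
    """Hypothetical: DP absorbs the ranks not used by TP, PP, and CP."""
    model_parallel = tp_size * pp_size * cp_size
    if world_size % model_parallel != 0:
        # A fractional DP degree would leave some ranks without a mesh slot.
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


print(infer_dp_size(8))                        # 8: no parallelism, all ranks are DP
print(infer_dp_size(8, tp_size=2, pp_size=2))  # 2
```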

```python
nemo_automodel.components.distributed.mesh.MeshContext.from_meshes(
    device_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh],
    moe_mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh] = None
) -> nemo_automodel.components.distributed.mesh.MeshContext
```

classmethod

Build a `MeshContext` from `DeviceMesh` objects.

This is the entry-point used by `NeMoAutoModel.from_pretrained` /
`from_config` where the caller has raw meshes rather than a parsed
YAML config.

```python
nemo_automodel.components.distributed.mesh.MeshContext.parallelize_axis_kwargs() -> typing.Dict[str, object]
```

Return axis-name keyword arguments for `parallelize_fn` (EP and FSDP axes only; `pp_axis_name` is not included).

```python
nemo_automodel.components.distributed.mesh.MeshContext.pipeline_axis_kwargs() -> typing.Dict[str, object]
```

Return axis-name keyword arguments for `AutoPipeline`.

```python
class nemo_automodel.components.distributed.mesh.ParallelismSizes(
    dp_size: int | None = None,
    dp_replicate_size: int | None = None,
    tp_size: int = 1,
    pp_size: int = 1,
    cp_size: int = 1,
    ep_size: int = 1
)
```

Dataclass

Build-time requested parallelism sizes.

These values record durable user intent, not runtime topology. After the
meshes are built, `MeshContext` derives its size properties from the live
`DeviceMesh` objects.

```python
nemo_automodel.components.distributed.mesh._get_axis_size(
    mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh],
    axis: nemo_automodel.components.distributed.mesh.MeshAxisName,
    default = 1
) -> typing.Optional[int]
```

Return the size of *axis* if present in *mesh*, else *default*.

```python
nemo_automodel.components.distributed.mesh._optional_axis(
    mesh: typing.Optional[torch.distributed.device_mesh.DeviceMesh],
    axis: nemo_automodel.components.distributed.mesh.MeshAxisName
) -> typing.Optional[str]
```

Return *axis* if present in *mesh*, else `None`.
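A minimal sketch of that contract and of how it can feed axis-name kwargs, where an absent axis maps to `None` rather than raising. It works on a mesh's `mesh_dim_names` tuple so it runs without `torch`; `optional_axis` and `pipeline_kwargs` are hypothetical, and only the `pp_axis_name` key is taken from this page.

```python
from typing import Dict, Optional, Sequence


def optional_axis(dim_names: Optional[Sequence[str]], axis: str) -> Optional[str]:
    """Hypothetical re-implementation of the _optional_axis contract.

    Takes a mesh's dim names (DeviceMesh.mesh_dim_names) instead of the
    mesh itself so the sketch has no torch dependency.
    """
    if dim_names is not None and axis in dim_names:
        return axis
    return None


def pipeline_kwargs(dim_names: Optional[Sequence[str]]) -> Dict[str, object]:
    # Hypothetical: a missing "pp" axis maps to None instead of raising.
    return {"pp_axis_name": optional_axis(dim_names, "pp")}


print(pipeline_kwargs(("pp", "dp_shard", "tp")))  # {'pp_axis_name': 'pp'}
print(pipeline_kwargs(("dp_shard", "tp")))        # {'pp_axis_name': None}
print(pipeline_kwargs(None))                      # {'pp_axis_name': None}
```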

```python
nemo_automodel.components.distributed.mesh._validate_mesh_axis_names(
    mesh_context: nemo_automodel.components.distributed.mesh.MeshContext
) -> None
```

Ensure every axis name in the attached meshes is a `MeshAxisName`.

```python
nemo_automodel.components.distributed.mesh._VALID_AXIS_NAMES: frozenset = frozenset(MeshAxisName)
```

```python
nemo_automodel.components.distributed.mesh.__all__ = ['MeshAxisName', 'MeshContext', 'ParallelismSizes']
```