nemo_automodel.components.distributed.mesh
MeshContext dataclass, construction, and validation.
MeshContext is the single source of truth for distributed topology:
device meshes, parallelism sizes, and axis names.
Parallelism sizes (tp_size, pp_size, etc.) are derived at runtime
from the attached DeviceMesh objects via @property. When no mesh
is present the properties return safe defaults (1 for sizes, None for
dp / hsdp).
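The derive-at-runtime pattern described above can be sketched as follows. This is a minimal illustration, not the module's implementation: FakeMesh and MeshContextSketch are hypothetical stand-ins, and the "tp" / "dp" keys are assumed axis names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeMesh:
    """Hypothetical stand-in for a DeviceMesh: maps axis name -> axis size."""

    axis_sizes: Dict[str, int]


@dataclass
class MeshContextSketch:
    """Illustrative only: sizes are not stored, they are read from the mesh."""

    device_mesh: Optional[FakeMesh] = None

    @property
    def tp_size(self) -> int:
        # Size-style property: falls back to 1 when no mesh or no "tp" axis.
        if self.device_mesh is None:
            return 1
        return self.device_mesh.axis_sizes.get("tp", 1)

    @property
    def dp_size(self) -> Optional[int]:
        # dp-style property: falls back to None instead of 1.
        if self.device_mesh is None:
            return None
        return self.device_mesh.axis_sizes.get("dp")


no_mesh = MeshContextSketch()
with_mesh = MeshContextSketch(device_mesh=FakeMesh({"dp": 4, "tp": 2}))
```

Because the values are computed on access, they cannot drift out of sync with the attached mesh.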
All inputs and outputs are typed Python objects (dataclasses, enums, etc.).
YAML / dict parsing belongs in the recipe layer — see
nemo_automodel.recipes._dist_utils.
Module Contents
Classes
Functions
Data
API
Bases: str, enum.Enum
Canonical mesh axis names used by DeviceMesh and helpers.
Inherits from str so each member compares equal to (and can be
used wherever) a plain string — e.g. MeshAxisName.TP == "tp".
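A minimal sketch of the str-mixin behavior. The class below is a stand-in written for illustration; only the TP member and its "tp" value are confirmed by the text above, and the PP member is an assumption.

```python
from enum import Enum


class MeshAxisName(str, Enum):
    """Illustrative stand-in; only TP == "tp" is confirmed by the docs."""

    TP = "tp"
    PP = "pp"  # assumed member, for illustration only


# The str mixin makes each member a real str instance.
is_str = isinstance(MeshAxisName.TP, str)

# Members compare equal to their plain-string value.
equals_plain = MeshAxisName.TP == "tp"

# Plain strings can be converted back to members by value lookup.
round_trip = MeshAxisName("tp") is MeshAxisName.TP

# A tuple of plain axis names can be searched with a member.
in_names = MeshAxisName.PP in ("dp", "pp", "tp")
```

This is why a mesh whose axis names are plain strings can be compared against enum members directly.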
Runtime distributed topology context.
Parallelism sizes (tp_size, pp_size, etc.) are not stored as
fields; they are @property accessors that read directly from the
attached DeviceMesh / moe_mesh. When no mesh is present the
properties return safe defaults (1 for sizes, None for dp / hsdp).
All DeviceMesh objects passed in must use axis names from
MeshAxisName; a ValueError is raised on construction if
any unknown name is encountered.
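The construction-time check can be sketched as below. FakeMesh and MeshContextSketch are hypothetical stand-ins, the enum members are an assumed subset, and the error message is illustrative. The only attribute borrowed from the real DeviceMesh is mesh_dim_names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class MeshAxisName(str, Enum):
    # Assumed subset of members, for illustration only.
    DP = "dp"
    TP = "tp"
    PP = "pp"


@dataclass
class FakeMesh:
    """Hypothetical stand-in exposing only the axis names of a mesh."""

    mesh_dim_names: Tuple[str, ...]


@dataclass
class MeshContextSketch:
    device_mesh: Optional[FakeMesh] = None
    moe_mesh: Optional[FakeMesh] = None

    def __post_init__(self) -> None:
        # Collect the canonical names once, then check every attached mesh.
        known = {axis.value for axis in MeshAxisName}
        for mesh in (self.device_mesh, self.moe_mesh):
            if mesh is None:
                continue
            unknown = [n for n in mesh.mesh_dim_names if n not in known]
            if unknown:
                raise ValueError(f"Unknown mesh axis names: {unknown}")


ok = MeshContextSketch(device_mesh=FakeMesh(("dp", "tp")))
```

Constructing the sketch with an axis named, say, "bogus" raises ValueError before the object can be used.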
Lifecycle
- Recipes parse YAML to obtain sizes and strategy configs.
- Sizes are passed to build to build DeviceMesh objects.
- MeshContext is created with those meshes; axis names are validated automatically in __post_init__.
Alternatively, from_meshes constructs an instance directly from
DeviceMesh objects (used by NeMoAutoModel.from_pretrained).
Context-parallel degree (from device_mesh, default 1).
HSDP replication degree (from device_mesh, default None).
DP shard degree (from device_mesh, default 1).
Data-parallel degree (from device_mesh, default None).
Expert-parallel degree (from moe_mesh, default 1).
True when pp_size > 1.
Pipeline-parallel degree (from device_mesh, default 1).
Tensor-parallel degree (from device_mesh, default 1).
DP axis names for FSDP mesh slicing.
Build a topology-only MeshContext from parallelism sizes.
Parameters:
Already-instantiated distributed strategy config.
Requested data, tensor, pipeline, context, and expert
parallelism sizes. If None, defaults to no parallelism with
DP inferred from world_size.
Total process count. If None, inferred from the
distributed environment.
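One way DP could be inferred from world_size is sketched below. The formula is an assumption based on common practice (DP is whatever remains after dividing the world size by the product of the tensor, pipeline, and context degrees). The text above does not state the exact rule, and infer_dp_size is a hypothetical helper name.

```python
def infer_dp_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the data-parallel degree implied by the other sizes.

    Assumed rule: world_size = dp * tp * pp * cp.
    """
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


# With no parallelism requested, every rank is a data-parallel rank.
dp_default = infer_dp_size(world_size=8)

# With tp=2 and pp=2 on 16 ranks, 4 data-parallel replicas remain.
dp_mixed = infer_dp_size(world_size=16, tp=2, pp=2)
```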
Build a MeshContext from DeviceMesh objects.
This is the entry-point used by NeMoAutoModel.from_pretrained /
from_config where the caller has raw meshes rather than a parsed
YAML config.
Axis-name kwargs for parallelize_fn (EP/FSDP, no pp_axis_name).
Axis-name kwargs for AutoPipeline.
Build-time requested parallelism sizes.
This is durable user intent, not runtime topology. MeshContext derives
its size properties from live DeviceMesh objects after build.
Return the size of axis if present in mesh, else default.
Return axis if present in mesh, else None.
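The two lookups just described can be sketched as below. The function names, the FakeMesh class, and its axis_size accessor are hypothetical. Treating a missing mesh (None) like a missing axis is also an assumption.

```python
from typing import Optional, Sequence


class FakeMesh:
    """Hypothetical stand-in for a mesh: axis names plus per-axis sizes."""

    def __init__(self, names: Sequence[str], shape: Sequence[int]) -> None:
        self.mesh_dim_names = tuple(names)
        self._shape = tuple(shape)

    def axis_size(self, name: str) -> int:
        # Hypothetical accessor; a real mesh exposes sizes differently.
        return self._shape[self.mesh_dim_names.index(name)]


def size_or_default(
    mesh: Optional[FakeMesh], axis: str, default: Optional[int] = 1
) -> Optional[int]:
    """Return the size of `axis` if present in `mesh`, else `default`."""
    if mesh is None or axis not in mesh.mesh_dim_names:
        return default
    return mesh.axis_size(axis)


def axis_or_none(mesh: Optional[FakeMesh], axis: str) -> Optional[str]:
    """Return `axis` if present in `mesh`, else None."""
    if mesh is None or axis not in mesh.mesh_dim_names:
        return None
    return axis


mesh = FakeMesh(names=("dp", "tp"), shape=(4, 2))
tp = size_or_default(mesh, "tp")
pp = size_or_default(mesh, "pp")
```

Passing default=None covers the properties whose fallback is None rather than 1.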
Ensure every axis name in the attached meshes is a MeshAxisName.