core.hyper_comm_grid#

Module Contents#

Classes#

_RankViewSpec

A named rank factorization over the same rank span as the base grid.

HyperCommGrid

N-dimensional communication grid.

Functions#

_is_process_group_member

Whether the current rank belongs to pg (not the non-member sentinel).

Data#

API#

core.hyper_comm_grid._is_process_group_member(
pg: Optional[torch.distributed.ProcessGroup],
) bool#

Whether the current rank belongs to pg (not the non-member sentinel).

core.hyper_comm_grid._BASE_VIEW_NAME#

‘base’

class core.hyper_comm_grid._RankViewSpec#

A named rank factorization over the same rank span as the base grid.

name: str#

None

shape: list[int]#

None

dim_names: list[str]#

None

shared_dims: list[str]#

None

class core.hyper_comm_grid.HyperCommGrid(
shape: list[int],
dim_names: list[str],
rank_offset: int = 0,
backend: Optional[str] = None,
)#

N-dimensional communication grid.

Manages an arbitrary number of parallelisms as a hyperrectangle. Each dimension is given a name at initialization time. The order of dim_names implies the mapping order equivalent to the order argument of MCore’s initialize_model_parallel. Internally, it has to be reversed to match n-D array.

For any combination of dimensions, a process group can only be created once. Creating process groups for the same combination with different options is not supported.

Methods default to the base factorization. Register additional factorizations of the same rank span with :meth:register_view and target them via view="...".

.. note::

create_pg() over specific dims must be explicitly called to create a process group. We don’t create a process group in the get_pg() function because there are many options (kwargs) that can be passed when creating a process group, which get_pg() should not be exposed to.

.. rubric:: Examples

grid = HyperCommGrid([2, 3, 4, 5], [“tp”, “cp”, “pp”, “dp”]) dp_group = grid.create_pg(“dp”)

retrieve dp_group from grid after creation

dp_group = grid.get_pg(“dp”)

It is equivalent to calling the following functions in MCore parallel_state

with world size 120.

parallel_state.initialize_model_parallel( tensor_model_parallel_size=2, context_parallel_size=3, pipeline_model_parallel_size=4, order=”tp-cp-pp-dp”) dp_group_mcore = parallel_state.get_data_parallel_group()

We can create group from multiple leading dims and also pass more options.

pg_options = ProcessGroupNCCL.Options() pg_options.config.max_ctas = 8 dp_cp_group = grid.create_pg( [“cp”, “dp”], pg_options=pg_options, group_desc=”WEIGHT_GRADIENT_COMM_GROUP”)

Parameters:
  • shape – Shape of the communication grid.

  • dim_names – Name of each dimension corresponding to shape. Must have the same length as shape.

  • rank_offset – Starting rank when the grid doesn’t span the entire communication world. Default 0.

  • backend – Backend for creating process group. Default None and will use default backend.

Initialization

register_view(
name: str,
shape: list[int],
dim_names: list[str],
shared_dims: Optional[list[str]] = None,
) None#

Register an additional rank factorization over this grid’s rank span.

Shared dims must exist in both the base view and the new view, and must enumerate to the same rank groups as the base view.

create_pg(
dims: Union[str, list[str]],
*,
view: Optional[str] = None,
**kwargs: Any,
) torch.distributed.ProcessGroup | None#

Create a process group based on a list of dimension names

Note: The unique key used to store the process group internally will follow the reversed order of the original dim_names. For example, if dim_names=[“tp”, “cp”, “dp”] and you create a process group with dims=[“dp”, “tp”], the unique_group_key will be “dp-tp” (ordered according to the reversed dim_names order: [“dp”, “cp”, “tp”]).

Parameters:
  • dims – Name of leading dimensions to create process group

  • view – Optional registered rank view name. Defaults to the base view.

Keyword arguments are directly passed into new_subgroups_by_enumeration(). The docstring is copied from new_subgroups_by_enumeration().

Keyword args from dist.new_subgroups_by_enumeration: timeout (timedelta, optional): see init_process_group for details and default value. pg_options (ProcessGroupOptions, optional): process group options specifying what additional options need to be passed in during the construction of specific process groups. group_desc (str, optional): A string describing the group. Each subgroup will inherit its group_desc.

Returns:

The created process group.

Return type:

dist.ProcessGroup | None

Raises:

KeyError – If attempting to recreate a process group with an existing key.

destroy() None#

Destroy all process groups created by this grid that the current rank belongs to.

This includes base-view groups and view-private groups. A base group reused by a view for a shared dim is stored under a single key, so it is torn down exactly once.

get_pg(
dims: Union[str, list[str]],
*,
view: Optional[str] = None,
) torch.distributed.ProcessGroup#

Get a process group based on a list of dimension names

Parameters:
  • dims – Name of leading dimensions to create process group

  • view – Optional registered rank view name. Defaults to the base view.

get_rank_enum(
dims: Union[str, list[str]],
*,
view: Optional[str] = None,
) list[list[int]]#

Get the rank enumeration for the requested dimension(s).

This is the exact enumeration that would be used by create_pg for the same dims. It is useful for creating additional groups whose membership is derived from the grid (e.g., embedding/position-embedding groups derived from PP groups).

Parameters:
  • dims – Dimension name or list of dimension names.

  • view – Optional registered rank view name. Defaults to the base view.

Returns:

List of rank lists (one per subgroup).

_gen_rank_enum(dims: list[str]) list[list[int]]#

Generate rank enumeration before calling new_subgroups_by_enumeration

This function returns ranks grouped by the specified dimensions, but in REVERSE order of the input dimensions. For example, if you request dimensions [“a”, “b”], the ranks will be grouped by “b-a” order.

.. rubric:: Example

For a grid with shape [2, 2, 2] and dim_names [“a”, “b”, “c”]: _gen_rank_enum([“a”, “b”]) returns [[0, 2, 1, 3], [4, 6, 5, 7]]

This groups ranks first by dimension “b”, then by dimension “a”:

  • Group 0: ranks where c=0, grouped by b-a: [0, 2, 1, 3]

  • Group 1: ranks where c=1, grouped by b-a: [4, 6, 5, 7]

Parameters:

dims – Name of leading dimensions to create process group

Although the function is lightweight enough to be inlined, a standalone one makes it easier to test against MCore’s RankGenerator

_gen_rank_enum_for(
shape: list[int],
dim_names: list[str],
dims: list[str],
) list[list[int]]#

Generate rank enumeration for dims under an explicit shape/dim_names.

_order_dims_for(
dim_names: list[str],
dims: Union[str, list[str]],
) tuple[list[str], str]#

Reorder dims against an explicit dim_names.

_resolve_view(
view: Optional[str],
) core.hyper_comm_grid._RankViewSpec#

Return the requested rank view, defaulting to the base view.

_order_dims_for_view(
view: core.hyper_comm_grid._RankViewSpec,
dims: Union[str, list[str]],
) tuple[list[str], str]#

Reorder dims against a registered view and report missing dims clearly.

_canonical_pg_key_and_enum_view(
view: core.hyper_comm_grid._RankViewSpec,
ordered_dims: list[str],
) tuple[Union[str, tuple[str, tuple[str, ...]]], core.hyper_comm_grid._RankViewSpec, list[str]]#

Return the storage key and rank view used to enumerate a process group.

_is_base_pg_key(
key: Union[str, tuple[str, tuple[str, ...]]],
) bool#

Whether a process-group key belongs to the base view namespace.

is_current_rank_in_grid() bool#

Check if the current rank belongs to this grid.

Returns:

True if the current rank is within [rank_offset, rank_offset + size).