core.hyper_comm_grid#

Module Contents#

Classes#

HyperCommGrid

N-dimensional communication grid.

API#

class core.hyper_comm_grid.HyperCommGrid(
shape: list[int],
dim_names: list[str],
rank_offset: int = 0,
backend: Optional[str] = None,
)#

N-dimensional communication grid.

Manages an arbitrary number of parallelisms as a hyperrectangle. Each dimension is given a name at initialization time. The order of dim_names implies the mapping order equivalent to the order argument of MCore’s initialize_model_parallel. Internally, it has to be reversed to match n-D array.

For any combination of dimensions, a process group can only be created once. Creating process groups for the same combination with different options is not supported.

.. note::

create_pg() over specific dims must be explicitly called to create a process group. We don’t create a process group in the get_pg() function because there are many options (kwargs) that can be passed when creating a process group, which get_pg() should not be exposed to.

.. rubric:: Examples

grid = HyperCommGrid([2, 3, 4, 5], [“tp”, “cp”, “pp”, “dp”]) dp_group = grid.create_pg(“dp”)

retrieve dp_group from grid after creation

dp_group = grid.get_pg(“dp”)

It is equivalent to calling the following functions in MCore parallel_state

with world size 120.

parallel_state.initialize_model_parallel( tensor_model_parallel_size=2, context_parallel_size=3, pipeline_model_parallel_size=4, order=”tp-cp-pp-dp”) dp_group_mcore = parallel_state.get_data_parallel_group()

We can create group from multiple leading dims and also pass more options.

pg_options = ProcessGroupNCCL.Options() pg_options.config.max_ctas = 8 dp_cp_group = grid.create_pg( [“cp”, “dp”], pg_options=pg_options, group_desc=”WEIGHT_GRADIENT_COMM_GROUP”)

Parameters:
  • shape – Shape of the communication grid.

  • dim_names – Name of each dimension corresponding to shape. Must have the same length as shape.

  • rank_offset – Starting rank when the grid doesn’t span the entire communication world. Default 0.

  • backend – Backend for creating process group. Default None and will use default backend.

Initialization

create_pg(
dims: Union[str, list[str]],
**kwargs: Any,
) torch.distributed.ProcessGroup | None#

Create a process group based on a list of dimension names

Note: The unique key used to store the process group internally will follow the reversed order of the original dim_names. For example, if dim_names=[“tp”, “cp”, “dp”] and you create a process group with dims=[“dp”, “tp”], the unique_group_key will be “dp-tp” (ordered according to the reversed dim_names order: [“dp”, “cp”, “tp”]).

Parameters:

dims – Name of leading dimensions to create process group

Keyword arguments are directly passed into new_subgroups_by_enumeration(). The docstring is copied from new_subgroups_by_enumeration().

Keyword args from dist.new_subgroups_by_enumeration: timeout (timedelta, optional): see init_process_group for details and default value. pg_options (ProcessGroupOptions, optional): process group options specifying what additional options need to be passed in during the construction of specific process groups. group_desc (str, optional): A string describing the group. Each subgroup will inherit its group_desc.

Returns:

The created process group.

Return type:

dist.ProcessGroup | None

Raises:

KeyError – If attempting to recreate a process group with an existing key.

get_pg(
dims: Union[str, list[str]],
) torch.distributed.ProcessGroup#

Get a process group based on a list of dimension names

Parameters:

dims – Name of leading dimensions to create process group

_gen_rank_enum(dims: list[str]) list[list[int]]#

Generate rank enumeration before calling new_subgroups_by_enumeration

This function returns ranks grouped by the specified dimensions, but in REVERSE order of the input dimensions. For example, if you request dimensions [“a”, “b”], the ranks will be grouped by “b-a” order.

.. rubric:: Example

For a grid with shape [2, 2, 2] and dim_names [“a”, “b”, “c”]: _gen_rank_enum([“a”, “b”]) returns [[0, 2, 1, 3], [4, 6, 5, 7]]

This groups ranks first by dimension “b”, then by dimension “a”:

  • Group 0: ranks where c=0, grouped by b-a: [0, 2, 1, 3]

  • Group 1: ranks where c=1, grouped by b-a: [4, 6, 5, 7]

Parameters:

dims – Name of leading dimensions to create process group

Although the function is lightweight enough to be inlined, a standalone one makes it easier to test against MCore’s RankGenerator

_order_dims(
dims: Union[str, list[str]],
) Tuple[list[str], str]#

Reorder dims based on the order of self.dim_names