core.hyper_comm_grid
Module Contents
Classes
HyperCommGrid | N-dimensional communication grid.
API
- class core.hyper_comm_grid.HyperCommGrid(shape: list[int], dim_names: list[str], rank_offset: int = 0, backend: Optional[str] = None)
N-dimensional communication grid.
Manages an arbitrary number of parallelisms as a hyperrectangle. Each dimension is given a name at initialization time. The order of dim_names implies the mapping order, equivalent to the order argument of MCore’s initialize_model_parallel. Internally, it has to be reversed to match the n-D array.
For any combination of dimensions, a process group can only be created once. Creating process groups for the same combination with different options is not supported.
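The ordering rule can be illustrated with a small sketch in plain Python. It assumes, consistent with the _gen_rank_enum example later on this page, that the first name in dim_names varies fastest across consecutive ranks. rank_to_coords is a hypothetical helper written for illustration and is not part of HyperCommGrid.

```python
def rank_to_coords(rank, shape, dim_names, rank_offset=0):
    """Map a global rank to its coordinate along each named dimension.

    Hypothetical helper for illustration only. Assumes the first entry of
    dim_names varies fastest, as with MCore's order="tp-cp-pp-dp".
    """
    coords = {}
    remaining = rank - rank_offset
    for name, size in zip(dim_names, shape):
        coords[name] = remaining % size  # coordinate along this dimension
        remaining //= size               # move on to the next, slower dimension
    return coords


shape = [2, 3, 4, 5]
dim_names = ["tp", "cp", "pp", "dp"]
for rank in (0, 1, 2, 119):
    print(rank, rank_to_coords(rank, shape, dim_names))
# rank 2 maps to {'tp': 0, 'cp': 1, 'pp': 0, 'dp': 0}
```

Under this assumption, neighbouring ranks differ in their "tp" coordinate first, which is why an n-D array in row-major layout needs the names in reversed order.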
Note: create_pg() over specific dims must be explicitly called to create a process group. We don’t create a process group in the get_pg() function because there are many options (kwargs) that can be passed when creating a process group, which get_pg() should not be exposed to.
Examples
grid = HyperCommGrid([2, 3, 4, 5], ["tp", "cp", "pp", "dp"])
dp_group = grid.create_pg("dp")
# retrieve dp_group from grid after creation
dp_group = grid.get_pg("dp")

# It is equivalent to calling the following functions in MCore parallel_state
# with world size 120.
parallel_state.initialize_model_parallel(
    tensor_model_parallel_size=2,
    context_parallel_size=3,
    pipeline_model_parallel_size=4,
    order="tp-cp-pp-dp",
)
dp_group_mcore = parallel_state.get_data_parallel_group()

# We can create group from multiple leading dims and also pass more options.
pg_options = ProcessGroupNCCL.Options()
pg_options.config.max_ctas = 8
dp_cp_group = grid.create_pg(
    ["cp", "dp"], pg_options=pg_options, group_desc="WEIGHT_GRADIENT_COMM_GROUP"
)
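To make the 120-rank example concrete, the sketch below works out which ranks would share a single-dimension group using stride arithmetic alone. It runs without torch or MCore. ranks_in_group is a hypothetical helper, and the layout (first name in dim_names varies fastest) is an assumption taken from the "tp-cp-pp-dp" order.

```python
from math import prod


def ranks_in_group(rank, shape, dim_names, dim):
    """Return the ranks that share every coordinate with `rank` except along `dim`.

    Hypothetical helper; assumes the first name in dim_names varies fastest.
    """
    axis = dim_names.index(dim)
    stride = prod(shape[:axis])  # distance between neighbours along `dim`
    size = shape[axis]
    # Subtract this rank's own coordinate along `dim` to find the member at 0.
    base = rank - ((rank // stride) % size) * stride
    return [base + k * stride for k in range(size)]


shape = [2, 3, 4, 5]
dim_names = ["tp", "cp", "pp", "dp"]
assert prod(shape) == 120  # world size of the example

print(ranks_in_group(0, shape, dim_names, "dp"))  # -> [0, 24, 48, 72, 96]
print(ranks_in_group(7, shape, dim_names, "tp"))  # -> [6, 7]
```

With this layout a "dp" group has stride 2 * 3 * 4 = 24, so the grid holds 24 data-parallel groups of 5 ranks each.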
- Parameters:
shape – Shape of the communication grid.
dim_names – Name of each dimension corresponding to shape. Must have the same length as shape.
rank_offset – Starting rank when the grid doesn’t span the entire communication world. Default 0.
backend – Backend for creating process groups. Defaults to None, in which case the default backend is used.
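As a rough illustration of how shape, dim_names and rank_offset relate, the sketch below computes the range of global ranks a grid would cover. grid_ranks is a hypothetical helper. The ValueError on a length mismatch is an assumption about how the "same length" requirement might be enforced, not a statement about what the class raises.

```python
from math import prod


def grid_ranks(shape, dim_names, rank_offset=0):
    """Return the global ranks covered by a grid (hypothetical helper)."""
    if len(shape) != len(dim_names):
        # Assumed check: the documentation only states the lengths must match.
        raise ValueError("shape and dim_names must have the same length")
    size = prod(shape)
    return range(rank_offset, rank_offset + size)


# Two 2x4 grids placed side by side in a 16-rank world.
first = grid_ranks([2, 4], ["tp", "dp"])
second = grid_ranks([2, 4], ["tp", "dp"], rank_offset=8)
print(list(first))   # -> [0, 1, 2, 3, 4, 5, 6, 7]
print(list(second))  # -> [8, 9, 10, 11, 12, 13, 14, 15]
```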
Initialization
- create_pg(dims: Union[str, list[str]], **kwargs: Any)
Create a process group based on a list of dimension names.
Note: The unique key used to store the process group internally will follow the reversed order of the original dim_names. For example, if dim_names=["tp", "cp", "dp"] and you create a process group with dims=["dp", "tp"], the unique_group_key will be "dp-tp" (ordered according to the reversed dim_names order: ["dp", "cp", "tp"]).
- Parameters:
dims – Name of leading dimensions to create process group
Keyword arguments are directly passed into new_subgroups_by_enumeration(). The docstring is copied from new_subgroups_by_enumeration().
Keyword args from dist.new_subgroups_by_enumeration:
timeout (timedelta, optional): see init_process_group for details and default value.
pg_options (ProcessGroupOptions, optional): process group options specifying what additional options need to be passed in during the construction of specific process groups.
group_desc (str, optional): A string describing the group. Each subgroup will inherit its group_desc.
- Returns:
The created process group.
- Return type:
dist.ProcessGroup | None
- Raises:
KeyError – If attempting to recreate a process group with an existing key.
- get_pg(dims: Union[str, list[str]])
Get a process group based on a list of dimension names.
- Parameters:
dims – Name of leading dimensions of the process group to retrieve
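The create-once, retrieve-later contract between create_pg() and get_pg() can be mimicked with a toy registry. ToyGroupRegistry is a hypothetical stand-in that stores plain dictionaries instead of real process groups. Only the KeyError on re-creation comes from the documentation above. The failing lookup for a group that was never created is an assumption.

```python
class ToyGroupRegistry:
    """Hypothetical stand-in that mimics the create-once, get-later contract."""

    def __init__(self):
        self._groups = {}

    def create(self, key, **options):
        # Options are accepted only at creation time, mirroring create_pg(**kwargs).
        if key in self._groups:
            raise KeyError(f"process group {key!r} already exists")
        self._groups[key] = {"key": key, "options": options}
        return self._groups[key]

    def get(self, key):
        # Assumption: looking up a group that was never created is an error.
        return self._groups[key]


registry = ToyGroupRegistry()
dp = registry.create("dp", group_desc="DATA_PARALLEL")
assert registry.get("dp") is dp  # retrieval takes no options

try:
    registry.create("dp", group_desc="SOMETHING_ELSE")
except KeyError as err:
    print("second create rejected:", err)
```

Keeping the options on the creation path only means a lookup can never silently disagree with how the group was built.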
- _gen_rank_enum(dims: list[str]) → list[list[int]]
Generate rank enumeration before calling new_subgroups_by_enumeration.
This function returns ranks grouped by the specified dimensions, but in REVERSE order of the input dimensions. For example, if you request dimensions ["a", "b"], the ranks will be grouped by "b-a" order.
Example
For a grid with shape [2, 2, 2] and dim_names ["a", "b", "c"]:
_gen_rank_enum(["a", "b"]) returns [[0, 2, 1, 3], [4, 6, 5, 7]]
This groups ranks first by dimension "b", then by dimension "a":
- Group 0: ranks where c=0, grouped by b-a: [0, 2, 1, 3]
- Group 1: ranks where c=1, grouped by b-a: [4, 6, 5, 7]
- Parameters:
dims – Name of leading dimensions to create process group
Although the function is lightweight enough to be inlined, keeping it as a standalone function makes it easier to test against MCore’s RankGenerator.
- _order_dims(dims: Union[str, list[str]])
Reorder dims based on the order of self.dim_names
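A minimal sketch of the reordering, assuming it follows the reversed dim_names order described in the create_pg() note and joins the names with "-" to form the key. order_dims is a hypothetical free function. Its returned pair and the ValueError for unknown names are illustrative choices, not the documented signature.

```python
def order_dims(dims, dim_names):
    """Sort requested dims by reversed dim_names and build a key (hypothetical)."""
    if isinstance(dims, str):
        dims = [dims]
    unknown = [d for d in dims if d not in dim_names]
    if unknown:
        # Assumed behaviour: reject names that are not part of the grid.
        raise ValueError(f"unknown dimension names: {unknown}")
    reversed_names = dim_names[::-1]
    ordered = sorted(dims, key=reversed_names.index)
    return ordered, "-".join(ordered)


dim_names = ["tp", "cp", "dp"]
print(order_dims(["dp", "tp"], dim_names))  # -> (['dp', 'tp'], 'dp-tp')
print(order_dims(["tp", "dp"], dim_names))  # -> (['dp', 'tp'], 'dp-tp')
print(order_dims("cp", dim_names))          # -> (['cp'], 'cp')
```

Because both request orders normalize to the same key, a second attempt to create the group with the names swapped would hit the create-once restriction.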