Modulus Distributed

Core v0.3.0
class modulus.distributed.manager.DistributedManager[source]

Bases: object

Distributed Manager for setting up the distributed training environment.

This is a singleton that creates a persistent class instance for storing parallel environment information throughout the lifetime of the program. This should be used to help set up Distributed Data Parallel and parallel datapipes.

Note

One should call DistributedManager.initialize() prior to constructing a manager object

Example


>>> DistributedManager.initialize()
>>> manager = DistributedManager()
>>> manager.rank
0
>>> manager.world_size
1
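The manager's properties map naturally onto the arguments of PyTorch's DistributedDataParallel wrapper. The sketch below is a minimal illustration, assuming the script is launched with one GPU per process (for example via torchrun) and that SimpleModel is a placeholder torch.nn.Module defined elsewhere:

from torch.nn.parallel import DistributedDataParallel
from modulus.distributed.manager import DistributedManager

DistributedManager.initialize()  # must be called before constructing the manager
manager = DistributedManager()

model = SimpleModel().to(manager.device)  # SimpleModel: placeholder module
if manager.distributed:
    model = DistributedDataParallel(
        model,
        device_ids=[manager.local_rank],  # assumes one CUDA device per process
        broadcast_buffers=manager.broadcast_buffers,
        find_unused_parameters=manager.find_unused_parameters,
    )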

property broadcast_buffers

broadcast_buffers in PyTorch DDP

static cleanup()[source]

Clean up distributed group and singleton

static create_orthogonal_process_group(name: str, group_name: str, verbose: bool = False)[source]

Create a process group that is orthogonal to the specified process group.

Parameters
  • name (str) – Name of the process group to be created.

  • group_name (str) – Name of the existing process group.

  • verbose (bool) – Print out ranks of each created process group, default False.

static create_process_subgroup(name: str, size: int, group_name: Optional[str] = None, verbose: bool = False)[source]

Create a process subgroup of a parent process group. This must be a collective call by all processes participating in this application.

Parameters
  • name (str) – Name of the process subgroup to be created.

  • size (int) – Size of the process subgroup to be created. This must be an integer factor of the parent group’s size.

  • group_name (Optional[str]) – Name of the parent process group, optional. If None, the default process group will be used. Default None.

  • verbose (bool) – Print out ranks of each created process group, default False.
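As an illustration of how create_process_subgroup and create_orthogonal_process_group typically pair up, the sketch below partitions the default process group into a model-parallel subgroup and its orthogonal data-parallel counterpart. The group names and the subgroup size of 2 are assumptions for the example, not library defaults; the script is assumed to run on a world size divisible by 2:

from modulus.distributed.manager import DistributedManager

DistributedManager.initialize()
manager = DistributedManager()

# Split the default process group into subgroups of 2 ranks each
# ("model_parallel" is an example name chosen for this sketch).
DistributedManager.create_process_subgroup("model_parallel", size=2, verbose=True)

# Create the complementary group: ranks holding the same position across
# the model-parallel subgroups form a "data_parallel" group.
DistributedManager.create_orthogonal_process_group(
    "data_parallel", "model_parallel", verbose=True
)

mp_group = manager.group("model_parallel")
dp_rank = manager.group_rank("data_parallel")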

property cuda

If cuda is available

property device

Process device

property distributed

Distributed environment

property find_unused_parameters

find_unused_parameters in PyTorch DDP

static get_available_backend()[source]

Get communication backend

group(name=None)[source]

Returns the process group with the given name. If name is None, returns None, indicating the default process group. If the named group does not exist, also returns None.

group_name(group=None)[source]

Returns the name of the given process group

property group_names

Returns a list of all named process groups created

group_rank(name=None)[source]

Returns the rank in named process group

group_size(name=None)[source]

Returns the size of named process group

static initialize()[source]

Initialize distributed manager

Currently supported initialization methods are:
ENV: PyTorch environment variable initialization

https://pytorch.org/docs/stable/distributed.html#environment-variable-initialization

SLURM: Initialization on SLURM systems.

Uses SLURM_PROCID, SLURM_NPROCS, SLURM_LOCALID and SLURM_LAUNCH_NODE_IPADDR environment variables.

OPENMPI: Initialization for OpenMPI launchers.

Uses OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_SIZE and OMPI_COMM_WORLD_LOCAL_RANK environment variables.

Initialization by default is done using the first valid method in the order listed above. Initialization method can also be explicitly controlled using the MODULUS_DISTRIBUTED_INITIALIZATION_METHOD environment variable and setting it to one of the options above.
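For example, to force the SLURM-based initialization path regardless of which other environment variables are set, the variable can be set before initialize() is called. A minimal sketch, assuming the option names are given exactly as listed above:

import os
from modulus.distributed.manager import DistributedManager

# Force the SLURM initialization path; must be set before initialize() is called.
os.environ["MODULUS_DISTRIBUTED_INITIALIZATION_METHOD"] = "SLURM"

DistributedManager.initialize()
manager = DistributedManager()
print(manager.rank, manager.world_size)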

static initialize_env()[source]

Setup method using generic initialization

static initialize_open_mpi(addr, port)[source]

Setup method using OpenMPI initialization

static initialize_slurm(port)[source]

Setup method using SLURM initialization

classmethod is_initialized() → bool[source]

If manager singleton has been initialized

property local_rank

Process rank on local machine

property rank

Process rank

static setup(rank=0, world_size=1, local_rank=None, addr='localhost', port='12355', backend='nccl', method='env')[source]

Set up PyTorch distributed process group and update manager attributes

property world_size

Number of processes in distributed environment

modulus.distributed.utils.all_gather_v_wrapper(tensor: Tensor, sizes: List[int], dim: int = 0, group: Optional[ProcessGroup] = None) → Tensor[source]

Implements a distributed AllGatherV primitive. It is based on the idea of a single global tensor which is distributed along a specified dimension into chunks of variable size. This primitive gathers all local tensors from each rank into the full global tensor onto each rank.

Parameters
  • tensor (torch.Tensor) – local tensor on each rank

  • sizes (List[int]) – list of the sizes of each chunk on each rank along distributed dimension, valid and set on each rank

  • dim (int, optional) – dimension along which global tensor is distributed, by default 0

  • group (Optional[dist.ProcessGroup], optional) – process group along which global tensor is shared, by default None

Returns

full global tensor, valid on each rank

Return type

torch.Tensor
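A minimal usage sketch, assuming the script runs under an initialized DistributedManager (e.g. launched with torchrun) and that each rank owns a variable number of rows of a feature tensor; the chunk sizes below are purely illustrative:

import torch
from modulus.distributed.manager import DistributedManager
from modulus.distributed.utils import all_gather_v_wrapper

DistributedManager.initialize()
manager = DistributedManager()

# Example: rank r holds (r + 1) rows of an 8-feature tensor (sizes are illustrative).
sizes = [r + 1 for r in range(manager.world_size)]
local = torch.randn(sizes[manager.rank], 8, device=manager.device)

# Every rank receives the full tensor of shape (sum(sizes), 8).
full = all_gather_v_wrapper(local, sizes, dim=0)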

modulus.distributed.utils.all_reduce_v_wrapper(tensor: Tensor, sizes: List[int], dim: int = 0, use_fp32: bool = True, group: Optional[ProcessGroup] = None) → Tensor[source]

Implements a distributed AllReduceV primitive. It is based on the idea of a single global tensor which can be distributed along a specified dimension into chunks of variable size. This primitive assumes different global tensors of the same shape on each rank. It then re-distributes chunks of all these tensors such that each rank receives all corresponding parts of a global tensor. Each rank then sums up the chunks after receiving them. By design, this primitive thus implements the backward pass of the “all_gather_v” primitive. In this case, the result would be a single global gradient tensor distributed onto different ranks.

Parameters
  • tensor (torch.Tensor) – global tensor on each rank (different one on each rank)

  • sizes (List[int]) – list of the sizes of each chunk on each rank along distributed dimension, valid and set on each rank

  • dim (int, optional) – dimension along which global tensor is distributed, by default 0

  • use_fp32 (bool, optional) – flag to specify FP32 precision for the reduction, by default True

  • group (Optional[dist.ProcessGroup], optional) – process group along which global tensor is shared, by default None

Returns

local tensor, i.e. result of reduction of all corresponding chunks from all global tensors for each rank separately

Return type

torch.Tensor
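Since this primitive is the backward counterpart of all_gather_v_wrapper, a minimal sketch under the same illustrative setup reduces a full-size gradient tensor back to each rank's local chunk:

import torch
from modulus.distributed.manager import DistributedManager
from modulus.distributed.utils import all_reduce_v_wrapper

DistributedManager.initialize()
manager = DistributedManager()
sizes = [r + 1 for r in range(manager.world_size)]  # illustrative chunk sizes

# Gradient w.r.t. a gathered tensor: same shape on every rank,
# generally with different values per rank.
grad_full = torch.ones(sum(sizes), 8, device=manager.device)

# Each rank receives the summed gradient for its own chunk,
# of shape (sizes[manager.rank], 8).
grad_local = all_reduce_v_wrapper(grad_full, sizes, dim=0, use_fp32=True)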

modulus.distributed.utils.gather_loss(loss: float, dst_rank: int = 0, mean: bool = True)[source]

Gathers loss from all processes to one for logging

Parameters
  • loss (float) – loss value

  • dst_rank (int, Optional) – destination rank to gather to, by default 0

  • mean (bool, Optional) – Calculate the mean of the losses gathered, by default True

Raises

Exception – If DistributedManager has yet to be initialized
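A minimal logging sketch, assuming DistributedManager.initialize() has already been called and loss_tensor is a placeholder for the scalar loss computed on each rank; the return value on non-destination ranks is not specified on this page, so only the destination rank logs it here:

from modulus.distributed.manager import DistributedManager
from modulus.distributed.utils import gather_loss

manager = DistributedManager()

# loss_tensor: scalar loss computed on this rank (placeholder from the training loop)
gathered = gather_loss(float(loss_tensor), dst_rank=0, mean=True)

if manager.rank == 0:
    print(f"mean loss across ranks: {gathered}")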

modulus.distributed.utils.gather_v_wrapper(tensor: Tensor, sizes: List[int], dim: int = 0, dst: int = 0, group: Optional[ProcessGroup] = None) → Tensor[source]

Implements a distributed GatherV primitive. It is based on the idea of a single global tensor which is distributed along a specified dimension into chunks of variable size. This primitive assumes such a distributed tensor and gathers all local tensors from each rank into the full global tensor valid on the specified destination rank.

Parameters
  • tensor (torch.Tensor) – local tensor on each rank

  • sizes (List[int]) – list of the sizes of each chunk on each rank along distributed dimension, valid and set on each rank

  • dim (int, optional) – dimension along which global tensor is distributed, by default 0

  • dst (int, optional) – destination rank which contains the full global tensor after the operation, by default 0

  • group (Optional[dist.ProcessGroup], optional) – process group along which global tensor is shared, by default None

Returns

full global tensor, valid on destination rank

Return type

torch.Tensor
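A minimal sketch under the same illustrative setup as the all_gather_v_wrapper example: each rank contributes a variable-size chunk and only rank 0 ends up with the full tensor:

import torch
from modulus.distributed.manager import DistributedManager
from modulus.distributed.utils import gather_v_wrapper

DistributedManager.initialize()
manager = DistributedManager()

sizes = [r + 1 for r in range(manager.world_size)]  # illustrative chunk sizes
local = torch.randn(sizes[manager.rank], 8, device=manager.device)

# After the call, only dst (rank 0) holds the full (sum(sizes), 8) tensor.
full = gather_v_wrapper(local, sizes, dim=0, dst=0)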

modulus.distributed.utils.get_memory_format(tensor)[source]

Gets format for tensor

modulus.distributed.utils.indexed_all_to_all_v_wrapper(tensor: Tensor, indices: List[Tensor], sizes: List[List[int]], dim: int = 0, group: Optional[ProcessGroup] = None) → Tensor[source]

Implements an indexed version of a distributed AllToAllV primitive. It is based on the idea of a single global tensor which is distributed along a specified dimension into chunks of variable size. This primitive assumes a set of indices into this dimension which indicate the corresponding slices sent to each other rank forming an indexed version of an AllToAllV primitive.

Parameters
  • tensor (torch.Tensor) – local part of global tensor on each rank

  • indices (List[torch.Tensor]) – list of indices on each rank of slices being sent to each other rank from this rank

  • sizes (List[List[int]]) – number of indices each rank sends to each other rank, valid and set on each rank, e.g. sizes[0][3] corresponds to the number of slices rank 0 sends to rank 3

  • dim (int) – dimension along which global tensor is distributed, by default 0

  • group (Optional[dist.ProcessGroup], optional) – process group along which global tensor is shared, by default None

Returns

local result of primitive corresponding to indexed global tensor

Return type

torch.Tensor
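The sketch below uses a deliberately simple pattern, sending the first slice of each rank's local tensor to every rank, just to show how indices and sizes fit together; in practice these would come from, e.g., a partitioned graph. All names and shapes are illustrative:

import torch
from modulus.distributed.manager import DistributedManager
from modulus.distributed.utils import indexed_all_to_all_v_wrapper

DistributedManager.initialize()
manager = DistributedManager()
world_size = manager.world_size

local = torch.randn(4, 8, device=manager.device)

# Each rank sends its slice 0 to every rank (including itself).
indices = [torch.tensor([0], device=manager.device) for _ in range(world_size)]

# sizes[i][j]: number of slices rank i sends to rank j -- here always 1,
# and identical on every rank.
sizes = [[1] * world_size for _ in range(world_size)]

# Result on each rank: the slices received from all ranks, concatenated
# along dim 0, i.e. shape (world_size, 8) in this sketch.
result = indexed_all_to_all_v_wrapper(local, indices, sizes, dim=0)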

modulus.distributed.utils.indexed_all_to_all_v_wrapper_bwd(tensor: Tensor, indices: List[Tensor], sizes: List[List[int]], tensor_size_along_dim: int, use_fp32: bool = True, dim: int = 0, group: Optional[ProcessGroup] = None) → Tensor[source]

Implements the backward pass to the indexed version of a distributed AllToAllV primitive.

Parameters
  • tensor (torch.Tensor) – local tensor, i.e. gradient on resulting tensor from forward pass

  • indices (List[torch.Tensor]) – list of indices on each rank of slices being sent to each other rank from this rank

  • sizes (List[List[int]]) – list of the sizes of each chunk on each rank along distributed dimension, valid and set on each rank

  • tensor_size_along_dim (int) – size of original local tensor along specified dimension, i.e. from the corresponding forward pass

  • use_fp32 (bool, optional) – flag to specify FP32 precision, by default True

  • dim (int, optional) – dimension along which global tensor is distributed, by default 0

  • group (Optional[dist.ProcessGroup], optional) – process group along which global tensor is shared, by default None

Returns

result of primitive corresponding to indexed global tensor

Return type

torch.Tensor
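Continuing the indexed_all_to_all_v_wrapper sketch above, the backward helper routes gradients on the received slices back to the shape of the original local tensor (tensor_size_along_dim equals the local size of 4 used in the forward sketch):

import torch
from modulus.distributed.utils import indexed_all_to_all_v_wrapper_bwd

# grad_out: gradient w.r.t. `result` from the forward sketch, shape (world_size, 8).
grad_out = torch.ones_like(result)

# Gradient w.r.t. the original local tensor, shape (4, 8).
grad_local = indexed_all_to_all_v_wrapper_bwd(
    grad_out, indices, sizes, tensor_size_along_dim=4, use_fp32=True, dim=0
)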

modulus.distributed.utils.pad_helper(tensor, dim, new_size, mode='zero')[source]

Util for padding tensors

modulus.distributed.utils.scatter_v_wrapper(tensor: Tensor, sizes: List[int], dim: int = 0, src: int = 0, group: Optional[ProcessGroup] = None) → Tensor[source]

Implements a distributed ScatterV primitive. It is based on the idea of a single global tensor which is distributed along a specified dimension into chunks of variable size. This primitive scatters the global tensor from a specified source rank into local chunks onto each other rank.

Parameters
  • tensor (torch.Tensor) – global tensor, valid on source rank

  • sizes (List[int]) – list of the sizes of each chunk on each rank along distributed dimension, valid and set on each rank

  • dim (int, optional) – dimension along which global tensor is distributed, by default 0

  • src (int, optional) – source rank of primitive, i.e. rank of original full global tensor, by default 0

  • group (Optional[dist.ProcessGroup], optional) – process group along which global tensor is shared, by default None

Returns

corresponding local part of the global tensor on each rank

Return type

torch.Tensor
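A minimal sketch with illustrative chunk sizes. The full global tensor is only meaningful on the source rank; here a same-shaped placeholder is allocated on the other ranks as an assumption of the sketch, not a documented requirement:

import torch
from modulus.distributed.manager import DistributedManager
from modulus.distributed.utils import scatter_v_wrapper

DistributedManager.initialize()
manager = DistributedManager()

sizes = [r + 1 for r in range(manager.world_size)]  # illustrative chunk sizes
total = sum(sizes)

# Allocate the global shape on every rank; only rank 0's values are meaningful.
if manager.rank == 0:
    global_tensor = torch.randn(total, 8, device=manager.device)
else:
    global_tensor = torch.zeros(total, 8, device=manager.device)

# Each rank receives its (sizes[rank], 8) chunk of rank 0's tensor.
local = scatter_v_wrapper(global_tensor, sizes, dim=0, src=0)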

modulus.distributed.utils.split_tensor_along_dim(tensor, dim, num_chunks)[source]

Splits a tensor along the specified dimension into the given number of chunks

modulus.distributed.utils.truncate_helper(tensor, dim, new_size)[source]

Util for truncating
