TorchProcessGroup#

class nvmath.distributed.TorchProcessGroup(
*,
device_id: int | Literal['cpu'],
torch_process_group=None,
logger=None,
)[source]#

ProcessGroup implemented on top of torch.distributed.

Parameters:
  • device_id – Device used by the torch.distributed process group backend: a CUDA device ordinal, or 'cpu'.

  • torch_process_group – torch.distributed process group handle (e.g. returned by torch.distributed.new_group()), or None to use the default torch process group.
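
A typical construction sketch, under stated assumptions: torch with CUDA support and nvmath-python are installed, and the script is launched with torchrun (which sets the RANK, WORLD_SIZE, MASTER_ADDR/MASTER_PORT, and LOCAL_RANK environment variables). The helper name make_process_group is hypothetical.

```python
# Hypothetical setup sketch (assumes torchrun launch; make_process_group
# is an illustrative helper, not part of the nvmath API).
import os

def make_process_group():
    import torch.distributed as dist
    from nvmath.distributed import TorchProcessGroup

    # Initialize the default torch process group; NCCL for GPU ranks.
    dist.init_process_group(backend="nccl")

    # device_id selects the GPU used by the backend; pass 'cpu' instead
    # when using a CPU-capable backend such as gloo.
    return TorchProcessGroup(device_id=int(os.environ["LOCAL_RANK"]))
```

Leaving torch_process_group unset makes the wrapper use the default torch process group created by init_process_group().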

Attributes

MIN_ALL_REDUCE_OBJ_BUFFER_SIZE = 128#

Minimum buffer size for allreduce_object() (in bytes).

allreduce_obj_buffer_size#

Current buffer size for allreduce_object() (in bytes).

device_id#

Device used by the communication backend of this torch process group.

nranks#

Number of processes in the group.

rank#

Rank of this process within the group.

Methods

allreduce_buffer(
array: ndarray,
*,
op: ReductionOp,
) → None[source]#

Allreduce an array.

Parameters:
  • array – Input and output of the collective. The function operates in-place.

  • op – One of the values from the ReductionOp enum, specifying the operation for the element-wise reduction.
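
The in-place semantics can be illustrated with a pure-NumPy emulation (a sketch of what the collective computes, not the real implementation): every rank contributes an array, and after the call each rank's buffer holds the element-wise reduction of all contributions. The helper emulate_allreduce is hypothetical.

```python
# Pure-NumPy emulation of allreduce_buffer semantics (illustrative only;
# emulate_allreduce is not part of the nvmath API).
import numpy as np

def emulate_allreduce(rank_buffers, op=np.add):
    # Element-wise reduction over all ranks' contributions.
    result = rank_buffers[0].copy()
    for buf in rank_buffers[1:]:
        result = op(result, buf)
    # In-place: every rank's buffer is overwritten with the result.
    for buf in rank_buffers:
        buf[...] = result

# Two simulated ranks, SUM reduction: both buffers end up as [4., 6.].
buffers = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
emulate_allreduce(buffers)
```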

allreduce_object(
obj: T,
*,
op: Callable[[T, T], T],
) → T[source]#

Reduces all Python objects contributed by members of the group. The result is a single reduced object which is returned on every process.

Parameters:
  • obj – Object contributed by this process.

  • op – A Python function that takes two objects and returns a single (reduced) object.
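
A plain-Python sketch of the semantics: each process contributes one object, op merges two objects at a time, and every process receives the same fully reduced result. The helper emulate_allreduce_object is hypothetical.

```python
# Sketch of allreduce_object semantics (illustrative only; the helper
# is not part of the nvmath API).
from functools import reduce

def emulate_allreduce_object(per_rank_objs, op):
    # The collective is equivalent to a fold of op over all contributions.
    result = reduce(op, per_rank_objs)
    # Every rank receives the same reduced object.
    return [result for _ in per_rank_objs]

# Example: merging per-rank dictionaries of counts.
merged = emulate_allreduce_object(
    [{"a": 1}, {"a": 2, "b": 1}],
    op=lambda x, y: {k: x.get(k, 0) + y.get(k, 0) for k in {*x, *y}},
)
```

For a deterministic result in a real collective, op should be associative, since the order in which contributions are combined is generally unspecified.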

broadcast_buffer(
array: ndarray,
*,
root: int = 0,
) None[source]#

Broadcast an array from one process to every process.

Parameters:
  • array – Input (on root) and output of the collective. The function operates in-place.

  • root – Rank of the sending process.
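
The broadcast semantics can likewise be sketched in pure NumPy: the root rank's array is copied into every other rank's buffer, in place. The helper emulate_broadcast is hypothetical.

```python
# Sketch of broadcast_buffer semantics (illustrative only; the helper
# is not part of the nvmath API).
import numpy as np

def emulate_broadcast(rank_buffers, root=0):
    src = rank_buffers[root]
    for i, buf in enumerate(rank_buffers):
        if i != root:
            # In-place copy of the root's data into this rank's buffer.
            buf[...] = src

# Three simulated ranks; after the call every buffer holds [7., 8.].
bufs = [np.array([7.0, 8.0]), np.zeros(2), np.zeros(2)]
emulate_broadcast(bufs, root=0)
```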