TorchProcessGroup#

class nvmath.distributed.TorchProcessGroup(
*,
device_id: int | Literal['cpu'],
torch_process_group=None,
logger=None,
)[source]#

ProcessGroup implemented on top of torch.distributed.

Parameters:
  • device_id – Device used by the torch.distributed process group backend: a CUDA device ordinal, or 'cpu'.

  • torch_process_group – torch.distributed process group handle (e.g. returned by torch.distributed.new_group()), or None to use the default torch process group.
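
A typical construction sketch, under stated assumptions: torch with CUDA support and nvmath-python are installed, and the script is launched with torchrun (which sets the RANK, WORLD_SIZE, MASTER_ADDR/MASTER_PORT, and LOCAL_RANK environment variables). The helper name make_process_group is hypothetical.

```python
# Hypothetical setup sketch (assumes torchrun launch; make_process_group
# is an illustrative helper, not part of the nvmath API).
import os

def make_process_group():
    import torch.distributed as dist
    from nvmath.distributed import TorchProcessGroup

    # Initialize the default torch process group; NCCL for GPU ranks.
    dist.init_process_group(backend="nccl")

    # device_id selects the GPU used by the backend; pass 'cpu' instead
    # when using a CPU-capable backend such as gloo.
    return TorchProcessGroup(device_id=int(os.environ["LOCAL_RANK"]))
```

Leaving torch_process_group unset makes the wrapper use the default torch process group created by init_process_group().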

Attributes

MIN_ALL_REDUCE_OBJ_BUFFER_SIZE = 128#

Minimum buffer size for allreduce_object() (in bytes).

allreduce_obj_buffer_size#

Current buffer size for allreduce_object() (in bytes).

device_id#

Device used by the communication backend of this torch process group.

nranks#

Number of processes in the group.

rank#

Rank of this process within the group.

Methods

allreduce_buffer(
array: ndarray,
*,
op: ReductionOp,
) → None[source]#

Allreduce an array.

Parameters:
  • array – Input and output of the collective. The function operates in-place.

  • op – One of the values from the ReductionOp enum, specifying the operation for the element-wise reduction.
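
The in-place semantics can be illustrated with a pure-NumPy emulation (a sketch of what the collective computes, not the real implementation): every rank contributes an array, and after the call each rank's buffer holds the element-wise reduction of all contributions. The helper emulate_allreduce is hypothetical.

```python
# Pure-NumPy emulation of allreduce_buffer semantics (illustrative only;
# emulate_allreduce is not part of the nvmath API).
import numpy as np

def emulate_allreduce(rank_buffers, op=np.add):
    # Element-wise reduction over all ranks' contributions.
    result = rank_buffers[0].copy()
    for buf in rank_buffers[1:]:
        result = op(result, buf)
    # In-place: every rank's buffer is overwritten with the result.
    for buf in rank_buffers:
        buf[...] = result

# Two simulated ranks, SUM reduction: both buffers end up as [4., 6.].
buffers = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
emulate_allreduce(buffers)
```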

allreduce_object(
obj: T,
*,
op: Callable[[T, T], T],
) → T[source]#

Reduces all Python objects contributed by members of the group. The result is a single reduced object which is returned on every process.

Parameters:
  • obj – Object contributed by this process.

  • op – A Python function that takes two objects and returns a single (reduced) object.
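
A plain-Python sketch of the semantics: each process contributes one object, op merges two objects at a time, and every process receives the same fully reduced result. The helper emulate_allreduce_object is hypothetical.

```python
# Sketch of allreduce_object semantics (illustrative only; the helper
# is not part of the nvmath API).
from functools import reduce

def emulate_allreduce_object(per_rank_objs, op):
    # The collective is equivalent to a fold of op over all contributions.
    result = reduce(op, per_rank_objs)
    # Every rank receives the same reduced object.
    return [result for _ in per_rank_objs]

# Example: merging per-rank dictionaries of counts.
merged = emulate_allreduce_object(
    [{"a": 1}, {"a": 2, "b": 1}],
    op=lambda x, y: {k: x.get(k, 0) + y.get(k, 0) for k in {*x, *y}},
)
```

For a deterministic result in a real collective, op should be associative, since the order in which contributions are combined is generally unspecified.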

broadcast_buffer(
array: ndarray,
*,
root: int = 0,
) None[source]#

Broadcast an array from one process to every process.

Parameters:
  • array – Input (on root) and output of the collective. The function operates in-place.

  • root – Rank of the sending process.
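
The broadcast semantics can likewise be sketched in pure NumPy: the root rank's array is copied into every other rank's buffer, in place. The helper emulate_broadcast is hypothetical.

```python
# Sketch of broadcast_buffer semantics (illustrative only; the helper
# is not part of the nvmath API).
import numpy as np

def emulate_broadcast(rank_buffers, root=0):
    src = rank_buffers[root]
    for i, buf in enumerate(rank_buffers):
        if i != root:
            # In-place copy of the root's data into this rank's buffer.
            buf[...] = src

# Three simulated ranks; after the call every buffer holds [7., 8.].
bufs = [np.array([7.0, 8.0]), np.zeros(2), np.zeros(2)]
emulate_broadcast(bufs, root=0)
```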