initialize

nvmath.distributed.initialize(
    device_id: int,
    process_group: ProcessGroup | mpi4py.MPI.Comm,
    backends: Sequence[Literal['nvshmem', 'nccl']],
)
Initialize the nvmath.distributed runtime. This is required before any distributed operations can be performed. Note that this is a collective operation and must be called by all processes. If the runtime is already initialized, this function will raise an error. If you need to reinitialize the runtime (for example, with different backends), you have to finalize it first.
Note: NCCL doesn’t allow assigning more than one process to the same GPU.
Parameters:
- device_id – CUDA device ID to associate with the nvmath.distributed runtime on this process.
- process_group – ProcessGroup (or mpi4py communicator) specifying the participating processes. This is used for setup and not for communication during compute.
- backends – Communication backends to use in distributed computations. Valid values are “nvshmem” and “nccl”. Note that specific libraries (cuFFTMp, cuBLASMp, …) have specific required backends.
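A minimal usage sketch of the lifecycle described above, assuming nvmath-python (with distributed support) and mpi4py are installed and the script is launched with one process per GPU (e.g. `mpirun -n 2 python example.py`). The `pick_device_id` helper and the 8-GPUs-per-node value are illustrative assumptions, not part of the API:

```python
def pick_device_id(rank: int, gpus_per_node: int) -> int:
    # Map each process rank to a local GPU. NCCL doesn't allow
    # assigning more than one process to the same GPU, so with one
    # process per GPU this simple modulo mapping suffices.
    return rank % gpus_per_node


if __name__ == "__main__":
    try:
        from mpi4py import MPI
        import nvmath.distributed
    except ImportError:
        # Demo-only guard so the sketch degrades gracefully when the
        # distributed stack is not installed.
        print("nvmath-python / mpi4py not available; skipping")
    else:
        comm = MPI.COMM_WORLD  # all processes must call initialize (collective)
        device_id = pick_device_id(comm.Get_rank(), gpus_per_node=8)
        nvmath.distributed.initialize(device_id, comm, ["nvshmem", "nccl"])
        # ... perform distributed computations here ...
        # To reinitialize with different backends, finalize first:
        nvmath.distributed.finalize()
```

Since `process_group` is only used for setup, any mpi4py communicator covering the participating processes works here; compute-time communication goes through the selected backends.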