Distributed runtime#
Initializing the distributed runtime#
To use the distributed APIs, you must first initialize the distributed runtime. Each process does so by providing a local CUDA device ID (referring to a GPU on the host on which that process runs), an MPI communicator, and the desired communication backends:
import nvmath.distributed
from mpi4py import MPI

comm = MPI.COMM_WORLD  # can use any MPI communicator

# Each process selects a GPU on its own host, e.g. by its rank within
# the host (obtained here via an MPI shared-memory split).
device_id = comm.Split_type(MPI.COMM_TYPE_SHARED).Get_rank()

nvmath.distributed.initialize(device_id, comm, backends=["nvshmem", "nccl"])
Note
nvmath-python uses MPI for bootstrapping, though other bootstrapping modes may become available in the future.
Under the hood, the distributed math libraries use additional communication backends, such as NVSHMEM and NCCL.
You are free to use MPI in other parts of your application.
After initializing the distributed runtime, you may use the distributed APIs. Certain APIs, such as FFT and Reshape, require GPU operands to be allocated on the symmetric memory heap. Refer to Distributed API Utilities for examples and details on how to manage GPU operands on symmetric memory.
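For instance, a GPU operand for these APIs can be created and released with the symmetric-memory helpers covered in Distributed API Utilities. The following is a minimal sketch assuming the allocate_symmetric_memory and free_symmetric_memory helpers with a CuPy operand; both calls are collective, so every process must participate with the same shape and dtype:

import cupy as cp
import nvmath.distributed

# Allocate a CuPy array on the symmetric memory heap (collective call).
a = nvmath.distributed.allocate_symmetric_memory((128, 128), cp, dtype=cp.float32)
a[:] = cp.random.rand(128, 128)  # fill with this process's local data

# ... pass `a` to a distributed API such as FFT ...

# Free symmetric memory (also a collective call).
nvmath.distributed.free_symmetric_memory(a)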
API Reference#
initialize | Initialize nvmath.distributed runtime.
finalize | Finalize nvmath.distributed runtime (this is called automatically at exit if the runtime is initialized).
get_context | Return the distributed runtime's context or None if not initialized.
DistributedContext | Context of initialized nvmath.distributed runtime.
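As an illustration, the runtime context returned by get_context can be inspected after initialization. This is a minimal sketch; the device_id and communicator attributes on the context are assumed here to mirror the arguments passed to initialize:

import nvmath.distributed

ctx = nvmath.distributed.get_context()
if ctx is not None:
    # device_id and communicator are assumed to mirror the values
    # passed to initialize().
    rank = ctx.communicator.Get_rank()
    print(f"rank {rank} is using GPU {ctx.device_id}")

# Optional: finalize explicitly (otherwise done automatically at exit).
nvmath.distributed.finalize()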