Device Communicator Setup
Host-side methods and resources for creating an NCCL device communicator.
The device-side communication primitives themselves are available only
from CUDA kernels and are documented under the C device API
(Device API); this page covers what the Python (host) side
exposes for bootstrapping them. The configuration object passed to
Communicator.create_dev_comm() is documented in
Configuration.
create_dev_comm
- Communicator.create_dev_comm(requirements: NCCLDevCommRequirements | None = None) → DevCommResource
Creates a device communicator for device-side NCCL operations.
Device communicators enable direct GPU kernel access to NCCL communication primitives. Multiple device communicators can be created from one host communicator. The returned DevCommResource is tracked by the communicator and may be released explicitly via its close() method, or automatically when the communicator is destroyed or aborted. Access the device communicator pointer via DevCommResource.ptr or resource.dev_comm.ptr.
- Parameters:
requirements – Configuration for device communicator resource allocation. If None, a default NCCLDevCommRequirements is used. Defaults to None.
- Returns:
DevCommResource for the device communicator.
- Raises:
NcclInvalid – If the communicator is not initialized.
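The ownership rules described above (several resources per communicator, explicit close(), automatic release on destroy) can be illustrated with a toy model. This is only a sketch: ToyCommunicator and ToyDevCommResource are hypothetical stand-ins written for this example, not the real nccl.core classes, and the fake pointer values and the idempotent close() are assumptions of the model rather than documented behavior.

```python
class ToyDevCommResource:
    """Hypothetical stand-in for DevCommResource (not the real class)."""

    def __init__(self, owner, ptr):
        self._owner = owner
        self.ptr = ptr          # fake device pointer, for illustration only
        self.closed = False

    def close(self):
        # Assumption of this toy model: closing twice is harmless.
        if self.closed:
            return
        self.closed = True
        self.ptr = 0
        self._owner._tracked.discard(self)


class ToyCommunicator:
    """Hypothetical stand-in for Communicator that tracks its resources."""

    def __init__(self):
        self._tracked = set()
        self._next_ptr = 0x1000

    def create_dev_comm(self, requirements=None):
        # The real method allocates device-side state; here we only
        # hand out a tracked object with a made-up pointer value.
        res = ToyDevCommResource(self, self._next_ptr)
        self._next_ptr += 0x100
        self._tracked.add(res)
        return res

    def destroy(self):
        # Release every resource the caller did not close explicitly.
        for res in list(self._tracked):
            res.close()


comm = ToyCommunicator()
a = comm.create_dev_comm()
b = comm.create_dev_comm()   # several device communicators, one host communicator
a.close()                    # explicit release
comm.destroy()               # b is released automatically
```

In this model the communicator keeps a set of live resources, so either release path leaves nothing dangling; whether the real implementation uses the same bookkeeping is not stated on this page.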
See also
GIN type enums
GPU-Initiated Networking (GIN) enums describing which device-side network transport is available on a communicator and which connection topology the user requires.
NcclGinType
- class nccl.core.NcclGinType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: IntEnum
GIN transport type, mirroring ncclGinType_t.
Reported by Communicator.gin_type and Communicator.railed_gin_type to indicate which device-side network transport, if any, is available on the communicator.
- NONE = 0
GIN not available on this communicator.
- PROXY = 2
Proxy-based GIN. Network operations issued from a device kernel are relayed through a CPU proxy thread.
- GDAKI = 3
GPUDirect Async Kernel-Initiated (GDA-KI). The kernel directly issues network operations to the NIC, bypassing the CPU proxy.
- GPI = 4
GPU-Push Interface. GPU threads push network descriptors directly to a NIC-visible MMIO queue, with no CPU involvement and no memory barriers.
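As a rough illustration of how host code might branch on the reported transport, the sketch below mirrors the values listed above in a local IntEnum. GinType and describe_gin are hypothetical names defined only for this example; real code would use nccl.core.NcclGinType and the value read from Communicator.gin_type.

```python
from enum import IntEnum


class GinType(IntEnum):
    """Local mirror of the values listed above (not imported from nccl.core)."""
    NONE = 0
    PROXY = 2
    GDAKI = 3
    GPI = 4


def describe_gin(value: int) -> str:
    """Summarize a raw GIN type value; hypothetical helper for illustration."""
    gin = GinType(value)  # raises ValueError for values not listed, e.g. 1
    if gin is GinType.NONE:
        return "GIN unavailable"
    if gin is GinType.PROXY:
        return "GIN available, relayed through a CPU proxy thread"
    # The remaining members are issued from the GPU without a CPU proxy.
    return f"GIN available, kernel-initiated ({gin.name})"
```

Note that the listed values are not contiguous: 1 is not a member, so converting an unexpected integer fails loudly instead of silently mapping to a neighbor.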
NcclGinConnectionType
- class nccl.core.NcclGinConnectionType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: IntEnum
GIN connection topology, mirroring ncclGinConnectionType_t.
Set on the gin_connection_type field of NCCLDevCommRequirements before calling Communicator.create_dev_comm() to declare which peers must be reachable via GIN from device code.
- NONE = 0
No GIN connection requested.
- FULL = 1
Fully connected. Every rank in the communicator must be reachable from every other rank via GIN.
- RAIL = 2
Rail-restricted. Ranks must be reachable via GIN only within the same rail (network plane).
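The difference between FULL and RAIL can be made concrete by listing which peers each setting requires for a given rank. The sketch below is illustrative only: GinConnectionType is a local mirror of the values above, required_peers is a hypothetical helper, and the rule that a rank's rail equals rank % gpus_per_node is an assumption about cluster layout, not something NCCL guarantees.

```python
from enum import IntEnum


class GinConnectionType(IntEnum):
    """Local mirror of the values listed above (not imported from nccl.core)."""
    NONE = 0
    FULL = 1
    RAIL = 2


def required_peers(rank, nranks, conn, gpus_per_node):
    """Return the peer ranks that must be reachable via GIN from `rank`.

    Assumption: the rail of a rank is its local GPU index,
    i.e. rank % gpus_per_node.
    """
    if conn is GinConnectionType.NONE:
        return []
    peers = [r for r in range(nranks) if r != rank]
    if conn is GinConnectionType.RAIL:
        rail = rank % gpus_per_node
        peers = [r for r in peers if r % gpus_per_node == rail]
    return peers


# Two nodes with four GPUs each, viewed from rank 1.
full = required_peers(1, 8, GinConnectionType.FULL, gpus_per_node=4)
rail = required_peers(1, 8, GinConnectionType.RAIL, gpus_per_node=4)
```

Under this layout assumption, full contains the seven other ranks while rail contains only rank 5, the one peer sharing rank 1's rail.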