Device API – Host-Side Setup
Host-Side Setup
Host functions and types. The following are for use in host code: creating and destroying device communicators,
querying properties, and the requirement and property types. The ncclDevComm structure is then passed to
device code.
ncclDevComm
-
type ncclDevComm
A structure describing a device communicator, as created on the host side using ncclDevCommCreate(). The structure is used primarily on the device side. In general, fields in this struct are considered internal and should not be accessed by users. An exception is made for the following fields, which are guaranteed to be stable across NCCL versions:
int rank
The rank within the communicator.
-
int nRanks
The size of the communicator.
-
int lsaRank
ncclDevCommCreate
-
ncclResult_t ncclDevCommCreate(ncclComm_t comm, struct ncclDevCommRequirements const *reqs, struct ncclDevComm *outDevComm)
Creates a new device communicator (see ncclDevComm) corresponding to the supplied host-side communicator comm. The result is returned in the outDevComm buffer, which the caller must supply. The caller must also provide a filled-in list of requirements via the reqs argument (see ncclDevCommRequirements); the function allocates any resources needed to meet them. It is recommended to call ncclCommQueryProperties() before calling this function, because it fails if the specified requirements are not supported. Since this is a collective call, every rank in the communicator needs to participate. If called within a group, outDevComm may not be filled in until ncclGroupEnd() has completed.
Note that this is a host-side function.
ncclDevCommDestroy
-
ncclResult_t ncclDevCommDestroy(ncclComm_t comm, struct ncclDevComm const *devComm)
Destroys a device communicator (see ncclDevComm) previously created using ncclDevCommCreate() and releases any allocated resources. The caller must ensure that no device kernel that uses this device communicator could be running at the time this function is invoked.
Note that this is a host-side function.
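The call order described above (query, create, use, destroy) can be sketched as follows. This is a minimal illustration, not NCCL's own sample code: the `stub` namespace is an assumption that mocks just enough of the declarations named in this section for the sketch to compile without NCCL, and the helper `runWithDevComm` and the value `lsaBarrierCount = 1` are hypothetical. Real code would include the NCCL headers instead of the stubs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// STAND-INS (assumptions): a minimal mock of the NCCL declarations named in
// the text, so the sketch compiles without NCCL installed. Real code would
// include the NCCL headers and drop this namespace entirely.
namespace stub {
enum ncclResult_t { ncclSuccess = 0, ncclInvalidUsage = 5 };
struct ncclComm {};
using ncclComm_t = ncclComm*;
struct ncclCommProperties_t { bool deviceApiSupport; };
struct ncclDevCommRequirements { int lsaBarrierCount; };
struct ncclDevComm { int rank; int nRanks; };
#define NCCL_COMM_PROPERTIES_INITIALIZER {false}
#define NCCL_DEV_COMM_REQUIREMENTS_INITIALIZER {0}

std::vector<std::string> callLog;  // records the call order for the demo

ncclResult_t ncclCommQueryProperties(ncclComm_t, ncclCommProperties_t* props) {
  callLog.push_back("query");
  props->deviceApiSupport = true;  // the mock always reports support
  return ncclSuccess;
}
ncclResult_t ncclDevCommCreate(ncclComm_t, const ncclDevCommRequirements*,
                               ncclDevComm* outDevComm) {
  callLog.push_back("create");
  outDevComm->rank = 0;
  outDevComm->nRanks = 1;
  return ncclSuccess;
}
ncclResult_t ncclDevCommDestroy(ncclComm_t, const ncclDevComm*) {
  callLog.push_back("destroy");
  return ncclSuccess;
}
}  // namespace stub

using namespace stub;

// Hypothetical helper: query first, then create, use, and destroy.
ncclResult_t runWithDevComm(ncclComm_t comm) {
  ncclCommProperties_t props = NCCL_COMM_PROPERTIES_INITIALIZER;
  ncclResult_t res = ncclCommQueryProperties(comm, &props);
  if (res != ncclSuccess) return res;
  // The error code returned here is this sketch's own choice.
  if (!props.deviceApiSupport) return ncclInvalidUsage;

  ncclDevCommRequirements reqs = NCCL_DEV_COMM_REQUIREMENTS_INITIALIZER;
  reqs.lsaBarrierCount = 1;  // example value only

  ncclDevComm devComm;
  res = ncclDevCommCreate(comm, &reqs, &devComm);  // collective: every rank calls it
  if (res != ncclSuccess) return res;

  // ... launch kernels that use devComm, then synchronize so that none of
  // them can still be running ...

  return ncclDevCommDestroy(comm, &devComm);
}
```

Calling `runWithDevComm(&comm)` on a mock communicator leaves `query`, `create`, `destroy` in `callLog`, in that order. Inside a group, the use of `devComm` would have to wait until after ncclGroupEnd().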
ncclDevCommRequirements
-
type ncclDevCommRequirements
A host-side structure specifying the list of requirements when creating device communicators (see ncclDevComm). Since NCCL 2.29, this struct must be initialized using NCCL_DEV_COMM_REQUIREMENTS_INITIALIZER.
int lsaBarrierCount
Specifies the number of memory barriers to allocate (see ncclLsaBarrierSession). These barriers are needed to write fused kernels and may be required by building blocks such as those in Device API – Remote Reduce and Copy: Building Blocks for Custom Communication Kernels.
-
int railGinBarrierCount
Specifies the number of network barriers to allocate (see ncclGinBarrierSession; available since NCCL 2.28.7).
-
int barrierCount
Specifies the minimum number for both the memory and network barriers (see above; available since NCCL 2.28.7).
-
int ginSignalCount
Specifies the number of network signals to allocate (see ncclGinSignal_t; available since NCCL 2.28.7).
-
int ginCounterCount
Specifies the number of network counters to allocate (see ncclGinCounter_t; available since NCCL 2.28.7).
-
bool ginForceEnable
Deprecated. Forces GIN (GPU-Initiated Networking) support to be enabled by automatically setting ginConnectionType to NCCL_GIN_CONNECTION_FULL. When set to true, it overrides the ginConnectionType field. New code should set ginConnectionType explicitly to the desired value instead of using this field. Available since NCCL 2.28.7, deprecated since NCCL 2.29.4.
-
ncclGinConnectionType_t ginConnectionType
Specifies the type of GIN (GPU-Initiated Networking) connection to establish for the device communicator. This field controls whether GIN is enabled and how it is configured. When set to NCCL_GIN_CONNECTION_FULL, GIN is initialized and all ranks connect to all other ranks in the communicator. When set to NCCL_GIN_CONNECTION_RAIL, GIN is initialized and each rank connects to other ranks in the same rail team. If GIN resources are requested via ginSignalCount, ginCounterCount, barrierCount, or railGinBarrierCount while this field is set to NCCL_GIN_CONNECTION_NONE, device communicator creation will fail with ncclInvalidArgument. Available since NCCL 2.29.4.
See ncclGinConnectionType_t for possible values.
-
ncclDevResourceRequirements_t *resourceRequirementsList
Specifies a list of resource requirements. This is best set to NULL for now.
-
ncclTeamRequirements_t *teamRequirementsList
Specifies a list of requirements for particular teams. This is best set to NULL for now.
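The interaction between the GIN resource counts, ginConnectionType, and the deprecated ginForceEnable can be checked before creation with a small helper. The sketch below is only an illustration under stated assumptions: `ReqsSketch` and `GinConnectionSketch` are hypothetical local mirrors of the documented fields (not the NCCL types), "requested" is read as "count greater than zero", and the ginForceEnable override is assumed to apply before the consistency rule is evaluated.

```cpp
#include <cassert>

// STAND-INS (assumptions): local mirrors of the documented enum and of the
// GIN-related fields only. The real definitions come from the NCCL headers.
enum GinConnectionSketch { GIN_NONE, GIN_FULL, GIN_RAIL };

struct ReqsSketch {
  int barrierCount = 0;
  int railGinBarrierCount = 0;
  int ginSignalCount = 0;
  int ginCounterCount = 0;
  bool ginForceEnable = false;  // deprecated field
  GinConnectionSketch ginConnectionType = GIN_NONE;
};

// Connection type after applying the deprecated override: when
// ginForceEnable is true it wins and selects full connectivity.
GinConnectionSketch effectiveConnection(const ReqsSketch& r) {
  return r.ginForceEnable ? GIN_FULL : r.ginConnectionType;
}

// Mirrors the documented rule: requesting GIN resources while the connection
// type is "none" is the combination that makes creation fail.
// Assumption: a resource counts as requested when its count is > 0.
bool ginRequestIsConsistent(const ReqsSketch& r) {
  const bool wantsGinResources = r.ginSignalCount > 0 ||
                                 r.ginCounterCount > 0 ||
                                 r.barrierCount > 0 ||
                                 r.railGinBarrierCount > 0;
  return !(wantsGinResources && effectiveConnection(r) == GIN_NONE);
}
```

A request with `ginSignalCount = 4` and the default connection type is reported as inconsistent; setting `ginConnectionType` to the rail or full value makes it consistent. A check like this only catches the one rule spelled out above; whether the hardware supports the chosen connection type still has to be determined with ncclCommQueryProperties().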
ncclCommQueryProperties
-
ncclResult_t ncclCommQueryProperties(ncclComm_t comm, ncclCommProperties_t *props)
Exposes communicator properties by filling in props. Before calling this function, props must be initialized using NCCL_COMM_PROPERTIES_INITIALIZER. Introduced in NCCL 2.29.
Note that this is a host-side function.
ncclCommProperties_t
-
type ncclCommProperties_t
A structure describing the properties of the communicator. Introduced in NCCL 2.29. Properties include:
-
int rank
Rank within the communicator.
-
int nRanks
Size of the communicator.
-
int cudaDev
CUDA device index.
-
int nvmlDev
NVML device index.
-
bool deviceApiSupport
Whether the device API is supported. If false, a ncclDevComm cannot be created.
-
bool multimemSupport
Whether ranks in the same LSA team can communicate using multimem. If false, a ncclDevComm cannot be created with multimem resources.
-
ncclGinType_t ginType
The GIN type supported by the communicator. If equal to NCCL_GIN_TYPE_NONE, a ncclDevComm cannot be created with GIN connection type NCCL_GIN_CONNECTION_FULL.
-
ncclGinType_t railedGinType
The railed GIN type supported by the communicator. If equal to NCCL_GIN_TYPE_NONE, a ncclDevComm cannot be created with GIN connection type NCCL_GIN_CONNECTION_RAIL. Available since NCCL 2.29.4.
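The properties above determine which GIN connection type can be requested when creating a device communicator. The following sketch shows one possible selection policy. It is an assumption-laden illustration, not NCCL behavior: `PropsSketch`, `GinTypeSketch`, `ConnectionSketch`, and `pickConnection` are hypothetical names, only "none versus anything else" is modeled for the GIN type, and preferring full over railed connectivity is this sketch's own choice.

```cpp
#include <cassert>

// STAND-INS (assumptions): local mirrors of the documented properties. Only
// the distinction "none vs. some supported type" matters for this sketch.
enum GinTypeSketch { GIN_TYPE_NONE_SKETCH, GIN_TYPE_SOME_SKETCH };
enum ConnectionSketch { CONNECTION_NONE, CONNECTION_FULL, CONNECTION_RAIL };

struct PropsSketch {
  bool deviceApiSupport = false;
  GinTypeSketch ginType = GIN_TYPE_NONE_SKETCH;
  GinTypeSketch railedGinType = GIN_TYPE_NONE_SKETCH;
};

// Returns false when no device communicator can be created at all.
// Otherwise stores the connection type to request in *out, preferring full
// connectivity, then railed, then none (a policy chosen for illustration).
bool pickConnection(const PropsSketch& p, ConnectionSketch* out) {
  if (!p.deviceApiSupport) return false;
  if (p.ginType != GIN_TYPE_NONE_SKETCH) {
    *out = CONNECTION_FULL;
  } else if (p.railedGinType != GIN_TYPE_NONE_SKETCH) {
    *out = CONNECTION_RAIL;
  } else {
    *out = CONNECTION_NONE;
  }
  return true;
}
```

An application that only exchanges data within rail teams might reverse the preference and ask for railed connectivity first.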
ncclGinType_t
ncclGinConnectionType_t
-
type ncclGinConnectionType_t
Specifies the type of GIN connection for device communicators. This enum controls whether GIN (GPU-Initiated Networking) resources should be allocated and what connection type to use. Used in ncclDevCommRequirements when creating device communicators. Available since NCCL 2.29.4.
NCCL_GIN_CONNECTION_NONE
No GIN connectivity.
-
NCCL_GIN_CONNECTION_FULL
Full GIN connectivity. Each rank is connected to all other ranks.
-
NCCL_GIN_CONNECTION_RAIL
Railed GIN connectivity. Each rank is connected to other ranks in the same rail team.
Host-Accessible Device Pointer Functions
Host functions. The following are callable from host code only. They provide host-side access to device pointer functionality, enabling host code to obtain pointers to LSA memory regions.
All functions return ncclResult_t error codes. On success, ncclSuccess is returned.
On failure, appropriate error codes are returned (e.g., ncclInvalidArgument for invalid parameters,
ncclInternalError for internal failures), unless otherwise specified.
The returned pointers are valid for the lifetime of the window. Pointers should not be used after either the window or communicator is destroyed. Obtained pointers are device pointers.
ncclGetLsaMultimemDevicePointer
-
ncclResult_t ncclGetLsaMultimemDevicePointer(ncclWindow_t window, size_t offset, void **outPtr)
Returns a multimem base pointer for the LSA team associated with the given window. This function provides host-side access to the multimem memory functionality.
window is the NCCL window object (must not be NULL). offset is the byte offset within the window. outPtr is the output parameter for the multimem pointer (must not be NULL).
This function requires LSA multimem support (multicast capability on the system). The window must be registered with a communicator that supports symmetric memory, and the hardware must support NVLink SHARP multicast functionality.
Note
If the system does not support multimem, the function returns ncclSuccess with *outPtr set to nullptr. This allows applications to gracefully detect and handle the absence of multimem support without breaking the communicator. Users should check whether the returned pointer is nullptr to determine availability.
Example:
    void* multimemPtr;
    ncclResult_t result = ncclGetLsaMultimemDevicePointer(window, 0, &multimemPtr);
    if (result == ncclSuccess) {
        if (multimemPtr != nullptr) {
            // Use multimemPtr for multimem operations
        } else {
            // Multimem not supported, use fallback approach
        }
    }
ncclGetMultimemDevicePointer
-
ncclResult_t ncclGetMultimemDevicePointer(ncclWindow_t window, size_t offset, ncclMultimemHandle multimem, void **outPtr)
Returns a multimem base pointer using a provided multimem handle instead of the window’s internal multimem. This function enables using external or custom multimem handles for pointer calculation.
window is the NCCL window object (must not be NULL). offset is the byte offset within the window. multimem is the multimem handle containing the multimem base pointer (multimem.mcBasePtr must not be NULL). outPtr is the output parameter for the multimem pointer (must not be NULL).
This function requires LSA multimem support (multicast capability on the system).
Note
If the system does not support multimem, the function returns ncclSuccess with *outPtr set to nullptr. The function validates that multimem.mcBasePtr is not nullptr before proceeding.
Example:
    // Get multimem handle from device communicator setup
    ncclMultimemHandle customHandle;
    // ... (obtain handle)
    void* multimemPtr;
    ncclResult_t result = ncclGetMultimemDevicePointer(window, 0, customHandle, &multimemPtr);
    if (result == ncclSuccess) {
        if (multimemPtr != nullptr) {
            // Use multimemPtr for multimem operations with custom handle
        } else {
            // Multimem not supported, use fallback approach
        }
    }
ncclGetLsaDevicePointer
-
ncclResult_t ncclGetLsaDevicePointer(ncclWindow_t window, size_t offset, int lsaRank, void **outPtr)
Returns a load/store accessible pointer to the memory buffer of a specific LSA peer within the window. This function provides host-side access to LSA pointer functionality using LSA rank directly.
window is the NCCL window object (must not be NULL). offset is the byte offset within the window (must be >= 0 and < window size). lsaRank is the LSA rank of the target peer (must be >= 0 and < LSA team size). outPtr is the output parameter for the LSA pointer (must not be NULL).
On success, ncclSuccess is returned and the LSA pointer is returned in outPtr.
The window must be registered with a communicator that supports LSA. The LSA rank must be within the valid range for the LSA team, and the target peer must be load/store accessible (P2P connectivity required).
Example:
    void* lsaPtr;
    ncclResult_t result = ncclGetLsaDevicePointer(window, 0, 1, &lsaPtr);
    if (result == ncclSuccess) {
        // Use lsaPtr to access LSA peer 1's memory
    }
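Because offset is a byte offset that must stay below the window size, code that addresses typed elements has to convert an element index into bytes and check the bound before calling the function. A minimal sketch of such a conversion follows. `elementOffset` is a hypothetical helper, not part of NCCL, and the requirement that the whole element fits inside the window is stricter than the documented condition.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: converts an element index into a byte offset suitable
// for the offset parameter, given the window size in bytes.
// Returns false (leaving *outOffset untouched) if the offset is out of range.
template <typename T>
bool elementOffset(std::size_t index, std::size_t windowBytes,
                   std::size_t* outOffset) {
  const std::size_t maxIndex =
      std::numeric_limits<std::size_t>::max() / sizeof(T);
  if (index > maxIndex) return false;  // index * sizeof(T) would overflow

  const std::size_t offset = index * sizeof(T);
  if (offset >= windowBytes) return false;  // documented: offset < window size

  // Stricter than documented (assumption): the whole element must fit.
  if (windowBytes - offset < sizeof(T)) return false;

  *outOffset = offset;
  return true;
}
```

For a 16-byte window of `float` values, index 3 maps to byte offset 12, while index 4 is rejected. Since size_t is unsigned, the ">= 0" half of the documented range holds automatically.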
ncclGetPeerDevicePointer
-
ncclResult_t ncclGetPeerDevicePointer(ncclWindow_t window, size_t offset, int peer, void **outPtr)
Returns a load/store accessible pointer to the memory buffer of a specific world rank peer within the window. This function converts world rank to LSA rank internally and provides host-side access to peer pointer functionality.
window is the NCCL window object (must not be NULL). offset is the byte offset within the window. peer is the world rank of the target peer (must be >= 0 and < communicator size). outPtr is the output parameter for the peer pointer (must not be NULL).
On success, ncclSuccess is returned and the peer pointer is returned in outPtr.
If the peer is not reachable via LSA (not in LSA team), outPtr is set to NULL and ncclSuccess is returned. This matches the behavior of the device-side ncclGetPeerPointer function.
The window must be registered with a communicator that supports LSA. The peer rank must be within the valid range for the communicator, and the target peer must be load/store accessible (P2P connectivity required).
Example:
    void* peerPtr;
    ncclResult_t result = ncclGetPeerDevicePointer(window, 0, 2, &peerPtr);
    if (result == ncclSuccess) {
        if (peerPtr != NULL) {
            // Use peerPtr to access world rank 2's memory
        } else {
            // Peer 2 is not reachable via LSA
        }
    }
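The multimem functions and ncclGetPeerDevicePointer share a convention: a successful return with a null pointer means "not available" rather than "error". The three possible outcomes can be separated with a small helper, sketched below. `PointerStatus` and `classifyPointer` are hypothetical names introduced for illustration. The helper takes a boolean so that it makes no assumption about the numeric values of ncclResult_t.

```cpp
#include <cassert>

// Hypothetical three-way outcome of a host-side pointer query.
enum class PointerStatus {
  Ready,        // call succeeded and returned a usable device pointer
  Unavailable,  // call succeeded but the feature or peer is not reachable
  Failed        // call returned an error code
};

// succeeded: the caller passes (result == ncclSuccess).
// ptr:       the value the function stored through outPtr.
PointerStatus classifyPointer(bool succeeded, const void* ptr) {
  if (!succeeded) return PointerStatus::Failed;
  return ptr != nullptr ? PointerStatus::Ready : PointerStatus::Unavailable;
}
```

A caller would write `classifyPointer(result == ncclSuccess, peerPtr)` and fall back to another transport on `Unavailable`, while treating `Failed` as a real error. The null-pointer case is described above only for the multimem and peer functions, so for ncclGetLsaDevicePointer the `Unavailable` branch is not expected to occur.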