Kernel Launch Routines

NVSHMEMX_COLLECTIVE_LAUNCH

int nvshmemx_collective_launch(const void *func, dim3 gridDims, dim3 blockDims, void **args, size_t sharedMem, cudaStream_t stream)
func [IN]
A pointer to the function to launch on the device.
gridDims [IN]
The grid dimensions.
blockDims [IN]
The block dimensions.
args [IN]
An array of pointers to the arguments to be passed to the device function.
sharedMem [IN]
The amount of dynamic shared memory per thread block, in bytes.
stream [IN]
The stream on which the kernel should be launched.

Description

The nvshmemx_collective_launch function launches a CUDA kernel that uses NVSHMEM synchronization or collective APIs. The call is collective across the PEs in the NVSHMEM job and takes the same parameters as a CUDA kernel launch API. It uses a single-device CUDA cooperative launch and therefore provides all the guarantees of that launch mode. A CUDA kernel that calls a synchronizing NVSHMEM API (such as nvshmem_wait, nvshmem_barrier, nvshmem_barrier_all, or any other collective operation) must be launched with this function. A kernel that uses no synchronizing NVSHMEM APIs, or no NVSHMEM APIs at all, does not need to be launched with it.

Either specify gridDims explicitly or set it to 0. When gridDims is set to 0, the NVSHMEM runtime picks the largest grid size that can be used for the given kernel with CUDA cooperative launch on the current GPU.

Returns

Returns 0 on success or an error code on failure.
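
Example

The following is a minimal sketch of a collective launch, not a definitive implementation. The kernel ring_reduce and the host helper expected_ring_sum are hypothetical names chosen for illustration. The sketch assumes one GPU per PE on each node, launches a single thread, and uses the default stream. Each PE passes a partial sum around a ring and calls nvshmem_barrier_all inside the kernel, which is why the kernel is started with nvshmemx_collective_launch rather than a plain kernel launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Hypothetical host helper: sum of the PE ids 0..npes-1, used to check the result.
static int expected_ring_sum(int npes) { return npes * (npes - 1) / 2; }

// Hypothetical kernel: ring reduction over PE ids, run by a single thread.
// It calls nvshmem_barrier_all, so it needs a collective launch.
__global__ void ring_reduce(int *target, int mype, int npes) {
    int peer = (mype + 1) % npes;
    int partial = mype;
    for (int step = 1; step < npes; ++step) {
        nvshmem_int_p(target, partial, peer);  // send the partial sum to the next PE
        nvshmem_barrier_all();                 // every PE has received its value
        partial = *target + mype;
        nvshmem_barrier_all();                 // every PE has read target
    }
    *target = partial;                         // publish the local result
}

int main() {
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    // Assumption: one GPU per PE on each node.
    cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));

    int *target = (int *)nvshmem_malloc(sizeof(int));  // symmetric allocation

    // args holds one pointer per kernel parameter, in declaration order.
    void *args[] = {&target, &mype, &npes};
    dim3 grid(1), block(1);

    // Collective call: every PE in the job must make it.
    int status = nvshmemx_collective_launch((const void *)ring_reduce, grid, block,
                                            args, 0, 0);
    int result = -1;
    if (status != 0) {
        fprintf(stderr, "PE %d: collective launch failed (%d)\n", mype, status);
    } else {
        cudaDeviceSynchronize();
        cudaMemcpy(&result, target, sizeof(int), cudaMemcpyDeviceToHost);
        printf("PE %d: sum %d, expected %d\n", mype, result, expected_ring_sum(npes));
    }

    nvshmem_free(target);
    nvshmem_finalize();
    return (status == 0 && result == expected_ring_sum(npes)) ? 0 : 1;
}
```

Build flags and the job launcher depend on the NVSHMEM installation, so they are not shown here.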

NVSHMEMX_COLLECTIVE_LAUNCH_QUERY_GRIDSIZE

int nvshmemx_collective_launch_query_gridsize(const void *func, dim3 blockDims, void **args, size_t sharedMem, int *gridsize)
func [IN]
A pointer to the function to launch on the device.
blockDims [IN]
The block dimensions.
args [IN]
An array of pointers to the arguments to be passed to the device function.
sharedMem [IN]
The amount of dynamic shared memory per thread block, in bytes.
gridsize [OUT]
On return, holds the largest number of blocks (grid size) that can be used for a collective launch of the kernel.

Description

The nvshmemx_collective_launch_query_gridsize call queries the largest grid size that can be used for the given kernel with CUDA cooperative launch on the current GPU. The value written to gridsize can be used as the grid size passed to nvshmemx_collective_launch.

Returns

Returns 0 on success or an error code on failure.
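
Example

The following sketch shows one way to combine the query with a launch. It is an illustration under stated assumptions, not the library's prescribed pattern. The kernel shift_ids, the helper choose_blocks, the buffer size, and the block size of 256 threads are all hypothetical. The host asks for the largest usable grid, clamps the grid the problem would otherwise request to that limit, and passes the result to nvshmemx_collective_launch. The kernel uses a grid-stride loop, so it stays correct when the grid is smaller than the problem size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Hypothetical helper: clamp the requested block count to the queried maximum.
static int choose_blocks(int wanted, int max_blocks) {
    return wanted < max_blocks ? wanted : max_blocks;
}

// Hypothetical kernel: a grid-stride loop writes this PE's id into the next
// PE's buffer, then one thread joins a barrier across all PEs.
__global__ void shift_ids(int *dst, int n, int mype, int npes) {
    int peer = (mype + 1) % npes;
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        nvshmem_int_p(dst + i, mype, peer);
    if (blockIdx.x == 0 && threadIdx.x == 0) nvshmem_barrier_all();
}

int main() {
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    // Assumption: one GPU per PE on each node.
    cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));

    int n = 1 << 20;        // illustrative element count
    int threads = 256;      // illustrative block size
    int *dst = (int *)nvshmem_malloc(n * sizeof(int));
    void *args[] = {&dst, &n, &mype, &npes};
    dim3 block(threads);

    // Ask for the largest grid usable with this kernel and block size.
    int max_blocks = 0;
    int status = nvshmemx_collective_launch_query_gridsize(
        (const void *)shift_ids, block, args, 0, &max_blocks);

    if (status == 0) {
        int wanted = (n + threads - 1) / threads;
        dim3 grid(choose_blocks(wanted, max_blocks));
        status = nvshmemx_collective_launch((const void *)shift_ids, grid, block,
                                            args, 0, 0);
    }

    if (status == 0) {
        cudaDeviceSynchronize();
        nvshmem_barrier_all();  // host-side barrier: all puts are complete and visible
        int first = -1;
        cudaMemcpy(&first, dst, sizeof(int), cudaMemcpyDeviceToHost);
        printf("PE %d: max_blocks %d, dst[0] = %d\n", mype, max_blocks, first);
    } else {
        fprintf(stderr, "PE %d: query or launch failed (%d)\n", mype, status);
    }

    nvshmem_free(dst);
    nvshmem_finalize();
    return status == 0 ? 0 : 1;
}
```

Clamping on the host keeps the launch within the limit that cooperative launch imposes, while the grid-stride loop covers any elements that the smaller grid would otherwise miss.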