NVSHMEM Unsupported Operations

This section describes operations and toolchains that NVSHMEM does not support, along with considerations for developing applications that use the NVSHMEM runtime.

Toolchaining

NVSHMEM supports compiling applications only with GNU-based toolchains (e.g., nvcc with a GNU host compiler) or with Clang compilers. NVSHMEM does not natively support other compilers, such as the NVIDIA HPC C++ compiler.

Operations

The following operations are not natively supported. Each item includes a workaround:

Host-side memory allocation: NVSHMEM does not support host memory as a target for its device-initiated communication operations, and there is currently no mechanism to allocate symmetric memory from host memory. However, host memory can be registered with NVSHMEM and used as the local operand of an operation (src for put operations, dest for get operations). The available options are:
- If the host memory is the remote operand:
  1. Copy the data from host memory to the symmetric heap by using cudaMemcpyAsync or a related CUDA memory copy API.
  2. Perform the device-initiated operation on the data in the symmetric heap.
  3. Copy the data back from the symmetric heap to host memory by using cudaMemcpyAsync or a related CUDA memory copy API.
- If the host memory is the local operand:
  1. Register the host memory using nvshmemx_buffer_register.
  2. Pass the registered pointer to the desired NVSHMEM APIs.
  3. When the buffer is no longer needed by the application, deregister the buffer using nvshmemx_buffer_unregister before freeing it.
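
The two workarounds above can be sketched in one small program. This is a minimal illustration, not a reference implementation: the kernel name put_to_peer, the buffer names, the ring-style peer choice, and the buffer size are all hypothetical. It also assumes that a pageable buffer from malloc is acceptable to nvshmemx_buffer_register on the transport in use (pinned memory may be preferable), and that the program is built with nvcc using relocatable device code and linked against NVSHMEM. CUDA error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Device-initiated put: one thread copies n ints from this PE's symmetric
// buffer into the peer's symmetric buffer. (Hypothetical kernel name.)
__global__ void put_to_peer(int *sym_dst, const int *sym_src, size_t n, int peer) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        nvshmem_int_put(sym_dst, sym_src, n, peer);
    }
}

int main() {
    const size_t n = 256;                 // illustrative size
    const size_t bytes = n * sizeof(int);

    nvshmem_init();
    cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));
    const int mype = nvshmem_my_pe();
    const int peer = (mype + 1) % nvshmem_n_pes();

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Symmetric-heap buffers (collective allocation on every PE).
    int *sym_src = static_cast<int *>(nvshmem_malloc(bytes));
    int *sym_dst = static_cast<int *>(nvshmem_malloc(bytes));

    // Ordinary host buffers that hold the application's data.
    int *h_in = static_cast<int *>(malloc(bytes));
    int *h_out = static_cast<int *>(malloc(bytes));
    for (size_t i = 0; i < n; ++i) h_in[i] = mype;

    // Case 1: host memory would be the remote operand, so stage it.
    // Step 1: host memory -> symmetric heap.
    cudaMemcpyAsync(sym_src, h_in, bytes, cudaMemcpyHostToDevice, stream);
    // Step 2: device-initiated operation between symmetric buffers.
    put_to_peer<<<1, 1, 0, stream>>>(sym_dst, sym_src, n, peer);
    nvshmemx_barrier_all_on_stream(stream);  // puts are complete on all PEs
    // Step 3: symmetric heap -> host memory.
    cudaMemcpyAsync(h_out, sym_dst, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    printf("PE %d: staged put delivered data written by PE %d\n", mype, h_out[0]);

    // Make sure every PE has finished reading sym_dst before it is reused.
    nvshmem_barrier_all();

    // Case 2: host memory is the local operand (the source of a put).
    if (nvshmemx_buffer_register(h_in, bytes) != 0) {
        fprintf(stderr, "PE %d: buffer registration failed\n", mype);
        nvshmem_global_exit(1);
    }
    nvshmem_int_put(sym_dst, h_in, n, peer);  // host-initiated put
    nvshmem_barrier_all();                    // put is complete on all PEs
    nvshmemx_buffer_unregister(h_in);         // deregister before freeing

    free(h_in);
    free(h_out);
    nvshmem_free(sym_src);
    nvshmem_free(sym_dst);
    cudaStreamDestroy(stream);
    nvshmem_finalize();
    return 0;
}
```

In both cases the remote operand (sym_dst) lives in the symmetric heap; only the local side ever touches host memory.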
If a host-only implementation is sufficient, NVSHMEM can be initialized in the same process as a host-only SHMEM application. Host memory can then be allocated by SHMEM and used with the host-only SHMEM APIs.

Atomic floating-point min/max operations: These can be emulated using nvshmem_TYPENAME_OPERATION_reduce. Refer to the supported reduce ops table for more information.
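
As a rough sketch of the reduce-based emulation, the following program computes a global floating-point maximum with nvshmem_float_max_reduce over NVSHMEM_TEAM_WORLD. The per-PE value and the variable names are hypothetical. Note that a reduction is a collective call that every PE in the team must make, so it stands in for an atomic max only when all PEs can participate at the same point in the program. CUDA error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

int main() {
    nvshmem_init();
    cudaSetDevice(nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE));
    const int mype = nvshmem_my_pe();

    // Both operands of a reduction must be symmetric objects.
    float *source = static_cast<float *>(nvshmem_malloc(sizeof(float)));
    float *dest = static_cast<float *>(nvshmem_malloc(sizeof(float)));

    // Hypothetical per-PE contribution to the max.
    const float local = 1.5f * static_cast<float>(mype);
    cudaMemcpy(source, &local, sizeof(float), cudaMemcpyHostToDevice);

    // Collective max over all PEs; every PE receives the result in dest.
    nvshmem_float_max_reduce(NVSHMEM_TEAM_WORLD, dest, source, 1);

    float global_max = 0.0f;
    cudaMemcpy(&global_max, dest, sizeof(float), cudaMemcpyDeviceToHost);
    printf("PE %d: local = %f, global max = %f\n", mype, local, global_max);

    nvshmem_free(source);
    nvshmem_free(dest);
    nvshmem_finalize();
    return 0;
}
```

A minimum is obtained the same way by substituting nvshmem_float_min_reduce.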