NVSHMEM Release 1.1.3
These are the release notes for NVIDIA® NVSHMEM™ 1.1.3.
Key Features And Enhancements
- Implemented the nvshmem_<type>_put_signal API from OpenSHMEM 1.5.
- Added the nvshmemx_signal_op API.
- Optimized the implementation of a signal set operation over P2P connected GPUs.
- Optimized the performance of the nvshmem_fence() function.
- Optimized the latency of the NVSHMEM atomics API.
- Fixed a bug in the nvshmem_ptr API.
- Fixed a bug in the implementation of the host-side strided transfer (iput, iget, and so on) API.
- Fixed a bug in the on-stream reduction for the long long datatype.
- Fixed a hang during the nvshmem barrier collective operation.
- Fixed __device__ nvshmem_quiet() to also perform a quiet on InfiniBand (IB) operations that target the calling PE itself.
Known Issues
- NVSHMEM and libraries that use NVSHMEM can only be built as static libraries, not as shared libraries. This is because linking of CUDA device symbols does not work across shared libraries.
- NVSHMEM collective operations with active sets are not supported.
- Concurrent NVSHMEM memory allocation operations and collective operations are not supported.
- nvshmem_barrier*, nvshmem_quiet, and nvshmem_wait_until only ensure PE-PE ordering and visibility on systems with NVLink and InfiniBand. They do not ensure global ordering and visibility.