NVSHMEM Release 1.1.0
These are the release notes for NVIDIA® NVSHMEM™ 1.1.0.
Key Features And Enhancements
-
Implemented the nvshmem_<type>_put_signal API from OpenSHMEM 1.5.
-
Added the nvshmemx_signal_op API.
-
Optimized the implementation of the signal set operation over P2P-connected GPUs.
-
Optimized the performance of the nvshmem_fence() function.
-
Optimized the latency of the NVSHMEM atomics API.
-
Fixed a bug in the nvshmem_ptr API.
-
Fixed a bug in the implementation of the host-side strided transfer (iput, iget, and so on) API.
-
Fixed a bug in the on-stream reduction for the long long datatype.
-
Fixed a hang during the nvshmem barrier collective operation.
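For readers unfamiliar with the nvshmem_fence() function mentioned in the list above, the following is a minimal sketch of the usual payload-then-flag pattern it supports. It is illustrative only and not taken from the NVSHMEM sources. The kernel name `ordered_put`, the payload value, and the device-selection rule `mype % ndev` are assumptions made for this example, and the program expects to be started with at least two PEs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Hypothetical kernel: PE 0 sends a payload and then a flag to PE 1.
// nvshmem_fence() orders the two puts with respect to the same target PE.
__global__ void ordered_put(int *data, int *flag) {
    int mype = nvshmem_my_pe();
    if (mype == 0) {
        nvshmem_int_p(data, 42, 1);   // payload into PE 1's symmetric memory
        nvshmem_fence();              // payload is ordered before the flag
        nvshmem_int_p(flag, 1, 1);    // flag into PE 1's symmetric memory
    } else if (mype == 1) {
        // Block until the local copy of the flag becomes 1, then read the payload.
        nvshmem_int_wait_until(flag, NVSHMEM_CMP_EQ, 1);
        printf("PE 1 observed data = %d\n", *data);
    }
}

int main() {
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();

    // Assumption: PEs on a node are numbered consecutively, one GPU per PE.
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(mype % ndev);

    if (npes >= 2) {
        // Symmetric allocations: every PE gets a buffer at the same symmetric address.
        int *data = (int *)nvshmem_malloc(sizeof(int));
        int *flag = (int *)nvshmem_malloc(sizeof(int));
        cudaMemset(data, 0, sizeof(int));
        cudaMemset(flag, 0, sizeof(int));
        cudaDeviceSynchronize();
        nvshmem_barrier_all();        // all buffers are zeroed before any put

        // The kernel uses a wait, so it is launched collectively on all PEs.
        void *args[] = {&data, &flag};
        nvshmemx_collective_launch((const void *)ordered_put,
                                   dim3(1), dim3(1), args, 0, 0);
        cudaDeviceSynchronize();

        nvshmem_barrier_all();
        nvshmem_free(data);
        nvshmem_free(flag);
    }
    nvshmem_finalize();
    return 0;
}
```

Because NVSHMEM can only be built as a static library in this release, a program like this is linked statically with relocatable device code enabled (for example, `nvcc -rdc=true` together with `-lnvshmem`). The exact include paths, library paths, and launcher depend on the installation.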
Known Issues
-
NVSHMEM and libraries that use NVSHMEM can only be built as static libraries, not as shared libraries.
This is because linking of CUDA device symbols does not work across shared libraries.
-
NVSHMEM collective operations with overlapping active sets are known not to work in some scenarios.
-
nvshmem_quiet only ensures PE-PE ordering and visibility.
It does not ensure global ordering and visibility.
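One reading of the nvshmem_quiet limitation above is sketched below with three PEs. This is an interpretation, not a statement from the NVSHMEM documentation. PE 0 writes a payload to PE 1, calls nvshmem_quiet(), and then notifies PE 2. Under this known issue, PE 2 should not assume that a subsequent read from PE 1 observes the payload. The kernel name `three_pe_relay`, the payload value, and the device-selection rule are assumptions made for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Hypothetical kernel illustrating a pattern that relies on global visibility.
__global__ void three_pe_relay(int *data, int *flag, int *seen) {
    int mype = nvshmem_my_pe();
    if (mype == 0) {
        nvshmem_int_p(data, 42, 1);   // payload goes to PE 1
        nvshmem_quiet();              // completes the put between PE 0 and PE 1
        nvshmem_int_p(flag, 1, 2);    // notification goes to a different PE
    } else if (mype == 2) {
        nvshmem_int_wait_until(flag, NVSHMEM_CMP_EQ, 1);
        // Per the known issue, this read is not guaranteed to return 42.
        *seen = nvshmem_int_g(data, 1);
    }
}

int main() {
    nvshmem_init();
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();

    // Assumption: PEs on a node are numbered consecutively, one GPU per PE.
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(mype % ndev);

    if (npes >= 3) {
        int *data = (int *)nvshmem_malloc(sizeof(int));
        int *flag = (int *)nvshmem_malloc(sizeof(int));
        int *seen = nullptr;          // ordinary (non-symmetric) device memory
        int host_seen = -1;
        cudaMalloc(&seen, sizeof(int));
        cudaMemcpy(seen, &host_seen, sizeof(int), cudaMemcpyHostToDevice);
        cudaMemset(data, 0, sizeof(int));
        cudaMemset(flag, 0, sizeof(int));
        cudaDeviceSynchronize();
        nvshmem_barrier_all();

        void *args[] = {&data, &flag, &seen};
        nvshmemx_collective_launch((const void *)three_pe_relay,
                                   dim3(1), dim3(1), args, 0, 0);
        cudaDeviceSynchronize();

        if (mype == 2) {
            cudaMemcpy(&host_seen, seen, sizeof(int), cudaMemcpyDeviceToHost);
            printf("PE 2 read %d from PE 1\n", host_seen);
        }

        nvshmem_barrier_all();
        cudaFree(seen);
        nvshmem_free(data);
        nvshmem_free(flag);
    }
    nvshmem_finalize();
    return 0;
}
```

A more conservative design under this limitation is to have the PE that receives the payload be the same PE that receives the notification, so that only PE-PE ordering is relied on.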