NVSHMEM Release 1.1.3

These are the release notes for NVIDIA® NVSHMEM™ 1.1.3.

Key Features And Enhancements

This NVSHMEM release includes the following key features and enhancements:
  • Implemented the nvshmem_<type>_put_signal API from OpenSHMEM 1.5.

  • Added the nvshmemx_signal_op API.

  • Optimized the implementation of the signal set operation over P2P-connected GPUs.

  • Optimized the performance of the nvshmem_fence() function.

  • Optimized the latency of the NVSHMEM atomics API.

  • Fixed a bug in the nvshmem_ptr API.

  • Fixed a bug in the implementation of the host-side strided transfer API (iput, iget, and so on).

  • Fixed a bug in the on-stream reduction for the long long datatype.

  • Fixed a hang during the NVSHMEM barrier collective operation.


NVSHMEM 1.1.3 has been tested with the following:

Known Issues

  • NVSHMEM and libraries that use NVSHMEM can only be built as static libraries, not as shared libraries.

    This is because linking of CUDA device symbols does not work across shared libraries.

  • NVSHMEM collective operations with active sets are known not to work in some scenarios.

  • Concurrent NVSHMEM memory allocation operations and collective operations are not supported.

  • On systems with both NVLink and InfiniBand, nvshmem_barrier*, nvshmem_quiet, and nvshmem_wait_until ensure only PE-to-PE ordering and visibility.

    They do not ensure global ordering and visibility.