NVIDIA® NVSHMEM 3.3.24 Release Notes#

NVSHMEM is an implementation of the OpenSHMEM specification for NVIDIA GPUs. The NVSHMEM programming interface implements a Partitioned Global Address Space (PGAS) model across a cluster of NVIDIA GPUs. NVSHMEM provides an easy-to-use interface to allocate memory that is symmetrically distributed across the GPUs. In addition to a CPU-side interface, NVSHMEM provides an NVIDIA® CUDA® kernel-side interface that allows CUDA threads to access any location in the symmetrically distributed memory.
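As a minimal sketch of the model described above, the following kernel-side example has each PE (processing element) write its rank into a symmetric buffer on its right-hand neighbor. The buffer name, launch configuration, and ring pattern are illustrative choices, not part of this release:

```cuda
#include <cuda_runtime.h>
#include <nvshmem.h>

// Kernel-side one-sided put: each PE writes its rank into the
// symmetric buffer of the next PE in a ring.
__global__ void put_to_neighbor(int *dst) {
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;
    nvshmem_int_p(dst, mype, peer);
}

int main(void) {
    nvshmem_init();

    // Symmetric allocation: every PE allocates the same-sized buffer,
    // giving a symmetric address valid on all PEs.
    int *dst = (int *)nvshmem_malloc(sizeof(int));

    put_to_neighbor<<<1, 1>>>(dst);
    cudaDeviceSynchronize();

    // Ensure all puts have completed on all PEs before reading.
    nvshmem_barrier_all();

    nvshmem_free(dst);
    nvshmem_finalize();
    return 0;
}
```

Compile with `nvcc` against the NVSHMEM headers and libraries, and launch one process per GPU with an NVSHMEM-aware launcher (for example, `nvshmrun` or an MPI launcher).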

These release notes describe the key features, software enhancements and improvements, and known issues for NVSHMEM 3.3.24 and earlier releases.

Key Features and Enhancements#

This NVSHMEM release includes the following key features and enhancements:

  • SM70 support was removed from CUDA 13 builds in 3.3.20, which inadvertently caused SM75 builds, still supported with CUDA 13, to fail. This release restores SM75 support in NVSHMEM CUDA 13 builds.

Compatibility#

NVSHMEM 3.3.24 has been tested with the following:

CUDA Toolkit:

  • 12.2

  • 12.6

  • 12.9

  • 13.0

CPUs:

  • x86 and NVIDIA Grace™ processors

GPUs:

  • NVIDIA Ampere A100

  • NVIDIA Hopper™

  • NVIDIA Blackwell®

Limitations#

When using the Libfabric transport with NVSHMEM_LIBFABRIC_PROVIDER=EFA, the user is responsible for ensuring that the libfabric environment variable FI_EFA_ENABLE_SHM_TRANSFER is set to 0 before launching their application. While NVSHMEM does set this variable during initialization, it can be ignored by the EFA provider if it was already initialized by the launcher, for example when using mpirun.
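The workaround can be applied by exporting the variable in the launch environment, as sketched below. The process count and application binary name are placeholders; substitute your own:

```shell
# Disable libfabric's intra-node shared-memory transfer path for the
# EFA provider. NVSHMEM sets this during initialization, but the EFA
# provider ignores it if the launcher (e.g. mpirun) has already
# initialized libfabric, so it must be exported before launch.
export FI_EFA_ENABLE_SHM_TRANSFER=0

# Hypothetical Open MPI launch line -- -x forwards the variable to
# all ranks; substitute your binary and process count:
#   mpirun -np 8 -x FI_EFA_ENABLE_SHM_TRANSFER ./nvshmem_app
```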

Known Issues#

The known issues are the same as in release 3.3.9.