NVIDIA® NVSHMEM 3.5.21 Release Notes

NVSHMEM is an implementation of the OpenSHMEM specification for NVIDIA GPUs. The NVSHMEM programming interface implements a Partitioned Global Address Space (PGAS) model across a cluster of NVIDIA GPUs. NVSHMEM provides an easy-to-use interface to allocate memory that is symmetrically distributed across the GPUs. In addition to a CPU-side interface, NVSHMEM provides an NVIDIA® CUDA® kernel-side interface that allows CUDA threads to access any location in the symmetrically distributed memory.
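To make the PGAS idea above concrete, the following is a CPU-only toy model written in plain Python. It is not the NVSHMEM or NVSHMEM4Py API: the class `ToySymmetricHeap` and its methods `malloc`, `put`, and `get` are hypothetical names chosen for illustration. The sketch only shows the two properties described above, namely that a symmetric allocation yields the same offset on every processing element (PE) and that any PE can write to a location owned by another PE.

```python
class ToySymmetricHeap:
    """CPU-only toy model of a symmetric heap shared by n_pes PEs (hypothetical)."""

    def __init__(self, n_pes, heap_size):
        # One private heap per PE; every heap has the same size and layout.
        self.heaps = [[0] * heap_size for _ in range(n_pes)]
        self.next_offset = 0

    def malloc(self, count):
        """Collective allocation: reserve the same offset range on every PE."""
        if self.next_offset + count > len(self.heaps[0]):
            raise MemoryError("toy symmetric heap exhausted")
        offset = self.next_offset
        self.next_offset += count
        return offset

    def put(self, offset, value, pe):
        """One-sided write of value into the symmetric location on PE `pe`."""
        self.heaps[pe][offset] = value

    def get(self, offset, pe):
        """One-sided read of the symmetric location on PE `pe`."""
        return self.heaps[pe][offset]


n_pes = 4
heap = ToySymmetricHeap(n_pes, heap_size=16)
slot = heap.malloc(1)  # the same offset is valid on all four PEs

# Ring exchange: each PE writes its own ID into its right-hand neighbor's slot.
for my_pe in range(n_pes):
    heap.put(slot, my_pe, (my_pe + 1) % n_pes)

received = [heap.get(slot, pe) for pe in range(n_pes)]
print(received)  # [3, 0, 1, 2]
```

In a real NVSHMEM program the per-PE heaps live in GPU memory and the writes are issued from the host or from CUDA threads; the toy model leaves out synchronization and ordering entirely.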

The release notes describe the key features, software enhancements and improvements, and known issues for NVSHMEM 3.5.21 and earlier releases.

Key Features and Enhancements

This NVSHMEM release includes the following key features and enhancements:

  • Fixed an ABI compatibility break in the internal team structure.

NVSHMEM4Py release 0.2.2 includes the following:

  • Removed an incorrect assumption that an NVSHMEM4Py-managed buffer has at most one child buffer (peer or multicast).

Compatibility

NVSHMEM 3.5.21 has been tested with the following:

NCCL:

  • 2.28.3

CUDA Toolkit:

  • 12.4

  • 12.9

  • 13.0

  • 13.1

CPUs:

  • x86 and NVIDIA Grace™ processors

GPUs:

  • NVIDIA Ampere A100

  • NVIDIA Hopper™

  • NVIDIA Blackwell
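As a minimal sketch, the tested versions listed above can be encoded as data so that a build or launch script can warn when it runs against a combination that was not tested. The helper names `parse_version`, `is_tested_cuda`, and `is_tested_nccl` are hypothetical and are not part of NVSHMEM. How the installed versions are discovered is left out.

```python
# Tested versions copied from the compatibility list above.
TESTED_CUDA_TOOLKITS = {(12, 4), (12, 9), (13, 0), (13, 1)}
TESTED_NCCL = {(2, 28, 3)}


def parse_version(text):
    """Turn a dotted version string such as '12.9' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_tested_cuda(version):
    """Return True if the major.minor CUDA Toolkit version is in the tested list."""
    return parse_version(version)[:2] in TESTED_CUDA_TOOLKITS


def is_tested_nccl(version):
    """Return True if the full NCCL version is in the tested list."""
    return parse_version(version) in TESTED_NCCL


if not is_tested_cuda("12.6"):
    print("warning: this CUDA Toolkit version was not tested with NVSHMEM 3.5.21")
```

A version that is missing from the list is not necessarily unsupported; the list only records what this release was tested with, so the sketch warns rather than fails.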

Limitations

The limitations are the same as in NVSHMEM 3.5.19.

Known Issues

  • The internal layout of RC-connected QPs changed starting in 3.5.19, which breaks ABI compatibility when IBGDA is enabled.