NCCL Release 2.19.3
These are the release notes for NCCL 2.19.3. For previous NCCL release notes, refer to the NCCL Archives.
Compatibility
- Deep learning framework containers. Refer to the Support Matrix for the supported container versions.
- This NCCL release supports CUDA 11.0, CUDA 12.0, CUDA 12.2, and CUDA 12.3.
Key Features and Enhancements
This NCCL release includes the following key features and enhancements.
- Added local user buffer registration for NVLink SHARP.
- Improved performance on Hopper GPUs (H800/H100).
- Added tuning plugin support.
- Added NET API v7 to allow for device-side packet reordering; removed v4 plugin support.
- Added support for RoCE ECE.
- Added support for C2C links.
- Disabled network flush by default on H100.
Fixed Issues
The following issues have been resolved in NCCL 2.19.3:
- Improved detection of shared memory allocation failures so that they are reported immediately instead of crashing later with a "Bus error."
- Fixed a missing thread unlock in the bootstrap code.
Known Issues
- Alltoall performance at scale may degrade for medium-size operations. Setting NCCL_NCHANNELS_PER_NET_PEER=1 should work around the problem.
- A hang may occur during alltoall connection establishment between multiple nodes (more likely at scale). This hang can be avoided by setting NCCL_CUMEM_ENABLE=0.
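Both workarounds above are applied by setting the environment variable before launching the job. A minimal sketch (the launcher command at the end is illustrative, not part of NCCL):

```shell
# Workaround for degraded medium-size alltoall performance at scale:
# limit NCCL to one channel per network peer.
export NCCL_NCHANNELS_PER_NET_PEER=1

# Workaround for the multi-node alltoall connection-establishment hang:
# disable CUDA cuMem-based buffer allocation in NCCL.
export NCCL_CUMEM_ENABLE=0

# Then launch the job as usual, e.g. (hypothetical application name):
# mpirun -np 16 ./my_nccl_app
```

Alternatively, most launchers can forward the variables per-run (e.g. `mpirun -x NCCL_CUMEM_ENABLE=0 ...` with Open MPI) without exporting them globally.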
Updating the GPG Repository Key
To ensure the security and reliability of our RPM and Debian package repositories, NVIDIA is updating and rotating the signing keys used by the apt, dnf/yum, and zypper package managers, beginning on April 27, 2022. If you do not update your repository signing keys, package management errors will occur when you attempt to access or install NCCL packages. To ensure continued access to the latest NCCL release, follow the updated NCCL installation guide.