NCCL Release 2.27.5
These are the release notes for NCCL 2.27.5. For previous NCCL release notes, refer to the NCCL Archives.
Compatibility
- Deep learning framework containers. Refer to the Support Matrix for the supported container versions.
- This NCCL release supports CUDA 12.2, CUDA 12.4, and CUDA 12.9. The provided prebuilt binaries should work with other CUDA 12.x versions as well.
Key Features and Enhancements
This NCCL release includes the following key features and enhancements.
- Optimized the network performance on GB200 systems by alternating the direction of the rings and the NIC-to-GPU assignment across communicators to limit unnecessary sharing.
- Optimized the network performance on DGX B200 systems by adjusting the bandwidths provided to the graph search algorithm.
- Added an example tuner plugin with CSV-based overrides.
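These notes do not document the CSV schema used by the example tuner plugin. Purely as a hypothetical illustration (the column names and values below are invented, not the plugin's actual format), such an override file might map a collective and message-size range to an algorithm/protocol choice:

```csv
# collective,minBytes,maxBytes,algorithm,protocol  (hypothetical columns)
allreduce,0,65536,tree,ll
allreduce,65537,4294967295,ring,simple
```

Consult the example plugin's source in the NCCL repository for the actual file format it expects.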
Fixed Issues
The following issues have been resolved in NCCL 2.27.5:
- Fixed the detection of C2C links in case GPU Direct RDMA is disabled between a GPU and a NIC.
- Fixed PXN support on MNNVL systems, where NCCL would try (and fail) to share regular host memory across multiple nodes.
- Further reduced the overheads of CUDA graph capturing, which had increased in NCCL 2.26.2 for large graphs.
- Enabled fp8 reductions in symmetric kernels on Blackwell with CUDA 12.8.
- Restored the plugin name handling logic to make it possible to specify a path to the plugin.
- Restored the ability to change NCCL_COLLNET_ENABLE during execution.
- Removed an x86 dependency from the example profiler.
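The last two restored behaviors above are driven by environment variables. As a minimal sketch (the plugin path below is hypothetical; check the NCCL environment-variable documentation for exact semantics), specifying a plugin by full path and toggling CollNet might look like this:

```shell
# Hypothetical plugin path, for illustration only.
# Restored in this release: NCCL_NET_PLUGIN accepts a path to the plugin,
# not only a plugin name.
export NCCL_NET_PLUGIN=/opt/nccl/plugins/libnccl-net-custom.so

# Restored in this release: NCCL_COLLNET_ENABLE can be changed during
# execution (e.g. between communicator creations).
export NCCL_COLLNET_ENABLE=1

echo "NCCL_NET_PLUGIN=$NCCL_NET_PLUGIN"
echo "NCCL_COLLNET_ENABLE=$NCCL_COLLNET_ENABLE"
```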
Updating the GPG Repository Key
To ensure the security and reliability of our RPM and Debian package repositories, NVIDIA updated and rotated the signing keys used by the apt, dnf/yum, and zypper package managers beginning on April 27, 2022. Failure to update your repository signing keys will result in package management errors when attempting to access or install NCCL packages. To ensure continued access to the latest NCCL release, follow the updated NCCL installation guide.