NCCL Release 2.26.5
These are the NCCL 2.26.5 release notes. For previous NCCL release notes, refer to the NCCL Archives.
Compatibility
- Deep learning framework containers. Refer to the Support Matrix for the supported container version.
- This NCCL release supports CUDA 12.2, CUDA 12.4, and CUDA 12.9.
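One way to confirm that a process is actually running NCCL 2.26.5 is to decode the integer version code that NCCL reports, for example the value returned by ncclGetVersion. The sketch below assumes the encoding defined in nccl.h: major*10000 + minor*100 + patch for 2.9 and later, and major*1000 + minor*100 + patch for older releases. The function name decode_nccl_version is illustrative and not part of NCCL.

```python
from typing import Tuple


def decode_nccl_version(code: int) -> Tuple[int, int, int]:
    """Split an NCCL integer version code into (major, minor, patch).

    Assumes the nccl.h encoding: releases up to 2.8 use
    major*1000 + minor*100 + patch, later releases use
    major*10000 + minor*100 + patch.
    """
    if code < 10000:
        # Legacy encoding (NCCL 2.8 and earlier).
        return code // 1000, (code % 1000) // 100, code % 100
    # Current encoding (NCCL 2.9 and later).
    return code // 10000, (code % 10000) // 100, code % 100


if __name__ == "__main__":
    # 22605 is the code this release would report under the assumed encoding.
    print(decode_nccl_version(22605))
```

A caller would typically compare the decoded tuple against (2, 26, 5) before relying on the fixes listed in these notes.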
Fixed Issues
The following issues have been resolved in NCCL 2.26.5:
- Minimized the performance impact of the device kernel profiling support when the profiler plugin is not loaded.
- Reduced the overhead of CUDA graph capture, which had increased in NCCL 2.26.2 for large graphs.
- Fixed the exchange of enhanced connection establishment (ECE) options to address potential slowdowns on networks utilizing RoCE.
- Added a test of whether cuMem host allocations work, disabling them if they do not. These allocations have been enabled by default since NCCL 2.24 when the CUDA driver version is at least 12.6, and they rely on NUMA support, which is not available by default under Docker. We recommend invoking Docker with --cap-add SYS_NICE to enable it.
- Worked around a potential hang in alltoall-like communication patterns on MNNVL systems at a scale of over 80 ranks.
- Fixed an initialization error when running with NCCL_NET_GDR_C2C=1 on multiple MNNVL domains with non-uniform network configurations across nodes.
- Fixed the printing of sub-seconds in the debug log when using a custom NCCL_DEBUG_TIMESTAMP_FORMAT setting.
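The Docker recommendation for cuMem host allocations can be illustrated with a small launcher. This is a minimal sketch, not an official NVIDIA script: the image name, the training command, and the helper docker_run_command are hypothetical, and the --gpus all flag assumes the NVIDIA Container Toolkit is installed on the host. Only the --cap-add SYS_NICE option comes from these notes.

```python
import shlex
from typing import List, Sequence


def docker_run_command(image: str,
                       command: Sequence[str],
                       extra_caps: Sequence[str] = ("SYS_NICE",)) -> List[str]:
    """Build a `docker run` argument list that grants extra capabilities.

    SYS_NICE is the capability these release notes recommend so that
    NUMA support is available inside the container.
    """
    args = ["docker", "run", "--rm", "--gpus", "all"]
    for cap in extra_caps:
        args += ["--cap-add", cap]
    args.append(image)          # hypothetical image name supplied by the caller
    args += list(command)       # command to run inside the container
    return args


if __name__ == "__main__":
    cmd = docker_run_command("my-nccl-image:latest", ["python", "train.py"])
    # Print the command instead of executing it, so the sketch has no side effects.
    print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run would start the container. Building the arguments as a list rather than a single string avoids shell quoting problems.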
Updating the GPG Repository Key
To best ensure the security and reliability of our RPM and Debian package repositories, NVIDIA is updating and rotating the signing keys used by apt, dnf/yum, and zypper package managers beginning on April 27, 2022. Failure to update your repository signing keys will result in package management errors when attempting to access or install NCCL packages. To ensure continued access to the latest NCCL release, please follow the updated NCCL installation guide.