NCCL Release 2.29.7
These are the release notes for NCCL 2.29.7. For previous NCCL release notes, refer to the NCCL Archives.
Compatibility
-
Deep learning framework containers. Refer to the Support Matrix for the supported container version.
-
This NCCL release supports CUDA 12.x and CUDA 13.x.
Key Features and Enhancements
This NCCL release includes the following key features and enhancements.
-
Device API and GIN: Added multi-context support for GIN with the option to request exclusive GIN contexts. Added VA-based GIN signals and strict window ordering. Added advanced queue control for GIN, including queue depth, manual credit management, and aggregation. Added GIN support for platforms with no cross-rail connectivity. Added nLsaTeams to ncclCommQueryProperties. Decoupled GIN from the NET plugin and topology.
-
New device APIs: Introduced Copy, ReduceCopy, and ReduceSum with various data types and ops.
-
Dynamic Memory Offload: Added ncclCommSuspend() and ncclCommResume() for releasing and restoring communicator memory. Added basic memory overhead tracking infrastructure. ncclCommSuspend() and ncclCommResume() do not currently support multiple GPUs in a single process; this support will be added in v2.30.
-
Built-in hybrid (LSA+GIN) symmetric kernel for ReduceScatter: Added new hierarchical kernels to improve performance and scalability of ReduceScatter. Requires symmetric memory registration and GIN support. Symmetric GIN kernels can be disabled with NCCL_SYM_GIN_KERNELS_ENABLE=0.
-
Port Failover: Added support for the internal IB/RoCE plugin to continue working transparently when network errors occur. Added automatic port failover for GPUs that have multiple local IB/RoCE ports or devices. Can be enabled by setting NCCL_IB_RESILIENCY_PORT_FAILOVER=1.
-
Symmetric memory: Added support for abort in symmetric kernels. Added NCCL_CHECK_MODE=DEBUG to validate symmetric buffer registration.
-
Project layout reorganization: Moved the ext-* directories to plugins (e.g., ext-net -> plugins/net). Moved ir and nccl4py under bindings. Moved examples to docs/examples.
-
Changed the GIN barrier to use different signals for different peers.
-
Added NCCL_NO_CACHE to force NCCL to always re-read selected env vars.
-
Added CMake install and find_package support.
-
Added CMake for NCCL4Py build and updated Cybind integration.
-
Added preliminary backwards compatibility support to enable running LSA kernels compiled with NCCL 2.29.2/3 on NCCL 2.29.7. This is not supported for GIN yet.
-
Updated license to Apache 2.0.
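The environment variables named in the list above can be collected in one place before a job is launched. The following is a minimal sketch, not an official NCCL utility: the helper name build_nccl_env and the dictionary name are hypothetical, and the only values used are the ones these notes state explicitly. NCCL_NO_CACHE is left out because the notes do not say which value it expects.

```python
import os

# Values quoted from the release notes; environment values are always strings.
NCCL_2_29_7_SETTINGS = {
    "NCCL_IB_RESILIENCY_PORT_FAILOVER": "1",  # enable automatic IB/RoCE port failover
    "NCCL_SYM_GIN_KERNELS_ENABLE": "0",       # disable the symmetric GIN kernels
    "NCCL_CHECK_MODE": "DEBUG",               # validate symmetric buffer registration
}


def build_nccl_env(overrides=None, base=None):
    """Return a copy of `base` (default: os.environ) with the NCCL settings applied.

    `overrides` lets a caller change or add individual variables without
    editing the defaults above. The input mappings are never modified.
    """
    env = dict(os.environ if base is None else base)
    env.update(NCCL_2_29_7_SETTINGS)
    if overrides:
        env.update(overrides)
    return env


if __name__ == "__main__":
    env = build_nccl_env(base={"PATH": "/usr/bin"})
    for name in sorted(NCCL_2_29_7_SETTINGS):
        print(f"{name}={env[name]}")
```

The resulting mapping can be passed as the env argument of subprocess.run when starting a training script, so the variables are present in the child process before it initializes NCCL. Whether a given variable is re-read after initialization is not covered by this sketch.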
Fixed Issues
The following issues have been resolved in NCCL 2.29.7:
-
Fixed problems related to the introduction of git_version.h. (GitHub Issue #1960)
-
Fixed oneRankReduce when the number of elements is not a multiple of the number of blocks. (GitHub Issue #1950)
-
Improved GIN handling in ncclCommGetAsyncError. (GitHub Issue #2019)
-
Fixed memory initialization in the P2P transport. (GitHub Issue #1962)
-
Fixed a hang in send/receive scheduling of repeated sparse patterns.
-
Fixed CE-based collective operations to fall back to the cudaMemcpyAsync API when the null/default stream is used.
-
Fixed symmetric window objects to be freed automatically during commFree.
-
Fixed a 16-bit overflow of signal and counter ids with GIN proxy.
-
Fixed GIN counters and signals to be reset upon ncclDevCommDestroy.
-
Fixed local data calculation during ncclGinIbP2PBarrier.
-
Fixed a crash when calling ncclNet.finalize() after a failed ncclNet_v10->init().
-
Fixed nccl4py compatibility with cuda.core 0.5.0.
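For readers unfamiliar with the kind of defect behind the 16-bit overflow fix listed above, the sketch below shows how an identifier wraps when it is stored in a 16-bit field. It is a generic illustration only, not NCCL code; the function names are hypothetical and nothing here reflects how the GIN proxy actually stores its signal and counter ids.

```python
ID_BITS = 16
ID_MASK = (1 << ID_BITS) - 1  # 0xFFFF, the largest value a 16-bit field can hold


def store_id_16bit(resource_id: int) -> int:
    """Mimic writing an id into a 16-bit field: the high bits are silently dropped."""
    return resource_id & ID_MASK


def fits_in_16_bits(resource_id: int) -> bool:
    """Range check that should pass before an id is narrowed to 16 bits."""
    return 0 <= resource_id <= ID_MASK


if __name__ == "__main__":
    for resource_id in (65535, 65536, 65537):
        stored = store_id_16bit(resource_id)
        print(resource_id, stored, fits_in_16_bits(resource_id))
```

Id 65536 is stored as 0 and id 65537 as 1, so both would collide with ids that are already in use. This is why a narrowing store needs either a wider field or an explicit range check.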
Known Issues
-
Applications that use GIN APIs must be recompiled against NCCL 2.29.7 to work with the 2.29.7 runtime.
-
The Profiler Inspector example does not currently compile under CMake. This will be fixed soon.
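The GIN recompilation requirement above can be checked by comparing the NCCL version an application was built against with the version loaded at run time. The sketch below assumes both are available as integer version codes, as defined by NCCL_VERSION_CODE in nccl.h at build time and returned by ncclGetVersion() at run time. The helper names are hypothetical, and the check covers only the case these notes describe.

```python
def nccl_version_code(major: int, minor: int, patch: int) -> int:
    """Encode a version the same way the NCCL_VERSION macro in nccl.h does."""
    if major <= 2 and minor <= 8:
        # Older releases (up to 2.8.x) used a four-digit code.
        return major * 1000 + minor * 100 + patch
    return major * 10000 + minor * 100 + patch


def decode_nccl_version(code: int) -> tuple:
    """Split an integer version code into (major, minor, patch)."""
    if code < 10000:
        return (code // 1000, (code % 1000) // 100, code % 100)
    return (code // 10000, (code % 10000) // 100, code % 100)


def gin_app_needs_rebuild(built_with: int, runtime: int) -> bool:
    """Hypothetical check for the known issue: a GIN application running on
    the 2.29.7 runtime must also have been compiled against 2.29.7."""
    target = nccl_version_code(2, 29, 7)
    return runtime == target and built_with != target


if __name__ == "__main__":
    built = nccl_version_code(2, 29, 3)
    loaded = nccl_version_code(2, 29, 7)
    print(decode_nccl_version(loaded), gin_app_needs_rebuild(built, loaded))
```

An application that does not use GIN APIs is outside the scope of this check. The notes separately describe preliminary backwards compatibility for LSA kernels compiled with NCCL 2.29.2/3.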
Updating the GPG Repository Key
To best ensure the security and reliability of our RPM and Debian package repositories, NVIDIA is updating and rotating the signing keys used by apt, dnf/yum, and zypper package managers beginning on April 27, 2022. Failure to update your repository signing keys will result in package management errors when attempting to access or install NCCL packages. To ensure continued access to the latest NCCL release, please follow the updated NCCL installation guide.