NCCL Release 2.29.2

This is the NCCL 2.29.2 release notes. For previous NCCL release notes, refer to the NCCL Archives.

Compatibility

NCCL 2.29.2 has been tested with the following:

Key Features and Enhancements

This NCCL release includes the following key features and enhancements.

  • Device API: improved the GIN documentation to clarify the support matrix, added functions to obtain host-accessible device pointers from symmetrically registered ncclWindows, added a new ncclCommQueryProperties API to check supported features before creating a DevComm, and added versioning to structs for backwards compatibility with future versions.

  • Added one-sided host APIs (put, wait) for both network and NVL transports using zero SMs. Requires CUDA 12.5 or greater.

  • Added NCCL4Py, Python language bindings for NCCL. It supports native collectives, P2P, and other NCCL operations, with automatic cleanup of NCCL-managed resources.

  • Added LLVM intermediate representation (IR) support, enabling consumption by diverse code generation systems.

  • Added a built-in hybrid (LSA+GIN) symmetric kernel for AllGather.

  • Implemented the new ncclCommGrow API, which dynamically and efficiently adds ranks to an existing NCCL communicator. Use it with ncclCommShrink to adjust communicator membership in response to failing and recovering nodes; it also lets elastic applications expand a running job by integrating new ranks.

  • Added multi-segment registration, which expands buffer registration to support multiple segments of physical memory mapped to one contiguous VA space for the p2p, ib, and nvls transports. This enables support for expandable segments in PyTorch.

  • Improved scalability of AllGatherV with a new pattern (a group of broadcasts), a new scheduler path, and new kernels that improve performance at large scale.

  • Improved debuggability and observability with real-time monitoring support in RAS, a Prometheus output format for Inspector, and profiler support for CopyEngine (CE) based collectives.

  • Added a new contribution guide, available at https://github.com/NVIDIA/nccl/blob/master/CONTRIBUTING.md

  • Added NCCL_SOCKET_POLL_TIMEOUT_MSEC, which allows NCCL to wait instead of spinning during bootstrap in order to reduce CPU usage. (GitHub PR #1759)
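
The elasticity workflow described in the ncclCommGrow bullet above can be sketched roughly as follows. This is illustrative pseudocode only: the ncclCommGrow prototype shown is a placeholder assumption (the release notes do not give its signature), and the argument order of ncclCommShrink should be checked against the NCCL documentation.

```
/* Illustrative pseudocode; NCCLCHECK is the usual error-checking macro.
 * The ncclCommGrow prototype below is an assumption, not the documented API. */
ncclComm_t comm = /* existing communicator */;

/* A node failed: exclude its rank to keep the job running. */
int failedRanks[] = { 3 };
ncclComm_t smallerComm;
NCCLCHECK(ncclCommShrink(comm, failedRanks, 1, &smallerComm,
                         /*config=*/NULL, /*shrinkFlags=*/0));

/* The node recovered (or the job scaled up): grow the communicator back.
 * Hypothetical call; consult the NCCL docs for the real prototype. */
ncclComm_t largerComm;
NCCLCHECK(ncclCommGrow(smallerComm, /* new-rank info */ &largerComm));
```

The intended pattern is that shrink and grow are complementary: shrink removes failed ranks without tearing down the whole job, and grow later reintegrates recovered or newly added ranks.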
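
The new socket-poll variable from the last bullet is set like any other NCCL environment variable. The value 500 below is an arbitrary illustration, not a recommended default.

```shell
# With a timeout set, bootstrap sockets wait on poll() for up to this many
# milliseconds instead of busy-spinning, reducing CPU usage during startup.
export NCCL_SOCKET_POLL_TIMEOUT_MSEC=500
```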

Fixed Issues

The following issues have been resolved in NCCL 2.29.2:

  • Fixed a segfault in ncclGin initialization that could occur if ncclGinIbGdaki.devices() fails after init() succeeds. (GitHub PR #1881)

  • Fixed a crash that could occur when calling p2p operations and then collectives on the same user buffer. (GitHub Issue #1859)

  • Fixed a bug that lowered performance on some sm80 or earlier machines with one NIC per GPU. (GitHub Issue #1876)

  • Cleared non-fatal CUDA errors so they do not propagate. (PyTorch Issue #164402)

  • Improved performance of large-size AllGather operations using symmetric memory buffers on Blackwell by transparently switching to CE collectives.

  • Improved the default number of channels per net peer for all-to-all, send, and recv to achieve better performance.

  • Improved performance tuning of 256M-512M message sizes on Blackwell for AllReduce.

  • Enabled built-in symmetric kernels only on fully connected NVLink systems, as PCIe systems do not perform as well.

  • Updated all-to-all, send, and recv to obey NCCL_NETDEVS_POLICY. For these operations, NCCL will by default use a subset of the available network devices, as dictated by the Network Device Policy.

  • Fixed a hang on GB200/300 + CX8 when the user disables GDR.

  • Fixed a bug that could cause AllReduce on ncclFloat8e4m3 to fail with no available algorithm/protocol.

Updating the GPG Repository Key

To ensure the security and reliability of our RPM and Debian package repositories, NVIDIA is updating and rotating the signing keys used by the apt, dnf/yum, and zypper package managers, beginning on April 27, 2022. Failure to update your repository signing keys will result in package management errors when attempting to access or install NCCL packages. To ensure continued access to the latest NCCL release, please follow the updated NCCL installation guide.