NVIDIA Deep Learning NCCL Documentation
Release Notes (PDF) - v2.21.5 - Last updated June 18, 2024

NCCL Release 2.13.4

These are the NCCL 2.13.4 release notes. For previous NCCL release notes, refer to the NCCL Archives.


NCCL 2.13.4 has been tested with the following:

Key Features and Enhancements

This NCCL release includes the following key features and enhancements.

  • Optimize CUDA graph launch; avoid launching a CPU callback for intra-node operations.

  • Simplify kernel common code to improve the latency of send/recv operations.

  • Strengthen CUDA streams semantics.

  • Update the NET API to v6 to add dmabuf support.

  • Add ncclGetLastError() function.

  • Add ncclRemoteError code and use it for remote network errors.

  • Support the use of a different NCCL_NET parameter per communicator.

  • Add support for SHM and P2P transfers using cudaMemcpy.

Fixed Issues

The following issues have been resolved in NCCL 2.13.4:

  • Fix multi-receive size encoding, which could cause a flush to be skipped in corner cases that mix zero-byte and non-zero-byte send/receive operations.

  • Replace busy polling in the bootstrap thread, which waits for ranks to check in, with a blocking accept.
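
The difference between busy polling and a blocking accept can be illustrated with a minimal sketch. This is not NCCL's bootstrap code: it is a generic Python example using the standard socket module, and the names wait_for_ranks, check_in, and run_demo, as well as the 4-byte rank-id message, are hypothetical and chosen only for illustration. A root thread waits for each simulated rank to connect and relies on accept() blocking in the kernel rather than repeatedly checking for new connections.

```python
import socket
import threading


def wait_for_ranks(listener: socket.socket, n_ranks: int) -> list[int]:
    """Collect one check-in per rank using blocking accept() calls."""
    checked_in = []
    for _ in range(n_ranks):
        # accept() blocks until a peer connects, so the waiting thread
        # consumes no CPU, unlike a loop that polls a non-blocking socket.
        conn, _addr = listener.accept()
        with conn, conn.makefile("rb") as reader:
            # Hypothetical wire format: each rank sends its id as 4 bytes.
            data = reader.read(4)
            checked_in.append(int.from_bytes(data, "big"))
    return sorted(checked_in)


def check_in(port: int, rank: int) -> None:
    """Simulated rank: connect to the root and send its rank id."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(rank.to_bytes(4, "big"))


def run_demo(n_ranks: int = 4) -> list[int]:
    """Start n_ranks simulated ranks and return the ids that checked in."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(n_ranks)
        port = listener.getsockname()[1]

        threads = [
            threading.Thread(target=check_in, args=(port, rank))
            for rank in range(n_ranks)
        ]
        for t in threads:
            t.start()

        ranks = wait_for_ranks(listener, n_ranks)

        for t in threads:
            t.join()
    return ranks


if __name__ == "__main__":
    print(run_demo())
```

In a polling design, the root would instead set the listening socket to non-blocking mode and retry accept() in a loop, which keeps a CPU core busy while ranks are still starting up.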

Updating the GPG Repository Key

To best ensure the security and reliability of our RPM and Debian package repositories, NVIDIA is updating and rotating the signing keys used by apt, dnf/yum, and zypper package managers beginning on April 27, 2022. Failure to update your repository signing keys will result in package management errors when attempting to access or install NCCL packages. To ensure continued access to the latest NCCL release, please follow the updated NCCL installation guide.