
NCCL Release 2.17.1

These are the NCCL 2.17.1 release notes. For previous NCCL release notes, refer to the NCCL Archives.

Compatibility

NCCL 2.17.1 has been tested with the following:

Key Features and Enhancements

This NCCL release includes the following key features and enhancements.

  • Add support for NVLink SHARP Reduction / Broadcast to accelerate intra-node allreduce operations.

  • Add new fields to the communicator configuration structure: Cooperative Group Array (CGA) cluster size, minimum and maximum number of CTAs, and the name of the network plugin to use (see the sketch after this list).

  • Update NVTX3 includes.
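
The new configuration fields are set on the communicator configuration structure before the communicator is created. The sketch below is not a complete program and uses placeholder values; nRanks, myRank, and id are assumed to come from the usual bootstrap (for example, an MPI broadcast of the ncclUniqueId).

    #include <nccl.h>

    /* Minimal sketch, assuming NCCL 2.17 headers; values are placeholders,
     * not tuning guidance. */
    ncclComm_t initCommWithConfig(int nRanks, int myRank, ncclUniqueId id) {
      ncclComm_t comm = 0;
      ncclConfig_t config = NCCL_CONFIG_INITIALIZER;
      config.cgaClusterSize = 4;   /* Cooperative Group Array cluster size   */
      config.minCTAs = 4;          /* minimum number of CTAs per collective  */
      config.maxCTAs = 32;         /* maximum number of CTAs per collective  */
      config.netName = "IB";       /* name of the network plugin to use      */
      if (ncclCommInitRankConfig(&comm, nRanks, id, myRank, &config) != ncclSuccess) {
        /* handle the error as appropriate for the application */
      }
      return comm;
    }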

Known Issues

  • On systems where a NIC shares a PCI switch with only one GPU (such as HGX H100), the Tree algorithm makes data transit through the CPU, which makes the LL128 protocol unsafe and can result in data corruption. You can work around this issue by setting the following:

    NCCL_IB_PCI_RELAXED_ORDERING=0

    Another solution is to disable the LL128 protocol with the following:

    NCCL_PROTO=^LL128
  • Performance is sub-optimal for NCCL communicators using one NIC per node on DGX H100/HGX H100 with NDR networking. This can be worked around by setting the following parameter (both workarounds can also be applied programmatically, as sketched after this list):

    NCCL_MIN_NCHANNELS=4
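
These variables are normally exported in the launch environment (for example, in a job script). As a minimal sketch, assuming the settings are applied before NCCL initialization reads them, they can also be set from the application itself:

    #include <stdlib.h>

    /* Sketch only: must run before the NCCL communicator is created,
     * which is when NCCL reads these environment variables. */
    void applyKnownIssueWorkarounds(void) {
      /* First known issue: disable PCI relaxed ordering for IB
       * (alternatively, disable the LL128 protocol with NCCL_PROTO=^LL128). */
      setenv("NCCL_IB_PCI_RELAXED_ORDERING", "0", 1);
      /* Second known issue: use at least 4 channels per communicator. */
      setenv("NCCL_MIN_NCHANNELS", "4", 1);
    }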

Fixed Issues

The following issues have been resolved in NCCL 2.17.1:

  • Fixed a crash when one CollNet (SHARP) rail fails to initialize.

  • Re-enabled the LL128 protocol on H100 when PXN is used to close rings.

Updating the GPG Repository Key

To ensure the security and reliability of our RPM and Debian package repositories, NVIDIA is updating and rotating the signing keys used by the apt, dnf/yum, and zypper package managers, beginning April 27, 2022. If you do not update your repository signing keys, you will encounter package management errors when attempting to access or install NCCL packages. To ensure continued access to the latest NCCL release, please follow the updated NCCL installation guide.