NVIDIA Deep Learning NCCL Documentation
Release Notes (PDF) - v2.23.4 - Last updated September 16, 2024

NCCL Release 2.23.4

These are the release notes for NCCL 2.23.4. For previous NCCL release notes, refer to the NCCL Archives.

Compatibility

NCCL 2.23.4 has been tested with the following:

Key Features and Enhancements

This NCCL release includes the following key features and enhancements.

  • Add a new scalable initialization API.

  • Accelerate bootstrap operations.

  • Add a new PAT (Parallel Aggregated Trees) algorithm for AllGather and ReduceScatter operations using one GPU per node.

  • Accelerate intra-node communication when buffers are registered.

  • Add a profiler plugin API.

  • Make CUDA calls within graph allocation asynchronous.

  • Use fatal RDMA asynchronous events to stop network operations early.

  • Disable GPU Direct P2P on AMD CPUs when using more than 2 GPUs.

  • Add a parameter to set the location of the user configuration file.

  • Increase default IB timeout from 18 to 20.
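
To put the IB timeout change in concrete terms: the NCCL documentation for the NCCL_IB_TIMEOUT environment variable (not named in these notes) describes the timeout as an exponent, with the actual duration computed as 4.096 µs * 2^timeout. The following is a minimal sketch of that arithmetic, assuming this formula applies to the default mentioned above; the function name is hypothetical and is not part of NCCL.

```python
# Illustrative only: converts an IB timeout exponent into a duration using
# the documented formula 4.096 microseconds * 2**timeout.
# `ib_timeout_seconds` is a hypothetical helper, not an NCCL API.

BASE_SECONDS = 4.096e-6  # 4.096 microseconds


def ib_timeout_seconds(timeout: int) -> float:
    """Return the timeout duration in seconds for a given exponent."""
    if timeout < 0:
        raise ValueError("timeout exponent must be non-negative")
    return BASE_SECONDS * (2 ** timeout)


old_default = ib_timeout_seconds(18)  # previous default exponent
new_default = ib_timeout_seconds(20)  # default exponent as of this release

print(f"timeout=18: {old_default:.3f} s")
print(f"timeout=20: {new_default:.3f} s")
print(f"ratio: {new_default / old_default:.1f}x")
```

Under this formula, raising the exponent by 2 multiplies the duration by 4, so the default moves from roughly 1.07 seconds to roughly 4.29 seconds. The appropriate value for a given deployment depends on the size of the network.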

Fixed Issues

The following issues have been resolved in NCCL 2.23.4:

  • Fixed GPU Direct RDMA check on Linux kernels 6.6+.

  • Fixed performance regression when mixing small and large operations.

  • Fixed crash in topology detection when devices have a NUMA ID of -1.

  • Fixed Tree graph search when NCCL_CROSS_NIC is set to 1.

  • Fixed IB operation on multi-node NVLink systems.

Updating the GPG Repository Key

To ensure the security and reliability of its RPM and Debian package repositories, NVIDIA is updating and rotating the signing keys used by the apt, dnf/yum, and zypper package managers, beginning on April 27, 2022. If you do not update your repository signing keys, package management errors will occur when you attempt to access or install NCCL packages. To keep access to the latest NCCL release, follow the updated NCCL installation guide.