Release Notes :: NVIDIA Deep Learning NCCL Documentation

Release Notes (PDF) - v2.20.5 - Last updated March 6, 2024

NCCL Release 2.12.7

These are the NCCL 2.12.7 release notes. For previous NCCL release notes, refer to the NCCL Archives.

Compatibility

NCCL 2.12.7 has been tested with the following:

Deep learning framework containers. Refer to the Support Matrix for the supported container version.
This NCCL release supports CUDA 10.2, CUDA 11.0, and CUDA 11.6.

Key Features and Enhancements

This NCCL release includes the following key features and enhancements.

Added NVLink-optimized network communication to keep traffic rail-local (PXN).
Improved alltoall latency by aggregating messages within a node to a given destination.
Added a new v5 plugin API with grouped receives and tags, while remaining compatible with v4 plugins.
Added names to NCCL threads to help with debugging.
Added support for Relaxed Ordering for IB.
Added profiling and timing infrastructure.
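
To give a rough sense of why aggregating alltoall messages within a node can help, the following is a minimal counting sketch. It is not NCCL code and does not reflect NCCL's actual implementation. The function name `count_network_messages` and its parameters are hypothetical, and the model assumes that every GPU sends one message to every GPU on every other node, and that aggregation merges all sends from one node to the same destination GPU into a single network message. It counts messages only and says nothing about measured latency.

```python
from collections import defaultdict


def count_network_messages(num_nodes, gpus_per_node, aggregate):
    """Count inter-node messages for one alltoall in a toy model (hypothetical helper)."""
    # Each distinct key stands for one message crossing the network.
    messages = defaultdict(int)
    for src_node in range(num_nodes):
        for src_gpu in range(gpus_per_node):
            for dst_node in range(num_nodes):
                if dst_node == src_node:
                    continue  # intra-node traffic does not cross the network
                for dst_gpu in range(gpus_per_node):
                    if aggregate:
                        # Assumption: all local senders targeting the same
                        # destination GPU share one message per source node.
                        key = (src_node, dst_node, dst_gpu)
                    else:
                        # Every (sender, receiver) pair is its own message.
                        key = (src_node, src_gpu, dst_node, dst_gpu)
                    messages[key] += 1
    return len(messages)


if __name__ == "__main__":
    for aggregate in (False, True):
        total = count_network_messages(2, 8, aggregate)
        print(f"aggregate={aggregate}: {total} network messages")
```

Under these assumptions, two nodes with eight GPUs each produce 128 network messages without aggregation and 16 with it, so each aggregated message carries the payload of eight smaller ones.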

Fixed Issues

The following issues have been resolved in NCCL 2.12.7:

Fixed NVLink detection and avoided data corruption when some NVLinks are down.