NCCL Release 2.12.7
These are the NCCL 2.12.7 release notes. For previous NCCL release notes, refer to the NCCL Archives.
Compatibility
- Deep learning framework containers. Refer to the Support Matrix for the supported container version.
- This NCCL release supports CUDA 10.2, CUDA 11.0, and CUDA 11.6.
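When checking compatibility, it can help to confirm which NCCL release a process actually loaded. The sketch below decodes an integer version code, assuming the encoding used by the `NCCL_VERSION_CODE` macro in `nccl.h` (the same value `ncclGetVersion` reports): `major*10000 + minor*100 + patch` for recent releases and `major*1000 + minor*100 + patch` for releases up to 2.8. The helper names `decode_nccl_version` and `is_at_least` are hypothetical and not part of NCCL.

```python
def decode_nccl_version(code):
    """Split an NCCL integer version code into (major, minor, patch).

    Assumption: codes of 10000 and above use major*10000 + minor*100 + patch,
    smaller codes use the older major*1000 + minor*100 + patch scheme.
    """
    if code >= 10000:
        major, rest = divmod(code, 10000)
    else:
        major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return (major, minor, patch)


def is_at_least(code, required=(2, 12, 7)):
    """Return True if the decoded version is the required release or newer."""
    return decode_nccl_version(code) >= required


print(decode_nccl_version(21207))  # (2, 12, 7)
```

The integer code can come from any binding that exposes the library version; how it is obtained is outside the scope of this sketch.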
Key Features and Enhancements
This NCCL release includes the following key features and enhancements.
- Added NVLink-optimized network communication to keep traffic rail-local (PXN).
- Improved alltoall latency by aggregating messages within a node to a given destination.
- Added new v5 plugin API with grouped receives and tags, while keeping compatibility with v4 plugins.
- Added naming of NCCL threads to help with debugging.
- Added support for Relaxed Ordering for IB.
- Added profiling and timing infrastructure.
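Several of the features above are typically controlled through environment variables. The following is a minimal sketch of setting them from Python before any NCCL communicator is created. The variable names and values (`NCCL_PXN_DISABLE`, `NCCL_IB_PCI_RELAXED_ORDERING`, `NCCL_SET_THREAD_NAME`) are assumptions recalled from the NCCL environment variable documentation and should be verified against the documentation for the installed release; `apply_nccl_settings` is a hypothetical helper.

```python
import os

# Assumed names and values; verify against the NCCL environment variable docs.
NCCL_2_12_SETTINGS = {
    "NCCL_PXN_DISABLE": "0",              # 0 keeps PXN enabled
    "NCCL_IB_PCI_RELAXED_ORDERING": "2",  # 2 uses Relaxed Ordering if available
    "NCCL_SET_THREAD_NAME": "1",          # 1 names NCCL threads for debugging
}


def apply_nccl_settings(settings, environ=None):
    """Copy settings into environ without overriding values already present."""
    env = os.environ if environ is None else environ
    for name, value in settings.items():
        env.setdefault(name, value)
    return env


# Example on a plain dict: a value the user already set is left untouched.
env = apply_nccl_settings(NCCL_2_12_SETTINGS, {"NCCL_PXN_DISABLE": "1"})
print(env["NCCL_PXN_DISABLE"])  # 1
```

Using `setdefault` means values exported by the job launcher take precedence over the defaults in the script. The settings only take effect if they are in place before the process initializes NCCL.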
Fixed Issues
- Fixed NVLink detection and avoided data corruption when some NVLinks are down.