NCCL Release 2.12.7
These are the NCCL 2.12.7 release notes. For previous NCCL release notes, refer to the NCCL Archives.
Compatibility
- Deep learning framework containers. Refer to the Support Matrix for the supported container version.
- This NCCL release supports CUDA 10.2, CUDA 11.0, and CUDA 11.6.
 
Key Features and Enhancements
This NCCL release includes the following key features and enhancements.
- Added NVLink-optimized network communication (PXN) to keep traffic rail-local.
- Improved alltoall latency by aggregating, within a node, the messages bound for a given destination.
- Added a new v5 plugin API with grouped receives and tags, while remaining compatible with v4 plugins.
- Added names to NCCL threads to aid debugging.
- Added support for Relaxed Ordering on InfiniBand (IB).
- Added profiling and timing infrastructure.
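As a rough illustration of why aggregating messages within a node can lower alltoall latency, the following sketch counts the network messages sent from one node to another during a single alltoall. It assumes a simplified model in which both nodes have the same number of GPUs and, when aggregation is on, all local data for a given remote rank is first gathered on one local proxy GPU (for example over NVLink) and sent as one message. The function `internode_messages` and the model itself are hypothetical; this is not NCCL's implementation.

```python
# Hypothetical model (not NCCL code): inter-node messages sent from
# node A to node B in one alltoall, with `gpus_per_node` ranks per node.

def internode_messages(gpus_per_node: int, aggregate: bool) -> int:
    """Return the number of network messages from node A to node B."""
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be at least 1")
    if aggregate:
        # Assumed scheme: every local rank forwards its data for remote
        # rank j to one local proxy rank, which sends a single combined
        # message per remote rank.
        return gpus_per_node
    # Without aggregation, every local rank sends its own message to
    # every remote rank.
    return gpus_per_node * gpus_per_node


if __name__ == "__main__":
    for g in (4, 8):
        print(g, internode_messages(g, aggregate=False),
              internode_messages(g, aggregate=True))
```

Under this model, 8 GPUs per node reduce 64 network messages per node pair to 8 larger ones. Fewer messages means less per-message overhead, which tends to dominate latency when the individual alltoall chunks are small.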
 
Fixed Issues
- Fixed NVLink detection and avoided data corruption when some NVLinks are down.