NCCL Release 2.12.7
These are the NCCL 2.12.7 release notes. For previous NCCL release notes, refer to the NCCL Archives.
Compatibility
- Deep learning framework containers: refer to the Support Matrix for the supported container version.
- This NCCL release supports CUDA 10.2, CUDA 11.0, and CUDA 11.6.
Key Features and Enhancements
This NCCL release includes the following key features and enhancements.
- Added NVLink-optimized network communication to keep traffic rail-local (PXN).
- Improved alltoall latency by aggregating messages within a node to a given destination.
- Added a new v5 plugin API with grouped receives and tags, while keeping compatibility with v4 plugins.
- Added naming of NCCL threads to help with debugging.
- Added support for Relaxed Ordering on InfiniBand (IB).
- Added profiling and timing infrastructure.
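To give a rough sense of why aggregating alltoall messages within a node can help, the following is a simplified counting model, not NCCL's implementation. It assumes that, without aggregation, every GPU sends its own network message to every GPU on every other node, and that, with aggregation, all messages leaving one node for the same destination GPU are combined into one. The function name `network_messages` is hypothetical, and the model ignores message sizes, bandwidth, and actual latency.

```python
def network_messages(nodes: int, gpus_per_node: int, aggregate: bool) -> int:
    """Count inter-node messages for one alltoall (simplified, hypothetical model).

    Assumption: intra-node traffic is not counted; only messages that must
    cross the network between different nodes are counted.
    """
    # Destination GPUs that live on some other node, as seen from any one node.
    remote_gpus = (nodes - 1) * gpus_per_node
    if aggregate:
        # One combined message per (source node, remote destination GPU).
        return nodes * remote_gpus
    # One message per (source GPU, remote destination GPU).
    return nodes * gpus_per_node * remote_gpus


if __name__ == "__main__":
    for aggregate in (False, True):
        count = network_messages(nodes=4, gpus_per_node=8, aggregate=aggregate)
        print(f"aggregate={aggregate}: {count} network messages")
```

With 4 nodes of 8 GPUs each, this model gives 768 network messages without aggregation and 96 with it, a reduction by a factor equal to the number of GPUs per node. Fewer, larger messages tend to reduce per-message overhead, which is the intuition behind the latency improvement, but the model itself makes no claim about measured performance.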
Fixed Issues
- Fixed NVLink detection and avoided data corruption when some NVLinks are down.