NCCL Release 2.9.6
These are the release notes for NCCL 2.9.6. For previous NCCL release notes, refer to the NCCL Archives.
Compatibility
- Deep learning framework containers. Refer to the Support Matrix for the supported container version.
- This NCCL release supports CUDA 10.1, CUDA 10.2, CUDA 11.0, and CUDA 11.3.
Key Features and Enhancements
- Added support for CUDA graphs.
- Improved performance for CollNet (SHARP).
- Fused PCI Gen4 switches that appear as a two-level hierarchy into a single level.
- Improved NIC balancing for communicators that use a single GPU per node.
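Because CUDA graph support is new in this release, an application that captures NCCL calls in a graph may want to confirm at startup that the loaded library is at least 2.9.6. The sketch below is one way to do that, assuming the integer version code reported by ncclGetVersion() has already been obtained. The helper names nccl_version_code and supports_cuda_graphs are hypothetical and are not part of NCCL. The encoding mirrors the NCCL_VERSION(X, Y, Z) macro in nccl.h, which uses a more compact form for releases up to 2.8.x.

```python
# Hypothetical helpers (not part of NCCL). The encoding mirrors the
# NCCL_VERSION(X, Y, Z) macro defined in nccl.h.

def nccl_version_code(major: int, minor: int, patch: int) -> int:
    """Encode a version the same way the NCCL_VERSION macro does."""
    if major <= 2 and minor <= 8:
        # Releases up to 2.8.x use the older, more compact encoding.
        return major * 1000 + minor * 100 + patch
    return major * 10000 + minor * 100 + patch


def supports_cuda_graphs(version_code: int) -> bool:
    """Return True if the version code is at least that of NCCL 2.9.6.

    The 2.9.6 threshold is taken from these release notes.
    """
    return version_code >= nccl_version_code(2, 9, 6)


if __name__ == "__main__":
    # In a real application this value would come from ncclGetVersion().
    reported = nccl_version_code(2, 9, 6)
    print(reported, supports_cuda_graphs(reported))
```

A version code from an older library, such as the one for 2.8.4, compares lower than the 2.9.6 code even though the two releases use different encodings, so a single integer comparison is enough.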
Fixed Issues
- Fixed a bootstrap hang that occurred when reordered packets caused connections to be inverted.
- Fixed a locking issue that caused NCCL calls to block until previous operations were complete.