NCCL Release 2.9.6
These are the NCCL 2.9.6 release notes. For previous NCCL release notes, refer to the NCCL Archives.
Compatibility
- Deep learning framework containers. Refer to the Support Matrix for the supported container version.
- This NCCL release supports CUDA 10.1, CUDA 10.2, CUDA 11.0, and CUDA 11.3.
Key Features and Enhancements
- Added support for CUDA graphs.
- Improved performance for CollNet (SHARP).
- Fused PCI Gen4 switches showing a two-level hierarchy into a single level.
- Improved NIC balancing for communicators using a single GPU per node.
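As an illustration of the CUDA graphs item above, the following is a minimal sketch (not taken from these release notes) of capturing an NCCL collective into a CUDA graph and replaying it. It assumes a single visible GPU, a one-rank communicator, and a CUDA 11.x toolkit, where cudaGraphInstantiate takes five arguments. The helper name graphAllReduceSingleGpu and the CHECK macros are hypothetical and exist only for this example.

```cuda
// Sketch: capture ncclAllReduce into a CUDA graph, then replay the graph.
// Assumptions: one visible GPU, NCCL 2.9 or later, CUDA 11.x toolkit.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <nccl.h>

#define CUDA_CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
  fprintf(stderr, "CUDA error: %s (%s:%d)\n", cudaGetErrorString(e_), __FILE__, __LINE__); \
  exit(EXIT_FAILURE); } } while (0)
#define NCCL_CHECK(call) do { ncclResult_t r_ = (call); if (r_ != ncclSuccess) { \
  fprintf(stderr, "NCCL error: %s (%s:%d)\n", ncclGetErrorString(r_), __FILE__, __LINE__); \
  exit(EXIT_FAILURE); } } while (0)

// Hypothetical helper: returns the first element of the receive buffer
// after the captured all-reduce has been replayed `launches` times.
float graphAllReduceSingleGpu(size_t count, int launches) {
  int dev = 0;
  CUDA_CHECK(cudaSetDevice(dev));

  ncclComm_t comm;
  NCCL_CHECK(ncclCommInitAll(&comm, 1, &dev));  // one-rank communicator

  std::vector<float> host(count, 1.0f);
  size_t bytes = count * sizeof(float);
  float *send = nullptr, *recv = nullptr;
  CUDA_CHECK(cudaMalloc(&send, bytes));
  CUDA_CHECK(cudaMalloc(&recv, bytes));
  CUDA_CHECK(cudaMemcpy(send, host.data(), bytes, cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemset(recv, 0, bytes));

  // Stream capture needs a non-default stream.
  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));

  // Record the collective into a graph instead of executing it immediately.
  cudaGraph_t graph;
  CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
  NCCL_CHECK(ncclAllReduce(send, recv, count, ncclFloat, ncclSum, comm, stream));
  CUDA_CHECK(cudaStreamEndCapture(stream, &graph));

  // CUDA 11.x signature; CUDA 12 also offers a three-argument form.
  cudaGraphExec_t exec;
  CUDA_CHECK(cudaGraphInstantiate(&exec, graph, NULL, NULL, 0));

  // Replay the captured all-reduce.
  for (int i = 0; i < launches; ++i) {
    CUDA_CHECK(cudaGraphLaunch(exec, stream));
  }
  CUDA_CHECK(cudaStreamSynchronize(stream));

  CUDA_CHECK(cudaMemcpy(host.data(), recv, bytes, cudaMemcpyDeviceToHost));

  CUDA_CHECK(cudaGraphExecDestroy(exec));
  CUDA_CHECK(cudaGraphDestroy(graph));
  CUDA_CHECK(cudaStreamDestroy(stream));
  CUDA_CHECK(cudaFree(send));
  CUDA_CHECK(cudaFree(recv));
  NCCL_CHECK(ncclCommDestroy(comm));
  return host[0];
}

int main() {
  printf("recv[0] = %f\n", graphAllReduceSingleGpu(1024, 3));
  return 0;
}
```

With a single rank, the sum reduction leaves the receive buffer equal to the send buffer, so the sketch only shows that a captured collective can be replayed. In a multi-GPU job, every rank would need to capture and launch the same sequence of collectives.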
Fixed Issues
- Fixed bootstrap hang in case of reordered packets causing connections to be inverted.
- Fixed locking issue causing NCCL calls to block until previous operations were complete.