NCCL Release 2.9.6

These are the release notes for NCCL 2.9.6. For previous NCCL release notes, refer to the NCCL Archives.


NCCL 2.9.6 has been tested with the following:

Key Features and Enhancements

This NCCL release includes the following key features and enhancements.
  • Added support for CUDA Graphs.

  • Improved performance for CollNet (SHARP).

  • Fused PCIe Gen4 switches that present a two-level hierarchy into a single level.

  • Improved NIC balancing for communicators that use a single GPU per node.

Fixed Issues

The following issues have been resolved in NCCL 2.9.6:
  • Fixed a bootstrap hang that occurred when reordered packets caused connections to be inverted.

  • Fixed a locking issue that caused NCCL calls to block until previous operations had completed.