NCCL Release 2.5.6

These are the release notes for NCCL 2.5.6. For previous NCCL release notes, refer to the NCCL Archives.

Key Features and Enhancements

This NCCL release includes the following key features and enhancements.
  • Added new protocol to improve performance on small operations.

  • Improved topology detection and tree/ring creation (#179, #262).

  • Improved multi-node tree performance by sending/receiving from different GPUs.

  • Added model-based tuning to switch between the different algorithms and protocols.

  • Reworked P2P/SHM detection in containers (#155, #248).

  • Added detection of duplicate CUDA devices; NCCL now returns an error when the same device is specified more than once (#231).

  • Added tuning for Google Cloud’s gVNIC platform.
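
As a rough illustration of the duplicate-device detection listed above (#231), the sketch below flags a device list in which the same CUDA device index appears more than once. It is not NCCL's implementation: the function names `find_duplicate_devices` and `check_devices`, and the representation of the devices as a plain list of integer indices, are assumptions made for this example.

```python
from collections import Counter


def find_duplicate_devices(devices):
    """Return the sorted device indices that appear more than once.

    `devices` is a hypothetical list of CUDA device indices, one per rank.
    """
    counts = Counter(devices)
    return sorted(dev for dev, n in counts.items() if n > 1)


def check_devices(devices):
    """Raise ValueError if any device index is listed more than once."""
    duplicates = find_duplicate_devices(devices)
    if duplicates:
        raise ValueError(f"duplicate CUDA devices in list: {duplicates}")


if __name__ == "__main__":
    check_devices([0, 1, 2, 3])          # passes silently
    print(find_duplicate_devices([0, 1, 1, 3, 0]))
```

A caller could run a check of this kind on its own device list before creating communicators, so that a misconfigured list is reported early rather than surfacing as an NCCL error.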

Compatibility

NCCL 2.5.6 has been tested with the following:

Fixed Issues

The following issues have been resolved in NCCL 2.5.6:
  • Sporadic NCCL error "ring 0 does not loop back to start" (#179).

  • NCCL doesn't form proper rings on GCP V100s (#262).
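
To make the "does not loop back to start" condition concrete, here is a minimal sketch of what a well-formed ring means: following each rank's successor from the starting rank must visit every rank exactly once and then return to the start. This is an illustration only, not NCCL's internal check; the function name `ring_loops_back` and the encoding of the ring as a list where `next_rank[i]` is the successor of rank `i` are assumptions.

```python
def ring_loops_back(next_rank, start=0):
    """Return True if the ring visits every rank once and returns to `start`.

    `next_rank[i]` is the (hypothetical) successor of rank i in the ring.
    """
    n = len(next_rank)
    seen = set()
    rank = start
    for _ in range(n):
        # A repeated or out-of-range rank means the ring is malformed.
        if rank in seen or not 0 <= rank < n:
            return False
        seen.add(rank)
        rank = next_rank[rank]
    # After n hops a proper ring is back at its starting rank.
    return rank == start


if __name__ == "__main__":
    print(ring_loops_back([1, 2, 3, 0]))  # proper 4-rank ring
    print(ring_loops_back([1, 0, 3, 2]))  # two disjoint 2-rank loops
```

The second call shows the failure mode: the path returns to rank 0 after only two hops, so ranks 2 and 3 are never reached and the list does not describe a single ring.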