NVIDIA Deep Learning NCCL Documentation
Release Notes - v2.21.5 - Last updated April 5, 2024

NCCL Release 2.4.2

Key Features and Enhancements

This NCCL release includes the following key features and enhancements.
  • Added tree-based algorithms for better AllReduce performance at scale and with small and medium-sized messages.
  • Added support for external network plugins (for example, libfabric).
  • Added the ncclCommGetAsyncError() function to report errors that occur during collective operations.
  • Added the ncclCommAbort() function to destroy a communicator and abort any outstanding operations.
  • Added support for ranks that have different CUDA_VISIBLE_DEVICES settings.
  • Added a best-effort mechanism to detect size mismatches among collective calls.
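
As a rough illustration of why the tree-based algorithms in the first item above can help at scale and with small messages, the following toy sketch simulates a sum all-reduce over a binomial tree on an in-memory array. It is not NCCL code and does not reflect NCCL's actual tree implementation; the function name tree_allreduce_sum and the "one array slot per rank" model are assumptions made only for this example.

```c
/* Toy model: vals[r] holds the single value owned by rank r.
 * Hypothetical helper, not part of the NCCL API.
 * Returns the number of communication steps the tree needed. */
static int tree_allreduce_sum(long *vals, int n)
{
    int steps = 0;

    /* Reduce phase: at each level, rank r absorbs the partial sum held
     * by rank r + stride, so the total accumulates at rank 0. */
    for (int stride = 1; stride < n; stride *= 2) {
        for (int r = 0; r + stride < n; r += 2 * stride)
            vals[r] += vals[r + stride];
        steps++;
    }

    /* Smallest power of two that is >= n. */
    int top = 1;
    while (top < n)
        top *= 2;

    /* Broadcast phase: walk the same tree in reverse, copying the
     * result from rank r down to rank r + stride. */
    for (int stride = top / 2; stride >= 1; stride /= 2) {
        for (int r = 0; r + stride < n; r += 2 * stride)
            vals[r + stride] = vals[r];
        steps++;
    }

    return steps;
}
```

In this model, 8 ranks finish in 6 steps (3 to reduce, 3 to broadcast), whereas a ring all-reduce needs 2(n - 1) = 14 steps for the same rank count. The step count grows logarithmically with the number of ranks, which is why latency-bound small and medium messages tend to benefit.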

Fixed Issues

  • Added support for communication between Mesos containers (GitHub issue #155).
  • Fixed the case where posix_fallocate() returns EINTR (GitHub issue #137).
  • NCCL threads no longer escape the CPU affinity set by the user or job scheduler.
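
For readers who hit the posix_fallocate() issue above in their own code, one common way to tolerate EINTR is to retry the call. The sketch below shows that pattern; it is not taken from the NCCL source, and the helper name fallocate_retry is hypothetical. Note that posix_fallocate() returns the error number directly rather than setting errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper: reserve len bytes starting at offset in fd,
 * retrying while the call is interrupted by a signal.
 * Returns 0 on success or a positive error number on failure. */
static int fallocate_retry(int fd, off_t offset, off_t len)
{
    int err;

    do {
        /* The return value is the error code; errno is not set. */
        err = posix_fallocate(fd, offset, len);
    } while (err == EINTR);

    return err;
}
```

A caller would check the return value against 0 and, on failure, pass it to strerror() for a readable message.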