NCCL Release 2.4.2
Key Features and Enhancements
This NCCL release includes the following key features and enhancements.
- Implemented tree-based algorithms for better AllReduce performance at scale and with small and medium-sized messages.
- Support for external network plugins (e.g., libfabric).
- Added the ncclCommGetAsyncError() function to report errors that occur during collective operations.
- Added the ncclCommAbort() function to destroy a communicator and abort any outstanding operations.
- Added support for ranks that each have a different CUDA_VISIBLE_DEVICES setting.
- Added a best-effort mechanism to check for size mismatches among collective calls.
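
As a rough illustration of why tree-based algorithms help at scale with small messages, the sketch below compares the number of sequential communication steps in a ring AllReduce with that of a reduce-then-broadcast over a single balanced binary tree. This is a simplified latency model under stated assumptions, not NCCL's actual implementation; the function names are hypothetical and are not part of the NCCL API.

```c
/* Simplified latency model (an assumption for illustration only):
 * counts sequential communication steps and ignores bandwidth. */

/* Ring AllReduce: reduce-scatter (n - 1 steps) followed by
 * all-gather (n - 1 steps). */
static int ring_allreduce_steps(int nranks) {
    return nranks > 1 ? 2 * (nranks - 1) : 0;
}

/* Depth of a balanced binary tree holding nranks nodes,
 * which equals floor(log2(nranks)). */
static int tree_depth(int nranks) {
    int depth = 0;
    while (nranks > 1) {
        nranks /= 2;
        depth++;
    }
    return depth;
}

/* Tree AllReduce: reduce up to the root, then broadcast back down,
 * so the step count is twice the tree depth. */
static int tree_allreduce_steps(int nranks) {
    return 2 * tree_depth(nranks);
}
```

Under this model, calling ring_allreduce_steps(1024) gives 2046 steps while tree_allreduce_steps(1024) gives 20, which is why latency-bound (small and medium) messages are expected to benefit most as the number of ranks grows.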
Fixed Issues
- Support communication between Mesos containers (GitHub issue #155).
- Fix case where posix_fallocate() returns EINTR (GitHub issue #137).
- NCCL threads no longer escape the CPU affinity set by the user or job scheduler.