NCCL Release 2.3.4
Key Features and Enhancements
This NCCL release includes the following key features and enhancements.
- Improved performance tuning for large numbers of ranks.
- Added the NCCL_P2P_LEVEL and NCCL_IB_GDR_LEVEL knobs for fine-grained control over when GPU Direct P2P and GPU Direct RDMA are used.
- Reduced setup time for large-scale jobs.
- Increased the maximum number of supported rings to 16.
- Added a runtime NCCL version API: ncclGetVersion().
- Added NCCL_DEBUG_SUBSYS to allow filtering of NCCL_DEBUG=INFO logging from different subsystems.
- Added support for Turing-based systems.
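
As a rough illustration of the runtime version API and the logging filter listed above, the following C sketch shows one way an application might interpret the integer written by ncclGetVersion() and set the debug variables before NCCL is initialized. The helper names (nccl_version_parts, decode_nccl_version, enable_nccl_debug_subsys) are hypothetical and not part of NCCL. The decoding assumes the NCCL_VERSION encoding from nccl.h (major*1000 + minor*100 + patch up to 2.8, major*10000 + minor*100 + patch from 2.9 on), and the subsystem names accepted by NCCL_DEBUG_SUBSYS should be checked against the documentation for the installed release.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper type, not part of the NCCL API. */
typedef struct {
    int major;
    int minor;
    int patch;
} nccl_version_parts;

/* Split a version code such as the one ncclGetVersion(int *version) writes.
 * Assumption: codes below 10000 use major*1000 + minor*100 + patch
 * (releases up to 2.8); larger codes use major*10000 + minor*100 + patch. */
static nccl_version_parts decode_nccl_version(int code) {
    nccl_version_parts v;
    assert(code >= 0);
    if (code < 10000) {
        v.major = code / 1000;
        v.minor = (code % 1000) / 100;
    } else {
        v.major = code / 10000;
        v.minor = (code % 10000) / 100;
    }
    v.patch = code % 100;
    return v;
}

/* Turn on INFO logging restricted to the given comma-separated subsystems,
 * for example "INIT,NET". Must be called before NCCL reads its environment,
 * that is, before the first communicator is created.
 * Returns 0 on success and -1 on failure. */
static int enable_nccl_debug_subsys(const char *subsystems) {
    if (subsystems == NULL || subsystems[0] == '\0') {
        return -1;
    }
    if (setenv("NCCL_DEBUG", "INFO", 1) != 0) {
        return -1;
    }
    return setenv("NCCL_DEBUG_SUBSYS", subsystems, 1) == 0 ? 0 : -1;
}

/* Print a decoded version in the usual dotted form. */
static void print_nccl_version(int code) {
    nccl_version_parts v = decode_nccl_version(code);
    printf("NCCL version %d.%d.%d\n", v.major, v.minor, v.patch);
}
```

In a real program, the code passed to these helpers would come from a call such as `int code; ncclGetVersion(&code);` after including nccl.h and checking that the call returned ncclSuccess. The helpers are kept free of NCCL headers so that the sketch compiles on its own.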
Fixed Issues
- Fixed a hang on Power platforms.
- Fixed a low inter-node bandwidth issue on multi-DGX2 systems.
- Fixed a crash when NCCL is used with a PID isolator.