NCCL Release 2.3.4

Key Features and Enhancements

This NCCL release includes the following key features and enhancements.
  • Improved performance tuning for large numbers of ranks.
  • Added the NCCL_P2P_LEVEL and NCCL_IB_GDR_LEVEL environment variables for fine-grained control over when GPU Direct P2P and GPU Direct RDMA are used.
  • Reduced setup time for large-scale jobs.
  • Increased the maximum number of supported rings to 16.
  • Added a runtime NCCL version API: ncclGetVersion().
  • Added NCCL_DEBUG_SUBSYS to allow filtering of NCCL_DEBUG=INFO logging from different subsystems.
  • Added support for Turing-based systems.
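
As an illustration of the runtime version API listed above, the following is a minimal sketch of how an application might turn the integer reported by ncclGetVersion() into a readable string. It assumes the version code is encoded as major*1000 + minor*100 + patch (so this release would report 2304), which is the convention in the nccl.h of this era; check the NCCL_VERSION_CODE definition in your header before relying on it. The type nccl_version_parts and the functions decode_nccl_version() and format_nccl_version() are hypothetical helpers, not part of NCCL.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type for this sketch; not part of the NCCL API. */
typedef struct {
    int major;
    int minor;
    int patch;
} nccl_version_parts;

/* Split a version code into its components.
 * Assumption: code = major*1000 + minor*100 + patch (e.g. 2.3.4 -> 2304). */
static nccl_version_parts decode_nccl_version(int code)
{
    nccl_version_parts v;
    assert(code >= 0);
    v.major = code / 1000;
    v.minor = (code % 1000) / 100;
    v.patch = code % 100;
    return v;
}

/* Write "major.minor.patch" into buf; returns snprintf's result. */
static int format_nccl_version(int code, char *buf, size_t len)
{
    nccl_version_parts v = decode_nccl_version(code);
    return snprintf(buf, len, "%d.%d.%d", v.major, v.minor, v.patch);
}
```

In a program that includes nccl.h and links against NCCL, the code would come from the library itself: declare an int, pass its address to ncclGetVersion(), check that the call returned ncclSuccess, and then hand the value to format_nccl_version().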

Fixed Issues

  • Fixed a hang on Power platforms.
  • Fixed a low inter-node bandwidth issue on multi-DGX2 systems.
  • Fixed a crash when used with PID isolator.