NCCL Release 2.3.4
Key Features and Enhancements
This NCCL release includes the following key features and enhancements. 
 - Improved performance tuning for large numbers of ranks.
 - Added the NCCL_P2P_LEVEL and NCCL_IB_GDR_LEVEL knobs for fine-grained control over when GPU Direct P2P and GPU Direct RDMA are used.
 - Reduced setup time for large-scale jobs.
 - Increased the maximum number of supported rings to 16.
 - Added a runtime NCCL version API: ncclGetVersion().
 - Added NCCL_DEBUG_SUBSYS to allow filtering of NCCL_DEBUG=INFO logging from different subsystems.
 - Added support for Turing-based systems.
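
The new environment knobs and the version API above can be sketched as follows. This is a minimal illustration, not NCCL code: the helper names nccl_env and decode_nccl_version are hypothetical, the integer level values and the subsystem names INIT and NET are assumptions to check against the NCCL documentation for your version, and the decoding assumes the major*1000 + minor*100 + patch encoding used by NCCL_VERSION_CODE in the 2.3.x headers.

```python
import os


def nccl_env(p2p_level=None, ib_gdr_level=None, debug_subsys=None, base=None):
    """Return a copy of `base` (default: os.environ) with NCCL knobs set.

    Hypothetical helper: NCCL itself only reads the environment variables.
    """
    env = dict(os.environ if base is None else base)
    if p2p_level is not None:
        # Assumed to be an integer level; see the NCCL docs for its meaning.
        env["NCCL_P2P_LEVEL"] = str(int(p2p_level))
    if ib_gdr_level is not None:
        env["NCCL_IB_GDR_LEVEL"] = str(int(ib_gdr_level))
    if debug_subsys:
        # NCCL_DEBUG_SUBSYS filters NCCL_DEBUG=INFO output, so enable INFO too.
        env["NCCL_DEBUG"] = "INFO"
        env["NCCL_DEBUG_SUBSYS"] = ",".join(s.upper() for s in debug_subsys)
    return env


def decode_nccl_version(code):
    """Split an integer version code into (major, minor, patch).

    Assumes code = major*1000 + minor*100 + patch, the encoding of this era.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch
```

The returned dictionary can be passed as the env argument of subprocess.run when launching an NCCL application. The integer written by ncclGetVersion() could then be passed to decode_nccl_version to compare the runtime library against the version the application was built with.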
 
Fixed Issues
 - Fixed a hang on Power platforms.
 - Fixed low inter-node bandwidth on multi-DGX2 systems.
 - Fixed a crash when used with PID isolator.