NCCL Release 2.11.4
These are the release notes for NCCL 2.11.4. For earlier releases, refer to the NCCL Archives.
Compatibility
Key Features and Enhancements
This NCCL release includes the following key features and enhancements.
- Added a new API for creating a reduction operation that multiplies the input by a rank-specific scalar before performing an inter-rank summation (see ncclRedOpCreatePreMulSum).
- Improved CollNet (SHARP) performance of ncclAllReduce when captured in a CUDA Graph, through user buffer registration.
- Added the NCCL_NET_PLUGIN="<suffix>" environment variable, which lets the user choose among multiple NCCL net plugins by substituting the suffix into libnccl-net-<suffix>.so.
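The arithmetic behind the pre-multiplied sum reduction can be illustrated with a small host-side reference model. This is only a sketch of the math described in the feature note, not NCCL code: the function name premul_sum_reference and its parameters are hypothetical, it runs on the CPU, and it does not call ncclRedOpCreatePreMulSum. Refer to the NCCL API reference for the actual call and its arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference model (not part of NCCL).
 * inputs[r]  : buffer contributed by rank r, holding `count` floats
 * scalars[r] : rank-specific scalar applied to rank r's input
 * out[i]     : sum over all ranks r of scalars[r] * inputs[r][i]
 */
static void premul_sum_reference(const float *const *inputs,
                                 const float *scalars,
                                 size_t nranks, size_t count,
                                 float *out)
{
    assert(inputs != NULL && scalars != NULL && out != NULL);

    for (size_t i = 0; i < count; ++i) {
        float acc = 0.0f;
        for (size_t r = 0; r < nranks; ++r) {
            /* Scale each rank's element first, then accumulate. */
            acc += scalars[r] * inputs[r][i];
        }
        out[i] = acc;
    }
}
```

As one possible use, giving every rank the scalar 1/nranks turns the summation into an average across ranks.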
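The suffix substitution performed for NCCL_NET_PLUGIN can be sketched as a small string-building helper. This is an illustration of the naming rule in the feature note, not NCCL's own loader code: the helper name net_plugin_library_name is hypothetical, and the fallback to libnccl-net.so when no suffix is given is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of NCCL): builds the plugin library
 * file name for a given suffix.
 * Returns the length of the name, or -1 if it does not fit in buf.
 */
static int net_plugin_library_name(const char *suffix,
                                   char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL && buflen > 0);

    if (suffix != NULL && suffix[0] != '\0') {
        /* Substitute the suffix into libnccl-net-<suffix>.so. */
        n = snprintf(buf, buflen, "libnccl-net-%s.so", suffix);
    } else {
        /* Assumption for this sketch: no suffix means the default name. */
        n = snprintf(buf, buflen, "libnccl-net.so");
    }

    /* snprintf reports the length it needed; reject truncated names. */
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

A caller could pass the result of getenv("NCCL_NET_PLUGIN") as the suffix, so that NCCL_NET_PLUGIN="foo" would yield libnccl-net-foo.so.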
Fixed Issues
- Fixed a memory leak of NVB connections.
- Fixed a crash of ncclGroup() containing mixed datatypes/operations (GitHub issue #560, introduced in NCCL 2.10.3).
- Fixed topology detection of IB Virtual Functions (SR-IOV).