NCCL Release 2.11.4

These are the release notes for NCCL 2.11.4. For previous NCCL release notes, refer to the NCCL Archives.


NCCL 2.11.4 has been tested with the following:

Key Features and Enhancements

This NCCL release includes the following key features and enhancements.

  • Added a new API for creating a reduction operation that multiplies the input by a rank-specific scalar before performing the inter-rank summation (see ncclRedOpCreatePreMulSum).

  • Improved the CollNet (SHARP) performance of ncclAllReduce when it is captured in a CUDA Graph, by using user buffer registration.

  • Added the environment variable NCCL_NET_PLUGIN=<suffix>, which lets the user choose among multiple NCCL net plugins. The suffix is substituted into the library name libnccl-net-<suffix>.so.
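
The suffix substitution described in the last item can be illustrated with a small C helper. This is only a sketch of the naming rule, not NCCL's own loader code: the function name net_plugin_library_name is hypothetical, and the fallback to libnccl-net.so when no suffix is given is an assumption that these notes do not state.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build the shared-library name for a net plugin.
 * suffix   - the value of NCCL_NET_PLUGIN, or NULL/"" when it is unset
 * out      - buffer that receives the library name
 * out_size - size of the buffer in bytes
 * Returns what snprintf returns: the length of the full name. */
int net_plugin_library_name(const char *suffix, char *out, size_t out_size) {
    if (suffix == NULL || suffix[0] == '\0') {
        /* Assumption: without a suffix, the default name is used. */
        return snprintf(out, out_size, "libnccl-net.so");
    }
    /* Substitute the suffix into libnccl-net-<suffix>.so. */
    return snprintf(out, out_size, "libnccl-net-%s.so", suffix);
}
```

Calling net_plugin_library_name(getenv("NCCL_NET_PLUGIN"), buf, sizeof buf) with the variable set to a hypothetical value such as "example" would produce libnccl-net-example.so.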

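The arithmetic behind the pre-multiplied sum in the first item above can be modeled on the host. The sketch below does not call NCCL and says nothing about how ncclRedOpCreatePreMulSum is implemented. It only computes the result such a reduction is described as producing: each rank's input is scaled by that rank's scalar, and the scaled values are then summed across ranks. The function name premul_sum_reference and the flat row-per-rank layout of the inputs are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical host-only reference model of a pre-multiplied sum:
 *   out[i] = sum over ranks r of scalars[r] * inputs[r * count + i]
 * inputs  - nranks rows of count elements, one row per rank
 * scalars - one scalar per rank
 * out     - count elements receiving the reduced result */
void premul_sum_reference(const float *inputs, const float *scalars,
                          size_t nranks, size_t count, float *out) {
    for (size_t i = 0; i < count; ++i) {
        float acc = 0.0f;
        for (size_t r = 0; r < nranks; ++r) {
            /* Scale this rank's contribution before adding it to the sum. */
            acc += scalars[r] * inputs[r * count + i];
        }
        out[i] = acc;
    }
}
```

With two ranks holding {1, 2} and {3, 4} and scalars 0.5 and 2, the model yields {6.5, 9}. Giving every rank the scalar 1/nranks would turn the summation into an average, which is one plausible use of such an operation.
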
Fixed Issues

The following issues have been resolved in NCCL 2.11.4:

  • Fixed memory leak of NVB connections.

  • Fixed crash of ncclGroup() containing mixed datatypes/operations (GitHub issue #560, introduced in NCCL 2.10.3).

  • Fixed topology detection of IB Virtual Functions (SR-IOV).