NVIDIA HPC-X Software Toolkit Rev 2.26

NCCL-RDMA-SHARP Plugins

NCCL-RDMA-SHARP plugins enable RDMA and switch-based collectives (SHARP) with NVIDIA's NCCL library.

The plugin replaces NCCL's default internal inter-node communication with RDMA-based transports. It implements both the point-to-point transport and the collective transport (CollNet), including the SHARP collective transport.

The environment variable NCCL_IBEXT_DISABLE controls whether the plugin is used. Setting NCCL_IBEXT_DISABLE=1 disables the plugin, and NCCL falls back to its native internal communication.
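As a minimal sketch of the fallback described above, the variable can be exported in the shell that launches the job. Only NCCL_IBEXT_DISABLE comes from this page; the application name in the comment is a hypothetical placeholder.

```shell
# Disable the plugin so that NCCL uses its native inter-node communication.
export NCCL_IBEXT_DISABLE=1

# Confirm what the launched processes will inherit.
echo "NCCL_IBEXT_DISABLE=${NCCL_IBEXT_DISABLE}"

# Launch the NCCL application from this shell (hypothetical binary name):
# ./my_nccl_app
```

Comparing a run with the variable set against a run without it can help isolate whether a problem originates in the plugin or elsewhere.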

The following environment variable enables SHARP aggregation with NCCL when using the plugin.

NCCL_COLLNET_ENABLE=1

Note

NVIDIA switches support a limited number of streaming aggregation flows (maximum: 2). On systems with multiple GPUs and multiple HCAs, NCCL creates one streaming aggregation flow (NCCL ring/channel) per HCA rail. The cluster topology must therefore be built so that each leaf-level switch is connected to the same HCA rail on every server.

The following environment variable enables SHARP Allgather overlap when using the plugin. This is useful when SHARP-based Reduce-Scatter is enabled, because the Reduce-Scatter and Allgather phases can then overlap.

SHARP_COLLNET_OVERLAP_AG=1
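The two SHARP settings on this page could be combined in a job script along the following lines. This is a sketch under stated assumptions: NCCL_DEBUG is NCCL's standard logging variable and is not mentioned on this page, and the commented launch line uses Open MPI's `-x` option with a hypothetical process count and binary name.

```shell
# Enable SHARP aggregation through the plugin's CollNet transport.
export NCCL_COLLNET_ENABLE=1

# Enable overlap of the SHARP Allgather with Reduce-Scatter.
export SHARP_COLLNET_OVERLAP_AG=1

# Optional and not from this page: ask NCCL to log its initialization details.
export NCCL_DEBUG=INFO

# Show the settings that the launched processes will inherit.
env | grep -E '^(NCCL_COLLNET_ENABLE|SHARP_COLLNET_OVERLAP_AG|NCCL_DEBUG)=' | sort

# Hypothetical launch line; -x forwards each variable to the remote ranks:
# mpirun -np 16 -x NCCL_COLLNET_ENABLE -x SHARP_COLLNET_OVERLAP_AG -x NCCL_DEBUG ./my_nccl_app
```

Forwarding the variables explicitly matters on multi-node runs, since the remote ranks do not necessarily inherit the launching shell's environment.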

© Copyright 2026, NVIDIA. Last updated on Mar 1, 2026