# InfiniBand Topology Provider
Topograph provides two variants of the InfiniBand provider. Both discover the IB fabric switch tree using `ibnetdiscover`, which is useful for any cluster — CPU-only, mixed, or GPU-accelerated — where topology-aware scheduling across an InfiniBand fabric improves workload performance. NVLink domain discovery is an additional capability that applies only to nodes with NVLink-connected NVIDIA GPUs.
Why automate IB discovery? Hand-maintaining IB topology — a static topology.conf or a set of hand-applied node labels — is feasible at ~32 nodes with a stable network and a careful operator. It does not scale. At 1,000 nodes with InfiniBand fabric churn, NVLink partitions shifting with tenant allocation, and a constant background rate of link degradation and node cycling, manual maintenance becomes the dominant source of scheduling misplacement. Topograph keeps topology data current as the cluster changes, removing that burden.
The choice between them depends on the deployment environment:

- Use `infiniband-bm` for bare-metal clusters (e.g. Slurm)
- Use `infiniband-k8s` for Kubernetes clusters
If NetQ is deployed in your environment, consider using the NetQ provider instead — it discovers topology via the NetQ management API rather than directly from the fabric, which avoids node access requirements and is the standard approach for Spectrum-X environments.
For Multi-Node NVLink (MNNVL) Kubernetes clusters (e.g. GB200 NVL72), use the DRA provider instead — it reads `nvidia.com/gpu.clique` labels set by the GPU Operator’s DRA driver and is the Kubernetes-native integration path for MNNVL topology.
Both variants are presently single-region only (multi-region requests return a 400 Bad Request error). No CSP credentials are required.
## Output
Both variants produce the same topology representation, which is then consumed by whichever engine you configure:

- Slurm engine (`engine: slurm`) — writes a `topology.conf` file describing the switch tree, used by the Slurm topology plugin for topology-aware scheduling
- Kubernetes engine (`engine: k8s`) — applies `network.topology.nvidia.com/` labels to nodes reflecting their position in the switch hierarchy and (where applicable) their NVLink domain
- Slinky engine (`engine: slinky`) — writes topology data to a Kubernetes ConfigMap for Slurm-on-Kubernetes deployments
See the engine documentation (`docs/engines/`) for details on each output format.
## infiniband-bm (Bare-Metal)
### Prerequisites
- `pdsh` must be installed on the node running Topograph and able to reach at least one node per IB fabric segment — Topograph discovers the full fabric from a single entry point per segment, so not every node needs to be reachable via `pdsh`
- `ibnetdiscover` must be available on cluster nodes (invoked via `pdsh` with `sudo`) — part of the standard `infiniband-diags` package (`dnf install infiniband-diags` / `apt install infiniband-diags`), expected to already be present on any properly configured IB system
- NVIDIA GPU driver is required on nodes with NVLink-connected GPUs — used to collect NVLink clique IDs via `nvidia-smi`. Nodes without NVLink are included in the IB switch tree but excluded from block topology.
### How It Works
- Runs `sudo ibnetdiscover` via `pdsh` on one node per IB fabric segment to map the full switch tree
- On NVIDIA GPU nodes: runs `nvidia-smi -q | grep "ClusterUUID\|CliqueId" | sort -u` via `pdsh` across all nodes to collect NVLink clique IDs. The resulting `accelerator` label value is `ClusterUUID.CliqueId` — the same format as `nvidia.com/gpu.clique` set by the GPU Operator device plugin on MNNVL systems.
- Combines the switch tree and any NVLink clique data into the topology graph
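The clique-collection step above can be sketched locally. The `nvidia-smi -q` output below is a fabricated sample (real values come from the driver); the pipeline is the same one Topograph runs via `pdsh`:

```shell
#!/bin/sh
# Simulated fragment of `nvidia-smi -q` output for one node.
# On MNNVL systems every GPU in the same NVLink domain reports the
# same ClusterUUID/CliqueId pair, so `sort -u` collapses them to one.
sample_output='
        ClusterUUID                       : 11111111-2222-3333-4444-555555555555
        CliqueId                          : 4
        ClusterUUID                       : 11111111-2222-3333-4444-555555555555
        CliqueId                          : 4
'

# Same filter Topograph uses: keep the two fields, deduplicate.
pairs=$(printf '%s\n' "$sample_output" | grep "ClusterUUID\|CliqueId" | sort -u)
echo "$pairs"

# The accelerator label value is then formed as ClusterUUID.CliqueId,
# e.g. 11111111-2222-3333-4444-555555555555.4 for this sample.
```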
### Configuration
No credentials or parameters are required. Set `provider: infiniband-bm` in your Topograph config:
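A minimal sketch of the relevant config lines; only the `provider` value comes from this section, and the `engine` value is illustrative (any engine from the Output section applies):

```yaml
provider: infiniband-bm
engine: slurm   # illustrative; k8s and slinky are also valid per the Output section
```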
### Verifying the Output
After triggering topology generation, query the result endpoint:
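As a sketch (the host, port, and endpoint path are placeholders; substitute the values from your Topograph deployment):

```console
$ curl -s http://<topograph-host>:<port>/v1/topology
```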
For the Slurm engine, verify the generated `topology.conf` reflects the expected switch hierarchy. See the Slurm engine documentation for details.
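For orientation, a generated file has the shape below. The switch and node names here are invented; real names come from the discovered fabric:

```
SwitchName=spine01 Switches=leaf[01-02]
SwitchName=leaf01 Nodes=node[01-04]
SwitchName=leaf02 Nodes=node[05-08]
```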
## infiniband-k8s (Kubernetes)
### Prerequisites
- Topograph deployed via Helm — the node-data-broker DaemonSet (a Topograph subchart, enabled by default) collects NVLink clique IDs from each node and stores them as Kubernetes node annotations (`topograph.nvidia.com/cluster-id`)
- NVIDIA GPU Operator — standard on NVIDIA GPU Kubernetes clusters; manages the device plugin DaemonSet used to read NVLink clique IDs. Required only for NVLink domain discovery; on clusters without NVLink-connected GPUs this does not apply, and the provider will still discover the IB switch tree.
### How It Works
- Runs `ibnetdiscover` by exec-ing into a node-data-broker pod on each node to map the switch tree
- On NVIDIA GPU nodes: reads NVLink clique IDs from the `topograph.nvidia.com/cluster-id` node annotations set by the node-data-broker. The resulting `accelerator` label value is `ClusterUUID.CliqueId` — the same format as `nvidia.com/gpu.clique` set by the GPU Operator device plugin on MNNVL systems.
- Combines the switch tree and any NVLink clique data into the topology graph
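To spot-check that the broker has populated the annotation on a given node (the node name is a placeholder):

```console
$ kubectl get node <node-name> \
    -o jsonpath='{.metadata.annotations.topograph\.nvidia\.com/cluster-id}'
```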
### Configuration
No credentials are required. The provider uses the in-cluster service account automatically.
Set `provider: infiniband-k8s` in your Topograph config:
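A minimal sketch; only the `provider` value comes from this section, and the `engine` value is illustrative:

```yaml
provider: infiniband-k8s
engine: k8s   # illustrative; slinky is also common for Slurm-on-Kubernetes
```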
### Parameters
The following optional parameter can be passed in the topology request payload:

- `nodeSelector`: a Kubernetes label selector that limits topology discovery to matching nodes
To override the GPU Operator namespace or device plugin DaemonSet name (defaults: `gpu-operator` and `nvidia-device-plugin-daemonset`), set these via `node-data-broker.initc.extraArgs` in your Helm values — they are init container arguments, not provider request parameters:
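A Helm values sketch. The `node-data-broker.initc.extraArgs` path comes from this section, but the flag names below are placeholders, not verified arguments; check the init container's actual flags before using them:

```yaml
node-data-broker:
  initc:
    extraArgs:
      - "--gpu-operator-namespace=<namespace>"        # placeholder flag name
      - "--device-plugin-daemonset=<daemonset-name>"  # placeholder flag name
```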
Example request payload with `nodeSelector`:
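A sketch of such a payload. The envelope fields (`provider`/`engine` objects) and the exact placement of `nodeSelector` are assumptions; only the parameter name comes from this section:

```json
{
  "provider": { "name": "infiniband-k8s" },
  "engine": { "name": "k8s" },
  "nodeSelector": { "node-role.kubernetes.io/worker": "" }
}
```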
### Verifying the Output
After topology generation, inspect the node labels applied by Topograph:
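For example, using `kubectl`'s label columns (the specific label keys shown are illustrative; the engine docs define the full schema under the `network.topology.nvidia.com/` prefix):

```console
$ kubectl get nodes \
    -L network.topology.nvidia.com/block \
    -L network.topology.nvidia.com/spine
```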
See the Kubernetes engine documentation for details on the label schema.