Configuration Reference
The Launch Kit configuration file (typically cluster-config.yaml, produced by l8k discover and consumed by l8k generate) is YAML. This page documents every top-level section.
    networkOperator:
      selectedRelease: "26.4"
      version: v26.4.0
      componentVersion: network-operator-v26.4.0
      repository: nvcr.io/nvidia/cloud-native
      namespace: nvidia-network-operator
      imagePullSecrets: []
      docsBaseURL: https://docs.nvidia.com/networking/display/kubernetes2610
    docaDriver:
      enable: true
      version: doca3.4.0-26.04-0.8.6.0-0
      unloadStorageModules: false
      enableNFSRDMA: false
      unloadThirdPartyRDMAModules: false
      skipPreflightChecks: false
    nvIpam:
      poolName: nv-ipam-pool
      startingSubnet: "192.168.2.0"
      mask: 24
      offset: 1
    sriov:
      ethernetMtu: 9000
      infinibandMtu: 4000
      numVfs: 8
      priority: 90
      resourceName: sriov_resource
      networkName: sriov-network
    hostdev:
      resourceName: hostdev-resource
      networkName: hostdev-network
    rdmaShared:
      resourceName: rdma_shared_resource
      hcaMax: 63
    ipoib:
      networkName: ipoib-network
    macvlan:
      networkName: macvlan-network
    nicConfigurationOperator:
      deployNicInterfaceNameTemplate: true
      rdmaPrefix: "rdma_r%rail%"
      netdevPrefix: "eth_r%rail%"
    spectrumX:
      nicType: "1023"
      overlay: "none"
      rdmaPrefix: "roce_p%plane%_r%rail%"
      netdevPrefix: "eth_p%plane%_r%rail%"
    workload:
      manifest: ""
    profile:
      fabric: ethernet
      deployment: sriov
      multirail: false
      spectrumX:
        spcxVersion: "RA2.1"
        multiplaneMode: swplb
        numberOfPlanes: 4
      ai: false
    clusterConfig:
      - identifier: "dgx-b200-nvidia-b200"
        machineType: "DGX-B200"
        productType: "NVIDIA-B200"
        capabilities:
          nodes:
            sriov: true
            rdma: true
            ib: false
        workerNodes: ["worker-0", "worker-1"]
        nodeSelector:
          nvidia.kubernetes-launch-kit.machine: "DGX-B200-NVIDIA-B200"
        thirdPartyRDMAModules: []
        storageModules: []
        linkType: Ethernet
        pfs:
          - deviceID: "1023"
            pciAddress: "0000:05:00.0"
            rdmaDevice: "mlx5_0"
            networkInterface: "net1"
            traffic: east-west
            rail: 0
The networkOperator section sets the Network Operator version, image registry, namespace, and pull secrets.

| Field | Description |
|---|---|
| selectedRelease | Pin to a release line. Supported: 25.10, 26.1, 26.4. Auto-fills version and image tags from an embedded catalog. Equivalent to the --network-operator-release flag. |
| version | Explicit Network Operator version. Overrides the catalog when set. |
| componentVersion | Tag for component images (CNI, device plugins, etc.). |
| repository | Container registry (default: nvcr.io/nvidia/mellanox). |
| namespace | Operator namespace (default: nvidia-network-operator). |
| imagePullSecrets | List of secret names. Propagated to NicClusterPolicy.spec.global.imagePullSecrets and per-group NicNodePolicy sub-specs. |
| docsBaseURL | Documentation URL embedded in generated annotations. |
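
As a minimal sketch of release pinning, the fragment below sets only selectedRelease and leaves version and componentVersion unset so that the embedded catalog can fill them in. The secret name is hypothetical, and the assumption that the other fields may simply be omitted is inferred from the field descriptions above rather than confirmed.

```yaml
networkOperator:
  selectedRelease: "26.4"        # release line; version and image tags come from the catalog
  namespace: nvidia-network-operator
  imagePullSecrets:
    - my-registry-secret         # hypothetical secret name
```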
The docaDriver section configures the DOCA (OFED) driver and the validation of kernel driver dependencies.

| Field | Description |
|---|---|
| enable | Include the DOCA (OFED) driver in generated manifests. Set to false to skip it. The --enable-doca-driver flag overrides this value. |
| version | DOCA driver version tag. |
| unloadStorageModules | Unload storage-over-RDMA modules (nvme_rdma, ib_isert, rpcrdma, …). Auto-set to true during discovery if such modules are detected. |
| unloadThirdPartyRDMAModules | Unload third-party RDMA modules (rdma_rxe, qedr, bnxt_re, …). Auto-set to true during discovery if such modules are detected. Storage and third-party module lists are sourced from the doca-driver-build project. |
| enableNFSRDMA | Enable NFS-over-RDMA support. |
| skipPreflightChecks | Skip the kernel driver dependencies validation. Useful when the environment is already known to be good. |

See Discover Workflow for how OFED-dependent modules are detected.
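
For illustration, a config written by discovery on a cluster where storage-over-RDMA modules were found might contain a fragment like the one below. The module name is one of the examples listed for unloadStorageModules; the entry format of storageModules and the rest of the group entry (omitted here) are assumptions.

```yaml
docaDriver:
  enable: true
  unloadStorageModules: true          # auto-set by discovery when such modules are detected
  unloadThirdPartyRDMAModules: false
clusterConfig:
  - identifier: "dgx-b200-nvidia-b200"
    storageModules: ["nvme_rdma"]     # assumed entry format
    thirdPartyRDMAModules: []
```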
The nvIpam section configures NV-IPAM. Either provide an explicit subnets list or let Launch Kit auto-generate non-overlapping subnets per node group.

| Field | Description |
|---|---|
| poolName | Pool name used in IPPool CRs. |
| subnets | Explicit list of {subnet, gateway} entries. Mutually exclusive with the auto-generation fields. |
| startingSubnet | First subnet for auto-generation (e.g., 192.168.2.0). |
| mask | Prefix length for auto-generated subnets. |
| offset | Increment used between auto-generated subnets. |
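
The explicit form might look like the following sketch. The subnet and gateway keys come from the subnets description above; the CIDR notation and the addresses themselves are assumptions chosen for illustration. Because subnets is mutually exclusive with the auto-generation fields, startingSubnet, mask, and offset are left out.

```yaml
nvIpam:
  poolName: nv-ipam-pool
  subnets:
    - subnet: "192.168.2.0/24"   # assumed CIDR notation
      gateway: "192.168.2.1"
    - subnet: "192.168.3.0/24"
      gateway: "192.168.3.1"
```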
The sriov, hostdev, rdmaShared, ipoib, and macvlan sections hold profile-specific parameters. Only the section for the selected profile is consumed.

| Section | Field | Description |
|---|---|---|
| sriov | ethernetMtu / infinibandMtu | MTU values per fabric. |
| sriov | numVfs | Number of virtual functions per PF. |
| sriov | priority | SriovNetworkNodePolicy priority. |
| sriov | resourceName / networkName | Kubernetes resource and network names. |
| hostdev | resourceName / networkName | Kubernetes resource and network names for host-device. |
| rdmaShared | resourceName | Kubernetes resource name. |
| rdmaShared | hcaMax | Maximum HCAs per host (soft limit). |
| ipoib | networkName | IPoIB network name. |
| macvlan | networkName | MacVLAN network name. |
The nicConfigurationOperator section controls when the NIC Configuration Operator templates NIC interface names.

| Field | Description |
|---|---|
| deployNicInterfaceNameTemplate | Acts as an “enable when needed” switch: templates are deployed when groups have cross-rail PCI conflicts or when names are otherwise ambiguous. See Heterogeneous Clusters. |
| rdmaPrefix | RDMA device naming template (default: rdma_r%rail%). |
| netdevPrefix | Netdev naming template (default: eth_r%rail%). |
The top-level spectrumX section holds Spectrum-X-specific settings.

| Field | Description |
|---|---|
| nicType | NIC type device ID. 1021 = ConnectX-7 NIC; 1023 = ConnectX-8 SuperNIC; a2dc = BlueField-3 SuperNIC. |
| overlay | Overlay mode. |
| rdmaPrefix | RDMA device naming template with %plane% and %rail% substitutions. |
| netdevPrefix | Netdev naming template with %plane% and %rail% substitutions. |
The workload section points Launch Kit at an optional custom workload manifest.

| Field | Description |
|---|---|
| manifest | Path to a custom workload manifest. When set, Launch Kit patches it with network annotations, resource requests, and node affinity instead of generating an example DaemonSet. See Generate Workflow. |
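
A minimal sketch of pointing Launch Kit at an existing manifest follows. The path is hypothetical; leaving manifest empty, as in the full sample on this page, keeps the generated example DaemonSet.

```yaml
workload:
  manifest: "./my-workload.yaml"   # hypothetical path to your own manifest
```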
The profile section selects the deployment profile. The selection can also be overridden via CLI flags.

| Field | Description |
|---|---|
| fabric | ethernet or infiniband. |
| deployment | sriov, rdma_shared, or host_device. |
| multirail | Enable multirail. |
| spectrumX.spcxVersion | Spectrum-X reference architecture (RA2.1 or RA2.2). |
| spectrumX.multiplaneMode | Multiplane mode: hwplb, swplb, uniplane, none. |
| spectrumX.numberOfPlanes | Number of planes. |
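
To show how the profile selection ties to the profile-specific sections, here is a sketch of an InfiniBand SR-IOV selection. All values are taken from the sample config on this page. That this particular fabric and deployment pairing is supported is an assumption, inferred from the sriov section exposing an infinibandMtu field.

```yaml
profile:
  fabric: infiniband
  deployment: sriov
  multirail: false
sriov:                        # the only profile-specific section consumed for deployment: sriov
  infinibandMtu: 4000
  numVfs: 8
  priority: 90
  resourceName: sriov_resource
  networkName: sriov-network
```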
The clusterConfig section lists the discovered node groups, populated by l8k discover. Each entry describes one group.

| Field | Description |
|---|---|
| identifier | Sanitised <machineType>-<gpuType> (e.g. dgx-b200-nvidia-b200) when both fields are resolved; group-0 / group-1 fallback when they aren't. Used as the NicNodePolicy / SriovNetworkNodePolicy name suffix. |
| machineType / productType | Hardware type strings (e.g., DGX-B200 / NVIDIA-B200). |
| capabilities.nodes.sriov / rdma / ib | Boolean flags reflecting hardware capability. |
| workerNodes | List of node names in this group. |
| nodeSelector | Per-group selector. After l8k discover, this is {nvidia.kubernetes-launch-kit.machine: <machineType>-<gpuType>}, a label that discovery writes onto every node in the group. When l8k generate auto-merges groups sharing a GPU type, the merged group falls back to {nvidia.kubernetes-launch-kit.gpu: <gpuType>} instead, because different source machineTypes can't share a single machine label. Discovery writes both labels onto every node, so the merged selector has a value to bind to. Configs from earlier l8k versions with old-style differential nodeSelectors are preserved as-is. |
| thirdPartyRDMAModules / storageModules | OFED-dependent modules detected on the group. |
| presetApplied | true when a topology preset matched (machineType, gpuType) and was applied. |
| presetDeviation | List of field-level discrepancies between the matched preset and discovered hardware. Non-empty means the preset was applied but the cluster differs from the preset. Each entry has field (pciAddress / deviceID), expected, got, and detail. See Cluster Topology Presets “Validation and Deviations”. |
| linkType | The discovered fabric for the group: Ethernet or InfiniBand. Populated by the fabric probe only when every east-west port produces a confirmed verdict (port ACTIVE plus, for IB, a subnet manager is present) and they agree. When omitted, discovery couldn't prove the cluster's fabric, and downstream code should treat the absence as “unknown”. See Discover Workflow “Fabric Type Detection”. |
| pfs | List of physical functions. Each entry has deviceID, pciAddress, rdmaDevice, networkInterface, traffic (east-west or north-south), and rail (sequential index for east-west PFs). |

North-south PFs are listed for visibility but filtered out of generated manifests. See Overview and Discover Workflow.
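
As a sketch of how a mixed pfs list might look, the fragment below pairs the east-west PF from the sample config with a hypothetical north-south PF. The second entry's PCI address, device names, and device ID are invented for illustration, and omitting rail on it is an assumption based on rail being described as an index for east-west PFs.

```yaml
pfs:
  - deviceID: "1023"
    pciAddress: "0000:05:00.0"
    rdmaDevice: "mlx5_0"
    networkInterface: "net1"
    traffic: east-west
    rail: 0
  - deviceID: "a2dc"              # hypothetical north-south PF
    pciAddress: "0000:0b:00.0"    # hypothetical address
    rdmaDevice: "mlx5_1"
    networkInterface: "net2"
    traffic: north-south          # listed for visibility, filtered out of generated manifests
```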
CLI Reference — flag reference
Discover Workflow — how this file is produced
Generate Workflow — how this file is consumed