To tailor the Network Operator deployment to your cluster's needs, use the following parameters:

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| operator.admissionController.enabled | Bool | False | Deploy with admission controller |
| operator.admissionController.useCertManager | Bool | False | Use cert-manager to generate a self-signed certificate |
| operator.admissionController.certificate.tlsCrt | String | "" | External TLS certificate. Ignored if cert-manager is used |
| operator.admissionController.certificate.tlsKey | String | "" | External TLS private key. Ignored if cert-manager is used |
| nfd.enabled | Bool | True | Deploy Node Feature Discovery |
| nfd.deployNodeFeatureRules | Bool | True | Deploy Node Feature Rules to label the nodes |
| sriovNetworkOperator.enabled | Bool | False | Deploy SR-IOV Network Operator |
| sriovNetworkOperator.configDaemonNodeSelectorExtra | List | node-role.kubernetes.io/worker: "" | Additional values for the SR-IOV Config Daemon node selector |
| upgradeCRDs | Bool | True | Enable CRD upgrades with Helm pre-install and pre-upgrade hooks |
| operator.repository | String | nvcr.io/nvidia | Network Operator image repository |
| operator.image | String | network-operator | Network Operator image name |
| operator.tag | String | None | Network Operator image tag. If set to None, the chart's appVersion is used |
| operator.imagePullSecrets | List | [] | An optional list of references to secrets to use for pulling the Network Operator image |
| operator.cniBinDirectory | String | /opt/cni/bin | Directory where CNI binaries are deployed on the nodes. For the SR-IOV Network Operator, this is set with the `sriov-network-operator.cniBinPath` parameter. The CNI bin directory should match the CNI bin directory configured in the container runtime |
| operator.resources | Yaml | See below | Optional resource requests and limits for the operator |
| imagePullSecrets | List | [] | An optional list of references to secrets to use for pulling any of the Network Operator images, unless overridden |
| deployCR | Bool | False | Deploy the NicClusterPolicy custom resource according to the provided parameters |
| nodeAffinity | Yaml | See below | Node affinity settings for Network Operator components |
| tolerations | Yaml | "" | Additional tolerations for the DaemonSets deployed by the Network Operator, e.g. whereabouts, multus, cni-plugins |
| useDTK | Bool | True | Use the Driver ToolKit to compile OFED drivers (OpenShift only) |

Default value of operator.resources:

limits:
  cpu: 500m
  memory: 128Mi
requests:
  cpu: 5m
  memory: 64Mi

Default value of nodeAffinity:

requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
    - matchExpressions:
        - key: node-role.kubernetes.io/master
          operator: DoesNotExist
        - key: node-role.kubernetes.io/control-plane
          operator: DoesNotExist
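The following `values.yaml` fragment is a minimal sketch that combines several of the parameters above. It enables the admission controller with cert-manager, keeps NFD and its Node Feature Rules, and deploys the NicClusterPolicy from the chart values.

The sketch makes two assumptions. First, cert-manager is already installed. Second, `tolerations` accepts a standard Kubernetes toleration list. The taint key `example.com/dedicated` is hypothetical and only illustrates the shape of an entry.

```yaml
# values.yaml (sketch): general Network Operator settings
operator:
  admissionController:
    enabled: true
    useCertManager: true   # assumes cert-manager is already installed
nfd:
  enabled: true
  deployNodeFeatureRules: true
deployCR: true             # render the NicClusterPolicy from these values
# Extra tolerations for operator-managed DaemonSets (e.g. multus, whereabouts)
tolerations:
  - key: example.com/dedicated   # hypothetical taint key
    operator: Exists
    effect: NoSchedule
```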

The NFD labels required by the Network Operator and GPU Operator:

| Label | Location |
| --- | --- |
| feature.node.kubernetes.io/pci-15b3.present | Nodes containing NVIDIA Networking hardware |
| feature.node.kubernetes.io/pci-10de.present | Nodes containing NVIDIA GPU hardware |
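These labels can be consumed like any other node label. As a sketch, a workload can be restricted to nodes with NVIDIA Networking hardware through a `nodeSelector`. This assumes NFD sets the label value to `"true"`, its usual value for `.present` PCI labels. The Pod name and image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nic-test-pod          # hypothetical name
spec:
  nodeSelector:
    # Schedule only onto nodes where NFD detected NVIDIA Networking hardware (PCI vendor 15b3)
    feature.node.kubernetes.io/pci-15b3.present: "true"
  containers:
    - name: app
      image: busybox:1.36     # placeholder image
      command: ["sleep", "3600"]
```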

SR-IOV Network Operator Helm chart customization options are documented with the SR-IOV Network Operator Helm chart. The following values are overridden by the NVIDIA Network Operator Helm chart:

| Name | Type | Default in NVIDIA Network Operator | Notes |
| --- | --- | --- | --- |
| sriov-network-operator.operator.resourcePrefix | String | nvidia.com | |
| sriov-network-operator.operator.images.operator | String | nvcr.io/nvidia/mellanox/sriov-network-operator:network-operator-24.1.0 | |
| sriov-network-operator.operator.images.sriovConfigDaemon | String | nvcr.io/nvidia/mellanox/sriov-network-operator-config-daemon:network-operator-24.1.0 | |
| sriov-network-operator.operator.images.sriovCni | String | ghcr.io/k8snetworkplumbingwg/sriov-cni:v2.7.0 | For ARM-based deployments, it is recommended to use the `ghcr.io/k8snetworkplumbingwg/sriov-cni:latest-arm64` image |
| sriov-network-operator.operator.images.ibSriovCni | String | ghcr.io/k8snetworkplumbingwg/ib-sriov-cni:v1.0.3 | For ARM-based deployments, it is recommended to use the `ghcr.io/k8snetworkplumbingwg/ib-sriov-cni:latest-arm64` image |
| sriov-network-operator.operator.images.sriovDevicePlugin | String | ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin:v3.6.2 | For ARM-based deployments, it is recommended to use the `ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin:v3.6.2-amd64` image |
| sriov-network-operator.operator.images.webhook | String | nvcr.io/nvidia/mellanox/sriov-network-operator-webhook:network-operator-24.1.0 | |
| sriov-network-operator.operator.images. | String | nvcr.io/nvidia/mellanox/sriov-network-operator:network-operator-24.1.0 | |
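As a sketch of the ARM recommendation in the table above, the overrides can be written as nested Helm values. The nesting is derived from the dotted parameter names in the table and should be checked against the chart's own `values.yaml`.

```yaml
# values.yaml (sketch): ARM-based images for the SR-IOV CNI plugins
sriov-network-operator:
  operator:
    images:
      sriovCni: ghcr.io/k8snetworkplumbingwg/sriov-cni:latest-arm64
      ibSriovCni: ghcr.io/k8snetworkplumbingwg/ib-sriov-cni:latest-arm64
```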

Optional requests and limits can be configured for each container of the sub-resources deployed by the Network Operator by setting the `containerResources` parameter.

For example:

containerResources:
  - name: "mofed-container"
    requests:
      cpu: "200m"
      memory: "150Mi"
    limits:
      cpu: "300m"
      memory: "300Mi"





| Name | Type | Default | Description |
| --- | --- | --- | --- |
| ofedDriver.deploy | Bool | false | Deploy the MLNX_OFED driver container |
| ofedDriver.repository | String | nvcr.io/nvidia/mellanox | MLNX_OFED driver image repository |
| ofedDriver.image | String | doca-driver | MLNX_OFED driver image name |
| ofedDriver.version | String | 24.01-0.3.3.1.3 | MLNX_OFED driver version |
| ofedDriver.initContainer.enable | Bool | true | Deploy init container |
| ofedDriver.initContainer.repository | String | ghcr.io/mellanox | Init container image repository |
| ofedDriver.initContainer.image | String | network-operator-init-container | Init container image name |
| ofedDriver.initContainer.version | String | v0.0.2 | Init container image version |
| ofedDriver.certConfig.name | String | "" | Custom TLS key/certificate configuration ConfigMap name |
| ofedDriver.repoConfig.name | String | "" | Private mirror repository configuration ConfigMap name |
| ofedDriver.terminationGracePeriodSeconds | Int | 300 | NVIDIA OFED termination grace period in seconds |
| ofedDriver.imagePullSecrets | List | [] | An optional list of references to secrets to use for pulling any of the MLNX_OFED driver images |
| ofedDriver.env | List | [] | An optional list of environment variables passed to the NVIDIA OFED driver image |
| ofedDriver.startupProbe.initialDelaySeconds | Int | 10 | MLNX_OFED startup probe initial delay |
| ofedDriver.startupProbe.periodSeconds | Int | 20 | MLNX_OFED startup probe interval |
| ofedDriver.livenessProbe.initialDelaySeconds | Int | 30 | MLNX_OFED liveness probe initial delay |
| ofedDriver.livenessProbe.periodSeconds | Int | 30 | MLNX_OFED liveness probe interval |
| ofedDriver.readinessProbe.initialDelaySeconds | Int | 10 | MLNX_OFED readiness probe initial delay |
| ofedDriver.readinessProbe.periodSeconds | Int | 30 | MLNX_OFED readiness probe interval |
| ofedDriver.upgradePolicy.autoUpgrade | Bool | false | A global switch for the automatic upgrade feature. If set to false, all other options are ignored |
| ofedDriver.upgradePolicy.maxParallelUpgrades | Int | 1 | The number of nodes that can be upgraded in parallel. 0 means no limit (all nodes are upgraded in parallel) |
| ofedDriver.upgradePolicy.safeLoad | Bool | false | Cordon and drain (if enabled) a node before loading the driver on it. Requires ofedDriver.initContainer to be enabled and ofedDriver.upgradePolicy.autoUpgrade to be true |
| ofedDriver.upgradePolicy.drain.enable | Bool | true | Drain the node (`kubectl drain`) before the driver reload, if auto upgrade is enabled |
| ofedDriver.upgradePolicy.drain.force | Bool | false | Force the drain of pods |
| ofedDriver.upgradePolicy.drain.podSelector | String | "" | Pod selector specifying which pods are drained from the node. An empty selector means all pods |
| ofedDriver.upgradePolicy.drain.timeoutSeconds | Int | 300 | Number of seconds to wait for pod eviction |
| ofedDriver.upgradePolicy.drain.deleteEmptyDir | Bool | false | Delete pods' local storage |
| ofedDriver.upgradePolicy.waitForCompletion.podSelector | String | Not set | Label selector for the pods to wait for completion before starting the driver upgrade |
| ofedDriver.upgradePolicy.waitForCompletion.timeoutSeconds | Int | Not set | Time in seconds to wait for the workload to finish before giving up. Zero means infinite |
| ofedDriver.containerResources | List | Not set | Optional resource requests and limits for the `mofed-container` |
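The upgrade options in the table above interact with each other. `safeLoad` only takes effect with `autoUpgrade: true` and the init container enabled.

The fragment below is a minimal sketch of one consistent combination. The resource values are copied from the `containerResources` example earlier in this section. The sketch assumes `ofedDriver.containerResources` uses the same list format as that example.

```yaml
# values.yaml (sketch): MLNX_OFED driver with automatic, safe upgrades
ofedDriver:
  deploy: true
  initContainer:
    enable: true              # required for safeLoad
  upgradePolicy:
    autoUpgrade: true         # global switch; other upgrade options are ignored if false
    maxParallelUpgrades: 1    # upgrade one node at a time
    safeLoad: true            # cordon/drain before loading the driver
    drain:
      enable: true
      force: false
      podSelector: ""         # empty selector: drain all pods
      timeoutSeconds: 300
      deleteEmptyDir: false
  containerResources:
    - name: "mofed-container"
      requests:
        cpu: "200m"
        memory: "150Mi"
      limits:
        cpu: "300m"
        memory: "300Mi"
```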

The following are special environment variables supported by the MLNX_OFED container to configure its behavior:

| Name | Default | Description |
| --- | --- | --- |
| CREATE_IFNAMES_UDEV | "true" for Ubuntu 20.04, RHEL v8.x and OCP <= v4.13; "false" for newer OSes | Create a udev rule to preserve "old-style" path-based netdev names, e.g. enp3s0f0 |
| UNLOAD_STORAGE_MODULES | "false" | Unload host storage modules prior to loading MLNX_OFED modules: ib_isert, nvme_rdma, nvmet_rdma, rpcrdma, xprtrdma, ib_srpt |
| ENABLE_NFSRDMA | "false" | Enable loading of NFS-related storage modules from the MLNX_OFED container |
| RESTORE_DRIVER_ON_POD_TERMINATION | "true" | Restore host drivers when the container is gracefully stopped |

In addition, any environment variable can be exposed to the MLNX_OFED container, such as the standard `HTTP_PROXY`, `HTTPS_PROXY` and `NO_PROXY`.

Warning: CREATE_IFNAMES_UDEV is set automatically by the Network Operator, depending on the operating system of the worker nodes in the cluster (the cluster is assumed to be homogeneous).

To set these variables, add them to the Helm values. For example:

ofedDriver:
  env:
    - name: RESTORE_DRIVER_ON_POD_TERMINATION
      value: "true"
    - name: UNLOAD_STORAGE_MODULES
      value: "true"
    - name: CREATE_IFNAMES_UDEV
      value: "true"
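Proxy settings such as `HTTP_PROXY`, `HTTPS_PROXY` and `NO_PROXY` can be passed through the same `ofedDriver.env` list. In the sketch below, the proxy address and the `NO_PROXY` entries are hypothetical placeholders; substitute your own.

```yaml
# values.yaml (sketch): proxy settings for the MLNX_OFED container
ofedDriver:
  env:
    - name: HTTP_PROXY
      value: "http://proxy.example.com:3128"   # hypothetical proxy address
    - name: HTTPS_PROXY
      value: "http://proxy.example.com:3128"   # hypothetical proxy address
    - name: NO_PROXY
      value: "localhost,127.0.0.1,.cluster.local"
```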

The variables can also be configured directly via the NicClusterPolicy CRD.
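A rough sketch of the same setting made directly in a NicClusterPolicy is shown below. It assumes the `mellanox.com/v1alpha1` API group and a `spec.ofedDriver` section mirroring the Helm parameters, with `image`, `repository`, `version` and `env` fields. Verify the field names against the CRD installed in your cluster. The image values are the chart defaults from the table above.

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 24.01-0.3.3.1.3
    env:
      # Unload host storage modules before loading MLNX_OFED modules
      - name: UNLOAD_STORAGE_MODULES
        value: "true"
```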

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| rdmaSharedDevicePlugin.deploy | Bool | true | Deploy RDMA shared device plugin |
| rdmaSharedDevicePlugin.repository | String | nvcr.io/nvidia/cloud-native | RDMA shared device plugin image repository |
| rdmaSharedDevicePlugin.image | String | k8s-rdma-shared-dev-plugin | RDMA shared device plugin image name |
| rdmaSharedDevicePlugin.version | String | v1.3.2 | RDMA shared device plugin version |
| rdmaSharedDevicePlugin.imagePullSecrets | List | [] | An optional list of references to secrets to use for pulling any of the RDMA shared device plugin images |
| rdmaSharedDevicePlugin.resources | List | See below | RDMA shared device plugin resources |
| rdmaSharedDevicePlugin.useCdi | Bool | false | Enable Container Device Interface (CDI) mode. Note: the NVIDIA Network Operator does not configure the container runtime to enable CDI |
| rdmaSharedDevicePlugin.containerResources | List | Not set | Optional resource requests and limits for the `rdma-shared-dp` container |

The `resources` parameter consists of a list of RDMA resources, each with a name and a selector of RDMA-capable network devices to be associated with the resource. Refer to RDMA Shared Device Plugin Selectors for supported selectors.

resources:
  - name: rdma_shared_device_a
    vendors: [15b3]
    deviceIDs: [ 1017 ]
    ifNames: [enp5s0f0]
  - name: rdma_shared_device_b
    vendors: [15b3]
    deviceIDs: [ 1017 ]
    ifNames: [enp4s0f0, enp4s0f1]
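Placed in context, the `resources` list goes under `rdmaSharedDevicePlugin` in the Helm values. The sketch below reuses the first resource from the example above. The `containerResources` numbers are illustrative only and are not defaults or recommendations.

```yaml
# values.yaml (sketch): RDMA shared device plugin
rdmaSharedDevicePlugin:
  deploy: true
  useCdi: false          # CDI mode also requires configuring the container runtime yourself
  resources:
    - name: rdma_shared_device_a
      vendors: [15b3]
      deviceIDs: [1017]
      ifNames: [enp5s0f0]
  containerResources:
    - name: "rdma-shared-dp"
      requests:
        cpu: "100m"      # illustrative value
        memory: "64Mi"   # illustrative value
```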





| Name | Type | Default | Description |
| --- | --- | --- | --- |
| sriovDevicePlugin.deploy | Bool | false | Deploy SR-IOV Network device plugin |
| sriovDevicePlugin.repository | String | ghcr.io/k8snetworkplumbingwg | SR-IOV Network device plugin image repository |
| sriovDevicePlugin.image | String | sriov-network-device-plugin | SR-IOV Network device plugin image name |
| sriovDevicePlugin.version | String | 7e7f979087286ee950bd5ebc89d8bbb6723fc625 | SR-IOV Network device plugin version. For ARM-based deployments, it is recommended to use the `ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin:v3.6.2-amd64` image |
| sriovDevicePlugin.imagePullSecrets | List | [] | An optional list of references to secrets to use for pulling any of the SR-IOV Network device plugin images |
| sriovDevicePlugin.resources | List | See below | SR-IOV Network device plugin resources |
| sriovDevicePlugin.useCdi | Bool | false | Enable Container Device Interface (CDI) mode. Note: the NVIDIA Network Operator does not configure the container runtime to enable CDI |
| sriovDevicePlugin.containerResources | List | Not set | Optional resource requests and limits for the `kube-sriovdp` container |

The `resources` parameter consists of a list of resources, each with a name and a selector of network devices to be associated with the resource. Refer to SR-IOV Network Device Plugin Selectors for supported selectors.

resources:
  - name: hostdev
    vendors: [15b3]
  - name: ethernet_rdma
    vendors: [15b3]
    linkTypes: [ether]
  - name: sriov_rdma
    vendors: [15b3]
    devices: [ 1018 ]
    drivers: [mlx5_ib]
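As with the RDMA shared device plugin, the `resources` list is nested under its component key in the Helm values. The sketch below enables the SR-IOV Network device plugin with the `sriov_rdma` selector from the example above. The `containerResources` limits are illustrative values, not defaults.

```yaml
# values.yaml (sketch): SR-IOV Network device plugin
sriovDevicePlugin:
  deploy: true
  resources:
    - name: sriov_rdma
      vendors: [15b3]
      devices: [1018]
      drivers: [mlx5_ib]
  containerResources:
    - name: "kube-sriovdp"
      limits:
        cpu: "100m"       # illustrative value
        memory: "100Mi"   # illustrative value
```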





IB Kubernetes provides a daemon that works in conjunction with the SR-IOV Network Device Plugin. It acts on Kubernetes pod object changes (Create/Update/Delete): it reads the pod's network annotation, fetches the corresponding network CRD and reads its PKey. For pods with the mellanox.infiniband.app annotation, it then adds the newly generated GUID, or the predefined GUID from the GUID field of the CRD cni-args, to that PKey.

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| ibKubernetes.deploy | Bool | false | Deploy IB Kubernetes |
| ibKubernetes.repository | String | ghcr.io/mellanox | IB Kubernetes image repository |
| ibKubernetes.image | String | ib-kubernetes | IB Kubernetes image name |
| ibKubernetes.version | String | v1.0.2 | IB Kubernetes version |
| ibKubernetes.imagePullSecrets | List | [] | An optional list of references to secrets used for pulling any of the IB Kubernetes images |
| ibKubernetes.periodicUpdateSeconds | Int | 5 | Interval of periodic update in seconds |
| ibKubernetes.pKeyGUIDPoolRangeStart | String | 02:00:00:00:00:00:00:00 | Minimum GUID value available for allocation to pods |
| ibKubernetes.pKeyGUIDPoolRangeEnd | String | 02:FF:FF:FF:FF:FF:FF:FF | Maximum GUID value available for allocation to pods |
| ibKubernetes.ufmSecret | String | See below | Name of the Secret with the NVIDIA UFM access credentials, deployed in advance |
| ibKubernetes.containerResources | List | Not set | Optional resource requests and limits for the `ib-kubernetes` container |

IB Kubernetes must access NVIDIA UFM to manage pods' GUIDs. To provide the UFM credentials, deploy a Secret in the following format in advance:

apiVersion: v1
kind: Secret
metadata:
  name: ib-kubernetes-ufm-secret
  namespace: nvidia-network-operator
stringData:
  UFM_USERNAME: "admin"
  UFM_PASSWORD: "123456"
  UFM_ADDRESS: "ufm-hostname"
  UFM_HTTP_SCHEMA: ""
  UFM_PORT: ""
data:
  UFM_CERTIFICATE: ""

Warning: The InfiniBand fabric manages a single pool of GUIDs. To use IB Kubernetes in multiple clusters, specify a different, non-overlapping GUID range for each cluster to avoid collisions.
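As a sketch of the GUID range warning, two clusters sharing one fabric could split the default range 02:00:00:00:00:00:00:00 to 02:FF:FF:FF:FF:FF:FF:FF into two disjoint halves. Each half goes in that cluster's own values file; they are shown here as two YAML documents separated by `---`. The split point is an arbitrary illustrative choice. The Secret name matches the example Secret above. GUIDs are quoted so YAML treats them as strings.

```yaml
# Cluster A values (sketch)
ibKubernetes:
  deploy: true
  ufmSecret: ib-kubernetes-ufm-secret
  pKeyGUIDPoolRangeStart: "02:00:00:00:00:00:00:00"
  pKeyGUIDPoolRangeEnd: "02:7F:FF:FF:FF:FF:FF:FF"   # lower half of the default range
---
# Cluster B values (sketch): disjoint from cluster A
ibKubernetes:
  deploy: true
  ufmSecret: ib-kubernetes-ufm-secret
  pKeyGUIDPoolRangeStart: "02:80:00:00:00:00:00:00"
  pKeyGUIDPoolRangeEnd: "02:FF:FF:FF:FF:FF:FF:FF"   # upper half of the default range
```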





| Name | Type | Default | Description |
| --- | --- | --- | --- |
| secondaryNetwork.deploy | Bool | true | Deploy Secondary Network |

The secondaryNetwork section specifies the components to deploy to provide a secondary network in Kubernetes. It consists of the following optionally deployed components:

Multus-CNI: Delegate CNI plugin to support secondary networks in Kubernetes

CNI plugins: Currently only containernetworking-plugins are supported

IPAM CNI: Currently only the Whereabouts IPAM CNI is supported as part of the secondaryNetwork section. NVIDIA-IPAM is configured separately.

IPoIB CNI: Allows the user to create an IPoIB child link and move it to the pod
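Each component listed above is toggled by its own `deploy` flag under `secondaryNetwork`. The parameter tables follow this list. The sketch below enables all four components, including IPoIB CNI (off by default), which is only useful on InfiniBand fabrics. Image repositories and versions are left at their chart defaults.

```yaml
# values.yaml (sketch): secondary network components
secondaryNetwork:
  deploy: true
  multus:
    deploy: true        # Multus delegate CNI
  cniPlugins:
    deploy: true        # containernetworking-plugins
  ipamPlugin:
    deploy: true        # Whereabouts IPAM CNI
  ipoib:
    deploy: true        # IPoIB CNI, off by default; for InfiniBand fabrics
```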

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| secondaryNetwork.cniPlugins.deploy | Bool | true | Deploy CNI Plugins Secondary Network |
| secondaryNetwork.cniPlugins.image | String | plugins | CNI Plugins image name |
| secondaryNetwork.cniPlugins.repository | String | ghcr.io/k8snetworkplumbingwg | CNI Plugins image repository |
| secondaryNetwork.cniPlugins.version | String | v1.2.0-amd64 | CNI Plugins image version |
| secondaryNetwork.cniPlugins.imagePullSecrets | List | [] | An optional list of references to secrets to use for pulling any of the CNI Plugins images |
| secondaryNetwork.cniPlugins.containerResources | List | Not set | Optional resource requests and limits for the `cni-plugins` container |

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| secondaryNetwork.multus.deploy | Bool | true | Deploy Multus Secondary Network |
| secondaryNetwork.multus.image | String | multus-cni | Multus image name |
| secondaryNetwork.multus.repository | String | ghcr.io/k8snetworkplumbingwg | Multus image repository |
| secondaryNetwork.multus.version | String | v3.9.3 | Multus image version |
| secondaryNetwork.multus.imagePullSecrets | List | [] | An optional list of references to secrets to use for pulling any of the Multus images |
| secondaryNetwork.multus.config | String | "" | Multus CNI config. If empty, the config is generated automatically from the CNI configuration file of the master plugin (the first file in lexicographical order in the cni-conf-dir) |
| secondaryNetwork.multus.containerResources | List | Not set | Optional resource requests and limits for the `kube-multus` container |

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| secondaryNetwork.ipoib.deploy | Bool | false | Deploy IPoIB CNI |
| secondaryNetwork.ipoib.image | String | ipoib-cni | IPoIB CNI image name |
| secondaryNetwork.ipoib.repository | String | | IPoIB CNI image repository |
| secondaryNetwork.ipoib.version | String | v1.1.0 | IPoIB CNI image version |
| secondaryNetwork.ipoib.imagePullSecrets | List | [] | An optional list of references to secrets to use for pulling any of the IPoIB CNI images |
| secondaryNetwork.ipoib.containerResources | List | Not set | Optional resource requests and limits for the `ipoib-cni` container |

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| secondaryNetwork.ipamPlugin.deploy | Bool | true | Deploy IPAM CNI Plugin Secondary Network |
| secondaryNetwork.ipamPlugin.image | String | whereabouts | IPAM CNI Plugin image name |
| secondaryNetwork.ipamPlugin.repository | String | ghcr.io/k8snetworkplumbingwg | IPAM CNI Plugin image repository |
| secondaryNetwork.ipamPlugin.version | String | v0.6.1-amd64 | IPAM CNI Plugin image version |
| secondaryNetwork.ipamPlugin.imagePullSecrets | List | [] | An optional list of references to secrets to use for pulling any of the IPAM CNI Plugin images |
| secondaryNetwork.ipamPlugin.containerResources | List | Not set | Optional resource requests and limits for the `whereabouts` container |

The NVIDIA IPAM Plugin is recommended for large-scale deployments of the NVIDIA Network Operator.

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| nvIpam.deploy | Bool | false | Deploy NVIDIA IPAM Plugin |
| nvIpam.image | String | nvidia-k8s-ipam | NVIDIA IPAM Plugin image name |
| nvIpam.repository | String | ghcr.io/mellanox | NVIDIA IPAM Plugin image repository |
| nvIpam.version | String | v0.1.1 | NVIDIA IPAM Plugin image version |
| nvIpam.imagePullSecrets | List | [] | An optional list of references to secrets to use for pulling any of the NVIDIA IPAM Plugin images |
| nvIpam.enableWebhook | Bool | false | Enable deployment of the validation webhook for the IPPool CRD |
| nvIpam.containerResources | List | Not set | Optional resource requests and limits for the `nv-ipam-node` and `nv-ipam-controller` containers |

Warning: A supported X.509 certificate management system must be available in the cluster to enable the validation webhook. Currently, the supported systems are cert-manager and OpenShift certificate management.
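A minimal sketch of deploying the NVIDIA IPAM Plugin with its validation webhook follows. It assumes cert-manager or OpenShift certificate management is already present, as the warning requires. Turning off the Whereabouts IPAM plugin is one possible choice when NVIDIA IPAM handles address allocation; it is not required by the chart.

```yaml
# values.yaml (sketch): NVIDIA IPAM Plugin with validation webhook
nvIpam:
  deploy: true
  enableWebhook: true    # needs cert-manager or OpenShift certificate management
secondaryNetwork:
  ipamPlugin:
    deploy: false        # optional: skip Whereabouts when NVIDIA IPAM is used
```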





NVIDIA NIC Feature Discovery leverages Node Feature Discovery to advertise NIC-specific labels on Kubernetes Node objects.

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| nicFeatureDiscovery.deploy | Bool | false | Deploy NVIDIA NIC Feature Discovery |
| nicFeatureDiscovery.image | String | nic-feature-discovery | NVIDIA NIC Feature Discovery image name |
| nicFeatureDiscovery.repository | String | ghcr.io/mellanox | NVIDIA NIC Feature Discovery repository |
| nicFeatureDiscovery.version | String | v0.0.1 | NVIDIA NIC Feature Discovery image version |
| nicFeatureDiscovery.containerResources | List | Not set | Optional resource requests and limits for the `nic-feature-discovery` container |
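Because NIC Feature Discovery builds on Node Feature Discovery, a sketch enabling it keeps `nfd.enabled` on alongside the new component. `nfd.enabled` already defaults to true and is repeated here only to make the dependency explicit.

```yaml
# values.yaml (sketch): NVIDIA NIC Feature Discovery
nfd:
  enabled: true              # NIC Feature Discovery relies on Node Feature Discovery
nicFeatureDiscovery:
  deploy: true
```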

Warning: Because several parameters must be provided when creating custom resources during operator deployment, it is recommended to use a configuration file. Although parameters can be overridden with CLI arguments, prefer a configuration file instead.

$ helm install -f ./values.yaml -n nvidia-network-operator --create-namespace --wait network-operator nvidia/network-operator




