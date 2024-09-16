Warning: Because several parameters must be provided when creating custom resources during operator deployment, we recommend using a configuration file. Although the parameters can be overridden with CLI arguments, avoid them in favor of a configuration file.

The deployment examples below use a values.yaml file that is passed to Helm when installing the Network Operator, with the following command:

```shell
helm install network-operator nvidia/network-operator -n nvidia-network-operator --create-namespace --wait -f ./values.yaml
```

First install the Network Operator with NFD enabled:

values.yaml:

```yaml
nfd:
  enabled: true
```

Once the Network Operator is installed, create a NicClusterPolicy with:

* DOCA driver
* RDMA shared device plugin configured for a netdev named ens1f0

Note: You may need to change the interface names in the NicClusterPolicy to those used by your target nodes.

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 24.07-0.6.1.0-0
    forcePrecompiled: false
    imagePullSecrets: []
    terminationGracePeriodSeconds: 300
    startupProbe:
      initialDelaySeconds: 10
      periodSeconds: 20
    livenessProbe:
      initialDelaySeconds: 30
      periodSeconds: 30
    readinessProbe:
      initialDelaySeconds: 10
      periodSeconds: 30
    upgradePolicy:
      autoUpgrade: true
      maxParallelUpgrades: 1
      safeLoad: false
      drain:
        enable: true
        force: true
        podSelector: ""
        timeoutSeconds: 300
        deleteEmptyDir: true
  rdmaSharedDevicePlugin:
    image: k8s-rdma-shared-dev-plugin
    repository: ghcr.io/mellanox
    version: v1.5.1
    imagePullSecrets: []
    # The config below directly propagates to k8s-rdma-shared-device-plugin configuration.
    # Replace the 'ifNames' values with your RDMA-capable netdevice names.
    config: |
      {
        "configList": [
          {
            "resourceName": "rdma_shared_device_a",
            "rdmaHcaMax": 63,
            "selectors": {
              "vendors": [],
              "deviceIDs": [],
              "drivers": [],
              "ifNames": ["ens1f0"],
              "linkTypes": []
            }
          }
        ]
      }
```
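The config value above is a JSON document embedded in a YAML block scalar, so it is easy to break when editing by hand. As a minimal sketch, assuming you keep the resource-to-interface mapping in Python, the hypothetical helper build_rdma_shared_config below emits a configList with the same fields as this example. It is an illustration only, not part of any NVIDIA tooling.

```python
import json


def build_rdma_shared_config(resources, rdma_hca_max=63):
    """Build a configList shaped like the k8s-rdma-shared-dev-plugin example.

    `resources` maps a resource name to the netdev names it should select.
    Every selector other than ifNames is left empty, as in the example policy.
    """
    config_list = []
    for name, if_names in resources.items():
        config_list.append({
            "resourceName": name,
            "rdmaHcaMax": rdma_hca_max,
            "selectors": {
                "vendors": [],
                "deviceIDs": [],
                "drivers": [],
                "ifNames": list(if_names),
                "linkTypes": [],
            },
        })
    return {"configList": config_list}


if __name__ == "__main__":
    cfg = build_rdma_shared_config({"rdma_shared_device_a": ["ens1f0"]})
    # Indented JSON that can be pasted under `config: |`
    print(json.dumps(cfg, indent=2))
```

Passing {"rdma_shared_device_a": ["ens1f0", "ens1f1"], "rdma_shared_device_b": ["ens2f0", "ens2f1"]} yields the two-resource layout used in the next example.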

First install the Network Operator with NFD enabled:

values.yaml:

```yaml
nfd:
  enabled: true
```

Once the Network Operator is installed, create a NicClusterPolicy with:

* DOCA driver
* RDMA shared device plugin with two RDMA resources: the first mapped to ens1f0 and ens1f1, and the second mapped to ens2f0 and ens2f1

Note: You may need to change the interface names in the NicClusterPolicy to those used by your target nodes.

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 24.07-0.6.1.0-0
    forcePrecompiled: false
    imagePullSecrets: []
    terminationGracePeriodSeconds: 300
    startupProbe:
      initialDelaySeconds: 10
      periodSeconds: 20
    livenessProbe:
      initialDelaySeconds: 30
      periodSeconds: 30
    readinessProbe:
      initialDelaySeconds: 10
      periodSeconds: 30
    upgradePolicy:
      autoUpgrade: true
      maxParallelUpgrades: 1
      safeLoad: false
      drain:
        enable: true
        force: true
        podSelector: ""
        timeoutSeconds: 300
        deleteEmptyDir: true
  rdmaSharedDevicePlugin:
    image: k8s-rdma-shared-dev-plugin
    repository: ghcr.io/mellanox
    version: v1.5.1
    imagePullSecrets: []
    # The config below directly propagates to k8s-rdma-shared-device-plugin configuration.
    # Replace the 'ifNames' values with your RDMA-capable netdevice names.
    config: |
      {
        "configList": [
          {
            "resourceName": "rdma_shared_device_a",
            "rdmaHcaMax": 63,
            "selectors": {
              "vendors": [],
              "deviceIDs": [],
              "drivers": [],
              "ifNames": ["ens1f0","ens1f1"],
              "linkTypes": []
            }
          },
          {
            "resourceName": "rdma_shared_device_b",
            "rdmaHcaMax": 63,
            "selectors": {
              "vendors": [],
              "deviceIDs": [],
              "drivers": [],
              "ifNames": ["ens2f0","ens2f1"],
              "linkTypes": []
            }
          }
        ]
      }
```

First install the Network Operator with NFD enabled:

values.yaml:

```yaml
nfd:
  enabled: true
```

Once the Network Operator is installed, create a NicClusterPolicy with the following enabled:

* Secondary network
* Multus CNI
* Container-networking-plugins CNI plugins
* IPAM plugin

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: ghcr.io/k8snetworkplumbingwg
      version: v1.5.0
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: ghcr.io/k8snetworkplumbingwg
      version: v3.9.3
      imagePullSecrets: []
    ipamPlugin:
      image: whereabouts
      repository: ghcr.io/k8snetworkplumbingwg
      version: v0.7.0
      imagePullSecrets: []
```

First install the Network Operator with NFD enabled:

values.yaml:

```yaml
nfd:
  enabled: true
```

Once the Network Operator is installed, deploy a NicClusterPolicy with the following enabled:

* Secondary network
* Multus CNI
* Container Networking plugins
* NVIDIA-IPAM plugin

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: ghcr.io/k8snetworkplumbingwg
      version: v1.5.0
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: ghcr.io/k8snetworkplumbingwg
      version: v3.9.3
      imagePullSecrets: []
  nvIpam:
    image: nvidia-k8s-ipam
    repository: ghcr.io/mellanox
    version: v0.2.0
    imagePullSecrets: []
    enableWebhook: false
```

To create an NV-IPAM IPPool, apply:

```yaml
apiVersion: nv-ipam.nvidia.com/v1alpha1
kind: IPPool
metadata:
  name: my-pool
  namespace: nvidia-network-operator
spec:
  subnet: 192.168.0.0/24
  perNodeBlockSize: 100
  gateway: 192.168.0.1
```
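To get a feel for how perNodeBlockSize interacts with the subnet size, the sketch below carves consecutive per-node blocks out of the subnet's host addresses. It assumes, for illustration only, that blocks start at the first host address and that a remainder smaller than one block is not handed out; consult the NV-IPAM documentation for the exact allocation rules.

```python
import ipaddress


def per_node_blocks(subnet, block_size):
    """Split the host addresses of `subnet` into consecutive (start, end) blocks.

    Illustrative assumption: blocks start at the first host address and any
    remainder smaller than `block_size` stays unallocated.
    """
    net = ipaddress.ip_network(subnet)
    first = int(net.network_address) + 1    # skip the network address
    last = int(net.broadcast_address) - 1   # skip the broadcast address
    usable = last - first + 1
    blocks = []
    for i in range(usable // block_size):
        start = first + i * block_size
        end = start + block_size - 1
        blocks.append((str(ipaddress.ip_address(start)), str(ipaddress.ip_address(end))))
    return blocks


if __name__ == "__main__":
    for start, end in per_node_blocks("192.168.0.0/24", 100):
        print(start, "-", end)
```

Under this assumption, the /24 pool above can serve at most two nodes, whereas a 192.168.0.0/16 subnet with the same block size leaves room for 655.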

Example of a MacvlanNetwork that uses NVIDIA-IPAM:

```yaml
apiVersion: mellanox.com/v1alpha1
kind: MacvlanNetwork
metadata:
  name: example-macvlannetwork
spec:
  networkNamespace: "default"
  master: "ens2f0"
  mode: "bridge"
  mtu: 1500
  ipam: |
    {
      "type": "nv-ipam",
      "poolName": "my-pool"
    }
```

In this mode, the Network Operator can also be deployed in virtualized environments. It supports both Ethernet and InfiniBand modes, and from the Network Operator's perspective the deployment procedure is the same. To run inside a virtual machine (VM), PCI passthrough must be configured for the SR-IOV devices. Inside the VMs, the Network Operator works with both Virtual Functions (VFs) and Physical Functions (PFs).

Warning: If the Host Device Network is used without the MLNX_OFED driver, the following packages should be installed:

* the linux-generic package on Ubuntu hosts
* the kernel-modules-extra package on Red Hat-based hosts

First install the Network Operator with NFD enabled:

values.yaml:

```yaml
nfd:
  enabled: true
```

Once the Network Operator is installed, deploy a NicClusterPolicy with:

* SR-IOV device plugin configured with a single SR-IOV resource pool
* Secondary network
* Multus CNI
* Container Networking plugins
* IPAM plugin

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  sriovDevicePlugin:
    image: sriov-network-device-plugin
    repository: ghcr.io/k8snetworkplumbingwg
    version: v3.7.0
    imagePullSecrets: []
    config: |
      {
        "resourceList": [
          {
            "resourcePrefix": "nvidia.com",
            "resourceName": "hostdev",
            "selectors": {
              "vendors": ["15b3"],
              "devices": [],
              "drivers": [],
              "pfNames": [],
              "pciAddresses": [],
              "rootDevices": [],
              "linkTypes": [],
              "isRdma": true
            }
          }
        ]
      }
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: ghcr.io/k8snetworkplumbingwg
      version: v1.5.0
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: ghcr.io/k8snetworkplumbingwg
      version: v3.9.3
      imagePullSecrets: []
    ipamPlugin:
      image: whereabouts
      repository: ghcr.io/k8snetworkplumbingwg
      version: v0.7.0
      imagePullSecrets: []
```
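Pods request these devices through the extended resource name that the device plugin advertises, which combines resourcePrefix and resourceName from the config above (here nvidia.com/hostdev). The sketch below derives those names from a resourceList config. The default_prefix argument is a hypothetical stand-in for whatever prefix the plugin uses when an entry omits resourcePrefix; it does not reproduce the plugin's actual default.

```python
import json


def advertised_resources(config_json, default_prefix):
    """Return the extended resource names a resourceList config would expose.

    Each entry becomes "<resourcePrefix>/<resourceName>"; entries without a
    resourcePrefix fall back to `default_prefix` (a caller-supplied assumption).
    """
    config = json.loads(config_json)
    names = []
    for entry in config.get("resourceList", []):
        prefix = entry.get("resourcePrefix", default_prefix)
        names.append(f"{prefix}/{entry['resourceName']}")
    return names


SRIOV_CONFIG = """
{
  "resourceList": [
    {
      "resourcePrefix": "nvidia.com",
      "resourceName": "hostdev",
      "selectors": {"vendors": ["15b3"], "isRdma": true}
    }
  ]
}
"""

if __name__ == "__main__":
    print(advertised_resources(SRIOV_CONFIG, default_prefix="example.com"))
```

The resulting name is the one that the network's resourceName and the pod's resources.requests and resources.limits refer to.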

After the deployment, configure the Network Operator and deploy the K8s networking resources so that pods can use them.

The host-device-net.yaml configuration file for such a deployment:

```yaml
apiVersion: mellanox.com/v1alpha1
kind: HostDeviceNetwork
metadata:
  name: hostdev-net
spec:
  networkNamespace: "default"
  resourceName: "nvidia.com/hostdev"
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.3.225/28",
      "exclude": [
        "192.168.3.229/30",
        "192.168.3.236/32"
      ],
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
```
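To see which addresses the range and exclude settings leave for pods, the sketch below expands them with Python's ipaddress module. It assumes that whereabouts hands out the host addresses of the range's network and that each exclude entry removes the whole CIDR block it names (so 192.168.3.229/30 removes .228 through .231); check the whereabouts documentation for the exact semantics.

```python
import ipaddress


def assignable(range_cidr, excludes):
    """List host addresses of `range_cidr` that no exclude CIDR covers.

    strict=False accepts CIDRs with host bits set (e.g. 192.168.3.225/28)
    and normalizes them to their network (192.168.3.224/28).
    """
    net = ipaddress.ip_network(range_cidr, strict=False)
    excluded = [ipaddress.ip_network(e, strict=False) for e in excludes]
    return [
        str(ip) for ip in net.hosts()
        if not any(ip in ex for ex in excluded)
    ]


if __name__ == "__main__":
    ips = assignable("192.168.3.225/28", ["192.168.3.229/30", "192.168.3.236/32"])
    print(len(ips), ips)
```

Under those assumptions, the hostdev-net range leaves nine addresses, which bounds how many pods can attach to this network at the same time.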

The host-device-net-ocp.yaml configuration file for such a deployment in the OpenShift Platform:

```yaml
apiVersion: mellanox.com/v1alpha1
kind: HostDeviceNetwork
metadata:
  name: hostdev-net
spec:
  networkNamespace: "default"
  resourceName: "nvidia.com/hostdev"
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.3.225/28",
      "exclude": [
        "192.168.3.229/30",
        "192.168.3.236/32"
      ]
    }
```

The pod.yaml configuration file for such a deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostdev-test-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: hostdev-net
spec:
  restartPolicy: OnFailure
  containers:
    - image: <image>
      name: mofed-test-ctr
      securityContext:
        capabilities:
          add: [ "IPC_LOCK" ]
      resources:
        requests:
          nvidia.com/hostdev: 1
        limits:
          nvidia.com/hostdev: 1
      command:
        - sh
        - -c
        - sleep inf
```

In this combined deployment, the RDMA Shared Device Plugin and the SR-IOV Network Device Plugin use different NVIDIA NICs, so that a Host Device Network and a Macvlan Network run on separate NICs. Different networking types cannot be combined on the same NIC. Apply the same principle to other networking combinations.

First install the Network Operator with NFD enabled:

values.yaml:

```yaml
nfd:
  enabled: true
```

Once the Network Operator is installed, deploy a NicClusterPolicy with:

* RDMA shared device plugin, selecting NICs with the Ethernet (ether) link type
* SR-IOV device plugin with a single SR-IOV resource pool, selecting NICs with the InfiniBand (IB) link type
* Secondary network
* Multus CNI
* Container-networking-plugins CNI plugins
* Whereabouts IPAM CNI plugin

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  rdmaSharedDevicePlugin:
    image: k8s-rdma-shared-dev-plugin
    repository: ghcr.io/mellanox
    version: v1.5.1
    imagePullSecrets: []
    # The config below directly propagates to k8s-rdma-shared-device-plugin configuration.
    # Adjust the selectors (here linkTypes) to match your RDMA-capable netdevices.
    config: |
      {
        "configList": [
          {
            "resourceName": "rdma_shared_device_a",
            "rdmaHcaMax": 63,
            "selectors": {
              "vendors": [],
              "deviceIDs": [],
              "drivers": [],
              "ifNames": [],
              "linkTypes": ["ether"]
            }
          }
        ]
      }
  sriovDevicePlugin:
    image: sriov-network-device-plugin
    repository: ghcr.io/k8snetworkplumbingwg
    version: v3.7.0
    imagePullSecrets: []
    config: |
      {
        "resourceList": [
          {
            "resourcePrefix": "nvidia.com",
            "resourceName": "hostdev",
            "selectors": {
              "vendors": [],
              "devices": [],
              "drivers": [],
              "pfNames": [],
              "pciAddresses": [],
              "rootDevices": [],
              "linkTypes": ["IB"],
              "isRdma": true
            }
          }
        ]
      }
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: ghcr.io/k8snetworkplumbingwg
      version: v1.5.0
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: ghcr.io/k8snetworkplumbingwg
      version: v3.9.3
      imagePullSecrets: []
    ipamPlugin:
      image: whereabouts
      repository: ghcr.io/k8snetworkplumbingwg
      version: v0.7.0
      imagePullSecrets: []
```
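Because different networking types cannot share a NIC, the selectors of the two device plugins should match disjoint sets of NICs. The sketch below checks that property against a hypothetical NIC inventory. Its matching rule (a NIC matches when every non-empty ifNames or linkTypes selector contains its value) is a deliberate simplification for illustration, not how either plugin is implemented.

```python
def matches(nic, selector):
    """Simplified, hypothetical matching on ifNames and linkTypes only."""
    if selector.get("ifNames") and nic["name"] not in selector["ifNames"]:
        return False
    if selector.get("linkTypes") and nic["linkType"] not in selector["linkTypes"]:
        return False
    return True


def shared_nics(nics, selectors):
    """Return {nic_name: [plugins]} for NICs matched by more than one plugin."""
    owners = {}
    for plugin, selector in selectors.items():
        for nic in nics:
            if matches(nic, selector):
                owners.setdefault(nic["name"], []).append(plugin)
    return {name: plugins for name, plugins in owners.items() if len(plugins) > 1}


# Hypothetical inventory: one Ethernet NIC and one InfiniBand NIC.
INVENTORY = [
    {"name": "ens1f0", "linkType": "ether"},
    {"name": "ibs2f0", "linkType": "IB"},
]

if __name__ == "__main__":
    print(shared_nics(INVENTORY, {
        "rdmaSharedDevicePlugin": {"linkTypes": ["ether"]},
        "sriovDevicePlugin": {"linkTypes": ["IB"]},
    }))
```

An empty result means no NIC is claimed by both plugins; selecting by link type, as the policy above does, keeps the two pools apart as long as each NIC has a single link type.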

For pod and network configuration examples, refer to the corresponding sections: Network Operator Deployment with the RDMA Shared Device Plugin, and Network Operator Deployment with a Host Device Network.

In this mode, the Network Operator can also be deployed in virtualized environments. It supports both Ethernet and InfiniBand modes, and from the Network Operator's perspective the deployment procedure is the same. To run inside a virtual machine (VM), PCI passthrough must be configured for the SR-IOV devices. Inside the VMs, the Network Operator works with both Virtual Functions (VFs) and Physical Functions (PFs).

First install the Network Operator with NFD enabled:

values.yaml:

```yaml
nfd:
  enabled: true
```

Once the Network Operator is installed, create a NicClusterPolicy with:

* DOCA driver
* RDMA shared device plugin
* Secondary network
* Multus CNI
* IPoIB CNI
* Whereabouts IPAM CNI plugin

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 24.07-0.6.1.0-0
    forcePrecompiled: false
    imagePullSecrets: []
    terminationGracePeriodSeconds: 300
    startupProbe:
      initialDelaySeconds: 10
      periodSeconds: 20
    livenessProbe:
      initialDelaySeconds: 30
      periodSeconds: 30
    readinessProbe:
      initialDelaySeconds: 10
      periodSeconds: 30
    upgradePolicy:
      autoUpgrade: true
      maxParallelUpgrades: 1
      safeLoad: false
      drain:
        enable: true
        force: true
        podSelector: ""
        timeoutSeconds: 300
        deleteEmptyDir: true
  rdmaSharedDevicePlugin:
    image: k8s-rdma-shared-dev-plugin
    repository: ghcr.io/mellanox
    version: v1.5.1
    imagePullSecrets: []
    # The config below directly propagates to k8s-rdma-shared-device-plugin configuration.
    # Replace the 'ifNames' values with your RDMA-capable netdevice names.
    config: |
      {
        "configList": [
          {
            "resourceName": "rdma_shared_device_a",
            "rdmaHcaMax": 63,
            "selectors": {
              "vendors": [],
              "deviceIDs": [],
              "drivers": [],
              "ifNames": ["ibs1f0"],
              "linkTypes": []
            }
          }
        ]
      }
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: ghcr.io/k8snetworkplumbingwg
      version: v1.5.0
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: ghcr.io/k8snetworkplumbingwg
      version: v3.9.3
      imagePullSecrets: []
    ipoib:
      image: ipoib-cni
      repository: ghcr.io/mellanox
      version: 428715a57c0b633e48ec7620f6e3af6863149ccf
    ipamPlugin:
      image: whereabouts
      repository: ghcr.io/k8snetworkplumbingwg
      version: v0.7.0
      imagePullSecrets: []
```

After the deployment, configure the Network Operator and deploy the K8s networking resources so that pods can use them.

The ipoib-net.yaml configuration file for such a deployment:

```yaml
apiVersion: mellanox.com/v1alpha1
kind: IPoIBNetwork
metadata:
  name: example-ipoibnetwork
spec:
  networkNamespace: "default"
  master: "ibs1f0"
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.5.225/28",
      "exclude": [
        "192.168.5.229/30",
        "192.168.5.236/32"
      ],
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "info",
      "gateway": "192.168.6.1"
    }
```

The ipoib-net-ocp.yaml configuration file for such a deployment in the OpenShift Platform:

```yaml
apiVersion: mellanox.com/v1alpha1
kind: IPoIBNetwork
metadata:
  name: example-ipoibnetwork
spec:
  networkNamespace: "default"
  master: "ibs1f0"
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.5.225/28",
      "exclude": [
        "192.168.5.229/30",
        "192.168.5.236/32"
      ]
    }
```

The pod.yaml configuration file for such a deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ipoib-test-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: example-ipoibnetwork
spec:
  restartPolicy: OnFailure
  containers:
    - image: <image>
      name: mofed-test-ctr
      securityContext:
        capabilities:
          add: [ "IPC_LOCK" ]
      resources:
        requests:
          rdma/rdma_shared_device_a: 1
        limits:
          rdma/rdma_shared_device_a: 1
      command:
        - sh
        - -c
        - sleep inf
```

GPUDirect requires the following:

* NVIDIA DOCA Driver v5.5-1.0.3.2 or newer
* GPU Operator v1.9.0 or newer
* An NVIDIA GPU and driver supporting GPUDirect, e.g., Quadro RTX 6000/8000 or NVIDIA T4, NVIDIA V100, or NVIDIA A100

First install the Network Operator with NFD enabled:

values.yaml:

```yaml
nfd:
  enabled: true
```

Once the Network Operator is installed, create a NicClusterPolicy with:

* DOCA driver
* SR-IOV Device Plugin
* Secondary network
* Multus CNI
* Container Networking plugins
* IPAM plugin

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 24.07-0.6.1.0-0
    forcePrecompiled: false
    imagePullSecrets: []
    terminationGracePeriodSeconds: 300
    startupProbe:
      initialDelaySeconds: 10
      periodSeconds: 20
    livenessProbe:
      initialDelaySeconds: 30
      periodSeconds: 30
    readinessProbe:
      initialDelaySeconds: 10
      periodSeconds: 30
    upgradePolicy:
      autoUpgrade: true
      maxParallelUpgrades: 1
      safeLoad: false
      drain:
        enable: true
        force: true
        podSelector: ""
        timeoutSeconds: 300
        deleteEmptyDir: true
  sriovDevicePlugin:
    image: sriov-network-device-plugin
    repository: ghcr.io/k8snetworkplumbingwg
    version: v3.7.0
    imagePullSecrets: []
    config: |
      {
        "resourceList": [
          {
            "resourcePrefix": "nvidia.com",
            "resourceName": "hostdev",
            "selectors": {
              "vendors": ["15b3"],
              "devices": [],
              "drivers": [],
              "pfNames": [],
              "pciAddresses": [],
              "rootDevices": [],
              "linkTypes": [],
              "isRdma": true
            }
          }
        ]
      }
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: ghcr.io/k8snetworkplumbingwg
      version: v1.5.0
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: ghcr.io/k8snetworkplumbingwg
      version: v3.9.3
      imagePullSecrets: []
    ipamPlugin:
      image: whereabouts
      repository: ghcr.io/k8snetworkplumbingwg
      version: v0.7.0
      imagePullSecrets: []
```

host-device-net.yaml:

```yaml
apiVersion: mellanox.com/v1alpha1
kind: HostDeviceNetwork
metadata:
  name: hostdevice-net
spec:
  networkNamespace: "default"
  resourceName: "hostdev"
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.3.225/28",
      "exclude": [
        "192.168.3.229/30",
        "192.168.3.236/32"
      ],
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "info"
    }
```

The host-device-net-ocp.yaml configuration file for such a deployment in the OpenShift Platform:

```yaml
apiVersion: mellanox.com/v1alpha1
kind: HostDeviceNetwork
metadata:
  name: hostdevice-net
spec:
  networkNamespace: "default"
  resourceName: "hostdev"
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.3.225/28",
      "exclude": [
        "192.168.3.229/30",
        "192.168.3.236/32"
      ]
    }
```

host-net-gpudirect-pod.yaml

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: hostdevice-net
spec:
  containers:
    - name: appcntr1
      image: <image>
      imagePullPolicy: IfNotPresent
      securityContext:
        capabilities:
          add: ["IPC_LOCK"]
      command:
        - sh
        - -c
        - sleep inf
      resources:
        requests:
          nvidia.com/hostdev: '1'
          nvidia.com/gpu: '1'
        limits:
          nvidia.com/hostdev: '1'
          nvidia.com/gpu: '1'
```
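Extended resources such as nvidia.com/hostdev and nvidia.com/gpu cannot be overcommitted in Kubernetes, so when a container sets both a request and a limit for one, the two must be equal. The sketch below checks that rule on a pod manifest already loaded into a Python dict (YAML loading is omitted to stay within the standard library). Treating any resource name with a domain prefix outside kubernetes.io as extended is a simplification.

```python
def extended_resource_mismatches(pod):
    """Return sorted (container, resource) pairs whose request and limit differ.

    Simplification: a resource counts as extended when its name has a domain
    prefix (contains '/') that is not under kubernetes.io.
    """
    problems = []
    for ctr in pod["spec"]["containers"]:
        res = ctr.get("resources", {})
        requests = res.get("requests", {})
        limits = res.get("limits", {})
        for name in set(requests) | set(limits):
            domain = name.split("/", 1)[0] if "/" in name else ""
            if not domain or domain.endswith("kubernetes.io"):
                continue
            # Quantities are compared as strings, so '1' and 1 are equal.
            if name in requests and name in limits and str(requests[name]) != str(limits[name]):
                problems.append((ctr["name"], name))
    return sorted(problems)


POD = {"spec": {"containers": [{"name": "appcntr1", "resources": {
    "requests": {"nvidia.com/hostdev": "1", "nvidia.com/gpu": "1"},
    "limits": {"nvidia.com/hostdev": "1", "nvidia.com/gpu": "1"},
}}]}}

if __name__ == "__main__":
    print(extended_resource_mismatches(POD))
```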

Warning: The SR-IOV Network Operator is deployed with the default configuration. You can override these settings with a CLI argument or in the ‘sriov-network-operator’ section of the values.yaml file. For more information, refer to the project documentation.

Warning: This deployment mode supports SR-IOV in legacy mode.

First install the Network Operator with NFD and the SR-IOV Network Operator enabled:

values.yaml:

```yaml
nfd:
  enabled: true
sriovNetworkOperator:
  enabled: true
```

Once the Network Operator is installed, create a NicClusterPolicy with:

* DOCA driver
* Secondary network
* Multus CNI
* Container Networking plugins
* IPAM CNI plugin

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 24.07-0.6.1.0-0
    forcePrecompiled: false
    imagePullSecrets: []
    terminationGracePeriodSeconds: 300
    startupProbe:
      initialDelaySeconds: 10
      periodSeconds: 20
    livenessProbe:
      initialDelaySeconds: 30
      periodSeconds: 30
    readinessProbe:
      initialDelaySeconds: 10
      periodSeconds: 30
    upgradePolicy:
      autoUpgrade: true
      maxParallelUpgrades: 1
      safeLoad: false
      drain:
        enable: true
        force: true
        podSelector: ""
        timeoutSeconds: 300
        deleteEmptyDir: true
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: ghcr.io/k8snetworkplumbingwg
      version: v1.5.0
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: ghcr.io/k8snetworkplumbingwg
      version: v3.9.3
      imagePullSecrets: []
    ipamPlugin:
      image: whereabouts
      repository: ghcr.io/k8snetworkplumbingwg
      version: v0.7.0
      imagePullSecrets: []
```

After the deployment, configure the Network Operator, then deploy the SriovNetworkNodePolicy and the K8s networking resources.

The sriovnetwork-node-policy.yaml configuration file for such a deployment:

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-1
  namespace: nvidia-network-operator
spec:
  deviceType: netdevice
  mtu: 1500
  nicSelector:
    vendor: "15b3"
    pfNames: ["ens2f0"]
  nodeSelector:
    feature.node.kubernetes.io/pci-15b3.present: "true"
  numVfs: 8
  priority: 90
  isRdma: true
  resourceName: sriov_resource
```

The sriovnetwork.yaml configuration file for such a deployment:

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: "example-sriov-network"
  namespace: nvidia-network-operator
spec:
  vlan: 0
  networkNamespace: "default"
  resourceName: "sriov_resource"
  ipam: |-
    {
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "log_file": "/tmp/whereabouts.log",
      "log_level": "debug",
      "type": "whereabouts",
      "range": "192.168.101.0/24"
    }
```

Warning: The ens2f0 network interface name was taken from the output of the following command: kubectl -n nvidia-network-operator get sriovnetworknodestates.sriovnetwork.openshift.io -o yaml

```yaml
...
status:
  interfaces:
    - deviceID: 101d
      driver: mlx5_core
      linkSpeed: 100000 Mb/s
      linkType: ETH
      mac: 0c:42:a1:2b:74:ae
      mtu: 1500
      name: ens2f0
      pciAddress: "0000:07:00.0"
      totalvfs: 8
      vendor: 15b3
    - deviceID: 101d
      driver: mlx5_core
      linkType: ETH
      mac: 0c:42:a1:2b:74:af
      mtu: 1500
      name: ens2f1
      pciAddress: "0000:07:00.1"
      totalvfs: 8
      vendor: 15b3
...
```
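The same selection can be scripted. The sketch below works on the interfaces from the status above, already converted to Python data (parsing the kubectl YAML output directly would need a third-party YAML library such as PyYAML). It keeps Mellanox (vendor 15b3) PFs of the requested link type and checks that the numVfs requested in the SriovNetworkNodePolicy does not exceed each PF's totalvfs. The helper name candidate_pfs is hypothetical.

```python
def candidate_pfs(interfaces, num_vfs, vendor="15b3", link_type="ETH"):
    """Return names of PFs matching vendor and link type that can host num_vfs VFs."""
    return [
        iface["name"]
        for iface in interfaces
        if iface["vendor"] == vendor
        and iface["linkType"] == link_type
        and iface["totalvfs"] >= num_vfs
    ]


# Interfaces from the sriovnetworknodestates status above (trimmed).
INTERFACES = [
    {"name": "ens2f0", "vendor": "15b3", "linkType": "ETH", "totalvfs": 8,
     "pciAddress": "0000:07:00.0"},
    {"name": "ens2f1", "vendor": "15b3", "linkType": "ETH", "totalvfs": 8,
     "pciAddress": "0000:07:00.1"},
]

if __name__ == "__main__":
    print(candidate_pfs(INTERFACES, num_vfs=8))
```

With numVfs: 8, both ports qualify; the example policy simply picks the first one, ens2f0.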

Wait for all required pods to be spawned:

```shell
# kubectl get pod -n nvidia-network-operator | grep sriov
network-operator-sriov-network-operator-544c8dbbb9-vzkmc   1/1   Running   0   5d
sriov-device-plugin-vwpzn                                  1/1   Running   0   2d6h
sriov-network-config-daemon-qv467                          3/3   Running   0   5d

# kubectl get pod -n nvidia-network-operator
NAME                         READY   STATUS    RESTARTS   AGE
cni-plugins-ds-kbvnm         1/1     Running   0          5d
cni-plugins-ds-pcllg         1/1     Running   0          5d
kube-multus-ds-5j6ns         1/1     Running   0          5d
kube-multus-ds-mxgvl         1/1     Running   0          5d
mofed-ubuntu20.04-ds-2zzf4   1/1     Running   0          5d
mofed-ubuntu20.04-ds-rfnsw   1/1     Running   0          5d
whereabouts-nw7hn            1/1     Running   0          5d
whereabouts-zvhrv            1/1     Running   0          5d
...
```
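Rather than reading the listing by eye, you can check it with a short script. This sketch parses the default tabular output of kubectl get pod (NAME, READY, STATUS, RESTARTS, AGE) passed in as text and reports pods that are not Running or whose ready count is below their container count. It assumes that column layout and does not call kubectl itself; the Pending line in the sample is hypothetical.

```python
def not_ready(kubectl_output):
    """Return names of pods that are not Running with all containers ready."""
    pending = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        name, ready, status = fields[0], fields[1], fields[2]
        done, total = (int(x) for x in ready.split("/"))
        if status != "Running" or done != total:
            pending.append(name)
    return pending


SAMPLE = """
NAME                         READY   STATUS    RESTARTS   AGE
cni-plugins-ds-kbvnm         1/1     Running   0          5d
kube-multus-ds-5j6ns         1/1     Running   0          5d
whereabouts-nw7hn            0/1     Pending   0          5s
"""

if __name__ == "__main__":
    print(not_ready(SAMPLE))
```

Alternatively, kubectl wait --for=condition=Ready pod --all -n nvidia-network-operator blocks until the pods report Ready.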

The pod.yaml configuration file for such a deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: example-sriov-network
spec:
  containers:
    - name: appcntr1
      image: <image>
      imagePullPolicy: IfNotPresent
      securityContext:
        capabilities:
          add: ["IPC_LOCK"]
      resources:
        requests:
          nvidia.com/sriov_resource: '1'
        limits:
          nvidia.com/sriov_resource: '1'
      command:
        - sh
        - -c
        - sleep inf
```

Warning: This feature is supported only for vanilla Kubernetes deployments with the SR-IOV Network Operator.

To apply SR-IOV configuration on several nodes in parallel, create a SriovNetworkPoolConfig CR and specify the maximum number or percentage of nodes that can be unavailable at the same time:

sriov-network-pool-config-number.yaml

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkPoolConfig
metadata:
  name: pool-1
  namespace: nvidia-network-operator
spec:
  maxUnavailable: "20"
  nodeSelector:
    - matchExpressions:
        - key: some-label
          operator: In
          values:
            - val-2
    - matchExpressions:
        - key: other-label
          operator: "Exists"
```

sriov-network-pool-config-percent.yaml

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkPoolConfig
metadata:
  name: pool-1
  namespace: nvidia-network-operator
spec:
  maxUnavailable: "10%"
  nodeSelector:
    - matchExpressions:
        - key: some-label
          operator: In
          values:
            - val-2
    - matchExpressions:
        - key: other-label
          operator: "Exists"
```
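A percentage is resolved against the number of nodes the pool selects. The sketch below is a simplified model that assumes percentages are rounded down, as is common for Kubernetes percentage fields; the exact parsing and rounding used by the SR-IOV Network Operator are not stated here, so verify them against the project documentation.

```python
def max_unavailable(value, node_count):
    """Resolve a maxUnavailable value ("20", "10%", or an int) to a node count.

    Simplified model: percentages are scaled by node_count and rounded down
    with integer arithmetic; anything else is read as an absolute count.
    """
    text = str(value).strip()
    if text.endswith("%"):
        percent = int(text[:-1])
        return (percent * node_count) // 100
    return int(text)


if __name__ == "__main__":
    for v in ("20", "10%"):
        print(v, "->", max_unavailable(v, 25))
```

Under this model, "10%" of a five-node pool resolves to zero, so it is worth checking the value you expect for small pools.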

When upgrading the SR-IOV Network Operator, create a SriovNetworkPoolConfig CR that sets the number of nodes to configure in parallel, instead of setting it in the SriovOperatorConfig CR as in previous releases.

For example, the old method of configuring nodes in parallel:

```shell
kubectl patch sriovoperatorconfigs.sriovnetwork.openshift.io -n nvidia-network-operator default --patch '{ "spec": { "maxParallelNodeConfiguration": 5 } }' --type='merge'
```

The new method of configuring nodes in parallel:

sriov-network-pool-config-new.yaml

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkPoolConfig
metadata:
  name: pool-1
  namespace: nvidia-network-operator
spec:
  maxUnavailable: "5"
  nodeSelector:
    - matchExpressions:
        - key: node-role.kubernetes.io/master
          operator: Exists
```

Warning: This feature is supported only for vanilla Kubernetes deployments with the SR-IOV Network Operator.

To apply a SriovNetworkNodePolicy on several nodes in parallel, set the featureGates option in the SriovOperatorConfig CR:

```shell
kubectl patch sriovoperatorconfigs.sriovnetwork.openshift.io -n nvidia-network-operator default --patch '{ "spec": { "featureGates": { "parallelNicConfig": true } } }' --type='merge'
```

To enable the systemd SR-IOV configuration mode, set the configurationMode option in the SriovOperatorConfig CR:

```shell
kubectl patch sriovoperatorconfigs.sriovnetwork.openshift.io -n nvidia-network-operator default --patch '{ "spec": { "configurationMode": "systemd"} }' --type='merge'
```
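With --type='merge', kubectl patch applies a JSON merge patch (RFC 7386): objects in the patch are merged key by key, null deletes a key, and any other value replaces the existing one. The sketch below implements that algorithm and applies the featureGates and configurationMode patches above to a hypothetical starting spec, to show which fields end up set. It works on local data only and does not contact the cluster.

```python
import copy


def merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch; returns a new value, inputs untouched."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)  # non-objects replace the target outright
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical existing SriovOperatorConfig content, for illustration only.
CURRENT = {"spec": {"logLevel": 2, "featureGates": {}}}

GATES = {"spec": {"featureGates": {"parallelNicConfig": True}}}
MODE = {"spec": {"configurationMode": "systemd"}}

if __name__ == "__main__":
    print(merge_patch(merge_patch(CURRENT, GATES), MODE))
```

Because nested objects are merged rather than replaced, the second patch keeps the featureGates value set by the first.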

A Network Operator deployment with an InfiniBand network requires the following:

* NVIDIA DOCA Driver and OpenSM running. OpenSM runs on top of the NVIDIA DOCA Driver stack, so both the driver and the subnet manager should come from the same installation. Note that partitions configured by OpenSM should specify defmember=full to enable SR-IOV functionality over InfiniBand. For more details, please refer to this article.
* InfiniBand device: both the host device and the switch ports must be enabled in InfiniBand mode.
* The rdma-core package should be installed when an inbox driver is used.

First install the Network Operator with NFD and the SR-IOV Network Operator enabled:

values.yaml:

```yaml
nfd:
  enabled: true
sriovNetworkOperator:
  enabled: true
```

Once the Network Operator is installed, create a NicClusterPolicy with:

* DOCA driver
* Secondary network
* Multus CNI
* Container Networking Plugins
* IPAM plugin

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 24.07-0.6.1.0-0
    forcePrecompiled: false
    imagePullSecrets: []
    terminationGracePeriodSeconds: 300
    startupProbe:
      initialDelaySeconds: 10
      periodSeconds: 20
    livenessProbe:
      initialDelaySeconds: 30
      periodSeconds: 30
    readinessProbe:
      initialDelaySeconds: 10
      periodSeconds: 30
    upgradePolicy:
      autoUpgrade: true
      maxParallelUpgrades: 1
      safeLoad: false
      drain:
        enable: true
        force: true
        podSelector: ""
        timeoutSeconds: 300
        deleteEmptyDir: true
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: ghcr.io/k8snetworkplumbingwg
      version: v1.5.0
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: ghcr.io/k8snetworkplumbingwg
      version: v3.9.3
      imagePullSecrets: []
    ipamPlugin:
      image: whereabouts
      repository: ghcr.io/k8snetworkplumbingwg
      version: v0.7.0
      imagePullSecrets: []
```

sriov-ib-network-node-policy.yaml

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: infiniband-sriov
  namespace: nvidia-network-operator
spec:
  deviceType: netdevice
  mtu: 1500
  nodeSelector:
    feature.node.kubernetes.io/pci-15b3.present: "true"
  nicSelector:
    vendor: "15b3"
  linkType: IB
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: mlnxnics
```

sriov-ib-network.yaml

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: example-sriov-ib-network
  namespace: nvidia-network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.5.225/28",
      "exclude": [
        "192.168.5.229/30",
        "192.168.5.236/32"
      ],
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: mlnxnics
  linkState: enable
  networkNamespace: default
```

sriov-ib-network-pod.yaml

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-sriov-ib-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: example-sriov-ib-network
spec:
  containers:
    - name: test-sriov-ib-pod
      image: centos/tools
      imagePullPolicy: IfNotPresent
      command:
        - sh
        - -c
        - sleep inf
      securityContext:
        capabilities:
          add: [ "IPC_LOCK" ]
      resources:
        requests:
          nvidia.com/mlnxnics: "1"
        limits:
          nvidia.com/mlnxnics: "1"
```

A Network Operator deployment with an InfiniBand network requires the following:

* NVIDIA DOCA Driver and OpenSM running. OpenSM runs on top of the NVIDIA DOCA Driver stack, so both the driver and the subnet manager should come from the same installation. Note that partitions configured by OpenSM should specify defmember=full to enable SR-IOV functionality over InfiniBand. For more details, please refer to this article.
* NVIDIA UFM running on top of OpenSM. For more details, please refer to the project documentation.
* InfiniBand device: both the host device and the switch ports must be enabled in InfiniBand mode.
* The rdma-core package should be installed when an inbox driver is used.

Current limitations:

* Only a single PKey can be configured per workload pod.
* When a single instance of NVIDIA UFM is used with several K8s clusters, a different PKey GUID pool should be configured for each cluster.

Warning: The ib-kubernetes-ufm-secret Secret must be created before the NicClusterPolicy.

ufm-secret.yaml

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ib-kubernetes-ufm-secret
  namespace: nvidia-network-operator
stringData:
  UFM_USERNAME: "admin"
  UFM_PASSWORD: "123456"
  UFM_ADDRESS: "ufm-host"
  UFM_HTTP_SCHEMA: ""
  UFM_PORT: ""
data:
  UFM_CERTIFICATE: ""
```
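In a Kubernetes Secret, values under stringData are given as plain text, while values under data must be base64-encoded. UFM_CERTIFICATE sits under data, so a non-empty certificate has to be encoded first. The sketch below produces that value; the certificate file path passed on the command line is a placeholder for your own file.

```python
import base64
import sys


def secret_data_value(raw: bytes) -> str:
    """Encode bytes the way a Secret's `data` field expects (standard base64)."""
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python encode_cert.py path/to/ufm-cert.pem  (path is a placeholder)
    with open(sys.argv[1], "rb") as fh:
        print(secret_data_value(fh.read()))
```

Leaving the value empty, as in the manifest above, needs no encoding.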

First install the Network Operator with NFD and the SR-IOV Network Operator enabled:

values.yaml:

```yaml
nfd:
  enabled: true
sriovNetworkOperator:
  enabled: true
  resourcePrefix: "nvidia.com"
```

Once the Network Operator is installed, create a NicClusterPolicy with:

* DOCA driver
* ibKubernetes
* Secondary network
* Multus CNI
* Container Networking plugins
* IPAM plugin

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 24.07-0.6.1.0-0
    forcePrecompiled: false
    imagePullSecrets: []
    terminationGracePeriodSeconds: 300
    startupProbe:
      initialDelaySeconds: 10
      periodSeconds: 20
    livenessProbe:
      initialDelaySeconds: 30
      periodSeconds: 30
    readinessProbe:
      initialDelaySeconds: 10
      periodSeconds: 30
    upgradePolicy:
      autoUpgrade: true
      maxParallelUpgrades: 1
      safeLoad: false
      drain:
        enable: true
        force: true
        podSelector: ""
        timeoutSeconds: 300
        deleteEmptyDir: true
  ibKubernetes:
    image: ib-kubernetes
    repository: ghcr.io/mellanox
    version: v1.0.2
    imagePullSecrets: []
    pKeyGUIDPoolRangeStart: 02:00:00:00:00:00:00:00
    pKeyGUIDPoolRangeEnd: 02:FF:FF:FF:FF:FF:FF:FF
    # Must match the name of the UFM Secret created beforehand.
    ufmSecret: "ib-kubernetes-ufm-secret"
  nvIpam:
    image: nvidia-k8s-ipam
    repository: ghcr.io/mellanox
    version: v0.2.0
    imagePullSecrets: []
    enableWebhook: false
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: ghcr.io/k8snetworkplumbingwg
      version: v1.5.0
      imagePullSecrets: []
    multus:
      image: multus-cni
      repository: ghcr.io/k8snetworkplumbingwg
      version: v3.9.3
      imagePullSecrets: []
```
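pKeyGUIDPoolRangeStart and pKeyGUIDPoolRangeEnd are 64-bit GUIDs written as eight colon-separated hex bytes. Since a UFM instance shared by several clusters needs a distinct GUID pool per cluster, it helps to compute pool sizes and check for overlaps numerically. In the sketch below, the second cluster's range is a hypothetical example.

```python
def guid_to_int(guid):
    """Convert 'aa:bb:cc:dd:ee:ff:gg:hh' (8 hex bytes) to an integer."""
    parts = guid.split(":")
    if len(parts) != 8:
        raise ValueError(f"expected 8 bytes, got {guid!r}")
    return int("".join(parts), 16)


def pool_size(start, end):
    """Number of GUIDs in the inclusive range [start, end]."""
    return guid_to_int(end) - guid_to_int(start) + 1


def overlaps(a, b):
    """True when two inclusive (start, end) GUID ranges share any GUID."""
    return guid_to_int(a[0]) <= guid_to_int(b[1]) and guid_to_int(b[0]) <= guid_to_int(a[1])


CLUSTER_A = ("02:00:00:00:00:00:00:00", "02:FF:FF:FF:FF:FF:FF:FF")
CLUSTER_B = ("03:00:00:00:00:00:00:00", "03:FF:FF:FF:FF:FF:FF:FF")  # hypothetical

if __name__ == "__main__":
    print(pool_size(*CLUSTER_A), overlaps(CLUSTER_A, CLUSTER_B))
```

The range in the policy above spans 2^56 GUIDs, and a second cluster starting at 03:00:00:00:00:00:00:00 would not overlap it.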

Create an IPPool object for NV-IPAM:

```yaml
apiVersion: nv-ipam.nvidia.com/v1alpha1
kind: IPPool
metadata:
  name: pool1
  namespace: nvidia-network-operator
spec:
  subnet: 192.168.0.0/16
  perNodeBlockSize: 100
  gateway: 192.168.0.1
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: node-role.kubernetes.io/worker
            operator: Exists
```

Wait for the NVIDIA DOCA Driver to be installed, then apply the following CRs:

sriov-ib-network-node-policy.yaml

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: infiniband-sriov
  namespace: nvidia-network-operator
spec:
  deviceType: netdevice
  mtu: 1500
  nodeSelector:
    feature.node.kubernetes.io/pci-15b3.present: "true"
  nicSelector:
    vendor: "15b3"
  linkType: IB
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: mlnxnics
```

sriov-ib-network.yaml

```yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ib-sriov-network
  annotations:
    k8s.v1.cni.cncf.io/resourceName: nvidia.com/mlnxnics
spec:
  config: '{
      "type": "ib-sriov",
      "cniVersion": "0.3.1",
      "name": "ib-sriov-network",
      "pkey": "0x6",
      "link_state": "enable",
      "ibKubernetesEnabled": true,
      "ipam": {
        "type": "nv-ipam",
        "poolName": "pool1"
      }
    }'
```

Note: To use the IB network with the Pkey management feature and RDMA isolation, use the following sriov-ib-network.yaml instead:

```yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ib-sriov-network
  annotations:
    k8s.v1.cni.cncf.io/resourceName: nvidia.com/mlnxnics
spec:
  config: '{
      "cniVersion": "0.3.1",
      "name": "ib-sriov-network",
      "plugins": [
        {
          "type": "ib-sriov",
          "pkey": "0x6",
          "link_state": "enable",
          "ibKubernetesEnabled": true,
          "ipam": {
            "type": "nv-ipam",
            "poolName": "pool1"
          }
        },
        {
          "type": "rdma"
        }
      ]
    }'
```
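The two sriov-ib-network.yaml variants differ in the shape of spec.config: the first is a single plugin configuration, while the second is a plugin list in which the rdma plugin follows ib-sriov. The Python sketch below parses both JSON strings with the standard json module and reports the plugin chain each one defines. plugin_chain is a hypothetical helper for illustration only.

```python
import json

SINGLE = '''{ "type":"ib-sriov", "cniVersion":"0.3.1", "name":"ib-sriov-network",
  "pkey":"0x6", "link_state":"enable", "ibKubernetesEnabled":true,
  "ipam":{ "type":"nv-ipam", "poolName":"pool1" } }'''

CHAINED = '''{ "cniVersion":"0.3.1", "name":"ib-sriov-network", "plugins":[
  { "type":"ib-sriov", "pkey":"0x6", "link_state":"enable", "ibKubernetesEnabled":true,
    "ipam":{ "type":"nv-ipam", "poolName":"pool1" } },
  { "type":"rdma" } ] }'''


def plugin_chain(config_json: str) -> list[str]:
    """Return plugin types in list order for a single-plugin or plugin-list config."""
    conf = json.loads(config_json)
    if "plugins" in conf:  # plugin list: entries are invoked in the listed order on ADD
        return [p["type"] for p in conf["plugins"]]
    return [conf["type"]]  # single plugin configuration


print(plugin_chain(SINGLE))   # ['ib-sriov']
print(plugin_chain(CHAINED))  # ['ib-sriov', 'rdma']
```

Both variants keep the same ib-sriov settings and IPAM pool; the list form only appends the rdma plugin.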

sriov-ib-network-pod.yaml

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-sriov-ib-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: ib-sriov-network
spec:
  containers:
    - name: test-sriov-ib-pod
      image: centos/tools
      imagePullPolicy: IfNotPresent
      command:
        - sh
        - -c
        - sleep inf
      securityContext:
        capabilities:
          add: [ "IPC_LOCK" ]
      resources:
        requests:
          nvidia.com/mlnxnics: "1"
        limits:
          nvidia.com/mlnxnics: "1"
```
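For the test pod to attach to the network, its k8s.v1.cni.cncf.io/networks annotation must name an existing NetworkAttachmentDefinition, and the container must request the resource named in that definition's k8s.v1.cni.cncf.io/resourceName annotation. The Python sketch below cross-checks both conditions on manifests held as plain dicts (YAML parsing is omitted so the sketch stays standard-library only). check_pod and make_pod are hypothetical helpers, and only plain comma-separated network names are handled.

```python
def check_pod(pod: dict, nads: dict) -> list[str]:
    """Return problems found; an empty list means names and resources line up.

    `nads` maps NetworkAttachmentDefinition name -> its resourceName annotation.
    Simplification: no namespace/ prefixes or @interface suffixes are parsed.
    """
    problems = []
    annotation = pod["metadata"]["annotations"].get("k8s.v1.cni.cncf.io/networks", "")
    requested = set()
    for container in pod["spec"]["containers"]:
        requested.update(container.get("resources", {}).get("requests", {}))
    for name in filter(None, (n.strip() for n in annotation.split(","))):
        if name not in nads:
            problems.append(f"unknown network {name!r}")
        elif nads[name] not in requested:
            problems.append(f"network {name!r} needs resource {nads[name]!r}")
    return problems


NADS = {"ib-sriov-network": "nvidia.com/mlnxnics"}


def make_pod(network: str, resource: str) -> dict:
    """Build a minimal pod-like dict with one annotation and one resource request."""
    return {
        "metadata": {"annotations": {"k8s.v1.cni.cncf.io/networks": network}},
        "spec": {"containers": [{"resources": {"requests": {resource: "1"}}}]},
    }


print(check_pod(make_pod("ib-sriov-network", "nvidia.com/mlnxnics"), NADS))  # []
print(check_pod(make_pod("ib-sriob-network", "nvidia.com/mlnxnics"), NADS))
```

A misspelled network name (such as ib-sriob-network) or resource name (such as nvidia.com/mlnxics) is reported instead of surfacing later as a pod stuck in scheduling or failing CNI setup.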

This deployment mode supports DPDK applications. To run DPDK applications, hugepages must be configured on the relevant Kubernetes worker nodes. By default, the inbox operating system driver is used; for cases with specific requirements, the OFED container should be deployed.

Network Operator deployment with:

* Host Device Network
* DPDK pod

nicclusterpolicy.yaml

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 24.07-0.6.1.0-0
  sriovDevicePlugin:
    image: sriov-network-device-plugin
    repository: ghcr.io/k8snetworkplumbingwg
    version: v3.7.0
    config: |
      {
        "resourceList": [
          {
            "resourcePrefix": "nvidia.com",
            "resourceName": "rdma_host_dev",
            "selectors": {
              "vendors": ["15b3"],
              "devices": ["1018"],
              "drivers": ["mlx5_core"]
            }
          }
        ]
      }
  secondaryNetwork:
    cniPlugins:
      image: plugins
      repository: ghcr.io/k8snetworkplumbingwg
      version: v1.5.0-amd64
    ipamPlugin:
      image: whereabouts
      repository: ghcr.io/k8snetworkplumbingwg
      version: v0.7.0-amd64
    multus:
      image: multus-cni
      repository: ghcr.io/k8snetworkplumbingwg
      version: v3.9.3
```
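The sriovDevicePlugin config groups host devices into the nvidia.com/rdma_host_dev resource through selectors. The Python sketch below models the matching rule as we understand the SR-IOV Network Device Plugin to apply it: a device must satisfy every selector field that is set (AND across fields), and within one field any listed value is accepted (OR within a field). It is a simplified illustration on made-up device records, not the plugin's code, and ignores the other selector types the plugin supports.

```python
FIELD_FOR_SELECTOR = {"vendors": "vendor", "devices": "device", "drivers": "driver"}


def matches(device: dict, selectors: dict) -> bool:
    """True if the device satisfies every non-empty selector list.

    AND across selector fields, OR within a field; empty or missing lists do not filter.
    """
    for selector, field in FIELD_FOR_SELECTOR.items():
        allowed = selectors.get(selector) or []
        if allowed and device.get(field) not in allowed:
            return False
    return True


SELECTORS = {"vendors": ["15b3"], "devices": ["1018"], "drivers": ["mlx5_core"]}

# Hypothetical device records, as a node inventory might describe them.
devices = [
    {"pci": "0000:03:00.2", "vendor": "15b3", "device": "1018", "driver": "mlx5_core"},
    {"pci": "0000:03:00.0", "vendor": "15b3", "device": "1017", "driver": "mlx5_core"},
    {"pci": "0000:05:00.0", "vendor": "8086", "device": "1572", "driver": "i40e"},
]

print([d["pci"] for d in devices if matches(d, SELECTORS)])  # ['0000:03:00.2']
```

Only the record that matches the vendor, device ID, and driver at once is selected; dropping the "devices" list would also admit the second record.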

host-device-net.yaml

```yaml
apiVersion: mellanox.com/v1alpha1
kind: HostDeviceNetwork
metadata:
  name: example-hostdev-net
spec:
  networkNamespace: "default"
  resourceName: "rdma_host_dev"
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.3.225/28",
      "exclude": [
        "192.168.3.229/30",
        "192.168.3.236/32"
      ],
      "log_file" : "/var/log/whereabouts.log",
      "log_level" : "info"
    }
```
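The whereabouts IPAM block above allocates from range and skips anything covered by exclude. The Python sketch below uses the standard ipaddress module to list the candidate addresses for this example, assuming every host address of the /28 outside the exclusions is assignable; whereabouts' own handling of range_start, range_end, and other reserved addresses is not modeled. Because 192.168.3.225/28 and 192.168.3.229/30 have host bits set, they are parsed with strict=False. candidate_ips is a hypothetical helper.

```python
import ipaddress


def candidate_ips(cidr: str, excludes: list[str]) -> list[str]:
    """Host addresses of `cidr` that fall outside every excluded CIDR."""
    net = ipaddress.ip_network(cidr, strict=False)  # 192.168.3.225/28 -> 192.168.3.224/28
    excluded = [ipaddress.ip_network(e, strict=False) for e in excludes]
    return [str(ip) for ip in net.hosts() if not any(ip in ex for ex in excluded)]


ips = candidate_ips("192.168.3.225/28", ["192.168.3.229/30", "192.168.3.236/32"])
print(len(ips))  # 9
print(ips)
```

The /28 offers 14 host addresses (.225 to .238); the /30 exclusion removes .228 to .231 and the /32 removes .236, leaving nine addresses for pods on this network.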

pod.yaml

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: example-hostdev-net
spec:
  containers:
    - name: appcntr1
      image: <dpdk image>
      imagePullPolicy: IfNotPresent
      securityContext:
        capabilities:
          add: ["IPC_LOCK"]
      volumeMounts:
        - mountPath: /dev/hugepages
          name: hugepage
      resources:
        requests:
          memory: 1Gi
          hugepages-1Gi: 2Gi
          nvidia.com/rdma_host_dev: '1'
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep 300000; done;" ]
  volumes:
    - name: hugepage
      emptyDir:
        medium: HugePages
```
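The hugepages-1Gi: 2Gi request in pod.yaml is an amount of memory rather than a page count: the node must have two free 1 GiB huge pages pre-allocated for the pod to be scheduled. The Python sketch below converts binary-suffixed Kubernetes quantities into a number of pages of a given size. It only handles the Ki, Mi, Gi, and Ti suffixes (decimal suffixes and other quantity forms are out of scope), and quantity_to_bytes and pages_needed are hypothetical helper names.

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity: str) -> int:
    """Parse a Kubernetes quantity with a binary suffix, e.g. '2Gi'."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity {quantity!r} (only Ki/Mi/Gi/Ti handled)")


def pages_needed(request: str, page_size: str) -> int:
    """Number of huge pages of `page_size` that back `request`."""
    total, page = quantity_to_bytes(request), quantity_to_bytes(page_size)
    if total % page:
        raise ValueError("this helper expects a whole multiple of the page size")
    return total // page


print(pages_needed("2Gi", "1Gi"))  # 2
print(pages_needed("2Gi", "2Mi"))  # 1024
```

The second call shows the same 2 GiB expressed as 2 MiB pages, which is the page size the hugepages-2Mi resource would use instead.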

Warning This feature is supported only for Vanilla Kubernetes deployments with SR-IOV Network Operator.

Warning To use the OFED container with this mode of operation, set the RESTORE_DRIVER_ON_POD_TERMINATION environment variable to false in the driver configuration section of the NicClusterPolicy. Restoring the inbox driver is not supported for this feature.

Warning Tech Preview feature.

Deploy the Network Operator with Helm, together with sriov-network-operator and nv-ipam.

First, install the Network Operator with NFD and the SR-IOV Network Operator enabled.

values.yaml:

```yaml
sriovNetworkOperator:
  enabled: true
```

Once the Network Operator has been installed, create a NicClusterPolicy with nv-ipam:

```yaml
apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  nvIpam:
    image: nvidia-k8s-ipam
    repository: ghcr.io/mellanox
    version: v0.2.0
    imagePullSecrets: []
    enableWebhook: false
```

Enable the manageSoftwareBridges feature gate for sriov-network-operator

```shell
kubectl patch sriovoperatorconfigs.sriovnetwork.openshift.io -n nvidia-network-operator default \
  --patch '{ "spec": { "featureGates": { "manageSoftwareBridges": true } } }' \
  --type='merge'
```
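With --type='merge', kubectl sends a JSON merge patch (RFC 7386): objects in the patch are merged key by key into the existing resource, so setting spec.featureGates.manageSoftwareBridges leaves other keys under spec and featureGates untouched, while a null value would delete a key. The Python sketch below implements the RFC 7386 algorithm and applies the patch from the command to a hypothetical existing SriovOperatorConfig; the current values shown (logLevel, exampleGate) are placeholders, not taken from a real cluster.

```python
import copy
import json


def merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch and return the result (inputs untouched)."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)  # non-objects replace the target outright
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null deletes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Hypothetical current state of the "default" SriovOperatorConfig (placeholder values).
current = {"spec": {"logLevel": 2, "featureGates": {"exampleGate": True}}}
patch = json.loads('{ "spec": { "featureGates": { "manageSoftwareBridges": true } } }')

print(merge_patch(current, patch))
```

The result keeps logLevel and exampleGate and adds manageSoftwareBridges: True, which is why the command can enable one gate without restating the rest of the spec.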

Create an IPPool object for nv-ipam

```yaml
apiVersion: nv-ipam.nvidia.com/v1alpha1
kind: IPPool
metadata:
  name: pool1
  namespace: nvidia-network-operator
spec:
  subnet: 192.168.0.0/16
  perNodeBlockSize: 100
  gateway: 192.168.0.1
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: node-role.kubernetes.io/worker
            operator: Exists
```

Supported operating systems:

* Ubuntu 22.04
* RHEL 8.8

OpenvSwitch from the NVIDIA DOCA for Host package, using the doca-all or doca-networking profile, should be installed on each worker node.

See the official NVIDIA DOCA installation guide for details.

Supported OpenvSwitch dataplanes:

* OVS-kernel
* OVS-doca

See the OpenvSwitch Offload document for the differences between them.

The following steps are for the OVS-kernel data plane. To use OVS-doca, follow the instructions in the relevant section.

Configure Open_vSwitch

```shell
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
```

Restart Open_vSwitch

```shell
systemctl restart openvswitch-switch.service
```

Create a SriovNetworkNodePolicy for the selected NIC

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ovs-switchdev
  namespace: nvidia-network-operator
spec:
  eSwitchMode: switchdev
  mtu: 1500
  nicSelector:
    deviceID: 101d
    vendor: 15b3
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: 4
  isRdma: true
  linkType: ETH
  resourceName: switchdev
  bridge:
    ovs: {}
```

Create an OVSNetwork CR

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: OVSNetwork
metadata:
  name: ovs
  namespace: nvidia-network-operator
spec:
  networkNamespace: default
  ipam: |
    {
      "type": "nv-ipam",
      "poolName": "pool1"
    }
  resourceName: switchdev
```

The following steps are for the OVS-doca data plane. To use OVS-kernel, follow the instructions in the relevant section.

Configure hugepages

```shell
mkdir -p /hugepages
mount -t hugetlbfs hugetlbfs /hugepages
echo 4096 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
```

Note: On a multi-CPU system, hugepages should be created for each NUMA node: node0, node1, …

Configure the system to create hugepages at boot

```shell
echo "vm.nr_hugepages=8192" > /etc/sysctl.d/99-hugepages.conf
```

Note: This example is for a server with two CPUs (two NUMA nodes with 4096 pages each).
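The runtime commands set 2 MiB pages per NUMA node through sysfs, while vm.nr_hugepages sets the system-wide pool of default-size huge pages at boot. The Python sketch below prints the per-node sysfs writes the note refers to and the memory each setting reserves. It assumes a 2048 kB default huge page size (typical on x86_64) and two NUMA nodes, as in the note; sysfs_path and reserved_gib are hypothetical helpers.

```python
PAGE_KB = 2048  # assumption: 2 MiB default huge page size (typical on x86_64)


def sysfs_path(node: int, page_kb: int = PAGE_KB) -> str:
    """Per-NUMA-node sysfs file used by the runtime hugepage command."""
    return f"/sys/devices/system/node/node{node}/hugepages/hugepages-{page_kb}kB/nr_hugepages"


def reserved_gib(pages: int, page_kb: int = PAGE_KB) -> float:
    """Memory reserved by `pages` huge pages, in GiB."""
    return pages * page_kb / (1024 * 1024)


numa_nodes = 2
pages_per_node = 4096
for node in range(numa_nodes):
    print(f"echo {pages_per_node} > {sysfs_path(node)}")

print(reserved_gib(pages_per_node))               # 8.0 (GiB per NUMA node)
print(reserved_gib(numa_nodes * pages_per_node))  # 16.0 (GiB total, i.e. vm.nr_hugepages=8192)
```

Sizing the boot-time value as pages per node times the number of NUMA nodes keeps it consistent with the runtime configuration.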

Configure Open_vSwitch

```shell
ovs-vsctl --no-wait set Open_vSwitch . other_config:doca-init=true
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
```

Restart Open_vSwitch

```shell
systemctl restart openvswitch-switch.service
```

Create a SriovNetworkNodePolicy for the selected NIC

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ovs-switchdev
  namespace: nvidia-network-operator
spec:
  eSwitchMode: switchdev
  mtu: 1500
  nicSelector:
    deviceID: 101d
    vendor: 15b3
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: 4
  isRdma: true
  linkType: ETH
  resourceName: switchdev
  bridge:
    ovs:
      bridge:
        datapathType: netdev
      uplink:
        interface:
          type: dpdk
```

Create an OVSNetwork CR

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: OVSNetwork
metadata:
  name: ovs
  namespace: nvidia-network-operator
spec:
  networkNamespace: default
  ipam: |
    {
      "type": "nv-ipam",
      "poolName": "pool1"
    }
  resourceName: switchdev
  interfaceType: dpdk
```

Deploy a test workload that attaches to the ovs network:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ovs-offload
  labels:
    app: ovs-offload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ovs-offload
  template:
    metadata:
      labels:
        app: ovs-offload
      annotations:
        k8s.v1.cni.cncf.io/networks: ovs
    spec:
      containers:
        - name: ovs-offload-container
          command: ["/bin/bash", "-c"]
          args:
            - |
              while true; do sleep 1000; done
          image: mellanox/rping-test
          securityContext:
            capabilities:
              add: ["IPC_LOCK"]
          resources:
            requests:
              nvidia.com/switchdev: 1
            limits:
              nvidia.com/switchdev: 1
```

For OVS hardware offload verification and troubleshooting steps, please refer to the following DOCA documentation: