Installing NVIDIA Network Operator

Prerequisites

Note

If Mellanox NICs are not connected to your nodes, skip this step and proceed to the next step, Installing GPU Operator.

The instructions below assume that Mellanox NICs are connected to your machines.

Run the following command to verify that Mellanox NICs are enabled on your machines:

$ lspci | grep -i "Mellanox"

Output:

0c:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
0c:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
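If you want to script the decision in the Note above (continue here, or skip to Installing GPU Operator), a minimal sketch is to count the Mellanox entries in the lspci output. The helper names count_mellanox_nics and next_step are hypothetical and not part of any NVIDIA tooling; they read lspci output on stdin, so they can also be tried against saved output.

```shell
#!/bin/sh
# Hypothetical helper (not from the NVIDIA docs): counts lspci lines that
# mention Mellanox. Reads lspci output on stdin.
count_mellanox_nics() {
    # grep -c prints the number of matching lines ("0" if none). grep exits 1
    # when nothing matches, so '|| true' keeps the function's status at 0.
    grep -ci 'mellanox' || true
}

# Hypothetical helper: prints which step to follow next, based on the
# lspci output read from stdin.
next_step() {
    if [ "$(count_mellanox_nics)" -gt 0 ]; then
        echo "Mellanox NICs found: continue with Installing NVIDIA Network Operator"
    else
        echo "No Mellanox NICs found: skip to Installing GPU Operator"
    fi
}
```

For example, run lspci | next_step on the node. On the machine shown above, lspci | count_mellanox_nics would be expected to print 2, one line per PCI function of the dual-port ConnectX-6 Dx.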

Run the following command to determine which Mellanox device is active:

Note

Use the device that shows Link detected: yes in the following steps. The command below works only if the NICs were added before the operating system was installed.

for device in $(sudo lshw -class network -short | grep -i ConnectX | awk '{print $2}' | grep -Ev 'Device|path' | sed '/^$/d'); do
    echo -n "$device"
    sudo ethtool "$device" | grep -i "Link detected"
done

Output:

ens160f0        Link detected: yes
ens160f1        Link detected: no
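Because the lshw-based loop above only works if the NICs were present when the operating system was installed, you may prefer to read the same information from sysfs. This is an alternative sketch, not the documented method: find_active_mellanox is a hypothetical helper, and it assumes the standard Linux sysfs layout, where /sys/class/net/<iface>/device/vendor holds the PCI vendor ID (0x15b3 for Mellanox, the same vendor ID used in the values file) and /sys/class/net/<iface>/carrier reads 1 when a link is detected.

```shell
#!/bin/sh
# Alternative sketch (assumption, not the documented method): list Mellanox
# network interfaces that currently have a link, using sysfs instead of lshw.
# Assumes <iface>/device/vendor holds the PCI vendor ID and <iface>/carrier
# reads 1 when a link is detected.
find_active_mellanox() {
    # Optional argument: alternative root directory (useful for testing).
    net_root="${1:-/sys/class/net}"
    for dev_path in "$net_root"/*; do
        # Skip entries without a backing PCI device (e.g. lo, bridges).
        [ -e "$dev_path/device/vendor" ] || continue
        vendor=$(cat "$dev_path/device/vendor")
        [ "$vendor" = "0x15b3" ] || continue
        # Reading carrier fails on interfaces that are administratively
        # down; treat that the same as "no link".
        carrier=$(cat "$dev_path/carrier" 2>/dev/null || echo 0)
        if [ "$carrier" = "1" ]; then
            basename "$dev_path"
        fi
    done
}
```

Run find_active_mellanox with no arguments on the node. On the machine in the example above, it would be expected to print only ens160f0.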

Create a custom values file for the Network Operator:

$ nano network-operator-values.yaml

Add the following content, replacing ens160f0 in the devices list with the active Mellanox device identified above:

deployCR: true
ofedDriver:
  deploy: true
nvPeerDriver:
  deploy: true
rdmaSharedDevicePlugin:
  deploy: true
  resources:
    - name: rdma_shared_device_a
      vendors: [15b3]
      devices: [ens160f0]
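Instead of editing the file in nano, you could generate it from a script with the active device filled in. This is a minimal sketch, assuming the values shown above are the ones you want; write_network_operator_values is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: writes the Network Operator values shown above, with
# the given interface name in the rdmaSharedDevicePlugin devices list.
# Usage: write_network_operator_values <device> [output-file]
write_network_operator_values() {
    if [ -z "$1" ]; then
        echo "usage: write_network_operator_values <device> [output-file]" >&2
        return 1
    fi
    out_file="${2:-network-operator-values.yaml}"
    # Unquoted EOF: $1 is expanded to the device name inside the here-document.
    cat > "$out_file" <<EOF
deployCR: true
ofedDriver:
  deploy: true
nvPeerDriver:
  deploy: true
rdmaSharedDevicePlugin:
  deploy: true
  resources:
    - name: rdma_shared_device_a
      vendors: [15b3]
      devices: [$1]
EOF
}
```

For example, write_network_operator_values ens160f0 writes network-operator-values.yaml in the current directory.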

For more information about the custom Network Operator values.yaml, refer to the Network Operator documentation.

Add the NVIDIA Network Operator Helm repository:

Note

Helm is required to install the Network Operator.

$ helm repo add mellanox https://mellanox.github.io/network-operator

Update the Helm repo:

$ helm repo update

Install NVIDIA Network Operator

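Run the following commands. The first removes the node-role.kubernetes.io/master label from all nodes; the second installs the Network Operator chart with your custom values into the network-operator namespace, creating the namespace if needed and waiting for the release to become ready: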

$ kubectl label nodes --all node-role.kubernetes.io/master- --overwrite
$ helm install -f ./network-operator-values.yaml -n network-operator --create-namespace --wait network-operator mellanox/network-operator
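Because the install uses --wait, helm install returns only after the release's resources are ready or the wait times out. To double-check afterwards from a script, a sketch like the following asks Helm whether the release is listed as deployed. network_operator_deployed is a hypothetical helper; --deployed and -q (--short) are standard helm list options.

```shell
#!/bin/sh
# Hypothetical post-install check (not from the NVIDIA docs): succeeds only if
# Helm lists a release named network-operator as deployed in the
# network-operator namespace.
network_operator_deployed() {
    # --deployed: only show releases in the deployed state
    # -q: print release names only, one per line
    helm list -n network-operator --deployed -q | grep -qx 'network-operator'
}
```

For example, network_operator_deployed && echo "network-operator release is deployed". If the check fails, helm status network-operator -n network-operator shows the current state of the release.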

Validating the State of Network Operator

The Network Operator installation can take a few minutes, depending on your internet speed. Check the state of the pods with the following command:

$ kubectl get pods --all-namespaces | grep -E 'network-operator|nvidia-network-operator-resources'

Output:

NAMESPACE                           NAME                                                              READY   STATUS      RESTARTS   AGE
network-operator                    network-operator-547cb8d999-mn2h9                                 1/1     Running     0          17m
network-operator                    network-operator-node-feature-discovery-master-596fb8b7cb-qrmvv   1/1     Running     0          17m
network-operator                    network-operator-node-feature-discovery-worker-qt5xt              1/1     Running     0          17m
nvidia-network-operator-resources   cni-plugins-ds-dl5vl                                              1/1     Running     0          17m
nvidia-network-operator-resources   kube-multus-ds-w82rv                                              1/1     Running     0          17m
nvidia-network-operator-resources   mofed-ubuntu20.04-ds-xfpzl                                        1/1     Running     0          17m
nvidia-network-operator-resources   rdma-shared-dp-ds-2hgb6                                           1/1     Running     0          17m
nvidia-network-operator-resources   sriov-device-plugin-ch7bz                                         1/1     Running     0          10m
nvidia-network-operator-resources   whereabouts-56ngr                                                 1/1     Running     0          10m
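Rather than rereading the table until every pod shows Running, you can check it from a script. The sketch below uses a hypothetical helper, check_pods_ready, which assumes kubectl get pods --all-namespaces --no-headers output with the columns NAMESPACE, NAME, READY, STATUS, RESTARTS and AGE. It prints any pod that is not Running with all containers ready, and returns non-zero if it finds one or if no pods were listed at all.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods --all-namespaces --no-headers`
# output on stdin. Prints every pod that is not Running with all containers
# ready; exits 0 only if at least one pod was listed and all of them are ready.
check_pods_ready() {
    awk '
        {
            # READY column ($3) looks like 1/1: ready containers / total containers.
            split($3, ready, "/")
            if ($4 != "Running" || ready[1] != ready[2]) {
                print $1 "/" $2 " (" $3 ", " $4 ")"
                bad = 1
            }
        }
        END {
            if (NR == 0) { print "no pods found"; exit 1 }
            exit (bad ? 1 : 0)
        }
    '
}
```

For example: kubectl get pods --all-namespaces --no-headers | grep -E 'network-operator|nvidia-network-operator-resources' | check_pods_ready. As a built-in alternative, kubectl wait --for=condition=Ready pods --all -n nvidia-network-operator-resources --timeout=600s blocks until the pods in that namespace are ready or the timeout expires.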

Please refer to the Network Operator page for more information.