Install the Node Feature Discovery (NFD) Operator#
The Node Feature Discovery (NFD) Operator is a prerequisite for the NVIDIA GPU and Network Operators. NFD runs a discovery and reconciliation loop and applies labels to each node that describe its hardware configuration.
Install the NFD Operator from the Red Hat Software Catalog (Red Hat OperatorHub in OpenShift versions before 4.20). Follow the instructions in Red Hat's Node Feature Discovery Operator guide to install the Operator on Red Hat OpenShift.
The Node Feature Discovery Operator uses PCI vendor IDs to identify hardware in a node. 0x10de and 0x15b3 are the PCI vendor IDs used by NVIDIA devices: 0x10de identifies NVIDIA GPUs, and 0x15b3 identifies NVIDIA networking devices (the ID originally assigned to Mellanox). To verify that the Node Feature Discovery Operator is functioning correctly, inspect the node labels using the OpenShift Container Platform web console or the CLI.
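To make the link between a PCI vendor ID and its node label concrete, the following is a minimal sketch that builds the label key from a vendor ID. It assumes the `pci-<vendor>.present` label form shown in this section. The function name `pci_label_key` is hypothetical and is not part of NFD or `oc`.

```shell
# Hypothetical helper: build the NFD "present" label key for a PCI vendor ID.
# Assumes the label form used in this section: pci-<vendor>.present
pci_label_key() {
  # Lowercase the ID, then strip an optional 0x prefix.
  vendor=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  vendor=${vendor#0x}
  printf 'feature.node.kubernetes.io/pci-%s.present\n' "$vendor"
}

pci_label_key 0x10de   # vendor ID for NVIDIA GPUs
pci_label_key 15b3     # vendor ID for NVIDIA networking devices
```

The resulting keys are the ones to look for, with the value `true`, when inspecting node labels.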
Verifying NFD node labels using the web console#
In the OpenShift Container Platform web console, click Compute > Nodes from the side menu.
Select a worker node that contains a GPU.
Click the Details tab.
Under Node Labels, verify that the following label is present:
feature.node.kubernetes.io/pci-10de.present=true
Verifying NFD node labels using the CLI#
Verify that the PCI devices are discovered on the nodes:
oc describe node | egrep 'Roles|pci-10de|pci-15b3'

Example output:

Roles: control-plane,master,worker
  feature.node.kubernetes.io/pci-10de.present=true
  feature.node.kubernetes.io/pci-10de.sriov.capable=true
  feature.node.kubernetes.io/pci-15b3.present=true
  feature.node.kubernetes.io/pci-15b3.sriov.capable=true
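If you want the check to fail loudly when a label is missing, the CLI verification can be wrapped in a small script. The following is a sketch only: the function name `check_nfd_labels` is hypothetical, and it assumes that both the `pci-10de` and `pci-15b3` labels are expected in your cluster. It reads `oc describe node` output from standard input and only checks that each label appears somewhere in that input, not on every node.

```shell
# Hypothetical helper: report whether each expected PCI vendor label
# appears in `oc describe node` output supplied on standard input.
check_nfd_labels() {
  input=$(cat)
  status=0
  for vendor in 10de 15b3; do
    label="feature.node.kubernetes.io/pci-${vendor}.present=true"
    # -F matches the label as a fixed string, -q suppresses grep's output.
    if printf '%s\n' "$input" | grep -qF "$label"; then
      echo "found:   $label"
    else
      echo "missing: $label"
      status=1
    fi
  done
  # Non-zero exit status if any expected label was not found.
  return "$status"
}

# Demonstration using the example output from this section instead of a live cluster.
check_nfd_labels <<'EOF'
Roles: control-plane,master,worker
  feature.node.kubernetes.io/pci-10de.present=true
  feature.node.kubernetes.io/pci-10de.sriov.capable=true
  feature.node.kubernetes.io/pci-15b3.present=true
  feature.node.kubernetes.io/pci-15b3.sriov.capable=true
EOF
```

Against a live cluster, the same function would be used as `oc describe node | check_nfd_labels`. On a cluster without NVIDIA networking devices, the `15b3` entry would need to be removed from the loop.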