NVIDIA AI Enterprise 2.0 or later
Next, we will install the NVIDIA Network Operator. This step applies only if your worker nodes have NVIDIA Networking. The Network Operator installs the host networking components required to enable RDMA and GPUDirect in a Kubernetes cluster. It does so by configuring a high-speed data path for I/O-intensive workloads on a secondary network in each cluster node.
Select Operators > OperatorHub, and search for the NVIDIA Network Operator.
Select the NVIDIA Network Operator, then click Install on the first screen and again on the next one.
Note: For additional information, see the Red Hat OpenShift Container Platform Documentation.
The NVIDIA Network Operator can also be installed using the CLI. The following steps are provided for informational purposes.
Create a namespace for the Network Operator.
Create the following Namespace custom resource (CR) that defines the network-operator namespace, and then save the YAML in the network-operator-namespace.yaml file:
apiVersion: v1
kind: Namespace
metadata:
  name: network-operator
Create the namespace by running the following command:
$ oc create -f network-operator-namespace.yaml
Install the Network Operator in the namespace you created in the previous step by creating the following objects.
Run the following command to get the channel value required for the next step:
$ oc get packagemanifest network-operator -n openshift-marketplace -o jsonpath='{.status.defaultChannel}'
Example Output:
stable
Create the following Subscription CR, and save the YAML in the network-operator-sub.yaml file:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: network-operator
  namespace: network-operator
spec:
  channel: "stable"
  installPlanApproval: Manual
  name: network-operator
  sourceNamespace: openshift-marketplace
Create the subscription object by running the following command:
$ oc create -f network-operator-sub.yaml
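The channel lookup and the Subscription file above can be tied together so the channel value is not copied by hand. The following is a minimal sketch, not part of the official procedure: render_subscription is a hypothetical helper name, and it emits only the fields shown in the Subscription CR above. The commands that need a live cluster are left as comments.

```shell
# Sketch only: render_subscription is a hypothetical helper, not an oc feature.
# It prints a Subscription CR with the same fields as the example above,
# substituting the channel passed as the first argument.
render_subscription() {
  local channel="$1"
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: network-operator
  namespace: network-operator
spec:
  channel: "${channel}"
  installPlanApproval: Manual
  name: network-operator
  sourceNamespace: openshift-marketplace
EOF
}

# Usage against a live cluster (not executed here):
#   channel=$(oc get packagemanifest network-operator \
#     -n openshift-marketplace -o jsonpath='{.status.defaultChannel}')
#   render_subscription "$channel" > network-operator-sub.yaml
#   oc create -f network-operator-sub.yaml
```

Generating the file from the queried channel keeps the Subscription in step with the catalog if the default channel is something other than stable.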
Change to the network-operator project:
$ oc project network-operator
To verify that the operator deployment is successful, run:
$ oc get pods
Example Output:
NAME                                                        READY   STATUS    RESTARTS   AGE
nvidia-network-operator-controller-manager-8f8ccf45c-zgfsq   2/2     Running   0          1
A successful deployment shows a Running status.
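The status check above can also be scripted. The following is a minimal sketch, assuming the default column layout of `oc get pods` (STATUS in the third column). all_pods_running is a hypothetical helper name: it reads the command output on standard input and succeeds only if at least one pod is listed and every listed pod is Running.

```shell
# Sketch only: all_pods_running is a hypothetical helper.
# Reads `oc get pods` output on stdin, skips the header row, and exits 0
# only when at least one pod is listed and every STATUS value is "Running".
all_pods_running() {
  awk 'NR > 1 { seen = 1; if ($3 != "Running") bad = 1 }
       END { exit (seen && !bad) ? 0 : 1 }'
}

# Usage against a live cluster (not executed here):
#   oc get pods -n network-operator | all_pods_running && echo "operator is running"
```

Note that this treats any other status, including Completed, as not running, which may be stricter than needed if the namespace also holds job pods.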