Installing OpenShift Operators#

To run media applications on the cluster, the following operators are required:

  • Node Feature Discovery Operator

  • NUMA Resources Operator

  • NVIDIA Network Operator

  • NVIDIA GPU Operator

  • SR-IOV Network Operator

  • Kubernetes NMState Operator

  • PTP Operator

Installing Node Feature Discovery Operator#

The Node Feature Discovery (NFD) Operator manages the detection of hardware features and labels each node with hardware-specific attributes, such as PCI devices, kernel version, operating system version, and so on.
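
Once NFD is running (installation steps below), you can optionally confirm the labelling from the jump node. This is an illustrative check, not a required step; the rdma.capable label is the one used later in this guide:

    # List the nodes that NFD has labelled as RDMA capable
    oc get nodes -l feature.node.kubernetes.io/rdma.capable=true

    # Show all NFD feature labels on a specific node
    oc get node <worker-node> -o json | jq '.metadata.labels | with_entries(select(.key | startswith("feature.node.kubernetes.io")))'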

To install the operator using the Web console:

  1. Expand the Operators section and select Operator Hub.

  2. Use the search bar to search for Node Feature Discovery.

  3. Select the operator that is provided and maintained by Red Hat.

    picture of the Red Hat Operators hub with node feature discovery selected
  4. In the pop-up window, click Install.

    picture of the discovery node popup window
  5. Check A specific namespace on the cluster and click Install.

    picture of the specific-namespace check box
  6. Navigate to Operators > Installed Operators and wait for 2–3 minutes for operator installation to complete.

    picture of the installed operators and their status
  7. Click Node Feature Discovery Operator. Under the Provided APIs section, click Create instance for NodeFeatureDiscovery. In the subsequent screen, click the Create button.

    picture of the NodeFeatureDiscovery create instance screen
  8. Wait for 2–3 minutes for the Node Feature Discovery pods to start. On the jump node, use the oc CLI tool to monitor the status of the pods:

    oc get pods -o wide -n openshift-nfd
    
    NAME                                      READY   STATUS    RESTARTS   AGE   IP             NODE
    nfd-controller-manager-668b4cb675-pbcvz   1/1     Running   0          14h   10.131.0.32    h4m-d
    nfd-gc-6b46f5f846-sk4rd                   1/1     Running   0          14h   10.131.0.33    h4m-d
    nfd-master-b4c548d99-4vfxf                1/1     Running   0          14h   10.129.0.157   h4m-a
    nfd-worker-pbskf                          1/1     Running   0          14h   10.47.33.0     h4m-e
    nfd-worker-pxzqg                          1/1     Running   0          14h   10.47.32.255   h4m-d
    

Installing NUMA Resources Operator#

The NUMA Resources Operator allows you to schedule high-performance workloads within the same NUMA zone. It deploys an agent that exports the NUMA resources available on each cluster node, and a secondary scheduler that places the workloads. For more information, see Scheduling NUMA-aware workloads.
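
Once the operator and the custom resources described below are deployed, the per-node NUMA data exported by the agent can be inspected through the NodeResourceTopology API; an optional check might look like this:

    # List the per-node NUMA topology objects created by the resource-exporting agent
    oc get noderesourcetopologies.topology.node.k8s.io

    # Inspect the NUMA zones and resources reported for one worker node
    oc describe noderesourcetopologies.topology.node.k8s.io <worker-node>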

Note

For servers with fewer network adapters or GPUs, or an unbalanced topology, disable NUMA-aware scheduling (see Create Performance Profile) and skip Installing NUMA Resources Operator.

To install the operator using the Web console:

  1. Click Administration > Namespaces > Create Namespace.

  2. Enter openshift-numaresources in the Name field, and then click Create.

    picture of the create namespace pop up window
  3. Expand the Operators section and select Operator Hub. Use the search bar to search for numa. Select numaresources-operator.

    picture of the numa operator choice
  4. In the opened pop-up window, click Install.

    picture of the numasources-operator install popup
  5. Select Installed Namespace as openshift-numaresources and click Install.

    picture of the setting the numa namespace

Creating a Custom Resource#

  1. Create a file, nrop.yaml, based on the following template:

    apiVersion: nodetopology.openshift.io/v1
    kind: NUMAResourcesOperator
    metadata:
        name: numaresourcesoperator
    spec:
        nodeGroups:
        - machineConfigPoolSelector:
            matchLabels:
                <machine-config-pool>
        podExcludes:
        - name: installer-*
          namespace: openshift-etcd
        - name: revision-pruner-*
          namespace: openshift-etcd
        - name: revision-pruner-*
          namespace: openshift-kube-scheduler
        - name: installer-*
          namespace: openshift-kube-controller-manager
        - name: revision-pruner-*
          namespace: openshift-kube-controller-manager
        - name: installer-*
          namespace: openshift-kube-apiserver
        - name: revision-pruner-*
          namespace: openshift-kube-apiserver
    
  2. Modify the <machine-config-pool> value according to the cluster type.

    For a standard (5-node) cluster, replace <machine-config-pool> with the following line:

    machineconfiguration.openshift.io/role: "holoscanmedia"
    

    For SNO or a compact (3-node) cluster, replace <machine-config-pool> with the following line:

    pools.operator.machineconfiguration.openshift.io/master: ""
    
  3. Apply the custom resource:

    oc create -f nrop.yaml
    

Deploying the NUMA-aware Secondary Pod Scheduler#

  1. Create a file, nro-scheduler.yaml, with the following content:

    apiVersion: nodetopology.openshift.io/v1
    kind: NUMAResourcesScheduler
    metadata:
        name: numaresourcesscheduler
    spec:
        imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-rhel9:<version-tag>"
        cacheResyncPeriod: "5s"
        schedulerInformer: Shared
    

    where <version-tag> matches the OpenShift minor version, for example, v4.18.
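
    If you are unsure of the cluster version, it can be read from the ClusterVersion object (an optional helper):

    # Print the running OpenShift version, for example 4.18.x; use the matching v4.18 tag
    oc get clusterversion version -o jsonpath='{.status.desired.version}{"\n"}'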

  2. Apply the custom resource:

    oc create -f nro-scheduler.yaml
    
  3. After a few minutes, run the following command to confirm successful deployment of the required resources:

    oc get all -n openshift-numaresources
    
    NAME                                                    READY   STATUS    RESTARTS   AGE
    pod/numaresources-controller-manager-7b58bd4c8c-7vz9b   1/1     Running   0          26m
    pod/numaresourcesoperator-holoscanmedia-vggmd           2/2     Running   0          3m5s
    pod/numaresourcesoperator-holoscanmedia-zkzwn           2/2     Running   0          3m5s
    pod/secondary-scheduler-65bcb645b8-qtk8z                1/1     Running   0          25s
    
    NAME                                                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
    service/numaresources-controller-manager-metrics-service   ClusterIP   172.30.187.155   <none>        8080/TCP   26m
    service/numaresources-rte-metrics-service                  ClusterIP   172.30.223.80    <none>        2112/TCP   3m7s
    
    NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                            AGE
    daemonset.apps/numaresourcesoperator-holoscanmedia   2         2         2       2            2           node-role.kubernetes.io/holoscanmedia=   3m7s
    
    NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/numaresources-controller-manager   1/1     1            1           26m
    deployment.apps/secondary-scheduler                1/1     1            1           25s
    
    NAME                                                          DESIRED   CURRENT   READY   AGE
    replicaset.apps/numaresources-controller-manager-7b58bd4c8c   1         1         1       26m
    replicaset.apps/secondary-scheduler-65bcb645b8                1         1         1       25s
    

Note

Pods that request Guaranteed QoS (isolated CPUs and memory on the same NUMA node) and/or huge pages, GPUs, or SR-IOV networking must specify schedulerName: topo-aware-scheduler. Once the Topology Manager single-numa-node policy is configured, relying on the default-scheduler for these pods can cause runaway pod creation errors (ContainerStatusUnknown).
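
A minimal sketch of such a pod spec is shown below. The pod name, image, and resource sizes are illustrative placeholders; the essential parts are schedulerName: topo-aware-scheduler and requests that equal limits (Guaranteed QoS):

    apiVersion: v1
    kind: Pod
    metadata:
        name: numa-aware-example                # placeholder name
    spec:
        schedulerName: topo-aware-scheduler     # use the secondary scheduler, not default-scheduler
        containers:
        - name: app
          image: registry.example.com/media-app:latest   # placeholder image
          command: [sleep, infinity]
          resources:
            requests:
                cpu: "8"
                memory: 16Gi
            limits:                             # requests equal to limits gives Guaranteed QoS
                cpu: "8"
                memory: 16Gi

Huge page, GPU, and SR-IOV pool resources, when used, are requested in the same way under resources.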

Installing NVIDIA Network Operator#

The NVIDIA Network Operator is required on the cluster so that the RDMA GPUDirect module can be compiled and installed as part of the NVIDIA GPU Operator deployment.

To install the operator using the Web console:

  1. Expand the Operators section and select Operator Hub.

  2. Use the search bar to search for NVIDIA. Select NVIDIA Network Operator.

    picture of the operators with the NVIDIA network operator selected
  3. In the opened pop-up window, click Install.

    picture of the NVIDIA Network Operator install popup
  4. Select Update channel v25.4 and version 25.4.0, then click Install. Installation takes about 2 minutes to complete.

    picture of the NVIDIA Network Operator installation options
  5. Navigate to Operators > Installed Operators.

  6. Select NVIDIA Network Operator.

  7. On the NVIDIA Network Operator details screen, click Create instance in the NicClusterPolicy section.

    picture of the NicClusterPolicy create instance section
  8. In the NicClusterPolicy tab, make the following modifications:

    • In the ibKubernetes section, remove the value from the periodicUpdateSeconds field.

    • In the ofedDriver section, expand the env subsection and add the following environment variable:

      • UNLOAD_STORAGE_MODULES: true

    • In the rdmaSharedDevicePlugin section, remove all values in the following subsections: image, repository, version and config.

      picture of the nicclusterpolicy form
    • Switch to the YAML view and remove the nicConfigurationOperator section from the YAML file.

    • Click the Create button.

      picture of the nicclusterpolicy changes

    Note

    Applying the NicClusterPolicy is dependent on the server platform hardware configuration and takes some time.
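
    While the policy is being applied, progress can be followed from the jump node; an optional check (the state field shown here is what the operator reports in its status):

    # Report the overall NicClusterPolicy state (expected to reach "ready")
    oc get nicclusterpolicy -o jsonpath='{.items[0].status.state}{"\n"}'

    # Watch the pods in the operator namespace come up
    oc get pods -n nvidia-network-operator -w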

  9. Wait for 5–10 minutes. Ensure that all pods in the NVIDIA Network Operator namespace have the Ready status:

    oc get pods -o wide -n nvidia-network-operator
    
    NAME                                                        READY   STATUS    RESTARTS   AGE   IP             NODE
    mofed-rhcos4.18-6f4f75ff7c-ds-fh2nh                         2/2     Running   0          16h   10.47.33.0     h4m-d
    mofed-rhcos4.18-6f4f75ff7c-ds-jdtbz                         2/2     Running   0          16h   10.47.32.255   h4m-e
    nvidia-network-operator-controller-manager-b678d987-p297n   1/1     Running   0          16h   10.130.0.42    h4m-a
    

Check Network Adapter Firmware and Configuration#

To produce ST 2110 compliant streams, it is important to ensure each network adapter is optimally configured. To check and update the network adapter firmware, we will use the Driver Toolkit, a container image that includes the kernel packages commonly required by tools in driver containers. For more information, see the Driver Toolkit documentation.

You must iterate through this section for all of your driver-toolkit containers.

  1. Find the Driver Toolkit image for the cluster:

    oc adm release info --image-for=driver-toolkit
    quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4feae2e6bb59f667431e11e832d294fc5900c1ab38a3bb845191ee93524208e4
    
  2. Create a new project namespace:

    oc new-project driver-toolkit
    
  3. Create driver-toolkit.yaml based on the following template:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
        name: driver-toolkit-container
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
        name: driver-toolkit-container
    rules:
    - apiGroups:
      - security.openshift.io
      resources:
      - securitycontextconstraints
      verbs:
      - use
      resourceNames:
      - privileged
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
        name: driver-toolkit-container
    roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: driver-toolkit-container
    subjects:
    - kind: ServiceAccount
      name: driver-toolkit-container
    userNames:
    - system:serviceaccount:driver-toolkit:driver-toolkit-container
    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
        name: driver-toolkit-container
    spec:
        selector:
            matchLabels:
                app: driver-toolkit-container
        template:
            metadata:
                labels:
                    app: driver-toolkit-container
            spec:
                serviceAccount: driver-toolkit-container
                serviceAccountName: driver-toolkit-container
                containers:
                - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<SHA>
                  name: driver-toolkit-container
                  imagePullPolicy: Always
                  command: [sleep, infinity]
                  securityContext:
                    privileged: true
                nodeSelector:
                    node-role.kubernetes.io/<machine-config-pool>: ""
    
  4. Replace <machine-config-pool> with holoscanmedia for a 5-node cluster or master for a 3-node cluster or SNO.

  5. Replace <SHA> in the image field with the digest of the driver-toolkit image obtained in step 1.
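
    Both substitutions can be scripted instead of editing the file by hand; a sketch, assuming the placeholders exactly as written in the template above:

    # Fill in the driver-toolkit image digest and the node role
    DTK_IMAGE=$(oc adm release info --image-for=driver-toolkit)
    sed -i "s|quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<SHA>|${DTK_IMAGE}|" driver-toolkit.yaml
    sed -i "s|<machine-config-pool>|holoscanmedia|" driver-toolkit.yaml   # use master for a compact cluster or SNO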

  6. Deploy the daemon set:

    oc create -f driver-toolkit.yaml
    

    Note

    This step reports a warning similar to: Warning: would violate PodSecurity "restricted:v1.24"… You can safely ignore it.

  7. List pods created under the new project driver-toolkit:

    oc get pods
    
    NAME                             READY   STATUS    RESTARTS   AGE
    driver-toolkit-container-l6d5c   1/1     Running   0          19m
    driver-toolkit-container-qlj5d   1/1     Running   0          19m
    
  8. Log into the first container:

    oc exec -it driver-toolkit-container-l6d5c -- /bin/bash
    
  9. Download the firmware tools using the following command. The archive is downloaded into the current directory:

    wget https://www.mellanox.com/downloads/MFT/mft-4.33.0-169-x86_64-rpm.tgz
    
  10. Extract the archive:

    tar -xvf mft-4.33.0-169-x86_64-rpm.tgz
    
  11. Navigate to the extracted directory and install:

    cd mft-4.33.0-169-x86_64-rpm
    ./install.sh
    ...
    -I- In order to start mst, please run "mst start".
    
  12. Install the PCI Utilities package and start mst as instructed in the previous output:

    yum install pciutils -y
    ...
    Complete!
    
    mst start
    
    Starting MST (Mellanox Software Tools) driver set
    Loading MST PCI module - Success
    Loading MST PCI configuration module - Success
    Create devices
    -W- Missing "lsusb" command, skipping MTUSB devices detection
    Unloading MST PCI module (unused) - Success
    
  13. Check the network adapter firmware using the following command:

    mlxfwmanager -u --online
    
  14. Follow the prompts to update the firmware as required. Firmware updates take several minutes, and a restart is needed for updates to take effect. This can be done once, after all configuration changes have been made.

  15. Check the PCI addresses of the MST devices, that is, the network adapters:

    mst status
    
    MST modules:
    ------------
        MST PCI module is not loaded
        MST PCI configuration module loaded
    MST devices:
    ------------
    
    /dev/mst/mt4129_pciconf0      - PCI configuration cycles access.
                                    domain:bus:dev.fn=0000:37:00.0 addr.reg=88
                                    Chip revision is: 00
    /dev/mst/mt4129_pciconf1      - PCI configuration cycles access.
                                    domain:bus:dev.fn=0000:8b:00.0 addr.reg=88
                                    Chip revision is: 00
    /dev/mst/mt41692_pciconf0     - PCI configuration cycles access.
                                    domain:bus:dev.fn=0000:a0:00.0 addr.reg=88
                                    Chip revision is: 01
    
  16. For each device listed above, ensure that REAL_TIME_CLOCK_ENABLE is True. Query the setting using the following command and change it as needed:

    mlxconfig -d /dev/mst/mt4129_pciconf0 query REAL_TIME_CLOCK_ENABLE
    
    Device #1:
    ----------
    
    Device type: ConnectX7
    Name: MCX713106AC-VEA_Ax
    Description: NVIDIA ConnectX-7 HHHL Adapter Card; 200GbE; Dual-port
    QSFP112; PCIe 5.0 x16; Crypto Enabled; Secure Boot Enabled
    Device: /dev/mst/mt4129_pciconf0
    
    Configurations: Next Boot
        REAL_TIME_CLOCK_ENABLE False(0)
    
  17. Because REAL_TIME_CLOCK_ENABLE is set to False (0), let’s enable it:

    mlxconfig -d /dev/mst/mt4129_pciconf0 set REAL_TIME_CLOCK_ENABLE=1
    
    Device #1:
    ----------
    
    Device type: ConnectX7
    Name: MCX713106AC-VEA_Ax
    Description: NVIDIA ConnectX-7 HHHL Adapter Card; 200GbE; Dual-port
    QSFP112; PCIe 5.0 x16; Crypto Enabled; Secure Boot Enabled
    Device: /dev/mst/mt4129_pciconf0
    
    Configurations: Next Boot New
        REAL_TIME_CLOCK_ENABLE False(0) True(1)
    
    Apply new Configuration? (y/n) [n] : y
    Applying... Done!
    -I- Please reboot machine to load new configurations.
    
  18. Any change in configuration requires the machine to be rebooted. This can be done once, after all configuration changes have been made.

  19. If you are using AMD-based servers, ensure that PCI_WR_ORDERING is set to force_relax for each device listed in the output of the mst status command in step 15. Query the value of PCI_WR_ORDERING using the following command:

    mlxconfig -d /dev/mst/mt4129_pciconf0 query PCI_WR_ORDERING
    
    Device #1:
    ----------
    
    Device type: ConnectX7
    Name: MCX713106AC-VEA_Ax
    Description: NVIDIA ConnectX-7 HHHL Adapter Card; 200GbE; Dual-port
    QSFP112; PCIe 5.0 x16; Crypto Enabled; Secure Boot Enabled
    Device: /dev/mst/mt4129_pciconf0
    
    Configurations: Next Boot
        PCI_WR_ORDERING per_mkey(0)
    
  20. Because PCI_WR_ORDERING is set to per_mkey (0), set it to force_relax (1):

    mlxconfig -d /dev/mst/mt4129_pciconf0 set PCI_WR_ORDERING=1
    
    Device #1:
    ----------
    
    Device type: ConnectX7
    Name: MCX713106AC-VEA_Ax
    Description: NVIDIA ConnectX-7 HHHL Adapter Card; 200GbE; Dual-port
    QSFP112; PCIe 5.0 x16; Crypto Enabled; Secure Boot Enabled
    Device: /dev/mst/mt4129_pciconf0
    
    Configurations: Next Boot New
        PCI_WR_ORDERING per_mkey(0) force_relax(1)
    
    Apply new Configuration? (y/n) [n] : y
    Applying... Done!
    -I- Please reboot machine to load new configurations.
    
  21. If you are using a purchased Rivermax license, rather than a time-limited development license, you must enable NIC serial number validation using VF_VPD_ENABLE. Query its current value:

    mlxconfig -d /dev/mst/mt4129_pciconf0 query VF_VPD_ENABLE
    
    Device #1:
    ----------
    
    Device type: ConnectX6DX
    Name: MCX623106AN-CDA_Ax
    Description: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56;
    PCIe 4.0/3.0 x16;
    Device: /dev/mst/mt4129_pciconf0
    Configurations: Next Boot
        VF_VPD_ENABLE False(0)
    
  22. Because VF_VPD_ENABLE is set to False (0), let’s enable it:

    mlxconfig -d /dev/mst/mt4129_pciconf0 set VF_VPD_ENABLE=1
    
    Device #1:
    ----------
    
    Device type: ConnectX7
    Name: MCX713106AC-VEA_Ax
    Description: NVIDIA ConnectX-7 HHHL Adapter Card; 200GbE; Dual-port
    QSFP112; PCIe 5.0 x16; Crypto Enabled; Secure Boot Enabled
    Device: /dev/mst/mt4129_pciconf0
    
    Configurations: Next Boot New
        VF_VPD_ENABLE False(0) True(1)
    
    Apply new Configuration? (y/n) [n] : y
    Applying... Done!
    -I- Please reboot machine to load new configurations.
    
  23. Notice that any change in configuration requires the machine to be rebooted. This can be done once, after all configuration changes have been made.
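
    Before exiting the container, it can be convenient to re-check all three settings on every device in one pass; a sketch:

    # Query the relevant settings for every MST device before scheduling the reboot
    for dev in /dev/mst/*_pciconf*; do
        echo "== ${dev} =="
        mlxconfig -d "${dev}" query REAL_TIME_CLOCK_ENABLE PCI_WR_ORDERING VF_VPD_ENABLE
    done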

  24. Exit from the Driver Toolkit container:

    exit
    
    Exit...
    
  25. Repeat the above steps for all the other driver-toolkit containers.

  26. Delete the driver-toolkit daemon set:

    oc delete -f driver-toolkit.yaml
    
  27. Delete the driver-toolkit project:

    oc delete project driver-toolkit
    
  28. Restore the OpenShift CLI default project:

    oc project default
    
    Now using project "default" on server "https://api.h4m.example.com:6443".
    
  29. Reboot each of the worker machines using the instructions listed in Reboot Node Gracefully.

Installing NVIDIA GPU Operator#

NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), Kubernetes device plugin for GPUs, the NVIDIA Container Toolkit, automatic node labelling using GPU Feature Discovery (GFD), Data Center GPU Manager (DCGM) based monitoring, and others.
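
Once the ClusterPolicy below has been applied and the driver pods are running, the stack can be spot-checked from one of the driver pods; an optional example (the pod name is a placeholder taken from the listing shown later in this section):

    # Find one of the driver pods (names include the RHCOS version)
    oc get pods -n nvidia-gpu-operator -o name | grep nvidia-driver-daemonset

    # Confirm the driver and GPUs are visible inside it
    oc exec -n nvidia-gpu-operator <nvidia-driver-daemonset-pod> -- nvidia-smi

    # With rdma enabled, confirm the GPUDirect RDMA (nvidia-peermem) module is loaded
    oc exec -n nvidia-gpu-operator <nvidia-driver-daemonset-pod> -- lsmod | grep -i peermem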

To install the operator using the Web console:

  1. Expand the Operators section and select Operator Hub.

  2. Use the search bar to search for NVIDIA.

  3. Select NVIDIA GPU Operator and click Install in the next window.

    picture of the Nvidia GPU operator tile in Red Hat
  4. In the opened pop-up window, click Install.

    picture of the Nvidia GPU operator install button in Red Hat
  5. Select Update channel v25.3 and version 25.3.2, then click Install.

    picture of the Nvidia GPU operator namespace in Red Hat
  6. When the installation is complete, expand the Operators section in the left menu, click Installed Operators, and select NVIDIA GPU Operator.

  7. On the NVIDIA GPU Operator details screen, click Create instance in the ClusterPolicy section.

    picture of the Nvidia GPU operator cluster policy in Red Hat
  8. In the ClusterPolicy tab, open the NVIDIA GPU/vGPU Driver config section, tick the enabled checkbox in the rdma subsection and click the Create button.

    picture of the Nvidia GPU operator cluster policy settings in Red Hat
  9. Wait for 10–15 minutes. Ensure that all pods in the NVIDIA GPU Operator namespace have the Ready or Completed status:

    oc get pods -n nvidia-gpu-operator -o wide
    
    NAME                                                  READY   STATUS      RESTARTS      AGE   IP            NODE
    gpu-feature-discovery-c4qd5                           1/1     Running     0             86m   10.131.0.71   h4m-d
    gpu-feature-discovery-wj2w4                           1/1     Running     0             86m   10.128.2.70   h4m-e
    gpu-operator-d589cbf48-h7r6q                          1/1     Running     0             88m   10.128.2.57   h4m-d
    nvidia-container-toolkit-daemonset-8jwns              1/1     Running     0             86m   10.128.2.66   h4m-d
    nvidia-container-toolkit-daemonset-lhtfq              1/1     Running     0             86m   10.131.0.73   h4m-e
    nvidia-cuda-validator-dmc5b                           0/1     Completed   0             84m   10.131.0.77   h4m-e
    nvidia-cuda-validator-qtxb6                           0/1     Completed   0             84m   10.128.2.72   h4m-d
    nvidia-dcgm-exporter-s7xxg                            1/1     Running     1             86m   10.128.2.71   h4m-d
    nvidia-dcgm-exporter-vbpdn                            1/1     Running     0             86m   10.131.0.76   h4m-e
    nvidia-dcgm-gx25h                                     1/1     Running     0             86m   10.131.0.75   h4m-e
    nvidia-dcgm-h8qzg                                     1/1     Running     0             86m   10.128.2.69   h4m-d
    nvidia-device-plugin-daemonset-484tb                  1/1     Running     0             86m   10.131.0.72   h4m-e
    nvidia-device-plugin-daemonset-txpb5                  1/1     Running     0             86m   10.128.2.67   h4m-d
    nvidia-driver-daemonset-418.94.202507091512-0-6wlrx   3/3     Running     0             87m   10.131.0.63   h4m-e
    nvidia-driver-daemonset-418.94.202507091512-0-gnnxl   3/3     Running     0             87m   10.128.2.58   h4m-d
    nvidia-node-status-exporter-9xh24                     1/1     Running     0             87m   10.128.2.65   h4m-d
    nvidia-node-status-exporter-rpn8x                     1/1     Running     0             87m   10.131.0.70   h4m-e
    nvidia-operator-validator-2hh9s                       1/1     Running     0             86m   10.131.0.74   h4m-e
    nvidia-operator-validator-v2ptl                       1/1     Running     0             86m   10.128.2.68   h4m-d
    

Installing SR-IOV Network Operator#

SR-IOV Network Operator is responsible for configuring the SR-IOV networking components in an OpenShift cluster.

To install the operator using the web console:

  1. Expand the Operators section and select Operator Hub.

  2. Use the search bar to search for SR-IOV.

  3. Select SR-IOV Network Operator and click Install in the next window.

    picture of the SR-IOV operator in Red Hat
  4. In the opened pop-up window, click Install.

    picture of the SR-IOV operator install popup in Red Hat
  5. Check A specific namespace on the cluster and click Install.

    picture of the SR-IOV operator namespace in Red Hat
  6. When the installation is complete, click Create SriovOperatorConfig.

    picture of the SR-IOV operator create config in Red Hat
  7. On the Create SriovOperatorConfig screen, click Create.

    picture of the SR-IOV operator create config in Red Hat

    Note

    For SNO, select the disableDrain checkbox. This prevents the operator from draining the node when a node policy is applied.

  8. Wait for 3–5 minutes for the operator installation to complete. Verify that installation completed successfully using the following command:

    oc get pods -n openshift-sriov-network-operator
    
    NAME                                     READY   STATUS    RESTARTS   AGE
    network-resources-injector-c728j         1/1     Running   0          69s
    network-resources-injector-jjv9s         1/1     Running   0          69s
    network-resources-injector-n2zpv         1/1     Running   0          69s
    operator-webhook-4f9p7                   1/1     Running   0          69s
    operator-webhook-tg6dl                   1/1     Running   0          69s
    operator-webhook-zfpwr                   1/1     Running   0          69s
    sriov-network-config-daemon-f6sd7        1/1     Running   0          69s
    sriov-network-config-daemon-wgbrs        1/1     Running   0          69s
    sriov-network-operator-694c67878-fct5t   1/1     Running   0          2m2s
    
  9. For a complete network configuration, the following components must be created:

    • SR-IOV Network Node Policies

    • SR-IOV Networks

    Refer to the following sections.

Configure SR-IOV Network Node Policy#

  1. Use the following command to identify the SR-IOV capable interfaces that are connected. Replace <worker-node> with a node name.

    In the reference environment, four ports are connected on each node.

    oc -n openshift-sriov-network-operator \
      get sriovnetworknodestates.sriovnetwork.openshift.io <worker-node> -o json | \
      jq '.status.interfaces[] |
          select(.linkSpeed | test("^[1-9][0-9]{4,} Mb/s$")) | .name'
    "ens3f0np0"
    "ens3f1np1"
    "ens6f0np0"
    "ens6f1np1"
    
  2. Create an SriovNetworkNodePolicy CR for each of the network interfaces.

  3. Create sriov_policy.yaml based on the following template:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
        name: <policy-name>
        namespace: openshift-sriov-network-operator
    spec:
        nodeSelector:
            feature.node.kubernetes.io/rdma.capable: "true"
        resourceName: <resource_name>
        priority: 99
        mtu: 1500
        numVfs: 32
        nicSelector:
            pfNames: ["<interface_name>#0-31"]
        deviceType: netdevice
        isRdma: true
    ---
    
  4. Repeat the above snippet for each interface to be configured. Replace <interface_name> with the interface name from the previous command, and replace <policy-name> and <resource_name> with names that indicate the intended purpose of the Virtual Functions (VFs) that the resource will expose.

    In the reference environment, the snippet is repeated four times, to reflect the solution logical design:

    <interface_name>   <policy-name>     <resource_name>   Purpose
    ens3f0np0          media-a-tx-pool   media_a_tx_pool   Transmit-centric (red)
    ens3f1np1          media-b-tx-pool   media_b_tx_pool   Transmit-centric (blue)
    ens6f0np0          media-a-rx-pool   media_a_rx_pool   Receive-centric (red)
    ens6f1np1          media-b-rx-pool   media_b_rx_pool   Receive-centric (blue)

  5. Apply the network node policy:

    oc create -f sriov_policy.yaml
    
    sriovnetworknodepolicy.sriovnetwork.openshift.io/media-a-tx-pool created
    sriovnetworknodepolicy.sriovnetwork.openshift.io/media-b-tx-pool created
    sriovnetworknodepolicy.sriovnetwork.openshift.io/media-a-rx-pool created
    sriovnetworknodepolicy.sriovnetwork.openshift.io/media-b-rx-pool created
    

    Note

    This step can take a while, depending on the number of worker nodes and the number of VFs for each network interface.

  6. Wait for 2–3 minutes for all pods to have the Ready status. Check pod state using the following command:

    oc get pods -n openshift-sriov-network-operator
    
    NAME                                     READY   STATUS    RESTARTS   AGE
    network-resources-injector-c728j         1/1     Running   0          69s
    network-resources-injector-jjv9s         1/1     Running   0          69s
    network-resources-injector-n2zpv         1/1     Running   0          69s
    operator-webhook-4f9p7                   1/1     Running   0          69s
    operator-webhook-tg6dl                   1/1     Running   0          69s
    operator-webhook-zfpwr                   1/1     Running   0          69s
    sriov-network-config-daemon-f6sd7        1/1     Running   0          69s
    sriov-network-config-daemon-wgbrs        1/1     Running   0          69s
    sriov-network-operator-694c67878-fct5t   1/1     Running   0          2m2s
    
  7. Wait for 15–20 minutes for the virtual functions to be created. Ensure that all pools show non-zero values on every worker node before proceeding, for example:

    oc get node h4m-d -o json | jq '.status.allocatable |
    with_entries(select(.key|test("^openshift.io/.+pool$")))'
    {
        "openshift.io/media_a_rx_pool": "16",
        "openshift.io/media_a_tx_pool": "16",
        "openshift.io/media_b_rx_pool": "16",
        "openshift.io/media_b_tx_pool": "16"
    }
    
  8. Execute the above command for each worker node and confirm that the virtual functions have been created.
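
    A loop over all worker nodes in the holoscanmedia machine config pool can make this quicker; a sketch (adjust the role label for a compact cluster or SNO):

    # Print the SR-IOV pool resources advertised by every holoscanmedia node
    for node in $(oc get nodes -l node-role.kubernetes.io/holoscanmedia= -o name); do
        echo "== ${node} =="
        oc get "${node}" -o json | jq '.status.allocatable |
            with_entries(select(.key|test("^openshift.io/.+pool$")))'
    done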

Configure SR-IOV Network#

Create SriovNetwork resources for each of the network interfaces, referencing the resourceName defined in the SriovNetworkNodePolicy resources. In the example below, two networks are created for each interface: the first uses the Whereabouts plugin for dynamic IP Address Management (IPAM), and the second uses static IPAM to allow manual, fixed assignment of IP addresses.
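
Once created, workloads attach to these networks with the standard Multus annotation. A minimal sketch follows; the pod name and image are illustrative placeholders, and the pod is placed in the default namespace because the networks below use networkNamespace: default:

    apiVersion: v1
    kind: Pod
    metadata:
        name: sriov-attach-example                       # placeholder name
        namespace: default
        annotations:
            k8s.v1.cni.cncf.io/networks: media-a-tx-net  # one VF from the media_a_tx_pool resource
    spec:
        containers:
        - name: app
          image: registry.example.com/media-app:latest   # placeholder image
          command: [sleep, infinity]

Because the network-resources-injector is running (see the pod listing in the previous section), the matching openshift.io/media_a_tx_pool resource request should be added to the pod automatically.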

  1. Create sriov_network.yaml based on the following example. Update the vlan and other configuration based on your environment. For example, remove the vlan entry if not applicable.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
        name: media-a-tx-net
        namespace: openshift-sriov-network-operator
    spec:
        ipam: |
            {
                "type": "whereabouts",
                "range": "192.168.20.0/24",
                "exclude": [ "192.168.20.0/26", "192.168.20.128/25" ]
            }
        networkNamespace: default
        resourceName: media_a_tx_pool
        vlan: 200
    ---
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
        name: media-a-tx-net-static
        namespace: openshift-sriov-network-operator
    spec:
        ipam: |
            {
                "type": "static"
            }
        networkNamespace: default
        resourceName: media_a_tx_pool
        vlan: 200
    ---
    

    In the reference environment, the snippet is repeated four times, with appropriate range and exclude values to reflect the solution logical design:

    <resource_name>   <network-names>           Static IPAM Range     Dynamic IPAM Range
    media_a_tx_pool   media-a-tx-net(-static)   192.168.20.0-63       192.168.20.64-127
    media_b_tx_pool   media-b-tx-net(-static)   192.168.120.0-63      192.168.120.64-127
    media_a_rx_pool   media-a-rx-net(-static)   192.168.20.128-191    192.168.20.192-255
    media_b_rx_pool   media-b-rx-net(-static)   192.168.120.128-191   192.168.120.192-255

  2. Create the networks:

    oc create -f sriov_network.yaml
    
    sriovnetwork.sriovnetwork.openshift.io/media-a-rx-net created
    sriovnetwork.sriovnetwork.openshift.io/media-a-rx-net-static created
    sriovnetwork.sriovnetwork.openshift.io/media-a-tx-net created
    sriovnetwork.sriovnetwork.openshift.io/media-a-tx-net-static created
    sriovnetwork.sriovnetwork.openshift.io/media-b-rx-net created
    sriovnetwork.sriovnetwork.openshift.io/media-b-rx-net-static created
    sriovnetwork.sriovnetwork.openshift.io/media-b-tx-net created
    sriovnetwork.sriovnetwork.openshift.io/media-b-tx-net-static created
    
  3. Execute the following command to validate success:

    oc get network-attachment-definitions
    
    NAME                    AGE
    media-a-rx-net          31s
    media-a-rx-net-static   31s
    media-a-tx-net          31s
    media-a-tx-net-static   31s
    media-b-rx-net          31s
    media-b-rx-net-static   31s
    media-b-tx-net          31s
    media-b-tx-net-static   31s
    

Installing Kubernetes NMState Operator#

The Kubernetes NMState Operator is used for advanced configuration of network interfaces, for example, creating a VLAN interface on top of a specific network interface.

To install the operator using the Web console:

  1. Expand the Operators section and select Operator Hub. Use the search bar to search for NMState. Select Kubernetes NMState Operator.

    picture of the Kubernetes NMState Operator tile in Red Hat
  2. In the opened pop-up window, click Install.

    picture of the Kubernetes NMState Operator install button in Red Hat
  3. Check A specific namespace on the cluster and click Install.

    picture of the Kubernetes NMState Operator namespace in Red Hat
  4. After the installation completes, navigate to the Operators section, click Installed Operators, and select Kubernetes NMState Operator. On the Kubernetes NMState Operator details screen, click Create instance in the Provided APIs section.

    picture of the Kubernetes NMState Operator create instance tile in Red Hat
  5. Click the Create button.

    picture of the Kubernetes NMState Operator create button in Red Hat
  6. To ensure that the NMState instance is deployed properly, run the following commands:

    oc get pods -o wide -n openshift-nmstate
    
    NAME                                     READY   STATUS    RESTARTS   AGE   IP             NODE
    nmstate-console-plugin-c69f95794-ztkkl   1/1     Running   0          74m   10.131.0.81    h4m-e
    nmstate-handler-6vljk                    1/1     Running   0          74m   10.47.32.255   h4m-d
    nmstate-handler-92t7h                    1/1     Running   0          74m   10.47.29.253   h4m-a
    nmstate-handler-cq2tw                    1/1     Running   0          74m   10.47.11.228   h4m-a
    nmstate-handler-gls2d                    1/1     Running   0          74m   10.47.31.241   h4m-b
    nmstate-handler-gqg84                    1/1     Running   0          74m   10.47.33.0     h4m-e
    nmstate-metrics-69d774d9b6-bcjf7         2/2     Running   0          74m   10.130.0.59    h4m-c
    nmstate-operator-85d697d7f4-jxpvc        1/1     Running   0          75m   10.130.0.57    h4m-c
    nmstate-webhook-755bbbc9fc-gdrng         1/1     Running   0          74m   10.129.1.62    h4m-a
    nmstate-webhook-755bbbc9fc-xhm2x         1/1     Running   0          74m   10.130.0.58    h4m-c
    

Assign Static IP Addresses on Worker Nodes#

To ensure that the PTP Operator can sync time from a PTP grandmaster clock, each worker node needs a network adapter with an IP address in the network where PTP is running.

  1. Create nmstate_config.yaml based on the following template. Replace <config>, <worker>, <interface-name>, <ptp-vlan>, and <ptp-ip> in the template, and repeat the snippet for each worker node:

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
        name: <config>
    spec:
        nodeSelector:
            kubernetes.io/hostname: <worker>
        desiredState:
            interfaces:
            - name: <interface-name>.<ptp-vlan>
              description: VLAN using <interface-name>
              type: vlan
              state: up
              vlan:
                base-iface: <interface-name>
                id: <ptp-vlan>
              ipv4:
                dhcp: false
                address:
                - ip: <ptp-ip>
                  prefix-length: 24
                enabled: true
    ---
    
  2. Apply the NetworkManager state:

    oc create -f nmstate_config.yaml
    
  3. Confirm that the policy has been applied successfully on the cluster:

    oc get nncp
    
    NAME                     STATUS      REASON
    nmstate-node-d-network   Available   SuccessfullyConfigured
    nmstate-node-e-network   Available   SuccessfullyConfigured
    

Installing PTP Operator#

To install the operator using the Web console:

  1. Expand the Operators section and select Operator Hub. Use the search bar to search for PTP.

  2. Select PTP Operator provided by Red Hat.

    picture of the PTP Operator tile in Red Hat
  3. In the opened pop-up window, click Install.

    picture of the PTP Operator install button in Red Hat
  4. Check A specific namespace on the cluster and click Install.

    picture of the PTP Operator create button in Red Hat
  5. Confirm that the PTP Operator is deployed properly by running the following command:

    oc get pods -o wide -n openshift-ptp
    
    NAME                           READY   STATUS    RESTARTS   AGE     IP             NODE
    linuxptp-daemon-2cklw          2/2     Running   0          43m     10.47.31.241   h4m-a
    linuxptp-daemon-4dnjb          2/2     Running   0          43m     10.47.33.0     h4m-b
    linuxptp-daemon-g2x9t          2/2     Running   0          43m     10.47.29.253   h4m-c
    linuxptp-daemon-sd7j7          2/2     Running   0          3m22s   10.47.32.255   h4m-d
    linuxptp-daemon-v5hsv          2/2     Running   0          43m     10.47.11.228   h4m-e
    ptp-operator-79c558dcf-5hdzb   1/1     Running   0          44m     10.129.1.66    h4m-d
    

Configure PTP#

  1. Create ptp-config.yaml based on the following template:

    apiVersion: ptp.openshift.io/v1
    kind: PtpConfig
    metadata:
        name: holoscanmedia-ptp-config
        namespace: openshift-ptp
    spec:
        profile:
        - name: ordinary-clock
          interface: "<interface_name>.<ptp-vlan>"
          phc2sysOpts: "-w -m -n <ptp-domain-number> -s <interface_name>.<ptp-vlan>"
          ptp4lOpts: "-2 -s"
          ptpSchedulingPolicy: SCHED_FIFO
          ptpSchedulingPriority: 10
          ptp4lConf: |
            [global]
            #
            # Default Data Set
            #
            priority1 128
            priority2 127
            domainNumber <ptp-domain-number>
            use_syslog 1
            logging_level 6
            tx_timestamp_timeout 30
            hybrid_e2e 1
            dscp_event 46
            dscp_general 46
    
            [<interface_name>.<ptp-vlan>]
            logAnnounceInterval -2
            announceReceiptTimeout 3
            logSyncInterval -3
            logMinDelayReqInterval -3
            delay_mechanism E2E
            network_transport UDPv4
        recommend:
        - profile: ordinary-clock
          priority: 4
          match:
          - nodeLabel: "node-role.kubernetes.io/<role>"
            nodeName: "<worker-01>"
          - nodeLabel: "node-role.kubernetes.io/<role>"
            nodeName: "<worker-02>"
    
    • Replace <interface_name> in the above snippet with the interface name determined previously.

    • Replace <ptp-vlan> with the PTP VLAN of the current environment, for example, 300 in this case.

    • Replace <ptp-domain-number> with the correct PTP domain, for example, 127 as indicated in section 5.1. Check that the other PTP parameters match the current environment.

    • Replace <role> with worker for a 5-node cluster and master for a 3-node cluster or SNO.

    • Replace <worker-01>, <worker-02> with node names. Add more entries as needed.

  2. Apply the PTP configuration:

    oc create -f ptp-config.yaml
    
  3. Wait for 1–2 minutes for the worker nodes to pick up the new configuration. List the pods running in the openshift-ptp namespace and confirm that the worker-node daemons have started syncing time from the grandmaster clock.

    oc get pods -o wide -n openshift-ptp
    
    NAME                           READY   STATUS    RESTARTS   AGE     IP             NODE
    linuxptp-daemon-2cklw          2/2     Running   0          43m     10.47.31.241   h4m-a
    linuxptp-daemon-4dnjb          2/2     Running   0          43m     10.47.33.0     h4m-b
    linuxptp-daemon-g2x9t          2/2     Running   0          43m     10.47.29.253   h4m-c
    linuxptp-daemon-sd7j7          2/2     Running   0          3m22s   10.47.32.255   h4m-d
    linuxptp-daemon-v5hsv          2/2     Running   0          43m     10.47.11.228   h4m-e
    ptp-operator-79c558dcf-5hdzb   1/1     Running   0          44m     10.129.1.66    h4m-d
    
  4. Confirm that the worker nodes can sync time from the grandmaster clock:

    oc logs linuxptp-daemon-vbx5m -n openshift-ptp -c linuxptp-daemon-container --tail=1000 -f
    
    I0119 10:31:28.444921 7501 main.go:111] ticker pull
    I0119 10:31:28.638207 7501 daemon.go:466] Recreating phc2sys...
    I0119 10:31:28.638234 7501 daemon.go:359] Starting phc2sys...
    I0119 10:31:28.638238 7501 daemon.go:360] phc2sys cmd: /bin/chrt -f 10
    /usr/sbin/phc2sys -w -m -n 127 -s ens6f0np0.300 -u 1 -z
    /var/run/ptp4l.0.socket -t [ptp4l.0.config]
    ...
    I0119 10:31:29.281613 7501 daemon.go:466] Recreating ptp4l...
    I0119 10:31:29.281640 7501 daemon.go:359] Starting ptp4l...
    I0119 10:31:29.281644 7501 daemon.go:360] ptp4l cmd: /bin/chrt -f 10
    /usr/sbin/ptp4l -f /var/run/ptp4l.0.config -2 -s -m
    ...
    ptp4l[643.370]: [ptp4l.0.config] port 1: INITIALIZING to LISTENING on
    INIT_COMPLETE
    ptp4l[643.370]: [ptp4l.0.config] port 0: INITIALIZING to LISTENING on
    INIT_COMPLETE
    ptp4l[643.370]: [ptp4l.0.config] port 0: INITIALIZING to LISTENING on
    INIT_COMPLETE
    ptp4l[643.543]: [ptp4l.0.config] port 1: new foreign master ec0d9a.fffe.fcd0c8-13
    I0119 10:31:29.643024 7501 daemon.go:466] Recreating phc2sys...
    I0119 10:31:29.643051 7501 daemon.go:359] Starting phc2sys...
    I0119 10:31:29.643054 7501 daemon.go:360] phc2sys cmd: /bin/chrt -f 10
    /usr/sbin/phc2sys -w -m -n 127 -s ens6f0np0.300 -u 1 -z
    /var/run/ptp4l.0.socket -t [ptp4l.0.config]
    I0119 10:31:29.643111 7501 daemon.go:336]
    phc2sys[1705660289]:[ptp4l.0.config] PTP_PROCESS_STATUS:1
    ptp4l[644.043]: [ptp4l.0.config] selected best master clock
    ec0d9a.fffe.fcd0c8
    ptp4l[644.043]: [ptp4l.0.config] port 1: LISTENING to UNCALIBRATED on
    RS_SLAVE
    ptp4l[644.544]: [ptp4l.0.config] port 1: UNCALIBRATED to SLAVE on
    MASTER_CLOCK_SELECTED
    phc2sys[644.726]: [ptp4l.0.config] Waiting for ptp4l...
    phc2sys[645.726]: [ptp4l.0.config] CLOCK_REALTIME rms 446078597 max
    446078597 freq +0 +/- 0 delay 437 +/- 0
    ptp4l[646.170]: [ptp4l.0.config] rms 12923840978 max 36554143684 freq
    -2424 +/- 16106 delay -744 +/- 149
    phc2sys[646.726]: [ptp4l.0.config] CLOCK_REALTIME rms 446083500 max
    446083500 freq +4902 +/- 0 delay 440 +/- 0
    ...
    phc2sys[680.731]: [ptp4l.0.config] CLOCK_REALTIME rms 2 max 2 freq
    +10670 +/- 0 delay 440 +/- 0
    phc2sys[681.731]: [ptp4l.0.config] CLOCK_REALTIME rms 5 max 5 freq
    +10664 +/- 0 delay 440 +/- 0
    ptp4l[682.184]: [ptp4l.0.config] rms 3 max 6 freq +22374 +/- 4 delay 97
    +/- 0