Compute Network/IB Interfaces Configuration

Validate IB/Compute Interfaces

Reference: the Configuring SR-IOV (InfiniBand) section in the OFED user guide.

Check the physical link type (LINK_TYPE_P1), the SR-IOV state (SRIOV_EN), and the VF count (NUM_OF_VFS) configured on the DGX compute interfaces.

The link type should be IB(1) for InfiniBand, SR-IOV should be enabled (True(1)), and NUM_OF_VFS should be at least 8.

On the BCM head node, start cmsh and enter device mode:

root@bcm10-headnode1:~# cmsh
[bcm10-headnode1]% device
[bcm10-headnode1->device]%
[bcm10-headnode1->device]% pexec -c dgx-h100 -j "for i in dc 9a ce c0 4f 40 5e 18 ; do mst start; mlxconfig -d $i:00.0 q; done | grep -e \"SRIOV_EN\|LINK_TYPE\|NUM_OF_VFS\""
[dgx-01..dgx-04]
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
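
The expected values above can also be checked mechanically rather than by eye. The following is a minimal sketch, not part of the original procedure: check_mlx_settings is a hypothetical helper name, and it assumes the input is mlxconfig query output (filtered or unfiltered) with one setting per line, as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the guide): reads mlxconfig query output on
# stdin and checks that each device reports SRIOV_EN=True(1),
# LINK_TYPE_P1=IB(1), and NUM_OF_VFS >= 8.
# $1 = number of devices expected in the output (defaults to 8).
check_mlx_settings() {
  awk -v want="${1:-8}" '
    $1 == "SRIOV_EN"     { sriov++; if ($2 != "True(1)") bad++ }
    $1 == "LINK_TYPE_P1" { link++;  if ($2 != "IB(1)")   bad++ }
    $1 == "NUM_OF_VFS"   { vfs++;   if ($2 + 0 < 8)      bad++ }
    END {
      if (bad + 0 > 0 || sriov + 0 != want + 0 || link + 0 != want + 0 || vfs + 0 != want + 0) {
        printf "FAIL: %d bad value(s); saw %d SRIOV_EN, %d LINK_TYPE_P1, %d NUM_OF_VFS lines, expected %d each\n", bad, sriov, link, vfs, want
        exit 1
      }
      print "OK: all devices match"
    }'
}

# Example (on a DGX node, assuming mst and mlxconfig are installed):
#   for i in dc 9a ce c0 4f 40 5e 18; do mlxconfig -d $i:00.0 q; done | check_mlx_settings 8
```

The device count is a parameter because pexec -j merges identical output from several nodes into one group; if nodes differ, each group would need to be checked separately.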

Refer to the DGX H100 user guide for the mapping between interface names and PCI addresses.

If SR-IOV is enabled, the link type is InfiniBand, and eight or more VFs are already configured, proceed to the Configure SR-IOV NetworkNodePolicy CR section.

If SR-IOV, the InfiniBand link type, or the VFs are not configured, enable them using the following command:

[bcm10-headnode1->device]% pexec -c dgx-h100 -j "for i in dc 9a ce c0 4f 40 5e 18 ; do mst start; mlxconfig -d $i:00.0 -y set SRIOV_EN=1 NUM_OF_VFS=8 LINK_TYPE_P1=1 ; done"
Device #1:
----------
Device type: ConnectX7
Name: MCX750500B-0D00_Ax
Description: Nvidia adapter card with four ConnectX-7; each 400Gb/s NDR IB; PCIe 5.0 x32; PCIe switch; secured boot; No Crypto
Device: 18:00.0
Configurations: Next Boot New
SRIOV_EN True(1) True(1)
NUM_OF_VFS 8 8
LINK_TYPE_P1 IB(1) IB(1)
Apply new Configuration? (y/n) [n] : y
Applying... Done!

-I- Please reboot the machine to load new configurations.

Reboot the DGX nodes from BCM to apply the changes.

[bcm10-headnode1->device]% reboot -c dgx-h100

Wait for the nodes to come back up and confirm that each one reports [ UP ]:

[bcm10-headnode1->device]% list -c dgx-h100
Type Hostname (key) MAC Category IP Network Status
--------------------------------------------------------
PhysicalNode dgx-01 5A:9F:F8:65:70:C4 dgx-h100 10.184.94.11 managementnet [ UP ]
PhysicalNode dgx-02 1E:BB:21:13:FF:60 dgx-h100 10.184.94.12 managementnet [ UP ]
PhysicalNode dgx-03 AA:4C:1C:14:84:1F dgx-h100 10.184.94.13 managementnet [ UP ]
PhysicalNode dgx-04 06:9E:B9:10:E5:DD dgx-h100 10.184.94.14 managementnet [ UP ]
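
Rather than re-running list by hand, the status column can be checked in a script. This is a sketch under assumptions: all_nodes_up is a hypothetical helper, and it assumes the list output has the row layout shown above (a PhysicalNode row per node, ending in a bracketed status).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the guide): reads cmsh "list" output on stdin
# and succeeds only if at least one PhysicalNode row exists and every such
# row reports [ UP ]. Nodes in any other state are printed.
all_nodes_up() {
  awk '
    $1 == "PhysicalNode" {
      total++
      if ($0 !~ /\[ +UP +\]/) { print "not up:", $2; down++ }
    }
    END { if (total + 0 == 0 || down + 0 > 0) exit 1 }'
}

# Example, assuming cmsh accepts a command string via -c on the head node:
#   until cmsh -c "device; list -c dgx-h100" | all_nodes_up; do sleep 30; done
```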

Configure 8 VFs per IB interface

[bcm10-headnode1->device]% pexec -c dgx-h100 -j "for i in 0 3 4 5 6 9 10 11; do echo 8 > /sys/class/infiniband/mlx5_${i}/device/sriov_numvfs; done"
[dgx-01..dgx-04]
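
The write to sriov_numvfs prints nothing on success, so it can be useful to read the value back. The sketch below is an illustration only: show_vf_counts and the SYSFS_ROOT override are hypothetical names introduced here (the override exists so the function can be pointed at a test directory). It relies on the standard sriov_numvfs and sriov_totalvfs sysfs attributes of the PCI device. Note that the kernel generally rejects a change from one non-zero VF count to another, so if a different count is already set, it may be necessary to write 0 first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the guide): prints the configured and maximum
# VF counts for each mlx5 device index given as an argument.
# SYSFS_ROOT is an assumption added for testing; it defaults to /sys.
show_vf_counts() {
  local root="${SYSFS_ROOT:-/sys}"
  local i dev
  for i in "$@"; do
    dev="$root/class/infiniband/mlx5_${i}/device"
    printf 'mlx5_%s numvfs=%s totalvfs=%s\n' "$i" \
      "$(cat "$dev/sriov_numvfs")" "$(cat "$dev/sriov_totalvfs")"
  done
}

# Example (on a DGX node):
#   show_vf_counts 0 3 4 5 6 9 10 11
```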

Check IB Interface status on the DGX nodes.

[bcm10-headnode1->device]% pexec -c dgx-h100 -j "for i in 0 3 4 5 6 9 10 11; do ibstat -d mlx5_${i} | grep -i \"mlx5_\|state\|infiniband\"; done"
[dgx-01..dgx-04]
CA 'mlx5_0'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_3'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_4'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_5'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_6'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_9'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_10'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_11'
State: Active
Physical state: LinkUp
Link layer: InfiniBand

Every interface should report State: Active, Physical state: LinkUp, and Link layer: InfiniBand.
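
With eight interfaces on each node, a small filter that prints only the exceptions may be easier to read than the full listing. This is a minimal sketch, assuming ibstat output in the layout shown above; find_bad_ib_ports is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the guide): reads ibstat output on stdin and
# prints every State / Physical state / Link layer line that differs from
# Active / LinkUp / InfiniBand, prefixed with the CA name.
# Exit status is 0 when nothing is wrong, 1 otherwise.
find_bad_ib_ports() {
  awk '
    /^CA /                                                      { ca = $2 }
    /^[[:space:]]*State:/          && $2 != "Active"            { print ca, "State:", $2; bad = 1 }
    /^[[:space:]]*Physical state:/ && $3 != "LinkUp"            { print ca, "Physical state:", $3; bad = 1 }
    /^[[:space:]]*Link layer:/     && $3 != "InfiniBand"        { print ca, "Link layer:", $3; bad = 1 }
    END { exit bad + 0 }'
}

# Example (on a DGX node):
#   for i in 0 3 4 5 6 9 10 11; do ibstat mlx5_${i}; done | find_bad_ib_ports
```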

Verify the VFs under each IB interface PCI device. For example, for OSFP4, Port 2 (mlx5_0), the physical function is at 18:00.0 and its eight VFs appear at 18:00.1 through 18:01.0:

[bcm10-headnode1->device]% pexec -c dgx-h100 -j "lspci | grep ConnectX"
[dgx-01..dgx-04]
16:00.0 PCI bridge: Mellanox Technologies MT2910 Family [ConnectX-7 PCIe Bridge]
17:00.0 PCI bridge: Mellanox Technologies MT2910 Family [ConnectX-7 PCIe Bridge]
17:02.0 PCI bridge: Mellanox Technologies MT2910 Family [ConnectX-7 PCIe Bridge]
18:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
18:00.1 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.2 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.3 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.4 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.5 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.6 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.7 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:01.0 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
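
Counting the VF entries per PCI bus gives a quick check that every physical function received its eight VFs. The sketch below assumes the default lspci address format (bus:device.function without a domain prefix, one device per line); count_vfs_per_bus is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the guide): reads lspci output on stdin and
# prints "<bus> <count>" for every PCI bus that has Virtual Function entries.
count_vfs_per_bus() {
  awk '
    /Virtual Function/ { split($1, addr, ":"); n[addr[1]]++ }
    END { for (bus in n) print bus, n[bus] }' | sort
}

# Example (on a DGX node); each bus is expected to show 8:
#   lspci | grep ConnectX | count_vfs_per_bus
```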

Configure SR-IOV NetworkNodePolicy CR

Create a file named ‘sriov-ib-network-node-policy.yaml’ with the following information:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp24s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp24s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp24s0

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp64s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp64s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp64s0

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp79s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp79s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp79s0

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp94s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp94s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp94s0

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp154s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp154s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp154s0

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp192s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp192s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp192s0

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp206s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp206s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp206s0

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp220s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp220s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp220s0

Create the SriovNetworkNodePolicy CRs using:

kubectl create -f sriov-ib-network-node-policy.yaml
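
The policies in this file differ only in the PF name, which also determines the policy name and the resource name. As an optional alternative to editing the file by hand, the following sketch generates the same documents from a list of PF names. gen_node_policies is a hypothetical helper, not part of the operator or of this guide, and the field values are copied from the manifests above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the guide): prints one SriovNetworkNodePolicy
# document per PF name given as an argument, separated by "---".
gen_node_policies() {
  local pf first=1
  for pf in "$@"; do
    [ "$first" -eq 1 ] || echo '---'
    first=0
    cat <<EOF
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ${pf}
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["${pf}"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: res${pf}
EOF
  done
}

# Example:
#   gen_node_policies ibp24s0 ibp64s0 ibp79s0 ibp94s0 ibp154s0 ibp192s0 ibp206s0 ibp220s0 \
#     > sriov-ib-network-node-policy.yaml
```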

Create SR-IOV IB Network CR

Create another file named ‘sriov-ib-network.yaml’ with the following information:

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp24s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.1.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp24s0
  linkState: enable
  networkNamespace: default

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp64s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.2.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp64s0
  linkState: enable
  networkNamespace: default

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp79s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.3.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp79s0
  linkState: enable
  networkNamespace: default

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp94s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.4.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp94s0
  linkState: enable
  networkNamespace: default

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp154s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.5.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp154s0
  linkState: enable
  networkNamespace: default

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp192s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.6.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp192s0
  linkState: enable
  networkNamespace: default

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp206s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.7.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp206s0
  linkState: enable
  networkNamespace: default

---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp220s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.8.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp220s0
  linkState: enable
  networkNamespace: default

Create the SriovIBNetwork CRs using:

kubectl create -f sriov-ib-network.yaml
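
Each SriovIBNetwork above pairs one resource name with its own /24 range, numbered 192.168.1.0/24 through 192.168.8.0/24 in PF order. The sketch below reproduces that pattern from a list of PF names, which may help keep the ranges from overlapping when the list changes. gen_ib_networks is a hypothetical helper, and the numbering rule is inferred from the manifests above rather than stated by the guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the guide): prints one SriovIBNetwork document
# per PF name; the Nth name is given the whereabouts range 192.168.N.0/24.
gen_ib_networks() {
  local pf n=0
  for pf in "$@"; do
    n=$((n + 1))
    cat <<EOF
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ${pf}
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.${n}.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: res${pf}
  linkState: enable
  networkNamespace: default
EOF
  done
}

# Example:
#   gen_ib_networks ibp24s0 ibp64s0 ibp79s0 ibp94s0 ibp154s0 ibp192s0 ibp206s0 ibp220s0 \
#     > sriov-ib-network.yaml
```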

Restart the services to apply the changes:

pdsh -g category=k8s-control-plane service containerd restart
pdsh -g category=k8s-control-plane service kubelet restart