Compute Network/IB Interfaces Configuration#
Validate IB/Compute interfaces#
Reference: the Configuring SR-IOV (InfiniBand) section in the OFED user guide.
Check the physical link type (LINK_TYPE_P1), SR-IOV state (SRIOV_EN), and VF count (NUM_OF_VFS) configuration for the DGX compute interfaces.
The link type should be IB(1) for InfiniBand, SR-IOV should be enabled, and the VF count should be 8.
On the BCM headnode, run cmsh
root@bcm10-headnode1:~# cmsh
[bcm10-headnode1]% device
[bcm10-headnode1->device]%
[bcm10-headnode1->device]% pexec -c dgx-h100 -j "for i in dc 9a ce c0 4f 40 5e 18 ; do mst start; mlxconfig -d $i:00.0 q; done \| grep -e \\"SRIOV_EN\\|LINK_TYPE\\|NUM_OF_VFS\""
[dgx-01..dgx-04]
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
NUM_OF_VFS 8
SRIOV_EN True(1)
LINK_TYPE_P1 IB(1)
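If you want to spot-check a single adapter directly on a DGX node (outside of cmsh), the same query can be run against one PCI device. This is only an illustrative example; the device address 18:00.0 is one of the compute NICs referenced above:
mst start
mlxconfig -d 18:00.0 q | grep -E "SRIOV_EN|LINK_TYPE|NUM_OF_VFS"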
Refer to the DGX H100 user guide for the interface name to PCI address mapping.
If SR-IOV is enabled, the link type is InfiniBand, and eight or more VFs are already configured, proceed to the Configure SR-IOV NetworkNodePolicy CR section.
If SR-IOV, the IB link type, or the VFs are not configured, enable them using the following command:
[bcm10-headnode1->device]% pexec -c dgx-h100 -j "for i in dc 9a ce c0 4f 40 5e 18 ; do mst start; mlxconfig -d $i:00.0 -y set SRIOV_EN=1 NUM_OF_VFS=8 LINK_TYPE_P1=1 ; done"
Device #1:
----------
Device type: ConnectX7
Name: MCX750500B-0D00_Ax
Description: Nvidia adapter card with four ConnectX-7; each 400Gb/s NDR IB; PCIe 5.0 x32; PCIe switch; secured boot; No Crypto
Device: 18:00.0
Configurations: Next Boot New
SRIOV_EN True(1) True(1)
NUM_OF_VFS 8 8
LINK_TYPE_P1 IB(1) IB(1)
Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot the machine to load new configurations.
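Before rebooting, you can optionally re-query a device to confirm the new values are staged for the next boot; mlxconfig accepts an explicit list of parameters to query (the device address shown is an example):
mlxconfig -d 18:00.0 q SRIOV_EN NUM_OF_VFS LINK_TYPE_P1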
Reboot the DGX nodes from BCM to apply the changes.
[bcm10-headnode1->device]% reboot -c dgx-h100
Wait for the nodes to come back up
[bcm10-headnode1->device]% list -c dgx-h100
Type Hostname (key) MAC Category IP Network Status
--------------------------------------------------------
PhysicalNode dgx-01 5A:9F:F8:65:70:C4 dgx-h100 10.184.94.11 managementnet [ UP ]
PhysicalNode dgx-02 1E:BB:21:13:FF:60 dgx-h100 10.184.94.12 managementnet [ UP ]
PhysicalNode dgx-03 AA:4C:1C:14:84:1F dgx-h100 10.184.94.13 managementnet [ UP ]
PhysicalNode dgx-04 06:9E:B9:10:E5:DD dgx-h100 10.184.94.14 managementnet [ UP ]
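Optionally, verify that every DGX node has finished rebooting and is reachable before continuing, for example with pdsh (this assumes the same category-based genders configuration used for the pdsh commands later in this guide):
pdsh -g category=dgx-h100 uptime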
Configure 8 VFs per IB interface:
[bcm10-headnode1->device]% pexec -c dgx-h100 -j "for i in 0 3 4 5 6 9 10 11; do echo 8 > /sys/class/infiniband/mlx5_${i}/device/sriov_numvfs; done"
[dgx-01..dgx-04]
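To confirm the VF count took effect, the value can be read back from sysfs; for example, from the same cmsh session (each device should report 8):
[bcm10-headnode1->device]% pexec -c dgx-h100 -j "for i in 0 3 4 5 6 9 10 11; do cat /sys/class/infiniband/mlx5_${i}/device/sriov_numvfs; done"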
Check IB Interface status on the DGX nodes.
[bcm10-headnode1->device]% pexec -c dgx-h100 -j "for i in 0 3 4 5 6 9 10 11; do ibstat -d mlx5_${i} \| grep -i \\"mlx5_\\|state\\|infiniband\"; done"
[dgx-01..dgx-04]
CA 'mlx5_0'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_3'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_4'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_5'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_6'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_9'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_10'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
CA 'mlx5_11'
State: Active
Physical state: LinkUp
Link layer: InfiniBand
All interfaces should report State: Active and Physical state: LinkUp.
Verify the VFs under each IB interface PCI device, for example the VFs 18:00.1 through 18:01.0 under physical function 18:00.0 (OSFP4, Port 2, mlx5_0).
[bcm10-headnode1->device]% pexec -c dgx-h100 -j "lspci \| grep ConnectX"
[dgx-01..dgx-04]
16:00.0 PCI bridge: Mellanox Technologies MT2910 Family [ConnectX-7 PCIe Bridge]
17:00.0 PCI bridge: Mellanox Technologies MT2910 Family [ConnectX-7 PCIe Bridge]
17:02.0 PCI bridge: Mellanox Technologies MT2910 Family [ConnectX-7 PCIe Bridge]
18:00.0 Infiniband controller: Mellanox Technologies MT2910 Family [ConnectX-7]
18:00.1 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.2 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.3 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.4 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.5 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.6 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:00.7 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
18:01.0 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
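Two optional cross-checks can confirm the totals and the PF-to-netdev mapping that the SR-IOV policies in the next section rely on. Each node should report 64 virtual functions (8 VFs on each of the 8 compute ports), and ibdev2netdev (shipped with MLNX_OFED) should map mlx5_0, mlx5_3, mlx5_4, mlx5_5, mlx5_6, mlx5_9, mlx5_10, and mlx5_11 to the ibpNNNs0 interface names used below:
[bcm10-headnode1->device]% pexec -c dgx-h100 -j "lspci \| grep -c \\"Virtual Function\\""
[bcm10-headnode1->device]% pexec -c dgx-h100 -j "ibdev2netdev"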
Configure SR-IOV NetworkNodePolicy CR#
Create a file named sriov-ib-network-node-policy.yaml with the following content, one SriovNetworkNodePolicy per compute IB interface:
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp24s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp24s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp24s0
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp64s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp64s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp64s0
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp79s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp79s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp79s0
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp94s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp94s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp94s0
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp154s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp154s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp154s0
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp192s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp192s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp192s0
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp206s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp206s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp206s0
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ibp220s0
  namespace: network-operator
spec:
  deviceType: netdevice
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    vendor: "15b3"
    pfNames: ["ibp220s0"]
  linkType: ib
  isRdma: true
  numVfs: 8
  priority: 90
  resourceName: resibp220s0
Create the SriovNetworkNodePolicy resources using:
kubectl create -f sriov-ib-network-node-policy.yaml
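To confirm the policies were accepted and reconciled by the operator, you can list them along with the per-node state objects that the sriov-network-operator maintains:
kubectl get sriovnetworknodepolicies -n network-operator
kubectl get sriovnetworknodestates -n network-operator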
Create SR-IOV IB Network CR#
Create another file named sriov-ib-network.yaml with the following content, one SriovIBNetwork per compute IB interface, each with its own IPAM range:
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp24s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.1.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp24s0
  linkState: enable
  networkNamespace: default
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp64s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.2.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp64s0
  linkState: enable
  networkNamespace: default
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp79s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.3.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp79s0
  linkState: enable
  networkNamespace: default
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp94s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.4.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp94s0
  linkState: enable
  networkNamespace: default
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp154s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.5.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp154s0
  linkState: enable
  networkNamespace: default
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp192s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.6.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp192s0
  linkState: enable
  networkNamespace: default
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp206s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.7.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp206s0
  linkState: enable
  networkNamespace: default
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: ibp220s0
  namespace: network-operator
spec:
  ipam: |
    {
      "type": "whereabouts",
      "datastore": "kubernetes",
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig"
      },
      "range": "192.168.8.0/24",
      "log_file": "/var/log/whereabouts.log",
      "log_level": "info"
    }
  resourceName: resibp220s0
  linkState: enable
  networkNamespace: default
Create the SriovIBNetwork resources using:
kubectl create -f sriov-ib-network.yaml
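Each SriovIBNetwork results in a NetworkAttachmentDefinition in its networkNamespace (default here), and the SR-IOV device plugin advertises the VF pools as extended node resources. A quick check (the resource prefix under which the resibp* resources appear, for example nvidia.com, depends on how the operator was deployed):
kubectl get network-attachment-definitions -n default
kubectl describe node dgx-01 | grep -i resibp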
Restart the services to apply the changes:
pdsh -g category=k8s-control-plane service containerd restart
pdsh -g category=k8s-control-plane service kubelet restart
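As an optional end-to-end check, a test pod can request one of the new SR-IOV resources and attach to the matching network. The sketch below is illustrative only; the pod name, image, and resource prefix (nvidia.com) are assumptions and must match your operator deployment:
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: sriov-ib-test            # hypothetical test pod name
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: ibp24s0   # attach a VF from the ibp24s0 SriovIBNetwork created above
spec:
  containers:
  - name: test
    image: ubuntu:22.04          # assumed image; use any image with basic networking tools
    command: ["sleep", "infinity"]
    resources:
      requests:
        nvidia.com/resibp24s0: 1   # assumed resource prefix; check the advertised node resources
      limits:
        nvidia.com/resibp24s0: 1
EOF
If the pod schedules and shows an additional IB interface with an address from 192.168.1.0/24 (for example via ip a inside the pod), the SR-IOV data path is working end to end.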