[TECH PREVIEW] NVIDIA Spectrum-X NIC Configuration
The NVIDIA NIC Configuration Operator provides NVIDIA Spectrum-X-specific NIC configuration for different versions of the Reference Architecture.
Currently, only ConnectX-8 (device ID 1023) and BlueField-3 SuperNIC (device ID a2dc) devices are supported for this configuration.
This is a Tech Preview feature.
To install the operator, and for more information about the CRDs, see the NIC FW Configuration and Configuration Details documentation articles.
To enable the DOCA SPC-X CC algorithm on NIC devices, the DOCA SPC-X CC .deb package for Ubuntu 22.04 is required. This configuration step will be removed once the DOCA SPC-X CC algorithm is publicly available. To access the package, contact your NVIDIA CPM. Make the package available in the cluster, then provide its URL in the docaSpcXCCUrlSource field of the NicFirmwareSource CR:
apiVersion: configuration.net.nvidia.com/v1alpha1
kind: NicFirmwareSource
metadata:
  name: spectrum-x-configuration
  namespace: nvidia-network-operator
spec:
  # should point to the URL of the DOCA SPC-X CC .deb package for Ubuntu 22.04
  docaSpcXCCUrlSource: "https://example.com/doca-spcx-cc_3.1.0105-1_amd64.deb"
If the firmware on the devices also needs to be updated, extend the NicFirmwareSource CR with the fields for ConnectX and BlueField firmware. Make sure to use the correct firmware for your devices.
apiVersion: configuration.net.nvidia.com/v1alpha1
kind: NicFirmwareSource
metadata:
  name: spectrum-x-configuration
  namespace: nvidia-network-operator
spec:
  # should point to the URL of the DOCA SPC-X CC .deb package for Ubuntu 22.04
  docaSpcXCCUrlSource: "https://example.com/doca-spcx-cc_3.1.0105-1_amd64.deb"
  # a list of zip archives with firmware binaries from the Mellanox website, can point to any URL accessible from the cluster
  binUrlSources:
    - https://www.mellanox.com/downloads/firmware/fw-ConnectX8-rel-40_46_3048-900-9X85E-00NX-MC0_Ax-UEFI-14.39.14-FlexBoot-3.8.100.signed.bin.zip
  # a URL to the BlueField Bundle (BFB) file, can point to any URL accessible from the cluster
  bfbUrlSource: https://example.com/bf-fwbundle-3.1.0-77_25.07-prod.bfb
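The NicFirmwareSource CR above can also be generated programmatically, which helps when the firmware fields are optional. The following is a minimal sketch, not part of the operator: the function name build_nic_firmware_source and its parameters are hypothetical, and bfbUrlSource is assumed to take a single URL string. Only the field names shown in the manifests above are used.

```python
import json
from typing import Dict, List, Optional


def build_nic_firmware_source(
    doca_spcx_cc_url: str,
    bin_urls: Optional[List[str]] = None,
    bfb_url: Optional[str] = None,
    name: str = "spectrum-x-configuration",
    namespace: str = "nvidia-network-operator",
) -> Dict:
    """Build a NicFirmwareSource manifest as a plain dict (hypothetical helper)."""
    # The DOCA SPC-X CC package URL is always present in this sketch.
    spec: Dict = {"docaSpcXCCUrlSource": doca_spcx_cc_url}
    # Firmware fields are added only when a firmware update is requested.
    if bin_urls:
        spec["binUrlSources"] = list(bin_urls)
    if bfb_url:
        # Assumption: a single URL string, not a list.
        spec["bfbUrlSource"] = bfb_url
    return {
        "apiVersion": "configuration.net.nvidia.com/v1alpha1",
        "kind": "NicFirmwareSource",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = build_nic_firmware_source(
        "https://example.com/doca-spcx-cc_3.1.0105-1_amd64.deb",
        bfb_url="https://example.com/bf-fwbundle-3.1.0-77_25.07-prod.bfb",
    )
    # Print the manifest as JSON.
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output can be saved to a file and applied with kubectl apply -f.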
Configure and apply the NicFirmwareTemplate CR:
apiVersion: configuration.net.nvidia.com/v1alpha1
kind: NicFirmwareTemplate
metadata:
  name: spectrum-x-configuration
  namespace: nvidia-network-operator
spec:
  nicSelector:
    nicType: "a2dc" # BlueField-3 SuperNIC, can also be "1023" for ConnectX-8
  template:
    nicFirmwareSourceRef: spectrum-x-configuration
    updatePolicy: Update
Configure and apply the NicConfigurationTemplate CR:
apiVersion: configuration.net.nvidia.com/v1alpha1
kind: NicConfigurationTemplate
metadata:
  name: spectrum-x-configuration
  namespace: nvidia-network-operator
spec:
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  nicSelector:
    nicType: "a2dc" # BlueField-3 SuperNIC, can also be "1023" for ConnectX-8
  template:
    numVfs: 1
    linkType: Ethernet
    spectrumXOptimized:
      enabled: true
      version: "RA2.0" # For Reference Architecture v1.3, use "RA1.3" value for this field.
      overlay: "none" # For L3 overlay, use "l3" value for this field.
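The spectrumXOptimized stanza takes a Reference Architecture version and an overlay type. As an illustration, the sketch below builds that stanza and rejects values other than the ones named in the comments above ("RA1.3", "RA2.0", "none", "l3"). The helper name spectrum_x_optimized is hypothetical, and the allowed sets are an assumption based only on this page; the operator may accept other values.

```python
from typing import Dict

# Values mentioned on this page; assumed, not an authoritative list.
KNOWN_VERSIONS = ("RA1.3", "RA2.0")
KNOWN_OVERLAYS = ("none", "l3")


def spectrum_x_optimized(version: str = "RA2.0", overlay: str = "none") -> Dict:
    """Return a spectrumXOptimized stanza after checking the two string fields."""
    if version not in KNOWN_VERSIONS:
        raise ValueError(f"unknown version {version!r}, expected one of {KNOWN_VERSIONS}")
    if overlay not in KNOWN_OVERLAYS:
        raise ValueError(f"unknown overlay {overlay!r}, expected one of {KNOWN_OVERLAYS}")
    return {"enabled": True, "version": version, "overlay": overlay}


if __name__ == "__main__":
    # Reference Architecture v1.3 with an L3 overlay.
    print(spectrum_x_optimized("RA1.3", "l3"))
```

A check like this catches a typo such as "RA2" before the CR is applied, rather than after the operator reports a failed configuration.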
Configuration details
The following configuration parameters are applied when spectrumXOptimized.enabled == true and spectrumXOptimized.version == "RA2.0":
- name: NIC mode
  value: NIC
  dmsPath: /nvidia/mode/config/mode
  valueType: string
  deviceId: "a2dc"
- name: RoCE Adaptive Routing
  value: true
  dmsPath: /nvidia/roce/config/adaptive-routing
  valueType: bool
- name: Programmable Congestion Control
  value: true
  dmsPath: /nvidia/cc/config/user-programmable
  valueType: bool
- name: RoCE TX Scheduling Locality Mode
  value: TX_SCHED_LOCALITY_ACCUMULATIVE
  dmsPath: /nvidia/roce/config/tx-sched-locality-mode
  valueType: string
- name: RoCE Multipath DSCP
  value: MULTIPATH_DSCP_DEFAULT
  dmsPath: /nvidia/roce/config/multipath-dscp
  valueType: string
- name: CNP DSCP
  value: 0
  dmsPath: /interfaces/interface/nvidia/roce/config/rtt-resp-dscp
  valueType: int
- name: CNP DSCP mode
  value: RTT_RESP_DSCP_DEFAULT
  dmsPath: /interfaces/interface/nvidia/roce/config/rtt-resp-dscp-mode
  valueType: string
- name: RoCE CC Steering Ext
  value: ENABLED
  dmsPath: /nvidia/roce/config/cc-steering-ext
  valueType: string
runtimeConfig:
  roce:
    - name: Trust
      value: dscp
      dmsPath: /interfaces/interface/nvidia/qos/config/trust-mode
      valueType: string
      alternativeValue: QOS_TRUST_MODE_DSCP
    - name: PFC
      value: "00010000"
      dmsPath: /interfaces/interface/nvidia/qos/config/pfc
      valueType: string
    - name: Type of Service
      value: 96
      dmsPath: /interfaces/interface/nvidia/roce/config/tos
      valueType: int
  adaptiveRouting:
    - name: Adaptive Retransmission
      value: true
      dmsPath: /interfaces/interface/nvidia/roce/config/adaptive-retransmission
      valueType: bool
    - name: Tx Window
      value: true
      dmsPath: /interfaces/interface/nvidia/roce/config/tx-window
      valueType: bool
    - name: Slow Restart
      value: false
      dmsPath: /interfaces/interface/nvidia/roce/config/slow-restart
      valueType: bool
    - name: Slow Restart Idle
      value: false
      dmsPath: /interfaces/interface/nvidia/roce/config/slow-restart-idle
      valueType: bool
    - name: Adaptive Routing Force
      value: true
      dmsPath: /interfaces/interface/nvidia/roce/config/adaptive-routing-force
      valueType: bool
  congestionControl:
    - name: Congestion Control on RP points
      value: true
      dmsPath: /interfaces/interface/nvidia/cc/config/priority/rp_enabled # priority[id=0..7]
      valueType: bool
      alternativeValue: "1"
    - name: Congestion Control on NP points
      value: true
      dmsPath: /interfaces/interface/nvidia/cc/config/priority/np_enabled # priority[id=0..7]
      valueType: bool
      alternativeValue: "1"
    - name: Congestion Control
      value: true
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/config/enabled
      valueType: bool
    - name: Congestion Control with Counters
      value: true
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/config/counter_enable
      valueType: bool
    - name: DCQCN
      value: false
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=15]/config/enabled
      valueType: bool
    - name: Bandwidth
      value: 400
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=0]/config/value
      valueType: int
    - name: Responsiveness Alpha Factor
      value: 6553
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=1]/config/value
      valueType: int
    - name: Maximum Decrease Factor
      value: 63570
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=2]/config/value
      valueType: int
    - name: Maximum Increase Factor
      value: 69468
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=3]/config/value
      valueType: int
    - name: Additive Increase Step Size
      value: 36
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=4]/config/value
      valueType: int
    - name: High Additive Increase Step Size
      value: 1200
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=5]/config/value
      valueType: int
    - name: High Additive Increase Interval Period
      value: 7000000
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=6]/config/value
      valueType: int
    - name: Base Round Trip Time
      value: 15000
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=7]/config/value
      valueType: int
    - name: Maximum Queuing Delay
      value: 250000
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=8]/config/value
      valueType: int
    - name: Rate on First Congestion
      value: 524288
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=9]/config/value
      valueType: int
    - name: Delay Only
      value: 0
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=10]/config/value
      valueType: int
    - name: CNP Validity
      value: 1
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=11]/config/value
      valueType: int
    - name: Transmit Rate Decrement Step
      value: 0
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=12]/config/value
      valueType: int
    - name: Fixed Transmission Rate
      value: 0
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=13]/config/value
      valueType: int
    - name: Fast Scheduling Factor
      value: 2097152
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=14]/config/value
      valueType: int
    - name: Topology Awareness
      value: 1
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=15]/config/value
      valueType: int
    - name: Advanced Features
      value: 1
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=16]/config/value
      valueType: int
    - name: Troubleshooting Capabilities
      value: 0
      dmsPath: /interfaces/interface/nvidia/cc/slot[id=0]/param[id=17]/config/value
      valueType: int
  interPacketGap:
    pureL3:
      name: Inter Packet Gap for no overlay
      value: 25
      dmsPath: /interfaces/interface/ethernet/nvidia/config/inter-packet-gap
      valueType: int
    l3EVPN:
      name: Inter Packet Gap for L3 EVPN overlay
      value: 33
      dmsPath: /interfaces/interface/ethernet/nvidia/config/inter-packet-gap
      valueType: int
docaCCVersion: 3.1.0105-1
useSoftwareCCAlgorithm: true
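Several values in the listing above are encoded and easier to read once decoded. The sketch below is illustrative only and is not operator code. It splits the Type of Service byte into its DSCP and ECN parts, lists the priorities enabled by the PFC string, picks the inter-packet gap for an overlay setting, and builds a congestion control parameter path. All function names are hypothetical. Two points are assumptions: that the leftmost character of the PFC string corresponds to priority 0, and that overlay "none" selects the pureL3 value while "l3" selects the l3EVPN value.

```python
from typing import List, Tuple

# Inter-packet gap values from the RA2.0 listing; the overlay-to-entry mapping is assumed.
INTER_PACKET_GAP = {"none": 25, "l3": 33}


def tos_to_dscp_ecn(tos: int) -> Tuple[int, int]:
    """Split an IP ToS byte into (DSCP, ECN): upper 6 bits and lower 2 bits."""
    if not 0 <= tos <= 255:
        raise ValueError("ToS must fit in one byte")
    return tos >> 2, tos & 0b11


def pfc_enabled_priorities(mask: str) -> List[int]:
    """Return priorities whose flag is '1', assuming the leftmost char is priority 0."""
    if len(mask) != 8 or any(ch not in "01" for ch in mask):
        raise ValueError("PFC mask must be eight characters of 0 or 1")
    return [prio for prio, flag in enumerate(mask) if flag == "1"]


def inter_packet_gap(overlay: str) -> int:
    """Look up the inter-packet gap for an overlay setting ('none' or 'l3')."""
    return INTER_PACKET_GAP[overlay]


def cc_param_path(slot: int, param: int) -> str:
    """Build the DMS path of a congestion control parameter, following the listing's pattern."""
    return f"/interfaces/interface/nvidia/cc/slot[id={slot}]/param[id={param}]/config/value"


if __name__ == "__main__":
    print(tos_to_dscp_ecn(96))                 # (24, 0)
    print(pfc_enabled_priorities("00010000"))  # [3]
    print(inter_packet_gap("l3"))              # 33
    print(cc_param_path(0, 7))
```

Under these assumptions, the Type of Service value 96 corresponds to DSCP 24 with the ECN bits cleared, and the PFC string "00010000" enables priority flow control on priority 3 only.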