Confirm Components' Firmware and Software Versions
This chapter describes how to read the firmware and software versions of the following components:
Switch ASICs
Transceivers
HCA cards
As a guideline, confirm that the versions across the cluster are aligned, or differ by no more than two versions.
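After you collect versions with the methods in this chapter, a quick tally shows whether they are aligned. The following is a minimal POSIX shell sketch. The input format (one "component version" pair per line) and the function names are assumptions for illustration. Whether a given spread falls within the two-version guideline is left to your judgment.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MFT or UFM).
# Input on stdin: one "<component> <version>" pair per line, for example:
#   sw-01 31.2012.3008

# Print each distinct version followed by how many components report it.
summarize_versions() {
  awk '{ print $2 }' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Exit status 0 if every component reports the same version, 1 otherwise.
versions_aligned() {
  distinct=$(awk '{ print $2 }' | sort -u | wc -l)
  [ "$distinct" -eq 1 ]
}
```

For example, printf 'sw-01 31.2012.3008\nsw-02 31.2012.3008\n' | summarize_versions prints "31.2012.3008 2".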
Information on the recommended NDR cluster bundle can be found here.
You can check the versions through the UFM GUI (recommended) or with MOFED commands.
ASIC and HCA FW versions
From the main menu on the left, click Managed Elements, and then Devices.
The Devices page opens and displays a table with all the managed switches/hosts in the cluster.
For a switch ASIC, the FW version is listed in the main table.
For a node HCA, select the node's row. The Device Information section opens on the right side of the window and shows details of the selected device. If the section does not open, click the left arrow at the top-right corner of the table.
Click the HCAs tab to see the device's HCAs and their FW versions.
To view HCAs only, click HCAs in the main menu on the left. All connected HCAs are listed there with their FW versions.
Managed switch SW (NOS) version
Click Network Map in the main menu on the left. A visualization of the cluster is displayed.
Select a switch. The switch information, including the SW version (NOS), appears in the table on the left.
Transceivers
On the Devices page, select a switch. Then, in the Device Information section on the right, click the Cables tab.
A table lists the connected cables and their FW versions.
Alternatively, open the Cables page from the main menu on the left to view information on all connected cables at once.
Prerequisite
Make sure the latest MFT is installed. If it is not, install it as part of the MLNX_OFED installation process or according to the instructions found here.
Before using MFT, start the MST driver by running mst start.
This command creates files that represent NVIDIA devices in the /dev/mst directory.
To list the relevant devices, run mst status.
For further information, see the mst Service section in the MFT User Manual.
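You can script the prerequisite steps. The following is a minimal sketch that assumes a POSIX shell running as root on a host with MFT installed. require_tools and start_mst are hypothetical helper names; mst start and mst status are the commands described above.

```shell
#!/bin/sh
# Hypothetical helper: report any tool from the argument list that is not on PATH.
# Returns 0 when all tools are found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: start the MST driver and list the devices it exposes.
# Requires root and an installed MFT package.
start_mst() {
  require_tools mst flint || return 1
  mst start && mst status
}
```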
Identify the Switch Firmware Version
This section applies only to externally managed (unmanaged) switches. On managed systems, the ASIC firmware is bundled with the NOS.
Unmanaged switches are accessed through their LID.
To identify the switch LID, run ibswitches.
root@ufmx-qnt-02: # ibswitches
Switch : 0x900a8403006ff780 ports 65 "MF0;grla-quanta-01:MQM9700/U1" enhanced port 0 lid 1 lmc 0
Switch : 0x900a8403006fe0c0 ports 65 "MF0;grla-quanta-s2:MQM9700/U1" enhanced port 0 lid 5 lmc 0
Switch : 0x900a8403006ff8c0 ports 65 "MF0;grla-quanta-s1:MQM9700/U1" enhanced port 0 lid 14 lmc 0
Switch : 0x900a8403006fe040 ports 65 "MF0;grla-quanta-02:MQM9700/U1" enhanced port 0 lid 15 lmc 0
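With many switches, you can extract the LIDs from the ibswitches output instead of reading them by eye. The following is a minimal sketch that assumes the one-line-per-switch format shown above, with each line ending in "lid N lmc M". switch_lids is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the LID of every switch line read from stdin.
# Assumes each switch line contains "... lid <N> lmc <M>", as in the output above.
switch_lids() {
  sed -n 's/.* lid \([0-9][0-9]*\) lmc .*/\1/p'
}

# Usage (requires infiniband-diags): ibswitches | switch_lids
```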
To check the firmware version, run flint -d lid-X -qq q, where X is the switch LID.
root@ufmx-qnt-02: # flint -d lid-1 -qq q
Image type:            FS4
FW Version:            31.2012.3008
FW Release Date:       3.1.2024
Product Version:       31.2012.3008
Rom Info:              type=UEFI version=skipped cpu=skipped
                       type=PXE version=skipped devid=skipped
                       type=NVMe version=skipped devid=skipped
Description:           UID                GuidsNumber
Base GUID:             900a8403006ff780        64
Base MAC:              900a846ff780            64
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT0000000577
Security Attributes:   secure-fw
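To repeat the query for every switch, loop over the LIDs and keep only the FW Version field. The following is a minimal sketch that assumes the flint output layout shown above and in-band access to the fabric from the host. flint_fw_version and query_switch_fw are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "FW Version:" line from flint query output.
flint_fw_version() {
  sed -n 's/^FW Version:[[:space:]]*//p'
}

# Hypothetical helper: query each LID given as an argument and print "lid-<N> <version>".
# Requires MFT; uses the same flint invocation as the example above.
query_switch_fw() {
  for lid in "$@"; do
    printf '%s %s\n' "lid-$lid" "$(flint -d "lid-$lid" -qq q | flint_fw_version)"
  done
}

# Usage, with the LIDs from the ibswitches output above: query_switch_fw 1 5 14 15
```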
Identify the Managed Switch NOS Version
Connect to the switch remotely with SSH: ssh admin@my-switch-name (e.g. ssh admin@172.28.3.216).
Enter config mode.
switch > enable
switch # configure terminal
switch (config) #
Check the NOS version.
switch (config) # show version
Product name:      MLNX-OS
Product release:   3.4.2002
Build ID:          #1-dev
Build date:        2015-07-30 20:13:19
Target arch:       x86_64
Target hw:         x86_64
Built by:          jenkins@fit74
Version summary:   X86_64 3.4.2002 2015-07-30 20:13:19 x86_64
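The CLI session above is interactive. If you save its show version output to a text file, a small helper can extract the product name and release. How the output is captured is up to you. The following is a minimal sketch that assumes the "Field: value" layout shown above. nos_version is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read saved "show version" output on stdin and print
# "<product name> <product release>".
nos_version() {
  awk -F': *' '
    /^Product name:/    { name = $2 }
    /^Product release:/ { rel = $2 }
    END { if (rel != "") print name, rel }'
}

# Usage: nos_version < show-version.txt   (file name is hypothetical)
```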
Identify the HCA Firmware Version
To identify the HCA device, run mst status.
[root@fit229 ~]# mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt4129_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:04:00.0 addr.reg=88 data.reg=92 cr_bar.gw_offset=-1
                                   Chip revision is: 00
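On hosts with several adapters, you can pull the device paths out of the mst status output for use in later commands. The following is a minimal sketch that assumes each device line begins with its /dev/mst path, as shown above. mst_devices is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print each /dev/mst device path found at the start of a line on stdin.
mst_devices() {
  sed -n 's|^[[:space:]]*\(/dev/mst/[^[:space:]]*\).*|\1|p'
}

# Usage (requires MFT and a running MST driver): mst status | mst_devices
```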
Check the firmware version.
[root@fit229 ~]# flint -d /dev/mst/mt4129_pciconf0 -qq q
Image type:            FS4
FW Version:            28.98.2400
FW Release Date:       14.2.2022
Product Version:       28.98.2400
Rom Info:              type=UEFI version=14.25.21 cpu=AMD64,AARCH64
                       type=PXE version=3.6.502 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             1070fd0300d84644        4
Base MAC:              1070fdd84644            4
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000798
Security Attributes:   N/A
For further details, see https://docs.nvidia.com/networking/display/mftv4270/Querying+the+Firmware+Image.
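To report the firmware version and PSID of every HCA on a host in one pass, combine mst status with flint. The following is a minimal sketch that assumes the flint output layout shown above. flint_field and hca_fw_report are hypothetical helper names. mst status may list more than one entry per adapter, for example one per PCI function, so an adapter can appear more than once.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the flint query line "<field>:".
flint_field() {
  sed -n "s/^$1:[[:space:]]*//p"
}

# Hypothetical helper: for every /dev/mst device listed by mst status,
# print "<device> FW=<version> PSID=<psid>". Requires MFT and root.
hca_fw_report() {
  for dev in $(mst status | sed -n 's|^[[:space:]]*\(/dev/mst/[^[:space:]]*\).*|\1|p'); do
    out=$(flint -d "$dev" -qq q)
    printf '%s FW=%s PSID=%s\n' "$dev" \
      "$(printf '%s\n' "$out" | flint_field 'FW Version')" \
      "$(printf '%s\n' "$out" | flint_field PSID)"
  done
}
```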
Identify the Transceiver Firmware Version
To check the transceiver firmware version, run flint -d lid-1 --linkx --downstream_device_ids 1 q.
[admin@gorilla-169 ~]# flint -d lid-1 --linkx --downstream_device_ids 1 q
Host : lid-1
Device index 1
Component Index 3
Component Status NOT_PRESENT
Component Update State IDLE
Running state is : Image A is running
Information block is : FW image A is present
FW A Version : 46.130.0023
FW B Version : 00.00.0000
FW Factory Version : 00.00.0000
SupportedProtocol: CMIS 4.0 is implemented
Activation type: Self-activation with HW reset contained in the Run FW Image command. No additional actions required from the host.
Serial number is 0
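The output lists versions for both images A and B, so the relevant value is the version of the image that is running. The following is a minimal sketch that assumes the exact line format shown above ("Running state is : Image X is running" and "FW X Version : ..."). running_module_fw is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "flint ... --linkx ... q" output on stdin and print
# "<running image> <its FW version>", e.g. "A 46.130.0023".
running_module_fw() {
  awk '
    /^Running state is/ { for (i = 1; i <= NF; i++) if ($i == "Image") img = $(i + 1) }
    /^FW [AB] Version/  { ver[$2] = $NF }
    END { if (img in ver) print img, ver[img] }'
}

# Usage: flint -d lid-1 --linkx --downstream_device_ids 1 q | running_module_fw
```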
Identify the Driver Version
Make sure all servers run the latest driver version. To check the installed version, run ofed_info -s.
~ $ ofed_info -s
MLNX_OFED_LINUX-23.04-0.5.3.3
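To compare drivers across servers, run ofed_info -s on each host over SSH and strip the package prefix. The following is a minimal sketch that assumes passwordless SSH to every server and ofed_info on the remote PATH. ofed_version, ofed_report, and the host names in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn "MLNX_OFED_LINUX-<version>" (optionally ending in ":")
# into "<version>".
ofed_version() {
  sed -n 's/^MLNX_OFED_LINUX-\([^:]*\):*$/\1/p'
}

# Hypothetical helper: print "<host> <driver version>" for each host argument.
ofed_report() {
  for host in "$@"; do
    printf '%s %s\n' "$host" "$(ssh "$host" ofed_info -s | ofed_version)"
  done
}

# Usage: ofed_report node01 node02 node03
```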