mstfwreset – Loading Firmware on 5th Generation Devices Tool

NVIDIA Firmware Tools (MFT) Documentation v4.26.1 LTS

The mstfwreset tool enables the user to load updated firmware on a NIC or switch without rebooting the machine. mstfwreset supports 5th Generation (Group II) HCAs and allows a smooth firmware upgrade.

  • Access to device through BDF format

  • Firmware supporting ISFU

    • Connect-IB: v10.10.3000 or above

    • ConnectX-4: v12.0100.0000 or above

    • ConnectX-4 Lx: v14.0100.0000 or above

  • Device firmware burned with the latest mstflint burning tool

  • Supported devices: Connect-IB / ConnectX-4 / ConnectX-4 Lx / ConnectX-5 / BlueField / ConnectX-6

  • Supported OSs: FreeBSD, Linux

mstfwreset -d <device> query

mstfwreset -d <device> reset [-y] [--level <0,3,4>] [--type <0..2>] [--sync <0,1>] [-s] [-m]

where:

-d|--device <device>
    Device to work with.

-l|--level <0..5>
    Run reset with the specified reset-level. N/A for switch devices.

-t|--type <0,1>
    Run reset with the specified reset-type. N/A for switch devices.

-m|--mst_flags MST_FLAGS
    Provide mst flags to be used when invoking mst restart step. For example: --mst_flags="--with_fpga". N/A for switch devices.

-y|--yes
    Answer "yes" on prompt.

-s|--skip_driver
    Skip driver start/stop stage (driver must be stopped manually). N/A for switch devices.

-v|--version
    Print tool version.

-h|--help
    Show help message and exit.

--skip_fsm_sync
    Skip fsm syncing.

q|query
    Query for reset level required to load new firmware. N/A for switch devices.

r|reset
    Execute reset level.

reset_fsm_register
    Reset the fsm sync register to idle state.

--sync
    Run reset with the specified reset-sync. N/A for switch devices.

Reset levels and types depend on the extent of the changes introduced when updating the device's firmware. The tool will display the supported reset levels and types that will ensure the loading of the new firmware. Those reset levels and types are:

  • Reset-levels:

    • 0: Driver, PCI link, network link will remain up ("live-Patch")

    • 3: Driver restart and PCI reset

    • 4: Warm Reboot

    • 5: Cold Reboot

  • Reset-types (relevant only for reset-levels 3,4):

    • 0: Full chip reset

    • 1: Phy-less reset ("port-alive" - network link will remain up) – Not Supported

Warning

Exact reset level and types needed to load new firmware may differ, as it depends on the difference between the running firmware and the firmware we are upgrading to.
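The level and type values listed above can be checked before a command is ever issued. The following is a minimal sketch, not part of the tool: `build_reset_cmd` is a hypothetical helper name, and it only prints the command line instead of running it. It accepts the levels (0, 3, 4, 5) and types (0, 1) described in this section and uses only the `-d`, `-l` and `-t` options documented above.

```shell
# Hypothetical helper: print (do not run) an mstfwreset reset command after
# validating the level and type against the values listed in this section.
# Usage: build_reset_cmd <device> [level] [type]
build_reset_cmd() {
    local dev="$1" level="${2:-}" type="${3:-}"
    local cmd

    if [ -z "$dev" ]; then
        echo "device is required" >&2
        return 2
    fi
    cmd="mstfwreset -d $dev"

    if [ -n "$level" ]; then
        case "$level" in
            0|3|4|5) cmd="$cmd -l $level" ;;
            *) echo "unsupported reset level: $level" >&2; return 2 ;;
        esac
    fi

    if [ -n "$type" ]; then
        case "$type" in
            0|1) cmd="$cmd -t $type" ;;
            *) echo "unsupported reset type: $type" >&2; return 2 ;;
        esac
    fi

    echo "$cmd reset"
}
```

For example, `build_reset_cmd 41:00.0 4` prints `mstfwreset -d 41:00.0 -l 4 reset`. Whether a given level is actually accepted still depends on what the query command reports for the device.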

mstfwreset supports a Multi-Host setup. To reset the firmware for a device in a Multi-Host setup, you have to run the tool on all the hosts simultaneously when in legacy mode. The tool utilizes a synchronization mechanism supported by the firmware in order to synchronize between the different running instances of the tool on the hosts.

For debugging purposes, it is possible to avoid the synchronization by running the tool with the flag --skip_fsm_sync.

Warning

When running mstfwreset on a Multi-Host setup, expect a time-out of 3 minutes while waiting for all the hosts to join the firmware reset process.

To reset the firmware on a socket-direct NIC, run the tool simultaneously on every function-0 PCI device that belongs to the same NIC.

See the following example on Linux:

$ lspci -d 15b3:
08:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
08:00.1 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
0e:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
0e:00.1 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]

* All PCI devices above are related to the same NIC
* Run mstfwreset on all the PCI devices with function 0 (08:00.0, 0e:00.0)
$ mstfwreset -d 08:00.0 reset -y &
$ mstfwreset -d 0e:00.0 reset -y &
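The socket-direct procedure above can be scripted so that the function-0 devices are picked out of the `lspci` listing automatically. This is a sketch under assumptions: `function0_bdfs` and `reset_socket_direct` are hypothetical helper names, and the script assumes, as in the example, that every device listed for vendor ID 15b3 belongs to the same NIC. Unlike the manual example, it also waits for the background resets to finish.

```shell
# Hypothetical helper: print the BDF of every function-0 entry found in
# lspci-style text read from stdin (first column ending in ".0").
function0_bdfs() {
    awk '$1 ~ /^[0-9a-fA-F:]+\.0$/ { print $1 }'
}

# Hypothetical helper: start one reset per function-0 device in the
# background, then wait for all of them.
# Assumption: all devices reported by "lspci -d 15b3:" are part of the
# same socket-direct NIC.
reset_socket_direct() {
    local bdf pid pids="" rc=0

    for bdf in $(lspci -d 15b3: | function0_bdfs); do
        mstfwreset -d "$bdf" reset -y &
        pids="$pids $!"
    done

    # Report failure if any of the background resets failed.
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}
```

Defining the functions has no side effects; the resets start only when `reset_socket_direct` is called. On a machine with more than one Mellanox adapter, the device list would need to be narrowed by hand first.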

To reset the firmware on a SmartNIC, run the tool simultaneously on the host and on the NIC's integrated Arm processor.

Warning

Firmware reset will trigger the adapter card’s reset which will reboot the Arm processor.

Running mstfwreset on a switch device is done in the same form as running mstfwreset on a NIC. The only difference is that the level, type and sync parameters do not apply.

To query the device reset level after a firmware update, use the following command line:

# mstfwreset -d 41:00.0 query

Supported reset levels for loading firmware on device, 41:00.0

Example:

Reset-levels:
0: Driver, PCI link, network link will remain up ("live-Patch") -Not Supported
3: Driver restart and PCI reset -Supported (default)
4: Warm Reboot -Supported
5: Cold Reboot -Supported

Reset-types (relevant only for reset-levels 3,4):
0: Full chip reset -Supported (default)
1: Phy-less reset ("port-alive" - network link will remain up) -Not Supported

Reset-sync (relevant only for reset-level 3):
0: Tool is the owner -Supported (default)
1: Driver is the owner -Supported

In the new mlxfwreset for BF2 and BF3, sync1 is the default. In order to switch to sync0, provide --sync 0 manually.
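When the query result feeds a script, the supported reset levels can be extracted from the text. The sketch below is not part of the tool: `supported_reset_levels` is a hypothetical helper name, and it assumes the tool prints one entry per line with the `-Supported` / `-Not Supported` suffixes shown in the example, a format that may differ between releases.

```shell
# Hypothetical helper: read query output on stdin and print the number of
# each reset level marked "-Supported", one per line.
# Assumption: one entry per line, with the section titles and suffixes
# used in the example output in this document.
supported_reset_levels() {
    awk '
        /^[ \t]*Reset-levels:/      { in_levels = 1; next }
        /^[ \t]*Reset-(types|sync)/ { in_levels = 0 }
        in_levels && /-Supported/   { sub(/:.*/, "", $1); print $1 }
    '
}
```

A possible use is `mstfwreset -d 41:00.0 query | supported_reset_levels`, which lists only the levels the device currently accepts. Lines ending in `-Not Supported` are skipped because they do not contain the literal text `-Supported`.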

To reset the device in order to load new firmware, use the following command line:

# mstfwreset -d 41:00.0 reset

Example

3: Driver restart and PCI reset
Continue with reset?[y/N] y
-I- Stopping Driver -Done
-I- Sending Reset Command To Fw -Done
-I- Resetting PCI -Done
-I- Starting Driver -Done
-I- Restarting MST -Done
-I- FW was loaded successfully.

Warning
  • When the reset command is run without specifying a reset level, the minimal reset level is performed.

  • When the reset command is run without specifying a reset type, the default reset type is 0 (Full chip reset).

To reset a device with a specific reset level to load new firmware, use the following command line:

# mstfwreset -d 41:00.0 -l 4 reset

Example

Requested reset level for device, 41:00.0:
4: Warm Reboot
Continue with reset?[y/N] y
-I- Sending reboot command to machine

The following are the limitations of mstfwreset:

  • Executing a reset level that is lower than the minimal level (as shown in query command) will yield an error

  • When burning firmware with mstflint, the following message is displayed at the end of the burn:
    -I- To load new FW run mstfwreset or reboot machine.
    If this message is not displayed, a reboot is required to load the new firmware.

  • With old firmware, after a successful reset, attempting to query or reset again will yield an error, because the command to load the new firmware was already sent to the firmware.

  • If mstfwreset exits with an error after the “Stopping driver” step and before the “Starting driver” step, the driver remains down. In this case, the user must start the driver manually.

  • The new mstfwreset sync capability (--sync) is available only if supported by the firmware and all the drivers on all the hosts. To check if this is supported, run the "query" command.

  • mstfwreset for switch devices does not work over InfiniBand.
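The driver-related limitation above can be detected from a saved copy of the tool's output. The following is a rough sketch, not part of the tool: `classify_reset_log` is a hypothetical helper name, and it assumes that the step messages match the reset output shown earlier ("Stopping Driver", "Starting Driver", "FW was loaded successfully") and that a step which never ran leaves no line in the log.

```shell
# Hypothetical helper: read a saved mstfwreset log on stdin and print one of
#   loaded              - the success message is present
#   driver-may-be-down  - the driver was stopped but no start step was logged
#   unknown             - neither pattern matched
# Assumption: message texts match the example output in this document.
classify_reset_log() {
    local log
    log=$(cat)

    if printf '%s\n' "$log" | grep -qF -e 'FW was loaded successfully'; then
        echo loaded
    elif printf '%s\n' "$log" | grep -qF -e 'Stopping Driver' &&
         ! printf '%s\n' "$log" | grep -qF -e 'Starting Driver'; then
        echo driver-may-be-down
    else
        echo unknown
    fi
}
```

A result of `driver-may-be-down` is only a hint that the driver should be checked and started manually. A warm reboot (level 4) prints neither message, so it is reported as `unknown`.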

© Copyright 2023, NVIDIA. Last updated on Mar 21, 2024.