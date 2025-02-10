Note This capability is supported only in NVIDIA Quantum switch systems and hosts with NVIDIA ConnectX-6 adapter cards.

The In-Field-Firmware-Update (IFFU) tool works via the switches/NICs in the datacenter and is intended for remote control. It is used to update the firmware of cables and transceivers.

Optical cables and transceivers are active network components that run firmware, and as with any component running firmware, the ability to update that firmware is mandatory. Transceiver firmware update is a system flow which requires the following elements:

Tool/Manager which will perform the firmware update

Switch/NIC firmware management used as a middleman between the Manager and the cable transceiver

Transceiver firmware: target for upgrade

The figure below shows the tool/manager which runs on a remotely controlled host or (in case of managed switches) on a switch, shown as ‘Device’.

The manager can query the transceiver type and the currently running firmware to determine whether an update is required. When an update is required, the manager can apply a set of commands that send the remote host device new firmware images for the specific transceiver(s) and activate a firmware update flow. The set of commands is defined with low-level primitives to give the user full flexibility. A high-level script can be applied on top of the manager to allow a system-wide update.

Note: Modules/AOCs connected to switches are updated over InfiniBand (inband) PRM registers, whereas modules connected to NICs are updated over MCC (RegAccess) on the host. Because the switch path is inband, unmanaged switches such as the QM8790 also support IFFU. Each device (NIC or switch) can update only the modules connected directly to it, not those at the far end. Updating the far-end transceiver, or the far end of an AOC, requires the same operation to be performed on the far-end switch(es).

The Tool/Manager host must have MST rev. 4.16.00 or later installed.

Remote control from outside the cluster (data center) requires access to the host being used as Tool/Manager. When the cluster has many switches, multiple hosts may be engaged in the upgrade process. The host(s) can be remotely controlled via VNC access.

The IFFU function described below works on one switch. Cluster-wide firmware updating is done by use of a script which initiates the update procedure in multiple switches in parallel by initiating an instance of the flint command for each switch. In large clusters the script can be executed on multiple hosts, each handling a different part of the cluster.
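A cluster-wide wrapper of the kind described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `burn_cluster`, the `FLINT` variable, and the LID values are hypothetical. By default the sketch prints the per-switch commands instead of running them.

```shell
#!/bin/sh
# Sketch: start one flint instance per switch, in parallel.
# FLINT defaults to "echo flint", so the commands are only printed (dry run).
# Set FLINT=flint in the environment to run them for real.
# burn_cluster is a hypothetical helper, not part of the flint tool.
FLINT=${FLINT:-"echo flint"}

burn_cluster() {
    image=$1
    shift
    for lid in "$@"; do
        # One background job per switch LID.
        $FLINT -d "lid-$lid" --linkx --linkx_auto_update \
            --download_transfer -i "$image" b &
    done
    wait   # block until every per-switch job has finished
}

# Example LIDs; replace with the LIDs of the switches in the cluster.
burn_cluster image.bin 2 3 4
```

Because the jobs run concurrently, the order of their output is not deterministic. In a large cluster, the same function could be called on several hosts, each with its own list of LIDs.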

# flint -d <device> --linkx <flags> <commands>

where:

Flags:

<device>: The name of the target switch (one only).

--downstream_device_id_start_index <downstream_device_id_start_index>: The port number of the first LinkX cable/transceiver (minimum port number: 1).

--num_of_downstream_devices <num_of_downstream_devices>: The number of cables/transceivers to burn. They are burnt sequentially.

--linkx_auto_update: Use this flag to burn all supported cables/transceivers connected to the switch.

--download_transfer: Use this flag to download and transfer all cable data to the cables. Download and transfer are not performed by default. This flag is relevant only for cable components.

--activate: Use this flag to activate the new firmware in the updated devices. Activation is not performed by default.

--activate_delay_sec <timeout in seconds>: Use this flag to activate all cable devices connected to the host after a delay. Acceptable values are between 0 and 255 (default: 1, immediately). Important: the --activate flag must also be set. This flag is relevant only for cable components.

-i <image>: 'i' indicates 'binary image', followed by the path and file name of the bin file to download into the cable/transceiver.

--downstream_device_ids <list of ports>: Use this flag to specify the LinkX ports to query. The list must contain only comma-separated numbers, without spaces.

Commands:

b[urn]: Burn flash.

q[uery]: Query misc. flash/firmware characteristics.
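The limits on --activate_delay_sec can be checked before a command is issued. The sketch below is only an illustration: `activate_cmd` is a hypothetical helper, not part of flint, and it prints the command line rather than running it. It always includes --activate, since the delay flag requires it.

```shell
#!/bin/sh
# Hypothetical helper: print a flint activation command for one device,
# enforcing the documented 0..255 range of --activate_delay_sec.
activate_cmd() {
    device=$1
    delay=$2
    case $delay in
        ''|*[!0-9]*)
            echo "delay must be a whole number" >&2
            return 1 ;;
    esac
    if [ "$delay" -gt 255 ]; then
        echo "delay must be between 0 and 255" >&2
        return 1
    fi
    echo "flint -d $device --linkx --linkx_auto_update --activate --activate_delay_sec $delay b"
}

activate_cmd lid-2 10
```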

Burning the firmware of a cable transceiver connected to the host (NIC or switch) is done using the "flint" tool with the "--linkx" flag.

Firmware can be burnt using one of the following methods:

Burn with Auto-update:

Transfer the data from the host.

# flint -d <device> --linkx --linkx_auto_update --download_transfer -i <image> b

Example:

# flint -d lid-2 --linkx --linkx_auto_update --download_transfer -i image.bin b

Activate the firmware.

# flint -d <device> --linkx --linkx_auto_update --activate b

Note: The behavior of the flint "--activate" flag has changed to include a minimal delay of 1 second, to avoid disconnections if the connected port is being activated. To use the "legacy" activation flow, use "--activate_delay_sec 0".

Example:

# flint -d lid-2 --linkx --linkx_auto_update --activate b

Activate with delay. Example:

# flint -d lid-2 --linkx --linkx_auto_update --activate --activate_delay_sec 10 b

Transfer and activate. Example:

# flint -d lid-2 --linkx --linkx_auto_update --download_transfer --activate -i image.bin b

Warning: Burning all cables in an unmanaged switch in one operation is risky. If the cables do not link up after the update, the connection to the switch is lost permanently. Burn half of the cables, check that they come up after burning, then burn the other half.
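The two-step approach recommended in the warning can be expressed with the range flags listed under Flags above. The following is a minimal sketch, assuming the cables occupy consecutive ports 1 to N. `half_burn_cmds` is a hypothetical helper that only prints the two commands. The second command should be run only after the links of the first half have been checked.

```shell
#!/bin/sh
# Hypothetical helper: print two range-burn commands that together cover
# ports 1..N of one switch, first half then second half.
# Assumes cables are plugged into consecutive ports starting at port 1.
half_burn_cmds() {
    device=$1
    ports=$2
    image=$3
    first=$((ports / 2))          # size of the first half
    second=$((ports - first))     # remainder (one larger if N is odd)
    echo "flint -d $device --linkx --downstream_device_id_start_index 1 --num_of_downstream_devices $first --download_transfer --activate -i $image b"
    echo "flint -d $device --linkx --downstream_device_id_start_index $((first + 1)) --num_of_downstream_devices $second --download_transfer --activate -i $image b"
}

half_burn_cmds lid-2 40 image.bin
```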

Burning multiple cables in the switch using the 'Range':

Transfer the data from the host.

# flint -d <device> --linkx --downstream_device_id_start_index <index> --num_of_downstream_devices <number> --download_transfer -i <image> b

Activate the firmware.

# flint -d <device> --linkx --downstream_device_id_start_index <index> --num_of_downstream_devices <number> --activate b

Example of download and transfer with activation, for ports 10 to 15:

# flint -d lid-2 --linkx --downstream_device_id_start_index 10 --num_of_downstream_devices 6 --download_transfer --activate -i image.bin b

This updates 6 AOCs/transceivers starting from port 10, i.e. all ports in the range 10…15.

Note: You cannot 'overburn' a transceiver/AOC with the same firmware version as the one already installed. This prevents wasting time re-burning transceivers in a large cluster. If you try to burn the existing FW version, the command responds:

Cable burn failed, error is LinkX downstream transfer failed for device index i

Example of a successful update of 1 AOC:

-I- Downloading FW ...
FSMST_INITIALIZE - OK
Writing COMPID_LINKX component - OK
FSMST_LOCKED - OK
FSMST_DOWNSTREAM_DEVICE_TRANSFER - OK
FSMST_LOCKED - OK
Please wait while activating the transceiver(s) FW ...
FSMST_ACTIVATE - OK..]
-I- Cable burn finished successfully.

Note: Downloading and burning takes approx. 1½ minutes, plus ½ minute for activation, for one cable. The time for multiple cables depends on which ports they are plugged into.
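When many switches are updated by a script, the output of each burn may need to be checked automatically. The sketch below assumes that the success line has exactly the wording shown in the example output above. `burn_succeeded` is a hypothetical helper that reads captured flint output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the text on stdin contains the
# success line from the example output. Assumes that exact wording.
burn_succeeded() {
    grep -F -q -e '-I- Cable burn finished successfully.'
}

# Sample lines taken from the example output in this document.
sample='-I- Downloading FW ...
FSMST_INITIALIZE - OK
-I- Cable burn finished successfully.'

if printf '%s\n' "$sample" | burn_succeeded; then
    echo "burn OK"
else
    echo "burn FAILED"
fi
```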

Cable Burn Command Running CMIS Firmware Upgrade Flow for Supported Cables

The flint tool is able to burn firmware packages on CMIS compliant cables that support the CDB firmware update procedure.

# flint --device <mst cable device> [--image <image>] [flags] burn

Where:

--module_password: Optional. Module password to enable locked operations.

--module_vendor_data: Optional. Path to the vendor data file, in case it is not part of the firmware image file.

--activate: Optional. Run and commit the burned image. Use without the "--image" flag to try to perform the run and commit commands if possible.

Querying a cable image for firmware version is done using the "flint" tool.

# flint -i <fw file> q





Querying the firmware of a cable transceiver is done using the "flint" tool.

Note: If the vendor-specific query command is not supported by the firmware, flint runs the CMIS standard query implemented by the firmware.

# flint -d <cable device> q





Querying the firmware of a cable transceiver connected to the host (NIC or switch) is done using the "flint" tool with the "--linkx" flag.

# flint -d <device> --linkx --downstream_device_ids <ids> [--output_file <file_name>] q

Example: query ports 1, 2, and 5:

# flint -d <device> --linkx --downstream_device_ids 1,2,5 q
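Because the port list must be comma-separated with no spaces, a script may prefer to build it from separate arguments. The following is a minimal sketch. `join_ports` is a hypothetical helper, and the final line only prints the resulting query command.

```shell
#!/bin/sh
# Hypothetical helper: join port numbers into the comma-separated,
# space-free list expected by --downstream_device_ids.
join_ports() {
    list=
    for p in "$@"; do
        case $p in
            ''|*[!0-9]*)
                echo "not a port number: $p" >&2
                return 1 ;;
        esac
        # Prepend "<list>," only when the list is already non-empty.
        list=${list:+$list,}$p
    done
    echo "$list"
}

echo "flint -d <device> --linkx --downstream_device_ids $(join_ports 1 2 5) q"
```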

The system responds with information about the firmware version loaded into the transceivers.

The firmware version of all cables plugged into ports 1…40 of a switch with lid #nn can alternatively be checked with the mlxlink command:

# for i in {1..40}; do echo $i; mlxlink -d lid-nn -p $i -m | grep 'Part\|FW' ; done

Checking successful burning and operation - Example:

It is essential to check that the links come up AFTER the cable FW is updated and reactivated. This can be done as follows:

# for i in {36..40}; do echo $i; mlxlink -d lid-nn -p $i -m | grep 'Part\|FW\|State' ; done

The ‘State’ parameter was added to the query. The response has the following format (example):

36
State : Active
Vendor Part Number : MFS1S00-H010
FW Version : 38.100.59
37
State : Active
Vendor Part Number : MFS1S00-H010
FW Version : 38.100.59
38
State : Active
Vendor Part Number : MFS1S00-H010
FW Version : 38.100.59
39
State : Active
Vendor Part Number : MFS1S00-H010
FW Version : 38.100.59
40
State : Active
Vendor Part Number : MFS1S00-H010
FW Version : 38.100.59
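For more than a handful of ports, the response can be scanned automatically for links that did not come up. The sketch below assumes the output layout of the example above: a line holding the port number, followed by a "State : ..." line. `inactive_ports` is a hypothetical helper that reads that output on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read the loop output on stdin and print every port
# whose State is not "Active". Exit status is 1 if any such port is found.
inactive_ports() {
    awk '
        BEGIN { bad = 0 }
        # A line holding only a number is the port printed by the loop.
        /^[ \t]*[0-9]+[ \t]*$/ { port = $1; next }
        /^[ \t]*State[ \t]*:/ {
            state = $0
            sub(/^[ \t]*State[ \t]*:[ \t]*/, "", state)   # drop the label
            sub(/[ \t]+$/, "", state)                     # drop trailing blanks
            if (state != "Active") { print port; bad = 1 }
        }
        END { exit bad }
    '
}

# Sample lines in the format of the example response.
printf '%s\n' 36 'State : Active' 'Vendor Part Number : MFS1S00-H010' \
    | inactive_ports && echo "all links active"
```

In practice, the output of the mlxlink loop would be piped into `inactive_ports` in place of the sample lines.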



