Troubleshooting

You may be able to resolve the issues described in this section yourself. If a problem persists and you are unable to resolve it, please contact your NVIDIA representative or Support.

Issue

Cause

Solution

Adapter is no longer identified by the operating system after firmware upgrade

Caused by burning the wrong firmware on the adapter, firmware corruption, or an adapter hardware failure.

Power cycle the server. If the issue persists, extract the adapter and contact Support.

Server boots in a loop or does not complete boot after an adapter firmware upgrade

Caused by burning the wrong firmware on the adapter, firmware corruption, or an adapter hardware failure.

Extract the adapter and contact Support.

Some 5th generation (Group II) devices are represented by only one mst device (/dev/mst/mt4113_pciconfx) in the output of mst status

For 5th generation (Group II) devices, there is only one method available for accessing the hardware. For example, a Connect-IB device is represented by the /dev/mst/mt4113_pciconfx mst device.

When querying a 5th generation (Group II) device, use the conf mst device (for example: /dev/mst/mt4113_pciconfx).

Enabling hardware access fails after configuring a new secure host key

The new configuration of the secure host key was not loaded by the driver

Restart the driver before enabling the hardware access again

MFT tools fail on PCI device with the following errors:

  • Operation not permitted

  • Failed to identify device

  • Failed to detect device ID

  • Unknown device

  • No such device

  • Failed to open device

The tools' PCI semaphore might be locked because a process shut down unexpectedly.

Run the following command:

# mcra -c <mst_pci_device>

*Supported on MFT-4.4.0 and newer versions.
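As a rough illustration of the row above, a wrapper script could check tool output for the listed error strings and suggest the semaphore-clearing command. This is only a sketch: the function name `suggest_semaphore_clear` and the substring matching are assumptions made for illustration and are not part of MFT. The sketch builds the command as an argument list and never executes it.

```python
# Error fragments copied from the Issue column above.
PCI_SEMAPHORE_ERRORS = (
    "Operation not permitted",
    "Failed to identify device",
    "Failed to detect device ID",
    "Unknown device",
    "No such device",
    "Failed to open device",
)


def suggest_semaphore_clear(tool_output, mst_pci_device):
    """Return the 'mcra -c <device>' argument list if the output contains
    one of the listed errors, otherwise None.

    Hypothetical helper: it only formats the command and never runs it.
    """
    lowered = tool_output.lower()
    if any(fragment.lower() in lowered for fragment in PCI_SEMAPHORE_ERRORS):
        return ["mcra", "-c", mst_pci_device]
    return None


if __name__ == "__main__":
    cmd = suggest_semaphore_clear("-E- Failed to open device", "<mst_pci_device>")
    print(" ".join(cmd) if cmd else "no listed error found")
```

These error strings can have other causes as well (the first table row lists some), so a match is a hint to try the command rather than a diagnosis.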

Issue

Cause

Solution

Server not booting after enabling SRIOV with high number of VFs

Setting number of VFs larger than what the Hardware and Software can support may cause the system to cease working

To solve this issue:

  1. Disable SRIOV in the BIOS

  2. Reboot the server

  3. Change the number of VFs

  4. Enable SRIOV in the BIOS

When querying the current configuration on ConnectX-3/ConnectX-3 Pro, some of the parameters are shown as “N/A”

The current firmware on the device does not support showing the device's default configuration

Update to the latest firmware

After resetting configuration using the tool on 5th generation (Group II) devices, the configuration's value does not change

Firmware loads the default configuration only upon reboot

Reboot the server

Issue

Cause

Solution

Unable to install the tool package on ESXi platform and the following message is printed on the screen:

Got no data from process

Insufficient privileges

  1. Copy the tool's package to /tmp/vmware and continue with the installation. If the issue persists, reboot the ESX server and try again

  2. Use full file path of the tool's package

Note: an additional reboot will be required after completing the installation

Unable to install kernel-mft in Linux due to compilation error that contains the following message:

'error: conflicting types for 'compat_sigset_t''

CONFIG_COMPAT might not be enabled in the kernel configuration.

Set CONFIG_COMPAT to “y” in the kernel .config file, and rebuild the kernel.
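Before rebuilding, it can help to confirm whether CONFIG_COMPAT is actually disabled. The sketch below checks the text of a kernel configuration for the `CONFIG_COMPAT=y` line. Where that text comes from (for example a `.config` file in the kernel source tree) depends on the system and is an assumption left to the caller; the helper name `config_compat_enabled` is hypothetical.

```python
def config_compat_enabled(config_text):
    """Return True if the kernel config text contains the line CONFIG_COMPAT=y.

    A disabled option normally appears as '# CONFIG_COMPAT is not set'
    or is absent, so only an exact 'CONFIG_COMPAT=y' line counts as enabled.
    """
    for line in config_text.splitlines():
        if line.strip() == "CONFIG_COMPAT=y":
            return True
    return False


if __name__ == "__main__":
    # Sample config text used for illustration only.
    sample = "CONFIG_64BIT=y\n# CONFIG_COMPAT is not set\n"
    print(config_compat_enabled(sample))  # prints False
```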

Issue

Cause

Solution

The following message is printed on screen when performing firmware update:

An update is needed for the flash layout.

The operation is not failsafe and terminating the process is not allowed.

A flash alignment operation is required.

Approve the alignment and do not interrupt the process.

Firmware update fails with the following message:

-E- Burning FS4 image failed: Bad parameter

Note: This is a rare scenario.

Firmware compatibility issue.

Re-run the burn command with --no_fw_ctrl flag.

The following message is printed on screen when performing firmware update:

Shifting between different image partition sizes requires current image to be re-programmed on the flash.

Once the operation is done, reload FW and run the command again

Note: This is a rare scenario.

Firmware compatibility issue.

Re-load firmware and re-run the burn command.

The following message is printed on screen when trying to query/burn a Connect-IB device:

-E- Cannot open Device: /dev/mst/mt4113_pciconf0. B14 Operation not permitted MFE_CMDIF_GO_BIT_BUSY

Using an outdated firmware version with the Connect-IB adapter.

  1. Unload MLNX_OFED driver: /etc/init.d/openibd stop.

  2. Add “-ocr” option to the 'flint' command.

For example:

flint -d /dev/mst/mt4113_pciconf0 -ocr q

The following message is reported on screen when trying to remove the expansion ROM using the 'drom' option:

-E- Remove ROM failed: The device FW contains common FW/ROM Product Version - The ROM cannot be removed separately.B9

Updating only the EXP_ROM (FlexBoot) on recent firmware images requires adding the 'allow_rom_change' option.

Add the “-allow_rom_change” option to the “flint” command.

For example:

flint -d <mst_device> -allow_rom_change drom

Burning command fails and the following message is printed on screen:

-E- Can not open 06:00.0: Can not obtain Flash semaphore (63). You can run "flint -clear_semaphore -d <device>" to force semaphore unlock. See help for details.

Semaphore can be locked for any of the following reasons:

  • Another process is burning the firmware at the same time

  • Failure in the firmware boot

  • Burning process was forcefully killed

  • In a Multi-Host environment, another Host is currently burning the firmware

If no other process is running at the same time, run the following command: flint -d <device> --clear_semaphore

OR

Reboot the machine.
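The flash semaphore error above carries the device address and a numeric code, which a script can extract before deciding what to do. The following sketch parses the message and builds the unlock command in the form the error text itself suggests. The regular expression and both function names are assumptions for illustration; the sketch does not run flint and does not check whether another burn is in progress, which must be ruled out first.

```python
import re

# Pattern modelled on the error text quoted above; an assumption, since
# the exact wording may differ between MFT versions.
SEMAPHORE_RE = re.compile(
    r"Can not open (\S+): Can not obtain Flash semaphore \((\d+)\)"
)


def parse_semaphore_error(message):
    """Return (device, code) from a flash-semaphore error, or None."""
    match = SEMAPHORE_RE.search(message)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def clear_semaphore_command(device):
    """Build the unlock command as an argument list, without running it."""
    return ["flint", "-clear_semaphore", "-d", device]


if __name__ == "__main__":
    msg = "-E- Can not open 06:00.0: Can not obtain Flash semaphore (63)."
    parsed = parse_semaphore_error(msg)
    if parsed is not None:
        device, code = parsed
        print(device, code)
        print(" ".join(clear_semaphore_command(device)))
```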

Burning tool fails with the following message:

-E- Unsupported binary version (2.0) please update to latest MFT package.

The binary version is incompatible with the burning tool.

Update MFT to the latest package.

mlxburn tool fails to generate a firmware image and displays the following message:

-E- Unsupported MLX file version (2.0) please update to latest MFT package.

The MLX file version is incompatible with the image generation tool (mlxburn).

Update MFT to the latest package.

mlxburn tool fails to generate a firmware image and displays the following message:

-E- Perl Error: Image generation tool uses mic (tool) version 1.5.0 that is not supported for creating a bin file for this FW version. FW requires mic version 2.0.0 or above. Please update MFT package.

The MLX file version is incompatible with the image generation tool (mlxburn).

Update MFT to the latest package.

Burning tool fails with an error mentioning firmware time stamping, for example:

-E- Burning FS3 image failed: Stamped FW version missmatch: 12.16.0212 differs from 12.16.0230

The device was set with a timestamp for a different firmware version than the one being burnt or the image is stamped with an older timestamp

Either set a newer timestamp on the image than there is on the device, or reset the timestamp completely:

flint -d <device> ts reset

flint -i <image> ts reset
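To see which two firmware versions are in conflict, the mismatch message can be parsed into comparable tuples. The sketch below is illustrative only: the regular expression follows the example message quoted above (including the tool's own spelling "missmatch"), and the helper names are hypothetical. The source does not say which of the two versions belongs to the device and which to the image, so the sketch returns them in message order without labelling them. The reset commands are built as argument lists and not executed.

```python
import re

# Pattern based on the example error text; the wording is assumed to be stable.
MISMATCH_RE = re.compile(
    r"Stamped FW version missmatch: (\d+(?:\.\d+)+) differs from (\d+(?:\.\d+)+)"
)


def parse_stamp_mismatch(message):
    """Return the two versions from the message as tuples of ints, or None."""
    match = MISMATCH_RE.search(message)
    if match is None:
        return None
    first, second = match.groups()
    return (
        tuple(int(part) for part in first.split(".")),
        tuple(int(part) for part in second.split(".")),
    )


def timestamp_reset_commands(device, image):
    """Build the two reset commands from the Solution column."""
    return [
        ["flint", "-d", device, "ts", "reset"],
        ["flint", "-i", image, "ts", "reset"],
    ]


if __name__ == "__main__":
    msg = ("-E- Burning FS3 image failed: Stamped FW version missmatch: "
           "12.16.0212 differs from 12.16.0230")
    print(parse_stamp_mismatch(msg))  # prints ((12, 16, 212), (12, 16, 230))
```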

Burning the image on Controlled FW (default update method: fw_ctrl in 'flint -d <device> query full' output), fails with:

-E- Burning FS3 image failed: The Digest in the signature is wrong.

The image was changed without calculating the new digest on it with 'flint -i <img.bin> sign'.

Run 'flint -i <img.bin> sign', and retry.

Issue

Cause

Solution

Changing device settings such as ROM/GUIDs using the relevant flint commands fails with the following error:

-E- <Operation> failed: Unsupported operation under Secure FW

Secure Firmware does not allow changes to the device data except by burning a new Secure Firmware image.

N/A

Burning tool fails with the following error:

-E- Burning FS3 image failed: The component is not signed.

The image is not signed with an RSA authentication.

Contact Support to receive a signed firmware image.

Burning tool fails with the following error:

-E- Burning FS3 image failed: Rejected authentication.

The image authentication is rejected.

Contact Support to receive a signed firmware image.

Burning tool fails with the following error:

-E- Burning FS3 image failed: Component is not applicable.

The image does not match the device (Wrong ID).

Contact Support to receive the firmware image for the device.

Burning tool fails with the following error:

-E- Burning FS3 image failed: The FW image is not secured.

The image is not secured and is not accepted by the device.

Contact Support to receive a signed firmware image.

Burning tool fails with the following error:

-E- Burning FS3 image failed: There is no Debug Token installed.

The debug firmware was burnt before the debug token was installed on the device.

Install the debug token using mlxconfig and then re-burn the firmware.

Burning firmware on a secure device fails with one of the following messages:

  • -E- Burning FS3 image failed: Rejected authentication

  • The FW image is not secured

  • The key is not applicable

The image was not secured in the proper way.

Ask for a secure image with the right keys that match the device.

Secure Firmware fails when using flint brom and drom commands.

flint brom and drom commands are not supported.

N/A

mlxdump and wqdump debug utilities do not work in Secure Firmware

A customer support token was not applied.

N/A

When the CR space is in read-only mode, the tracers may behave unexpectedly.

The tracers require write permission to work properly.

N/A

Applying token on the device fails with one of the following messages:

  • Component is not applicable

  • The manufacturing base MAC was not listed

  • Mismatch FW version

  • Mismatch user timestamp

  • Rejected forbidden version

The token was not generated or signed in the proper way.

Refer to the section Create Tokens for Secure Firmware and NV LifeCycle to learn how to generate and sign tokens.

Burning the firmware using the “--use_dev_rom” flag has no effect and the ROM is replaced with the one on the image.

Controlled firmware does not support changing the boot image component.

Use “--no_fw_ctrl”.
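The "Burning FS3 image failed" rows of the table above amount to a lookup from error text to recommended action. A minimal sketch of such a lookup follows. The dictionary name, the function name, and the substring matching are assumptions for illustration; the error fragments and solutions are copied from the rows above.

```python
# Error fragment -> solution, taken from the secure firmware rows above.
SECURE_FW_BURN_ERRORS = {
    "The component is not signed":
        "Contact Support to receive a signed firmware image.",
    "Rejected authentication":
        "Contact Support to receive a signed firmware image.",
    "Component is not applicable":
        "Contact Support to receive the firmware image for the device.",
    "The FW image is not secured":
        "Contact Support to receive a signed firmware image.",
    "There is no Debug Token installed":
        "Install the debug token using mlxconfig and then re-burn the firmware.",
}


def lookup_solution(tool_output):
    """Return the documented solution for the first matching error, or None."""
    for fragment, solution in SECURE_FW_BURN_ERRORS.items():
        if fragment in tool_output:
            return solution
    return None


if __name__ == "__main__":
    msg = "-E- Burning FS3 image failed: There is no Debug Token installed."
    print(lookup_solution(msg))
```

Substring matching is a deliberate simplification: it tolerates the "-E- Burning FS3 image failed:" prefix but would need adjusting if the tool's wording changes between versions.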

© Copyright 2023, NVIDIA. Last updated on Oct 12, 2023.