NVIDIA ConnectX-5 Adapter Cards Firmware Release Notes v16.34.1002
NVIDIA ConnectX-5 Adapter Cards Firmware Release Notes v16.35.3502 LTS

Bug Fixes History

Warning

This section includes the bug fix history of the last 3 major releases. For the history of older releases, please refer to the Release Notes of the relevant firmware versions at https://docs.mellanox.com/category/adapterfw.

Internal Ref.

Issue

2785026

Description: Fixed a rare case that caused the QP not to receive a completion.

Keywords: QP

Discovered in Version: 16.32.1010

Fixed in Release: 16.33.1048

2513453

Description: Fixed a rare lane skew issue that caused the CPU to time out in Rec.idle.

Keywords: PCIe

Discovered in Version: 16.32.1010

Fixed in Release: 16.33.1048

2961149

Description: Fixed an issue that caused the card to mask some PCIe AER reporting.

Keywords: AER

Discovered in Version: 16.32.1010

Fixed in Release: 16.33.1048

2860816

Description: Fixed an incorrect credit blockage flow that prevented booting during the DC cycle test.

Keywords: DC cycle test

Discovered in Version: 16.32.1010

Fixed in Release: 16.33.1048

2882943

Description: Fixed an issue with BMC medium migration from SMBUS to PCIe, and increased FIFOs to pass large packets in case of the migration.

Keywords: BMC medium migration

Discovered in Version: 16.32.1010

Fixed in Release: 16.33.1048

2860409

Description: Enabled delay drop for hairpin packets. If a hairpin QP is created with delay_drop_en enabled, the feature will be enabled across all GVMIs, based on the delay drop status.

Keywords: Hairpin delay drop

Discovered in Version: 16.32.1010

Fixed in Release: 16.33.1048

Internal Ref.

Issue

2796324

Description: Fixed an issue that resulted in the firmware getting stuck and causing unexpected behavior when connecting an optical transceiver that supports RXLOS while the remote side port was down.

Keywords: cables, RXLOS

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2748800

Description: Fixed an issue that caused the link status to be reported incorrectly and consequently caused the link to go down due to the wrong definition of the RX_LOS polarity in the INI.

Keywords: RX_LOS polarity

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2843888

Description: Fixed a rare case where the system got stuck when a peer port went down while using an optical module.

Keywords: Cables

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2678394

Description: Limited the external loopback speed to the used module's capabilities.

Keywords: Cables

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2771990

Description: Improved linkup time when using the fast linkup capability.

Keywords: Linkup time

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2771407

Description: Disabled VST on dual-port adapter cards when one port is configured as ETH and the other as IB, because VST is not available when the port is set as ETH.

Keywords: VST

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2823281

Description: Fixed an issue that resulted in wrong RNR timeout when trying to set it during the rts2rts_qp transition.

Keywords: RNR timeout

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2826498

Description: Fixed a fatal assert 0x81C5 that occurred when calling get_vport_mad from the MAD APIs.

The firmware was trying to compute the number of vPorts using a global function number. To avoid this issue, we updated the API to remove any assumption on the function number.

Note: This issue affects only IB devices.

Keywords: MAD APIs

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2798627

Description: Added support for DSFP AOC (CMIS) v4 when an error code is not reported by the module.

Keywords: Cables

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2751853

Description: Fixed an issue that, during event stress, caused the firmware to reset the Arm host of the vPort without sending an event.

This prevented the software from rearming the vPort, because it did not receive any event, and the firmware did not send the event because the vPort had no Arm set.

Keywords: Arm, vPort, event notification

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2784304

Description: Fixed an issue that prevented the system from creating more than 128K QPs.

Keywords: QP

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2748449

Description: Altered the GetInventory NC-SI command so that it does not report a leading 0xf in the firmware version when the version starts with 0.

Keywords: NC-SI, GetInventory, leading 0, FW version

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2684634

Description: Fixed PCIe lane margining capability issues.

Keywords: PCIe lane margining

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2716208

Description: Fixed an issue related to the sl2vl MAD that caused a hiccup of a few milliseconds in transmission on an InfiniBand network when the SM sent the sl2vl MAD to a node in the cluster.

Keywords: Sl2vl change, traffic, transmission, cluster

Discovered in Version: 16.31.1014

Fixed in Release: 16.32.1010

2450264

Description: Fixed an issue that caused TX PRBS not to change after reconfiguring it. All PRBS modes are now enabled in test mode.

Keywords: PRBS

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014

2603793

Description: Fixed an assert that was caused when trying to open 1024 functions on the device. The maximum number of functions is 1023.

Keywords: Max GVMI, sub-functions

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014

2648336

Description: Disabled the CNP counter "rp_cnp_ignored" (triggered by OOS (out-of-sequence)) when all ports are IB.

Note: In a mixed IB/ETH scenario, the behavior depends on the RoCE configuration. The counter on the IB port may still increase, but this does not affect regular use.

Keywords: CNP counter, IB

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014

2667272

Description: Fixed the TMP421 sensor temperature reporting.

Keywords: Sensor temperature

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014

2641734

Description: Fixed the rate select mechanism in QSFP modules.

Keywords: Cables

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014

2600783

Description: Fixed classification issues for "Passive" cables to make the classification more robust.

Keywords: Cables

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014

2574322

Description: Fixed an issue that occasionally caused some performance issues related to RC QPs using E2E-credits (not connected to SRQ and doing send/receive traffic) when the ROCE_ACCL tx_window was enabled.

Keywords: Bandwidth, performance

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014

2391109

Description: Fixed an issue that caused a fatal error, and eventually resulted in the HCA hanging when a packet was larger than a strided receive WQE that was being scattered.

Keywords: Strided RQ, MTU

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014

2569999

Description: Fixed a rare issue that caused RX pipe to hang.

Keywords: RX pipe

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014

2621704

Description: Fixed the handling of the resource number size (a 64-bit number) to avoid a scenario where the resource number was overwritten with a 32-bit number, which erased the high bits, when de-allocating the resource number.

In this scenario, when two resource numbers had identical low 32 bits, clearing the high bits resulted in the same idx for both. Consequently, the same idx was freed twice.

Keywords: Resource number size, free_4k page

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014
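
The truncation described in the entry above (2621704) can be illustrated with a minimal sketch. This is not the firmware code: the function names, the sample resource numbers, and the use of a set to track freed indices are all hypothetical and chosen only to show how dropping the high 32 bits can make two distinct 64-bit values map to the same idx.

```python
MASK_32 = 0xFFFFFFFF
MASK_64 = 0xFFFFFFFFFFFFFFFF


def index_truncated(resource_number):
    """Hypothetical buggy variant: keeps only the low 32 bits."""
    return resource_number & MASK_32


def index_full(resource_number):
    """Hypothetical fixed variant: keeps all 64 bits."""
    return resource_number & MASK_64


def find_double_frees(resource_numbers, to_index):
    """Return the indices that would be freed more than once."""
    freed = set()
    double_freed = []
    for number in resource_numbers:
        idx = to_index(number)
        if idx in freed:
            double_freed.append(idx)
        freed.add(idx)
    return double_freed


# Two distinct 64-bit resource numbers with identical low 32 bits
# (illustrative values, not taken from the firmware).
res_a = 0x0000000100001000
res_b = 0x0000000200001000

buggy_double_frees = find_double_frees([res_a, res_b], index_truncated)
fixed_double_frees = find_double_frees([res_a, res_b], index_full)
```

With the truncating variant, both resource numbers collapse to the same idx, so the second de-allocation frees an idx that was already freed. With the full 64-bit variant, the two indices stay distinct.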

2619161

Description: Initialized the rate table in the static configuration so that it is also configured in link-not-up scenarios.

Keywords: RoCE, static configuration, rate table

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014

2589430

Description: Fixed an issue where a CRT_DCR with an index larger than 1 << 21 could collide with the CRT_SW_RESERVED address.

Keywords: DCR

Discovered in Version: 16.30.1004

Fixed in Release: 16.31.1014

2684071

Description: Fixed an issue where changing the default host chaining buffer size or WQE size (HOST_CHAINING_DESCRIPTORS, HOST_CHAINING_TOTAL_BUFFER_SIZE) using NVconfig could result in a driver initialization failure.

Keywords: Host chaining

Discovered in Version: 16.29.2002

Fixed in Release: 16.31.1014

2565218

Description: Fixed an issue that caused the TX queue to hang when the VF rate limiter was set and the VF was loaded as NODNIC.

Keywords: NODNIC

Discovered in Version: 16.27.2008

Fixed in Release: 16.31.1014

2799269

Description: Fixed an issue where Tunneled Atomics were not functional when using UMR.

Keywords: UMR, Tunneled Atomic

Discovered in Version: 16.29.1016

Fixed in Release: 16.30.1004

2507096

Description: Removed the option to create unnecessary internal CNP operation for the Lossy ADP retransmission feature.

Keywords: RoCE, Lossy, Adp_retrans

Discovered in Version: 16.29.1016

Fixed in Release: 16.30.1004

2444837

Description: Set the cap to 0 for high index functions to avoid too many parallel VF NODNIC functions.

Keywords: NODNIC, VF, ETH PXE

Discovered in Version: 16.29.1016

Fixed in Release: 16.30.1004

2455041

Description: Fixed an issue that prevented the PF from sending out packets. A new trigger (every ~1 sec) was added that makes the VQoS algorithm run a full iteration over the entire VQoS tree.

Keywords: PF, packets, VQoS

Discovered in Version: 16.29.1016

Fixed in Release: 16.30.1004
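
When reading the "Fixed in Release" fields above against an installed firmware version, a small helper can do the comparison. The sketch below is only an illustration: the function names are hypothetical, and it assumes that firmware versions of the form major.minor.subminor can be compared numerically field by field, which these release notes do not state explicitly. It does not query the adapter; the installed version string must be obtained separately.

```python
def parse_fw_version(version):
    """Split a version string such as '16.33.1048' into a tuple of ints."""
    major, minor, subminor = (int(part) for part in version.strip().split("."))
    return (major, minor, subminor)


def includes_fix(installed, fixed_in):
    """Return True if the installed version is at or above the fixing release.

    Assumption: a fix present in release X is also present in every
    numerically later release.
    """
    return parse_fw_version(installed) >= parse_fw_version(fixed_in)


# Example: checking an installed version against entry 2785026,
# which lists "Fixed in Release: 16.33.1048".
has_qp_completion_fix = includes_fix("16.35.3502", "16.33.1048")
older_fw_has_fix = includes_fix("16.32.1010", "16.33.1048")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes when a field has a different number of digits, for example 16.9.x versus 16.10.x.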

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.