Bug Fixes History

NVIDIA ConnectX-7 Adapter Cards Firmware Release Notes v28.41.1000
Note

This section covers the bug-fix history of the three most recent major releases. For the history of older releases, refer to the release notes of the relevant firmware versions.

Internal Ref.

Issue

3712016

Description: Fixed an issue that prevented Congestion Control from behaving properly when GRH is used in traffic of an IB cluster.

Keywords: IB Congestion Control, CNP, SL

Discovered in Version: 28.39.1002

Fixed in Release: 28.40.1000

3174038

Description: SPDM requests received while the CPLD burn flow is in progress may be answered with incorrect responses.

Keywords: SPDM

Discovered in Version: 28.34.1002

Fixed in Release: 28.40.1000

3110297

Description: When a ConnectX-7 adapter card is configured to use the Auto-Negotiation mode, a 400G_8x link cannot be raised.

Keywords: 400G_8x, linkup

Discovered in Version: 28.34.4000

Fixed in Release: 28.40.1000

3339818

Description: When performing stress toggling on a ConnectX-7 adapter card that is connected to the MMA1Z00-NS400 cable with the speed set to 100G_1x and interleaved FEC, a long linkup time of up to 5 minutes may occur.

Keywords: Toggling, MMA1Z00-NS400

Discovered in Version: 28.36.1010

Fixed in Release: 28.40.1000

3339919

Description:

  • When raising a link using 200G optical cables between two ConnectX-7 adapter cards, a link width smaller than the maximum provided by the cable is not supported at a lane speed of 25G.

  • When raising a link using 400G optical cables between two ConnectX-7 adapter cards, a link width smaller than the maximum provided by the cable is not supported at a lane speed of 50G or 25G.

Keywords: Link up speed

Discovered in Version: 28.36.1010

Fixed in Release: 28.40.1000

3312483

Description: WoL packets may not work properly if sent to a unicast destination MAC.

Keywords: WoL packets, Unicast destination MAC

Discovered in Version: 28.36.1010

Fixed in Release: 28.40.1000

3275394

Description: When performing a PCIe link secondary-bus-reset, disable/enable, or mlxfwreset on AMD-based Genoa systems, the device takes longer than expected to link up due to a PCIe receiver termination misconfiguration.

Keywords: PCIe

Discovered in Version: 28.37.1014

Fixed in Release: 28.40.1000

3457472

Description: Disabling the Relaxed Ordered (RO) capability (relaxed_ordering_read_pci_enabled=0) using the vhca_resource_manager is currently not functional.

Keywords: Relaxed Ordered

Discovered in Version: 28.37.1014

Fixed in Release: 28.40.1000

3606136

Description: In rare cases, the linkup time of NDR and NDR200 with MMA4Z00-NS400 may exceed 60 seconds.

Keywords: Cables, NDR, NDR200, linkup time

Discovered in Version: 28.39.1002

Fixed in Release: 28.40.1000

3683068

Description: Added back the Digital Feedforward Equalizer (DFFE) hardware component to improve the signal integrity of the link.

Keywords: Digital Feedforward Equalizer (DFFE)

Discovered in Version: 28.38.1002

Fixed in Release: 28.40.1000

3708035

Description: Fixed an issue with Selective-Repeat configuration which occasionally caused retransmission to wait for timeout instead of out-of-sequence NACK.

Keywords: RoCE, SR

Discovered in Version: 28.38.1002

Fixed in Release: 28.40.1000

3695219

Description: Enabled the lowest minimum rate for SW DCQCN so that congestion control can hold a larger number of QPs without pauses or drops.

Keywords: Congestion control, PCC, DCQCN

Discovered in Version: 28.38.1002

Fixed in Release: 28.40.1000

3637429

Description: Fixed an issue that caused the secondary ASIC run module init to fail due to a missing condition.

Keywords: Secondary device, EEPROM

Discovered in Version: 28.38.1002

Fixed in Release: 28.40.1000

3693945

Description: Fixed an issue that kept the adapter cards' quad ports UP when using breakout cables / QSFP-split-4. Now when a 4 alignment loss is noticed, the link in 25G/lane Ethernet is dropped.

Keywords: Quad ports, link up, breakout cables / QSFP-split-4

Discovered in Version: 28.38.1002

Fixed in Release: 28.40.1000

3607329

Description: Modified PCIe switch downstream port EQLZ.PH1 timing to 3ms.

Keywords: PCIe switch downstream port

Discovered in Version: 28.38.1002

Fixed in Release: 28.40.1000

3617606

Description: Fixed a rare race condition in NODNIC teardown that caused commands to hang on a regular PF.

Keywords: NODNIC teardown

Discovered in Version: 28.36.1010

Fixed in Release: 28.40.1000

Internal Ref.

Issue

3652874

Description: Fixed firmware measurements calculation.

Keywords: Firmware measurements calculation

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3664415

Description: Fixed an issue that caused Live Migration to hang during the "save" stage.

Keywords: Live migration

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3629353

Description: Fixed the cr_space in the port configuration to prevent wrong timestamps in CQEs.

Keywords: Hardware timestamp

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3582559

Description: Added support for LED scheme #2 to MCX750500B-0D0K / MCX750500B-0D00 adapter cards.

Keywords: LED

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3669258

Description: Fixed a rare issue that prevented changes in mlxconfig from taking effect upon warm reboot.

Keywords: mlxconfig

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3670719 / 3676590

Description: Added a small delay after the power up process to fix an issue that occasionally caused the module to be unstable after the power up.

Keywords: Link up

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3629562

Description: Fixed a code mismatch in the handling of the link-down cause when remote faults are received.

Keywords: Link down

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3532508

Description: Fixed a wrong parameter in the cable info MAD that resulted in unnecessary messages in the log.

Keywords: Cable info MAD

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3634350

Description: Disabled PCI power event messages on OCP 3.0 adapter cards according to the spec requirements.

Keywords: PCI, OCP 3.0

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3636714

Description: Fixed an issue in which the buffer used for PLDM firmware updates with pending NIC requests was not properly locked in the PLDM-over-NC-SI case and was consequently corrupted by other flows.

Keywords: PLDM, buffer

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3592276

Description: Fixed an issue that prevented MSI interrupts from being advertised correctly, resulting in the wrong MSI being sent.

Keywords: MSI

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3605363

Description: "Get Temperature" OEM command now always returns a unified temperature.

Keywords: Temperature

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3531972

Description: Changed the BAR configuration algorithm so that, when the host configures the same BAR address for two different PFs, the last update to that BAR address is the one that takes effect.

Keywords: Network Interface

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048
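The "last update wins" rule in 3531972 can be pictured with a small sketch. This is only an illustration of the rule as described above, not the firmware's implementation; the BarMap class, its method names, and the PF labels are hypothetical.

```python
class BarMap:
    """Hypothetical model: tracks which PF currently owns a BAR address."""

    def __init__(self):
        # Maps BAR address -> PF that configured it most recently.
        self._owner = {}

    def configure(self, pf, bar_address):
        # A later update to the same address overwrites the earlier one,
        # so the most recent configuration is the one that takes effect.
        self._owner[bar_address] = pf

    def owner(self, bar_address):
        # Returns None if no PF has configured this address.
        return self._owner.get(bar_address)


bar_map = BarMap()
bar_map.configure("PF0", 0xF0000000)
bar_map.configure("PF1", 0xF0000000)  # same address, later update
print(bar_map.owner(0xF0000000))
```

In this sketch the second call simply replaces the first; what the firmware does with the PF whose configuration was superseded is not described in the note and is not modeled here.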

3626872

Description: Fixed an issue that caused the firmware to miscalculate the value of the maximum current temperature measured from all the diodes (found in the Internal_sensor_curr_temp field).

Keywords: Sensor, temperature

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3544340 / 3537706 / 3639178

Description: Improved SPDM v1.0 compatibility. SPDM measurements signature additional fixes.

Keywords: SPDM

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3587821

Description: Fixed a HW bug that resulted in transaction loss when a cache replacement transaction occurred in parallel to code transcoding.

Keywords: HW bug, transaction loss

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3610861

Description: The EEPROM module got stuck in polling in 20% of cases after reset. To resolve the issue, a delay was added after configuring the module to high power.

Keywords: Polling, module, reset

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3507928

Description: Fixed a linkup failure issue that occurred when connecting to a 25GbE transceiver by clearing the PSI Aging before trying to open Tx power.

Keywords: Cables, PSI Aging, 25GbE transceiver

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3602379

Description: The "Bad Signal Integrity" message seen after a power cycle can be safely ignored. Users should monitor the BER value instead.

Keywords: Bad Signal Integrity, BER

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3605686

Description: Fixed a statics issue that caused the I2C access to the module to lock and the switch to get stuck.

Keywords: i2c, switch

Discovered in Version: 28.38.1900

Fixed in Release: 28.39.2048

3482251

Description: Added support for hairpin drop counter in QUERY_VNIC_ENV command.

Keywords: Hairpin

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3539437

Description: Fixed an issue that prevented the get_func_num_from_pci_func_num function from returning the value "-1" for an undefined function type.

Keywords: get_func_num_from_pci_func_num

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3570478

Description: Fixed Signal-to-Noise Ratio (SNR) value calculation for correct readings from the MMA4Z00 optical cable module.

Keywords: SNR

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3602169

Description: Added a locking mechanism to protect the firmware from a race condition between the insertion and deletion of the same rule in parallel. Such behavior occasionally resulted in the firmware accessing memory that had already been released, causing an IOMMU / translation error.

Note: This fix will not impact insertion rate for tables owned by SW steering.

Keywords: Firmware steering

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048
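The kind of protection described in 3602169 can be sketched in general terms as a lock that serializes insertion and deletion of a rule. The sketch below is a generic illustration only; the RuleTable class and its methods are hypothetical and do not reflect the firmware's internal data structures.

```python
import threading


class RuleTable:
    """Hypothetical rule table whose insert and delete are serialized."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rules = {}

    def insert(self, rule_id, action):
        # Holding the lock ensures a concurrent delete of the same rule
        # cannot run halfway through this insertion.
        with self._lock:
            self._rules[rule_id] = action

    def delete(self, rule_id):
        # Returns True if the rule existed and was removed.
        with self._lock:
            return self._rules.pop(rule_id, None) is not None

    def lookup(self, rule_id):
        with self._lock:
            return self._rules.get(rule_id)


table = RuleTable()


def churn():
    # Repeatedly insert and delete the same rule from several threads.
    for _ in range(1000):
        table.insert(7, "drop")
        table.delete(7)


threads = [threading.Thread(target=churn) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every access goes through the same lock, no thread can observe or free a rule while another thread is in the middle of modifying it, which is the class of problem the note describes.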

3588515 / 3409806

Description: Fixed a race condition that led to a firmware assert upon driver removal, or when changing the ETH flow control scheme under stress from ingress packets larger than the MTU.

Keywords: Race condition, firmware assert

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

3610169

Description: Fixed QoS Shaper handling behavior for non-transmitting applications.

Keywords: QoS Shaper

Discovered in Version: 28.38.1002

Fixed in Release: 28.39.2048

Internal Ref.

Issue

3537571

Description: Fixed SPDM measurements signature.

Keywords: SPDM

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3439757

Description: Fixed an issue that prevented the system from detecting the PCIe device during slot DC power cycle tests.

Keywords: PCIe device, DC power cycle tests

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3534473

Description: Added a new field/slot ID to PRS pcie_cfg_data.pci_cfg_space.pciex.pcie_switch_ini_defined_base_slot_id = 3 to define a specific slot number for GPU bridge DSP.

Keywords: Slot ID

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3331179

Description: Improved token calculation.

Keywords: Token calculation

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3299420

Description: Upgrading from firmware v28.38.1014 and below to v28.38.1002 no longer requires an upgrade to an intermediate version.

Keywords: Firmware upgrade

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3394841

Description: Updated the plug in/out events' reporting method to report only when the last recorded event is the opposite of the current event.

Keywords: Port events

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002
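The reporting rule in 3394841 amounts to suppressing repeated events of the same kind. A minimal sketch follows, assuming two event names ("plug_in", "plug_out") and assuming that the very first event is always reported; both are assumptions made for illustration, and the PlugEventReporter class is hypothetical.

```python
from typing import List, Optional


class PlugEventReporter:
    """Hypothetical filter: report an event only if it differs from the last one."""

    VALID = ("plug_in", "plug_out")

    def __init__(self):
        self.last_event: Optional[str] = None
        self.reported: List[str] = []

    def on_event(self, event: str) -> bool:
        if event not in self.VALID:
            raise ValueError(f"unknown event: {event}")
        # A repeat of the last recorded event is not reported.
        if event == self.last_event:
            return False
        # The event is the opposite of the last one (or the first seen).
        self.last_event = event
        self.reported.append(event)
        return True


reporter = PlugEventReporter()
for ev in ["plug_in", "plug_in", "plug_out", "plug_out", "plug_in"]:
    reporter.on_event(ev)
print(reporter.reported)
```

With the input sequence above, the duplicate "plug_in" and the duplicate "plug_out" are dropped, so only the three transitions are recorded.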

3469311

Description: Fixed the order of SPDM operations to comply with spec v1.1.0.

Keywords: SPDM operations

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3527987

Description: Added support for NC-SI channel on both ports.

Keywords: NC-SI channel

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3459317

Description: Changed the protection mechanism for BAR configuration.

Keywords: BAR configuration

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3345150

Description: Fixed an issue that caused a packet with invalid/bad padcount to be silently dropped instead of sending a bad nack error.

Keywords: Packet drop

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3418627

Description: Fixed wrong credits configuration that occurred when MAX_ACC_OUT_READ was configured.

Keywords: Performance

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3466088

Description: Updated the SX root to work with driverless mode in vport0 gvmi teardown.

Keywords: Driverless mode

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3487313

Description: Fixed a rare deadlock between two DC packets on the RX side.

Keywords: Firmware deadlock

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3495889

Description: Fixed a QoS host port rate limit shaper inaccuracy that occurred when the shaper was configured via the QSHR access register.

Keywords: Port rate limit shaper

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

3449451

Description: When using a ConnectX-7 adapter card in InfiniBand mode, the port must be configured to use the Auto-Negotiation mode.

Keywords: Auto-Negotiation, InfiniBand

Discovered in Version: 28.37.1014

Fixed in Release: 28.38.1002

© Copyright 2024, NVIDIA. Last updated on May 6, 2024.