Bug Fixes History

NVIDIA Quantum-2 Firmware Release Notes v31.2012.2304 LTS

The following table provides a list of bugs fixed in previous versions.

Internal Ref.

Issues

3887883

Description: When a MirroringAgent MAD was sent without first configuring fast recovery mirroring through the MirroringGlobalTrigger MAD, the configured agent still sent fast recovery mirroring notifications.

Keywords: Mirroring

Discovered in Version: 31.2012.2200

Fixed in Version: 31.2012.2234

3707400

Description: In rare cases, when running SHARP jobs in parallel with InfiniBand traffic, the switch may become unstable due to buffer oversubscription.

Keywords: SHARP, Buffer

Discovered in Version: 31.2012.2108

Fixed in Version: 31.2012.2234

3696514

3700332

Description: Improved buffer utilization by fixing the ratio of transmitted data and credit packets on the link.

Keywords: Credit Packet, Buffer

Discovered in Version: 31.2012.2014

Fixed in Version: 31.2012.2234

3877860

3864399

Description: Sending pFRN packets to ports that were connected to themselves (loop) caused the switch to hang due to a semaphore lock mismatch.

Keywords: pFRN

Discovered in Version: 31.2012.2108

Fixed in Version: 31.2012.2234

3824931

3843040

Description: Illegal packets with a permissive LID (0xFFFF) and a VL other than 15 were incorrectly destined for port 0 and processed by the switch firmware, which led to overloading of the switch firmware.

Keywords: Checks

Discovered in Version: 31.2012.2108

Fixed in Version: 31.2012.2234

3864241

3860421

Description: Incorrect buffer configuration for trap and mirror packets may cause the switch data path to become stuck, potentially resulting in buffer overrun and internal credit leakage.

Keywords: Data Path, Buffer

Discovered in Version: 31.2012.2108

Fixed in Version: 31.2012.2234

3773771

Description: HBF (Hash Based Forwarding) configurations were applied on the incorrect port.

Keywords: Hash Based Forwarding

Discovered in Version: 31.2012.3008

Fixed in Version: 31.2012.2234

3738343

Description: Improved buffer utilization by fixing the ratio of transmitted data and credit packets on the link.

Keywords: Credit packet

Discovered in Version: 31.2012.2014

Fixed in Version: 31.2012.2200

3737622

Description: Updated PSU fans high RPM warning threshold.

Keywords: Fan, Warning

Discovered in Version: 31.2012.2014

Fixed in Version: 31.2012.2200

3651360

Description: Fixed an incorrect endian mapping of the LEDs in split mode.

Keywords: LEDs

Discovered in Version: 31.2012.1068

Fixed in Version: 31.2012.2200

3677817

Description: Added a timeout flow for cases in which the lock bit sticks high due to a race in the cable information interfaces.

Keywords: Timeout flow, Race, Cable Information Interfaces

Discovered in Version: 31.2012.2014

Fixed in Version: 31.2012.2148

3702214

Description: A SHARP job fails if it is running while an error event triggers a firmware dump.

Keywords: SHARP

Discovered in Version: 31.2012.2014

Fixed in Version: 31.2012.2148

3706827

Description: Packets sent to port 0 might be dropped under heavy load, causing MAD timeouts in the Subnet Manager.

Keywords: Packet Drops, Subnet Manager

Discovered in Version: 31.2012.2014

Fixed in Version: 31.2012.2148

3705783

Description: Fixed the timer for sending Trap-135 to 1 second (in case a host stopped sending packets in the middle of a SAT).

Keywords: Timer, SAT

Discovered in Version: 31.2012.2014

Fixed in Version: 31.2012.2148

3700332

Description: Updated the ratio of data sent (TX) to FCCL packets so that it is determined by the credit size; a credit packet is sent after roughly 4096 B of transmitted data.

Keywords: FCCL, Credit Packet

Discovered in Version: 31.2012.2014

Fixed in Version: 31.2012.2148
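
The pacing rule described for 3700332 can be pictured with a small model. The following is an illustrative sketch only, not the firmware's logic: the class name, the fixed 4096-byte threshold, and the carry-over of leftover bytes between calls are all assumptions made for the example.

```python
class CreditPacer:
    """Toy model (hypothetical): one credit packet per ~credit_bytes of sent data."""

    def __init__(self, credit_bytes: int = 4096) -> None:
        if credit_bytes <= 0:
            raise ValueError("credit_bytes must be positive")
        self.credit_bytes = credit_bytes  # assumed threshold, taken from the note above
        self.pending = 0                  # bytes sent since the last credit packet
        self.credit_packets = 0           # credit packets emitted so far

    def on_data_sent(self, nbytes: int) -> int:
        """Account for nbytes of transmitted data; return how many credit packets are due."""
        if nbytes < 0:
            raise ValueError("nbytes must be non-negative")
        self.pending += nbytes
        # Every full credit_bytes of accumulated data triggers one credit packet;
        # the remainder is carried over to the next call (an assumption).
        due, self.pending = divmod(self.pending, self.credit_bytes)
        self.credit_packets += due
        return due


pacer = CreditPacer()
first = pacer.on_data_sent(2048)   # below the threshold, nothing due yet
second = pacer.on_data_sent(2048)  # reaches 4096 bytes, one credit packet due
```

Tying the credit packet count to the volume of transmitted data keeps credit traffic proportional to data traffic, which is the effect the entry describes.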

3536538

Description: For a mirror agent configured with a dynamic port analyzer, configuring the linear forwarding table may enable the mirror agent and produce unexpected mirrored packets.

Keywords: Recovery

Discovered in Version: 31.2012.1068

Fixed in Version: 31.2012.2014

3592659

3585886

Description: Quantum-2 unmanaged switch may freeze while sending MVCR.

Keywords: MVCR, Switch

Discovered in Version: 31.2012.1024

Fixed in Version: 31.2012.1068

3589044

3587703

3573164

Description: A rare issue caused the I2C-to-module connection to lock and the Quantum-2 switch to freeze.

Keywords: i2c, Switch

Discovered in Version: 31.2012.1024

Fixed in Version: 31.2012.1068

3548254

Description: FR4 MMS4X50-NM cable link-up failure after a disconnect or AC cycle.

Keywords: Cables, Link Up

Discovered in Version: 31.2012.1024

Fixed in Version: 31.2012.1068

3570478

Description: Fixed SNR value calculation for correct readings from the MMA4Z00 optical cable module.

Keywords: SNR

Discovered in Version: 31.2012.1024

Fixed in Version: 31.2012.1068

3311198

Description: Disabled "low priority credits" feature on the switch side that caused the credits mechanism to overload the links with credit packets, reducing the available bandwidth for transmitting data packets on the link.

Keywords: Bandwidth

Discovered in Version: 31.2012.1024

Fixed in Version: 31.2012.1068

3554182

Description: The link does not come up with second-source MMS4X00-NS transceivers.

Keywords: Cables, link up

Discovered in Version: 31.2010.6064

Fixed in Version: 31.2012.1024

3538638

Description: The message of code 57 in the PDDR Troubleshooting information page was incorrect.

Keywords: Link Diagnostics

Discovered in Version: 31.2010.6064

Fixed in Version: 31.2012.1024

3407038

Description: An unresponsive PSU client can cause the SDA I2C line to hang.

Keywords: I2C

Discovered in Version: 31.2010.6064

Fixed in Version: 31.2012.1024

3477039

Description: Wrong RTT value is exposed under PRTL PRM.

Keywords: Registers, RTT Value

Discovered in Version: 31.2010.6064

Fixed in Version: 31.2012.1024

3481394

Description: When trying to choose the threshold for the Fast Recovery feature (BER Config), it is possible that threshold 0 will be loaded.

Keywords: Fast Recovery, BER Configuration

Discovered in Version: 31.2010.6064

Fixed in Version: 31.2012.1024

3499997

Description: In some cases, the combination of SHARP SAT traffic and SHARP MADs can cause the switch to get stuck.

Keywords: SHARP

Discovered in Version: 31.2010.4210

Fixed in Version: 31.2012.1024

3451519

Description: When using ibdiagnet, an incorrect module alarm type was reported.

Keywords: ibdiagnet, Module Temperature Alarm Type

Discovered in Version: 31.2010.5108

Fixed in Version: 31.2012.1024

3326692

Description: Wrap-around of the time_since_last_clear counter caused incorrect reporting of counters on the port.

Keywords: Counters

Discovered in Version: 31.2010.3118

Fixed in Version: 31.2010.6102
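
Counter wrap-around of the kind described for 3326692 is commonly handled with modular arithmetic. The sketch below is a generic illustration, not the firmware fix: it assumes a hypothetical 32-bit free-running counter (the real width and semantics of time_since_last_clear are not stated in these notes) and at most one wrap between two reads.

```python
COUNTER_BITS = 32                # assumed width, for illustration only
COUNTER_MOD = 1 << COUNTER_BITS  # number of distinct counter values


def elapsed_ticks(previous: int, current: int, modulus: int = COUNTER_MOD) -> int:
    """Ticks between two reads of a free-running counter, tolerating a single wrap."""
    # Python's % returns a non-negative result for a positive modulus,
    # so a wrapped reading (current < previous) still yields the true distance.
    return (current - previous) % modulus


no_wrap = elapsed_ticks(10, 25)
wrapped = elapsed_ticks(COUNTER_MOD - 5, 3)
```

A plain subtraction would return a large negative number in the wrapped case, which is the sort of error that can surface as incorrect counter reporting.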

3389432

Description: The flint firmware burn process might take longer than expected, possibly leading to SM timeouts and logical link drops by the SM, which, in turn, may cause the flint burn command to fail.

Keywords: SM, Timeout, Flint, Failure

Discovered in Version: 31.2010.6064

Fixed in Version: 31.2010.6102

3339363

Description: The pFRN notification state machine halted in a busy-wait on all RISCs due to an inability to free TX credits.

Keywords: pFRN

Discovered in Version: 31.2010.3118

Fixed in Version: 31.2010.6064

3393378

Description: In some cases, pFRN configuration over multi-SWID caused out-of-bound access to an array and overran FLID configuration.

Keywords: pFRN

Fixed in Version: 31.2010.6064

3342918

Description: On rare occasions, the port might get stuck (in all speeds) during the link up flow when using optical modules.

Keywords: Port Link Up, Port Toggling, Optical Modules

Fixed in Version: 31.2010.6064

3395821

Description: Bandwidth is lower than expected on MMS4X00-NL-QP1 cable.

Keywords: MMS4X00-NL-QP1, Bandwidth

Fixed in Version: 31.2010.6064

2824249

Description: After a firmware update failure, the bad image was not erased.

Keywords: Installation, Firmware

Discovered in Version: 31.2010.2036

Fixed in Version: 31.2010.6064

3362685

Description: In QM9700 systems, when a transceiver module is plugged in while only one of the optic cables is connected (and the second cable is disconnected), the port LED may be incorrectly displayed on the disconnected side.

Keywords: Port LED, Optic Cables

Discovered in Version: 31.2010.4102

Fixed in Version: 31.2010.5108

3377608

Description: When operating in dynamic trees allocation mode, MAD error responses might be received in libsharp.

Keywords: sharp_am, libsharp

Fixed in Version: 31.2010.5108

3362200

Description: In rare cases involving traffic stress, unexpected hardware fast path behavior may occur, possibly leading to the switch firmware hanging when the ports are toggled.

Keywords: Turbo Path

Discovered in Version: 31.2010.5002

Fixed in Version: 31.2010.5108

3301825

Description: The firmware does not return values for the counters "PortSwLifetimeLimitDiscards" and "PortSwHOQLifetimeLimitDiscards". Support has now been added for the counters.

Keywords: Counters

Discovered in Version: 31.2010.3118

Fixed in Version: 31.2010.5042

3335002

Description: pFRN mirror v1 header pad count showed an invalid padding size.

Keywords: pFRN

Discovered in Version: 31.2010.4010

Fixed in Version: 31.2010.5042

3269531

Description: After multiple MSPS (Management System Power Supply register) calls, the switch gets stuck.

Keywords: MSPS

Discovered in Version: 27.2010.3118

Fixed in Version: 27.2010.5002

3267152

Description: On NDR devices, when collecting BER data, the peer falls, causing the switch to hang.

Keywords: BER COLLECT

Discovered in Version: 31.2010.4102

Fixed in Version: 31.2010.5002

3261861

Description: Connecting an HDR device to an NDR device with Optical cables longer than 30m causes degradation in the bandwidth.

Keywords: HDR-to-NDR

Discovered in Version: 31.2010.4102

Fixed in Version: 31.2010.5002

2974424

Description: The link does not come up on cables that perform polarity inversion.

Keywords: Cables, Polarity Inversion

Discovered in Version: 31.2010.3118

Fixed in Version: 31.2010.5002

3199650

Description: A physical link failure between switches while a SHARP job is running and utilizing the link can cause one of the switches to become invalid for further SHARP jobs. This can result in either a "No resource" response for new SHARP job requests or in jobs getting stuck.

The bug fix requires SHARP version 3.2.

Keywords: SHARP

Discovered in Version: 31.2010.4010

Fixed in Version: 31.2010.4102

3245821

Description: On an AR group table set request, the ARN mask is flushed for a group that has an active pFRN timer.

Keywords: pFRN

Discovered in Version: 31.2010.4010

Fixed in Version: 31.2010.4102

3253717

Description: The mask_force_clear_timeout timer in the pFRN feature was not functional (the mask was not cleared when the timer expired).

Keywords: pFRN

Discovered in Version: 31.2010.4010

Fixed in Version: 31.2010.4102

3242209

Description: The Set pFRN MAD did not return an error for invalid inputs in the mask_clear_timer and mask_force_clear_timer fields.

Keywords: pFRN

Discovered in Version: 31.2010.4010

Fixed in Version: 31.2010.4102

3143685

Description: The switch does not return the SN or PN when queried via mlxlink or ibdiagnet.

Keywords: SN, PN, mlxlink, ibdiagnet

Discovered in Version: 31.2010.2300

Fixed in Version: 31.2010.4010

3174239

Description: On rare occasions, traps were not properly suppressed, which caused redundant traps to be sent multiple times.

Keywords: Traps

Discovered in Version: 31.2010.3118

Fixed in Version: 31.2010.4010

3002314

Description: On rare occasions, when a port is configured to mloop, toggling it may cause the link not to come up.

Keywords: Optic in Mloop

Discovered in Version: 31.2010.2110

Fixed in Version: 31.2010.3118

3127727

Description: On rare occasions, when an egress port is split in two, the egress port may get stuck due to a wrong Fast Path configuration.

Keywords: Switch Hang, Fast Path, Split

Discovered in Version: 31.2010.3004

Fixed in Version: 31.2010.3118

3082569

Description: In some traffic patterns involving small packets, the PortRcvErrors counter may mistakenly count events of local physical errors due to an internal flow in the hardware that involves link packets.

Keywords: Counters

Discovered in Version: 31.2010.2246

Fixed in Version: 31.2010.3004

3085427

Description: On rare occasions, SHARP semaphore may remain locked on a port following an event of a port link down or an application crash.

Keywords: SHARPv3

Discovered in Version: 31.2010.2036

Fixed in Version: 31.2010.3004

3011581

Description: On rare occasions, job failures with SharpError trap may be experienced as a result of previous jobs that have failed.

Keywords: SHARPv3

Discovered in Version: 31.2010.2036

Fixed in Version: 31.2010.3004

3000602

Description: After disconnecting an MMS4X00-NL* cable and connecting an Ultron cable to the same port, the port fails to link up.

Keywords: Cables

Discovered in Version: 31.2010.2110

Fixed in Version: 31.2010.2300

3060122

Description: If a link between a root switch and a non-root switch faults while a job is running, the next job run on the non-root switch may fail.

Keywords: SHARPv3

Discovered in Version: 31.2010.2036

Fixed in Version: 31.2010.2300

2923464

Description: When using the MMS4X00-NL optical module, on rare occasions a port at NDR speed may get stuck and remain in the Polling state.

Keywords: NDR, Optical Module

Discovered in Version: 31.2010.1404

Fixed in Version: 31.2010.2246

2859363

Description: When using NVIDIA Quantum-2 systems in Auto-Neg mode, NDR speed in one lane (1x) is not supported.

Keywords: Auto-Negotiation

Discovered in Version: 31.2010.1310

Fixed in Version: 31.2010.2246

3033131

Description: The number of flows changed from 2 to 1, as intended.

Keywords: SHARPv3

Discovered in Version: 31.2010.2110

Fixed in Version: 31.2010.2246

2972388

Description: Running concurrent jobs may lead to states in which jobs unexpectedly terminate or get stuck.

Keywords: SHARPv3

Discovered in Version: 31.2010.2036

Fixed in Version: 31.2010.2110

2982113

Description: On rare occasions, job resource cleanup may fail.

Keywords: SHARPv3

Discovered in Version: 31.2010.2036

Fixed in Version: 31.2010.2110

2971339

Description: During high load scenarios, performance degradation may be experienced.

Keywords: SHARPv3

Discovered in Version: 31.2010.2036

Fixed in Version: 31.2010.2110

2849215

Description: On NVIDIA Quantum-2 switches, when working with MFA7U10-H0xx cables, if one of the ports in a cage is disabled at the time of initialization by user configuration, reenabling the port will require toggling the link (i.e. enable → disable → enable).

Keywords: NVIDIA Quantum-2, Cables

Discovered in Version: 31.2010.1310

Fixed in Version: 31.2010.2036

2890632

Description: On NVIDIA Quantum-2 systems, changing the Optical module rate was not allowed.

Keywords: Optical Modules

Discovered in Version: 31.2010.1310

Fixed in Version: 31.2010.2036

2885798

Description: In NVIDIA Quantum-2 systems, effective errors may occur with short Copper cable MCP4Y10-N00B.

Workaround: N/A

Discovered in Version: 31.2010.1310

Fixed in Version: 31.2010.2036

2910161

Description: In the auto-negotiation flow with copper cables, toggling both sides of the link may, on rare occasions, cause the port to get stuck.

Keywords: Auto-Negotiation, Copper Cables

Discovered in Version: 31.2010.1310

Fixed in Version: 31.2010.2036
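
To check whether a running firmware is at or beyond the "Fixed in Version" of an entry above, the dotted version strings can be compared field by field. This is a minimal sketch with hypothetical helper names. It assumes that a fix is carried by every numerically later version with the same leading field, which may not hold across release branches, so treat the result as a hint rather than a guarantee.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '31.2012.2234' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def includes_fix(running: str, fixed_in: str) -> bool:
    """Return True if `running` is numerically at or after `fixed_in`.

    Assumption: versions are only comparable when their first field matches.
    """
    run, fix = parse_version(running), parse_version(fixed_in)
    if run[0] != fix[0]:
        raise ValueError("versions do not share the same leading field")
    return run >= fix


has_fix = includes_fix("31.2012.2304", "31.2012.2234")
```

Comparing integer tuples avoids the pitfalls of plain string comparison, where fields of different lengths would be ordered incorrectly.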

© Copyright 2024, NVIDIA. Last updated on Jul 2, 2024.