Date: 2025-11-04
Bug Fixes History
This section includes the bug fix history for the last three major releases. For the history of older releases, please refer to the relevant versions.
Internal Ref.
Issues
4493905
Description: In rare cases, repressing SHARP traps may fail.
Keywords: SHARP, TRAP
Discovered in Version: 31.2014.2084
Fixed in Version: 31.2016.1028
4445694
Description: Once the MAD arrived at the vPort, the firmware returned an OK status rather than an error (without affecting functionality).
Keywords: vPort
Discovered in Version: 31.2014.2084
Fixed in Version: 31.2016.1028
Internal Ref.
Issues
4391761
Description: In rare cases, the firmware flow handling I2C may get stuck and only recover after a reboot.
Keywords: Firmware, I2C
Discovered in Version: 31.2014.2084
Fixed in Version: 31.2014.4044
4403112
Description: In rare cases, cleaning all SHARP resources may fail after a port drops during a job.
Keywords: SHARP
Discovered in Version: 31.2014.2084
Fixed in Version: 31.2014.4044
4363924
Description: When SM was running on the switch host, VS-MAD Classes 0x0b and 0x0c responses did not return to the SM, causing the SM to continue sending MADs. This could result in switch congestion, leading to dropped MADs.
Keywords: SM, VS MADs
Discovered in Version: 31.2014.2084
Fixed in Version: 31.2014.4044
4323083
Description: The internal cableinfo timeout has been increased from 2 seconds to 4 seconds to accommodate potential delays caused by another command not releasing the packet buffer.
Keywords: cableinfo timeout
Discovered in Version: 31.2014.2084
Fixed in Version: 31.2014.4044
Internal Ref.
Issues
4245399
Description: In rare cases, when running a SHARP job on two different groups and the job was stopped in the middle, cleaning SHARP resources may fail.
Keywords: SHARP
Discovered in Version: 31.2014.2084
Fixed in Version: 31.2014.3062
4232783
Description: In rare cases, when extreme congestion occurs in the cluster, the following commands might fail (due to changes in buffer allocation):
1. Creating/destroying SHARP trees
2. Changing the quality of service of the switch
3. Changing the number of operational VLs of a port
Keywords: Congestion
Discovered in Version: 31.2014.2084
Fixed in Version: 31.2014.3062
4083638
Description: SHARP discard counters may be inaccurate, due to incorrect configuration.
Keywords: SHARP
Discovered in Version: 31.2014.2084
Fixed in Version: 31.2014.3062
Internal Ref.
Issues
4129741
Description: In rare cases, when a SHARP job was stopped while ANDR operations were in use, the next job may fail and require a switch reboot.
Keywords: SHARP
Discovered in Version: 31.2014.1000
Fixed in Version: 31.2014.2084
4146212
Description: Inaccurate credit configurations in the switch data path could, in rare cases, lead to unexpected behavior or performance degradation.
Keywords: Credits, Data Path, Fatal Cause
Discovered in Version: 31.2012.2014
Fixed in Version: 31.2014.2084
4037679
Description: In some cases where incorrect MAD handling causes the switch to become unresponsive to the SM, a switch reset is required.
Keywords: MADs
Discovered in Version: 31.2014.1000
Fixed in Version: 31.2014.2084
3782281
3778631
Description: When performing a partial split (where only one port in a cage is split) on an optical module, the port may not link up.
Keywords: Optical Module, Split
Discovered in Version: 31.2012.4006
Fixed in Version: 31.2014.2084
3605188
Description: Occasionally, when using an FR4-coherent optical module while running a split-unsplit flow, the port may not raise a link.
Keywords: FR4-Coherent Optical Module
Discovered in Version: 31.2012.1024
Fixed in Version: 31.2014.2084
3967793
Description: When a cable is unplugged in unmanaged switches, the port width remains 2X, even when using a 4X cable.
Keywords: Cables Width
Discovered in Version: 31.2012.1024
Fixed in Version: 31.2014.2084
4035403
Description: In cases of repeatedly flapping links, the pFRN recovery mechanism might not be triggered.
Keywords: SHIELDv2, pFRN
Discovered in Version: 31.2012.2234
Fixed in Version: 31.2014.2084
Internal Ref.
Issues
3685913
Description: The link down counter was not consistently incrementing when optical modules were removed or unplugged.
Keywords: Counters, Timing
Discovered in Version: 31.2012.2014
Fixed in Version: 31.2014.1090
3888873
Description: Head-of-line blocking between QP1 and QP0 MADs from the switch to the firmware packet buffer occurs when the firmware is busy handling a MAD, causing another MAD with the same QP to wait in the queue ahead of MADs from different QPs.
Keywords: MADs
Discovered in Version: 31.2012.2200
Fixed in Version: 31.2014.1090
3977077
Description: A flap on one port of an NDR transceiver may cause the twin port on the same transceiver to also flap.
Keywords: Twin Port
Discovered in Version: 31.2012.4036
Fixed in Version: 31.2014.1090
3924542
Description: On rare occasions, SHARP job failure with trap 135 may be experienced as a result of previous jobs that have failed or stopped.
Keywords: SHARPv3
Discovered in Version: 31.2012.2234
Fixed in Version: 31.2014.1090
3887857
Description: In cases where MirroringAgent MAD was sent without configuring fast recovery mirroring using MirroringGlobalTrigger MAD, the agent that was configured will send fast recovery mirroring notifications.
Keywords: Mirroring
Discovered in Version: 31.2012.4036
Fixed in Version: 31.2014.1090
3813780
Description: Following a module reset, link up may take up to 70 seconds.
Keywords: Module Reset, Link Up
Discovered in Version: 31.2012.4006
Fixed in Version: 31.2014.1090
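The head-of-line blocking described in entry 3888873 can be illustrated with a toy model. This is only a sketch under stated assumptions: the function names, the `(qp, name)` tuples, and the per-QP queue layout are hypothetical, and the source does not describe how the firmware actually resolved the issue.

```python
from collections import deque

def serve_single_fifo(mads, busy_qp):
    """Shared FIFO: serving stops at the first MAD whose QP is busy,
    so every MAD queued behind it waits too (head-of-line blocking)."""
    queue = deque(mads)
    served = []
    while queue and queue[0][0] != busy_qp:
        served.append(queue.popleft())
    return served

def serve_per_qp(mads, busy_qp):
    """One queue per QP (hypothetical layout): a busy QP delays only its own MADs."""
    queues = {}
    for qp, name in mads:
        queues.setdefault(qp, deque()).append((qp, name))
    served = []
    for qp, queue in queues.items():
        if qp != busy_qp:
            served.extend(queue)
    return served

# The firmware is assumed to be busy with a QP1 MAD; another QP1 MAD heads the queue.
pending = [(1, "mad_a"), (0, "mad_b"), (0, "mad_c")]
print(serve_single_fifo(pending, busy_qp=1))
print(serve_per_qp(pending, busy_qp=1))
```

In the shared FIFO nothing is served while QP1 is busy, whereas with separate queues the two QP0 MADs are served immediately.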
Internal Ref.
Issues
3864399
Description: Sending pFRN packets to ports that were connected to themselves (loop) caused the switch to hang due to a semaphore lock mismatch.
Keywords: pFRN
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.4036
3860421
Description: Incorrect buffer configuration for trap and mirror packets may cause the switch data path to become stuck, potentially resulting in buffer overrun and internal credit leakage.
Keywords: Data Path, Buffer
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.4036
3700332
Description: Updated the ratio of data sent (TX) to FCCL packets so that it is determined by the credit size; a credit packet is sent after roughly 4096B of sent data.
Keywords: FCCL, Credit Packet
Discovered in Version: 31.2012.2014
Fixed in Version: 31.2012.4006
3657798
Description: During the transition from 2x split to 4x split port configuration, the procedure for clearing routing logic for management ports was incorrect.
Keywords: Management Ports, Split
Discovered in Version: 31.2012.3008
Fixed in Version: 31.2012.4006
3753126
Description: Interference on the AC line may cause a wrong reset sequence, which leads to a system error and causes all ports to be in link-down mode. Only a power cycle restores the switch to a normal operation mode.
For a CPLD upgrade, please contact the NVIDIA technical support team.
Fixes available in Main CPLD revision 0300 and Port CPLD revision 0700 or higher.
Keywords: CPLD
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.4006
3715407
Description: Because of a race condition in cable information interfaces, the lock bit remains stuck in the high position. Consequently, a timeout flow has been implemented to address this scenario.
Keywords: Cables, Timeout
Discovered in Version: 31.2012.2014
Fixed in Version: 31.2012.4006
3740969
Description: A PSU fan error is experienced. To address this issue, the high RPM warning threshold of the PSU fans has been increased to 32500.
Keywords: PSU Fan
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.4006
3773771
Description: HBF (Hash Based Forwarding) configurations were applied on the incorrect port.
Keywords: Hash Based Forwarding
Discovered in Version: 31.2012.3008
Fixed in Version: 31.2012.4006
3707400
Description: In cases where buffer is overutilized, failure to allocate a buffer for the SHARP TX retry buffer may lead to buffer oversubscription.
Keywords: SHARP, Buffer
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.4006
3724050
Description: The PRBS register could not be properly disabled. To solve this, the Ethernet FSM closes upon receiving an Enable event without a lock only when the FSM is already in the Close state.
Keywords: Ethernet FSM, PRBS
Discovered in Version: 31.2012.2014
Fixed in Version: 31.2012.4006
3824931
3843040
Description: Illegal packets with a permissive LID (0xFFFF) and a VL other than 15 were incorrectly configured to be destined for port 0 and processed by the switch firmware, which led to overloading of the switch firmware.
Keywords: Checks
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.4006
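Entry 3715407 mentions a timeout flow for a lock bit that stays high. The general pattern can be sketched as below. This is an illustration only: `wait_for_lock_clear`, `read_lock_bit`, and the polling interval are hypothetical names and values, and the source does not say what the firmware does once the timeout expires.

```python
import time

def wait_for_lock_clear(read_lock_bit, timeout_s, poll_interval_s=0.01,
                        now=time.monotonic, sleep=time.sleep):
    """Poll a lock bit until it clears or the timeout expires.

    read_lock_bit is a caller-supplied callable returning True while the
    lock is held. Returns True if the lock cleared, False on timeout, so
    the caller can decide how to recover instead of waiting forever.
    """
    deadline = now() + timeout_s
    while read_lock_bit():
        if now() >= deadline:
            return False
        sleep(poll_interval_s)
    return True

# Usage with a lock that is already free: returns immediately.
print(wait_for_lock_clear(lambda: False, timeout_s=1.0))
```

The clock and sleep functions are parameters so the loop can be exercised with a simulated clock instead of real delays.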
Internal Ref.
Issues
3742609
Description: Fixed the wrong endian mapping of LEDs in split mode.
Keywords: LED
Discovered in Version: 31.2012.1068
Fixed in Version: 31.2012.3040
3725809
Description: Increased PSU fans high RPM warning threshold.
Keywords: Fan, Warning
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.3040
Internal Ref.
Issues
3677817
Description: Fixed a race condition in the cable info interfaces that caused the switch to become non-responsive.
Keywords: Cable info interfaces
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.3008
3705783
Description: Fixed the timer for sending Trap-135 to 1 second (in case a host stopped sending packets in the middle of a SAT).
Keywords: Timer
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.3008
3669982
Description: Fixed an issue that caused the SHARP job to fail due to an error event that triggered the firmware dumps.
Keywords: SHARP, firmware dumps
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.3008
3626729
3677386
Description: Fixed credit management of the shared buffer scheme that affected the overall bandwidth performance of the switch.
Keywords: Buffer Scheme
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.3008
3600467
Description: Fixed an issue where the I2C connection to the modules became locked, which might cause the switch to get stuck until it was reset.
Keywords: I2C
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.3008
3719983
Description: Fixed an issue where packets sent to port 0 might be dropped under heavy load, causing MAD timeouts in the SM.
Keywords: Packets
Discovered in Version: 31.2012.2108
Fixed in Version: 31.2012.3008
Internal Ref.
Issues
3536538
Description: For a mirror agent configured with a dynamic port analyzer, configuring the linear forwarding table may cause mirror agent enablement and unexpected mirrored packets.
Keywords: Recovery
Discovered in Version: 31.2012.1068
Fixed in Version: 31.2012.2014
Internal Ref.
Issues
3592659
3585886
Description: Switch may freeze while sending MVCR.
Keywords: MVCR
Discovered in Version: 31.2012.1024
Fixed in Version: 31.2012.1068
3589044
3587703
3573164
Description: Rare issue that triggers the I2C to module connection to lock and causes the switch to freeze.
Keywords: I2C
Discovered in Version: 31.2012.1024
Fixed in Version: 31.2012.1068
3548254
Description: FR4 MMS4X50-NM cable link-up failure after a disconnect or AC cycle.
Keywords: Cables, Link Up
Discovered in Version: 31.2012.1024
Fixed in Version: 31.2012.1068
3570478
Description: Fixed SNR value calculation for correct readings from the MMA4Z00 optical cable module.
Keywords: SNR
Discovered in Version: 31.2012.1024
Fixed in Version: 31.2012.1068
3311198
3586423
Description: Disabled "low priority credits" feature on the switch side that caused the credits mechanism to overload the links with credit packets, reducing the available bandwidth for transmitting data packets on the link.
Keywords: Bandwidth
Discovered in Version: 31.2012.1024
Fixed in Version: 31.2012.1068
Internal Ref.
Issues
3554182
Description: Link does not raise with 2nd source MMS4X00-NS transceivers.
Keywords: Cables, Link Up
Discovered in Version: 31.2010.6064
Fixed in Version: 31.2012.1024
3538638
Description: The message of code 57 in the PDDR Troubleshooting information page was incorrect.
Keywords: Link Diagnostics
Discovered in Version: 31.2010.6064
Fixed in Version: 31.2012.1024
3407038
Description: An unresponsive PSU client can cause the SDA I2C line to hang.
Keywords: I2C
Discovered in Version: 31.2010.6064
Fixed in Version: 31.2012.1024
3477039
Description: Wrong RTT value is exposed under PRTL PRM.
Keywords: Registers, RTT Value
Discovered in Version: 31.2010.6064
Fixed in Version: 31.2012.1024
3481394
Description: When trying to choose the threshold for the Fast Recovery feature (BER Config), it is possible that threshold 0 will be loaded.
Keywords: Fast Recovery, BER Configuration
Discovered in Version: 31.2010.6064
Fixed in Version: 31.2012.1024
3499997
Description: In some cases, the combination of SHARP SAT traffic and SHARP MADs can cause the switch to get stuck.
Keywords: SHARP
Discovered in Version: 31.2010.4210
Fixed in Version: 31.2012.1024
3451519
Description: When using ibdiagnet, an incorrect module alarm type was reported.
Keywords: ibdiagnet, Module Temperature Alarm Type
Discovered in Version: 31.2010.5108
Fixed in Version: 31.2012.1024
Internal Ref.
Issues
3326692
Description: Wrap-around of the time_since_last_clear counter caused incorrect reporting of counters on the port.
Keywords: Counters
Discovered in Version: 31.2010.3118
Fixed in Version: 31.2010.6102
3389432
Description: The flint firmware burning process might take longer than expected, possibly leading to timeouts in the SM and logical link drops by the SM, which, in turn, may lead to failure of the flint burn command.
Keywords: SM, Timeout, Flint, Failure
Discovered in Version: 31.2010.6064
Fixed in Version: 31.2010.6102
3339363
Description: The pFRN notification state machine halted in busy-wait on all RISCs due to an inability to free TX credits.
Keywords: pFRN
Discovered in Version: 31.2010.3118
Fixed in Version: 31.2010.6064
3393378
Description: In some cases, pFRN configuration over multi-SWID caused out-of-bound access to an array and overran FLID configuration.
Keywords: pFRN
Fixed in Version: 31.2010.6064
3342918
Description: On rare occasions, the port might get stuck (in all speeds) during the link up flow when using optical modules.
Keywords: Port Link Up, Port Toggling, Optical Modules
Fixed in Version: 31.2010.6064
3395821
Description: Bandwidth is lower than expected on MMS4X00-NL-QP1 cable.
Keywords: MMS4X00-NL-QP1, Bandwidth
Fixed in Version: 31.2010.6064
2824249
Description: After a firmware update failure, the bad image was not erased.
Keywords: Installation, Firmware
Discovered in Version: 31.2010.2036
Fixed in Version: 31.2010.6064
3362685
Description: In QM9700 systems, when a transceiver module is plugged in when only one of the optic cables is connected (while the second cable is disconnected), the port LED may be incorrectly displayed on the disconnected side.
Keywords: Port LED, Optic Cables
Discovered in Version: 31.2010.4102
Fixed in Version: 31.2010.5108
3377608
Description: When operating in dynamic trees allocation mode, MAD error responses might be received in libsharp.
Keywords: sharp_am, libsharp
Fixed in Version: 31.2010.5108
3362200
Description: In rare cases that involve stress of traffic, unexpected hardware fast path behavior may occur, possibly leading to the switch firmware hanging when toggling the ports.
Keywords: Turbo Path
Discovered in Version: 31.2010.5002
Fixed in Version: 31.2010.5108
3301825
Description: The firmware does not return values for the counters "PortSwLifetimeLimitDiscards" and "PortSwHOQLifetimeLimitDiscards". Support has now been added for the counters.
Keywords: Counters
Discovered in Version: 31.2010.3118
Fixed in Version: 31.2010.5042
3335002
Description: pFRN mirror v1 header pad count showed an invalid padding size.
Keywords: PFRN
Discovered in Version: 31.2010.4010
Fixed in Version: 31.2010.5042
3269531
Description: After multiple MSPS (Management System Power Supply register) calls, the switch gets stuck.
Keywords: MSPS
Discovered in Version: 27.2010.3118
Fixed in Version: 27.2010.5002
3267152
Description: On NDR devices, when collecting BER data, the peer falls, causing the switch to hang.
Keywords: BER COLLECT
Discovered in Version: 31.2010.4102
Fixed in Version: 31.2010.5002
3261861
Description: Connecting an HDR device to an NDR device with Optical cables longer than 30m causes degradation in the bandwidth.
Keywords: HDR-to-NDR
Discovered in Version: 31.2010.4102
Fixed in Version: 31.2010.5002
2974424
Description: On cables that perform polarity inversion, there is no link up.
Keywords: Cables, Polarity Inversion
Discovered in Version: 31.2010.3118
Fixed in Version: 31.2010.5002
3199650
Description: A physical link failure between switches while a SHARP job is running and utilizing the link can cause one of the switches to become invalid for further SHARP jobs. This can result in either a "No resource" response for new SHARP job requests or in jobs getting stuck.
The bug fix requires SHARP version 3.2.
Keywords: SHARP
Discovered in Version: 31.2010.4010
Fixed in Version: 31.2010.4102
3245821
Description: In case of an AR group table set request, the ARN mask is flushed for a group that has an active pFRN timer.
Keywords: PFRN
Discovered in Version: 31.2010.4010
Fixed in Version: 31.2010.4102
3253717
Description: mask_force_clear_timeout timer in pFRN feature was not functional (the mask was not cleared when the timer expired).
Keywords: PFRN
Discovered in Version: 31.2010.4010
Fixed in Version: 31.2010.4102
3242209
Description: The Set pFRN MAD did not return an error on wrong inputs in the mask_clear_timer and mask_force_clear_timer fields.
Keywords: PFRN
Discovered in Version: 31.2010.4010
Fixed in Version: 31.2010.4102
3143685
Description: The switch does not return SN or PN when trying to call via mlxlink or ibdiagnet.
Keywords: SN, PN, mlxlink, ibdiagnet
Discovered in Version: 31.2010.2300
Fixed in Version: 31.2010.4010
3174239
Description: On rare occasions, traps were not properly repressed, which caused redundant traps to be sent multiple times.
Keywords: Traps
Discovered in Version: 31.2010.3118
Fixed in Version: 31.2010.4010
3002314
Description: On rare occasions, when a port is configured to mloop, a toggle may cause the link to not rise.
Keywords: Optic in Mloop
Discovered in Version: 31.2010.2110
Fixed in Version: 31.2010.3118
3127727
Description: On rare occasions, when an egress port is split in two, the egress port may get stuck due to a wrong Fast Path configuration.
Keywords: Switch Hang, Fast Path, Split
Discovered in Version: 31.2010.3004
Fixed in Version: 31.2010.3118
3082569
Description: In some traffic patterns involving small packets, the PortRcvErrors counter may mistakenly count events of local physical errors due to an internal flow in the hardware that involves link packets.
Keywords: Counters
Discovered in Version: 31.2010.2246
Fixed in Version: 31.2010.3004
3085427
Description: On rare occasions, SHARP semaphore may remain locked on a port following an event of a port link down or an application crash.
Keywords: SHARPv3
Discovered in Version: 31.2010.2036
Fixed in Version: 31.2010.3004
3011581
Description: On rare occasions, job failures with SharpError trap may be experienced as a result of previous jobs that have failed.
Keywords: SHARPv3
Discovered in Version: 31.2010.2036
Fixed in Version: 31.2010.3004
3000602
Description: After disconnecting an MMS4X00-NL* cable and connecting an Ultron cable to the same port, the port fails to link up.
Keywords: Cables
Discovered in Version: 31.2010.2110
Fixed in Version: 31.2010.2300
3060122
Description: If a link between a root switch and a non-root switch faults while a job is running, the next job run on the non-root switch may fail.
Keywords: SHARPv3
Discovered in Version: 31.2010.2036
Fixed in Version: 31.2010.2300
2923464
Description: When using the MMS4X00-NL optical module, on rare occasions a port at NDR speed may get stuck and stay in the Polling state.
Keywords: NDR, Optical Module
Discovered in Version: 31.2010.1404
Fixed in Version: 31.2010.2246
2859363
Description: When using NVIDIA Quantum-2 systems in Auto-Neg mode, NDR speed in one lane (1x) is not supported.
Keywords: Auto-Negotiation
Discovered in Version: 31.2010.1310
Fixed in Version: 31.2010.2246
3033131
Description: The number of flows changed from 2 to 1, as intended.
Keywords: SHARPv3
Discovered in Version: 31.2010.2110
Fixed in Version: 31.2010.2246
2972388
Description: Running concurrent jobs may lead to states where jobs unexpectedly terminate or get stuck.
Keywords: SHARPv3
Discovered in Version: 31.2010.2036
Fixed in Version: 31.2010.2110
2982113
Description: On rare occasions, job resource cleanup may fail.
Keywords: SHARPv3
Discovered in Version: 31.2010.2036
Fixed in Version: 31.2010.2110
2971339
Description: During high load scenarios, performance degradation may be experienced.
Keywords: SHARPv3
Discovered in Version: 31.2010.2036
Fixed in Version: 31.2010.2110
2849215
Description: On NVIDIA Quantum-2 switches, when working with MFA7U10-H0xx cables, if one of the ports in a cage is disabled at the time of initialization by user configuration, reenabling the port will require toggling the link (i.e. enable → disable → enable).
Keywords: NVIDIA Quantum-2, Cables
Discovered in Version: 31.2010.1310
Fixed in Version: 31.2010.2036
2890632
Description: On NVIDIA Quantum-2 systems, changing the Optical module rate was not allowed.
Keywords: Optical Modules
Discovered in Version: 31.2010.1310
Fixed in Version: 31.2010.2036
2885798
Description: In NVIDIA Quantum-2 systems, effective errors may occur with short Copper cable MCP4Y10-N00B.
Workaround: N/A
Discovered in Version: 31.2010.1310
Fixed in Version: 31.2010.2036
2910161
Description: In the auto-negotiation flow, toggling both sides of the port while using copper cables may, on rare occasions, cause the port to get stuck.
Keywords: Auto-Negotiation, Copper Cables
Discovered in Version: 31.2010.1310
Fixed in Version: 31.2010.2036
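The "Discovered in Version" and "Fixed in Version" fields in the entries above can be used to check whether a running firmware falls inside an issue's affected range. The helper below is a minimal sketch, not an official tool. It assumes that versions order numerically field by field and that every build from the discovered version up to, but not including, the fixed version is affected. Neither assumption is confirmed by this document, and the function names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '31.2014.2084' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))

def is_affected(running, fixed, discovered=None):
    """Return True if `running` lies in [discovered, fixed).

    Assumption: numeric field-by-field ordering. Entries that list no
    discovered version are treated as affecting every build below `fixed`.
    """
    current = parse_version(running)
    if current >= parse_version(fixed):
        return False
    if discovered is None:
        return True
    return current >= parse_version(discovered)

# Entry 4391761: discovered in 31.2014.2084, fixed in 31.2014.4044.
print(is_affected("31.2014.3062", fixed="31.2014.4044", discovered="31.2014.2084"))
print(is_affected("31.2014.4044", fixed="31.2014.4044", discovered="31.2014.2084"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes when fields differ in digit count.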