NVIDIA NVOS User Manual for InfiniBand Switches v25.02.5019 (2025-09-08)
Link Diagnostic Per Port

When debugging a system, it is important to be able to quickly identify the root cause of a problem. The diagnostic commands provide insight into the physical layer components, showing information such as cable status (plugged/unplugged) or whether Auto-Negotiation has failed.

PHY Firmware Indication

Link Diagnostic Indication

Code        Firmware PHY Indication (0–1023)
0           No issue observed
1           Port is closed by command
2–4         Auto Negotiation failure
5–8         Link training failure
9–13        Logical mismatch between link partners
14          Remote fault received
15          Bad signal integrity
16          Compliance code mismatch (protocol mismatch between cable and port)
17          Bad signal integrity
18          Internal error
19          Internal error
22          Internal error
23          Internal error
24–32       Cable compliance code mismatch (protocol mismatch between cable and port)
34          Speed degradation
35          Speed degradation
38          Auto Negotiation failure
39          Auto Negotiation failure
40          VPI protocols do not match
41          Port is closed, module cannot be set to the enabled rate
42          Bad signal integrity
48          Bad signal integrity
49          Bad signal integrity
50          Internal error
52          Bad signal integrity
55          Internal error
56          module_lanes_frequency_not_synced
57          Signal not detected
60          No partner detected for a long time
128         Troubleshooting in process
1023        Information not available

Code        Firmware Management Issues (1024–2047)
1024        Cable is unplugged
1025        Long range for non-NVIDIA cable/module
1026        Bus stuck (I2C data or clock shorted)
1027        Bad/unsupported EEPROM
1028        Part number list
1029        Unsupported cable
1030        Module temperature shutdown
1031        Shorted cable
1032        Power budget exceeded
1033        Management forced the port down
1034        Module is disabled by command
1035        System power is exceeded, therefore the module is powered off
1036        Module’s PMD type is not enabled (see PMTPS)
1040        PCIe system power slot exceeded
1042        Module state machine fault
1043–1046   Module’s stamping speed degeneration
1047, 1048  Module’s DataPath FSM fault
1050–1053   Module boot error
1054        Module forced to low power by command
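
When post-processing diagnostic output in a script, it can help to map a code back to its table and description. The following is a minimal sketch only: the function names `code_category` and `describe`, and the `INDICATIONS` list, are hypothetical and not part of NVOS. The list holds just a few rows copied from the tables above, so it would need to be extended for real use.

```python
# Illustrative subset of the indication tables above (inclusive code ranges).
# This list and the helper names below are assumptions, not an NVOS API.
INDICATIONS = [
    (0, 0, "No issue observed"),
    (2, 4, "Auto Negotiation failure"),
    (5, 8, "Link training failure"),
    (14, 14, "Remote fault received"),
    (1024, 1024, "Cable is unplugged"),
    (1029, 1029, "Unsupported cable"),
    (1030, 1030, "Module temperature shutdown"),
    (1050, 1053, "Module Boot Error"),
]


def code_category(code: int) -> str:
    """Return which table a code belongs to, based on the documented ranges."""
    if 0 <= code <= 1023:
        return "Firmware PHY Indication"
    if 1024 <= code <= 2047:
        return "Firmware Management Issues"
    raise ValueError(f"code {code} is outside the documented range 0-2047")


def describe(code: int) -> str:
    """Return 'category: description' for a code, or note that it is unlisted."""
    category = code_category(code)
    for low, high, text in INDICATIONS:
        if low <= code <= high:
            return f"{category}: {text}"
    return f"{category}: code {code} is not listed in this sketch"


if __name__ == "__main__":
    for sample in (3, 1024, 1051, 20):
        print(sample, "->", describe(sample))
```

Storing each row as an inclusive range keeps entries such as 2–4 or 1050–1053 as a single line, mirroring how the tables group them.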


Link Down Reason Indication

Code        Link Down Reason Indication
0           No_link_down_indication
1           Unknown_reason
2           Hi_SER_or_Hi_BER
3           Block_Lock_loss
4           Alignment_loss
5           FEC_sync_loss
6           PLL_lock_loss
7           FIFO_overflow
8           false_SKIP_condition
9           Minor_Error_threshold_exceeded
10          Physical_layer_retransmission_timeout
11          Heartbeat_errors
12          Link_Layer_credit_monitoring_watchdog
13          Link_Layer_integrity_threshold_exceeded
14          Link_Layer_buffer_overrun
15          Down_by_outband_command_with_healthy_link
16          Down_by_outband_command_for_link_with_hi_ber
17          Down_by_inband_command_with_healthy_link
18          Down_by_inband_command_for_link_with_hi_ber
19          Down_by_verification_GW
20          Received_Remote_Fault
21          Received_TS1
22          Down_by_management_command
23          Cable_was_unplugged
24          Cable_access_issue
25          Cable_Thermal_shutdown
26          Current_issue
27          Power_budget
28          Fast_recovery_raw_ber
29          Fast_recovery_effective_ber
30          Fast_recovery_symbol_ber
31          Fast_recovery_credit_watchdog
32          Peer_side_down_to_sleep_state
33          Peer_side_down_to_disable_state
34          Peer_side_down_to_disable_and_port_lock
35          Peer_side_down_due_to_thermal_event
36          Peer_side_down_due_to_force_event
37          Peer_side_down_due_to_reset_event
38          Reset_no_power_cycle
39          Fast_recovery_tx_plr_trigger
40          Down_due_to_HW_force_event
41          Down_due_to_thermal_event
42          L1_exit_failure
43          too_many_link_error_recoveries
44          Down_due_to_contain_mode
45          BW_loss_threshold_exceeded
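
Because the link down reason codes are contiguous integers with identifier-style names, they map naturally onto an enumeration. The sketch below assumes a script that receives the numeric code from some diagnostic output. The `LinkDownReason` class and `link_down_reason_name` helper are hypothetical names, not an NVOS interface, and only a handful of the codes above are included for brevity.

```python
from enum import IntEnum


class LinkDownReason(IntEnum):
    """Illustrative subset of the link down reason codes listed above."""
    No_link_down_indication = 0
    Unknown_reason = 1
    Hi_SER_or_Hi_BER = 2
    Block_Lock_loss = 3
    Received_Remote_Fault = 20
    Down_by_management_command = 22
    Cable_was_unplugged = 23
    Cable_Thermal_shutdown = 25
    BW_loss_threshold_exceeded = 45


def link_down_reason_name(code: int) -> str:
    """Return the reason name for a code, or a placeholder if not in the subset."""
    try:
        return LinkDownReason(code).name
    except ValueError:
        # Codes missing from this sketch (or newer than it) end up here.
        return f"unlisted_code_{code}"


if __name__ == "__main__":
    for sample in (0, 23, 45, 99):
        print(sample, "->", link_down_reason_name(sample))
```

Returning a placeholder instead of raising keeps log parsing running when a firmware release reports a code that the script does not yet know.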


Link Diagnostic Commands
