Troubleshooting

Problem Indicator

Symptoms

Cause and Solution

LEDs

System Status LED is blinking for more than 5 minutes

Cause: The MLNX-OS software did not boot properly, and only the firmware is running.

Solution: Connect to the system via the console port and check the software status. If the MLNX-OS software did not load properly, you might need to contact a Field Application Engineer (FAE).

System Status LED is red

Cause:

  • Critical system fault (CPU error, bad firmware)

  • Over temperature

Solution:

  • Check environmental conditions (room temperature)

Fan Status LED is red

Cause: Possible fan issue

Solution:

  • Check that the fan is fully inserted and nothing blocks the airflow

  • Replace the fan FRU if needed

PSU Status LED is red

Cause: Possible PSU issue

Solution:

  • Check/replace the power cable

  • Replace the PSU if needed

The activity LED does not light up (InfiniBand)

Solution: Make sure that a Subnet Manager (SM) is running in the fabric.

System boot failure

The last software upgrade failed on x86-based systems

Solution:

  • Connect the RS232 connector (CONSOLE) to a laptop.

  • Push the system’s reset button.

  • Press the ArrowUp or ArrowDown key during the system boot. The GRUB menu appears. For example:


    Default image: 'SX_X86_64 SX_3.4.0008 2014-11-10 20:07:51 x86_64'
    Press enter to boot this image, or any other key for boot menu
    Booting default image in 3 seconds.
    Boot Menu
    -------------------------------------------------------------------
    0: SX_X86_64 SX_3.4.0008 2014-11-10 20:07:51 x86_64
    1: SX_X86_64 SX_3.4.0007 2014-10-23 17:27:34 x86_64
    -------------------------------------------------------------------
    Use the ArrowUp and ArrowDown keys to select which entry is highlighted.
    Press enter to boot the selected image or 'p' to enter a password to unlock the next set of features.
    Highlighted entry is 0:

  • Use the arrow keys to highlight the previous image, then press Enter to boot it.
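If the console session is captured to a log, the boot menu entries can be extracted with a short script. The following is a minimal sketch in Python, assuming the menu lines follow the `<index>: <image name>` pattern shown in the example output above. `parse_boot_menu` and `SAMPLE` are hypothetical names used for illustration and are not part of MLNX-OS.

```python
import re

# Matches menu lines such as "0: SX_X86_64 SX_3.4.0008 2014-11-10 20:07:51 x86_64".
# The pattern is an assumption based on the sample GRUB output, not a documented format.
ENTRY_RE = re.compile(r"^\s*(\d+):\s+(\S.*?)\s*$")


def parse_boot_menu(text):
    """Return {index: image description} for each boot menu entry found in text."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries[int(match.group(1))] = match.group(2)
    return entries


# Excerpt of the example menu from this section.
SAMPLE = """\
Boot Menu
-------------------------------------------------------------------
0: SX_X86_64 SX_3.4.0008 2014-11-10 20:07:51 x86_64
1: SX_X86_64 SX_3.4.0007 2014-10-23 17:27:34 x86_64
-------------------------------------------------------------------
Highlighted entry is 0:
"""

if __name__ == "__main__":
    for index, image in sorted(parse_boot_menu(SAMPLE).items()):
        print(index, image)
```

In the example menu, entry 0 is the default image and entry 1 is the previously installed one, so a listing like this helps confirm which index to highlight before pressing Enter.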

System date and time reset

The date and time settings were reset to the default configuration following an AC power loss

Cause:

Date and time are reconfigured by the operating system.

Solution:

  • To set the system’s date and time manually, run:


    # clock set <hh:mm:ss> [<yyyy/mm/dd>]

  • To verify the configured clock settings, run:


    # show clock


  • It is recommended to enable time synchronization with a Network Time Protocol (NTP) server. To do so, run:


    > enable
    # config terminal
    (config)# ntp server <ntp server ip address>

  • To verify that NTP time synchronization is enabled, run:


    # show ntp
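When the clock has to be set manually on several systems, the `clock set` line can be generated from a reference time instead of being typed by hand. The following is a minimal sketch in Python, assuming the `<hh:mm:ss> [<yyyy/mm/dd>]` argument format shown above. `clock_set_command` is a hypothetical helper, and sending the command to the switch is deliberately left out.

```python
from datetime import datetime


def clock_set_command(when):
    """Build a 'clock set' CLI line from a datetime as hh:mm:ss yyyy/mm/dd."""
    # The argument order (time first, optional date second) follows the
    # syntax given in this section; nothing else about the CLI is assumed.
    return when.strftime("clock set %H:%M:%S %Y/%m/%d")


if __name__ == "__main__":
    print(clock_set_command(datetime(2023, 9, 3, 14, 5, 0)))
    # prints: clock set 14:05:00 2023/09/03
```

The generated line is meant to be pasted at the `#` prompt, followed by `show clock` to confirm the result.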


For full configuration instructions, please refer to the Onyx User Manual.

© Copyright 2023, NVIDIA. Last updated on Sep 3, 2023.