System Monitoring

NVIDIA MetroX-3 XC TQ8400 Long-haul 1U Appliance User Manual

Front Panel Monitoring Interfaces

| Index | Indicator or Button | Icon | Description |
|---|---|---|---|
| 1 | Power button | power-button-icon.png | Indicates if the system is powered on or off. Press the power button to manually power on or off the system. |

Warning

Press the power button to shut down the ACPI-compliant operating system.

For a graceful shutdown of the system, use the relevant CLI command. To force a shutdown of the appliance, hold the button down until the appliance turns off. The LED of the button displays the system’s power status.

| Index | Indicator or Button | Icon | Description |
|---|---|---|---|
| 1 | System health and system ID indicator | system-health-and-system-id-indicator-icon.png | Indicates the system health |
| 2 | System status LEDs | See System Status LEDs below | Indicates the status of the system |

System Status LEDs

The system status indicators are located on the front left-side panel.

| Icon | Description | Condition | Corrective Action |
|---|---|---|---|
| drive-indicator.png | Drive indicator | The indicator turns solid amber if there is a drive error | Check the System Event Log to determine if the drive has an error. Run the appropriate Online Diagnostics test. Restart the system and run embedded diagnostics (ePSA). If the drives are configured in a RAID array, restart the system and enter the host adapter configuration utility program. |
| temperature-indicator.png | Temperature indicator | The indicator turns solid amber if the system experiences a thermal error (for example, the ambient temperature is out of range or there is a fan failure) | Ensure that none of the following conditions exist: a cooling fan has been removed or has failed; the system cover, air shrouds, or back filler bracket has been removed; the ambient temperature is too high; external airflow is obstructed. |
| electrical-indicator.png | Electrical indicator | The indicator turns solid amber if the system experiences an electrical error (for example, voltage out of range, or a failed power supply unit (PSU) or voltage regulator) | Check the System Event Log or system messages for the specific issue. If it is due to a problem with the PSU, check the LED on the PSU. Reseat the PSU. |
| memory-indicator.png | Memory indicator | The indicator turns solid amber if a memory error occurs | Check the System Event Log or system messages for the location of the failed memory. Reseat the memory module. |
| pcie-indicator.png | PCIe indicator | The indicator turns solid amber if a PCIe card experiences an error | Restart the system. Update any required drivers for the PCIe card. Reinstall the card. |

System Health and System ID Indicator Codes

The system health and system ID indicator is located on the left control panel of the system.

| System Health and System ID Indicator Code | Condition |
|---|---|
| Solid blue | Indicates that the system is powered on and healthy, and that system ID mode is not active. Press the system health and system ID button (system-id-button.png) to switch to system ID mode. |
| Blinking blue | Indicates that the system ID mode is active. Press the system health and system ID button to switch to system health mode. |
| Solid amber | Indicates that the system is in fail-safe mode. |
| Blinking amber | Indicates that the system is experiencing a fault. Check the System Event Log for specific error messages. |
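The indicator codes above can be read as a small lookup table. The following is a minimal Python sketch of that table, not part of the appliance software: the names `HealthLed`, `HEALTH_LED_MEANING`, and `describe_health_led` are hypothetical, and the observed color and blink state are assumed to be entered by an operator.

```python
from enum import Enum


class HealthLed(Enum):
    """The four documented states of the system health and system ID indicator."""
    SOLID_BLUE = "solid blue"
    BLINKING_BLUE = "blinking blue"
    SOLID_AMBER = "solid amber"
    BLINKING_AMBER = "blinking amber"


# Meanings summarized from the indicator code table in this section.
HEALTH_LED_MEANING = {
    HealthLed.SOLID_BLUE: "powered on and healthy; system ID mode not active",
    HealthLed.BLINKING_BLUE: "system ID mode active",
    HealthLed.SOLID_AMBER: "fail-safe mode",
    HealthLed.BLINKING_AMBER: "fault; check the System Event Log",
}


def describe_health_led(color: str, blinking: bool) -> str:
    """Return the documented meaning for an observed color and blink state.

    Raises ValueError for a combination the table does not list.
    """
    pattern = "blinking" if blinking else "solid"
    state = HealthLed(f"{pattern} {color.lower()}")
    return HEALTH_LED_MEANING[state]
```

For example, `describe_health_led("amber", True)` returns the fault entry, which is the case where the System Event Log should be checked.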

The LEDs on the drive carrier indicate the state of each drive. Each drive carrier has two LEDs: an activity LED (green) and a status LED (bicolor, green/amber). The activity LED blinks whenever the drive is accessed.

SSD Indicators

image2023-1-19_14-40-4.png

| Index | Description |
|---|---|
| 1 | Drive status LED indicator |
| 2 | Drive activity LED indicator |

The following table lists the drive indicator codes:

| Drive Status Indicator Code | Condition |
|---|---|
| Blinks green twice per second | Indicates that the drive is being identified or preparing for removal |
| Off | Indicates that the drive is ready for removal. Warning: The drive status indicator remains off until all drives are initialized after the system is powered on. Drives are not ready for removal during this time. |
| Blinks green, amber, and then powers off | Indicates that there is an unexpected drive failure |
| Blinks amber four times per second | Indicates that the drive has failed |
| Blinks green slowly | Indicates that the drive is rebuilding |
| Solid green | Indicates that the drive is online |
| Blinks green for three seconds, amber for three seconds, and then powers off after six seconds | Indicates that the rebuild has stopped |
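The drive status codes, together with the warning about the "Off" state, can be captured in a short Python sketch. This is an illustration only: the dictionary keys are shortened labels chosen here, and `safe_to_remove` and its `drives_initialized` flag are hypothetical names, assuming the operator knows whether drive initialization has finished after power-on.

```python
# Shortened pattern labels (chosen for this sketch) mapped to the documented condition.
DRIVE_STATUS_CODES = {
    "blinks green twice per second": "being identified or preparing for removal",
    "off": "ready for removal",
    "blinks green, amber, then off": "unexpected drive failure",
    "blinks amber four times per second": "drive has failed",
    "blinks green slowly": "rebuilding",
    "solid green": "online",
    "green 3 s, amber 3 s, then off": "rebuild has stopped",
}


def safe_to_remove(pattern: str, drives_initialized: bool) -> bool:
    """Return True only when the LED is off AND initialization has completed.

    The status LED also stays off while drives initialize after power-on,
    so "off" by itself is not enough to conclude the drive can be removed.
    """
    return pattern == "off" and drives_initialized
```

The second argument exists because the same LED state has two readings, depending on whether the system has finished initializing its drives.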

RJ-45 Remote Management Port

The remote management port is designed for secure local and remote server management and helps IT administrators deploy, update, and monitor the NVIDIA® MetroX-3 XC Appliance.

image2023-1-4_13-21-7.png

RJ-45 Management Ports eth0-eth1

These four RJ-45 ports are located on the rear of the appliance. The eth0-eth1 and remote management interfaces are preconfigured for DHCP, and the initial hostname is MetroX3xc-1 (the MAC address appears on the pull-tab label), so their IP addresses can be obtained from the DHCP server. If no DHCP server is available, connect through a serial cable and configure static IP addresses for eth0 and the remote management interface.

image2023-4-13_10-59-40.png

Warning

Configuring the appliance via the serial port is required only when the out-of-the-box DHCP configuration for eth0 cannot be used, that is, when there is no DHCP server in the management network. In that case, use the serial port to configure a static IP address on eth0.

Warning

The NIC#1 Ethernet connector connects to an Ethernet switch. The switch must be configured for 100M/1G auto-negotiation.


ConnectX-7 OSFP Ports

These two OSFP ports are located on the rear of the appliance. Connect them to an InfiniBand (IB) switch in the fabric. Connecting them to two different switches is recommended for redundancy. The appliance can be connected to only one IB fabric.

RJ-45 Ethernet Connector for Remote Management

The appliance has several Ethernet management interfaces. The primary management interface is eth0. An additional interface connects to a remote management controller; it usually connects to the same management network as eth0.

With the out-of-the-box DHCP settings, the default hostname of the appliance (over eth0) is "MetroX3xc-1". The MAC address of eth0 is printed on the pull-tab and can be configured in the DHCP server.

To use the remote management controller with DHCP, free-range IP allocation must be enabled on the DHCP server. A static IP address for the remote management interface can be configured via the CLI (the chassis remote-management ip command).

Warning

Configuration via a serial port is only required if you want to use a static IP address and not the out-of-the-box DHCP setting for eth0. Otherwise, an IP is assigned by the DHCP server, and you can log into the CLI over LAN.
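Before attempting to log in over the LAN, it can help to check whether the default hostname resolves. The sketch below uses Python's standard `socket.gethostbyname`. It assumes the DHCP server registers the appliance's hostname with the local DNS, which this manual does not state. `resolve_appliance` and its `resolver` parameter are hypothetical names introduced here.

```python
import socket
from typing import Callable, Optional

DEFAULT_HOSTNAME = "MetroX3xc-1"  # default hostname given in this manual


def resolve_appliance(
    hostname: str = DEFAULT_HOSTNAME,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Return the IPv4 address the hostname resolves to, or None if it does not.

    A None result does not prove the appliance is unreachable; it only means
    name resolution failed, for example because the DHCP server does not
    update DNS (an assumption of this sketch) or no DHCP server is present.
    """
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


if __name__ == "__main__":
    address = resolve_appliance()
    if address is None:
        print("Hostname did not resolve; a static IP via the serial port may be needed.")
    else:
        print(f"{DEFAULT_HOSTNAME} resolves to {address}")
```

The resolver is passed in as a parameter so the function can be exercised without a live network.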

Warning

The NIC#1 Ethernet connector connects to an Ethernet switch. The switch must be configured for 100M/1G auto-negotiation.


USB Interface

There are two USB connectors. They can be used to install software or firmware upgrades from a USB memory device. The connectors are USB 2.0 compliant. Various upload and download operations are also supported over USB through the CLI.

image2023-1-4_13-28-45.png


PSU Status Indicators

| Index | Description |
|---|---|
| 1 | AC PSU handle |
| 2 | Socket |
| 3 | Release latch |

Each power supply unit (PSU) has one built-in fan and a single two-color LED on its right side that indicates the internal status of the unit.

The following table presents the AC PSU status indicator codes:

| Power Indicator Codes | Condition |
|---|---|
| Green | Indicates that a valid power source is connected to the PSU and the PSU is operational |
| Blinking amber | Indicates an issue with the PSU |
| Not powered on | Indicates that the power is not connected to the PSU |
| Blinking green | Indicates that the firmware of the PSU is being updated. Important: Do not disconnect the power cord or unplug the PSU when updating firmware. If firmware update is interrupted, the PSUs will not function. |
| Blinking green and powers off | When hot-plugging a PSU, it blinks green five times at a rate of 4 Hz and powers off. This indicates a PSU mismatch due to efficiency, feature set, health status, or supported voltage. Important: If two PSUs are used, they must be of the same type and have the same maximum output power. Important: When correcting a PSU mismatch, replace the PSU with the blinking indicator. Swapping the PSU to make a matched pair can result in an error condition and an unexpected system shutdown. To change from a high output configuration to a low output configuration or vice versa, you must power off the system. Important: When two identical PSUs receive different input voltages, they can output different wattage, and trigger a mismatch. |
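The matching rule and the blink timing described above can be illustrated with a minimal Python sketch. The `Psu` record, its field names, and the model strings and wattages used in any call are hypothetical. The check covers only type and maximum output power, not the input-voltage case mentioned in the last note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Psu:
    model: str         # hypothetical identifier for the PSU type
    max_output_w: int  # maximum output power in watts


def is_matched_pair(a: Psu, b: Psu) -> bool:
    """Two PSUs match when they are the same type with the same maximum output power."""
    return a.model == b.model and a.max_output_w == b.max_output_w


def mismatch_blink_seconds(blinks: int = 5, rate_hz: float = 4.0) -> float:
    """Duration of the mismatch signal: number of blinks divided by blink rate."""
    return blinks / rate_hz
```

With the documented values of five blinks at 4 Hz, the mismatch signal lasts 5 / 4 = 1.25 seconds before the PSU powers off, so it is easy to miss if the LED is not being watched during hot-plugging.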

The following table presents the DC PSU status indicator codes:

| Power Indicator Codes | Condition |
|---|---|
| Green | Indicates that a valid power source is connected to the PSU, and the PSU is operational |
| Blinking amber | Indicates an issue with the PSU |
| Not powered on | Indicates that the power is not connected to the PSU |
| Blinking green | When hot-plugging a PSU, it blinks green five times at a rate of 4 Hz and powers off. This indicates a PSU mismatch due to efficiency, feature set, health status, or supported voltage. |

NIC Activity LED Indicators

Each NIC on the back of the system has indicators that provide information about the activity and link status. The activity LED indicator indicates if data is flowing through the NIC, and the link LED indicator indicates the speed of the connected network.

NIC Activity LEDs

nic-activity.png

| Index | Description |
|---|---|
| 1 | Link LED indicator |
| 2 | Activity LED indicator |

The following table lists the NIC indicator codes:

| NIC Indicator Code | Condition |
|---|---|
| Link and activity indicators are off | Indicates that the NIC is not connected to the network |
| Link indicator is green, and activity indicator is blinking green | Indicates that the NIC is connected to a valid network at its maximum port speed, and data is being sent or received |
| Link indicator is amber, and activity indicator is blinking green | Indicates that the NIC is connected to a valid network at less than its maximum port speed, and data is being sent or received |
| Link indicator is green, and activity indicator is off | Indicates that the NIC is connected to a valid network at its maximum port speed, and data is not being sent or received |
| Link indicator is amber, and activity indicator is off | Indicates that the NIC is connected to a valid network at less than its maximum port speed, and data is not being sent or received |
| Link indicator is blinking green, and activity indicator is off | Indicates that the NIC identity is enabled through the NIC configuration utility |
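The NIC codes combine two independent readings: the link LED color reflects the connection speed, and the activity LED reflects traffic. A minimal Python sketch of that decoding follows. The function name `describe_nic` and the lowercase string labels are assumptions made for this illustration.

```python
def describe_nic(link: str, activity: str) -> str:
    """Decode the link and activity LED states into the documented condition.

    link:     "off", "green", "amber", or "blinking green"
    activity: "off" or "blinking green"
    Raises ValueError for combinations the table does not list.
    """
    if link == "off" and activity == "off":
        return "not connected to the network"
    if link == "blinking green" and activity == "off":
        return "NIC identity enabled through the NIC configuration utility"

    # Link color encodes speed; activity LED encodes traffic.
    speed = {
        "green": "maximum port speed",
        "amber": "less than maximum port speed",
    }
    traffic = {
        "blinking green": "data is being sent or received",
        "off": "data is not being sent or received",
    }
    if link in speed and activity in traffic:
        return f"connected at {speed[link]}; {traffic[activity]}"
    raise ValueError(f"undocumented LED combination: link={link!r}, activity={activity!r}")
```

Splitting the lookup into a speed part and a traffic part mirrors how the table is organized, so the four connected states do not need to be listed one by one.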

The appliance supports a single airflow pattern: from the front (hard-drive) side to the back (power-supply) side.

© Copyright 2023, NVIDIA. Last updated on Sep 3, 2023.