Monitoring System Hardware
You can monitor system hardware with the following commands and utilities:
Use the decode-syseeprom command to retrieve information about the switch EEPROM. If the EEPROM is writable, you can set values on the EEPROM.
The following is example decode-syseeprom command output. The output varies from switch to switch:
cumulus@switch:~$ decode-syseeprom
TlvInfo Header:
   Id String:    TlvInfo
   Version:      1
   Total Length: 629
TLV Name             Code Len Value
-------------------- ---- --- -----
Product Name         0x21  64 MSN3700C
Part Number          0x22  20 MSN3700-CSBFO
Serial Number        0x23  24 MT2043X05294
Base MAC Address     0x24   6 1C:34:DA:24:C9:00
Manufacture Date     0x25  19 10/21/2020 20:57:29
Device Version       0x26   1 1
MAC Addresses        0x2A   2 254
Manufacturer         0x2B   8 Mellanox
Vendor Extension     0xFD  52 0x00 0x00 0x81 0x19 0x00 0x2E 0x00 0x02 0x07 0x98 0x00 0x00 0x31 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07 0x07
Platform Name        0x28  64 x86_64-mlnx_msn3700C-r0
ONIE Version         0x29  23 2019.11-5.2.0020-115200
CRC-32               0xFE   4 0x11D0954D
(checksum valid)
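If you want to consume this output from a script, the TLV table can be turned into a dictionary. The following is a minimal sketch, not part of Cumulus Linux: the function name `parse_syseeprom` and the regular expression are assumptions based only on the column layout of the example output above, and other switches may print rows that need different handling.

```python
import re

# One TLV row: name, hex type code, decimal length, then the value.
# This pattern is an assumption derived from the example output layout.
TLV_ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<code>0x[0-9A-Fa-f]{2})\s+(?P<len>\d+)\s+(?P<value>.*)$"
)


def parse_syseeprom(text):
    """Return {tlv_name: value} for each TLV row in decode-syseeprom text output."""
    fields = {}
    for line in text.splitlines():
        match = TLV_ROW.match(line.strip())
        if match:
            fields[match.group("name")] = match.group("value").strip()
    return fields


# Shortened copy of the example output, used here in place of a live command.
SAMPLE = """\
TLV Name             Code Len Value
-------------------- ---- --- -----
Product Name         0x21  64 MSN3700C
Serial Number        0x23  24 MT2043X05294
Base MAC Address     0x24   6 1C:34:DA:24:C9:00
Manufacture Date     0x25  19 10/21/2020 20:57:29
"""

if __name__ == "__main__":
    info = parse_syseeprom(SAMPLE)
    print(info["Serial Number"])
```

Header lines, the separator row, and the trailing checksum line do not match the row pattern, so they are skipped.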
The decode-syseeprom command includes the following options:
|Displays the help message and exits.|
|Prints the base MAC address for switch interfaces.|
|Prints the number of MAC addresses allocated for the switch interfaces.|
|Sets the EEPROM content (if the EEPROM is writable). You can provide arguments in the command line in a comma separated list in the form |
NVIDIA Spectrum switches do not support this option.
|Displays JSON output.|
|Prints the target EEPROM information (board, psu2, psu1).|
|Prints the device serial number.|
|Prints the base MAC address for the management interfaces.|
|Clears and initializes the board EEPROM cache.|
Use the dmidecode command to retrieve hardware configuration information populated in the BIOS.
You can use apt-get to install the lshw program on the switch, which also retrieves hardware configuration information.
The smond daemon monitors system units such as power supplies and fans, updates the corresponding LEDs, and logs changes in state. The cpld registers detect changes in system unit state. smond reads all sources through these registers to determine the health of each unit and update the system LEDs.
Run the sudo smonctl command to display sensor information for the various system units:
cumulus@switch:~$ sudo smonctl
Board                                       :  OK
Fan                                         :  OK
PSU1                                        :  OK
PSU2                                        :  BAD
Temp1     (Networking ASIC Die Temp Sensor ):  OK
Temp10    (Right side of the board         ):  OK
Temp2     (Near the CPU (Right)            ):  OK
Temp3     (Top right corner                ):  OK
Temp4     (Right side of Networking ASIC   ):  OK
Temp5     (Middle of the board             ):  OK
Temp6     (P2020 CPU die sensor            ):  OK
Temp7     (Left side of the board          ):  OK
Temp8     (Left side of the board          ):  OK
Temp9     (Right side of the board         ):  OK
When the switch is not powered on,
smonctl shows the PSU status as BAD instead of POWERED OFF or NOT DETECTED. This is a known limitation.
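To act on the smonctl summary from a script, you can map each unit to its status and list the units that are not OK. The following is a rough sketch that assumes the `name (description): status` layout of the example output; `parse_smonctl` and `failing_units` are hypothetical helper names, and the input is expected to be the command output only, without the shell prompt line.

```python
def parse_smonctl(text):
    """Map each unit name to its status from smonctl summary output."""
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # The status is whatever follows the last colon on the line.
        left, state = line.rsplit(":", 1)
        # Drop the optional "(description)" that follows temperature sensor names.
        name = left.split("(", 1)[0].strip()
        if name:
            status[name] = state.strip()
    return status


def failing_units(status):
    """Return the sorted names of all units whose status is not OK."""
    return sorted(name for name, state in status.items() if state != "OK")


# Shortened copy of the example output, used here in place of a live command.
SAMPLE = """\
Board                                       :  OK
Fan                                         :  OK
PSU1                                        :  OK
PSU2                                        :  BAD
Temp1     (Networking ASIC Die Temp Sensor ):  OK
Temp2     (Near the CPU (Right)            ):  OK
"""

if __name__ == "__main__":
    print(failing_units(parse_smonctl(SAMPLE)))
```

Because of the limitation described above, a BAD status for a PSU does not by itself distinguish a failed supply from one that has no power.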
The smonctl command includes the following options:
|Displays data for the specified sensor.|
|Displays detailed hardware sensors data.|
For more information, read
man smond and
Use the sensors command to monitor the health of your switch hardware, such as power, temperature, and fan speeds. This command executes
Even though you can use the sensors command to monitor the health of your switch hardware, the smond daemon is the recommended method for monitoring hardware health. See smond Daemon.
cumulus@switch:~$ sensors
tmp75-i2c-6-48
Adapter: i2c-1-mux (chan_id 0)
temp1:        +39.0 C  (high = +75.0 C, hyst = +25.0 C)

tmp75-i2c-6-49
Adapter: i2c-1-mux (chan_id 0)
temp1:        +35.5 C  (high = +75.0 C, hyst = +25.0 C)

ltc4215-i2c-7-40
Adapter: i2c-1-mux (chan_id 1)
in1:         +11.87 V
in2:         +11.98 V
power1:       12.98 W
curr1:        +1.09 A

max6651-i2c-8-48
Adapter: i2c-1-mux (chan_id 2)
fan1:        13320 RPM  (div = 1)
fan2:        13560 RPM
- Output from the sensors command varies depending upon the switch.
- If you only plug in one PSU, the fan is at maximum speed.
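As an illustration of reading the sensors text output programmatically, the sketch below extracts temperature readings per chip and compares them with the reported high limit. It assumes the layout of the example output above (a chip name line without a colon, followed by `label: value` lines); `parse_temperatures` and `over_limit` are hypothetical names, and because output varies by switch, the pattern might need adjusting.

```python
import re

# temp line such as "temp1: +39.0 C (high = +75.0 C, hyst = +25.0 C)".
# The optional \u00b0 allows for a degree sign before the C.
TEMP_LINE = re.compile(
    r"^(?P<label>temp\d+):\s*(?P<value>[+-]?\d+(?:\.\d+)?)\s*\u00b0?C"
    r"(?:\s*\(high\s*=\s*(?P<high>[+-]?\d+(?:\.\d+)?)\s*\u00b0?C)?"
)


def parse_temperatures(text):
    """Return (chip, label, value, high) tuples; high is None if not reported."""
    readings = []
    chip = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TEMP_LINE.match(line)
        if match:
            high = match.group("high")
            readings.append((
                chip,
                match.group("label"),
                float(match.group("value")),
                float(high) if high is not None else None,
            ))
        elif ":" not in line:
            # A line without a colon starts a new chip section.
            chip = line
    return readings


def over_limit(readings):
    """Return the readings whose value is at or above the reported high limit."""
    return [r for r in readings if r[3] is not None and r[2] >= r[3]]


# Shortened copy of the example output, used here in place of a live command.
SAMPLE = """\
tmp75-i2c-6-48
Adapter: i2c-1-mux (chan_id 0)
temp1:        +39.0 C  (high = +75.0 C, hyst = +25.0 C)

tmp75-i2c-6-49
Adapter: i2c-1-mux (chan_id 0)
temp1:        +35.5 C  (high = +75.0 C, hyst = +25.0 C)
"""

if __name__ == "__main__":
    for reading in parse_temperatures(SAMPLE):
        print(reading)
```

This kind of parsing suits ad hoc checks; smond remains the recommended way to monitor hardware health.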
The following table shows the
sensors command options.
|Specify a configuration file; use |
|Execute set statements in the configuration file (root only); |
|Show temperatures in degrees Fahrenheit.|
|Do not show the adapter for each chip.|
Generate bus statements for
|Generate raw output.|
|Generate json output.|
|Show the program version.|
Cumulus Linux includes a simplified version of the
wd_keepalive(8) daemon instead of the one in the standard
watchdog Debian package.
wd_keepalive writes to a file called
/dev/watchdog periodically (at least one time per minute) to prevent the switch from resetting. Each write delays the reboot time by another minute. After one minute of inactivity, where
wd_keepalive does not write to
/dev/watchdog, the switch resets itself.
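The keepalive behavior described above can be modeled with a small timer. This is a toy simulation only, not the wd_keepalive implementation: the `WatchdogTimer` class is hypothetical, time is passed in as plain numbers of seconds, and the 60-second default mirrors the one-minute timeout mentioned in the text. It does not touch /dev/watchdog.

```python
class WatchdogTimer:
    """Toy model of a watchdog with a fixed timeout (hypothetical class)."""

    def __init__(self, timeout=60.0, now=0.0):
        self.timeout = timeout
        # The reset fires at this time unless a keepalive write arrives first.
        self.deadline = now + timeout

    def expired(self, now):
        """Return True once the deadline has passed without a keepalive write."""
        return now >= self.deadline

    def kick(self, now):
        """Model one keepalive write: move the deadline a full timeout ahead."""
        if self.expired(now):
            raise RuntimeError("watchdog already fired; the switch would have reset")
        self.deadline = now + self.timeout


if __name__ == "__main__":
    wd = WatchdogTimer(timeout=60.0, now=0.0)
    wd.kick(now=30.0)            # a write at t=30 s moves the deadline to t=90 s
    print(wd.expired(now=89.0))  # still within the timeout window
    print(wd.expired(now=90.0))  # a full minute with no write
```

In the model, every write before the deadline restarts the full timeout, which is why a daemon that writes at least once per timeout period keeps the switch from resetting.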
Cumulus Linux enables the watchdog by default, which starts when you boot the switch (before
To disable the watchdog, disable and stop the
cumulus@switch:~$ sudo systemctl disable wd_keepalive ; sudo systemctl stop wd_keepalive
You can modify the settings for the watchdog, such as the timeout and the scheduler priority, in the
/etc/watchdog.conf configuration file.