The OOB Collector Shell Script#
If remote access to the BMC cannot be provided to the nvdebug tool, you can collect logs directly from the HGX baseboard by running the nvdebug_oob_collection_bmc_shell_v1.6.sh script. To do this, transfer the script to the BMC and run it locally.
Optional CLI Arguments:
-l <log level>: Specifies the log level (L0, L1, L2, L3). Default: L1
-c <category>: Specifies the log category when using L0. Multiple categories can be specified using commas (for example, -c RedfishGetInventory,SMBPBIOperations).
-i <hmc ip[:port]>: The HMC IP address and optional port number. Default IP: 192.168.31.1. The port number is optional. If the value is not specified, the default HTTP port (80) will be used. Example with port: 192.168.31.1:8080
-b <i2c1 bus>: The I2C bus number for I2C1 communication. Default: 11
-s <i2c2 bus>: The I2C bus number for I2C2 communication. Default: 12
-o <output dir>: The output directory for storing collected logs. Default: /tmp/output
-y: Auto confirmation (no interactive prompts).
--help: Display help message.
--list-collectors: List all available collectors and groups.
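As an illustration of how these options combine, the following sketch wraps the collector in a small helper. The upload location under /tmp and the helper name run_collector are assumptions made for this example, and the I2C bus numbers in the last call are placeholders. When the script is not found, the helper only prints the command line it would have run.

```shell
#!/bin/sh
# Assumed upload location on the BMC; override SCRIPT if you copied it elsewhere.
SCRIPT="${SCRIPT:-/tmp/nvdebug_oob_collection_bmc_shell_v1.6.sh}"

# Hypothetical helper: run the collector with the given options, or print
# the command line when the script is not present (for example, off-target).
run_collector() {
    if [ -x "$SCRIPT" ]; then
        "$SCRIPT" "$@"
    else
        echo "would run: $SCRIPT $*"
    fi
}

# Default L1 collection, no prompts, explicit output directory.
run_collector -l L1 -o /tmp/output -y

# L2 collection against an HMC that listens on a non-default port.
run_collector -l L2 -i 192.168.31.1:8080 -y

# Override both I2C bus numbers (the values 3 and 4 are placeholders).
run_collector -b 3 -s 4 -y
```

Only options documented above are used. The dry-run branch is a convenience for checking a command line before the script has been transferred.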
Prerequisites#
Before running the script, ensure the following:
The HMC is up and running.
The output directory on the BMC is writable.
At least 150 MB of free disk space is available.
For SMBPBI operations, sensor polling must be disabled, and SMBPBI fencing must be relinquished to the HMC.
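Two of these prerequisites, a writable output directory and enough free space, can be checked from the BMC shell before starting a collection. The sketch below is not part of the collector. It assumes that df and awk are available on the BMC and that the 150 MB requirement applies to the filesystem holding the output directory. The function names are hypothetical, and the HMC and SMBPBI prerequisites are not covered.

```shell
#!/bin/sh
OUT_DIR="${OUT_DIR:-/tmp/output}"   # default of the -o option
MIN_FREE_KB=$((150 * 1024))         # 150 MB expressed in KiB

# Succeeds if the directory exists (or can be created) and is writable.
dir_is_writable() {
    mkdir -p "$1" 2>/dev/null && [ -w "$1" ]
}

# Prints the free space, in KiB, of the filesystem that holds the directory.
# -P keeps each filesystem on one line so the fourth column is always "available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds if at least $2 KiB are free on the filesystem that holds $1.
has_free_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

if dir_is_writable "$OUT_DIR" && has_free_space "$OUT_DIR" "$MIN_FREE_KB"; then
    echo "pre-flight OK: $OUT_DIR"
else
    echo "pre-flight FAILED: $OUT_DIR is not writable or has less than 150 MB free" >&2
fi
```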
HMC Communication#
The script communicates with the HMC using the following configuration:
Protocol: HTTP is used by default for local communication.
Authentication: No authentication is required for local HMC access.
SSL/TLS: SSL verification is disabled for local communication.
Port Configuration: Default HTTP port (80) is used if not specified in the HMC IP argument.
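The port rule above can be expressed as a short helper that turns an ip[:port] value into a base URL. The sketch also includes a reachability probe, which assumes that curl is installed on the BMC and that the HMC serves the standard Redfish service root at /redfish/v1/. Neither assumption is stated in this section, and both function names are hypothetical.

```shell
#!/bin/sh
# Turn an "ip[:port]" value into a base URL, defaulting to HTTP port 80.
# Handles IPv4 addresses and host names only.
hmc_base_url() {
    case "$1" in
        *:*) echo "http://$1" ;;
        *)   echo "http://$1:80" ;;
    esac
}

# Probe the HMC over plain HTTP without credentials.
# Returns 0 if the service root answers within 5 seconds.
hmc_is_up() {
    curl -s -f -m 5 -o /dev/null "$(hmc_base_url "$1")/redfish/v1/"
}

hmc_base_url 192.168.31.1        # prints http://192.168.31.1:80
hmc_base_url 192.168.31.1:8080   # prints http://192.168.31.1:8080
```

On the BMC, a check such as `hmc_is_up 192.168.31.1 || echo "HMC not reachable"` could precede a collection run.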
Log Levels#
nvdebug_oob_collection_bmc_shell_v1.6.sh supports the following log levels:
L0 (Manual): Requires specifying a log category with -c. Allows the selective collection of specific log types. When using L0, multiple categories can be specified using commas (for example, -c RedfishGetInventory,SMBPBIOperations).
L1 (Default): Collects basic system logs including hardware information, firmware versions, health status, telemetry, and basic sensor data.
L2: Includes all L1 logs, debug dumps, and detailed system information.
L3: Includes all L2 logs, comprehensive debug data, and verbose logging.
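The relationship between -l and -c can be captured in a small wrapper-side check. The function below is a hypothetical helper, not something the collector provides. It prints the options for a given level and rejects L0 when no category is supplied.

```shell
#!/bin/sh
# Print the collector options for a log level.
# $1 = log level, $2 = optional comma-separated category list (used for L0 only).
level_args() {
    case "${1:-}" in
        L0)
            if [ -z "${2:-}" ]; then
                echo "L0 requires at least one category (-c)" >&2
                return 1
            fi
            echo "-l L0 -c $2"
            ;;
        L1|L2|L3)
            echo "-l $1"
            ;;
        *)
            echo "unknown log level: ${1:-}" >&2
            return 1
            ;;
    esac
}

level_args L0 RedfishGetInventory,SMBPBIOperations   # prints -l L0 -c RedfishGetInventory,SMBPBIOperations
level_args L2                                        # prints -l L2
```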
Log Collection Categories#
When using L0, you must specify a log category. Supported categories include:
Redfish Categories:#
RedfishGetInventory: Collects hardware inventory information.
RedfishGetHealth: Gathers system health status.
RedfishGetTelemetry: Collects telemetry and metrics data.
RedfishGetSystemLogs: Retrieves system event logs.
RedfishGetDebugDumps: Collects various debug dumps (FPGA, EROT, and so on).
RedfishGetHMCStatus: Gathers HMC-specific status information.
SMBus Categories:#
SMBusOperations: Collects temperature, power, and staleness data using SMBus.
SMBPBI Categories:#
SMBPBIOperations: Gathers the following data from the Baseboard Management Processor Interface:
- Fencing status
- Firmware versions
- Hardware status
- PCIe link status and error counts
- Temperature telemetry
I2C Categories:#
I2CHMCBootProgress: Collects HMC boot progress information.
I2CFPGARegTable: Retrieves FPGA register data using I2C.
Virtual EEPROM Categories:#
VirtualEEPROMOperations: Retrieves data from Virtual EEPROM including:
- Temperature readings
- Power consumption
- Energy usage
- Voltage levels
- System status
Note
Virtual EEPROM operations are supported only on GB200 and newer platforms. The collection still runs on older platforms, but the logs of the attempted operations might not contain meaningful data.
Note
Use the --list-collectors option to see a complete list of available collectors and their descriptions.
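A wrapper can catch misspelled category names before an L0 run. The sketch below hard-codes only the categories named in this section, so it may lag behind the script itself, and the output of --list-collectors on the target remains the authoritative list. The function name is hypothetical.

```shell
#!/bin/sh
# Categories named in this section; --list-collectors on the target is authoritative.
KNOWN_CATEGORIES="RedfishGetInventory RedfishGetHealth RedfishGetTelemetry \
RedfishGetSystemLogs RedfishGetDebugDumps RedfishGetHMCStatus SMBusOperations \
SMBPBIOperations I2CHMCBootProgress I2CFPGARegTable VirtualEEPROMOperations"

# Succeeds if every name in a comma-separated list appears in KNOWN_CATEGORIES.
categories_are_known() {
    if [ -z "${1:-}" ]; then
        echo "no category given" >&2
        return 1
    fi
    for name in $(echo "$1" | tr ',' ' '); do
        case " $KNOWN_CATEGORIES " in
            *" $name "*) ;;
            *) echo "unknown category: $name" >&2; return 1 ;;
        esac
    done
    return 0
}

if categories_are_known "RedfishGetInventory,SMBPBIOperations"; then
    echo "categories look valid"
fi
```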