Advanced Usage Guide#
Learn advanced NVDebug techniques for complex environments and automation.
Execution Modes#
Remote Mode (Default):
Run NVDebug from a remote machine that has access to both the BMC and the Host system.
# Basic remote mode
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM>
# Remote mode with host access
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> \
-I <HOST_IP> -U <HOST_USER> -H <HOST_PASS>
Local Mode:
Run NVDebug directly on the Host machine. Access to the BMC is optional.
# Local mode with BMC connection
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> --local
# Local mode without BMC connection (host logs only)
nvdebug -t <PLATFORM> --local
# Local mode with additional options
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> \
--local -v -o /path/to/output -c
Note
Local Mode Benefits:
Eliminates the need for remote access configuration
Useful for troubleshooting issues on the Host system itself
BMC credentials are still required for BMC-related logs
All other options (like -o, -v, -c) remain available
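The two local-mode variants above differ only in whether BMC details are passed. As a minimal sketch, the helper below (a hypothetical function, `local_mode_cmd`, not part of NVDebug) prints the matching command line using only the flags shown in this section. It prints the command for review rather than running it, and the values are not shell-quoted.

```shell
# Hypothetical helper (not shipped with NVDebug): print the local-mode
# command line for a platform, with or without BMC details.
# Usage: local_mode_cmd PLATFORM [BMC_IP BMC_USER BMC_PASS]
local_mode_cmd() {
    platform=$1
    if [ "$#" -ge 4 ]; then
        # BMC details supplied: host logs plus BMC-related logs.
        printf 'nvdebug -i %s -u %s -p %s -t %s --local\n' "$2" "$3" "$4" "$platform"
    else
        # No BMC details: host logs only.
        printf 'nvdebug -t %s --local\n' "$platform"
    fi
}
```

For example, `local_mode_cmd arm64` prints the host-only form, while `local_mode_cmd arm64 10.0.0.5 admin secret` prints the form with a BMC connection.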
Platform Detection#
Automatic Detection:
NVDebug can automatically detect platform and baseboard types in most cases:
# Automatic detection (interactive)
nvdebug -i <BMC_IP> -u <USER> -p <PASS>
# Non-interactive mode
nvdebug -i <BMC_IP> -u <USER> -p <PASS> --non-interactive
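Non-interactive mode lends itself to collecting from several systems in one pass. The sketch below assumes a hypothetical inventory file with one `BMC_IP USER PASS` triple per line; the file format, the `collect_all` function, and the `DRY_RUN` variable are illustrative and not part of NVDebug. By default it only prints the commands it would run.

```shell
# Hypothetical batch wrapper: run (or print) one non-interactive
# collection per line of an inventory file.
# Usage: collect_all INVENTORY_FILE
collect_all() {
    while read -r ip user pass; do
        # Skip blank lines and lines starting with '#'.
        case $ip in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Dry run (default): show the command only.
            echo "nvdebug -i $ip -u $user -p $pass --non-interactive"
        else
            # Redirect stdin so nvdebug cannot consume the inventory lines.
            nvdebug -i "$ip" -u "$user" -p "$pass" --non-interactive </dev/null
        fi
    done < "$1"
}
```

Run `collect_all inventory.txt` to review the commands, then `DRY_RUN=0 collect_all inventory.txt` to execute them. Because the inventory file holds credentials, restrict its permissions accordingly.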
Platform Validation: Check supported platforms and baseboards:
# List supported collectors for platform
nvdebug -t "arm64" -l
# List default collectors for platform
nvdebug -t "arm64" -D
# List collectors by group
nvdebug -t "arm64" -l Redfish
Manual Specification: You can manually specify platform and baseboard types:
# Specify platform only
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -t "arm64"
# Specify platform and baseboard
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -t "HGX-HMC" -b "Hopper-HGX-8-GPU"
Platform Overrides#
When working with different platforms, you may need to run specific collectors that aren’t enabled by default for your platform type. The override functionality allows you to execute any collector regardless of platform restrictions.
# For a DGX platform, running specific Redfish collectors
nvdebug --config <config.yaml> --dutconfig <dut_config.yaml> --cids R1 R2 R3 R4 R5 R8
# By default, only R8 would run.
# However, if you include the --override_platform flag, all specified collector IDs will be executed:
nvdebug --config <config.yaml> --dutconfig <dut_config.yaml> --cids R1 R2 R3 R4 R5 R8 --override_platform
Note
Override Guidelines:
Use the -e/--override_platform flag to bypass platform restrictions
Specify collectors using -S/--cids followed by collector IDs
You can combine multiple collectors from different groups (Redfish, IPMI, SSH, Host)
Be cautious when overriding, as some collectors may not be compatible with all platforms
Log Archiving#
NVDebug provides several options for controlling archive creation and management:
Skip creating the final zip archive
Control zip archive splitting
Configure archive size threshold
By default, an archive larger than 200 MB is split into 200 MB chunks that can be recombined later.
# Skip creating the final zip archive (-z or --skipzip)
nvdebug -i <ip> -u <username> -p <pass> -t <platform> --skipzip
# Skip splitting large archives (-Z or --skipzipsplit)
nvdebug -i <ip> -u <username> -p <pass> -t <platform> --skipzipsplit
# Set custom archive split threshold (in MB)
nvdebug -i <ip> -u <username> -p <pass> -t <platform> --zipsplit_threshold 500
Note
The -z/--skipzip and -Z/--skipzipsplit options are mutually exclusive
The -Z/--skipzipsplit option is ignored if -z/--skipzip is also specified
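To estimate how many files a collection will produce for a given split threshold, the arithmetic is a ceiling division. The sketch below assumes that an archive at or below the threshold stays whole and that the last chunk holds the remainder; `split_chunk_count` is a hypothetical helper, not an NVDebug command.

```shell
# Hypothetical helper: estimate the number of archive parts.
# Usage: split_chunk_count SIZE_MB [THRESHOLD_MB]   (threshold defaults to 200)
split_chunk_count() {
    size=$1
    threshold=${2:-200}
    if [ "$size" -le "$threshold" ]; then
        # Does not exceed the threshold: assumed to stay a single archive.
        echo 1
    else
        # Ceiling division: a partial final chunk still counts as one part.
        echo $(( (size + threshold - 1) / threshold ))
    fi
}
```

Under these assumptions, a 450 MB archive with the default threshold yields three parts, and a 1200 MB archive with `--zipsplit_threshold 500` also yields three.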
Passwordless SSH#
NVDebug can connect to DUTs, including the Host OS and BMC, over passwordless (key-based) SSH.
Setup Steps:
Generate SSH Key Pair:
ssh-keygen -t rsa -b 4096 -f ~/.ssh/nvdebug_key
Copy Public Key to DUTs:
ssh-copy-id -i ~/.ssh/nvdebug_key.pub <DUT_HOST_USERNAME>@<DUT_HOST_IP>
ssh-copy-id -i ~/.ssh/nvdebug_key.pub <DUT_BMC_USERNAME>@<DUT_BMC_IP>
Configure SSH on DUTs: Add to /etc/ssh/sshd_config:
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
Enable Passwordless Sudo: Add to /etc/sudoers via sudo visudo:
# System Information Collection
<DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/dmesg
<DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/lspci
<DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/sbin/dmidecode
<DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/lshw
<DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/nvidia-smi
<DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/journalctl
<DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/sbin/nvme
<DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/nvidia-bug-report.sh
Enable in Configuration: In dut_config.yaml:
HOST_SSH_PASSWORDLESS: true
Warning
Security Considerations:
Use absolute paths in sudoers configuration
Never use NOPASSWD: ALL
Restrict file operations to specific directories
Monitor sudo logs for unauthorized activities
Keep systems and NVDebug tool updated
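The per-command sudoers entries from the setup steps can be generated rather than typed by hand, which also makes it easy to enforce the absolute-path rule. This is a minimal sketch; `sudoers_lines` is a hypothetical helper, and the command list you pass in should match what your collectors actually need.

```shell
# Hypothetical helper: print one NOPASSWD sudoers entry per command.
# Usage: sudoers_lines USERNAME /abs/path/cmd1 [/abs/path/cmd2 ...]
sudoers_lines() {
    user=$1
    shift
    for cmd in "$@"; do
        case $cmd in
            /*)
                printf '%s ALL=(ALL) NOPASSWD: %s\n' "$user" "$cmd"
                ;;
            *)
                # Relative names are rejected: sudoers entries need absolute paths.
                echo "refusing non-absolute path: $cmd" >&2
                return 1
                ;;
        esac
    done
}
```

One way to use it is to write the output to a scratch file, check the syntax with `visudo -cf <file>`, and only then add the entries through `sudo visudo`.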
The Parse Option#
The --parse option takes an NVDebug log dump as input and creates a new dump with binary files decoded into plain text and archive files extracted into subdirectories.
Supported Binary Decoding:
IRoT/ERoT Dumps (Collector R6)
FPGA Register Dumps (Collector R5)
# Auto-parse collected logs
nvdebug --parse <Path to the nvdebug zip archive>
# Auto-parse collected logs with baseboard specification for FPGA dumps
nvdebug --parse <Path to the nvdebug zip archive> -b <Baseboard>
# Parse specific FPGA Log Files
nvdebug --parse /path/to/log/file --fpga-log-files <Path to the FPGA log files> --page <Page number>
Note
The --parse option is used to auto-parse collected logs.
The --fpga-parse option is used to parse specific FPGA log files manually.
The --page option is used to specify the page number of the FPGA log files to parse manually.
If you do not specify the baseboard, the tool will prompt you to select the baseboard from a list of supported baseboards.
Not all baseboards are supported for FPGA parsing. Please refer to the Supported Baseboards page for the list of supported baseboards.
Output Structure:
After parsing, you’ll find:
Decoded FPGA register dumps in Page*.txt files
Extracted ERoT/IRoT dumps in individual component files
Original archive with _decoded suffix
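Once the decoded dump has been extracted to a directory, the FPGA page files can be located with a short search. The sketch below assumes only the `Page*.txt` naming described above; the `list_fpga_pages` function and the directory layout are illustrative.

```shell
# Hypothetical helper: list decoded FPGA register dump pages under a
# directory holding the extracted, decoded dump.
# Usage: list_fpga_pages DECODED_DIR
list_fpga_pages() {
    # Match regular files named Page*.txt at any depth, in sorted order.
    find "$1" -type f -name 'Page*.txt' | sort
}
```

For example, `list_fpga_pages ./decoded_dump | xargs grep -l <pattern>` narrows the pages down to those containing a register value of interest.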
Tip
Pro Tip: Use verbose mode (-v) for detailed debugging, and run preflight checks (--preflight) before full collections.