Advanced Usage Guide

Learn advanced NVDebug techniques for complex environments and automation.

Execution Modes

Remote Mode (Default):

Run NVDebug from a remote machine that has access to both the BMC and the Host system.

# Basic remote mode
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM>

# Remote mode with host access
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> \
        -I <HOST_IP> -U <HOST_USER> -H <HOST_PASS>

Local Mode:

Run NVDebug directly on the Host machine. Access to the BMC is optional.

# Local mode with BMC connection
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> --local

# Local mode without BMC connection (host logs only)
nvdebug -t <PLATFORM> --local

# Local mode with additional options
nvdebug -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -t <PLATFORM> \
        --local -v -o /path/to/output -c

Note

Local Mode Benefits:

  • Eliminates the need for remote access configuration

  • Useful for troubleshooting issues on the Host system itself

  • BMC credentials are still required for BMC-related logs

  • All other options (like -o, -v, -c) remain available
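Because the two modes differ only in which flags are passed, mode selection can be scripted. The following is a minimal sketch, not part of NVDebug: the function name `run_nvdebug` and the environment variables (`PLATFORM`, `BMC_IP`, `BMC_USER`, `BMC_PASS`, `HOST_IP`, `HOST_USER`, `HOST_PASS`, `RUN_LOCAL`, `NVDEBUG_BIN`) are hypothetical. Only the flags come from the examples above.

```shell
#!/bin/sh
# Hypothetical wrapper: choose remote or local mode from environment variables.
# NVDEBUG_BIN may point at a stub for a dry run; it defaults to nvdebug.
run_nvdebug() {
    if [ -z "${PLATFORM:-}" ]; then
        echo "PLATFORM must be set" >&2
        return 1
    fi
    set -- -t "$PLATFORM"

    # BMC flags are added only when a BMC address is given. Local mode can
    # run without them (host logs only); remote mode cannot.
    if [ -n "${BMC_IP:-}" ]; then
        set -- -i "$BMC_IP" -u "${BMC_USER:-}" -p "${BMC_PASS:-}" "$@"
    elif [ "${RUN_LOCAL:-0}" != "1" ]; then
        echo "BMC_IP is required in remote mode" >&2
        return 1
    fi

    if [ "${RUN_LOCAL:-0}" = "1" ]; then
        set -- "$@" --local
    elif [ -n "${HOST_IP:-}" ]; then
        # Remote mode with host access.
        set -- "$@" -I "$HOST_IP" -U "${HOST_USER:-}" -H "${HOST_PASS:-}"
    fi

    "${NVDEBUG_BIN:-nvdebug}" "$@"
}
```

For example, `PLATFORM=<PLATFORM> RUN_LOCAL=1 run_nvdebug` would correspond to the "local mode without BMC connection" command above.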

Platform Detection

Automatic Detection:

NVDebug can automatically detect platform and baseboard types in most cases:

# Automatic detection (interactive)
nvdebug -i <BMC_IP> -u <USER> -p <PASS>

# Non-interactive mode
nvdebug -i <BMC_IP> -u <USER> -p <PASS> --non-interactive

Platform Validation: Check supported platforms and baseboards:

# List supported collectors for platform
nvdebug -t "arm64" -l

# List default collectors for platform
nvdebug -t "arm64" -D

# List collectors by group
nvdebug -t "arm64" -l Redfish

Manual Specification: You can manually specify platform and baseboard types:

# Specify platform only
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -t "arm64"

# Specify platform and baseboard
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -t "HGX-HMC" -b "Hopper-HGX-8-GPU"
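Automatic detection combined with --non-interactive lends itself to collecting from several systems in one pass. The sketch below is illustrative only: `collect_all` and `NVDEBUG_BIN` are hypothetical names, and it assumes that nvdebug exits with a non-zero status when a collection fails, which this guide does not state.

```shell
#!/bin/sh
# Hypothetical batch helper: run nvdebug once per BMC address read from stdin.
# Usage: collect_all <bmc_user> <bmc_pass> < bmc_list.txt
collect_all() {
    user=$1
    pass=$2
    failed=0
    while IFS= read -r bmc_ip; do
        # Skip blank lines and comment lines in the input list.
        case $bmc_ip in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps the tool from consuming the rest of the list.
        if ! "${NVDEBUG_BIN:-nvdebug}" -i "$bmc_ip" -u "$user" -p "$pass" \
                --non-interactive </dev/null; then
            echo "collection failed for $bmc_ip" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every system succeeded (assumes meaningful exit codes).
    [ "$failed" -eq 0 ]
}
```

The input file holds one BMC address per line. Lines starting with # are ignored.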

Platform Overrides

When working with different platforms, you may need to run specific collectors that aren’t enabled by default for your platform type. The override functionality allows you to execute any collector regardless of platform restrictions.

# For a DGX platform, running specific Redfish collectors
nvdebug --config <config.yaml> --dutconfig <dut_config.yaml> --cids R1 R2 R3 R4 R5 R8

# By default, only R8 would run.
# However, if you include the --override_platform flag, all specified collector IDs will be executed:
nvdebug --config <config.yaml> --dutconfig <dut_config.yaml> --cids R1 R2 R3 R4 R5 R8 --override_platform

Note

Override Guidelines:

  • Use the -e/--override_platform flag to bypass platform restrictions

  • Specify collectors using -S/--cids followed by collector IDs

  • You can combine multiple collectors from different groups (Redfish, IPMI, SSH, Host)

  • Be cautious when overriding, as some collectors may not be compatible with all platforms

Log Archiving

NVDebug provides several options for controlling archive creation and management:

  • Skip creating the final zip archive

  • Control zip archive splitting

  • Configure archive size threshold

By default, if the archive exceeds 200 MB, it is split into 200 MB chunks that can be recombined later.

# Skip creating the final zip archive (-z or --skipzip)
nvdebug -i <ip> -u <username> -p <pass> -t <platform> --skipzip

# Skip splitting large archives (-Z or --skipzipsplit)
nvdebug -i <ip> -u <username> -p <pass> -t <platform> --skipzipsplit

# Set custom archive split threshold (in MB)
nvdebug -i <ip> -u <username> -p <pass> -t <platform> --zipsplit_threshold 500

Note

  • The -z/--skipzip and -Z/--skipzipsplit options are mutually exclusive

  • The -Z/--skipzipsplit option is ignored if -z/--skipzip is also specified
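As a worked example of the splitting rule, the sketch below estimates how many pieces an archive would produce. `chunk_count` is a hypothetical helper, not an NVDebug command. It assumes each chunk is the same size as the threshold, as the 200 MB default suggests. This guide does not confirm that this also holds for a custom --zipsplit_threshold.

```shell
#!/bin/sh
# Hypothetical helper: estimate the number of archive pieces.
# Usage: chunk_count <archive_size_mb> [threshold_mb]
chunk_count() {
    size_mb=$1
    threshold_mb=${2:-200}   # 200 MB is the documented default
    if [ "$size_mb" -le "$threshold_mb" ]; then
        # At or below the threshold: a single, unsplit archive.
        echo 1
    else
        # Ceiling division: a partial final chunk still counts as one.
        echo $(( (size_mb + threshold_mb - 1) / threshold_mb ))
    fi
}

chunk_count 150        # prints 1
chunk_count 450        # prints 3 (200 + 200 + 50)
chunk_count 450 500    # prints 1
```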

Passwordless SSH

NVDebug can connect to DUTs (devices under test), including the Host OS and BMC, over passwordless, key-based SSH.

Setup Steps:

  1. Generate SSH Key Pair:

    ssh-keygen -t rsa -b 4096 -f ~/.ssh/nvdebug_key
    
  2. Copy Public Key to DUTs:

    ssh-copy-id -i ~/.ssh/nvdebug_key.pub <DUT_HOST_USERNAME>@<DUT_HOST_IP>
    ssh-copy-id -i ~/.ssh/nvdebug_key.pub <DUT_BMC_USERNAME>@<DUT_BMC_IP>
    
  3. Configure SSH on DUTs: Add to /etc/ssh/sshd_config:

    PubkeyAuthentication yes
    AuthorizedKeysFile .ssh/authorized_keys
    
  4. Enable Passwordless Sudo: Add to /etc/sudoers via sudo visudo:

    # System Information Collection
    <DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/dmesg
    <DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/lspci
    <DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/sbin/dmidecode
    <DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/lshw
    <DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/nvidia-smi
    <DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/journalctl
    <DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/sbin/nvme
    <DUT_HOST_USERNAME> ALL=(ALL) NOPASSWD: /usr/bin/nvidia-bug-report.sh
    
  5. Enable in Configuration: In dut_config.yaml:

    HOST_SSH_PASSWORDLESS: true
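Before enabling the option, it can help to confirm that the setup steps took effect. The following is a sketch rather than an NVDebug feature: `check_passwordless` is a hypothetical helper built on standard OpenSSH and sudo options. The sudo check is meant for the Host OS; a BMC may not provide sudo at all.

```shell
#!/bin/sh
# Hypothetical check: confirm key-based login and passwordless sudo work.
# Usage: check_passwordless <user> <host> [key_file]
check_passwordless() {
    user=$1
    host=$2
    key=${3:-$HOME/.ssh/nvdebug_key}

    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ! ssh -i "$key" -o BatchMode=yes -o ConnectTimeout=5 \
            "$user@$host" true </dev/null; then
        echo "key-based SSH login failed for $user@$host" >&2
        return 1
    fi

    # sudo -n fails rather than prompting when a password would be needed.
    # dmesg is one of the commands listed in the sudoers step.
    if ! ssh -i "$key" -o BatchMode=yes -o ConnectTimeout=5 \
            "$user@$host" sudo -n /usr/bin/dmesg </dev/null >/dev/null; then
        echo "passwordless sudo failed for $user@$host" >&2
        return 2
    fi

    echo "ok: $user@$host"
}
```

A return status of 1 points at the key setup (steps 1 to 3), and 2 points at the sudoers entries (step 4).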
    

Warning

Security Considerations:

  • Use absolute paths in sudoers configuration

  • Never use NOPASSWD: ALL

  • Restrict file operations to specific directories

  • Monitor sudo logs for unauthorized activities

  • Keep systems and NVDebug tool updated
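The first two guidelines can be checked mechanically. The sketch below is a simple text lint under stated assumptions: `lint_sudoers` is a hypothetical helper, it only inspects lines containing NOPASSWD, and it does not replace a syntax check of the sudoers file.

```shell
#!/bin/sh
# Hypothetical lint for a sudoers fragment, following the guidelines above.
# Usage: lint_sudoers <file>; prints offending lines, non-zero status if any.
lint_sudoers() {
    file=$1
    status=0

    # Flag blanket grants such as "NOPASSWD: ALL".
    if grep -n -E 'NOPASSWD:[[:space:]]*ALL[[:space:]]*$' "$file"; then
        status=1
    fi

    # Flag NOPASSWD entries whose command does not start with "/"
    # (blanket grants are excluded here because they were reported above).
    if grep -n -E 'NOPASSWD:[[:space:]]*[^/[:space:]]' "$file" \
            | grep -v -E 'NOPASSWD:[[:space:]]*ALL[[:space:]]*$'; then
        status=1
    fi

    return "$status"
}
```

Running `visudo -c -f <file>` in addition checks the fragment's syntax before it is installed.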

The Parse Option

The --parse option takes an NVDebug log dump as input and creates a new dump in which binary files are decoded into plain text and archive files are extracted into subdirectories.

Supported Binary Decoding:

  • IRoT/ERoT Dumps (Collector R6)

  • FPGA Register Dumps (Collector R5)

# Auto-parse collected logs
nvdebug --parse <Path to the nvdebug zip archive>

# Auto-parse collected logs with baseboard specification for FPGA dumps
nvdebug --parse <Path to the nvdebug zip archive> -b <Baseboard>

# Parse specific FPGA Log Files
nvdebug --parse /path/to/log/file --fpga-log-files <Path to the FPGA log files> --page <Page number>

Note

  • The --parse option is used to auto-parse collected logs.

  • The --fpga-parse option is used to parse specific FPGA log files manually.

  • The --page option is used to specify the page number of the FPGA log files to parse manually.

  • If you do not specify the baseboard, the tool will prompt you to select the baseboard from a list of supported baseboards.

  • Not all baseboards are supported for FPGA parsing. Please refer to the Supported Baseboards page for the list of supported baseboards.

Output Structure:

After parsing, you’ll find:

  • Decoded FPGA register dumps in Page*.txt files

  • Extracted ERoT/IRoT dumps in individual component files

  • Original archive with _decoded suffix
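To locate the decoded register pages after a parse, a small helper such as the following can be used. This is a sketch: `list_fpga_pages` is a hypothetical name, and it assumes the decoded dump is available as a directory on disk (extract it first if it is delivered as an archive).

```shell
#!/bin/sh
# Hypothetical helper: list decoded FPGA register pages under a parsed dump.
# Usage: list_fpga_pages <parsed_dump_dir>
list_fpga_pages() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # Page*.txt is the naming pattern given above; sort for stable output.
    find "$dir" -type f -name 'Page*.txt' | sort
}
```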

Tip

Pro Tip: Use verbose mode (-v) for detailed debugging, and run preflight checks (--preflight) before full collections.