Troubleshooting Guide

This guide lists solutions for common NVDebug issues and errors, organized by category.

Installation Issues

Common Installation Problems:

Table 1 Installation Problems

| Problem | Cause | Solution |
| --- | --- | --- |
| “Command not found” | NVDebug not in PATH | Add NVDebug to your PATH or reinstall |
| Permission denied | Insufficient privileges | Use sudo for system-wide installation |
| Python version error | Incompatible Python version | Install Python 3.12+ and ensure pip3 is used |
| Network connectivity issues | Firewall/network restrictions | Check firewall rules and network configuration |

Installation Verification Commands:

# Check Python version
python3 --version

# Check pip installation
pip3 list | grep nvdebug

# Test network connectivity
ping <your_bmc_ip>

# Test SSH connectivity
ssh <username>@<host_ip>
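
The first two checks can also be scripted so that they report a clear pass or fail. The following is a minimal sketch, not part of NVDebug: the function names are illustrative, and the 3.12 minimum is taken from Table 1. It asks the Python interpreter for its own version rather than parsing the output of python3 --version.

```shell
# Sketch: verify the Python version and that nvdebug is on PATH.
# check_python and check_nvdebug_on_path are illustrative names,
# not NVDebug commands.

check_python() {
    command -v python3 >/dev/null 2>&1 || { echo "FAIL: python3 not found"; return 1; }
    # Let the interpreter compare its own version against 3.12
    if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 12) else 1)'; then
        echo "OK: $(python3 --version 2>&1)"
    else
        echo "FAIL: $(python3 --version 2>&1) is older than 3.12"
        return 1
    fi
}

check_nvdebug_on_path() {
    if command -v nvdebug >/dev/null 2>&1; then
        echo "OK: nvdebug found at $(command -v nvdebug)"
    else
        echo "FAIL: nvdebug not found in PATH"
        return 1
    fi
}
```

Run check_python and check_nvdebug_on_path one after the other; each returns a non-zero status on failure, so they can also gate a larger script.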

Common Issues

Authentication Problems

Table 2 Authentication Issues

| Error | Cause | Solution |
| --- | --- | --- |
| “Authentication failed” | Wrong username/password | Verify credentials with system admin |
| “Permission denied” | Insufficient privileges | Use admin account or request elevated access |
| “Account locked” | Too many failed attempts | Wait 15 minutes or contact admin |
| “Invalid credentials” | Special characters in password | Escape special characters or use quotes |
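
To illustrate the “Invalid credentials” case: in a POSIX shell, single quotes pass every character literally, while double quotes still expand $ and backticks. The sketch below uses a made-up password and an illustrative helper, show_arg, to show what the shell actually delivers to a program.

```shell
# Sketch: passing a password that contains shell-special characters.
# The password value is hypothetical; show_arg is an illustrative helper.

# show_arg prints its first argument exactly as the shell delivered it.
show_arg() { printf '%s\n' "$1"; }

show_arg 'P@ss$w0rd&1'     # single quotes: prints P@ss$w0rd&1
show_arg "P@ss\$w0rd&1"    # double quotes with escaped $: prints P@ss$w0rd&1

# Without the backslash, "P@ss$w0rd&1" would expand $w0rd as a variable
# and the program would receive a different password.

# The same quoting applied to the -p option:
# nvdebug -i <BMC_IP> -u <USER> -p 'P@ss$w0rd&1' -t "arm64"
```

Single quotes are usually the simpler choice; a password that itself contains a single quote needs that character written as '\'' inside the quoted string.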

Network Connectivity Issues

Table 3 Network Issues

| Error | Cause | Solution |
| --- | --- | --- |
| “Connection timeout” | Network unreachable | Check network connectivity and firewall |
| “Connection refused” | Service not running | Verify BMC is powered on and accessible |
| “Host unreachable” | Wrong IP address | Verify BMC IP address is correct |
| “SSL certificate error” | Certificate issues | Use --allow-insecure or update certificates |

Platform-Specific Issues

Table 4 Platform Issues

| Error | Cause | Solution |
| --- | --- | --- |
| “Platform not supported” | Wrong platform type | Check supported platforms list |
| “Platform detection failed” | BMC not responding | Verify BMC is accessible and responding |
| “Feature not available” | Platform limitation | Use compatible platform or contact support |
| “Could not auto-detect platform and/or baseboard” | BMC not responding or incompatible | Manually specify platform and baseboard with -t and -b flags |

Connection Issues

BMC Connection Failures

Symptoms:

  • “Unable to connect to BMC” error

  • Connection timeouts

  • SSH connection refused

Solutions:

# Test basic connectivity
ping 192.168.1.100

# Test BMC services
telnet 192.168.1.100 22
telnet 192.168.1.100 443

# Test with IPMI
ipmitool -I lanplus -H 192.168.1.100 -U admin -P password chassis status

# Use verbose mode for details
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" -v
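
On systems where telnet is not installed, the same port test can be approximated with bash alone. This is a sketch under stated assumptions: it relies on bash's /dev/tcp redirection and the coreutils timeout command, and port_open and check_bmc_ports are illustrative helper names rather than NVDebug commands.

```shell
# Sketch: test whether a TCP port accepts connections, without telnet.
# Relies on bash's /dev/tcp redirection and coreutils `timeout`.
# port_open and check_bmc_ports are illustrative names.

port_open() {
    # $1 = host, $2 = port; succeeds only if the connection opens within 3 seconds
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Report the state of SSH (22) and HTTPS (443) on a BMC.
check_bmc_ports() {
    local host="$1" port
    for port in 22 443; do
        if port_open "$host" "$port"; then
            echo "port $port: open"
        else
            echo "port $port: closed or filtered"
        fi
    done
}
```

For example, check_bmc_ports 192.168.1.100 prints one line per port. A closed port and a firewalled port look the same from this test, so a failure here still needs the firewall checks from Table 3.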

Host Connection Errors

Symptoms:

  • Host log collection fails

  • SSH timeout to host

  • Permission denied errors

Solutions:

# Test host connectivity
ping 192.168.1.50

# Test SSH access
ssh host_user@192.168.1.50 "echo 'SSH working'"

# Check sudo access
ssh host_user@192.168.1.50 "sudo -n true"

# Test with verbose output
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" \
        -I 192.168.1.50 -U host_user -H host_pass -v

Log Collection Issues

Insufficient Space

Symptoms:

  • “No space left on device” error

  • Failed .zip creation

  • Incomplete log collection

Solutions:

# Check disk usage
df -h /path/to/output

# Skip zip creation
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" -z

# Use smaller zip split threshold
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" --zipsplit_threshold 100

# Specify alternate output location
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" -o /tmp/nvdebug_output

Timeout Issues

Symptoms:

  • Collection process hangs

  • Partial log collection

  • Timeout errors

Solutions:

# Increase timeout in config
<Collector Specific Timeout Parameter>: 60  # Default is 30 seconds

# Collect in smaller batches
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" -S R1 R2

# Use verbose mode to see progress
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" -v
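
Taking the batching idea one step further, each collector can be run in its own invocation so that a single slow collector does not hold up the others. This is a sketch, not a documented workflow: collect_one_by_one is an illustrative name, the platform type is fixed to "arm64" as in the examples above, and writing each run to its own -o directory is an assumption made to keep results separate.

```shell
# Sketch: run one collection per collector ID.
# collect_one_by_one is an illustrative name; only the -i, -u, -p, -t,
# -S and -o options shown in this guide are used.

collect_one_by_one() {
    # $1 = BMC IP, $2 = user, $3 = password, $4 = base output directory,
    # remaining arguments = collector IDs
    local bmc="$1" user="$2" pass="$3" out="$4" cid
    shift 4
    for cid in "$@"; do
        echo "== collector $cid =="
        nvdebug -i "$bmc" -u "$user" -p "$pass" -t "arm64" \
                -S "$cid" -o "$out/$cid" \
            || echo "collector $cid failed" >&2
    done
}
```

For example, collect_one_by_one 192.168.1.100 admin password /tmp/nvdebug_output R1 R2 runs two separate collections and reports any that exit with an error.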

Debug Mode:

Enable verbose logging for detailed troubleshooting:

nvdebug -i <BMC_IP> -u <USER> -p <PASS> -t "arm64" -v

This provides:

  • Detailed error messages

  • API call traces

  • Timing information

  • Collection progress

Preflight Checks:

Run preflight checks before full collection:

nvdebug -i <BMC_IP> -u <USER> -p <PASS> -t "arm64" --preflight

Collector-Specific Issues:

[WARNING] Collector R3 failed: Network timeout

Solutions:

  • Check specific collector requirements

  • Verify BMC capabilities

  • Use -e flag to override restrictions

  • Check error logs in /path/to/output/error_logs/
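
To review the error logs quickly, the tail of each non-empty file can be printed in one pass. The sketch below assumes the files under error_logs/ are plain text; their names and internal layout are not specified in this guide, and show_error_logs is an illustrative name.

```shell
# Sketch: show the last lines of every non-empty file under error_logs/.
# Assumes plain-text log files; show_error_logs is an illustrative name.

show_error_logs() {
    # $1 = NVDebug output directory, $2 = lines to show per file (default 20)
    local dir="$1/error_logs" lines="${2:-20}" f
    [ -d "$dir" ] || { echo "no error_logs directory in $1" >&2; return 1; }
    # -size +0 skips empty files
    find "$dir" -type f -size +0 | sort | while IFS= read -r f; do
        echo "----- $f -----"
        tail -n "$lines" "$f"
    done
}
```

For example, show_error_logs /path/to/output 10 prints the last ten lines of each log under a header naming the file.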

Advanced Troubleshooting Techniques:

Verbose Mode for Debugging:

# Enable verbose logging output to the console
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -v

Preflight Checks:

# Run preflight checks only against Out-of-band interfaces
nvdebug -i <BMC_IP> -u <USER> -p <PASS> --preflight

# Run preflight checks only against In-band interfaces
nvdebug -I <HOST_IP> -U <USER> -H <PASS> --preflight

# Run preflight checks against all interfaces
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -I <HOST_IP> -U <USER> -H <PASS> --preflight

# Run preflight checks against all interfaces with configuration files
nvdebug -c <CONFIG_FILE> -d <DUT_CONFIG_FILE> --preflight

Specific Log Collection:

# Collect specific collectors by CID (Specify as many collectors as needed, delimit with space)
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -t "arm64" -S H5 R3 I2 S1

List Available Collectors:

# List all supported collectors
nvdebug -l

# List all supported collectors for a specific platform
nvdebug -l -t <Platform>

# List default collectors for platform
nvdebug -D <Platform>

Verbose Debugging:

# Enable verbose output for debugging
nvdebug -i 192.168.1.100 -u admin -p password123 -t "DGX" -v

SOS Report Issues (H20)

Symptoms:

  • SOS report collection fails with “non-existing plugin” errors.

  • SOS report shows “Error” status in Execution_Summary_Report.txt.

  • Missing plugins for NVIDIA hardware.

Solutions:

  • Ensure doca-sosreport 4.8+ is installed (not the Ubuntu system sosreport package).

  • Install from doca-host-repo (https://linux.mellanox.com/public/repo/doca/) for DOCA 2.10 compatibility.

  • Verify sos command availability: which sos

  • Check sos version: sos --version

  • Ensure NVIDIA networking hardware is detected before running sos report.
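
The two sos checks above can be combined into one pass/fail test. This is a sketch with stated assumptions: the exact output format of sos --version is not documented here, so the first dotted number in the output is treated as the version, and the comparison relies on sort -V from GNU coreutils. It checks only the version number and cannot tell whether the sos on PATH comes from doca-sosreport or from the distribution package.

```shell
# Sketch: confirm that the sos on PATH reports version 4.8 or later.
# version_ge and check_sos are illustrative names. The format of the
# `sos --version` output is an assumption.

version_ge() {
    # Succeeds when version $1 >= version $2 (uses sort -V)
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_sos() {
    local ver
    command -v sos >/dev/null 2>&1 || { echo "FAIL: sos not found in PATH"; return 1; }
    # Take the first dotted number in the output as the version
    ver="$(sos --version 2>&1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
    if [ -n "$ver" ] && version_ge "$ver" "4.8"; then
        echo "OK: sos $ver"
    else
        echo "FAIL: sos version '${ver:-unknown}' is older than 4.8 or could not be parsed"
        return 1
    fi
}
```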

Help and Support

Get Help:

# Show all options
nvdebug --help

# List supported collectors
nvdebug -l

# Show version
nvdebug --version

Pro Tips:

  • Use the -v flag for detailed output while learning the tool

  • Start with basic collection before customizing

  • Check collection summary for detailed results

  • Use preflight checks for troubleshooting: --preflight

  • Keep your NVDebug installation up to date for latest features and bug fixes

Support Information

When reporting issues, include:

  • The NVDebug log archive.

  • If no archive is created, please provide any files generated and the stdout from the console.
    • Error logs from /path/to/output/error_logs/

    • Collection summary from Execution_Summary_Report.txt

    • Relevant BMC/Host logs

    • HTML reports from /path/to/output/reports/

  • System information:
    • Platform type

    • OS version

    • Network configuration
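
When no archive was created, the items listed above can be gathered by hand. The sketch below is one possible way to do it and makes several assumptions: bundle_support_files is an illustrative name, Execution_Summary_Report.txt is assumed to sit at the top of the output directory, and a system_info.txt file is written into that directory to capture the OS version. Items that do not exist are skipped.

```shell
# Sketch: bundle error logs, reports, and basic system information
# into one archive for a support request.
# bundle_support_files is an illustrative name; the location of
# Execution_Summary_Report.txt is an assumption.

bundle_support_files() {
    # $1 = NVDebug output directory, $2 = archive to create (absolute path)
    local out="$1" archive="$2" item items=""
    for item in error_logs reports Execution_Summary_Report.txt; do
        [ -e "$out/$item" ] && items="$items $item"
    done
    # Record kernel and OS release details alongside the logs
    { uname -a; cat /etc/os-release 2>/dev/null; } > "$out/system_info.txt"
    # $items is intentionally unquoted: it is a list of fixed names without spaces
    tar -czf "$archive" -C "$out" system_info.txt $items
}
```

For example, bundle_support_files /path/to/output /tmp/nvdebug_support.tar.gz produces a single file to attach. The console output and network configuration details still need to be added separately.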