Troubleshooting Guide#
This guide lists solutions for common NVDebug issues and errors, organized by category.
Installation Issues#
Common Installation Problems:
Problem | Cause | Solution
---|---|---
“Command not found” | NVDebug not in PATH | Add NVDebug to your PATH or reinstall
Permission denied | Insufficient privileges | Use sudo for system-wide installation
Python version error | Incompatible Python version | Install Python 3.12+ and ensure pip3 is used
Network connectivity issues | Firewall/network restrictions | Check firewall rules and network configuration
Installation Verification Commands:
# Check Python version
python3 --version
# Check pip installation
pip3 list | grep nvdebug
# Test network connectivity
ping <your_bmc_ip>
# Test SSH connectivity
ssh <username>@<host_ip>
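The Python requirement from the table above (3.12 or newer) can be checked in a script rather than by eye. The following is a minimal sketch, not part of NVDebug: the function name `python_version_ok` and its variables are hypothetical, and it compares only the major and minor components.

```shell
# Succeed when a "major.minor[.patch]" version meets a minimum "major.minor".
# Hypothetical helper; the minimum defaults to 3.12, as stated in the table.
python_version_ok() {
    pv_have=$1
    pv_want=${2:-3.12}
    pv_have_major=${pv_have%%.*}      # text before the first dot
    pv_rest=${pv_have#*.}             # text after the first dot
    pv_have_minor=${pv_rest%%.*}
    pv_want_major=${pv_want%%.*}
    pv_rest=${pv_want#*.}
    pv_want_minor=${pv_rest%%.*}
    if [ "$pv_have_major" -ne "$pv_want_major" ]; then
        [ "$pv_have_major" -gt "$pv_want_major" ]
    else
        [ "$pv_have_minor" -ge "$pv_want_minor" ]
    fi
}

# Example (python3 --version prints "Python X.Y.Z"):
#   python_version_ok "$(python3 --version | awk '{print $2}')" || echo "Python too old"
```

The helper takes the version as a string so it can be exercised without a particular interpreter installed.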
Common Issues#
Authentication Problems
Error | Cause | Solution
---|---|---
“Authentication failed” | Wrong username/password | Verify credentials with system admin
“Permission denied” | Insufficient privileges | Use admin account or request elevated access
“Account locked” | Too many failed attempts | Wait 15 minutes or contact admin
“Invalid credentials” | Special characters in password | Escape special characters or use quotes
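For the special-characters case, the usual shell approach is to single-quote the password when it is assigned and double-quote the variable when it is used. The sketch below uses a made-up password and a hypothetical variable name `BMC_PASS`; only the quoting pattern matters.

```shell
# Hypothetical password, for illustration only.
# Single quotes keep $, !, & and spaces literal; nothing is expanded inside them.
BMC_PASS='P@ss$w0rd!&'

# Expand the variable inside double quotes so it stays a single argument:
#   nvdebug -i <BMC_IP> -u <USER> -p "$BMC_PASS" -t "arm64"
printf 'password length: %s\n' "${#BMC_PASS}"
```

A password that itself contains a single quote cannot be written inside one pair of single quotes; the common workaround is to close the quote, add an escaped quote, and reopen it (`'it'\''s'`).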
Network Connectivity Issues
Error | Cause | Solution
---|---|---
“Connection timeout” | Network unreachable | Check network connectivity and firewall
“Connection refused” | Service not running | Verify BMC is powered on and accessible
“Host unreachable” | Wrong IP address | Verify BMC IP address is correct
“SSL certificate error” | Certificate issues | Use --allow-insecure or update certificates
Platform-Specific Issues
Error | Cause | Solution
---|---|---
“Platform not supported” | Wrong platform type | Check supported platforms list
“Platform detection failed” | BMC not responding | Verify BMC is accessible and responding
“Feature not available” | Platform limitation | Use compatible platform or contact support
“Could not auto-detect platform and/or baseboard” | BMC not responding or incompatible | Manually specify platform and baseboard with -t and -b flags
Connection Issues#
BMC Connection Failures
Symptoms:
“Unable to connect to BMC” error
Connection timeouts
SSH connection refused
Solutions:
# Test basic connectivity
ping 192.168.1.100
# Test BMC services
telnet 192.168.1.100 22
telnet 192.168.1.100 443
# Test with IPMI
ipmitool -I lanplus -H 192.168.1.100 -U admin -P password chassis status
# Use verbose mode for details
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" -v
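When several of these checks are run in sequence, it helps to see at a glance which ones passed. The wrapper below is a minimal sketch, not an NVDebug feature: the function name `check` and the counter `CHECK_FAILURES` are hypothetical, and it treats a zero exit status as a pass.

```shell
# Count of failed checks so far (hypothetical name).
CHECK_FAILURES=0

# Run one named check, report PASS or FAIL, and keep going either way.
# Usage: check "<label>" <command> [args...]
check() {
    ck_label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'PASS  %s\n' "$ck_label"
    else
        printf 'FAIL  %s\n' "$ck_label"
        CHECK_FAILURES=$((CHECK_FAILURES + 1))
    fi
}

# Example, reusing the commands above:
#   check "BMC ping" ping -c 3 192.168.1.100
#   check "BMC IPMI" ipmitool -I lanplus -H 192.168.1.100 -U admin -P password chassis status
#   [ "$CHECK_FAILURES" -eq 0 ] || echo "$CHECK_FAILURES check(s) failed"
```

The wrapper discards command output to keep the summary short; rerun a failing command by hand to see its error text.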
Host Connection Errors:
Symptoms:
Host log collection fails
SSH timeout to host
Permission denied errors
Solutions:
# Test host connectivity
ping 192.168.1.50
# Test SSH access
ssh host_user@192.168.1.50 "echo 'SSH working'"
# Check sudo access
ssh host_user@192.168.1.50 "sudo -n true"
# Test with verbose output
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" \
-I 192.168.1.50 -U host_user -H host_pass -v
Log Collection Issues#
Insufficient Space
Symptoms:
“No space left on device” error
Failed .zip creation
Incomplete log collection
Solutions:
# Check disk usage
df -h /path/to/output
# Skip zip creation
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" -z
# Use smaller zip split threshold
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" --zipsplit_threshold 100
# Specify alternate output location
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" -o /tmp/nvdebug_output
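The disk check can also be scripted so that a collection is not started on a nearly full filesystem. This is a sketch under assumptions: the helper names `free_kb` and `has_free_mb` are hypothetical, and the amount of space a collection needs is not stated in this guide, so the threshold in the example is a placeholder.

```shell
# Print the free space, in KiB, of the filesystem that holds a path.
# df -P forces the portable one-line-per-filesystem output format.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed when the path has at least the requested number of MiB free.
has_free_mb() {
    hf_free=$(free_kb "$1")
    [ "$hf_free" -ge $(( $2 * 1024 )) ]
}

# Example (1024 MiB is an arbitrary placeholder, not an NVDebug requirement):
#   has_free_mb /tmp 1024 || echo "Less than 1 GiB free in /tmp"
```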
Timeout Issues
Symptoms:
Collection process hangs
Partial log collection
Timeout errors
Solutions:
# Increase timeout in config
<Collector Specific Timeout Parameter>: 60 # Default is 30 seconds
# Collect in smaller batches
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" -S R1 R2
# Use verbose mode to see progress
nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" -v
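Collecting in smaller batches can be automated by splitting a list of collector IDs into groups and running one collection per group. The helper below is a sketch: `batch_ids` is a hypothetical name, and the batch size is a choice left to the operator.

```shell
# Print the given IDs in groups of at most <size>, one group per line.
# Usage: batch_ids <size> <id> [<id>...]
batch_ids() {
    bi_size=$1
    shift
    bi_line=""
    bi_count=0
    for bi_id in "$@"; do
        bi_line="${bi_line:+$bi_line }$bi_id"
        bi_count=$((bi_count + 1))
        if [ "$bi_count" -eq "$bi_size" ]; then
            printf '%s\n' "$bi_line"
            bi_line=""
            bi_count=0
        fi
    done
    # Emit the final, partially filled group, if any.
    if [ -n "$bi_line" ]; then
        printf '%s\n' "$bi_line"
    fi
}

# Example: one nvdebug run per batch ($ids is left unquoted so it splits into IDs).
#   batch_ids 2 R1 R2 R3 | while read -r ids; do
#       nvdebug -i 192.168.1.100 -u admin -p password -t "arm64" -S $ids
#   done
```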
Debug Mode:
Enable verbose logging for detailed troubleshooting:
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -t "arm64" -v
This provides:
Detailed error messages
API call traces
Timing information
Collection progress
Preflight Checks:
Run preflight checks before full collection:
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -t "arm64" --preflight
Collector-Specific Issues:
[WARNING] Collector R3 failed: Network timeout
Solutions:
Check specific collector requirements
Verify BMC capabilities
Use -e flag to override restrictions
Check error logs in /path/to/output/error_logs/
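If the console output was saved to a file, the failing collectors can be listed with a small filter. This sketch assumes every warning follows the format of the sample line above (`[WARNING] Collector <ID> failed: <reason>`); the function name `failed_collectors` is hypothetical, and real output may differ.

```shell
# Read log text on stdin and print each failed collector ID once.
# Assumes the "[WARNING] Collector <ID> failed: <reason>" format shown above.
failed_collectors() {
    sed -n 's/^\[WARNING\] Collector \([^ ]*\) failed: .*$/\1/p' | sort -u
}

# Example (console.log is a hypothetical file holding saved console output):
#   failed_collectors < console.log
```

The resulting IDs can then be passed to -S to retry only those collectors.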
Advanced Troubleshooting Techniques:
Verbose Mode for Debugging:
# Enable verbose logging output to the console
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -v
Preflight Checks:
# Run preflight checks only against Out-of-band interfaces
nvdebug -i <BMC_IP> -u <USER> -p <PASS> --preflight
# Run preflight checks only against In-band interfaces
nvdebug -I <HOST_IP> -U <USER> -H <PASS> --preflight
# Run preflight checks against all interfaces
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -I <HOST_IP> -U <USER> -H <PASS> --preflight
# Run preflight checks against all interfaces with configuration files
nvdebug -c <CONFIG_FILE> -d <DUT_CONFIG_FILE> --preflight
Specific Log Collection:
# Collect specific collectors by CID (specify as many as needed, separated by spaces)
nvdebug -i <BMC_IP> -u <USER> -p <PASS> -t "arm64" -S H5 R3 I2 S1
List Available Collectors:
# List all supported collectors
nvdebug -l
# List all supported collectors for a specific platform
nvdebug -l -t <Platform>
# List default collectors for platform
nvdebug -D <Platform>
Verbose Debugging:
# Enable verbose output for debugging
nvdebug -i 192.168.1.100 -u admin -p password123 -t "DGX" -v
SOS Report Issues (H20)
Symptoms:
SOS report collection fails with “non-existing plugin” errors.
SOS report shows “Error” status in Execution_Summary_Report.txt.
Missing plugins for NVIDIA hardware.
Solutions:
Ensure doca-sosreport 4.8+ is installed (not the Ubuntu system sosreport).
Install from doca-host-repo (https://linux.mellanox.com/public/repo/doca/) for DOCA 2.10 compatibility.
Verify sos command availability:
which sos
Check sos version:
sos --version
Ensure NVIDIA networking hardware is detected before running sos report.
Help and Support#
Get Help:
# Show all options
nvdebug --help
# List supported collectors
nvdebug -l
# Show version
nvdebug --version
Pro Tips:
Use the -v flag for detailed output while you are learning the tool
Start with basic collection before customizing
Check collection summary for detailed results
Use preflight checks for troubleshooting: --preflight
Keep your NVDebug installation up to date for latest features and bug fixes
Support Information#
When reporting issues, include:
The NVDebug log archive.
- If no archive is created, please provide any files generated and the stdout from the console.
Error logs from /path/to/output/error_logs/
Collection summary from Execution_Summary_Report.txt
Relevant BMC/Host logs
HTML reports from /path/to/output/reports/
System information:
- Platform type
- OS version
- Network configuration
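When no log archive was produced, the remaining items can be gathered into one file for the report. The following is a sketch under assumptions: `bundle_support` is a hypothetical helper, and it assumes that error_logs/, reports/ and Execution_Summary_Report.txt all sit directly under the output directory, which may not match your layout.

```shell
# Pack whichever support items exist under <output_dir> into <tarball>.
# Usage: bundle_support <output_dir> <tarball>
bundle_support() {
    bs_out=$1
    bs_tar=$2
    bs_items=""
    for bs_name in error_logs reports Execution_Summary_Report.txt; do
        if [ -e "$bs_out/$bs_name" ]; then
            bs_items="$bs_items $bs_name"
        fi
    done
    if [ -z "$bs_items" ]; then
        echo "nothing to bundle in $bs_out" >&2
        return 1
    fi
    # $bs_items is left unquoted on purpose: the item names contain no spaces.
    tar -czf "$bs_tar" -C "$bs_out" $bs_items
}

# Example:
#   bundle_support /path/to/output /tmp/nvdebug_support.tar.gz
```

Relevant BMC/Host logs and the system information still need to be added by hand.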