System Requirements#
This section lists the system requirements for running NVDebug: client host, server host, NVSwitch tray host, and baseboard management controller (BMC) requirements, plus performance requirements. It also describes known limitations that may affect functionality or compatibility.
Client Host Requirements#
The client host must meet the following requirements:
Operating System:
Linux-based operating system
Linux Kernel 4.4 or later (4.15 or later recommended)
GNU C Library (glibc) 2.17 or later
Recommended Operating Systems:
Ubuntu 24.04 (recommended)
Ubuntu 20.04 or later (supported)
Supported Operating Systems:
RHEL/CentOS 8+
SLES 15+
Hardware Requirements:
Minimum 4GB RAM
2GB free disk space
Network connectivity to target systems
Required Packages:
Python 3.12
ipmitool
sshpass
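The client requirements above can be verified before a run. The following is a minimal sketch, not part of NVDebug: the function names `parse_version` and `check_client_host` are hypothetical, and the Python check assumes that "Python 3.12" means exactly the 3.12 series of the interpreter running the script.

```python
import platform
import re
import shutil
import sys

# Thresholds taken from the requirements listed above.
MIN_KERNEL = (4, 4)
MIN_GLIBC = (2, 17)
REQUIRED_PYTHON = (3, 12)  # assumption: exactly the 3.12 series
REQUIRED_TOOLS = ("ipmitool", "sshpass")


def parse_version(text, parts=2):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(p) for p in match.group().split(".")[:parts])


def check_client_host():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    kernel = parse_version(platform.release())
    if kernel < MIN_KERNEL:
        problems.append(f"kernel {platform.release()} is older than 4.4")

    libc_name, libc_version = platform.libc_ver()
    if libc_name != "glibc" or parse_version(libc_version) < MIN_GLIBC:
        problems.append(f"glibc 2.17 or later not detected ({libc_name} {libc_version})")

    if sys.version_info[:2] != REQUIRED_PYTHON:
        problems.append(f"Python {sys.version_info[0]}.{sys.version_info[1]} is not 3.12")

    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    return problems
```

Calling `check_client_host()` on the client and printing each returned entry gives a quick list of unmet requirements. The check looks only for executables on `PATH`, so it does not confirm how a package was installed.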
Network Requirements:
Access to server device under test (DUT) via BMC using Redfish and IPMI-over-LAN
Out-of-band (OOB) access to BMC
SSH access to host systems (for remote mode)
Firewall rules for BMC and host communication
Segmented network access is supported through split log collection
Stable network connectivity to target systems
Server Host Requirements#
The server host must meet all of the requirements listed under “Client Host Requirements” and must also have the following packages installed:
Required Packages:
nvme-cli
pciutils
dmidecode
lshw
opensm
nvidia-fabricmanager
nvidia-subnet-manager
mft-tools
NVIDIA Graphics Driver (RM Driver)
nvlsm
doca-sosreport >= 4.8.0
Important Notes:
Note
NVIDIA Graphics Drivers: Must be installed on the target system to run the following collectors:
nvidia-bug-report collector (H11)
nvidia-smi collector (H6)
nvidia-fabricmanager: Required for the nvidia-fabricmanager collector (H12). This collector is specific to InfiniBand network hardware.
nvlsm: Required for the subnet manager collector (H10) and for a section of the nvidia-bug-report collector (H11).
doca-sosreport: Required for the sos-report collector (H20). Version 4.8.0 or later is recommended. Do not use the Ubuntu system sosreport, because it lacks the plugins required for NVIDIA hardware.
Network Configuration: BMC Management and Server Host Management networks must reside in the same subnet.
Tip
For example, the server host OS should be able to reach the server BMC over IPMI-over-LAN and the Redfish API by using the BMC’s IP address and credentials.
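The same-subnet requirement can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the helper name `same_subnet` is hypothetical, and the addresses in the usage note are documentation-range placeholders, not real deployment values.

```python
import ipaddress


def same_subnet(bmc_ip, host_ip, prefix_len):
    """Return True if bmc_ip falls inside the subnet that contains host_ip."""
    # strict=False lets us pass a host address and have the network derived from it.
    network = ipaddress.ip_network(f"{host_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(bmc_ip) in network
```

For example, `same_subnet("192.0.2.20", "192.0.2.5", 24)` returns `True`, while `same_subnet("198.51.100.20", "192.0.2.5", 24)` returns `False`. The prefix length must come from the host's actual interface configuration.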
NVSwitch Tray Host Requirements#
The NVSwitch tray host must be running NVOS version 2.
Baseboard Management Controller Requirements#
Before running NVDebug on the BMC, ensure the following tools are installed:
Required Tools:
i2cset
i2ctransfer
ipmitool
i2cdump
curl
Redfish
Network Access:
BMC must be accessible via IPMI-over-LAN
Redfish API must be enabled and accessible
SSH access to BMC (if SSH collection is enabled)
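A quick way to confirm that the TCP-based BMC services are reachable is a plain socket connection test. The sketch below is illustrative only: `tcp_port_open` is a hypothetical helper, and the ports commonly used for Redfish over HTTPS (443) and SSH (22) are assumptions that a given BMC may override. IPMI-over-LAN uses RMCP over UDP port 623, so it cannot be verified with a TCP connection and is better checked with `ipmitool` itself.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A successful connection shows only that the port is open. It does not confirm that the Redfish API is enabled or that the credentials are valid.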
Performance Requirements#
Storage:
Minimum 2GB free disk space for collection archives
Additional space for heavily used rack systems
Monitor archive sizes to avoid disk space exhaustion
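Free space in the archive destination can be checked before starting a collection. This is a minimal sketch using the standard library: `has_free_space` is a hypothetical helper, and treating "2GB" as 2 GiB is an assumption.

```python
import shutil

# Assumption: the 2GB minimum is interpreted as 2 GiB.
MIN_FREE_BYTES = 2 * 1024**3


def has_free_space(path, required_bytes=MIN_FREE_BYTES):
    """Return True if the filesystem holding path has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes
```

Pass the directory where collection archives are written. For heavily used rack systems, a larger `required_bytes` value is a reasonable safety margin.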
Memory:
Sufficient RAM for parallel collection operations
Consider collection duration and number of nodes
CPU:
Adequate CPU resources for collection processing
Collection speed correlates with the number of available threads
NVDebug parallelizes collection across multiple threads
Network:
Stable network connectivity between collector and targets
Consider bandwidth limitations for large collections
Account for firewall rules that might block collection
Known Limitations#
Platform Support:
Limited to NVIDIA platforms (DGX, HGX, MGX, GB series)
Some collectors are platform-specific
New platforms are added over time
Collection Limitations:
Some collectors may take hours to complete when run with the -VV flag
Large log files may require significant time and storage
Network timeouts may occur with unstable connections
Security Limitations:
Credentials are stored in plain text in configuration files
NVDebug has no built-in encryption for stored credentials
Network security depends on underlying protocols (SSH, IPMI, Redfish)
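Because credentials are stored unencrypted, one common mitigation is to restrict the configuration file to its owner. The sketch below is not an NVDebug feature: `is_private` and `restrict_to_owner` are hypothetical helpers, and the file path is whatever your deployment uses.

```python
import os
import stat


def is_private(path):
    """Return True if neither group nor others have any permission on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def restrict_to_owner(path):
    """Set permissions to owner read/write only (equivalent to chmod 600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

File permissions limit who can read the file on the local system. They do not encrypt the credentials, so backups and copies of the file still need separate protection.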
Performance Limitations:
Collection speed depends on network bandwidth
Large systems may require extended collection times
Parallel collection is limited by system resources
Compatibility Limitations:
Requires a specific Python version (3.12)
Limited to Linux operating systems
Some collectors require specific hardware configurations