System Requirements and Known Limitations
This section describes the system requirements for running nvdebug and its known limitations.
Client Host
The client host must meet the following requirements:
Linux-based operating system
Linux Kernel 4.4 or later
Tip
Kernel version 4.15 or later is recommended.
GNU C Library (glibc) 2.35 or later
Operating System:
Recommended: Ubuntu 24.04
Supported: Ubuntu 22.04 or later
Required packages:
Python 3.12
ipmitool
sshpass
A server device under test (DUT) whose BMC is accessible from the client host using Redfish and IPMI-over-LAN.
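The client host requirements above can be checked before a run. The following is a minimal, hypothetical preflight sketch in Python and is not part of nvdebug. It compares the running kernel, glibc, and Python interpreter against the versions listed in this section, and it looks for ipmitool and sshpass on PATH. Reading "Python 3.12" as "3.12 or later" is an assumption, and the Python check only inspects the interpreter that runs the script.

```python
import platform
import re
import shutil
import sys

# Thresholds transcribed from the Client Host requirements in this section.
MIN_KERNEL = (4, 4)
RECOMMENDED_KERNEL = (4, 15)
MIN_GLIBC = (2, 35)
MIN_PYTHON = (3, 12)  # assumption: "Python 3.12" read as 3.12 or later
REQUIRED_TOOLS = ("ipmitool", "sshpass")


def parse_version(text: str) -> tuple[int, ...]:
    """Return the leading dotted number of a version string, e.g. '5.15.0-91-generic' -> (5, 15, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def check_client_host() -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: list[str] = []

    kernel = parse_version(platform.release())
    if kernel < MIN_KERNEL:
        problems.append(f"kernel {platform.release()} is older than 4.4")
    elif kernel < RECOMMENDED_KERNEL:
        problems.append(f"kernel {platform.release()} is supported, but 4.15 or later is recommended")

    libc_name, libc_version = platform.libc_ver()
    if libc_name != "glibc" or parse_version(libc_version) < MIN_GLIBC:
        problems.append(f"glibc 2.35 or later is required (found {libc_name or 'unknown'} {libc_version})")

    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python 3.12 is required (running {platform.python_version()})")

    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_client_host():
        print("WARNING:", problem)
```

An empty result means the host passed every check. Each WARNING line names the requirement that was not met.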
Server Host
The server host requirements are the same as those listed under the “Client Host” section. Additionally:
The following packages must be installed:
nvme-cli
pciutils
dmidecode
lshw
opensm
nvidia-fabricmanager
nvidia-subnet-manager
mft-tools
NVIDIA RM Driver
nvlsm
doca-sosreport
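The Note that follows in this section ties several of these packages to specific collectors. A hypothetical pre-run check could encode that mapping and report which collectors a missing package would affect. The dictionary below is transcribed from that Note. The function names are illustrative and do not describe how nvdebug itself detects packages.

```python
# Hypothetical pre-run check: which collectors are affected by missing packages.
# The package-to-collector mapping is transcribed from the Note in this section.
COLLECTOR_DEPENDENCIES: dict[str, list[str]] = {
    "NVIDIA RM driver": ["nvidia-bug-report (H11)", "nvidia-smi (H6)"],
    "nvidia-fabricmanager": ["nvidia-fabricmanager (H12)"],
    "nvlsm": ["subnet manager (H10)", "nvidia-bug-report (H11), one section"],
    "doca-sosreport": ["sos-report (H21)"],
}


def affected_collectors(installed: set[str]) -> dict[str, list[str]]:
    """Map each missing package to the collectors it affects."""
    return {
        package: collectors
        for package, collectors in COLLECTOR_DEPENDENCIES.items()
        if package not in installed
    }


if __name__ == "__main__":
    # Example: a host with only the driver and nvlsm installed.
    for package, collectors in affected_collectors({"NVIDIA RM driver", "nvlsm"}).items():
        print(f"{package} missing: affects {', '.join(collectors)}")
```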
Note
NVIDIA RM drivers must be installed on the system to run the nvidia-bug-report (H11) and nvidia-smi (H6) collectors.
The nvidia-fabricmanager package is required to run the nvidia-fabricmanager collector (H12). This collector is specific to InfiniBand network hardware.
The nvlsm package is required to run the subnet manager collector (H10) and a section of the nvidia-bug-report collector (H11).
The BMC Management and Server Host Management networks must reside in the same subnet.
The doca-sosreport package is required to run the sos-report collector (H21). Version 4.8.0 or later is recommended.
Tip
For example, the server BMC should be reachable from the server host OS over IPMI-over-LAN and the Redfish API, using the BMC’s IP address and credentials.
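One way to confirm the Tip above from the server host OS is a quick reachability probe. This is a minimal sketch. It assumes the BMC serves the standard Redfish service root at /redfish/v1/ with HTTP basic authentication and accepts IPMI lanplus sessions. The helper names are hypothetical. Certificate verification is disabled only because BMCs commonly present self-signed certificates.

```python
import base64
import os
import ssl
import subprocess
import urllib.request


def redfish_service_root(bmc_host: str) -> str:
    """Redfish services expose their service root at /redfish/v1/."""
    return f"https://{bmc_host}/redfish/v1/"


def ipmitool_chassis_status_cmd(bmc_host: str, user: str) -> list[str]:
    """ipmitool over IPMI-over-LAN (lanplus); -E reads the password from IPMI_PASSWORD."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", "chassis", "status"]


def redfish_reachable(bmc_host: str, user: str, password: str, timeout: float = 10.0) -> bool:
    # The Systems collection typically requires authentication, so a 200 also exercises the credentials.
    request = urllib.request.Request(redfish_service_root(bmc_host) + "Systems")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    # Verification is disabled for this quick probe only (self-signed BMC certificates).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
            return response.status == 200
    except (OSError, ValueError):  # URLError and HTTPError are OSError subclasses
        return False


def ipmi_reachable(bmc_host: str, user: str, password: str, timeout: float = 30.0) -> bool:
    env = dict(os.environ, IPMI_PASSWORD=password)
    try:
        result = subprocess.run(
            ipmitool_chassis_status_cmd(bmc_host, user),
            env=env, capture_output=True, timeout=timeout, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):  # ipmitool missing or BMC not answering
        return False
    return result.returncode == 0
```

For example, redfish_reachable("192.0.2.10", "admin", "secret") and ipmi_reachable("192.0.2.10", "admin", "secret") should both return True when the BMC at that placeholder address accepts those placeholder credentials. Passing the password through IPMI_PASSWORD with -E keeps it off the process command line.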
NVSwitch Tray Host
The NVSwitch tray host must be running NVOS version 2.
Baseboard Management Controller
Before running the nvdebug one-click script on the BMC, ensure that the following tools are installed:
i2cset
i2ctransfer
ipmitool
i2cdump
curl
Redfish
Known Limitations
To collect logs from HGX 8-GPU baseboards, either Redfish Aggregation or port forwarding on the host BMC is required.
The number of threads available on the client host affects collection time. A larger number of threads can significantly improve performance.
NVBugs limits uploaded files to 200 MB. If the final zip archive exceeds this limit, nvdebug automatically splits it into multiple parts. The split size is configurable through the YAML configuration file or command-line options. For details, refer to the Log Archiving section in the nvdebug Command-Line Interface chapter.
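The splitting behavior described above can be illustrated with a plain byte-level split. This is a sketch only, because this section does not describe nvdebug's actual part naming or format. The .001, .002 suffixes, the split_archive name, and the decimal reading of "200 MB" are all assumptions.

```python
from pathlib import Path

# Assumption: "200 MB" read as decimal megabytes. 200 * 10**6 bytes also stays
# under the limit if NVBugs counts MB as 2**20 bytes.
MB = 1_000_000
DEFAULT_LIMIT = 200 * MB


def split_archive(archive: Path, max_part_bytes: int = DEFAULT_LIMIT, chunk_size: int = 4 * MB) -> list[Path]:
    """Split archive into numbered parts of at most max_part_bytes each.

    Returns [archive] unchanged when it already fits. The original file is kept.
    """
    if archive.stat().st_size <= max_part_bytes:
        return [archive]

    parts: list[Path] = []
    with archive.open("rb") as src:
        index = 1
        while True:
            part = archive.with_name(f"{archive.name}.{index:03d}")
            written = 0
            with part.open("wb") as dst:
                # Copy in bounded chunks so a whole part never has to fit in memory at once.
                while written < max_part_bytes:
                    data = src.read(min(chunk_size, max_part_bytes - written))
                    if not data:
                        break
                    dst.write(data)
                    written += len(data)
            if written == 0:
                part.unlink()  # source exhausted: drop the empty trailing part
                break
            parts.append(part)
            index += 1
    return parts
```

The parts are raw byte ranges rather than standalone zip files. Concatenating them in order, for example with cat logs.zip.001 logs.zip.002 > logs.zip, restores the original archive.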
Note
To enable port forwarding on the BMC:
Edit the /etc/ssh/sshd_config file on the BMC.
Change the setting from AllowTcpForwarding no to AllowTcpForwarding yes.
Restart the SSH service by running /etc/init.d/ssh restart.
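The sshd_config change in the Note above amounts to a one-line text substitution. The Python sketch below performs that edit. It assumes Python is available wherever the file is edited, which may not be true on the BMC itself; in that case the same substitution can be made with any editor. Commented-out lines are left untouched, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches an active (uncommented) "AllowTcpForwarding no" line, keeping its indentation.
_FORWARDING_OFF = re.compile(r"^([ \t]*)AllowTcpForwarding[ \t]+no\b", re.MULTILINE)


def enable_tcp_forwarding(config_text: str) -> tuple[str, int]:
    """Return the edited text and how many directives were switched to yes."""
    return _FORWARDING_OFF.subn(r"\1AllowTcpForwarding yes", config_text)


def patch_sshd_config(path: Path = Path("/etc/ssh/sshd_config")) -> int:
    """Rewrite the file only when something changed; return the number of edits."""
    new_text, count = enable_tcp_forwarding(path.read_text())
    if count:
        path.write_text(new_text)
    return count
```

If patch_sshd_config() reports at least one change, restart the SSH service with /etc/init.d/ssh restart so the new setting takes effect.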
Supported Architectures
nvdebug is available for the following architectures:
x86_64
arm64/sbsa
Note
Make sure to download the version that corresponds to the architecture of the system on which the tool will run.
System Performance Considerations
When collecting logs in a multinode environment, the system’s CPU core count can significantly impact collection time:
Client systems with lower core counts will experience longer log collection times for multinode captures.
The tool uses process pooling to parallelize operations, with the number of processes matching the number of available CPU cores.
You can limit the number of parallel processes, and therefore resource usage, with the --resource-count flag.
For optimal performance, ensure that your system has sufficient CPU cores to handle the expected workload.