System Requirements and Known Limitations#
This section lists the system requirements for running nvdebug and describes its known limitations.
Client Host#
The client host must meet the following requirements:
Linux-based operating system
Linux Kernel 4.4 or later
Tip
Kernel version 4.15 or later is recommended.
GNU C Library (glibc) 2.35 or later
Operating System:
Recommended: Ubuntu 24.04
Supported: Ubuntu 22.04 or later
Required packages:
Python 3.12
ipmitool
sshpass
A server device under test (DUT) that can be accessed by the BMC from the client host using Redfish and IPMI-over-LAN.
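The client host requirements above can be checked before a run with a short preflight script. The following is a minimal sketch, not part of nvdebug: the function names `parse_version` and `check_client_host` are hypothetical, and the script assumes that the `ipmitool` and `sshpass` packages install executables of the same name on the `PATH`.

```python
import platform
import re
import shutil

MIN_KERNEL = (4, 4)    # minimum kernel version listed above
MIN_GLIBC = (2, 35)    # minimum glibc version listed above
REQUIRED_TOOLS = ("ipmitool", "sshpass")  # assumed executable names


def parse_version(text):
    """Return the leading dotted-numeric part of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(piece) for piece in match.group().split("."))


def check_client_host():
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    if platform.system() != "Linux":
        problems.append("operating system is not Linux")
    kernel = parse_version(platform.release())
    if kernel[:2] < MIN_KERNEL:
        problems.append(f"kernel {platform.release()} is older than 4.4")
    libc_name, libc_version = platform.libc_ver()
    if libc_name != "glibc" or parse_version(libc_version)[:2] < MIN_GLIBC:
        problems.append(f"glibc 2.35 or later not detected (found {libc_name} {libc_version})")
    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"required tool not found on PATH: {tool}")
    return problems
```

Calling `check_client_host()` on the client host and printing each returned entry gives a quick summary of anything that still needs to be installed or upgraded.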
Server Host#
The server host requirements are the same as those listed under the “Client Host” section. Additionally:
The following packages must be installed:
nvme-cli
pciutils
dmidecode
lshw
opensm
nvidia-fabricmanager
nvidia-subnet-manger
mft-tools
NVIDIA RM Driver
nvlsm
doca-sosreport
Note
NVIDIA RM drivers must be installed on the system to run the nvidia-bug-report (H11) and nvidia-smi (H6) collectors.
The nvidia-fabricmanager package is required to run the nvidia-fabricmanager collector (H12). This collector is specific to InfiniBand network hardware.
The nvlsm package is required to run the subnet manager collector (H10) and a section of the nvidia-bug-report collector (H11).
The BMC Management and Server Host Management networks must reside in the same subnet.
The doca-sosreport package is required to run sos-report (H21). The recommended version is v4.8.0 or later.
Tip
For example, the server BMC should be accessible from the server host OS using IPMI-over-LAN and the Redfish API using the BMC’s IP address and credentials.
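The reachability described in the tip can be probed from the server host OS before running nvdebug. The sketch below is illustrative only and is not how nvdebug itself connects: all function names are hypothetical. It relies on the standard Redfish service root path `/redfish/v1/` and on the `ipmitool` `lanplus` interface, and it assumes the BMC presents a self-signed certificate, so TLS verification is disabled.

```python
import ssl
import subprocess
import urllib.error
import urllib.request


def redfish_root_url(bmc_ip):
    """Build the URL of the Redfish service root on the BMC."""
    return f"https://{bmc_ip}/redfish/v1/"


def ipmi_command(bmc_ip, user, password):
    """Build an ipmitool IPMI-over-LAN command that queries the chassis status."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_ip,
            "-U", user, "-P", password, "chassis", "status"]


def redfish_reachable(bmc_ip, timeout=5.0):
    """Return True if the Redfish service root answers with HTTP 200."""
    context = ssl.create_default_context()
    # Assumption: the BMC uses a self-signed certificate, so verification is skipped.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(redfish_root_url(bmc_ip),
                                    timeout=timeout, context=context) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def ipmi_reachable(bmc_ip, user, password, timeout=15.0):
    """Return True if ipmitool can query the BMC over the LAN."""
    try:
        result = subprocess.run(ipmi_command(bmc_ip, user, password),
                                capture_output=True, timeout=timeout, check=False)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

If either probe returns `False`, check that the BMC Management and Server Host Management networks are in the same subnet and that the BMC credentials are correct.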
NVSwitch Tray Host#
The NVSwitch tray host must be running NVOS version 2.
Baseboard Management Controller#
Before running the nvdebug one-click script on the BMC, ensure that the following tools are installed:
i2cset
i2ctransfer
ipmitool
i2cdump
curl
Redfish
Known Limitations#
To collect logs from HGX 8-GPU baseboards, either Redfish Aggregation or port forwarding on the host BMC is required.
The number of threads available on the client host affects the collection time. A larger number of threads can shorten collection considerably.
NVBugs limits uploads to 200 MB per file. If the final zip archive exceeds this limit, nvdebug automatically splits it into multiple parts. The limit is user-configurable through the config YAML file or CLI options. For more information, see the Log Archiving section in the nvdebug Command-Line Interface chapter.
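To illustrate what splitting an archive at a size limit involves, here is a minimal sketch. It is not nvdebug's implementation: the function name `split_archive` and the `.partNNN` naming scheme are hypothetical, and the default limit assumes that 200 MB means 200,000,000 bytes.

```python
from pathlib import Path

# Assumption: "200 MB" is interpreted as decimal megabytes.
DEFAULT_LIMIT_BYTES = 200 * 1000 * 1000


def split_archive(path, limit_bytes=DEFAULT_LIMIT_BYTES, chunk_bytes=1024 * 1024):
    """Split a file into parts no larger than limit_bytes and return their paths.

    A file that already fits within the limit is returned unchanged.
    """
    path = Path(path)
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if path.stat().st_size <= limit_bytes:
        return [path]
    parts = []
    index = 1
    with path.open("rb") as source:
        while True:
            # Hypothetical naming scheme: archive.zip.part001, archive.zip.part002, ...
            part_path = path.with_name(f"{path.name}.part{index:03d}")
            written = 0
            with part_path.open("wb") as target:
                while written < limit_bytes:
                    block = source.read(min(chunk_bytes, limit_bytes - written))
                    if not block:
                        break
                    target.write(block)
                    written += len(block)
            if written == 0:
                # The source was exhausted exactly at a part boundary.
                part_path.unlink()
                break
            parts.append(part_path)
            index += 1
    return parts
```

Concatenating the parts in order reproduces the original file, which is how a recipient would reassemble an archive split this way.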
Note
To enable port forwarding on the BMC:
Edit the /etc/ssh/sshd_config file on the BMC.
Change the setting from AllowTcpForwarding no to AllowTcpForwarding yes.
Restart the SSH service by running /etc/init.d/ssh restart.
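The configuration change in the steps above can also be scripted. The following sketch only transforms the text of an `sshd_config` file. The function name `enable_tcp_forwarding` is hypothetical, and the sketch assumes the file contains no `Match` blocks, since a directive inside or appended after a `Match` block applies only to that block.

```python
import re

# sshd_config keywords are case-insensitive; only active (uncommented) lines are matched.
_ACTIVE_DIRECTIVE = re.compile(r"^\s*AllowTcpForwarding\b", re.IGNORECASE)


def enable_tcp_forwarding(config_text):
    """Return config_text with every active AllowTcpForwarding directive set to yes.

    If no active directive exists, one is appended at the end of the file.
    Assumption: the file has no Match blocks.
    """
    lines = config_text.splitlines()
    found = False
    for position, line in enumerate(lines):
        if _ACTIVE_DIRECTIVE.match(line):
            lines[position] = "AllowTcpForwarding yes"
            found = True
    if not found:
        lines.append("AllowTcpForwarding yes")
    return "\n".join(lines) + "\n"
```

After writing the returned text back to /etc/ssh/sshd_config on the BMC, restart the SSH service as described in the last step so that the change takes effect.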
Supported Architectures#
nvdebug is available for the following architectures:
x86_64
arm64/sbsa
Note
Make sure to download the version that corresponds to the architecture of the system on which the tool will run.
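One way to decide which build to download is to read the machine type reported by the target system. This is a small sketch with a hypothetical function name and alias table. It maps common machine names to the two architectures listed above and cannot confirm that an Arm system is actually SBSA-compliant.

```python
import platform

# Hypothetical alias table: machine names reported by the OS mapped to the
# architecture labels used in this section.
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "arm64/sbsa",
    "arm64": "arm64/sbsa",
}


def nvdebug_architecture(machine=None):
    """Return the matching architecture label, or None if the machine type is not listed."""
    name = (machine or platform.machine()).lower()
    return ARCH_ALIASES.get(name)
```

Run it on the system where the tool will execute, not on the machine used for downloading, because the two may differ.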
System Performance Considerations#
When collecting logs in a multinode environment, the system’s CPU core count can significantly impact collection time:
Client systems with lower core counts will experience longer log collection times for multinode captures
The tool uses process pooling to parallelize operations, with the number of processes matching the available CPU cores
You can control the number of parallel processes using the --resource-count flag to limit resource usage
For optimal performance, ensure your system has sufficient CPU cores to handle the expected workload
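The relationship between core count, a process cap, and pool size can be sketched as follows. This illustrates the general process-pooling pattern and is not nvdebug's code: `pool_size`, `collect_node`, and `collect_all` are hypothetical names, and capping the requested count at the number of cores is an assumption about how a flag such as --resource-count might be applied.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def pool_size(resource_count=None, cpu_count=None):
    """Return the number of worker processes to start.

    Without a cap, the pool matches the available CPU cores. With a cap,
    the smaller of the two values is used (an assumption, see above).
    """
    cores = cpu_count or os.cpu_count() or 1
    if resource_count is None:
        return cores
    if resource_count < 1:
        raise ValueError("resource_count must be at least 1")
    return min(resource_count, cores)


def collect_node(node):
    """Placeholder for the per-node work; real collection would happen here."""
    return f"collected logs from {node}"


def collect_all(nodes, resource_count=None):
    """Run collect_node for every node in a process pool and return the results in order."""
    with ProcessPoolExecutor(max_workers=pool_size(resource_count)) as pool:
        return list(pool.map(collect_node, nodes))
```

With 16 cores and a cap of 4, `pool_size(4, cpu_count=16)` yields 4 workers, so a 32-node capture is processed four nodes at a time. This is why a lower core count or a lower cap lengthens multinode collection.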