System Requirements and Known Limitations

This section lists the system requirements for running nvdebug and describes its known limitations.

Client Host

The client host must meet the following requirements:

  • Linux-based operating system

  • Linux Kernel 4.4 or later

Tip

Kernel version 4.15 or later is recommended.

  • GNU C Library (glibc) 2.35 or later

  • Operating System:

    • Recommended: Ubuntu 24.04

    • Supported: Ubuntu 22.04 or later

  • Required packages:

    • Python 3.12

    • ipmitool

    • sshpass

  • A server device under test (DUT) that can be accessed through its BMC from the client host using Redfish and IPMI-over-LAN.
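The client host requirements above can be checked with a short script. The following is a minimal sketch, not part of nvdebug: the function names are hypothetical, and it assumes the required tools are discoverable on PATH under the binary names `python3.12`, `ipmitool`, and `sshpass`.

```python
import platform
import re
import shutil

MIN_KERNEL = (4, 4)    # minimum Linux kernel version listed above
MIN_GLIBC = (2, 35)    # minimum glibc version listed above
# Assumption: each required package provides a binary with this name on PATH.
REQUIRED_TOOLS = ("python3.12", "ipmitool", "sshpass")


def parse_version(text):
    """Return the leading 'major.minor' of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_client_host():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if platform.system() != "Linux":
        problems.append("operating system is not Linux")
    else:
        try:
            if parse_version(platform.release()) < MIN_KERNEL:
                problems.append(f"kernel {platform.release()} is older than 4.4")
        except ValueError as error:
            problems.append(str(error))
        libc_name, libc_version = platform.libc_ver()
        if libc_name != "glibc" or not libc_version:
            problems.append("could not detect glibc")
        elif parse_version(libc_version) < MIN_GLIBC:
            problems.append(f"glibc {libc_version} is older than 2.35")
    for tool in REQUIRED_TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems
```

Calling `check_client_host()` on the client returns the unmet requirements as readable strings, which can be printed before starting a collection.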

Server Host

The server host requirements are the same as those listed under the “Client Host” section. Additionally:

  • The following packages must be installed:

    • nvme-cli

    • pciutils

    • dmidecode

    • lshw

    • opensm

    • nvidia-fabricmanager

    • nvidia-subnet-manager

    • mft-tools

    • NVIDIA RM Driver

    • nvlsm

    • doca-sosreport

Note

  • NVIDIA RM drivers must be installed on the system to run the nvidia-bug-report (H11) and nvidia-smi (H6) collectors.

  • The nvidia-fabricmanager package is required to run the nvidia-fabricmanager collector (H12). This collector is specific to InfiniBand network hardware.

  • The nvlsm package is required to run the subnet manager collector (H10) and a section of the nvidia-bug-report collector (H11).

  • The BMC Management and Server Host Management networks must reside in the same subnet.

  • The doca-sosreport package is required to run sos-report (H21). The recommended version is v4.8.0 or later.

Tip

For example, the server BMC should be accessible from the server host OS using IPMI-over-LAN and the Redfish API using the BMC’s IP address and credentials.
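BMC reachability from the server host can be probed before a collection run. The sketch below is illustrative and not part of nvdebug: the function names are hypothetical, the Redfish probe reads the standard unauthenticated service root at `/redfish/v1/`, and the IPMI probe assumes `ipmitool` with the `lanplus` interface. Certificate verification is disabled only because many BMCs use self-signed certificates; that is an assumption about the environment.

```python
import json
import ssl
import subprocess
import urllib.request


def redfish_root_url(bmc_ip):
    """Build the URL of the Redfish service root for a BMC address."""
    return f"https://{bmc_ip}/redfish/v1/"


def ipmi_command(bmc_ip, user, password):
    """Build an ipmitool command line that queries chassis status over LAN."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_ip,
            "-U", user, "-P", password, "chassis", "status"]


def redfish_reachable(bmc_ip, timeout=5.0):
    """Return True if the Redfish service root answers with a RedfishVersion."""
    # Assumption: the BMC certificate is self-signed, so verification is skipped.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(redfish_root_url(bmc_ip),
                                    timeout=timeout, context=context) as response:
            return "RedfishVersion" in json.load(response)
    except (OSError, ValueError):
        # Covers connection errors, HTTP errors, timeouts, and invalid JSON.
        return False


def ipmi_reachable(bmc_ip, user, password, timeout=10.0):
    """Return True if ipmitool can query the BMC with the given credentials."""
    try:
        result = subprocess.run(ipmi_command(bmc_ip, user, password),
                                capture_output=True, timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

If both `redfish_reachable(...)` and `ipmi_reachable(...)` return True for the BMC's IP address and credentials, the network requirement described in the tip is likely satisfied.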

NVSwitch Tray Host

The NVSwitch tray host must be running NVOS version 2.

Baseboard Management Controller

Before running the nvdebug one-click script on the BMC, ensure that the following tools are installed:

  • i2cset

  • i2ctransfer

  • ipmitool

  • i2cdump

  • curl

  • Redfish

Known Limitations

  • To collect logs from HGX 8-GPU baseboards, either Redfish Aggregation or port forwarding on the host BMC is required.

  • The number of threads available on the client host affects collection time. A larger number of threads can shorten it considerably.

  • NVBugs limits uploads to 200 MB per file. nvdebug automatically splits the final zip archive into multiple parts if the file size exceeds this limit. The limit is user-configurable through the YAML configuration file or CLI options. For more information, see the Log Archiving section in the nvdebug Command-Line Interface chapter.
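To illustrate the archive-splitting behavior, the sketch below computes how an archive of a given size would be divided under a size limit. It is a simplified model, not nvdebug's implementation: the function name is hypothetical, and it assumes a megabyte is 1,000,000 bytes and that parts are filled sequentially.

```python
MB = 1000 * 1000          # assumption: decimal megabytes
DEFAULT_LIMIT = 200 * MB  # the 200 MB upload limit described above


def plan_parts(total_size, limit=DEFAULT_LIMIT):
    """Return (offset, length) pairs covering total_size bytes in parts of at most limit bytes."""
    if total_size < 0 or limit <= 0:
        raise ValueError("total_size must be >= 0 and limit must be > 0")
    parts = []
    offset = 0
    while offset < total_size:
        length = min(limit, total_size - offset)
        parts.append((offset, length))
        offset += length
    return parts
```

Under these assumptions, a 450 MB archive yields three parts (200 MB, 200 MB, and 50 MB), while an archive of exactly 200 MB stays in one piece.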

Note

To enable port forwarding on the BMC:

  1. Edit the /etc/ssh/sshd_config file on the BMC.

  2. Change the setting from AllowTcpForwarding no to AllowTcpForwarding yes.

  3. Restart the SSH service by running: /etc/init.d/ssh restart.
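The edit in step 2 can also be expressed as a text transformation. The following sketch is illustrative only: the function name is hypothetical, it operates on the contents of sshd_config passed in as a string, and it does not account for directives inside Match blocks. It assumes that when no active AllowTcpForwarding line exists, appending one is acceptable.

```python
import re

# Matches an active (uncommented) AllowTcpForwarding line; keywords are case-insensitive.
_DIRECTIVE = re.compile(r"^[ \t]*AllowTcpForwarding[ \t]+\S+.*$",
                        re.IGNORECASE | re.MULTILINE)


def enable_tcp_forwarding(config_text):
    """Return config_text with AllowTcpForwarding set to yes."""
    new_text, count = _DIRECTIVE.subn("AllowTcpForwarding yes", config_text)
    if count == 0:
        # No active directive found: append one on its own line.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += "AllowTcpForwarding yes\n"
    return new_text
```

The result would still need to be written back to /etc/ssh/sshd_config on the BMC and followed by the SSH service restart from step 3.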

Supported Architectures

nvdebug is available for the following architectures:

  • x86_64

  • arm64/sbsa

Note

Make sure to download the version that corresponds to the architecture of the system on which the tool will run.
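One way to determine which build matches a system is to inspect the machine type reported by the operating system. The sketch below is an assumption-based helper, not part of nvdebug: the function name and the mapping from machine strings (such as `aarch64`) to the build names listed above are illustrative.

```python
import platform

# Assumption: common machine strings mapped to the build names listed above.
_BUILD_BY_MACHINE = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "arm64/sbsa",
    "arm64": "arm64/sbsa",
}


def nvdebug_build_for(machine):
    """Return the matching build name for a machine string, or raise ValueError."""
    key = machine.lower()
    if key not in _BUILD_BY_MACHINE:
        raise ValueError(f"unsupported architecture: {machine!r}")
    return _BUILD_BY_MACHINE[key]


def local_build():
    """Return the build name for the system this script runs on."""
    return nvdebug_build_for(platform.machine())
```

Running `local_build()` on the target system reports which of the two builds to download, or raises an error on an unsupported architecture.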

System Performance Considerations

When collecting logs in a multinode environment, the system’s CPU core count can significantly impact collection time:

  • Client systems with lower core counts will experience longer log collection times for multinode captures.

  • The tool uses process pooling to parallelize operations, with the number of processes matching the available CPU cores.

  • You can control the number of parallel processes using the --resource-count flag to limit resource usage.

  • For optimal performance, ensure your system has sufficient CPU cores to handle the expected workload.
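The process-pooling behavior described above can be modeled with Python's standard library. This is a generic sketch of the technique, not nvdebug's actual code: `worker_count`, `collect_node`, and `collect_all` are hypothetical names, and the `resource_count` parameter only mirrors the idea of the --resource-count flag.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def worker_count(resource_count=None):
    """Return the pool size: the CPU core count unless an explicit limit is given."""
    if resource_count is None:
        return os.cpu_count() or 1
    if resource_count < 1:
        raise ValueError("resource_count must be at least 1")
    return resource_count


def collect_node(node):
    """Placeholder for per-node work; a real collector would gather logs here."""
    return f"collected {node}"


def collect_all(nodes, resource_count=None):
    """Run collect_node for every node in a process pool and return the results in order."""
    with ProcessPoolExecutor(max_workers=worker_count(resource_count)) as pool:
        return list(pool.map(collect_node, nodes))
```

For example, `collect_all(["node-a", "node-b", "node-c"], resource_count=2)` processes at most two nodes at a time. With fewer workers than nodes, the remaining nodes wait for a free process, which is why lower core counts lengthen multinode captures.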