Known Issues: DGX Station

Refer to the sections for specific versions to find out which issues are open in those versions.

Issue (fixed in 20.02)

Attempting to run GPU-accelerated Docker containers may return the following error.

.. code:: text

   Failed to initialize NVML: Unknown Error

Explanation and Workaround

This issue occurs if you have installed docker-1.13.1-108 provided by Red Hat Enterprise Linux.

An updated Docker version that resolves the issue is now available. To obtain the update, run the following command.

.. code:: text

   sudo yum update
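Before or after updating, it can help to confirm whether the affected build is the one installed. The following is a minimal sketch, not an NVIDIA-provided tool. It assumes the package is named ``docker`` and that ``rpm -q`` reports it in the usual ``name-version-release`` form. The helper name ``is_affected_docker`` is hypothetical.

```shell
# Return success (0) if the given rpm package string is the affected
# docker-1.13.1-108 build, and failure (1) otherwise.
# Assumption: the string has the form printed by `rpm -q docker`,
# that is, docker-<version>-<release>, where the release starts with 108.
is_affected_docker() {
    case "$1" in
        docker-1.13.1-108|docker-1.13.1-108.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, ``is_affected_docker "$(rpm -q docker)" && echo "Affected Docker build installed; run: sudo yum update"`` prints the reminder only when the affected build is present.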

Issue

[Fixed in EL7-21-07] Removing the NVIDIA CUDA Toolkit can cause the symbolic link to /usr/local/cuda to be removed even if multiple versions of the NVIDIA CUDA Toolkit are installed.

Workaround

This workaround requires sudo privileges.

Re-create the /usr/local/cuda symbolic link so that it points to the versioned CUDA directory, for example, /usr/local/cuda-10.1.

.. code:: text

   sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda
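When several toolkit versions are installed, the link can also be restored with a small helper. The following is a sketch under stated assumptions rather than a supported procedure. The function name ``restore_cuda_link`` is hypothetical, and choosing the highest-numbered ``cuda-*`` directory as the target is an assumption; use a different target if your applications need a specific version. It relies on GNU ``find`` and ``sort -V``.

```shell
# Re-create <root>/cuda as a symbolic link to the highest-numbered
# <root>/cuda-X.Y directory. <root> defaults to /usr/local.
restore_cuda_link() {
    local root="${1:-/usr/local}"
    local link="$root/cuda"
    local target

    # Leave any existing path untouched, including a dangling link.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "exists: $link"
        return 0
    fi

    # Assumption: the newest installed version is the desired target.
    # sort -V orders cuda-9.2 before cuda-10.1.
    target=$(find "$root" -maxdepth 1 -type d -name 'cuda-*' | sort -V | tail -n 1)
    if [ -z "$target" ]; then
        echo "no versioned CUDA directory found under $root" >&2
        return 1
    fi

    ln -s "$target" "$link" && echo "linked: $link -> $target"
}
```

Calling ``restore_cuda_link`` with no argument operates on /usr/local, which requires a root shell because the link is created in a system directory.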

Issue

The nvhealth command incorrectly lists the serial number of the motherboard in the DGX Serial Number entry under Checks. The correct serial number is listed under System Summary.

.. code:: text

   $ sudo nvhealth
   Info
   ----
   Timestamp: Thu Mar 7 08:54:52 2019 -0800
   Version: 19.01.6

   Checks
   ------
   DGX BaseOS Version [4.0.5]...........................................
   BIOS Version [0406]..................................................
   DGX Serial Number [160984157800056]..................................
   ...

   System Summary
   --------------
   Product Name: DGX Station
   Manufacturer: NVIDIA
   DGX Serial Number: 0154017000004
   Uptime: up 5 days, 17 hours, 44 minutes
   Motherboard:
       BIOS Version: 0406
       Serial Number: 160984157800056
   ...
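If a script needs the serial number, it should read the value from the System Summary section rather than from Checks. The following filter is a minimal sketch that assumes ``nvhealth`` writes its report to standard output in the layout shown above. The function name ``dgx_serial_from_summary`` is hypothetical.

```shell
# Read an nvhealth report on standard input and print the DGX Serial
# Number found in the "System Summary" section only.
dgx_serial_from_summary() {
    awk '
        # Start matching only after the System Summary heading.
        /^[ \t]*System Summary/ { in_summary = 1; next }
        # The summary entry uses "Name: value"; the Checks entry uses
        # "Name [value]....", so it is never matched here.
        in_summary && /^[ \t]*DGX Serial Number:/ { print $NF; exit }
    '
}
```

A typical invocation would be ``sudo nvhealth | dgx_serial_from_summary``. For the report shown above, the filter selects 0154017000004 and ignores the motherboard serial number 160984157800056.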

Issue (fixed with EL7-22.02)

The DGX Station cannot be resumed after being suspended either from the desktop GUI or by using the systemctl suspend command. Pressing a keyboard key or the power button when the system is suspended has no effect: The display remains dark, it is not possible to log in to the system, and the system does not respond to a ping command from a remote host.

Workaround

To avoid this issue, do not suspend the system.

If you encounter this issue, turn off the power to the system and then turn it on again.

© Copyright 2022-2023, NVIDIA. Last updated on Jun 27, 2023.