Known Issues: DGX Station

See the section for each version to find out which issues are open in that version.

Docker GPU Containers Cannot be Run

Issue (fixed in 20.02)

Attempting to run GPU-accelerated Docker containers may return the following error:
Failed to initialize NVML: Unknown Error

Explanation and Workaround

This issue occurs if you have installed docker-1.13.1-108 provided by Red Hat Enterprise Linux.

An updated Docker version that resolves the issue is now available. To obtain the update, run the following command:
sudo yum update
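
Before or after updating, it can be useful to confirm whether the affected Docker build is the one installed. The following is a minimal sketch, assuming the package is named docker and that rpm reports it in the usual name-version-release form. The helper name is_affected_docker is hypothetical and is not part of any NVIDIA or Red Hat tool.

```shell
# Return success if the given package string is the affected build
# (docker-1.13.1-108). Hypothetical helper, for illustration only.
is_affected_docker() {
    case "$1" in
        docker-1.13.1-108|docker-1.13.1-108.*) return 0 ;;
        *) return 1 ;;
    esac
}

# rpm -q prints the installed package as name-version-release.arch.
# The check is skipped on systems that do not have rpm.
if command -v rpm >/dev/null 2>&1; then
    installed="$(rpm -q docker 2>/dev/null || true)"
    if is_affected_docker "$installed"; then
        echo "Affected Docker build installed: $installed"
    else
        echo "Affected Docker build not detected"
    fi
fi
```

If the affected build is still reported after running sudo yum update, the update was probably not applied to the docker package.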

DGX Station: An Incorrect Serial Number Is Listed in nvhealth Output

Issue

The nvhealth command incorrectly lists the serial number of the motherboard in the DGX Serial Number entry under Checks. The correct serial number is listed under System Summary.

$ sudo nvhealth
Info
----
Timestamp:  Thu Mar  7 08:54:52 2019 -0800
Version:    19.01.6
 
Checks
------
DGX BaseOS Version [4.0.5]........................................... 
BIOS Version [0406].................................................. 
DGX Serial Number [160984157800056].................................. 
...
 
System Summary
--------------
    Product Name: DGX Station
    Manufacturer: NVIDIA
    DGX Serial Number: 0154017000004
    Uptime: up 5 days, 17 hours, 44 minutes
Motherboard:
    BIOS Version: 0406
    Serial Number: 160984157800056
...
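
Because only the System Summary section reports the correct value, a script that needs the serial number should read it from that section rather than from Checks. The following is a minimal sketch, assuming the nvhealth output has the layout shown in the sample above. The function name summary_serial is hypothetical.

```shell
# Print the DGX Serial Number found in the System Summary section of
# nvhealth output supplied on standard input. Hypothetical helper.
summary_serial() {
    awk '
        # Start matching only once the System Summary heading is seen,
        # so the incorrect entry under Checks is never considered.
        /^System Summary/ { in_summary = 1; next }
        in_summary && /DGX Serial Number:/ {
            sub(/^.*DGX Serial Number:[ \t]*/, "")
            print
            exit
        }
    '
}

# Usage on a DGX Station (requires nvhealth):
#   sudo nvhealth | summary_serial
```

The Checks entry uses square brackets rather than a colon, so it would not match the pattern even without the section check. The section check is kept as a guard in case the layout differs.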

DGX Station: The System Cannot be Resumed After Suspension

Issue (fixed with EL7-22.02)

The DGX Station cannot be resumed after being suspended either from the desktop GUI or by using the systemctl suspend command. Pressing a keyboard key or the power button when the system is suspended has no effect: The display remains dark, it is not possible to log in to the system, and the system does not respond to a ping command from a remote host.

Workaround

To avoid this issue, do not suspend the system.

If you encounter this issue, turn off the power to the system and then turn it on again.
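
To reduce the chance of an accidental suspend, one option is to mask the systemd sleep targets so that suspend requests are refused. This is a general systemd technique and an assumption on our part, not a workaround documented for this issue. The function name disable_suspend and the APPLY variable are hypothetical. By default the sketch only prints the command. Set APPLY=1 to run it.

```shell
# systemd targets that put the system to sleep.
SLEEP_TARGETS="sleep.target suspend.target hibernate.target hybrid-sleep.target"

# Print the mask command by default; run it only when APPLY=1.
# Hypothetical helper, for illustration only.
disable_suspend() {
    if [ "${APPLY:-0}" = "1" ]; then
        # Word splitting of SLEEP_TARGETS is intentional here.
        sudo systemctl mask $SLEEP_TARGETS
    else
        echo "sudo systemctl mask $SLEEP_TARGETS"
    fi
}

disable_suspend
```

The change can be reverted with sudo systemctl unmask followed by the same target names.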