Contacting NVIDIA Support

If you need assistance with your DGX Spark system, you can contact NVIDIA support:

Field Diagnostic Software

NVIDIA Field Diagnostic is a software tool that tests the DGX Spark system and detects hardware failures. It is intended for health checks of your DGX Spark setup and as a pre-check for RMA qualification of the overall system.

For complete instructions, refer to the Field Diagnostics User Guide.

Removing a Previous Version

Before installing a new version, remove the previous version of the field diagnostic software with the following commands:

sudo dpkg -P dgx-spark-fieldiag
sudo rm -rf /opt/nvidia/dgx-spark-fieldiag
sudo apt autoremove dgx-spark-fieldiag

Installing the Field Diagnostic Software

The Field Diagnostic software package is named dgx-spark-fieldiag_<version>-1_arm64.deb. To install the package using the NVIDIA CUDA APT Repository, follow these steps:

  1. Add the NVIDIA CUDA repository key:

    sudo mkdir -p /usr/share/keyrings
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/sbsa/cuda-archive-keyring.gpg | sudo tee /usr/share/keyrings/cuda-archive-keyring.gpg > /dev/null
    
  2. Add the CUDA APT repository and install:

    echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/sbsa /" | sudo tee /etc/apt/sources.list.d/cuda-sbsa-ubuntu2404.list
    sudo apt-get update
    sudo apt-get install dgx-spark-fieldiag
    
  3. Verify the installation:

    dpkg -l | grep dgx-spark-fieldiag
    

The software dependencies (stress-ng, fio, and memtester) are automatically installed on your system when you install the .deb package.
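
Note that the `dpkg -l | grep` check in step 3 also matches a package that was removed but still has configuration files (state `rc`). The following is a minimal sketch of a stricter check, not part of the Field Diagnostic package. The helper name `is_fully_installed` is hypothetical. It reads `dpkg -l` output on standard input and succeeds only when the package line carries the state `ii`.

```shell
# Hypothetical helper: succeeds only if the named package appears in
# `dpkg -l` output (read from stdin) with state "ii" (fully installed).
is_fully_installed() {
    pkg="$1"
    awk -v pkg="$pkg" '
        # Column 1 is the state, column 2 the package name (optionally "name:arch").
        $1 == "ii" && ($2 == pkg || index($2, pkg ":") == 1) { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

For example, `dpkg -l | is_fully_installed dgx-spark-fieldiag && echo "installed"` prints `installed` only when the package is fully installed.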

Running Field Diagnostics

After installing the package, the field diagnostic software is located at /opt/nvidia/dgx-spark-fieldiag.

Before Running

Disable Secure Boot before running field diagnostics:

  1. Check the current Secure Boot state:

    sudo mokutil --sb-state
    
  2. Reboot and enter UEFI setup (press Delete during boot, or run sudo systemctl reboot --firmware-setup).

  3. Go to Security > Secure Boot > Disable Secure Boot.

  4. Save the changes and reboot the system.
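
After the reboot, you can confirm the result from a script instead of reading the output by eye. The following is a minimal sketch, assuming that `mokutil --sb-state` prints a line containing `SecureBoot enabled` or `SecureBoot disabled`. The helper name `sb_state` is hypothetical.

```shell
# Hypothetical helper: classify the text printed by `mokutil --sb-state`.
# Assumes the output contains "SecureBoot enabled" or "SecureBoot disabled";
# anything else is reported as "unknown".
sb_state() {
    case "$1" in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}
```

For example, `sb_state "$(sudo mokutil --sb-state 2>&1)"` should print `disabled` once Secure Boot is turned off. Treat an `unknown` result as a reason to inspect the raw `mokutil` output manually.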

Running the Diagnostic

Run the field diagnostics with root privileges:

  1. Switch the system to multi-user (text console) mode:

    sudo init 3
    
  2. When the system switches to TTY console mode, log in at the TTY console.

  3. Change to the installation directory and start the diagnostic:

    cd /opt/nvidia/dgx-spark-fieldiag
    sudo ./partnerdiag --field
    

The diagnostic takes approximately 30 minutes. A PASS/FAIL banner appears when complete. You can also run the diagnostic over SSH using the same commands.

Note

If you interrupt the diagnostic (for example, with Ctrl+C), power cycle the system before running the tests again.
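
Because the run starts by leaving the graphical session, it can help to confirm that the diagnostic binary is in place first. The following is a minimal pre-flight sketch, not part of the Field Diagnostic package. The function name `preflight` is hypothetical, and it only checks that `partnerdiag` exists and is executable in the given directory.

```shell
# Hypothetical pre-flight check: confirm the diagnostic binary exists and is
# executable in the given install directory before switching to the console.
preflight() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    if [ ! -x "$dir/partnerdiag" ]; then
        echo "partnerdiag not found or not executable in $dir" >&2
        return 1
    fi
    return 0
}
```

For example, run `preflight /opt/nvidia/dgx-spark-fieldiag` before `sudo init 3`. A non-zero return status suggests that the package should be reinstalled first.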

After Running

Re-enable Secure Boot after the diagnostic completes:

  1. Run sudo systemctl reboot --firmware-setup.

  2. Go to Security > Secure Boot > Enable Secure Boot.

  3. Save the changes and reboot the system.

For detailed instructions, including the Spec JSON file and log retrieval, refer to the Field Diagnostics User Guide.

Verifying Tool Installation

You can verify that the dependency tools (fio, memtester, and stress-ng) are installed correctly with the following commands. Each command should print the path to the tool's binary.

which fio
which memtester
which stress-ng
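
The three checks above can also be combined into one step. The following is a minimal sketch that uses the shell built-in `command -v` rather than `which`. The helper name `missing_tools` is hypothetical.

```shell
# Hypothetical helper: print the name of each listed tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_tools fio memtester stress-ng` prints nothing and returns 0 when all three dependencies are available. Otherwise it prints the names of the missing tools, one per line.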