Prerequisites for arm64 / aarch64 DGX Systems#

Ensure the prerequisites for arm64 / aarch64 DGX Systems are met, including the following:

  • NVIDIA Driver

  • CUDA

  • Docker

  • NVIDIA Container Toolkit

Introduction#

If your compute host has an arm64 / aarch64 CPU architecture, such as a DGX GB200 Compute Tray, follow the steps below to install the NVIDIA Driver, CUDA, Docker, and the NVIDIA Container Toolkit.

The steps below follow NVIDIA DGX OS 7 User Guide: Installing the GPU Driver.

  • These steps are customized for the DGX GB200 Compute Tray, which

    • has 2x Grace CPUs (arm64 / aarch64)

    • has 4x Blackwell GPUs

    • runs one OS image

    • uses the DGX Software Stack

  • This workflow was verified with

    • Ubuntu 24.04

    • Linux kernel version 6.8.0-1044-nvidia-64k

Installation Steps#

  1. Check state of NVIDIA (GPU) Driver and related tools

    • Check the running driver version

      nvidia-smi
      
    • Example successful output is shown below.

      +-----------------------------------------------------------------------------------------+
      | NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
      +-----------------------------------------+------------------------+----------------------+
      | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
      |                                         |                        |               MIG M. |
      |=========================================+========================+======================|
      |   0  NVIDIA GB200                   On  |   00000008:01:00.0 Off |                    0 |
      | N/A   29C    P0            130W / 1200W |       0MiB / 189471MiB |      0%      Default |
      |                                         |                        |             Disabled |
      +-----------------------------------------+------------------------+----------------------+
      |   1  NVIDIA GB200                   On  |   00000009:01:00.0 Off |                    0 |
      | N/A   29C    P0            127W / 1200W |       0MiB / 189471MiB |      0%      Default |
      |                                         |                        |             Disabled |
      +-----------------------------------------+------------------------+----------------------+
      |   2  NVIDIA GB200                   On  |   00000018:01:00.0 Off |                    0 |
      | N/A   30C    P0            127W / 1200W |       0MiB / 189471MiB |      0%      Default |
      |                                         |                        |             Disabled |
      +-----------------------------------------+------------------------+----------------------+
      |   3  NVIDIA GB200                   On  |   00000019:01:00.0 Off |                    0 |
      | N/A   30C    P0            139W / 1200W |       0MiB / 189471MiB |      0%      Default |
      |                                         |                        |             Disabled |
      +-----------------------------------------+------------------------+----------------------+
      
      +-----------------------------------------------------------------------------------------+
      | Processes:                                                                              |
      |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
      |        ID   ID                                                               Usage      |
      |=========================================================================================|
      |  No running processes found                                                             |
      +-----------------------------------------------------------------------------------------+
      
    • If the running driver version is 580+, skip to Step 9.

    • If nvidia-smi fails (for example, with output like Command 'nvidia-smi' not found), proceed to Step 2.
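    • The decision above can be scripted. The sketch below is illustrative (the `version_ge` helper is not part of the DGX tooling); it relies on the standard `nvidia-smi --query-gpu=driver_version` query and on `sort -V` for natural version ordering.

```shell
#!/bin/sh
# version_ge A B: succeed (exit 0) when version string A >= B,
# using sort -V for natural version ordering.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Query the running driver version (the first GPU is enough).
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
  if version_ge "$driver" "580"; then
    echo "Driver $driver is 580+; skip to Step 9."
  else
    echo "Driver $driver is older than 580; continue with Step 2."
  fi
else
  echo "nvidia-smi not found; proceed to Step 2."
fi
```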

  2. Confirm that the OS sees NVIDIA GPUs

    • Run the command below, and look for NVIDIA entries.

      sudo lshw -class display -json | jq '.[] | select(.description=="3D controller")'
      
    • Product-specific information may be visible with

      sudo lshw -class system -json | jq '.[0]'
      
  3. Verify that your Linux distribution, kernel version, and gcc version match the verified configuration:

    • Use the output from the commands below to find your system’s Linux distribution, kernel version, and gcc version, respectively.

      . /etc/os-release && echo "$PRETTY_NAME"   # for Linux distribution
      uname -r  # for kernel version
      gcc --version  # for gcc version
      
    • See example output:

      # Example output for . /etc/os-release && echo "$PRETTY_NAME"
      Ubuntu 24.04.2 LTS
      
      # Example output for uname -r
      6.8.0-1044-nvidia-64k
      
      # Example output for gcc --version
      gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
      Copyright (C) 2023 Free Software Foundation, Inc.
      This is free software; see the source for copying conditions.  There is NO
      warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
      
  4. Update Linux kernel version if needed

    • For a GB200 system, this workflow was verified with kernel version 6.8.0-1044-nvidia-64k.

    • If your system has kernel version 6.8.0-1043-nvidia-64k or 6.8.0-1044-nvidia-64k, go to Step 5.

    • If your system has a different kernel version, configure grub (GRand Unified Bootloader) so that your system starts up with the verified kernel version.

      a. Update the grub default menu entry

      • In the file /etc/default/grub, set the variable GRUB_DEFAULT to the verified kernel version

      sudo sed --in-place=.bak \
        '/^[[:space:]]*GRUB_DEFAULT=/c\GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-1044-nvidia-64k"' \
        /etc/default/grub
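      If your installed kernel's menu-entry title differs, you can list the titles present in the generated grub config before setting GRUB_DEFAULT. This is a minimal sketch (the `list_menuentries` helper is ours, not a grub tool); GRUB_DEFAULT takes a submenu title and an inner title joined by '>', for example Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-1044-nvidia-64k.

```shell
#!/bin/sh
# list_menuentries FILE: print the menuentry/submenu titles found in a
# generated grub configuration file.
list_menuentries() {
  awk -F"'" '/menuentry |submenu /{print $2}' "$1"
}

list_menuentries /boot/grub/grub.cfg 2>/dev/null \
  || echo "/boot/grub/grub.cfg not readable on this host"
```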
      

      b. Verify that /etc/default/grub is updated

      cat /etc/default/grub
      

      c. Update grub and reboot

      sudo update-grub
      sudo reboot
      
  5. Remove NVIDIA libraries to avoid version conflicts

    • See reference Removing the Driver

    • Check for NVIDIA libraries, using the command

      ls /usr/lib/aarch64-linux-gnu/ | grep -i nvidia
      
    • If the output from the command above is not empty, run the command below.

      sudo apt remove --autoremove --purge -Vy \
        cuda-compat\* \
        cuda-drivers\*  \
        libnvidia-cfg1\* \
        libnvidia-compute\* \
        libnvidia-decode\* \
        libnvidia-encode\* \
        libnvidia-extra\* \
        libnvidia-fbc1\* \
        libnvidia-gl\* \
        libnvidia-gpucomp\* \
        libnvidia-nscq\* \
        libnvsdm\* \
        libxnvctrl\* \
        nvidia-dkms\* \
        nvidia-driver\* \
        nvidia-fabricmanager\* \
        nvidia-firmware\* \
        nvidia-headless\* \
        nvidia-imex\* \
        nvidia-kernel\* \
        nvidia-modprobe\* \
        nvidia-open\* \
        nvidia-persistenced\* \
        nvidia-settings\* \
        nvidia-xconfig\* \
        xserver-xorg-video-nvidia\*
      
    • Ignore errors for non-matching patterns.
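    • After removal, the same check can confirm that nothing is left behind. A scripted version might look like the sketch below (the `count_nvidia_libs` helper name is ours):

```shell
#!/bin/sh
# count_nvidia_libs DIR: print how many entries in DIR mention "nvidia"
# (case-insensitive); prints 0 when DIR is missing or clean.
count_nvidia_libs() {
  ls "$1" 2>/dev/null | grep -ci nvidia || true
}

n="$(count_nvidia_libs /usr/lib/aarch64-linux-gnu)"
if [ "$n" -eq 0 ]; then
  echo "No NVIDIA libraries remain."
else
  echo "$n NVIDIA entries still present; re-run the apt remove command."
fi
```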

  6. Download package repositories and install DGX tools

    a. Download and unpack ARM64-specific packages

    curl https://repo.download.nvidia.com/baseos/ubuntu/noble/arm64/dgx-repo-files.tgz | sudo tar xzf - -C /
    

    b. Update local APT database

    sudo apt update
    

    c. Install DGX system tools

    sudo apt install -y nvidia-system-core
    sudo apt install -y nvidia-system-utils
    sudo apt install -y nvidia-system-extra
    

    d. Install linux-tools for your Linux kernel

    sudo apt install -y linux-tools-nvidia-64k
    

    e. Install the NVIDIA peermem loader package

    sudo apt install -y nvidia-peermem-loader
    
  7. Install GPU Driver

    • For your system and architecture, such as GB200 and arm64, follow the steps as described in https://docs.nvidia.com/dgx/dgx-os-7-user-guide/installing_on_ubuntu.html#installing-the-gpu-driver

    a. Do not update the Linux kernel version

    b. Pin the driver version

    sudo apt install nvidia-driver-pinning-580
    

    c. Install the open GPU kernel module

    sudo apt install --allow-downgrades \
      nvidia-driver-580-open \
      libnvidia-nscq \
      nvidia-modprobe \
      nvidia-imex \
      datacenter-gpu-manager-4-cuda13 \
      nv-persistence-mode
    
    • Ignore build errors for modules built for 6.14.0-1015-nvidia-64k

    d. Enable the persistence daemon

    sudo systemctl enable nvidia-persistenced nvidia-dcgm nvidia-imex
    

    e. Reboot

    sudo reboot
    
  8. After reboot, repeat Step 1 to check NVIDIA Driver and related tools

  9. Install Docker and the NVIDIA Container Toolkit

    • Follow the instructions at Installing Docker and the NVIDIA Container Toolkit

    • Ignore build errors for modules built for 6.14.0-1015-nvidia-64k

    • Verify the NVIDIA Driver, Docker, NVIDIA Container Toolkit stack

      sudo docker run --rm --gpus=all nvcr.io/nvidia/cuda:12.6.2-base-ubuntu24.04 nvidia-smi
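      To compare the GPU count seen inside the container against the host, note that `nvidia-smi -L` prints one line per GPU. A minimal sketch (the `gpu_count` helper name is ours):

```shell
#!/bin/sh
# gpu_count: count "GPU N:" lines in `nvidia-smi -L` style output on stdin.
gpu_count() {
  grep -c '^GPU [0-9][0-9]*:' || true
}

# On a DGX GB200 Compute Tray this should report 4, on the host and
# inside the container alike.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi -L | gpu_count
else
  echo "nvidia-smi not found on this host"
fi
```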
      
  10. To run docker as a non-root user, see Manage Docker as non-root user

  11. Verify the NVIDIA Driver, CUDA, Docker, NVIDIA Container Toolkit, Torch stack

    a. Log into the NVIDIA Container Registry, using your NGC key as the password.

    docker login nvcr.io --username '$oauthtoken'
    

    b. Run

    sudo docker run --rm --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
      nvcr.io/nvidia/pytorch:25.12-py3 \
      python -c \
    "import torch, pynvml;
    pynvml.nvmlInit();
    print('Driver:', pynvml.nvmlSystemGetDriverVersion());
    print('CUDA:', torch.version.cuda);
    print('GPU count:', torch.cuda.device_count())"
    

    The sudo prefix can be omitted if you completed Step 10, 'To run docker as a non-root user'.

    • Example output is

      =============
      == PyTorch ==
      =============
      
      NVIDIA Release 25.12 (build 245654591)
      PyTorch Version 2.10.0a0+b4e4ee8
      Container image Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
      Copyright (c) 2014-2024 Facebook Inc.
      Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
      Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
      Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
      Copyright (c) 2011-2013 NYU                      (Clement Farabet)
      Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
      Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
      Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
      Copyright (c) 2015      Google Inc.
      Copyright (c) 2015      Yangqing Jia
      Copyright (c) 2013-2016 The Caffe contributors
      All rights reserved.
      
      Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
      
      GOVERNING TERMS: The software and materials are governed by the NVIDIA Software License Agreement
      (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/)
      and the Product-Specific Terms for NVIDIA AI Products
      (found at https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/).
      
      NOTE: CUDA Forward Compatibility mode ENABLED.
        Using CUDA 13.1 driver version 590.44.01 with kernel driver version 580.105.08.
        See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
      
      Driver: 580.105.08
      CUDA: 13.1
      GPU count: 4
      
    • If the reported GPU count matches your system (4 for a DGX GB200 Compute Tray), your system is verified for

      • (a) GPU setup

      • (b) NVIDIA Driver

      • (c) CUDA

      • (d) Docker

      • (e) NVIDIA Container Toolkit