Setup and Build Systems#
Compilation on DGX Spark#
CMake#
CMake can automatically detect a CUDA toolchain when `nvcc` is found on the `PATH` or in a standard install location. Since DGX Spark comes with the CUDA SDK pre-installed, CMake will detect and use it automatically. For example, run the following on your DGX Spark:
```
git clone https://github.com/NVIDIA/CUDALibrarySamples.git
cd CUDALibrarySamples/cuBLAS/Level-3/gemm
mkdir build
cd build
cmake -DCMAKE_CUDA_ARCHITECTURES="121-real" ..
cmake --build .
./cublas_gemm_example
```
If CMake is unable to find `nvcc`, add `-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc` to the first `cmake` call.
For optimal performance, make sure to compile for the compute capability of your device. For DGX Spark, this is compute capability 12.1, selected in CMake with the architecture value `121-real`.
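If your own project builds with CMake, the architecture can instead be pinned inside the project's CMakeLists.txt. A minimal sketch, not taken from the sample — the project name, source file, and target names here are illustrative assumptions:

```cmake
cmake_minimum_required(VERSION 3.20)
project(gemm_example LANGUAGES CXX CUDA)

# Target DGX Spark's compute capability 12.1. The "-real" suffix emits
# device code (SASS) only for that architecture, with no PTX fallback.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES 121-real)
endif()

find_package(CUDAToolkit REQUIRED)

add_executable(cublas_gemm_example cublas_gemm_example.cu)
target_link_libraries(cublas_gemm_example PRIVATE CUDA::cublas)
```

Guarding the `set()` with `if(NOT DEFINED ...)` keeps the `-DCMAKE_CUDA_ARCHITECTURES` command-line override shown above working.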
Cross-compilation on an x86-64 Linux host#
In the context of this guide, cross-compilation refers to compiling applications on an x86-64 Linux machine (the host) for the arm64-based DGX Spark (the target).
To build and link correctly, you need to install the arm64 toolchain and libraries on your host and point the build system at them. For simplicity and to avoid unintended side effects, we demonstrate cross-compilation using Docker.
As our example project, we use the cuda-samples repository, available on GitHub.
Please use the newest available CUDA toolkit, version 13.0 or later.
Build steps:
Set up Docker and verify your Docker installation with

```
docker run hello-world
```

Create a basic Dockerfile:
```dockerfile
# Use the NVIDIA CUDA base image with cuDNN and Ubuntu 24.04
FROM nvidia/cuda:13.0.0-cudnn-devel-ubuntu24.04

# Set environment variables to avoid prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Update and install basic utilities
# python3 is needed to install and configure cuda-cross-sbsa for cross-compilation
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential wget curl ca-certificates git python3 python3-pip python3-venv && \
    rm -rf /var/lib/apt/lists/*

RUN apt update

# Install build tools and arm64 toolchain
RUN apt install -y cmake gcc-aarch64-linux-gnu g++-aarch64-linux-gnu

# CUDA toolkit download: https://developer.nvidia.com/cuda-downloads
# Select Linux -> arm64-sbsa -> Cross -> Ubuntu -> 24.04 -> deb(network)
# Adjust installation instructions below as necessary
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/cross-linux-sbsa/cuda-keyring_1.1-1_all.deb \
    && dpkg -i cuda-keyring_1.1-1_all.deb \
    && apt update \
    && apt install -y cuda-cross-sbsa

# Clone the repository
RUN git clone https://github.com/NVIDIA/cuda-samples
```
Build and launch the Docker container (adjust paths as needed):
```
docker build -t cross_compile_docker .
docker run -it --rm --runtime=nvidia --gpus 'all' -v ./host_dir:/docker_dir cross_compile_docker /bin/bash
```
Navigate to `cuda-samples/Samples/0_Introduction/vectorAdd/`.

Configure the build:
```
cmake -S . -B . -DCMAKE_CXX_COMPILER=/usr/bin/aarch64-linux-gnu-g++ -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/aarch64-linux-gnu-g++
```
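Equivalently, the three compiler settings can be collected into a CMake toolchain file, so they do not have to be repeated on every configure. A sketch — the file name is our choice, not part of the sample project:

```cmake
# aarch64-toolchain.cmake
set(CMAKE_SYSTEM_NAME Linux)            # cross-compiling for Linux...
set(CMAKE_SYSTEM_PROCESSOR aarch64)     # ...on an arm64 target
set(CMAKE_CXX_COMPILER /usr/bin/aarch64-linux-gnu-g++)
set(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc)
set(CMAKE_CUDA_HOST_COMPILER /usr/bin/aarch64-linux-gnu-g++)
```

It is then passed once at configure time: `cmake -S . -B . -DCMAKE_TOOLCHAIN_FILE=aarch64-toolchain.cmake`.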
Build the project:
```
cmake --build .
```
Note that warnings about missing CUDA runtime libraries might appear during the build.
Either move the `vectorAdd` binary to your Spark, or copy the Docker container itself. The DGX Spark should be able to run the binary in both cases. If you encounter permission issues when running the binary on the Spark, use `chmod +x <binary_name>` to make the binary executable on the arm64 host.

You can also check the binary information using `file vectorAdd`. The output should be similar to:

```
vectorAdd: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=df539f8a4bbaeeadadcd9b816be6ea230ed72d42, for GNU/Linux 3.7.0, not stripped
```
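The same check can be scripted if `file` is not available: the ELF header's `e_machine` field identifies the target architecture, and the value 183 (`EM_AARCH64`) marks an arm64 binary per the ELF specification. A small sketch in Python — the helper names are ours, not part of any tool:

```python
import struct

EM_AARCH64 = 183  # e_machine value for ARM 64-bit, per the ELF spec


def elf_machine(path):
    """Return the e_machine field of an ELF file, or None if not ELF.

    Assumes a little-endian (LSB) ELF file, which is what the arm64
    cross-toolchain above produces.
    """
    with open(path, "rb") as f:
        header = f.read(20)
    if header[:4] != b"\x7fELF":
        return None
    # e_machine is a little-endian uint16 at byte offset 18
    return struct.unpack_from("<H", header, 18)[0]


def is_aarch64_binary(path):
    """True if the file at `path` is an ELF binary targeting arm64."""
    return elf_machine(path) == EM_AARCH64
```

For the cross-compiled sample, `is_aarch64_binary("vectorAdd")` should report `True`, while the same check on an x86-64 host binary would not.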