NVIDIA Container Runtime for Docker#

Overview#

The NVIDIA Container Runtime enables Docker containers to access GPU resources on DGX Spark systems. This runtime acts as a bridge between Docker and the NVIDIA drivers, allowing containers to utilize GPU acceleration for AI/ML workloads, CUDA applications, and other GPU-accelerated software.

Key benefits:

  • Seamless GPU access within containers

  • Automatic driver and library management

  • Support for multi-GPU configurations

  • Compatibility with popular container orchestration platforms

The runtime works in conjunction with the NVIDIA Container Toolkit, which provides the necessary components to expose GPU devices and CUDA libraries to containerized applications.
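
On the host, the toolkit's pieces show up as a small set of binaries, and listing them is a quick way to see the components involved (a sketch; binary names as shipped by the NVIDIA Container Toolkit):

# Runtime shim, low-level CLI, and configuration tool installed by the toolkit
which nvidia-container-runtime nvidia-container-cli nvidia-ctk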

Installation#

The NVIDIA Container Toolkit is preinstalled and configured on DGX Spark systems. This includes:

  • NVIDIA Container Runtime

  • Docker integration

  • GPU device access configuration

  • CUDA library management

The runtime is ready to use out of the box for running GPU-accelerated containers.
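
To see which toolkit components and versions are installed, a quick package check works (a sketch, assuming the Debian-based DGX OS image that DGX Spark ships with):

# List installed NVIDIA container packages and report the toolkit CLI version
dpkg -l | grep -i nvidia-container
nvidia-ctk --version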

Optional: Add User to Docker Group#

By default, running Docker commands requires sudo. Adding your user to the docker group allows you to run Docker commands without sudo, which provides:

  • Convenience: No need to type sudo before every Docker command

  • Better workflow: Seamless integration with development tools and scripts

  • Reduced friction: Faster iteration when working with containers

To add your user to the docker group:

sudo usermod -aG docker $USER

Important: You must log out and log back in (or restart your session) for the group membership to take effect.

Note: This step is optional. You can continue using Docker with sudo if you prefer not to modify group memberships.
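
After starting a new session, a quick check confirms the change took effect (a sketch; hello-world is Docker's standard test image):

# docker should appear in your group list, and this should now work without sudo
groups
docker run --rm hello-world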

Usage#

Basic GPU Access#

Run a container with GPU access using the --gpus flag:

docker run -it --gpus=all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi

This command:

  • Runs an interactive container (-it)

  • Enables access to all GPUs (--gpus=all)

  • Uses the NVIDIA CUDA development image

  • Executes nvidia-smi to display GPU information
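
The flag also accepts device selectors if you want to expose only specific GPUs rather than all of them (a sketch; device indices depend on the system):

docker run -it --gpus device=0 nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi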

Set GPU Capabilities#

Control which GPU capabilities are available to the container:

docker run -it --gpus '"capabilities=compute,utility"' nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi
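
The same selection can be expressed through the environment variables the NVIDIA runtime understands, which can be more convenient in scripts or compose files (a sketch using the documented NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES variables):

docker run -it --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
  nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi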

Mount CUDA Libraries#

For applications that need specific CUDA libraries, mount them from the host:

docker run -it --gpus=all \
  -v /usr/local/cuda:/usr/local/cuda:ro \
  nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 bash
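
Inside a container started this way, a quick check confirms the host toolkit is visible at the expected path and mounted read-only (a sketch):

# Run inside the container started above
ls /usr/local/cuda
mount | grep /usr/local/cuda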

Validation#

Test GPU Access#

  1. Run the test command to verify GPU access:

    docker run -it --gpus=all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi
    

    Expected output should show the following (a scriptable form of this check is shown after this list):

      • GPU device information

      • Driver version

      • CUDA version

      • Memory usage and temperature

  2. Check runtime configuration:

    docker info | grep -A 10 "Runtimes"
    
  3. Verify NVIDIA runtime is available:

    docker run --rm --runtime=nvidia nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi
    
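
For scripted validation, nvidia-smi's query flags return the same information as step 1 in machine-readable form (a sketch; the fields are standard nvidia-smi query properties):

docker run --rm --gpus=all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 \
  nvidia-smi --query-gpu=name,driver_version,memory.used,temperature.gpu --format=csv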

Inspect Container GPU Access#

Check what GPU resources are available inside a running container:

docker run -it --gpus=all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 bash
# Inside the container:
nvidia-smi
ls /dev/nvidia*
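
The driver libraries that the runtime injects should also be resolvable from inside the container; ldconfig is one way to confirm this (a sketch; library names as shipped with the NVIDIA driver):

# Inside the container: driver libraries injected by the NVIDIA runtime
ldconfig -p | grep -E 'libcuda|libnvidia-ml'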

Troubleshooting#

Runtime Not Found#

If you encounter “runtime not found” errors:

  1. Verify NVIDIA Container Toolkit is installed:

    nvidia-ctk --version
    
  2. Check the Docker daemon configuration for an nvidia runtime entry (a way to re-create it is sketched after this list):

    cat /etc/docker/daemon.json
    
  3. Restart Docker service:

    sudo systemctl restart docker
    
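
If daemon.json has no nvidia runtime entry, the NVIDIA Container Toolkit's CLI can re-create it (a sketch; nvidia-ctk runtime configure is the toolkit's documented way to register the runtime with Docker):

# Re-register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# daemon.json should now contain an "nvidia" entry under "runtimes"
grep -A 3 '"nvidia"' /etc/docker/daemon.json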

Driver/Container CUDA Mismatch#

If you see CUDA version mismatches:

  1. Check the host driver version and the CUDA version it supports (a scriptable form is shown after this list):

    nvidia-smi
    
  2. Use a container image with a CUDA version supported by the host driver:

    docker run -it --gpus=all nvcr.io/nvidia/cuda:12.0.1-devel-ubuntu24.04 nvidia-smi
    
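
To compare the two sides directly, the host driver version can be queried in scriptable form and the container's CUDA toolkit version read from nvcc (a sketch; nvcc is included in the -devel image tags):

# On the host: the installed driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Inside the container: the CUDA toolkit version the image ships
docker run --rm nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvcc --version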

Permission Issues#

If you encounter permission errors:

  1. Ensure your user is in the docker group if you are not using sudo (a quick fix is sketched after this list):

    groups $USER
    
  2. Check device permissions:

    ls -la /dev/nvidia*
    
  3. Verify Docker daemon has access to GPU devices:

    sudo docker run -it --gpus=all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi
    
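
If docker is missing from the group list, adding it and refreshing the session usually resolves the errors; newgrp applies the new group in the current shell without a full logout (a sketch):

sudo usermod -aG docker $USER
newgrp docker
docker run --rm --gpus=all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 nvidia-smi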

Container Startup Issues#

If containers fail to start:

  1. Check Docker logs:

    docker logs <container_id>
    
  2. Verify GPU devices are available on host:

    ls /dev/nvidia*
    
  3. Test with a minimal container:

    docker run --rm --gpus=all nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 echo "GPU test successful"
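
If the container never gets far enough to produce its own logs, the Docker daemon's journal often shows the underlying error, including NVIDIA runtime failures (a sketch, assuming systemd manages Docker as on DGX OS):

# Recent Docker daemon log entries, including container runtime errors
sudo journalctl -u docker.service --since "15 minutes ago"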