Prerequisites#

Before you begin using the OpenFold3 NIM, ensure that the requirements described on this page are met.

The installation and setup workflows support the following system configurations:

  • Ubuntu 22.04 / 24.04 and amd64 (x86_64)

  • Ubuntu 24.04 with arm64 (aarch64)

  • Systems without NVSwitch. For systems with NVSwitch, you may need fabricmanager; to obtain it, refer to Installing the GPU Driver.

NGC (NVIDIA GPU Cloud) Account#

  1. Create an account on NGC

  2. Generate an API Key

  3. Log in to the NVIDIA Container Registry, using your NGC API key as the password:

  • NVIDIA Docker images are used to verify the NVIDIA Driver, CUDA, Docker, and NVIDIA Container Toolkit stack.

docker login nvcr.io --username='$oauthtoken'
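
For a non-interactive login (for example, in a script), you can pipe the key to docker login; NGC_API_KEY below is an illustrative variable name, not one required by the NIM.

# Non-interactive login sketch; replace <your-ngc-api-key> with your key
export NGC_API_KEY="<your-ngc-api-key>"
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin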

NGC CLI Tool#

  1. Download the NGC CLI tool for your OS.

    Important: Use NGC CLI version 3.41.1 or newer. The following commands install version 3.41.3 on AMD64 Linux in your home directory:

    wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.41.3/files/ngccli_linux.zip -O ~/ngccli_linux.zip && \
    unzip ~/ngccli_linux.zip -d ~/ngc && \
    chmod u+x ~/ngc/ngc-cli/ngc && \
    echo "export PATH=\"\$PATH:~/ngc/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile
    
  2. Set up your NGC CLI Tool locally (You’ll need your API key for this!):

    ngc config set
    

    Note: After you enter your API key, you may see multiple options for the org and team. Select as desired or hit enter to accept the default.
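
    As a quick sanity check, you can confirm the CLI is on your PATH and inspect the active configuration (ngc config current is assumed to be available in this CLI version):

    ngc --version
    ngc config current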

Set up your NIM cache#

The NIM needs a directory on your system, called the NIM cache, where it can:

  • Download the model artifact (checkpoints and TRT engines)

  • Read the model artifact if it has been previously downloaded

The NIM cache directory must:

  • Reside on a disk with at least 15 GB of storage

  • Have permissions that allow the NIM to read, write, and execute

If your home directory (~) is on a disk with enough storage, you can set up the NIM cache directory as follows:

## Create the NIM cache directory in a location with sufficient storage
mkdir -p ~/.cache/nim

## Set the NIM cache directory permissions to allow all (a) users to read, write, and execute (rwx)
sudo chmod -R a+rwx ~/.cache/nim
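
To confirm the directory is usable, you can check its permissions and the free space on the underlying disk:

# Confirm the cache directory exists with the expected permissions
ls -ld ~/.cache/nim

# Confirm the underlying disk has at least 15 GB free
df -h ~/.cache/nim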

You should now be able to pull the NIM container; refer to Getting Started. You won’t be able to run the NIM until you have installed the NVIDIA Driver, CUDA, Docker, and the NVIDIA Container Toolkit.

Installing the NVIDIA Driver, CUDA, Docker, and NVIDIA Container Toolkit Stack#

Collect System Information#

Before installation, collect your system information to determine the appropriate installation path.

  1. Determine the OS version:

# Check OS version
cat /etc/os-release
# Example output for Ubuntu:
# NAME="Ubuntu"
# VERSION="24.04.3 LTS (Noble Numbat)"
# ID=ubuntu
# VERSION_ID="24.04"

# Set OS version as environment variable for use in subsequent commands
export OS_VERSION=$( . /etc/os-release && echo "$VERSION_ID" | tr -d '.' )
echo "OS Version: $OS_VERSION"
# Example output for Ubuntu 24.04:
# OS Version: 2404
  2. Determine the GPU model:

# Check GPU model
nvidia-smi | grep -i "NVIDIA" | awk '{print $3, $4}'
# Example output:
# 590.44.01 Driver
# H100 PCIe

If you see a message like Command 'nvidia-smi' not found, then attempt to determine the GPU model with the command below:

# Check GPU model
lspci | grep -i "3D controller"
# Example output:
# 01:00.0 3D controller: NVIDIA Corporation GH100 [H100 SXM5 80GB] (rev a1)
  3. Determine the CPU architecture:

# Set CPU arch as environment variable, on Ubuntu/Debian system
export CPU_ARCH=$(dpkg --print-architecture)
echo "CPU_ARCH: ${CPU_ARCH}"
# Example output:
# amd64

# Set CPU arch as environment variable, on a non-Ubuntu/Debian system
export CPU_ARCH=$(uname -m)
echo "CPU_ARCH: ${CPU_ARCH}"
# Example output:
# x86_64
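
Optionally, the checks above can be combined into a single snippet that prints a short system summary; it uses only the commands already shown and falls back to lspci when nvidia-smi is not yet installed.

# Print a one-screen summary of OS, CPU architecture, and GPU (if the driver is present)
export OS_VERSION=$( . /etc/os-release && echo "$VERSION_ID" | tr -d '.' )
export CPU_ARCH=$(dpkg --print-architecture 2>/dev/null || uname -m)
echo "OS Version: ${OS_VERSION}"
echo "CPU Arch:   ${CPU_ARCH}"
nvidia-smi --query-gpu=name,driver_version --format=csv,noheader 2>/dev/null \
  || lspci | grep -i "3D controller"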

Installation Instructions by Architecture#

Select the appropriate section based on your CPU architecture identified in the previous step:

Installation for amd64 / x86_64 Systems#

For systems with amd64 or x86_64 CPU architecture (H100, H200, A100, L40S, B200)

1. Find and Download Driver Package#

a. On your local machine (with a browser), visit the NVIDIA Drivers download page, and note the fields in the ‘Manual Driver Search’ dialog box.

b. Enter your system information:

For H100, H200, A100, L40S:

Field                   Value
Product Category        Data Center / Tesla
Product Series          H-Series, A-Series, or L-Series
Product                 H100, H200, A100
OS                      Linux 64-bit Ubuntu 24.04
CUDA Toolkit Version    13.1
Language                English (US)

For B200:

Field                   Value
Product Category        Data Center / Tesla
Product Series          HGX-Series
Product                 HGX B200
OS                      Linux 64-bit Ubuntu 24.04
CUDA Toolkit Version    13.1
Language                English (US)

c. Click Find to locate driver version 590.44.01 or higher

d. On the results page, click View

e. On the next page, right-click the Download button and select Copy Link Address

Note: Some distributions like Ubuntu, Debian, or RHEL have distribution-specific packages (.deb, .rpm). For other distributions, use the .run installer.

2. Direct Driver URLs#

For Ubuntu 24.04 (Noble):

# Driver 590.44.01 for H100/H200/B200/A100/L40S on x86_64 system
https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-ubuntu2404-590.44.01_1.0-1_amd64.deb

For Ubuntu 22.04 (Jammy):

# Driver 590.44.01 for H100/H200/B200/A100/L40S on x86_64 system
https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-ubuntu2204-590.44.01_1.0-1_amd64.deb

For RHEL 8/Rocky Linux 8:

# Driver 590.44.01 for H100/H200/B200/A100/L40S
https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-rhel8-590.44.01-1.0-1.x86_64.rpm

Important: Always check the NVIDIA Driver Downloads page for the latest driver version compatible with your GPU and OS.

3. Download the Driver#

# Download driver using OS_VERSION environment variable
# For Ubuntu (automatically uses correct version: 2204, 2404, etc.)
wget https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-ubuntu${OS_VERSION}-590.44.01_1.0-1_${CPU_ARCH}.deb

4. Install the Local Repository#

For Ubuntu/Debian:

sudo dpkg -i nvidia-driver-local-repo-ubuntu${OS_VERSION}-590.44.01_1.0-1_${CPU_ARCH}.deb

For RHEL/CentOS/Rocky Linux:

sudo rpm -i nvidia-driver-local-repo-rhel8-590.44.01-1.0-1.${CPU_ARCH}.rpm

5. Update Package Lists and Install Driver#

For Ubuntu/Debian:

# Copy the GPG key
sudo cp /var/nvidia-driver-local-repo-ubuntu${OS_VERSION}-590.44.01/nvidia-driver-local-*-keyring.gpg /usr/share/keyrings/

# Update package cache
sudo apt-get update

# Install the driver
sudo apt-get install -y cuda-drivers

For RHEL/CentOS/Rocky Linux:

# Update package cache
sudo dnf clean all
sudo dnf makecache

# Install the driver
sudo dnf install -y cuda-drivers

6. Reboot System#

sudo reboot

7. Verify Driver Installation#

After reboot, verify the driver:

nvidia-smi

Expected output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01    Driver Version: 590.44.01    CUDA Version: 13.1   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA H100 PCIe    Off  | 00001E:00:00.0  Off |                    0  |
| N/A   30C    P0    68W / 350W |      0MiB / 81559MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
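
Optionally, you can check the driver version non-interactively; the 590 threshold below mirrors the driver version used throughout this guide.

# Print the installed driver version and flag it if it is older than 590
DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
echo "Driver version: ${DRIVER_VERSION}"
[ "${DRIVER_VERSION%%.*}" -ge 590 ] && echo "Driver OK" || echo "Driver is older than 590; upgrade required"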

8. Install Docker#

Verify Docker is installed with version >=23.0.1:

docker --version
# Example output:
# Docker version 29.1.3, build f52814d

If Docker is not installed or does not meet the version requirement, install or upgrade it before continuing; a sketch of one approach is shown below.
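
The sketch below uses Docker's convenience script from get.docker.com; you may prefer your distribution's packages or the steps in Docker's official documentation.

# Install Docker Engine with Docker's convenience script (review the script before running it)
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Confirm the installed version meets the >= 23.0.1 requirement
docker --version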

9. Install NVIDIA Container Toolkit#

Verify the NVIDIA Container Toolkit:

nvidia-container-cli --version

If not installed:

  1. Follow Installing the NVIDIA Container Toolkit

  2. Configure Docker: Configuring Docker
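
The two steps above boil down to installing the toolkit package and pointing Docker at the NVIDIA runtime; the commands below are a sketch of that flow for Ubuntu/Debian and assume the NVIDIA apt repository has already been added as described in the linked guide.

# Install the toolkit from the NVIDIA apt repository (assumes the repository has been added per the linked guide)
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use the NVIDIA runtime, then restart Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker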

10. Verify the Complete Stack#

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Example output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01    Driver Version: 590.44.01    CUDA Version: 13.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA H100 ...     Off  | 00000000:01:00.0 Off |                  N/A |
| 41%   30C    P8     1W / 260W |   2244MiB / 81559MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Note: For more information on enumerating multi-GPU systems, refer to the NVIDIA Container Toolkit’s GPU Enumeration Docs
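
For example, on a multi-GPU host you can restrict the verification container to a single device; the device index below is illustrative.

# Restrict the verification container to GPU 0 only
sudo docker run --rm --runtime=nvidia --gpus '"device=0"' ubuntu nvidia-smi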

Installation for arm64 / aarch64 DGX Systems#

For arm64 / aarch64 DGX Systems (e.g., DGX GB200 Compute Tray)

Note: These steps follow the NVIDIA DGX OS 7 User Guide: Installing the GPU Driver, customized for DGX GB200 Compute Tray with:

  • 2x Grace CPUs (arm64 / aarch64)

  • 4x Blackwell GPUs

  • Ubuntu 24.04

  • Linux kernel version 6.8.0-1044-nvidia-64k

1. Check NVIDIA Driver State#

Check the running driver version:

nvidia-smi

Example successful output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01              Driver Version: 590.44.01     CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GB200                   On  |   00000008:01:00.0 Off |                    0 |
| N/A   29C    P0            130W / 1200W |       0MiB / 189471MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
  • If the running driver version is 590+, skip to Step 9

  • If nvidia-smi fails, proceed to Step 2

2. Confirm OS Sees NVIDIA GPUs#

sudo lshw -class display -json | jq '.[] | select(.description=="3D controller")'

To view product-specific system information:

sudo lshw -class system -json | jq '.[0]'

3. Verify System Requirements#

Check your Linux distribution, kernel version, and gcc version:

. /etc/os-release && echo "$PRETTY_NAME"   # Linux distribution
uname -r  # Kernel version
gcc --version  # GCC version

Example output:

Ubuntu 24.04.2 LTS
6.8.0-1044-nvidia-64k
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

Verify against Table 3: Supported Linux Distributions.

4. Update Linux Kernel Version (If Needed)#

For GB200 systems, use kernel version 6.8.0-1044-nvidia-64k or 6.8.0-1043-nvidia-64k.

If you have a different kernel version, configure grub:

# Update grub default menu entry
sudo sed --in-place=.bak \
  '/^[[:space:]]*GRUB_DEFAULT=/c\GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-1044-nvidia-64k"' \
  /etc/default/grub

# Verify update
cat /etc/default/grub

# Update grub and reboot
sudo update-grub
sudo reboot
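
After the reboot, confirm the system booted into the expected kernel before continuing:

# Should print 6.8.0-1044-nvidia-64k (or 6.8.0-1043-nvidia-64k)
uname -r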

5. Remove NVIDIA Libraries to Avoid Conflicts#

Check for existing NVIDIA libraries:

ls /usr/lib/aarch64-linux-gnu/ | grep -i nvidia

If not empty, remove them:

sudo apt remove --autoremove --purge -Vy \
  cuda-compat\* \
  cuda-drivers\*  \
  libnvidia-cfg1\* \
  libnvidia-compute\* \
  libnvidia-decode\* \
  libnvidia-encode\* \
  libnvidia-extra\* \
  libnvidia-fbc1\* \
  libnvidia-gl\* \
  libnvidia-gpucomp\* \
  libnvidia-nscq\* \
  libnvsdm\* \
  libxnvctrl\* \
  nvidia-dkms\* \
  nvidia-driver\* \
  nvidia-fabricmanager\* \
  nvidia-firmware\* \
  nvidia-headless\* \
  nvidia-imex\* \
  nvidia-kernel\* \
  nvidia-modprobe\* \
  nvidia-open\* \
  nvidia-persistenced\* \
  nvidia-settings\* \
  nvidia-xconfig\* \
  xserver-xorg-video-nvidia\*

6. Download Package Repositories and Install DGX Tools#

Follow Installing DGX System Configurations and Tools:

a. Download and unpack ARM64-specific packages:

curl https://repo.download.nvidia.com/baseos/ubuntu/noble/arm64/dgx-repo-files.tgz | sudo tar xzf - -C /

b. Update local APT database:

sudo apt update

c. Install DGX system tools:

sudo apt install -y nvidia-system-core
sudo apt install -y nvidia-system-utils
sudo apt install -y nvidia-system-extra

d. Install linux-tools for your kernel:

sudo apt install -y linux-tools-nvidia-64k

e. Install NVIDIA peermem loader:

sudo apt install -y nvidia-peermem-loader

7. Install GPU Driver#

Follow Installing the GPU Driver:

a. Pin the driver version:

sudo apt install nvidia-driver-pinning-590

b. Install the open GPU kernel module:

sudo apt install --allow-downgrades \
  nvidia-driver-590-open \
  libnvidia-nscq \
  nvidia-modprobe \
  nvidia-imex \
  datacenter-gpu-manager-4-cuda13 \
  nv-persistence-mode

c. Enable the persistence daemon:

sudo systemctl enable nvidia-persistenced nvidia-dcgm nvidia-imex
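
Optionally, confirm the services are registered to start at boot before rebooting; this checks only the enablement state, not that the driver is loaded.

# Each service should report "enabled"
systemctl is-enabled nvidia-persistenced nvidia-dcgm nvidia-imex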

d. Reboot:

sudo reboot

8. Verify Driver Installation#

After reboot, repeat Step 1 to check NVIDIA Driver.

9. Install Docker and NVIDIA Container Toolkit#

Follow Installing Docker and the NVIDIA Container Toolkit.

Verify the stack:

sudo docker run --rm --gpus=all nvcr.io/nvidia/cuda:12.6.2-base-ubuntu24.04 nvidia-smi

10. Enable Docker for Non-Root User (Optional)#

See Manage Docker as non-root user.
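
That page amounts to adding your user to the docker group; a minimal sketch, assuming the docker group was created by the Docker installation:

# Add the current user to the docker group and refresh group membership
sudo usermod -aG docker "$USER"
newgrp docker

# Confirm docker works without sudo
docker run --rm hello-world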

11. Verify Complete Stack#

a. Log into NGC:

docker login nvcr.io --username '$oauthtoken'

b. Run verification:

sudo docker run --rm --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  nvcr.io/nvidia/pytorch:25.12-py3 \
  python -c \
"import torch, pynvml;
pynvml.nvmlInit();
print('Driver:', pynvml.nvmlSystemGetDriverVersion());
print('CUDA:', torch.version.cuda);
print('GPU count:', torch.cuda.device_count())"

Expected output:

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 13.1 driver version 590.44.01 with kernel driver version 590.44.01.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Driver: 590.44.01
CUDA: 13.1
GPU count: 4

Troubleshooting#

Common Issues#

Driver version mismatch: If nvidia-smi shows an older driver version, ensure you’ve rebooted after installation.

CUDA version mismatch: The driver must support CUDA 13.1 or higher. Check the CUDA version in the nvidia-smi output. If your system shows CUDA 12.x or CUDA 13.0, you need to install driver 590.44.01 or higher.

To verify CUDA compatibility:

  1. Check current driver: nvidia-smi

  2. Verify CUDA version shows 13.1 or higher

  3. If not, refer to NVIDIA CUDA Compatibility
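
The check above can also be scripted; the parsing below relies on the CUDA Version field printed in the nvidia-smi header.

# Extract the CUDA version reported in the nvidia-smi header
CUDA_VERSION=$(nvidia-smi | grep -o 'CUDA Version: [0-9.]*' | awk '{print $3}')
echo "Driver-reported CUDA version: ${CUDA_VERSION}"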

Secure Boot (amd64 systems): If you have Secure Boot enabled, you may need to sign the NVIDIA kernel modules or disable Secure Boot in your BIOS.
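
On Ubuntu systems you can typically check the Secure Boot state with mokutil (install the mokutil package if it is not present):

# Reports "SecureBoot enabled" or "SecureBoot disabled"
mokutil --sb-state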

Library version conflicts: If you encounter library version conflicts, ensure all old NVIDIA packages are removed before installing the new driver.

Architecture-Specific Troubleshooting#

For amd64 / x86_64 Systems:

Previous driver versions:

# Remove old drivers
sudo apt-get remove --purge nvidia-*
sudo apt-get autoremove

# Verify removal
ls /usr/lib/x86_64-linux-gnu/ | grep -i nvidia

Package conflicts:

# Clean package cache
sudo apt-get clean
sudo apt-get update

# Try installation again
sudo apt-get install -y cuda-drivers

For arm64 / aarch64 DGX Systems:

Kernel version issues:

# Check current kernel
uname -r

# List available kernels
dpkg --list | grep linux-image

# Configure grub to use correct kernel (see Step 4 in installation)

DGX-specific issues:

# Check DGX system status
sudo nvidia-bug-report.sh

# Verify fabricmanager (if using NVSwitch)
systemctl status nvidia-fabricmanager

# Check NVIDIA services
systemctl status nvidia-persistenced
systemctl status nvidia-dcgm

Build errors for older kernel:

  • Ignore build errors for modules built for 6.14.0-1015-nvidia-64k

  • These errors are expected and do not affect functionality

Getting Additional Help#

If you continue to experience issues:

  1. Check NVIDIA driver logs: dmesg | grep -i nvidia

  2. Review Docker logs: sudo journalctl -u docker.service

  3. Consult NVIDIA Driver Installation Guide

  4. For DGX systems: NVIDIA DGX OS 7 User Guide