Prerequisites#
Before you begin using the OpenFold3 NIM, ensure the following requirements described on this page are met.
Begin with a Linux distribution that supports NVIDIA Driver >=590.
The OpenFold3 NIM runs CUDA 13.1, which requires NVIDIA Driver >=590.44.01; see the compatibility matrix.
To check your OS version, refer to Collect System Information below.
Set up an NVIDIA GPU Cloud (NGC) account and the NGC CLI tool
Set up a NIM cache
Install NVIDIA Driver - minimum version: 590.44.01
Install Docker - minimum version: 23.0.1
Install the NVIDIA Container Toolkit - minimum version: 1.13.5
The installation and setup workflows work with the following system architectures:
Ubuntu 22.04 / 24.04 and amd64 (x86_64)
Ubuntu 24.04 with arm64 (aarch64)
Systems without NVSwitch. Systems with NVSwitch may require fabricmanager; to install it, refer to Installing the GPU Driver.
Known issues#
There are known issues with NVIDIA Driver 580.105.08 on Hopper GPUs (subrevision 3).
NGC (NVIDIA GPU Cloud) Account#
Log in to the NVIDIA Container Registry, using your NGC API key as the password.
NVIDIA Docker images will be used to verify the NVIDIA Driver, CUDA, Docker, and NVIDIA Container Toolkit stack.
docker login nvcr.io --username='$oauthtoken'
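If you prefer a non-interactive login (for example, in a setup script), you can pass the key on stdin. This is a minimal sketch, assuming your key is exported in an NGC_API_KEY environment variable:
# Non-interactive login; assumes the key is exported as NGC_API_KEY
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin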
NGC CLI Tool#
Download the NGC CLI tool for your OS.
Important: Use NGC CLI version 3.41.1 or newer. The following command installs version 3.41.3 on an AMD64 Linux system in your home directory:
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.41.3/files/ngccli_linux.zip -O ~/ngccli_linux.zip && \
  unzip ~/ngccli_linux.zip -d ~/ngc && \
  chmod u+x ~/ngc/ngc-cli/ngc && \
  echo "export PATH=\"\$PATH:~/ngc/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile
Set up your NGC CLI Tool locally (You’ll need your API key for this!):
ngc config set
Note: After you enter your API key, you may see multiple options for the org and team. Select as desired or hit enter to accept the default.
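To confirm the CLI is installed and configured, you can print its version and the active configuration. This is a quick check, assuming NGC CLI 3.41.x:
# Show the NGC CLI version and the currently active org/team configuration
ngc --version
ngc config current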
Set up your NIM cache#
The NIM needs a directory on your system called the NIM cache, where it can
Download the model artifact (checkpoints and TRT engines)
Read the model artifact if it has been previously downloaded
The NIM cache directory must:
Reside on a disk with at least 15 GB of free storage
Have permissions that allow the NIM to read, write, and execute
If your home directory (~) is on a disk with enough storage, you can set up the NIM cache directory as follows:
## Create the NIM cache directory in a location with sufficient storage
mkdir -p ~/.cache/nim
## Set the NIM cache directory permissions to allow all (a) users to read, write, and execute (rwx)
sudo chmod -R a+rwx ~/.cache/nim
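To confirm that the disk backing the cache has at least 15 GB free, a quick check:
## Check free space on the filesystem that holds the NIM cache
df -h ~/.cache/nim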
You should now be able to pull the NIM container; refer to Getting Started. You won't be able to run the NIM until you complete the installation of the NVIDIA Driver, CUDA, Docker, and the NVIDIA Container Toolkit.
Installing the NVIDIA Driver, CUDA, Docker, and NVIDIA Container Toolkit Stack#
Collect System Information#
Before installation, collect your system information to determine the appropriate installation path.
Determine the OS version:
# Check OS version
cat /etc/os-release
# Example output for Ubuntu:
# NAME="Ubuntu"
# VERSION="24.04.3 LTS (Noble Numbat)"
# ID=ubuntu
# VERSION_ID="24.04"
# Set OS version as environment variable for use in subsequent commands
export OS_VERSION=$( . /etc/os-release && echo "$VERSION_ID" | tr -d '.' )
echo "OS Version: $OS_VERSION"
# Example output for Ubuntu 24.04:
# OS Version: 2404
Determine the GPU model:
# Check GPU model
nvidia-smi | grep -i "NVIDIA" | awk '{print $3, $4}'
# Example output:
# 590.44.01 Driver
# H100 PCIe
If you see a message like Command 'nvidia-smi' not found, then attempt to
determine GPU model with the command below:
# Check GPU model
lspci | grep -i "3D controller"
# Example output:
# 01:00.0 3D controller: NVIDIA Corporation GH100 [H100 SXM5 80GB] (rev a1)
Determine the CPU architecture:
# Set CPU arch as environment variable, on Ubuntu/Debian system
export CPU_ARCH=$(dpkg --print-architecture)
echo "CPU_ARCH: ${CPU_ARCH}"
# Example output:
# amd64
# Set CPU arch as environment variable, on a non-Ubuntu/Debian system
export CPU_ARCH=$(uname -m)
echo "CPU_ARCH: ${CPU_ARCH}"
# Example output:
# x86_64
Installation Instructions by Architecture#
Select the appropriate section based on your CPU architecture identified in the previous step:
Installation for amd64 / x86_64 Systems#
For systems with amd64 or x86_64 CPU architecture (H100, H200, A100, L40S, B200)
1. Find and Download Driver Package#
a. On your local machine (with a browser), visit the NVIDIA Drivers download page and locate the fields in the Manual Driver Search dialog box.
b. Enter your system information:
For H100, H200, A100, L40S:
| Field | Value |
|---|---|
| Product Category | Data Center / Tesla |
| Product Series | H-Series, A-Series, or L-Series |
| Product | H100, H200, A100 |
| OS | Linux 64-bit Ubuntu 24.04 |
| CUDA Toolkit Version | 13.1 |
| Language | English (US) |
For B200:
| Field | Value |
|---|---|
| Product Category | Data Center / Tesla |
| Product Series | HGX-Series |
| Product | HGX B200 |
| OS | Linux 64-bit Ubuntu 24.04 |
| CUDA Toolkit Version | 13.1 |
| Language | English (US) |
c. Click Find to find driver version 590.44.01 or higher
d. On the results page, click View
e. On the next page, right-click the Download button and select Copy Link Address
Note: Some distributions like Ubuntu, Debian, or RHEL have distribution-specific packages (.deb, .rpm). For other distributions, use the .run installer.
2. Direct Driver URLs#
For Ubuntu 24.04 (Noble):
# Driver 590.44.01 for H100/H200/B200/A100/L40S on x86_64 system
https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-ubuntu2404-590.44.01_1.0-1_amd64.deb
For Ubuntu 22.04 (Jammy):
# Driver 590.44.01 for H100/H200/B200/A100/L40S on x86_64 system
https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-ubuntu2204-590.44.01_1.0-1_amd64.deb
For RHEL 8/Rocky Linux 8:
# Driver 590.44.01 for H100/H200/B200/A100/L40S
https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-rhel8-590.44.01-1.0-1.x86_64.rpm
Important: Always check the NVIDIA Driver Downloads page for the latest driver version compatible with your GPU and OS.
3. Check and Purge Old Drivers (Optional but Recommended)#
# Check current driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null || echo "No driver installed"
# If you have an older driver (< 590), purge it to prevent conflicts
sudo apt-get remove --purge nvidia-* -y
sudo apt-get autoremove -y
Important: This step prevents driver library version conflicts. If you have an existing NVIDIA driver older than version 590, we recommend purging it before installing the new driver.
4. Download the Driver#
# Download driver using OS_VERSION environment variable
# For Ubuntu (automatically uses correct version: 2204, 2404, etc.)
wget https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-ubuntu${OS_VERSION}-590.44.01_1.0-1_${CPU_ARCH}.deb
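For RHEL 8 / Rocky Linux 8 on x86_64, download the corresponding .rpm repository package from the URL listed above instead:
# Download the local repository package for RHEL 8 / Rocky Linux 8 (x86_64)
wget https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-rhel8-590.44.01-1.0-1.x86_64.rpm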
5. Install the Local Repository#
For Ubuntu/Debian:
sudo dpkg -i nvidia-driver-local-repo-ubuntu${OS_VERSION}-590.44.01_1.0-1_${CPU_ARCH}.deb
For RHEL/CentOS/Rocky Linux:
sudo rpm -i nvidia-driver-local-repo-rhel8-590.44.01-1.0-1.${CPU_ARCH}.rpm
6. Update Package Lists and Install Driver#
For Ubuntu/Debian:
# Copy the GPG key
sudo cp /var/nvidia-driver-local-repo-ubuntu${OS_VERSION}-590.44.01/nvidia-driver-local-*-keyring.gpg /usr/share/keyrings/
# Update package cache
sudo apt-get update
# Install the driver
sudo apt-get install -y cuda-drivers
For RHEL/CentOS/Rocky Linux:
# Update package cache
sudo dnf clean all
sudo dnf makecache
# Install the driver
sudo dnf install -y cuda-drivers
7. Reboot System#
sudo reboot
8. Verify Driver Installation#
After reboot, verify the driver:
nvidia-smi
Expected output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01 Driver Version: 590.44.01 CUDA Version: 13.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA H100 PCIe Off | 00001E:00:00.0 Off | 0 |
| N/A 30C P0 68W / 350W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
9. Install Docker#
Verify Docker is installed with version >=23.0.1:
docker --version
# Example output:
# Docker version 29.1.3, build f52814d
If Docker is not installed or does not meet requirements:
For Ubuntu: Follow the instructions in Install using the apt repository
For other distributions: Refer to docs.docker.com/engine/install
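As a convenience, the following is a condensed sketch of the apt repository installation for Ubuntu; refer to docs.docker.com for the canonical, up-to-date steps:
# Add Docker's official GPG key and apt repository (Ubuntu)
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
# Install the Docker Engine packages
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Confirm the installed version meets the >=23.0.1 requirement
docker --version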
10. Install NVIDIA Container Toolkit#
Verify the NVIDIA Container Toolkit:
nvidia-container-cli --version
If not installed, install the NVIDIA Container Toolkit and then configure Docker: refer to Configuring Docker. A condensed installation sketch follows.
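The following is a sketch of the standard apt installation from the NVIDIA Container Toolkit documentation; refer to that guide for the canonical steps:
# Add the NVIDIA Container Toolkit apt repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure Docker to use the NVIDIA runtime and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker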
11. Verify the Complete Stack#
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Example output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01 Driver Version: 590.44.01 CUDA Version: 13.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA H100 ... Off | 00000000:01:00.0 Off | N/A |
| 41% 30C P8 1W / 260W | 2244MiB / 81559MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Note: For more information on enumerating multi-GPU systems, refer to the NVIDIA Container Toolkit’s GPU Enumeration Docs
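For example, on a multi-GPU host you can expose only a subset of GPUs to the container; this sketch assumes you want GPUs 0 and 1:
# Expose only GPUs 0 and 1 to the container (note the nested quoting required by --gpus)
sudo docker run --rm --runtime=nvidia --gpus '"device=0,1"' ubuntu nvidia-smi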
Installation for arm64 / aarch64 DGX Systems#
For arm64 / aarch64 DGX Systems (e.g., DGX GB200 Compute Tray)
Note: These steps follow the NVIDIA DGX OS 7 User Guide: Installing the GPU Driver, customized for DGX GB200 Compute Tray with:
2x Grace CPUs (arm64 / aarch64)
4x Blackwell GPUs
Ubuntu 24.04
Linux kernel version 6.8.0-1044-nvidia-64k
1. Check NVIDIA Driver State#
Check the running driver version:
nvidia-smi
Example successful output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01 Driver Version: 590.44.01 CUDA Version: 13.1 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GB200 On | 00000008:01:00.0 Off | 0 |
| N/A 29C P0 130W / 1200W | 0MiB / 189471MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
If the running driver version is 590 or newer, skip to Step 9.
If nvidia-smi fails, proceed to Step 2.
2. Confirm OS Sees NVIDIA GPUs#
sudo lshw -class display -json | jq '.[] | select(.description=="3D controller")'
Product-specific information:
sudo lshw -class system -json | jq '.[0]'
3. Verify System Requirements#
Check your Linux distribution, kernel version, and gcc version:
. /etc/os-release && echo "$PRETTY_NAME" # Linux distribution
uname -r # Kernel version
gcc --version # GCC version
Example output:
Ubuntu 24.04.2 LTS
6.8.0-1044-nvidia-64k
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Verify against Table 3: Supported Linux Distributions.
4. Update Linux Kernel Version (If Needed)#
For GB200 systems, use kernel version 6.8.0-1044-nvidia-64k or 6.8.0-1043-nvidia-64k.
If you have a different kernel version, configure grub:
# Update grub default menu entry
sudo sed --in-place=.bak \
'/^[[:space:]]*GRUB_DEFAULT=/c\GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-1044-nvidia-64k"' \
/etc/default/grub
# Verify update
cat /etc/default/grub
# Update grub and reboot
sudo update-grub
sudo reboot
5. Remove NVIDIA Libraries to Avoid Conflicts#
Check for existing NVIDIA libraries:
ls /usr/lib/aarch64-linux-gnu/ | grep -i nvidia
If not empty, remove them:
sudo apt remove --autoremove --purge -Vy \
cuda-compat\* \
cuda-drivers\* \
libnvidia-cfg1\* \
libnvidia-compute\* \
libnvidia-decode\* \
libnvidia-encode\* \
libnvidia-extra\* \
libnvidia-fbc1\* \
libnvidia-gl\* \
libnvidia-gpucomp\* \
libnvidia-nscq\* \
libnvsdm\* \
libxnvctrl\* \
nvidia-dkms\* \
nvidia-driver\* \
nvidia-fabricmanager\* \
nvidia-firmware\* \
nvidia-headless\* \
nvidia-imex\* \
nvidia-kernel\* \
nvidia-modprobe\* \
nvidia-open\* \
nvidia-persistenced\* \
nvidia-settings\* \
nvidia-xconfig\* \
xserver-xorg-video-nvidia\*
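After the removal completes, you can re-run the check above to confirm that no NVIDIA libraries remain:
# Should print the fallback message once all NVIDIA libraries are removed
ls /usr/lib/aarch64-linux-gnu/ | grep -i nvidia || echo "No NVIDIA libraries found"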
6. Download Package Repositories and Install DGX Tools#
Follow Installing DGX System Configurations and Tools:
a. Download and unpack ARM64-specific packages:
curl https://repo.download.nvidia.com/baseos/ubuntu/noble/arm64/dgx-repo-files.tgz | sudo tar xzf - -C /
b. Update local APT database:
sudo apt update
c. Install DGX system tools:
sudo apt install -y nvidia-system-core
sudo apt install -y nvidia-system-utils
sudo apt install -y nvidia-system-extra
d. Install linux-tools for your kernel:
sudo apt install -y linux-tools-nvidia-64k
e. Install NVIDIA peermem loader:
sudo apt install -y nvidia-peermem-loader
7. Install GPU Driver#
Follow Installing the GPU Driver:
a. Pin the driver version:
sudo apt install nvidia-driver-pinning-590
b. Install the open GPU kernel module:
sudo apt install --allow-downgrades \
nvidia-driver-590-open \
libnvidia-nscq \
nvidia-modprobe \
nvidia-imex \
datacenter-gpu-manager-4-cuda13 \
nv-persistence-mode
c. Enable the persistence daemon:
sudo systemctl enable nvidia-persistenced nvidia-dcgm nvidia-imex
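Optionally, confirm the services are enabled before rebooting:
# Each unit should report "enabled"
systemctl is-enabled nvidia-persistenced nvidia-dcgm nvidia-imex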
d. Reboot:
sudo reboot
8. Verify Driver Installation#
After reboot, repeat Step 1 to check NVIDIA Driver.
9. Install Docker and NVIDIA Container Toolkit#
Follow Installing Docker and the NVIDIA Container Toolkit.
Verify the stack:
sudo docker run --rm --gpus=all nvcr.io/nvidia/cuda:12.6.2-base-ubuntu24.04 nvidia-smi
10. Enable Docker for Non-Root User (Optional)#
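To run docker commands without sudo, add your user to the docker group. This is the standard Docker post-installation step; log out and back in (or use newgrp) for the change to take effect:
# Add the current user to the docker group
sudo usermod -aG docker "$USER"
# Apply the new group membership in the current shell
newgrp docker
# Verify that docker works without sudo
docker run --rm hello-world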
11. Verify Complete Stack#
a. Log into NGC:
docker login nvcr.io --username '$oauthtoken'
b. Run verification:
sudo docker run --rm --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
nvcr.io/nvidia/pytorch:25.12-py3 \
python -c \
"import torch, pynvml;
pynvml.nvmlInit();
print('Driver:', pynvml.nvmlSystemGetDriverVersion());
print('CUDA:', torch.version.cuda);
print('GPU count:', torch.cuda.device_count())"
Expected output:
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 13.1 driver version 590.44.01 with kernel driver version 590.44.01.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
Driver: 590.44.01
CUDA: 13.1
GPU count: 4
Troubleshooting#
Common Issues#
Driver version mismatch: If nvidia-smi shows an older driver version, ensure you’ve rebooted after installation.
CUDA version mismatch: The driver must support CUDA 13.1 or higher. Check the CUDA version in the nvidia-smi output. If your system shows CUDA 12.x or CUDA 13.0, you need to install driver 590.44.01 or higher.
To verify CUDA compatibility:
Check the current driver: nvidia-smi
Verify that the reported CUDA version is 13.1 or higher
If not, refer to NVIDIA CUDA Compatibility
Secure Boot (amd64 systems): If you have Secure Boot enabled, you may need to sign the NVIDIA kernel modules or disable Secure Boot in your BIOS.
Library version conflicts: If you encounter library version conflicts, ensure all old NVIDIA packages are removed before installing the new driver.
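A quick way to spot leftover packages and a driver/library mismatch is to compare the loaded kernel module with the installed user-space packages; this is a minimal sketch for Ubuntu/Debian systems:
# Version of the NVIDIA kernel module currently loaded
cat /proc/driver/nvidia/version
# Installed NVIDIA user-space packages and their versions
dpkg -l | grep -iE 'nvidia-driver|cuda-drivers|libnvidia'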
Architecture-Specific Troubleshooting#
For amd64 / x86_64 Systems:
Previous driver versions:
# Remove old drivers
sudo apt-get remove --purge nvidia-*
sudo apt-get autoremove
# Verify removal
ls /usr/lib/x86_64-linux-gnu/ | grep -i nvidia
Package conflicts:
# Clean package cache
sudo apt-get clean
sudo apt-get update
# Try installation again
sudo apt-get install -y cuda-drivers
For arm64 / aarch64 DGX Systems:
Kernel version issues:
# Check current kernel
uname -r
# List available kernels
dpkg --list | grep linux-image
# Configure grub to use correct kernel (see Step 4 in installation)
DGX-specific issues:
# Check DGX system status
sudo nvidia-bug-report.sh
# Verify fabricmanager (if using NVSwitch)
systemctl status nvidia-fabricmanager
# Check NVIDIA services
systemctl status nvidia-persistenced
systemctl status nvidia-dcgm
Build errors for older kernel: Ignore build errors for modules built for 6.14.0-1015-nvidia-64k. These errors are expected and do not affect functionality.
Getting Additional Help#
If you continue to experience issues:
Check NVIDIA driver logs: dmesg | grep -i nvidia
Review Docker logs: sudo journalctl -u docker.service
Consult the NVIDIA Driver Installation Guide
For DGX systems: NVIDIA DGX OS 7 User Guide