Prerequisites#
Before you begin using the OpenFold3 NIM, ensure that the requirements described on this page are met.
Begin with a Linux distribution that supports NVIDIA Driver >=590.
The OpenFold3 NIM runs CUDA 13.1, which requires NVIDIA Driver >=590.44.01; see the compatibility matrix.
To check your OS version, refer to Collect system information below.
Set up an NVIDIA GPU Cloud (NGC) Account, and NGC CLI Tool
Set up a NIM cache
Install NVIDIA Driver - minimum version: 590.44.01
Install Docker - minimum version: 23.0.1
Install the NVIDIA Container Toolkit - minimum version: 1.13.5
The installation and setup workflows support the following system configurations:
Ubuntu 22.04 / 24.04 and amd64 (x86_64)
Ubuntu 24.04 with arm64 (aarch64)
Systems without NVSwitch. For systems with NVSwitch, you may need fabricmanager; to get fabricmanager, refer to Installing the GPU Driver.
Known issues#
There are known issues with NVIDIA Driver 580.105.08 on Hopper GPUs subrevision 3.
NGC (NVIDIA GPU Cloud) Account#
Log in to the NVIDIA Container Registry, using your NGC API key as the password.
NVIDIA Docker images will be used to verify the NVIDIA Driver, CUDA, Docker, and NVIDIA Container Toolkit stack.
docker login nvcr.io --username='$oauthtoken'
NGC CLI Tool#
Download the NGC CLI tool for your OS.
Important: Use NGC CLI version 3.41.1 or newer. Here is the command to install this on AMD64 Linux in your home directory:
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.41.3/files/ngccli_linux.zip -O ~/ngccli_linux.zip && \
  unzip ~/ngccli_linux.zip -d ~/ngc && \
  chmod u+x ~/ngc/ngc-cli/ngc && \
  echo "export PATH=\"\$PATH:~/ngc/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile
Set up your NGC CLI Tool locally (You’ll need your API key for this!):
ngc config set
Note: After you enter your API key, you may see multiple options for the org and team. Select as desired or hit enter to accept the default.
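As a quick sanity check (a hedged sketch; exact output varies by CLI version), you can confirm the CLI is on your PATH and that a configuration was saved:
# Confirm the CLI is available and reports a supported version
ngc --version
# Show the active configuration (org, team, ace) that was just saved
ngc config current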
Set up your NIM cache#
The NIM needs a directory on your system called the NIM cache, where it can
Download the model artifact (checkpoints and TRT engines)
Read the model artifact if it has been previously downloaded
The NIM cache directory must:
Reside on a disk with at least 15GB of storage
Have permissions that allow the NIM to read, write, and execute
The NIM cache directory can be set up as follows, if your home directory ‘~’ is on a disk with enough storage.
## Create the NIM cache directory in a location with sufficient storage
mkdir -p ~/.cache/nim
## Set the NIM cache directory permissions to allow all (a) users to read, write, and execute (rwx)
sudo chmod -R a+rwx ~/.cache/nim
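To confirm the 15GB storage requirement is met, check the free space on the filesystem that holds the cache directory, for example:
# Verify available space on the filesystem backing the NIM cache
df -h ~/.cache/nim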
Now you should be able to pull the NIM container; refer to Getting Started. You won't be able to run the NIM until you complete the installation of the NVIDIA Driver, CUDA, Docker, and the NVIDIA Container Toolkit.
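For example, assuming the OpenFold3 image and tag referenced later in this guide, the pull looks like this (requires the docker login from the NGC section above):
# Pull the OpenFold3 NIM container image
docker pull nvcr.io/nim/openfold/openfold3:1.4.0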
Installing the NVIDIA Driver, CUDA, Docker, and NVIDIA Container Toolkit Stack#
Collect System Information#
Before installation, collect your system information to determine the appropriate installation path.
Determine the OS version:
# Check OS version
cat /etc/os-release
# Example output for Ubuntu:
# NAME="Ubuntu"
# VERSION="24.04.3 LTS (Noble Numbat)"
# ID=ubuntu
# VERSION_ID="24.04"
# Set OS version as environment variable for use in subsequent commands
export OS_VERSION=$( . /etc/os-release && echo "$VERSION_ID" | tr -d '.' )
echo "OS Version: $OS_VERSION"
# Example output for Ubuntu 24.04:
# OS Version: 2404
Determine the GPU model:
# Check GPU model
nvidia-smi | grep -i "NVIDIA" | awk '{print $3, $4}'
# Example output:
# 590.44.01 Driver
# H100 PCIe
If you see a message like Command 'nvidia-smi' not found, attempt to determine the GPU model with the command below:
# Check GPU model
lspci | grep -i "3D controller"
# Example output:
# 01:00.0 3D controller: NVIDIA Corporation GH100 [H100 SXM5 80GB] (rev a1)
Determine the CPU architecture:
# Set CPU arch as environment variable, on Ubuntu/Debian system
export CPU_ARCH=$(dpkg --print-architecture)
echo "CPU_ARCH: ${CPU_ARCH}"
# Example output:
# amd64
# Set CPU arch as environment variable, on a non-Ubuntu/Debian system
export CPU_ARCH=$(uname -m)
echo "CPU_ARCH: ${CPU_ARCH}"
# Example output:
# x86_64
Installation Instructions by Architecture#
Select the appropriate section based on your CPU architecture identified in the previous step:
Installation for amd64 / x86_64 Systems#
For systems with amd64 or x86_64 CPU architecture (H100, H200, A100, L40S, B200)
1. Find and Download Driver Package#
a. On your local machine (with browser), visit the NVIDIA Drivers download page, and observe the fields in the ‘Manual Driver Search’ dialogue box.
b. Enter your system information:
For H100, H200, A100, L40S:
| Field | Value |
|---|---|
| Product Category | Data Center / Tesla |
| Product Series | H-Series, A-Series, or L-Series |
| Product | H100, H200, A100 |
| OS | Linux 64-bit Ubuntu 24.04 |
| CUDA Toolkit Version | 13.1 |
| Language | English (US) |
For B200:
| Field | Value |
|---|---|
| Product Category | Data Center / Tesla |
| Product Series | HGX-Series |
| Product | HGX B200 |
| OS | Linux 64-bit Ubuntu 24.04 |
| CUDA Toolkit Version | 13.1 |
| Language | English (US) |
c. Click Find to locate driver version 590.44.01 or higher
d. On the results page, click View
e. On the next page, right-click the Download button and select Copy Link Address
Note: Some distributions like Ubuntu, Debian, or RHEL have distribution-specific packages (.deb, .rpm). For other distributions, use the .run installer.
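For reference, a .run installer is typically executed as follows; the filename below is illustrative and depends on the driver version you download:
# Illustrative only: run the standalone installer downloaded from the drivers page
chmod +x NVIDIA-Linux-x86_64-590.44.01.run
sudo sh NVIDIA-Linux-x86_64-590.44.01.run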
2. Direct Driver URLs#
For Ubuntu 24.04 (Noble):
# Driver 590.44.01 for H100/H200/B200/A100/L40S on x86_64 system
https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-ubuntu2404-590.44.01_1.0-1_amd64.deb
For Ubuntu 22.04 (Jammy):
# Driver 590.44.01 for H100/H200/B200/A100/L40S on x86_64 system
https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-ubuntu2204-590.44.01_1.0-1_amd64.deb
For RHEL 8/Rocky Linux 8:
# Driver 590.44.01 for H100/H200/B200/A100/L40S
https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-rhel8-590.44.01-1.0-1.x86_64.rpm
Important: Always check the NVIDIA Driver Downloads page for the latest driver version compatible with your GPU and OS.
3. Check and Purge Old Drivers (Optional but Recommended)#
# Check current driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null || echo "No driver installed"
# If you have an older driver (< 590), purge it to prevent conflicts
sudo apt-get remove --purge nvidia-* -y
sudo apt-get autoremove -y
Important: This step prevents driver library version conflicts. If you have an existing NVIDIA driver older than version 590, we recommend purging it before installing the new driver.
4. Download the Driver#
# Download driver using OS_VERSION environment variable
# For Ubuntu (automatically uses correct version: 2204, 2404, etc.)
wget https://us.download.nvidia.com/tesla/590.44.01/nvidia-driver-local-repo-ubuntu${OS_VERSION}-590.44.01_1.0-1_${CPU_ARCH}.deb
5. Install the Local Repository#
For Ubuntu/Debian:
sudo dpkg -i nvidia-driver-local-repo-ubuntu${OS_VERSION}-590.44.01_1.0-1_${CPU_ARCH}.deb
For RHEL/CentOS/Rocky Linux:
sudo rpm -i nvidia-driver-local-repo-rhel8-590.44.01-1.0-1.${CPU_ARCH}.rpm
6. Update Package Lists and Install Driver#
For Ubuntu/Debian:
# Copy the GPG key
sudo cp /var/nvidia-driver-local-repo-ubuntu${OS_VERSION}-590.44.01/nvidia-driver-local-*-keyring.gpg /usr/share/keyrings/
# Update package cache
sudo apt-get update
# Install the driver
sudo apt-get install -y cuda-drivers
For RHEL/CentOS/Rocky Linux:
# Update package cache
sudo dnf clean all
sudo dnf makecache
# Install the driver
sudo dnf install -y cuda-drivers
7. Reboot System#
sudo reboot
8. Verify Driver Installation#
After reboot, verify the driver:
nvidia-smi
Expected output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01 Driver Version: 590.44.01 CUDA Version: 13.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA H100 PCIe Off | 00001E:00:00.0 Off | 0 |
| N/A 30C P0 68W / 350W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
9. Install Docker#
Verify Docker is installed with version >=23.0.1:
docker --version
# Example output:
# Docker version 29.1.3, build f52814d
If Docker is not installed or does not meet requirements:
For Ubuntu: Follow the instructions in Install using the apt repository
For other distributions: Refer to docs.docker.com/engine/install
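As a quick option for test systems, Docker's convenience script can also be used (the apt repository method in the linked instructions is preferred for production hosts); a minimal sketch:
# Install Docker Engine using Docker's convenience script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Confirm the installed version meets the >=23.0.1 requirement
docker --version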
10. Install NVIDIA Container Toolkit#
Verify that the NVIDIA Container Toolkit is installed with version >=1.13.5:
nvidia-container-cli --version
If it is not installed, install the NVIDIA Container Toolkit and configure Docker to use it; refer to Configuring Docker.
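For reference, on Ubuntu the apt-based install usually follows the pattern below (a sketch based on the NVIDIA Container Toolkit documentation; confirm against the Configuring Docker page linked above):
# Add the NVIDIA Container Toolkit apt repository and signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install the toolkit and register the NVIDIA runtime with Docker
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker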
11. Verify the Complete Stack#
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Example output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01 Driver Version: 590.44.01 CUDA Version: 13.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA H100 ... Off | 00000000:01:00.0 Off | N/A |
| 41% 30C P8 1W / 260W | 2244MiB / 81559MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Note: For more information on enumerating multi-GPU systems, refer to the NVIDIA Container Toolkit’s GPU Enumeration Docs
Installation for arm64 / aarch64 DGX Systems#
For arm64 / aarch64 DGX Systems (e.g., DGX GB200 Compute Tray)
Note: These steps follow the NVIDIA DGX OS 7 User Guide: Installing the GPU Driver, customized for DGX GB200 Compute Tray with:
2x Grace CPUs (arm64 / aarch64)
4x Blackwell GPUs
Ubuntu 24.04
Linux kernel version 6.8.0-1044-nvidia-64k
1. Check NVIDIA Driver State#
Check the running driver version:
nvidia-smi
Example successful output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01 Driver Version: 590.44.01 CUDA Version: 13.1 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GB200 On | 00000008:01:00.0 Off | 0 |
| N/A 29C P0 130W / 1200W | 0MiB / 189471MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
If the running driver version is 590+, skip to Step 9.
If nvidia-smi fails, proceed to Step 2.
2. Confirm OS Sees NVIDIA GPUs#
sudo lshw -class display -json | jq '.[] | select(.description=="3D controller")'
Product-specific information:
sudo lshw -class system -json | jq '.[0]'
3. Verify System Requirements#
Check your Linux distribution, kernel version, and gcc version:
. /etc/os-release && echo "$PRETTY_NAME" # Linux distribution
uname -r # Kernel version
gcc --version # GCC version
Example output:
Ubuntu 24.04.2 LTS
6.8.0-1044-nvidia-64k
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Verify against Table 3: Supported Linux Distributions.
4. Update Linux Kernel Version (If Needed)#
For GB200 systems, use kernel version 6.8.0-1044-nvidia-64k or 6.8.0-1043-nvidia-64k.
If you have a different kernel version, configure grub:
# Update grub default menu entry
sudo sed --in-place=.bak \
'/^[[:space:]]*GRUB_DEFAULT=/c\GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-1044-nvidia-64k"' \
/etc/default/grub
# Verify update
cat /etc/default/grub
# Update grub and reboot
sudo update-grub
sudo reboot
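After the reboot, confirm the running kernel matches the target version:
# Verify the running kernel
uname -r
# Expected output:
# 6.8.0-1044-nvidia-64k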
5. Remove NVIDIA Libraries to Avoid Conflicts#
Check for existing NVIDIA libraries:
ls /usr/lib/aarch64-linux-gnu/ | grep -i nvidia
If not empty, remove them:
sudo apt remove --autoremove --purge -Vy \
cuda-compat\* \
cuda-drivers\* \
libnvidia-cfg1\* \
libnvidia-compute\* \
libnvidia-decode\* \
libnvidia-encode\* \
libnvidia-extra\* \
libnvidia-fbc1\* \
libnvidia-gl\* \
libnvidia-gpucomp\* \
libnvidia-nscq\* \
libnvsdm\* \
libxnvctrl\* \
nvidia-dkms\* \
nvidia-driver\* \
nvidia-fabricmanager\* \
nvidia-firmware\* \
nvidia-headless\* \
nvidia-imex\* \
nvidia-kernel\* \
nvidia-modprobe\* \
nvidia-open\* \
nvidia-persistenced\* \
nvidia-settings\* \
nvidia-xconfig\* \
xserver-xorg-video-nvidia\*
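To confirm the removal succeeded, re-run the check from the start of this step; it should now produce no output:
# Should print nothing once the old libraries are removed
ls /usr/lib/aarch64-linux-gnu/ | grep -i nvidia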
6. Download Package Repositories and Install DGX Tools#
Follow Installing DGX System Configurations and Tools:
a. Download and unpack ARM64-specific packages:
curl https://repo.download.nvidia.com/baseos/ubuntu/noble/arm64/dgx-repo-files.tgz | sudo tar xzf - -C /
b. Update local APT database:
sudo apt update
c. Install DGX system tools:
sudo apt install -y nvidia-system-core
sudo apt install -y nvidia-system-utils
sudo apt install -y nvidia-system-extra
d. Install linux-tools for your kernel:
sudo apt install -y linux-tools-nvidia-64k
e. Install NVIDIA peermem loader:
sudo apt install -y nvidia-peermem-loader
7. Install GPU Driver#
Follow Installing the GPU Driver:
a. Pin the driver version:
sudo apt install nvidia-driver-pinning-590
b. Install the open GPU kernel module:
sudo apt install --allow-downgrades \
nvidia-driver-590-open \
libnvidia-nscq \
nvidia-modprobe \
nvidia-imex \
datacenter-gpu-manager-4-cuda13 \
nv-persistence-mode
c. Enable the persistence daemon:
sudo systemctl enable nvidia-persistenced nvidia-dcgm nvidia-imex
d. Reboot:
sudo reboot
8. Verify Driver Installation#
After reboot, repeat Step 1 to check NVIDIA Driver.
9. Install Docker and NVIDIA Container Toolkit#
Follow Installing Docker and the NVIDIA Container Toolkit.
Verify the stack:
sudo docker run --rm --gpus=all nvcr.io/nvidia/cuda:12.6.2-base-ubuntu24.04 nvidia-smi
10. Enable Docker for Non-Root User (Optional)#
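A minimal sketch following Docker's standard post-installation steps (see the Docker Engine post-install documentation for details):
# Add your user to the docker group so docker can run without sudo
sudo groupadd docker          # the group may already exist
sudo usermod -aG docker "$USER"
newgrp docker                 # or log out and back in
docker run --rm hello-world   # should now run without sudo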
11. Verify Complete Stack#
a. Log into NGC:
docker login nvcr.io --username '$oauthtoken'
b. Run verification:
sudo docker run --rm --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
nvcr.io/nvidia/pytorch:25.12-py3 \
python -c \
"import torch, pynvml;
pynvml.nvmlInit();
print('Driver:', pynvml.nvmlSystemGetDriverVersion());
print('CUDA:', torch.version.cuda);
print('GPU count:', torch.cuda.device_count())"
Expected output:
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 13.1 driver version 590.44.01 with kernel driver version 590.44.01.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
Driver: 590.44.01
CUDA: 13.1
GPU count: 4
Running on Slurm With Enroot#
This section describes how to run the OpenFold3 NIM on a Slurm-based HPC cluster using Enroot to run the NGC container.
Environment#
Enroot: Version 3.4.1 or newer is supported.
1. Check Enroot is Available#
On the login node, verify Enroot is installed:
enroot version
which enroot
If Enroot is not installed, contact your cluster administrator. Typical path: /usr/bin/enroot.
Create the Enroot config directory if it does not exist:
mkdir -p ~/.config/enroot
2. Create and Add NGC Credentials#
Create the credentials file so Enroot can pull images from NGC. Replace YOUR_NGC_API_KEY with your NGC API key:
echo 'machine nvcr.io login $oauthtoken password YOUR_NGC_API_KEY' > ~/.config/enroot/.credentials
Restrict permissions and confirm the file content:
chmod 600 ~/.config/enroot/.credentials
cat ~/.config/enroot/.credentials
Note: Use vi ~/.config/enroot/.credentials or another editor if you prefer not to echo the key on the command line.
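Optionally, confirm the credentials work by importing a small public image; this sketch reuses the CUDA base image referenced in the arm64 verification step earlier in this guide:
# Import a small image to verify NGC credentials and Enroot configuration
enroot import -o /tmp/cuda-base-test.sqsh 'docker://nvcr.io#nvidia/cuda:12.6.2-base-ubuntu24.04'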
3. Configure and Run the Slurm Job Script#
Copy the script below to a file, such as run_openfold3_slurm.sh. Set the variables at the top for your cluster and paths:
ACCOUNT: Slurm account name
PARTITION: Slurm partition (for example, interactive or gpu)
ROOT_DIR: Your working directory on the cluster (for example, a Lustre or NFS path)
NGC_IMAGE: NGC image with tag; use # between registry and image path for Enroot (refer to the example script below)
NGC_API_KEY: Export this in your environment before running, or ensure it is available inside the job for enroot import
Use a shared filesystem path for ROOT_DIR and CACHE_PATH so that the SQSH image and cache are visible from the node where the job runs. If /tmp has strict quotas, use a node-local or large-quota path (for example, Lustre scratch) for TMP_PATH.
Save the following as run_openfold3_slurm.sh and run it with bash run_openfold3_slurm.sh to avoid copy-paste or quoting issues:
#!/bin/bash
# --- Slurm job configuration (customize for your cluster) ---
ACCOUNT="your_slurm_account"
PARTITION="interactive"
GPUS_PER_NODE=1
MEMORY="32G"
TIME="04:00:00"
# --- Paths (customize for your cluster) ---
# Root directory on shared storage (e.g. Lustre/NFS)
ROOT_DIR="/path/to/your/workspace"
# NIM cache on shared storage
CACHE_PATH="${ROOT_DIR}/.nim_cache"
WORKING_CACHE_DIR="/opt/nim/.cache"
# Project/data mount inside container
WORKING_PATH="${ROOT_DIR}"
MOUNT_PATH="/workspace/data"
# Temp directory (use shared storage if node /tmp has quota limits)
TMP_PATH="${ROOT_DIR}/tmp"
MOUNT_TMP_PATH="${MOUNT_PATH}/tmp"
# --- NGC image (use # between registry and image path for Enroot) ---
NGC_IMAGE="docker://nvcr.io#nim/openfold/openfold3:1.4.0"
CONTAINER_NAME="openfold3"
ACTIVE_SQSH_PATH="${ROOT_DIR}/openfold3.sqsh"
# --- Helpers ---
handle_error() {
echo "Error: $1"
exit 1
}
mkdir -p "$(dirname "${ACTIVE_SQSH_PATH}")" || handle_error "Failed to create directory for SQSH file"
mkdir -p "${CACHE_PATH}" || handle_error "Failed to create cache directory"
mkdir -p "${TMP_PATH}" || handle_error "Failed to create tmp directory"
echo "Step 1: Requesting interactive resources..."
srun --account="${ACCOUNT}" \
--partition="${PARTITION}" \
--gpus-per-node="${GPUS_PER_NODE}" \
--mem="${MEMORY}" \
--time="${TIME}" \
--export=ALL \
-o /dev/tty -e /dev/tty \
bash -c "
echo \"Loading environment...\"
# Step 2: Import Docker image (if not present)
echo \"Step 2: Importing Docker image\"
if [ ! -f \"${ACTIVE_SQSH_PATH}\" ]; then
echo \"Importing from NGC...\"
mkdir -p \"\$(dirname \"${ACTIVE_SQSH_PATH}\")\" || { echo \"Failed to create directory\"; exit 1; }
enroot import -o \"${ACTIVE_SQSH_PATH}\" ${NGC_IMAGE} || { echo \"Failed to import Docker image\"; exit 1; }
else
echo \"Docker image already exists, skipping import.\"
fi
# Step 3: Create Enroot container
echo \"Step 3: Creating Enroot container...\"
if enroot list 2>/dev/null | grep -q \"${CONTAINER_NAME}\"; then
echo \"Removing existing container...\"
enroot remove -f ${CONTAINER_NAME} || true
fi
enroot create --name ${CONTAINER_NAME} \"${ACTIVE_SQSH_PATH}\" || { echo \"Failed to create container\"; exit 1; }
# Step 4: Start container and NIM server
NODE=\$(hostname)
echo \"\$NODE\" > \"${WORKING_PATH}/.openfold3_node\"
echo \"Step 4: OpenFold3 container ready.\"
echo \"=====================================\"
echo \"Working directory in container: ${MOUNT_PATH}\"
echo \"\"
echo \"From another terminal (login node or your machine), run:\"
echo \" ssh -L 8000:localhost:8000 \$NODE\"
echo \"Then keep that SSH session open and call the API (see Step 4 in docs).\"
echo \"Node name saved to: ${WORKING_PATH}/.openfold3_node\"
echo \"Type exit to leave the container.\"
echo \"=====================================\"
cat > \"\${TMPDIR:-/tmp}/rc.local\" << RCEOF
#!/bin/sh
export TMPDIR=${MOUNT_TMP_PATH}
export TEMP=${MOUNT_TMP_PATH}
export TMP=${MOUNT_TMP_PATH}
export HOME=${MOUNT_PATH}
export XDG_CACHE_HOME=${WORKING_CACHE_DIR}
export XDG_DATA_HOME=${MOUNT_PATH}/.local/share
mkdir -p ${MOUNT_TMP_PATH} ${MOUNT_PATH}/.local/share
/opt/nim/start_server.sh &
exec /bin/bash
RCEOF
chmod +x \"\${TMPDIR:-/tmp}/rc.local\"
enroot start \\
--mount ${CACHE_PATH}:${WORKING_CACHE_DIR} \\
--mount ${WORKING_PATH}:${MOUNT_PATH} \\
--mount \"\${TMPDIR:-/tmp}/rc.local:/etc/rc.local\" \\
-e NGC_API_KEY \\
-e TMPDIR=${MOUNT_TMP_PATH} \\
-e TEMP=${MOUNT_TMP_PATH} \\
-e TMP=${MOUNT_TMP_PATH} \\
-e HOME=${MOUNT_PATH} \\
-e XDG_CACHE_HOME=${WORKING_CACHE_DIR} \\
-e XDG_DATA_HOME=${MOUNT_PATH}/.local/share \\
${CONTAINER_NAME} || { echo \"Failed to start container\"; exit 1; }
echo \"=====================================\"
echo \"Workflow completed.\"
"
Submit the job (interactive):
bash run_openfold3_slurm.sh
Or, for a batch job, wrap the same script body in an sbatch script and set ACCOUNT, PARTITION, and other Slurm directives as needed.
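As a rough illustration (account, partition, and resource values are placeholders), the batch wrapper could start like this:
#!/bin/bash
#SBATCH --account=your_slurm_account
#SBATCH --partition=gpu
#SBATCH --gpus-per-node=1
#SBATCH --mem=32G
#SBATCH --time=04:00:00
# Paste the script body from run_openfold3_slurm.sh here, replacing the srun
# wrapper with the commands it runs (the batch job already holds the allocation).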
4. Call the API From Your Machine#
After the container is running on a compute node, the script prints the node name, such as batch-block1-2106, and saves it to ROOT_DIR/.openfold3_node.
From your laptop or the login node, create an SSH tunnel to that node:
ssh -L 8000:localhost:8000 $(cat /path/to/your/workspace/.openfold3_node)
Or use the node name directly if you know it:
ssh -L 8000:localhost:8000 <compute-node-name>
In another terminal (with the tunnel still open), run a test request:
curl -s -X POST "http://localhost:8000/biology/openfold/openfold3/predict" \
  -H "Content-Type: application/json" \
  -d '{"inputs":[{"input_id":"my_first_prediction","molecules":[{"type":"protein","sequence":"MKTVRQERLKSIVR","msa":{"main":{"a3m":{"alignment":">query\nMKTVRQERLKSIVR","format":"a3m"}}}}],"output_format":"pdb"}]}' \
  --max-time 300
If the request succeeds, you will get a JSON response containing the prediction result.
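To inspect the result without assuming its exact schema, save the response to a file and pretty-print it; request.json below is a hypothetical file containing the JSON body from the curl example above:
# Save the response for inspection; request.json holds the JSON body shown above
curl -s -X POST "http://localhost:8000/biology/openfold/openfold3/predict" \
  -H "Content-Type: application/json" \
  -d @request.json \
  --max-time 300 \
  -o response.json
# Pretty-print; the exact fields depend on the NIM response schema
python3 -m json.tool response.json | head -n 40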
Troubleshooting#
Common Issues#
Driver version mismatch: If nvidia-smi shows an older driver version, ensure you’ve rebooted after installation.
CUDA version mismatch: The driver must support CUDA 13.1 or higher. Check the CUDA version in the nvidia-smi output. If your system shows CUDA 12.x or CUDA 13.0, you need to install driver 590.44.01 or higher.
To verify CUDA compatibility:
Check the current driver: nvidia-smi
Verify that the CUDA version shown is 13.1 or higher
If not, refer to NVIDIA CUDA Compatibility
Secure Boot (amd64 systems): If you have Secure Boot enabled, you may need to sign the NVIDIA kernel modules or disable Secure Boot in your BIOS.
Library version conflicts: If you encounter library version conflicts, ensure all old NVIDIA packages are removed before installing the new driver.
Architecture-Specific Troubleshooting#
For amd64 / x86_64 Systems:
Previous driver versions:
# Remove old drivers
sudo apt-get remove --purge nvidia-*
sudo apt-get autoremove
# Verify removal
ls /usr/lib/x86_64-linux-gnu/ | grep -i nvidia
Package conflicts:
# Clean package cache
sudo apt-get clean
sudo apt-get update
# Try installation again
sudo apt-get install -y cuda-drivers
For arm64 / aarch64 DGX Systems:
Kernel version issues:
# Check current kernel
uname -r
# List available kernels
dpkg --list | grep linux-image
# Configure grub to use correct kernel (see Step 4 in installation)
DGX-specific issues:
# Check DGX system status
sudo nvidia-bug-report.sh
# Verify fabricmanager (if using NVSwitch)
systemctl status nvidia-fabricmanager
# Check NVIDIA services
systemctl status nvidia-persistenced
systemctl status nvidia-dcgm
Build errors for older kernel:
Ignore build errors for modules built for 6.14.0-1015-nvidia-64k
These errors are expected and do not affect functionality
Getting Additional Help#
If you continue to experience issues:
Check NVIDIA driver logs: dmesg | grep -i nvidia
Review Docker logs: sudo journalctl -u docker.service
Consult the NVIDIA Driver Installation Guide
For DGX systems: NVIDIA DGX OS 7 User Guide