Prerequisites#

Before you begin using the OpenFold3 NIM, ensure that the requirements described on this page are met.

Installing NVIDIA Drivers#

You can install NVIDIA drivers either interactively on a local machine with a browser, or from the command line on a remote machine over SSH.

Option 1: Interactive Installation (Local Machine)#

Visit the NVIDIA Drivers download page and use the dropdown menus to select your GPU and operating system, then download the appropriate driver.
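
If you download a .run installer this way, it is typically installed by running it with root privileges. A minimal sketch, using the 580.95.05 filename from the examples later on this page as a placeholder for the file you actually downloaded:

# Run the self-contained installer (filename is an example; use your downloaded file)
sudo sh NVIDIA-Linux-x86_64-580.95.05.run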

Option 2: Command Line Installation (Remote Machine/SSH)#

If you’re working on a remote machine over SSH, you can download and install the driver using command line tools:

  1. Determine your GPU model and OS version:

# Check GPU model
lspci | grep -i nvidia

# Example output:
# 00:1e.0 3D controller: NVIDIA Corporation GH100 [H100 PCIe] (rev a1)

# Check OS version
cat /etc/os-release

# Example output for Ubuntu:
# NAME="Ubuntu"
# VERSION="22.04.3 LTS (Jammy Jellyfish)"
# ID=ubuntu
# VERSION_ID="22.04"
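
Optionally, before installing a new driver, you can check whether one is already loaded on the machine; a quick sketch:

# Check for an already-loaded NVIDIA kernel module (no output means none is loaded)
lsmod | grep -i nvidia

# If a driver is already installed, this file reports its version
cat /proc/driver/nvidia/version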

  2. Find the driver 580+ download link:

    a. On your local machine (with browser), visit the NVIDIA Drivers download page

    b. Select your GPU information:

    • Product Type: Tesla (for datacenter GPUs like H100, A100) or GeForce (for consumer GPUs)

    • Product Series: H100, A100, or your specific GPU series

    • Operating System: Linux 64-bit

    • Download Type: Production Branch

    • Language: English (US)

    c. Click Search to find driver version 580 or higher

    d. On the results page, right-click the Download button and select Copy Link Address

    e. The link will look like:

    https://us.download.nvidia.com/tesla/580.95.05/NVIDIA-Linux-x86_64-580.95.05.run
    

    or for repository installation:

    https://us.download.nvidia.com/tesla/580.95.05/nvidia-driver-local-repo-ubuntu2404-580.95.05_1.0-1_amd64.deb
    

Note

For Ubuntu/Debian, use the .deb package. For RHEL/CentOS, use the .rpm package. For other distributions, use the .run installer.

  3. Direct driver URLs for common configurations:

    For Ubuntu 24.04 (Noble):

    # H100/A100 driver 580.95.05
    https://us.download.nvidia.com/tesla/580.95.05/nvidia-driver-local-repo-ubuntu2404-580.95.05_1.0-1_amd64.deb
    

    For Ubuntu 22.04 (Jammy):

    # H100/A100 driver 580.95.05
    https://us.download.nvidia.com/tesla/580.95.05/nvidia-driver-local-repo-ubuntu2204-580.95.05_1.0-1_amd64.deb
    

    For RHEL 8/Rocky Linux 8:

    # H100/A100 driver 580.95.05
    https://us.download.nvidia.com/tesla/580.95.05/nvidia-driver-local-repo-rhel8-580.95.05-1.0-1.x86_64.rpm
    

Important

Always check the NVIDIA Driver Downloads page for the latest driver version compatible with your GPU and OS.

Note

The following commands are for Ubuntu 24.04. If you’re using Ubuntu 22.04 or other versions, replace ubuntu2404 in the URLs and paths with your version (e.g., ubuntu2204 for Ubuntu 22.04).

  4. Use wget to download the driver on your remote machine:

# Example for Ubuntu 24.04 with driver version 580.95.05
wget https://us.download.nvidia.com/tesla/580.95.05/nvidia-driver-local-repo-ubuntu2404-580.95.05_1.0-1_amd64.deb

Note

Replace the URL with the appropriate driver version and distribution for your system. Use the URL you copied from step 2 or select from the common configurations listed above.
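
Before installing, you can sanity-check the downloaded package; a sketch for the Ubuntu 24.04 example above (substitute your own filename):

# Confirm the package downloaded completely and inspect its metadata
ls -lh nvidia-driver-local-repo-ubuntu2404-580.95.05_1.0-1_amd64.deb
dpkg -I nvidia-driver-local-repo-ubuntu2404-580.95.05_1.0-1_amd64.deb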

  5. Install the local repository using dpkg (for Ubuntu/Debian):

sudo dpkg -i nvidia-driver-local-repo-ubuntu2404-580.95.05_1.0-1_amd64.deb

For RHEL/CentOS/Rocky Linux:

sudo rpm -i nvidia-driver-local-repo-rhel8-580.95.05-1.0-1.x86_64.rpm

  6. Update package lists and install the driver:

For Ubuntu/Debian:

# Copy the GPG key
sudo cp /var/nvidia-driver-local-repo-ubuntu2404-580.95.05/nvidia-driver-local-*-keyring.gpg /usr/share/keyrings/

# Update package cache
sudo apt-get update

# Install the driver
sudo apt-get install -y cuda-drivers

For RHEL/CentOS/Rocky Linux:

# Update package cache
sudo dnf clean all
sudo dnf makecache

# Install the driver
sudo dnf install -y cuda-drivers
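
If you want to confirm which driver version the cuda-drivers metapackage resolves to from the local repository, you can query the package manager; a sketch:

# Ubuntu/Debian: show the candidate version for cuda-drivers
apt-cache policy cuda-drivers

# RHEL/CentOS/Rocky Linux: show the available cuda-drivers package
dnf info cuda-drivers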

  7. After installation, reboot to load the new driver:

sudo reboot

  8. After reboot, verify the driver is installed correctly:

nvidia-smi

You should see output showing your GPU(s) and driver version 580 or higher with CUDA version 13.0 or higher.

Example expected output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05    Driver Version: 580.95.05    CUDA Version: 13.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA H100 PCIe    Off  | 00001E:00:00.0   Off |                    0 |
| N/A   30C    P0    68W / 350W |      0MiB / 81559MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
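
For scripted checks, nvidia-smi can also report just the fields of interest; for example:

# Print GPU name, driver version, and total memory in CSV form
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv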

Troubleshooting Driver Installation#

Driver version mismatch: If nvidia-smi shows an older driver version, ensure you’ve rebooted after installation.

CUDA version: The driver must support CUDA 13.0 or higher. Check the CUDA version in the nvidia-smi output.

Secure Boot: If you have Secure Boot enabled, you may need to sign the NVIDIA kernel modules or disable Secure Boot in your BIOS.
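
If you are unsure whether Secure Boot is enabled, you can check from the command line; a sketch (assumes the mokutil utility is installed):

# Reports "SecureBoot enabled" or "SecureBoot disabled"
mokutil --sb-state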

Previous driver versions: If you have older NVIDIA drivers installed, you may need to remove them first:

sudo apt-get remove --purge nvidia-*
sudo apt-get autoremove
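
After removing old packages, you can confirm that no NVIDIA driver packages remain before installing the new driver; for example:

# No output means no NVIDIA driver packages are still installed
dpkg -l | grep -i nvidia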

Verifying GPU Access#

Verify your container runtime supports NVIDIA GPUs by running:

docker run --rm --gpus all ubuntu nvidia-smi

Example output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.07    Driver Version: 580.82.07    CUDA Version: 13.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 41%   30C    P8     1W / 260W |   2244MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Note

For more information on enumerating multi-GPU systems, refer to the NVIDIA Container Toolkit's GPU Enumeration Docs.
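
If the system has multiple GPUs, the same check can be limited to specific devices using Docker's --gpus syntax; a sketch:

# Expose only GPU 0 to the container
docker run --rm --gpus '"device=0"' ubuntu nvidia-smi

# Expose exactly two GPUs (any two)
docker run --rm --gpus 2 ubuntu nvidia-smi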

NGC (NVIDIA GPU Cloud) Account#

  1. Create an account on NGC

  2. Generate an API Key

  3. Log in to Docker with your NGC API key using docker login nvcr.io --username='$oauthtoken' --password=${NGC_API_KEY}
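
For non-interactive setups (for example, CI or remote provisioning), the same login can be scripted by piping the key to docker login; a sketch, assuming NGC_API_KEY is already exported:

# Log in to nvcr.io without an interactive password prompt
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin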

NGC CLI Tool#

  1. Download the NGC CLI tool for your OS.

Important

Use NGC CLI version 3.41.1 or newer. The following command installs the CLI into your home directory on AMD64 Linux:

wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.41.3/files/ngccli_linux.zip -O ~/ngccli_linux.zip && \
unzip ~/ngccli_linux.zip -d ~/ngc && \
chmod u+x ~/ngc/ngc-cli/ngc && \
echo "export PATH=\"\$PATH:~/ngc/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile

  2. Set up your NGC CLI Tool locally (you'll need your API key for this):

ngc config set

Note

After you enter your API key, you may see multiple options for the org and team. Select as desired or hit enter to accept the default.
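
If you want to double-check the resulting org, team, and output format, recent NGC CLI versions can print the active configuration (treat the exact subcommand as an assumption; check ngc config --help if it differs):

# Show the currently active NGC CLI configuration
ngc config current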

  3. Log in to NGC

You'll need to log in to NGC via Docker and set the NGC_API_KEY environment variable to pull images:

docker login nvcr.io
Username: $oauthtoken
Password: <Enter your NGC key here>

Then, set the NGC_API_KEY environment variable in your shell:

export NGC_API_KEY=<Enter your NGC key here>
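
If you want NGC_API_KEY to be set in new shells as well, you can append the export to the profile file used during the CLI install; a sketch (note that this stores the key in plain text, so protect the file accordingly):

# Persist the key for future sessions (plain-text storage; restrict access to this file)
echo "export NGC_API_KEY=<Enter your NGC key here>" >> ~/.bash_profile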

  4. Set up your NIM cache

The NIM needs a directory on your system, called the NIM cache, where it can:

  • (a) download the model artifact (checkpoints and TRT engines)

  • (b) read the model artifact if it has been previously downloaded

The NIM cache directory must:

  • (i) reside on a disk with at least 15GB of storage

  • (ii) have permissions that allow the NIM to read, write, and execute

If your home directory (~) is on a disk with enough storage, you can set up the NIM cache directory as follows:

## Create the NIM cache directory in a location with sufficient storage
mkdir -p ~/.cache/nim

## Set the NIM cache directory permissions so that all users (a) can read, write, and execute (rwx)
sudo chmod -R a+rwx ~/.cache/nim
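
To confirm that the chosen location meets the 15GB storage requirement, check the free space on the underlying disk; for example:

## Check available space on the disk backing the NIM cache (the Avail column should show at least 15G)
df -h ~/.cache/nim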

Now, you should be able to pull the container and download the model using the environment variables. To get started, see the quickstart guide.