Prerequisites#
Begin with a Docker-supported operating system
Install Docker - minimum version: 23.0.1
Install NVIDIA Drivers - minimum version: 590.44.01
Install the NVIDIA Container Toolkit - minimum version: 1.13.5
Verify that your container runtime supports NVIDIA GPUs by running:
docker run --rm --gpus all ubuntu nvidia-smi
Example output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01    Driver Version: 590.44.01    CUDA Version: 13.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 41%   30C    P8     1W / 260W |   2244MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Note
For more information on enumerating multi-GPU systems, please see the NVIDIA Container Toolkit’s GPU Enumeration Docs
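If you want to script these minimum-version checks, dotted version strings can be compared with GNU `sort -V`. The following is a sketch; the example version strings are placeholders, so substitute the actual output of `docker --version`, `nvidia-smi`, or `nvidia-ctk --version`:

```shell
# Returns success if version $1 >= version $2
# (GNU sort -V orders dotted version strings numerically)
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | tail -n1)" = "$1" ]
}

# Example: compare an installed driver version against the 590.44.01 minimum
if version_ge "590.44.01" "590.44.01"; then
  echo "driver version OK"
fi
```

The same helper works for the Docker (23.0.1) and NVIDIA Container Toolkit (1.13.5) minimums.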
NGC (NVIDIA GPU Cloud) Account#
Log in to Docker with your NGC API key using
docker login nvcr.io --username='$oauthtoken' --password=${NGC_API_KEY}
NGC CLI Tool#
Download the NGC CLI tool for your OS from https://org.ngc.nvidia.com/setup/installers/cli.
Important
Use NGC CLI version 3.41.1 or newer. The following command installs version 3.41.3 on AMD64 Linux in your home directory:
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.41.3/files/ngccli_linux.zip -O ~/ngccli_linux.zip && \
unzip ~/ngccli_linux.zip -d ~/ngc && \
chmod u+x ~/ngc/ngc-cli/ngc && \
echo "export PATH=\"\$PATH:~/ngc/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile
Set up your NGC CLI Tool locally (You’ll need your API key for this!):
ngc config set
Note
After you enter your API key, you may see multiple options for the org and team. Select as desired or hit enter to accept the default.
Log in to NGC
You’ll need to log in to NGC via Docker and set the NGC_API_KEY environment variable to pull images:
docker login nvcr.io
Username: $oauthtoken
Password: <Enter your NGC key here>
Then set the NGC_API_KEY environment variable in your shell:
export NGC_API_KEY=<Enter your NGC key here>
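If the key is unset, later pull steps fail with unhelpful authentication errors, so a guard can catch this early. The following is a sketch; `require_env` is a hypothetical helper, not part of any NVIDIA tooling:

```shell
# Hypothetical helper: fail fast if any of the named environment
# variables are unset or empty
require_env() {
  for v in "$@"; do
    if eval "[ -z \"\${$v:-}\" ]"; then
      echo "Missing required variable: $v" >&2
      return 1
    fi
  done
}

# Example with a placeholder variable
EXAMPLE_KEY=demo
require_env EXAMPLE_KEY && echo "environment OK"
```

Run `require_env NGC_API_KEY || exit 1` at the top of your own scripts before pulling images.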
Set up your NIM cache#
The NIM cache stores downloaded models so that you don’t need to download them again the next time you run the NIM. The cache must be readable and writable by the NIM, so in addition to creating the directory, set its permissions to be globally readable and writable. Set up the NIM cache directory as follows:
# Create the NIM cache directory
mkdir -p ~/.cache/nim

# Set the NIM cache directory permissions to the correct values
chmod -R 777 ~/.cache/nim
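To confirm the permissions took effect, you can check the directory mode with `stat`. This is a quick sketch; `stat -c` is GNU coreutils syntax, so it is Linux-specific:

```shell
# Ensure the cache directory exists, then confirm its mode is 777
mkdir -p ~/.cache/nim
chmod -R 777 ~/.cache/nim
mode=$(stat -c '%a' ~/.cache/nim)
echo "NIM cache mode: $mode"
```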
Now, you should be able to pull the container and download the model using the environment variables. To get started, see the quickstart guide.
Running on Slurm With Enroot#
This section describes how to run the OpenFold2 NIM on a Slurm-based HPC cluster using Enroot to run the NGC container.
Important
NGC_API_KEY: You must export your NGC API key before running the Slurm script so that Enroot can pull the image from NGC. Then run the script (for example, bash openfold2_srun.sh).
Example:
export NGC_API_KEY=<your_NGC_API_key>
bash openfold2_srun.sh
Environment#
Enroot: Version 3.4.1 or newer is supported.
1. Check Enroot is Available#
On the login node, verify Enroot is installed:
enroot version
which enroot
If Enroot is not installed, contact your cluster administrator. Typical path: /usr/bin/enroot.
Create the Enroot config directory if it does not exist:
mkdir -p ~/.config/enroot
2. Create and Add NGC Credentials#
Create the credentials file so Enroot can pull images from NGC. Replace YOUR_NGC_API_KEY with your NGC API key:
echo 'machine nvcr.io login $oauthtoken password YOUR_NGC_API_KEY' > ~/.config/enroot/.credentials
Restrict permissions and confirm the file content:
chmod 600 ~/.config/enroot/.credentials
cat ~/.config/enroot/.credentials
Note: Use vi ~/.config/enroot/.credentials or another editor if you prefer not to echo the key on the command line.
3. Configure and Run the Slurm Job Script#
Copy the script below to a file, such as openfold2_srun.sh. Set the variables at the top for your cluster and paths:
ACCOUNT: Slurm account name
PARTITION: Slurm partition (for example, interactive or gpu)
ROOT_DIR: Your working directory on the cluster (for example, a Lustre or NFS path)
NGC_IMAGE: NGC image with tag; use # between the registry and image path for Enroot (see example below)
NGC_API_KEY: Must be exported in your environment before running the script (for example, export NGC_API_KEY=<your_key>)
Use a shared filesystem path for ROOT_DIR and CACHE_PATH so that the SQSH image and cache are visible from the node where the job runs. Use a node-local or large-quota path for TMP_PATH if /tmp has strict quotas (for example, Lustre scratch).
Save the following as openfold2_srun.sh and run it with bash openfold2_srun.sh:
#!/bin/bash
# --- Slurm job configuration (customize for your cluster) ---
ACCOUNT="your_slurm_account"
PARTITION="interactive"
GPUS_PER_NODE=1
MEMORY="32G"
TIME="04:00:00"
# --- Paths (customize for your cluster) ---
# Root directory on shared storage (for example, Lustre/NFS)
ROOT_DIR="/path/to/your/workspace"
# NIM cache on shared storage
CACHE_PATH="${ROOT_DIR}/.nim_cache"
WORKING_CACHE_DIR="/opt/nim/.cache"
# Project/data mount inside container
WORKING_PATH="${ROOT_DIR}"
MOUNT_PATH="/workspace/data"
# Temp directory (use shared storage if node /tmp has quota limits)
TMP_PATH="${ROOT_DIR}/tmp"
MOUNT_TMP_PATH="${MOUNT_PATH}/tmp"
# --- NGC image (use # between registry and image path for Enroot) ---
NGC_IMAGE="docker://nvcr.io#nim/openfold/openfold2:2.4.0"
CONTAINER_NAME="openfold2"
ACTIVE_SQSH_PATH="${ROOT_DIR}/openfold2.sqsh"
# --- Helpers ---
handle_error() {
echo "Error: $1"
exit 1
}
mkdir -p "$(dirname "${ACTIVE_SQSH_PATH}")" || handle_error "Failed to create directory for SQSH file"
mkdir -p "${CACHE_PATH}" || handle_error "Failed to create cache directory"
mkdir -p "${TMP_PATH}" || handle_error "Failed to create tmp directory"
echo "Step 1: Requesting interactive resources..."
srun --account="${ACCOUNT}" \
--partition="${PARTITION}" \
--gpus-per-node="${GPUS_PER_NODE}" \
--mem="${MEMORY}" \
--time="${TIME}" \
--export=ALL \
-o /dev/tty -e /dev/tty \
bash -c "
echo \"Loading environment...\"
# Step 2: Import Docker image (if not present)
echo \"Step 2: Importing Docker image\"
if [ ! -f \"${ACTIVE_SQSH_PATH}\" ]; then
echo \"Importing from NGC...\"
mkdir -p \"\$(dirname \"${ACTIVE_SQSH_PATH}\")\" || { echo \"Failed to create directory\"; exit 1; }
enroot import -o \"${ACTIVE_SQSH_PATH}\" ${NGC_IMAGE} || { echo \"Failed to import Docker image\"; exit 1; }
else
echo \"Docker image already exists, skipping import.\"
fi
# Step 3: Create Enroot container
echo \"Step 3: Creating Enroot container...\"
if enroot list 2>/dev/null | grep -q \"${CONTAINER_NAME}\"; then
echo \"Removing existing container...\"
enroot remove -f ${CONTAINER_NAME} || true
fi
enroot create --name ${CONTAINER_NAME} \"${ACTIVE_SQSH_PATH}\" || { echo \"Failed to create container\"; exit 1; }
# Step 4: Start container and NIM server
NODE=\$(hostname)
echo \"\$NODE\" > \"${WORKING_PATH}/.openfold2_node\"
echo \"Step 4: OpenFold2 container ready.\"
echo \"=====================================\"
echo \"Working directory in container: ${MOUNT_PATH}\"
echo \"\"
echo \"From another terminal (login node or your machine), run:\"
echo \" ssh -L 8000:localhost:8000 \$NODE\"
echo \"Then keep that SSH session open and call the API (see Step 4 in docs).\"
echo \"Node name saved to: ${WORKING_PATH}/.openfold2_node\"
echo \"Type exit to leave the container.\"
echo \"=====================================\"
cat > \"\${TMPDIR:-/tmp}/rc.local\" << RCEOF
#!/bin/sh
export TMPDIR=${MOUNT_TMP_PATH}
export TEMP=${MOUNT_TMP_PATH}
export TMP=${MOUNT_TMP_PATH}
export HOME=${MOUNT_PATH}
export XDG_CACHE_HOME=${WORKING_CACHE_DIR}
export XDG_DATA_HOME=${MOUNT_PATH}/.local/share
mkdir -p ${MOUNT_TMP_PATH} ${MOUNT_PATH}/.local/share
/opt/nim/start_server.sh &
exec /bin/bash
RCEOF
chmod +x \"\${TMPDIR:-/tmp}/rc.local\"
enroot start \\
--mount ${CACHE_PATH}:${WORKING_CACHE_DIR} \\
--mount ${WORKING_PATH}:${MOUNT_PATH} \\
--mount \"\${TMPDIR:-/tmp}/rc.local:/etc/rc.local\" \\
-e NGC_API_KEY \\
-e TMPDIR=${MOUNT_TMP_PATH} \\
-e TEMP=${MOUNT_TMP_PATH} \\
-e TMP=${MOUNT_TMP_PATH} \\
-e HOME=${MOUNT_PATH} \\
-e XDG_CACHE_HOME=${WORKING_CACHE_DIR} \\
-e XDG_DATA_HOME=${MOUNT_PATH}/.local/share \\
${CONTAINER_NAME} || { echo \"Failed to start container\"; exit 1; }
echo \"=====================================\"
echo \"Workflow completed.\"
"
Submit the job (interactive):
export NGC_API_KEY=<your_NGC_API_key>
bash openfold2_srun.sh
Alternatively, for a batch job, wrap the same script body in an sbatch script and set ACCOUNT, PARTITION, and other Slurm directives as needed. Ensure NGC_API_KEY is exported or passed into the job environment.
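As a starting point, a batch wrapper might look like the following. This is a sketch: the account, partition, and resource values are placeholders, and interactive-only options in the script body (such as `-o /dev/tty -e /dev/tty`) should be removed for batch use:

```shell
#!/bin/bash
#SBATCH --account=your_slurm_account
#SBATCH --partition=gpu
#SBATCH --gpus-per-node=1
#SBATCH --mem=32G
#SBATCH --time=04:00:00

# NGC_API_KEY must be present in the submission environment;
# sbatch propagates the environment to the job by default.
bash openfold2_srun.sh
```

Submit it with `sbatch <wrapper-file>.sh` after exporting NGC_API_KEY.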
4. Call the API From Your Machine#
After the container is running on a compute node, the script prints the node name and saves it to ROOT_DIR/.openfold2_node.
From your laptop or the login node, create an SSH tunnel to that node:
ssh -L 8000:localhost:8000 $(cat /path/to/your/workspace/.openfold2_node)
Or use the node name directly if you know it:
ssh -L 8000:localhost:8000 <compute-node-name>
In another terminal (with the tunnel still open), run a test request:
curl -s -X POST 'http://localhost:8000/biology/openfold/openfold2/predict-structure-from-msa-and-template' \
  -H "content-type: application/json" \
  -H "NVCF-POLL-SECONDS: 300" \
  -d '{"sequence":"GGSKENEISHHAKEIERLQKEIERHKQSIKKLKQSEQSNPPPNPEGTRQARRNRRRRWRERQRQKENEISHHAKEIERLQKEIERHKQSIKKLKQSEC","selected_models":[1,2]}' \
  --max-time 300
If the request succeeds, you will get a JSON response containing the prediction result.