Performance in OpenFold3 NIM#

NIM Accuracy#

OpenFold3 is an all-atom biomolecular complex structure prediction model from the OpenFold Consortium and the AlQuraishi Laboratory. It is a PyTorch implementation of the JAX-based AlphaFold3 described in “Accurate structure prediction of biomolecular interactions with AlphaFold 3”. Like AlphaFold3, OpenFold3 extends protein structure prediction to complete biomolecular complexes, including proteins, DNA, RNA, and small-molecule ligands.

The OpenFold3 NIM’s accuracy should match that of the AlQuraishi Laboratory implementation of OpenFold3, when using equivalent parameters and inputs.

Note

Running on hardware that is not listed as supported in the prerequisites section may produce results that deviate from the expected accuracy.

The accuracy of the NIM is measured by structural quality metrics such as lDDT (local Distance Difference Test). These scores help assess the reliability of the predicted structures.

Factors Affecting NIM Performance#

The performance of the OpenFold3 NIM is determined by several key factors:

Hardware Factors#

GPU type and memory: Runtime varies with GPU architecture and memory capacity; refer to Table 1 for measured per-SKU results
System RAM: Larger proteins and complexes require more system memory
Storage speed: Fast NVMe SSD storage improves model loading and caching performance

Input Complexity#

Sequence length: Runtime increases with total sequence length
Number of chains: Multi-chain complexes require more computation than single chains
MSA size: Larger MSAs can improve accuracy but increase memory usage and computation time
Ligands, DNA, RNA: Additional molecular components increase computational cost
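One way to bound the cost of a large MSA is to cap its depth before sending the request. The sketch below assumes the MSA is supplied as CSV text with a `key,sequence` header and the query as the first data row, as in the benchmarking script later in this document. The function name `cap_msa_csv` and the keys of the non-query rows in the usage example are hypothetical, and dropping rows may reduce accuracy.

```python
import csv
import io

def cap_msa_csv(csv_text, max_rows):
    """Keep the header plus the first `max_rows` unique sequences.

    Assumption: the query is the first data row, so it is always kept.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    seq_col = header.index("sequence")
    kept, seen = [], set()
    for row in reader:
        if not row or row[seq_col] in seen:
            continue  # skip blank lines and duplicate sequences
        seen.add(row[seq_col])
        kept.append(row)
        if len(kept) == max_rows:
            break
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(kept)
    return out.getvalue().rstrip("\n")

# Hypothetical four-row alignment; row keys other than -1 are placeholders.
example = "key,sequence\n-1,MKQH\n0,MKQH\n1,MKQN\n2,MRQN"
capped = cap_msa_csv(example, max_rows=2)
print(capped)
```

The cap is applied after de-duplication, so repeated sequences do not consume the row budget.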

Model Configuration#

Diffusion samples: Multiple samples provide diversity but multiply computational cost

Performance Characteristics#

Typical Runtimes#

For reference, approximate structure prediction runtimes for OpenFold3 NIM v1.5.0 on high-end hardware (NVIDIA H100 80GB):

~200 residues: 10.2-11.0 seconds
~300-400 residues: 12.3-21.0 seconds
~500-600 residues: 17.9-32.5 seconds
~800-900 residues: 27.6 seconds
~1300-1500 residues: 37.2-58.8 seconds
~1700-1900 residues: 65.3-82.5 seconds
Memory usage: Varies with sequence length, typically 40-80GB GPU memory

Performance Notes: The total runtime for structure prediction depends on:

Total number of residues in the complex
Total number of atoms in the complex
Number of molecules and chains
Number of sequences in the MSAs
Number of diffusion samples requested

Performance Results on H100#

The following table shows OpenFold3 NIM runtime on NVIDIA H100 80GB, with and without structural template processing.

Note

Structural template support is available starting from version 1.1.0.

Test ID	Seq Length	Without structural templates (s)	With structural templates (s)
8eil	186	10.22	14.85
7r6r	203	11.00	13.66
1a3n	287	20.92	41.17
8c4d	331	12.28	13.49
7qsj	375	14.24	16.51
8cpk	384	16.96	21.03
8are	530	18.84	21.06
8owf	575	19.82	21.34
8aw3	590	32.47	34.76
7tpu	616	17.95	19.73
7ylz	623	24.72	26.59
8gpp	628	23.17	29.66
8clz	684	23.46	25.29
8k7x	858	27.57	34.63
8ibx	1286	37.17	40.13
8gi1	1464	51.61	52.62
8sm6	1496	58.83	62.77
8pso	1499	50.64	54.19
8jue	1657	65.28	72.99
8bsh	1762	72.39	79.27
5xgo	1869	82.49	91.32

All runtimes are in seconds for end-to-end structure prediction with a single diffusion sample. The template measurements include an average of 4 CIF template files per protein chain. The additional time is primarily attributed to CIF parsing.

Performance Analysis#

Key Observations:

Scaling Behavior: Runtime increases with total sequence length and with the number of chains and atoms in the complex.
Hardware range: The fastest data-center GPUs (B200, GH200, GB300, H200) complete the benchmark cases in roughly 75-100% of the H100 80GB HBM3 time. Refer to Table 1 for per-SKU numbers.
Small proteins (<400 residues): Complete in ~10-21 seconds on H100.
Large proteins (>1500 residues): Complete in ~65-90 seconds on H100.
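As a rough illustration of the scaling behavior, the following sketch fits a straight line to the H100 runtimes without templates from the table above. The linear model and the helper name `fit_line` are assumptions made for illustration only. The table shows per-case scatter (for example, 1a3n and 8aw3) that sequence length alone does not explain.

```python
# (sequence length, runtime in seconds) on H100 80GB without templates,
# copied from the table above.
H100_RUNS = [
    (186, 10.22), (203, 11.00), (287, 20.92), (331, 12.28), (375, 14.24),
    (384, 16.96), (530, 18.84), (575, 19.82), (590, 32.47), (616, 17.95),
    (623, 24.72), (628, 23.17), (684, 23.46), (858, 27.57), (1286, 37.17),
    (1464, 51.61), (1496, 58.83), (1499, 50.64), (1657, 65.28),
    (1762, 72.39), (1869, 82.49),
]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(H100_RUNS)
print(f"~{slope * 100:.1f} s per 100 residues, intercept {intercept:.1f} s")
```

A fit like this is only a planning aid for this benchmark set; chain count, atom count, and MSA depth also affect runtime.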

Template Processing Impact:

Template overhead: Structural templates add modest overhead (typically 1-9 seconds in the H100 table above; 1a3n is an outlier at about 20 seconds), depending on sequence length and chain count.
Primary cost: The additional time is mostly attributed to CIF parsing (average 4 templates per chain).
Overall impact: Templates provide structural guidance with minimal performance cost.
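The template overhead can be read directly from the H100 table by subtracting the two runtime columns. The sketch below does that for all 21 cases; the variable names are illustrative, and the numbers are copied from the table rather than newly measured.

```python
from statistics import median

# (test ID, seconds without templates, seconds with templates) on H100 80GB,
# copied from the table above.
RUNS = [
    ("8eil", 10.22, 14.85), ("7r6r", 11.00, 13.66), ("1a3n", 20.92, 41.17),
    ("8c4d", 12.28, 13.49), ("7qsj", 14.24, 16.51), ("8cpk", 16.96, 21.03),
    ("8are", 18.84, 21.06), ("8owf", 19.82, 21.34), ("8aw3", 32.47, 34.76),
    ("7tpu", 17.95, 19.73), ("7ylz", 24.72, 26.59), ("8gpp", 23.17, 29.66),
    ("8clz", 23.46, 25.29), ("8k7x", 27.57, 34.63), ("8ibx", 37.17, 40.13),
    ("8gi1", 51.61, 52.62), ("8sm6", 58.83, 62.77), ("8pso", 50.64, 54.19),
    ("8jue", 65.28, 72.99), ("8bsh", 72.39, 79.27), ("5xgo", 82.49, 91.32),
]

# Overhead in seconds attributable to template processing, per test case.
overhead = {pdb: with_t - without_t for pdb, without_t, with_t in RUNS}
worst = max(overhead, key=overhead.get)

print(f"median overhead: {median(overhead.values()):.2f} s")
print(f"largest overhead: {worst} ({overhead[worst]:.2f} s)")
```

Looking at the median alongside the maximum makes it easier to separate the typical cost from a single outlier case.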

Recommended Configuration:

For large proteins (>1800 residues): Prefer high-memory GPUs; actual requirements depend on workload (MSA depth, templates, and molecule composition). Refer to Table 1 for measured per-SKU results.
For template-guided predictions: Structural templates add minimal performance overhead.

Configuration#

The benchmarks use the following configuration:

Parameter	Setting
diffusion_samples	1
output_format	pdb
GPU	H100 80GB
structural_templates	Average 4 CIF files per chain

Performance Metrics#

The following table contains end-to-end runtime (seconds) for the OpenFold3 NIM across supported NVIDIA hardware units. Inputs are the same 21 benchmark cases, arranged by sequence length and annotated with PDB ID and sequence length.

Table 1: Performance Across the Supported NVIDIA Hardware Units#

Table 1 shows prediction times for test cases of varying sequence length on each supported hardware unit. It features the following:

Hardware: Nineteen supported GPU SKUs (NVIDIA H100 80GB HBM3, H100 NVL, H100 PCIe, H200, H200 NVL, B200, B300, GB200, GB300, GH200, A100 SXM4 80GB, A100 80GB PCIe, A100 SXM4 40GB, A100 40GB PCIe, L40S, RTX PRO 6000 Blackwell Server Edition, RTX PRO 6000 Blackwell Workstation Edition, RTX 6000 Ada Generation, and DGX Spark GB10). One tab per SKU.
Metric: End-to-end predict time (seconds). A dash in the predict_time column marks a case for which no runtime is reported.
Configuration: Default model; no structural templates. Inputs sorted by sequence length and annotated with PDB ID and sequence length.

NVIDIA H100 80GB HBM3

Test ID	Seq Length	predict_time (s)
8eil	186	10.22
7r6r	203	11.00
1a3n	287	20.92
8c4d	331	12.28
7qsj	375	14.24
8cpk	384	16.96
8are	530	18.84
8owf	575	19.82
8aw3	590	32.47
7tpu	616	17.95
7ylz	623	24.72
8gpp	628	23.17
8clz	684	23.46
8k7x	858	27.57
8ibx	1286	37.17
8gi1	1464	51.61
8sm6	1496	58.83
8pso	1499	50.64
8jue	1657	65.28
8bsh	1762	72.39
5xgo	1869	82.49

NVIDIA H100 NVL

Test ID	Seq Length	predict_time (s)
8eil	186	9.68
7r6r	203	10.53
1a3n	287	20.39
8c4d	331	11.84
7qsj	375	13.27
8cpk	384	16.14
8are	530	18.57
8owf	575	19.49
8aw3	590	31.71
7tpu	616	17.85
7ylz	623	24.11
8gpp	628	22.68
8clz	684	23.43
8k7x	858	28.60
8ibx	1286	38.26
8gi1	1464	54.96
8sm6	1496	61.39
8pso	1499	53.85
8jue	1657	70.54
8bsh	1762	78.64
5xgo	1869	90.36

NVIDIA H100 PCIe

Test ID	Seq Length	predict_time (s)
8eil	186	9.65
7r6r	203	10.54
1a3n	287	20.97
8c4d	331	12.21
7qsj	375	13.72
8cpk	384	17.05
8are	530	19.72
8owf	575	20.47
8aw3	590	34.18
7tpu	616	19.38
7ylz	623	25.32
8gpp	628	24.15
8clz	684	25.04
8k7x	858	30.14
8ibx	1286	43.07
8gi1	1464	62.77
8sm6	1496	69.78
8pso	1499	61.49
8jue	1657	80.82
8bsh	1762	89.85
5xgo	1869	103.91

NVIDIA H200

Test ID	Seq Length	predict_time (s)
8eil	186	9.13
7r6r	203	9.84
1a3n	287	18.91
8c4d	331	11.06
7qsj	375	12.53
8cpk	384	15.74
8are	530	17.33
8owf	575	18.33
8aw3	590	29.03
7tpu	616	16.10
7ylz	623	22.22
8gpp	628	20.95
8clz	684	21.20
8k7x	858	24.91
8ibx	1286	34.04
8gi1	1464	47.13
8sm6	1496	53.50
8pso	1499	46.56
8jue	1657	60.19
8bsh	1762	66.27
5xgo	1869	75.79

NVIDIA H200 NVL

Test ID	Seq Length	predict_time (s)
8eil	186	10.84
7r6r	203	10.36
1a3n	287	20.08
8c4d	331	11.59
7qsj	375	13.05
8cpk	384	16.33
8are	530	18.78
8owf	575	18.74
8aw3	590	29.65
7tpu	616	17.21
7ylz	623	23.11
8gpp	628	21.71
8clz	684	22.27
8k7x	858	26.36
8ibx	1286	35.88
8gi1	1464	50.11
8sm6	1496	56.15
8pso	1499	49.57
8jue	1657	64.26
8bsh	1762	71.51
5xgo	1869	81.43

NVIDIA B200

Test ID	Seq Length	predict_time (s)
8eil	186	8.16
7r6r	203	8.87
1a3n	287	17.13
8c4d	331	9.95
7qsj	375	11.15
8cpk	384	13.55
8are	530	15.68
8owf	575	16.67
8aw3	590	27.58
7tpu	616	15.11
7ylz	623	20.79
8gpp	628	19.88
8clz	684	20.52
8k7x	858	24.50
8ibx	1286	32.82
8gi1	1464	40.43
8sm6	1496	45.53
8pso	1499	47.51
8jue	1657	62.44
8bsh	1762	69.82
5xgo	1869	80.49

NVIDIA B300 SXM6

Test ID	Seq Length	predict_time (s)
8eil	186	8.98
7r6r	203	10.50
1a3n	287	21.01
8c4d	331	14.78
7qsj	375	16.34
8cpk	384	20.51
8are	530	21.85
8owf	575	23.16
8aw3	590	39.54
7tpu	616	21.53
7ylz	623	29.78
8gpp	628	27.75
8clz	684	28.08
8k7x	858	34.54
8ibx	1286	42.08
8gi1	1464	49.53
8sm6	1496	54.06
8pso	1499	53.89
8jue	1657	78.23
8bsh	1762	87.33
5xgo	1869	99.26

NVIDIA GB200

Test ID	Seq Length	predict_time (s)
8eil	186	11.44
7r6r	203	11.53
1a3n	287	18.93
8c4d	331	12.41
7qsj	375	13.51
8cpk	384	15.18
8are	530	17.37
8owf	575	18.47
8aw3	590	28.34
7tpu	616	17.40
7ylz	623	21.87
8gpp	628	20.72
8clz	684	21.62
8k7x	858	28.28
8ibx	1286	36.68
8gi1	1464	43.24
8sm6	1496	46.82
8pso	1499	48.77
8jue	1657	60.96
8bsh	1762	68.60
5xgo	1869	78.80

NVIDIA GB300

Test ID	Seq Length	predict_time (s)
8eil	186	10.48
7r6r	203	11.01
1a3n	287	17.61
8c4d	331	11.93
7qsj	375	12.87
8cpk	384	14.41
8are	530	16.55
8owf	575	17.72
8aw3	590	26.91
7tpu	616	16.46
7ylz	623	20.38
8gpp	628	19.43
8clz	684	20.69
8k7x	858	27.84
8ibx	1286	35.25
8gi1	1464	40.15
8sm6	1496	43.95
8pso	1499	46.18
8jue	1657	58.89
8bsh	1762	64.29
5xgo	1869	73.87

NVIDIA GH200 144GB HBM3e

Test ID	Seq Length	predict_time (s)
8eil	186	9.03
7r6r	203	9.68
1a3n	287	16.08
8c4d	331	10.72
7qsj	375	11.82
8cpk	384	12.92
8are	530	14.71
8owf	575	15.60
8aw3	590	24.39
7tpu	616	14.74
7ylz	623	18.51
8gpp	628	17.42
8clz	684	18.22
8k7x	858	23.85
8ibx	1286	31.28
8gi1	1464	42.06
8sm6	1496	46.48
8pso	1499	42.28
8jue	1657	53.82
8bsh	1762	58.71
5xgo	1869	67.29

NVIDIA A100 SXM4 80GB

Test ID	Seq Length	predict_time (s)
8eil	186	11.94
7r6r	203	12.95
1a3n	287	25.04
8c4d	331	14.96
7qsj	375	16.72
8cpk	384	19.84
8are	530	23.36
8owf	575	24.30
8aw3	590	39.16
7tpu	616	23.01
7ylz	623	29.67
8gpp	628	27.88
8clz	684	29.04
8k7x	858	35.80
8ibx	1286	49.68
8gi1	1464	74.22
8sm6	1496	82.16
8pso	1499	72.48
8jue	1657	95.34
8bsh	1762	104.90
5xgo	1869	123.08

NVIDIA A100 80GB PCIe

Test ID	Seq Length	predict_time (s)
8eil	186	11.83
7r6r	203	12.74
1a3n	287	25.37
8c4d	331	14.71
7qsj	375	16.40
8cpk	384	19.90
8are	530	23.24
8owf	575	24.41
8aw3	590	39.88
7tpu	616	23.13
7ylz	623	30.09
8gpp	628	28.40
8clz	684	29.55
8k7x	858	36.03
8ibx	1286	51.98
8gi1	1464	77.99
8sm6	1496	86.22
8pso	1499	76.32
8jue	1657	100.85
8bsh	1762	111.44
5xgo	1869	130.37

NVIDIA A100 SXM4 40GB

Test ID	Seq Length	predict_time (s)
8eil	186	15.41
7r6r	203	16.64
1a3n	287	31.63
8c4d	331	18.69
7qsj	375	20.82
8cpk	384	27.12
8are	530	29.90
8owf	575	31.09
8aw3	590	48.73
7tpu	616	28.12
7ylz	623	37.23
8gpp	628	35.08
8clz	684	36.17
8k7x	858	43.28
8ibx	1286	58.83
8gi1	1464	85.56
8sm6	1496	95.10
8pso	1499	83.49
8jue	1657	—
8bsh	1762	—
5xgo	1869	—

NVIDIA A100 40GB PCIe

Test ID	Seq Length	predict_time (s)
8eil	186	15.74
7r6r	203	17.09
1a3n	287	32.31
8c4d	331	19.20
7qsj	375	21.35
8cpk	384	26.11
8are	530	29.80
8owf	575	31.61
8aw3	590	50.41
7tpu	616	29.24
7ylz	623	38.69
8gpp	628	36.38
8clz	684	37.44
8k7x	858	44.93
8ibx	1286	61.92
8gi1	1464	90.54
8sm6	1496	100.86
8pso	1499	87.87
8jue	1657	—
8bsh	1762	—
5xgo	1869	—

NVIDIA L40S

Test ID	Seq Length	predict_time (s)
8eil	186	10.45
7r6r	203	11.55
1a3n	287	28.52
8c4d	331	14.08
7qsj	375	16.81
8cpk	384	20.60
8are	530	25.19
8owf	575	27.39
8aw3	590	45.90
7tpu	616	26.92
7ylz	623	33.74
8gpp	628	32.44
8clz	684	34.78
8k7x	858	39.65
8ibx	1286	60.92
8gi1	1464	97.56
8sm6	1496	107.62
8pso	1499	92.94
8jue	1657	—
8bsh	1762	—
5xgo	1869	—

NVIDIA RTX PRO 6000 Blackwell Server Edition

Test ID	Seq Length	predict_time (s)
8eil	186	9.65
7r6r	203	10.31
1a3n	287	22.57
8c4d	331	12.31
7qsj	375	14.13
8cpk	384	16.95
8are	530	20.59
8owf	575	21.97
8aw3	590	36.77
7tpu	616	20.98
7ylz	623	27.33
8gpp	628	26.62
8clz	684	28.00
8k7x	858	29.74
8ibx	1286	42.68
8gi1	1464	66.45
8sm6	1496	72.86
8pso	1499	63.91
8jue	1657	85.03
8bsh	1762	93.51
5xgo	1869	108.00

NVIDIA RTX PRO 6000 Blackwell Workstation Edition

Test ID	Seq Length	predict_time (s)
8eil	186	7.40
7r6r	203	8.08
1a3n	287	18.52
8c4d	331	9.69
7qsj	375	11.01
8cpk	384	13.07
8are	530	16.70
8owf	575	18.21
8aw3	590	30.58
7tpu	616	17.69
7ylz	623	22.66
8gpp	628	21.95
8clz	684	22.97
8k7x	858	25.30
8ibx	1286	37.89
8gi1	1464	60.25
8sm6	1496	66.59
8pso	1499	58.07
8jue	1657	77.96
8bsh	1762	86.62
5xgo	1869	100.05

NVIDIA RTX 6000 Ada Generation

Test ID	Seq Length	predict_time (s)
8eil	186	12.51
7r6r	203	13.21
1a3n	287	28.07
8c4d	331	15.46
7qsj	375	17.52
8cpk	384	21.14
8are	530	25.49
8owf	575	27.50
8aw3	590	45.05
7tpu	616	26.84
7ylz	623	33.96
8gpp	628	32.44
8clz	684	34.39
8k7x	858	41.63
8ibx	1286	60.52
8gi1	1464	95.55
8sm6	1496	105.45
8pso	1499	90.06
8jue	1657	—
8bsh	1762	—
5xgo	1869	—

NVIDIA DGX Spark (GB10)

Test ID	Seq Length	predict_time (s)
8eil	186	12.94
7r6r	203	18.80
1a3n	287	69.73
8c4d	331	27.86
7qsj	375	33.79
8cpk	384	35.09
8are	530	60.33
8owf	575	70.66
8aw3	590	118.79
7tpu	616	76.01
7ylz	623	83.80
8gpp	628	83.12
8clz	684	96.90
8k7x	858	110.39
8ibx	1286	180.14
8gi1	1464	451.94
8sm6	1496	489.03
8pso	1499	426.82
8jue	1657	—
8bsh	1762	—
5xgo	1869	—
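To compare SKUs, it can help to express each runtime as a fraction of the H100 80GB HBM3 runtime for the same case. The sketch below does this for three benchmark cases and four SKUs, using values copied from Table 1. The helper name `relative_runtime` and the choice of cases are illustrative.

```python
# End-to-end runtimes in seconds for three benchmark cases, copied from Table 1.
RUNTIME_S = {
    "H100 80GB HBM3":    {"8eil": 10.22, "8aw3": 32.47, "5xgo": 82.49},
    "H200":              {"8eil": 9.13,  "8aw3": 29.03, "5xgo": 75.79},
    "B200":              {"8eil": 8.16,  "8aw3": 27.58, "5xgo": 80.49},
    "GB300":             {"8eil": 10.48, "8aw3": 26.91, "5xgo": 73.87},
    "GH200 144GB HBM3e": {"8eil": 9.03,  "8aw3": 24.39, "5xgo": 67.29},
}
BASELINE = "H100 80GB HBM3"

def relative_runtime(runtimes, baseline):
    """Return {sku: {case: runtime / baseline runtime}} for non-baseline SKUs."""
    base = runtimes[baseline]
    return {
        sku: {case: t / base[case] for case, t in cases.items()}
        for sku, cases in runtimes.items()
        if sku != baseline
    }

ratios = relative_runtime(RUNTIME_S, BASELINE)
for sku, per_case in ratios.items():
    print(sku, ", ".join(f"{case}: {r:.0%}" for case, r in per_case.items()))
```

Values below 100% mean the SKU finished that case faster than the baseline.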

Performance Optimization Tips#

GPU Selection: Use any of the supported GPUs listed in Table 1. For best throughput, prefer higher-performance SKUs such as B200, H100, and H200.
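Sequence Length: Runtime increases with total sequence length; refer to Table 1 for measured values.
Multiple Samples: Setting diffusion_samples > 1 increases runtime affinely: input featurization is a fixed cost that is independent of diffusion_samples, and each additional sample adds a roughly constant amount of time.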
MSA Size: While larger MSAs can improve accuracy, they also increase memory usage and computation time. Consider filtering MSAs for very large proteins.
Structural Templates: Templates add modest overhead (typically 1-9 seconds in the H100 benchmarks) but can significantly improve prediction accuracy. See Template Processing for guidance.
Batch Processing: For multiple independent predictions, process them sequentially or use multiple NIM instances.
Memory Management: Ensure adequate GPU memory for your target sequence lengths. Very long sequences (>1800 residues) are memory-intensive, and actual requirements depend on workload characteristics (MSA depth, templates, and molecule composition). Refer to Table 1 for measured per-SKU results.
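The affine relationship noted under Multiple Samples can be sketched as a fixed cost plus a per-sample cost. The function names and the two timings in the example are hypothetical placeholders, not measured values; time two runs on your own hardware to obtain real coefficients.

```python
def estimate_runtime_s(fixed_s, per_sample_s, diffusion_samples):
    """Affine model: fixed cost (e.g. featurization) plus a cost per sample."""
    if diffusion_samples < 1:
        raise ValueError("diffusion_samples must be at least 1")
    return fixed_s + per_sample_s * diffusion_samples

def fit_from_two_runs(n1, t1, n2, t2):
    """Recover (fixed_s, per_sample_s) from two timed runs with different sample counts."""
    if n1 == n2:
        raise ValueError("the two runs must use different sample counts")
    per_sample_s = (t2 - t1) / (n2 - n1)
    return t1 - per_sample_s * n1, per_sample_s

# Hypothetical timings: 12 s with 1 sample, 28 s with 5 samples.
fixed_s, per_sample_s = fit_from_two_runs(1, 12.0, 5, 28.0)
predicted_10 = estimate_runtime_s(fixed_s, per_sample_s, 10)
print(f"fixed {fixed_s:.1f} s, per sample {per_sample_s:.1f} s, "
      f"10 samples ~{predicted_10:.1f} s")
```

Two measurements are the minimum needed to separate the fixed cost from the per-sample cost; more runs would give a more reliable estimate.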

Reproducing Performance Benchmarks#

Overview#

This section provides scripts and instructions for reproducing the performance metrics reported above. The benchmarking process is as follows:

Run inference with OpenFold3 NIM to generate predicted structures
Use OpenStructure (OST) to compare predictions against reference structures
Extract accuracy metrics like lDDT (local Distance Difference Test)

Prerequisites#

Run the OpenFold3 NIM#

Before running benchmarks, ensure that OpenFold3 NIM is deployed and running. The benchmarking scripts will send inference requests to the NIM service.

Verify NIM is Running:

# Check if NIM is accessible and ready
curl http://localhost:8000/v1/health/ready

# Expected response:
# {"object":"health.response","message":"ready","status":"ready"}

Note

The default NIM URL is http://localhost:8000. If your NIM is running on a different host or port, you’ll need to specify it using the --nim-url parameter when running benchmarks. For detailed deployment instructions and configuration options, refer to the Getting Started guide.
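When the NIM has just been started, the readiness endpoint may not respond immediately. The following sketch polls the same `/v1/health/ready` endpoint from Python until it reports `"status": "ready"` or a deadline passes. The function names, the 600-second default, and the polling interval are assumptions chosen for illustration; only the endpoint path and response shape come from the check above.

```python
import json
import time
import urllib.error
import urllib.request

def nim_is_ready(nim_url, timeout_s=5.0):
    """Return True if GET {nim_url}/v1/health/ready reports status 'ready'."""
    try:
        with urllib.request.urlopen(f"{nim_url}/v1/health/ready", timeout=timeout_s) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False  # not reachable yet, or the response was not valid JSON
    return isinstance(body, dict) and body.get("status") == "ready"

def wait_until_ready(check, max_wait_s=600.0, interval_s=5.0, sleep=time.sleep):
    """Call `check()` repeatedly until it returns True or `max_wait_s` elapses."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A typical call would be `wait_until_ready(lambda: nim_is_ready("http://localhost:8000"))`. Passing the check as a callable keeps the polling loop independent of the HTTP client.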

Install the OpenStructure Docker Image#

OpenStructure is a computational structural biology framework that provides tools for structure comparison and validation. You’ll need the OpenStructure Docker image for running benchmarks:

# Pull the latest OpenStructure image from the OST registry
docker pull registry.scicore.unibas.ch/schwede/openstructure:latest

# Verify the installation
docker run --rm -v $(pwd):/home registry.scicore.unibas.ch/schwede/openstructure:latest --version

# Expected response:
# OpenStructure 2.11.1

Download Reference Structures#

You’ll need reference structures (ground truth) in CIF format for comparison. These are experimentally determined structures from the Protein Data Bank (PDB):

CIF (Crystallographic Information File): A standard format for representing molecular structures, including atomic coordinates, experimental metadata, and structural annotations
Obtaining reference structures: Download from RCSB PDB using the PDB IDs from the performance table (e.g., 8eil, 7r6r, 1a3n)

Example:

# Create directories
mkdir -p references

# Download a reference structure (e.g., 8eil) to references folder
wget https://files.rcsb.org/download/8eil.cif -O references/8eil.cif

Preparing Input JSON Files#

What are Input JSON Files?

Input JSON files contain the sequence information needed for OpenFold3 NIM to make predictions. Each file specifies:

Protein sequences (amino acid chains)
Chain IDs
Optional: DNA, RNA, ligands, MSAs, templates

Example Input JSON Structure#

{
  "name": "8eil",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF..."
      }
    }
  ]
}
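Before sending many requests, it can be useful to sanity-check each input file. The sketch below checks only the fields shown in the example above (`name`, `sequences`, and each protein's `id` and `sequence`). The function name `check_input` and the specific rules (unique chain IDs, uppercase letters only) are assumptions for illustration and are not the NIM's own validation; entries other than `protein` are skipped because their schema is not shown here.

```python
def check_input(data):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not isinstance(data.get("name"), str) or not data["name"]:
        problems.append("missing or empty 'name'")
    sequences = data.get("sequences")
    if not isinstance(sequences, list) or not sequences:
        return problems + ["'sequences' must be a non-empty list"]
    seen_ids = set()
    for i, entry in enumerate(sequences):
        protein = entry.get("protein") if isinstance(entry, dict) else None
        if not isinstance(protein, dict):
            continue  # non-protein entries are not checked in this sketch
        chain_id = protein.get("id")
        seq = protein.get("sequence", "")
        if not chain_id:
            problems.append(f"sequences[{i}]: missing chain 'id'")
        elif chain_id in seen_ids:
            problems.append(f"sequences[{i}]: duplicate chain id {chain_id!r}")
        else:
            seen_ids.add(chain_id)
        if not (isinstance(seq, str) and seq.isalpha() and seq.isupper()):
            problems.append(f"sequences[{i}]: 'sequence' must be uppercase letters")
    return problems

good = {"name": "demo", "sequences": [{"protein": {"id": "A", "sequence": "MKQHKAM"}}]}
bad = {"name": "demo", "sequences": [
    {"protein": {"id": "A", "sequence": "MKQHKAM"}},
    {"protein": {"id": "A", "sequence": "mkq"}},
]}
print(check_input(good))
print(check_input(bad))
```

Returning a list of problems rather than raising on the first one lets a batch run report every issue in a file at once.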

Create Input JSON Files#

You can create input JSON files by extracting sequences from PDB structures. Save this script as generate_inputs.py:

#!/usr/bin/env python3
"""Generate input JSON files from PDB structures."""
import argparse
import json
import requests
import sys
from pathlib import Path

def fetch_sequences_from_pdb(pdb_id):
    """Fetch protein sequences from RCSB PDB FASTA endpoint."""
    url = f"https://www.rcsb.org/fasta/entry/{pdb_id}"
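    # Use a timeout so a stalled connection cannot hang the script
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        raise Exception(f"Failed to fetch FASTA for {pdb_id} (HTTP {response.status_code})")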
    
    sequences = []
    current_seq = []
    chain_index = 0
    
    for line in response.text.strip().split('\n'):
        if line.startswith('>'):
            if current_seq:
                # Use sequential chain IDs: A, B, C, D, ...
                chain_id = chr(65 + chain_index)
                sequences.append({
                    "protein": {
                        "id": chain_id,
                        "sequence": ''.join(current_seq)
                    }
                })
                chain_index += 1
            current_seq = []
        else:
            current_seq.append(line.strip())
    
    # Don't forget the last sequence
    if current_seq:
        chain_id = chr(65 + chain_index)
        sequences.append({
            "protein": {
                "id": chain_id,
                "sequence": ''.join(current_seq)
            }
        })
    
    return sequences

def create_input_json(pdb_id):
    """Create OpenFold3 input JSON file for a PDB structure."""
    output_dir = Path("inputs")
    output_dir.mkdir(parents=True, exist_ok=True)
    
    print(f"Fetching sequences for {pdb_id}...")
    sequences = fetch_sequences_from_pdb(pdb_id)
    
    input_data = {
        "name": pdb_id,
        "sequences": sequences
    }
    
    output_path = output_dir / f"{pdb_id}_input.json"
    with open(output_path, 'w') as f:
        json.dump(input_data, f, indent=2)
    
    print(f"Created {output_path} ({len(sequences)} chain(s))")
    return output_path

if __name__ == "__main__":
    # All benchmark PDB IDs from performance table
    benchmark_pdbs = [
        "8eil", "7r6r", "1a3n", "8c4d", "7qsj", "8cpk", 
        "8are", "8owf", "8aw3", "7tpu", "7ylz", "8gpp", 
        "8clz", "8k7x", "8ibx", "8gi1", "8sm6", "8pso", 
        "8jue", "8bsh", "5xgo"
    ]
    
    parser = argparse.ArgumentParser(
        description='Generate OpenFold3 input JSON files from PDB structures',
        epilog='Example: python generate_inputs.py --pdb 8eil'
    )
    parser.add_argument('--pdb', type=str, 
                        help='Specific PDB ID to generate input for (e.g., 8eil)')
    parser.add_argument('--all', action='store_true',
                        help='Generate input files for all benchmark PDB IDs')
    
    args = parser.parse_args()
    
    # For backwards compatibility: if no arguments provided, generate all
    if not args.pdb and not args.all:
        args.all = True
    
    if args.pdb:
        # Generate input for specific PDB ID
        try:
            create_input_json(args.pdb)
        except Exception as e:
            print(f"Error processing {args.pdb}: {e}", file=sys.stderr)
            sys.exit(1)
    elif args.all:
        # Generate all benchmark inputs
        print("Generating input files for all benchmark cases...")
        failed = []
        for pdb_id in benchmark_pdbs:
            try:
                create_input_json(pdb_id)
            except Exception as e:
                print(f"Error processing {pdb_id}: {e}")
                failed.append(pdb_id)
        
        if failed:
            print(f"\nFailed to generate inputs for: {', '.join(failed)}", file=sys.stderr)
            sys.exit(1)

Benchmarking Script#

The following is a complete script to benchmark OpenFold3 NIM predictions. Save this as benchmark_openfold3.py:

#!/usr/bin/env python3
"""
Benchmark OpenFold3 NIM predictions against reference structures.
"""

import argparse
import json
import subprocess
import time
from pathlib import Path
import requests

def run_inference(nim_url, input_json, output_dir):
    """
    Run OpenFold3 NIM inference and save the predicted structure.
    
    Args:
        nim_url: URL of the OpenFold3 NIM service (e.g., http://localhost:8000)
        input_json: Path to input JSON file with sequence information
        output_dir: Directory to save predicted structures
    
    Returns:
        tuple: (predicted_pdb_path, inference_time_seconds)
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    # Read input configuration
    with open(input_json, 'r') as f:
        input_data = json.load(f)
    
    # Convert to NIM API format
    molecules = []
    for seq in input_data.get("sequences", []):
        if "protein" in seq:
            protein = seq["protein"]
            # Create minimal MSA with just the query sequence
            msa_csv = f"key,sequence\n-1,{protein['sequence']}"
            molecules.append({
                "type": "protein",
                "id": [protein["id"]],
                "sequence": protein["sequence"],
                "msa": {
                    "main_db": {
                        "csv": {
                            "alignment": msa_csv,
                            "format": "csv"
                        }
                    }
                }
            })
    
    nim_request = {
        "inputs": [{
            "input_id": input_data.get("name", "prediction"),
            "molecules": molecules,
            "output_format": "pdb"
        }]
    }
    
    # Start timing (perf_counter is monotonic, so system clock adjustments cannot skew the result)
    start_time = time.perf_counter()
    
    # Run inference
    response = requests.post(
        f"{nim_url}/biology/openfold/openfold3/predict",
        json=nim_request,
        headers={"Content-Type": "application/json"}
    )
    
    # End timing
    inference_time = time.perf_counter() - start_time
    
    if response.status_code != 200:
        raise Exception(f"Inference failed: {response.text}")
    
    # Extract PDB data from response
    result = response.json()
    
    outputs = result.get('outputs', [])
    if not outputs:
        raise Exception("No outputs in response")
    
    structures = outputs[0].get('structures_with_scores', [])
    if not structures:
        raise Exception("No structures in response")
    
    # Get the first (best) structure
    pdb_content = structures[0].get('structure', '')
    if not pdb_content:
        raise Exception("No structure content in response")
    
    # Save predicted structure
    pdb_id = input_data.get('name', 'prediction')
    pred_path = output_dir / f"{pdb_id}_pred.pdb"
    
    with open(pred_path, 'w') as f:
        f.write(pdb_content)
    
    return pred_path, inference_time

def compare_structures(pred_pdb, reference_cif, output_dir):
    """
    Compare predicted structure against reference using OpenStructure.
    
    Args:
        pred_pdb: Path to predicted structure (PDB format)
        reference_cif: Path to reference structure (CIF format)
        output_dir: Directory to save comparison results
    
    Returns:
        dict: Comparison metrics including lDDT score
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    out_path = output_dir / "comparison_results.json"
    
    # Build OpenStructure comparison command
    cmd = [
        "compare-structures",
        "-m", str(pred_pdb),           # Model (predicted structure)
        "-r", str(reference_cif),      # Reference (ground truth)
        "--fault-tolerant",            # Handle minor structural differences
        "--min-pep-length", "4",       # Minimum peptide chain length
        "--min-nuc-length", "4",       # Minimum nucleotide chain length
        "-o", str(out_path),           # Output JSON file
        "--lddt",                      # Calculate lDDT metric
    ]
    
    # Run comparison using OpenStructure Docker container
    docker_cmd = [
        "docker", "run", "--rm",
        "-v", f"{Path.cwd()}:/home",
        "registry.scicore.unibas.ch/schwede/openstructure:latest"
    ] + cmd
    
    result = subprocess.run(
        docker_cmd,
        capture_output=True,
        text=True
    )
    
    if result.returncode != 0:
        raise Exception(f"Structure comparison failed: {result.stderr}")
    
    # Read comparison results
    with open(out_path, 'r') as f:
        metrics = json.load(f)
    
    return metrics

def extract_lddt(comparison_results):
    """
    Extract lDDT score from comparison results.
    
    Args:
        comparison_results: Dictionary containing comparison metrics
    
    Returns:
        float: lDDT score (0.0 to 1.0, higher is better)
    """
    # lDDT is directly a float value in the comparison results
    lddt_score = comparison_results.get('lddt', 0.0)
    return lddt_score

def benchmark_structure(nim_url, input_json, reference_cif, output_dir):
    """
    Complete benchmark pipeline for a single structure.
    
    Args:
        nim_url: URL of OpenFold3 NIM service
        input_json: Input configuration for prediction
        reference_cif: Reference structure for validation
        output_dir: Output directory for results
    
    Returns:
        dict: Benchmark results including timing and accuracy
    """
    output_dir = Path(output_dir)
    
    print("Running inference...")
    pred_path, inference_time = run_inference(nim_url, input_json, output_dir)
    print(f"Inference completed in {inference_time:.2f} seconds")
    
    print("Comparing structures...")
    metrics = compare_structures(pred_path, reference_cif, output_dir)
    lddt_score = extract_lddt(metrics)
    print(f"lDDT score: {lddt_score:.4f}")
    
    return {
        'inference_time': inference_time,
        'lddt_score': lddt_score,
        'predicted_structure': str(pred_path),
        'full_metrics': metrics
    }

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Benchmark OpenFold3 NIM predictions')
    parser.add_argument('--nim-url', default='http://localhost:8000', 
                        help='URL of the OpenFold3 NIM service')
    parser.add_argument('--input', required=True, 
                        help='Path to input JSON file')
    parser.add_argument('--reference', required=True, 
                        help='Path to reference CIF file')
    parser.add_argument('--output', required=True, 
                        help='Output directory for results')
    
    args = parser.parse_args()
    
    results = benchmark_structure(args.nim_url, args.input, args.reference, args.output)
    
    print("\nBenchmark Results:")
    print(f"  Inference Time: {results['inference_time']:.2f}s")
    print(f"  lDDT Score: {results['lddt_score']:.4f}")
    print(f"  Predicted Structure: {results['predicted_structure']}")

Running a Single Benchmark#

To run a benchmark for a single structure, follow these steps in order:

# Step 1: Install dependencies
pip3 install requests

# Step 2: Create directories
mkdir -p inputs results

# Step 3: Generate the input JSON file
python3 generate_inputs.py --pdb 8eil

# Step 4: Verify files exist before running benchmark
ls -l inputs/8eil_input.json references/8eil.cif

# Step 5: Run the benchmark (requires OpenFold3 NIM to be running)
python3 benchmark_openfold3.py \
    --nim-url http://localhost:8000 \
    --input inputs/8eil_input.json \
    --reference references/8eil.cif \
    --output results/8eil

Note

Each step must complete successfully before proceeding to the next. Verify that input and reference files exist (Step 4) before running the benchmark (Step 5).

Understanding lDDT Metric#

lDDT (local Distance Difference Test) is a robust metric for assessing the quality of predicted protein structures:

Score Range: 0.0 to 1.0 (or 0 to 100 when expressed as percentage)
Interpretation:
- lDDT > 0.90: Excellent accuracy, very high confidence
- lDDT 0.70-0.90: Good accuracy, reliable predictions
- lDDT 0.50-0.70: Moderate accuracy, some structural features correct
- lDDT < 0.50: Low accuracy, limited reliability

How lDDT Works:

Measures local geometric agreement between predicted and reference structures
Evaluates distances between atoms within local neighborhoods (typically 15Å radius)
More robust to domain movements and flexible regions compared to global metrics like RMSD
Focuses on local structural correctness rather than global superposition
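The idea can be illustrated with a heavily simplified lDDT-style score. The sketch below uses one point per residue, a 15 Å inclusion radius, and the four tolerance thresholds of 0.5, 1, 2, and 4 Å. It is not the OpenStructure implementation, which is all-atom and considers only atom pairs from different residues; the function name `simple_lddt` and the toy coordinates are illustrative.

```python
import math

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # distance-difference tolerances in angstroms

def simple_lddt(ref, model, radius=15.0):
    """lDDT-style score for two equal-length lists of (x, y, z) points.

    For every pair of points closer than `radius` in the reference, count how
    many thresholds the model's distance error stays under, then normalize.
    """
    if len(ref) != len(model):
        raise ValueError("ref and model must have the same number of points")
    preserved = 0
    considered = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # outside the local neighborhood
            diff = abs(d_ref - math.dist(model[i], model[j]))
            preserved += sum(diff < t for t in THRESHOLDS)
            considered += len(THRESHOLDS)
    return preserved / considered if considered else 0.0

# Toy example: three points on a line; the model misplaces the last one by 3 angstroms.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
print(simple_lddt(ref, ref), simple_lddt(ref, model))
```

Because only inter-point distances are compared, the score is unchanged by rotating or translating the model, which is why no superposition step is needed.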

Why lDDT for OpenFold3:

AlphaFold3 and OpenFold3 models are trained to optimize lDDT during the training process
Well-suited for evaluating multi-chain complexes and structures with flexible regions
Provides per-residue metrics in addition to global metrics
Standard metric used in CASP (Critical Assessment of protein Structure Prediction) competitions

Running Benchmarks for Performance Table#

To reproduce the performance metrics from the table above, follow these steps in order:

Important

Complete Steps 1 and 2 before running Step 3. The benchmark workflow requires:

Both Python scripts (generate_inputs.py and benchmark_openfold3.py) saved as files
The bash script (run_benchmarks.sh) saved as a file
All three scripts located in the same directory

Running steps out of order will result in “File Not Found” errors.

Step 1: Prepare the Python scripts#

Save the two Python scripts provided earlier in this document:

generate_inputs.py - Creates input JSON files from PDB (see Preparing Input JSON Files)
benchmark_openfold3.py - Runs inference and compares structures (see Benchmarking Script)

Step 2: Create the benchmark workflow script#

Save the following as run_benchmarks.sh:

#!/bin/bash
# Complete benchmark workflow for all test cases

NIM_URL="http://localhost:8000"
PDB_IDS=("8eil" "7r6r" "1a3n" "8c4d" "7qsj" "8cpk" "8are" "8owf" "8aw3" "7tpu" "7ylz" "8gpp" "8clz" "8k7x" "8ibx" "8gi1" "8sm6" "8pso" "8jue" "8bsh" "5xgo")

# Create directories
mkdir -p references inputs results

# Step 1: Generate input JSON files from PDB
echo "Step 1: Generating input JSON files..."
python3 generate_inputs.py || {
    echo "Error: failed to generate input files"
    exit 1
}
echo "✓ Input files created in inputs/"

# Step 2: Download reference structures
echo "Step 2: Downloading reference structures..."
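for pdb_id in "${PDB_IDS[@]}"; do
    # -s also treats an empty file (left by a failed download) as missing
    if [ ! -s "references/${pdb_id}.cif" ]; then
        echo "  Downloading ${pdb_id}.cif..."
        wget -q "https://files.rcsb.org/download/${pdb_id}.cif" -O "references/${pdb_id}.cif" || {
            echo "Error: failed to download ${pdb_id}.cif"
            rm -f "references/${pdb_id}.cif"
            exit 1
        }
    fi
done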
echo "✓ Reference structures downloaded"

# Step 3: Verify NIM is running
echo "Step 3: Checking NIM availability..."
curl -sf "${NIM_URL}/v1/health/ready" > /dev/null || {
    echo "Error: NIM not accessible at ${NIM_URL}"
    exit 1
}
echo "✓ NIM is ready"

# Step 4: Run benchmarks
echo "Step 4: Running benchmarks..."
for pdb_id in "${PDB_IDS[@]}"; do
    echo "  Benchmarking ${pdb_id}..."
    python3 benchmark_openfold3.py \
        --nim-url "$NIM_URL" \
        --input "inputs/${pdb_id}_input.json" \
        --reference "references/${pdb_id}.cif" \
        --output "results/${pdb_id}"
done

echo ""
echo "Benchmarking complete! Results saved in results/"

Step 3: Run the complete workflow#

# 1. Install dependencies
pip3 install requests

# 2. Make the benchmark script executable
chmod +x run_benchmarks.sh

# 3. Run the complete workflow (generates inputs, downloads references, and runs benchmarks)
./run_benchmarks.sh

Note

The run_benchmarks.sh script handles all steps automatically: generating input files, downloading reference structures, and running benchmarks for all test cases. Make sure the OpenFold3 NIM is running before executing the script.

Expected Output Format#

The comparison results JSON file contains detailed metrics:

{
  "lddt": 0.9234,
  "chain_mapping": {
    "A": "A",
    "B": "B"
  },
  "aln": [
    ">reference:A\nMKQLYGHSTI...",
    ">model:A\nMKQLYGHSTI..."
  ],
  "model_clashes": [],
  "model_bad_bonds": [],
  "model_bad_angles": [],
  "reference_clashes": [],
  "reference_bad_bonds": [],
  "reference_bad_angles": [],
  "status": "SUCCESS",
  "ost_version": "2.11.1"
}

The primary metric of interest is lddt, which is a float value (0.0 to 1.0) representing the overall structural quality of the prediction. Higher values indicate better agreement with the reference structure.
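After a full benchmark run, the per-case results can be collected into one summary. The sketch below assumes the layout produced by the scripts in this document (`results/<pdb_id>/comparison_results.json`) and labels each score with the interpretation bands listed under Understanding lDDT Metric. The function names are hypothetical. A missing `lddt` value is reported explicitly rather than treated as 0.0, so that a failed comparison is not mistaken for a poor prediction.

```python
import json
from pathlib import Path

def lddt_band(score):
    """Map an lDDT score (0.0 to 1.0) to the bands used in this document."""
    if score > 0.90:
        return "excellent"
    if score >= 0.70:
        return "good"
    if score >= 0.50:
        return "moderate"
    return "low"

def row_for(name, data):
    """Build one summary row (name, score or None, label) from a results dict."""
    score = data.get("lddt")
    if not isinstance(score, (int, float)):
        return (name, None, f"no lDDT (status: {data.get('status', 'unknown')})")
    return (name, float(score), lddt_band(score))

def summarize(results_dir):
    """Collect one row per results/<pdb_id>/comparison_results.json file."""
    rows = []
    for path in sorted(Path(results_dir).glob("*/comparison_results.json")):
        rows.append(row_for(path.parent.name, json.loads(path.read_text())))
    return rows

for name, score, label in summarize("results"):
    shown = "n/a" if score is None else f"{score:.4f}"
    print(f"{name}\t{shown}\t{label}")
```

If the `results` directory does not exist yet, the loop simply prints nothing.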

Troubleshooting#

General Performance Issues#

Out of memory errors: Reduce MSA depth, decrease sequence length, or upgrade to GPUs with more memory
Slow performance: Ensure fast storage (NVMe SSD), sufficient CPU cores, and adequate system RAM
Poor quality predictions: Check input sequence quality, increase MSA depth if available, or adjust diffusion parameters

Benchmarking Issues#

The following are “File not found” errors.

“Input file not found”: Generate input JSON files first using generate_inputs.py

python3 generate_inputs.py --all  # Creates all input files
python3 generate_inputs.py --pdb 8eil  # Or for specific PDB

“Reference file not found”: Download CIF files from RCSB PDB

wget https://files.rcsb.org/download/8eil.cif -O references/8eil.cif

The following are structure comparison issues.

OpenStructure comparison failures: Ensure both predicted and reference structures have compatible chain IDs and residue numbering
Missing atoms in predictions: Use the --fault-tolerant flag to handle incomplete structures (already included in the script)
lDDT score of 0.0: Check that sequences match between prediction and reference; may indicate alignment failure

The following are NIM connection issues.

“Connection refused”: Ensure OpenFold3 NIM is running

curl http://localhost:8000/v1/health/ready  # Should return {"object":"health.response","message":"ready","status":"ready"}

Note

For detailed performance tuning guidance specific to your deployment, refer to the documentation on configuration and optimization.