Performance in OpenFold3 NIM#

NIM Accuracy#

OpenFold3 is an all-atom biomolecular complex structure prediction model from the OpenFold Consortium and the AlQuraishi Laboratory. It is a PyTorch implementation of the JAX-based AlphaFold3 reported in "Accurate structure prediction of biomolecular interactions with AlphaFold 3". Like AlphaFold3, OpenFold3 extends protein structure prediction to complete biomolecular complexes, including proteins, DNA, RNA, and small-molecule ligands.

The OpenFold3 NIM’s accuracy should match that of the AlQuraishi Laboratory implementation of OpenFold3, when using equivalent parameters and inputs.

Note

Running on hardware that is not listed as supported in the prerequisites section may produce results that deviate from the expected accuracy.

The accuracy of the NIM is measured by structural quality metrics such as lDDT (local Distance Difference Test). These scores help assess the reliability of the predicted structures.

Factors Affecting NIM Performance#

The performance of the OpenFold3 NIM is determined by several key factors:

Hardware Factors#

  • GPU type and memory: Different GPU architectures provide different performance levels

  • System RAM: Larger proteins and complexes require more system memory

  • Storage speed: Fast NVMe SSD storage improves model loading and caching performance

Input Complexity#

  • Sequence length: Runtime increases with total sequence length

  • Number of chains: Multi-chain complexes require more computation than single chains

  • MSA size: Larger MSAs can improve accuracy but increase memory usage and computation time

  • Ligands, DNA, RNA: Additional molecular components increase computational cost

Model Configuration#

  • Inference backend: TensorRT + cuEquivariance provides significant speedups over PyTorch

  • Diffusion samples: Multiple samples provide diversity but multiply computational cost

Performance Characteristics#

Typical Runtimes#

For reference, approximate runtimes on high-end hardware (NVIDIA H100 80GB):

Structure Prediction (OpenFold3 v1.0.0 on H100 with TensorRT + cuEquivariance):

  • ~200 residues: 11.8-14.5 seconds

  • ~300-400 residues: 15.0-20.4 seconds

  • ~500-700 residues: 21.5-27.5 seconds for most inputs (8aw3, at 590 residues, took 37.7 seconds)

  • ~800-900 residues: 32.7 seconds

  • ~1300-1500 residues: 46.5-67.9 seconds

  • ~1700-1900 residues: 88.7-101.8 seconds

  • Memory usage: Varies with sequence length, typically 40-80GB GPU memory

Performance Notes

The total runtime for structure prediction depends on:

  • Total number of residues in the complex

  • Total number of atoms in the complex

  • Number of molecules and chains

  • Number of sequences in the MSAs

  • Number of diffusion samples requested

Performance Results on H100 (NVIDIA H100 80GB)#

Structure Prediction Performance#

The following table shows runtime performance for OpenFold3 NIM on NVIDIA H100 GPUs across different inference backends. The rightmost column reports the speedup of the NIM over the open source baseline, measured with structural template processing inactive. These backends are described in Backend Selection and in Backend Optimization Options.

Note

Structural template support is available starting from version 1.1.0.

| Test ID | Sequence Length | PyTorch + cuEquivariance (s) | Open source OF3 from openfold consortium (baseline) (s) | OF3 NIM (TRT + cuEquivariance) (s) | OF3 NIM with Templates (TRT + cuEquivariance) (s) | Speedup (NIM vs Baseline) |
|---|---|---|---|---|---|---|
| 8eil | 186 | 17.06 | 16.79 | 11.78 | 15.52 | 1.42x |
| 7r6r | 203 | 18.82 | 19.26 | 14.53 | 16.60 | 1.33x |
| 1a3n | 287 | 26.76 | 29.13 | 23.55 | 31.24 | 1.24x |
| 8c4d | 331 | 20.08 | 18.85 | 15.04 | 15.37 | 1.25x |
| 7qsj | 375 | 20.79 | 19.92 | 16.06 | 18.10 | 1.24x |
| 8cpk | 384 | 26.85 | 23.50 | 20.37 | 22.12 | 1.15x |
| 8are | 530 | 26.31 | 27.42 | 21.89 | 23.82 | 1.25x |
| 8owf | 575 | 27.54 | 28.86 | 22.90 | 25.02 | 1.26x |
| 8aw3 | 590 | 41.96 | 45.64 | 37.69 | 41.40 | 1.21x |
| 7tpu | 616 | 25.69 | 28.61 | 21.48 | 23.48 | 1.33x |
| 7ylz | 623 | 31.80 | 35.66 | 27.51 | 30.38 | 1.30x |
| 8gpp | 628 | 29.92 | 33.81 | 25.71 | 34.91 | 1.32x |
| 8clz | 684 | 30.68 | 35.35 | 26.41 | 30.41 | 1.34x |
| 8k7x | 858 | 37.39 | 44.76 | 32.75 | 37.87 | 1.37x |
| 8ibx | 1286 | 52.12 | 63.54 | 46.51 | 48.06 | 1.37x |
| 8gi1 | 1464 | 74.42 | 99.45 | 60.56 | 62.28 | 1.64x |
| 8sm6 | 1496 | 83.33 | 110.55 | 67.92 | 72.79 | 1.63x |
| 8pso | 1499 | 72.68 | 96.87 | 57.41 | 61.76 | 1.69x |
| 8jue | 1657 | 98.09 | 125.53 | 78.04 | 85.66 | 1.61x |
| 8bsh | 1762 | 110.98 | 144.03 | 88.68 | 96.94 | 1.62x |
| 5xgo | 1869 | 127.79 | 163.83 | 101.79 | 111.89 | 1.61x |

All runtimes are in seconds for end-to-end structure prediction with a single diffusion sample. The template measurements include an average of 4 CIF template files per protein chain. The additional time is primarily attributed to CIF parsing.
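
The speedup column is the baseline runtime divided by the NIM runtime. As a minimal sketch, the following recomputes it for three rows of the H100 table; the `speedup` helper is illustrative and not part of the NIM.

```python
# Runtimes in seconds copied from the H100 table:
# test ID -> (open source OF3 baseline, OF3 NIM with TRT + cuEquivariance)
RUNTIMES = {
    "8cpk": (23.50, 20.37),
    "8pso": (96.87, 57.41),
    "5xgo": (163.83, 101.79),
}


def speedup(baseline_s: float, nim_s: float) -> float:
    """Return how many times faster the NIM run is than the baseline run."""
    return baseline_s / nim_s


for test_id, (baseline_s, nim_s) in RUNTIMES.items():
    print(f"{test_id}: {speedup(baseline_s, nim_s):.2f}x")
```

For these rows the script prints 1.15x, 1.69x, and 1.61x, matching the table. Because the published runtimes are rounded to two decimals, a recomputed value can differ from the table in the last digit for some rows.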

Performance Analysis#

Key Observations:

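  • OpenFold3 NIM Optimization: Provides a 1.15x to 1.69x speedup over the open source OF3 baseline across all 21 test cases

  • cuEquivariance Acceleration: PyTorch + cuEquivariance gives mixed results below about 400 residues and is consistently faster than the baseline from 530 residues upward (for example, 127.79 s versus 163.83 s at 1869 residues), demonstrating the value of cuEquivariance optimization

  • Scaling Behavior: Runtime generally increases with sequence length, though not monotonically (for example, 1a3n at 287 residues takes longer than 8c4d at 331 residues), which is consistent with the other input factors listed earlier

  • Best Performance: Largest speedups (1.61x-1.69x) observed for proteins of 1464 residues and longer

  • Small Proteins: Still achieve speedups of 1.15x-1.42x for sequences under 400 residues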

Template Processing Impact:

  • Template overhead: Structural templates add modest overhead (typically 0.3-10 seconds) depending on sequence length

  • Small proteins (<400 residues): Template overhead is ~0.3-8 seconds

  • Medium proteins (400-1000 residues): Template overhead is ~2-9 seconds

  • Large proteins (>1000 residues): Template overhead is ~1.5-10 seconds

  • Primary cost: The additional time is mostly attributed to CIF parsing (average 4 templates per chain)

  • Overall impact: Templates provide structural guidance at a cost of roughly 0.3-10 seconds per prediction in these benchmarks
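
The overhead figures above can be derived from the two NIM columns of the H100 table. The following is a small sketch using three rows; the `template_overhead` helper is a hypothetical name introduced for illustration.

```python
# Seconds copied from the H100 table:
# test ID -> (sequence length, NIM without templates, NIM with templates)
ROWS = {
    "8c4d": (331, 15.04, 15.37),
    "8gpp": (628, 25.71, 34.91),
    "5xgo": (1869, 101.79, 111.89),
}


def template_overhead(without_s: float, with_s: float):
    """Return (extra seconds, extra time as a fraction of the no-template run)."""
    extra = with_s - without_s
    return extra, extra / without_s


for test_id, (length, without_s, with_s) in ROWS.items():
    extra, fraction = template_overhead(without_s, with_s)
    print(f"{test_id} ({length} residues): +{extra:.2f} s ({fraction:.0%})")
```

Looking at the relative overhead as well as the absolute one is useful because a few seconds weigh more on a short prediction than on a long one.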

Backend Comparison:

  • Open source OF3 (PyTorch + DeepSpeed): Baseline implementation from the openfold consortium, good for development and debugging

  • PyTorch + cuEquivariance: Improved performance, especially for larger proteins, while maintaining PyTorch flexibility

  • OpenFold3 NIM (TensorRT + cuEquivariance): Best performance across all sequence lengths, recommended for production deployments

Recommended Configuration:

  • For development/testing: Use open source OF3 (PyTorch + DeepSpeed) for easier debugging and flexibility

  • For production: Use OpenFold3 NIM (TensorRT + cuEquivariance, default) for optimal performance

  • For large proteins (>1400 residues): OpenFold3 NIM (TensorRT + cuEquivariance) provides the best speedups (1.6x-1.7x)

  • For sequence lengths outside the 4-2048 residue range: Use a PyTorch backend, because the TensorRT backend only supports lengths within that range

  • For template-guided predictions: Structural templates add minimal performance overhead

Configuration#

The benchmarks use the following configuration:

| Parameter | Setting |
|---|---|
| diffusion_samples | 1 |
| output_format | pdb |
| GPU | H100 80GB |
| structural_templates | Average 4 CIF files per chain |

Performance Metrics#

The following tables contain end-to-end runtime (seconds) for the OpenFold3 NIM across supported NVIDIA GPUs, optimization backends, and structural template configurations. All tables use the same 21 benchmark cases, sorted by sequence length and labeled with PDB ID and sequence length.

Table 1: Performance Across the Supported NVIDIA Hardware Units#

The tables below show prediction times for test cases of varying sequence length on each supported GPU. They feature the following:

  • Hardware: Nine supported GPU SKUs: NVIDIA A100 80GB, B200, H100 80GB HBM3, H200, GB200, L40S, GB10, RTX6000, and GH200. One table per SKU follows; the third table matches the H100 trt column of Table 2, which suggests the tables follow the order of this list.

  • Metric: End-to-end predict time (seconds).

  • Configuration: Default model; no structural templates; default optimization backend (TensorRT + cuEquivariance). Inputs sorted by sequence length and annotated with PDB ID and sequence length.

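| Test ID | Seq Length | predict_time (s) |
|---|---|---|
| 8eil | 186 | 14.03 |
| 7r6r | 203 | 15.08 |
| 1a3n | 287 | 26.57 |
| 8c4d | 331 | 16.28 |
| 7qsj | 375 | 17.93 |
| 8cpk | 384 | 20.47 |
| 8are | 530 | 24.51 |
| 8owf | 575 | 26.39 |
| 8aw3 | 590 | 41.71 |
| 7tpu | 616 | 25.28 |
| 7ylz | 623 | 31.70 |
| 8gpp | 628 | 30.04 |
| 8clz | 684 | 31.71 |
| 8k7x | 858 | 40.16 |
| 8ibx | 1286 | 53.05 |
| 8gi1 | 1464 | 86.39 |
| 8sm6 | 1496 | 94.46 |
| 8pso | 1499 | 82.84 |
| 8jue | 1657 | 115.30 |
| 8bsh | 1762 | 131.57 |
| 5xgo | 1869 | 150.10 |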

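| Test ID | Seq Length | predict_time (s) |
|---|---|---|
| 8eil | 186 | 10.05 |
| 7r6r | 203 | 10.63 |
| 1a3n | 287 | 19.15 |
| 8c4d | 331 | 11.57 |
| 7qsj | 375 | 12.89 |
| 8cpk | 384 | 15.67 |
| 8are | 530 | 17.63 |
| 8owf | 575 | 18.75 |
| 8aw3 | 590 | 29.59 |
| 7tpu | 616 | 17.07 |
| 7ylz | 623 | 23.03 |
| 8gpp | 628 | 21.68 |
| 8clz | 684 | 22.17 |
| 8k7x | 858 | 27.39 |
| 8ibx | 1286 | 37.95 |
| 8gi1 | 1464 | 54.49 |
| 8sm6 | 1496 | 60.43 |
| 8pso | 1499 | 54.29 |
| 8jue | 1657 | 71.84 |
| 8bsh | 1762 | 81.14 |
| 5xgo | 1869 | 92.93 |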

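| Test ID | Seq Length | predict_time (s) |
|---|---|---|
| 8eil | 186 | 12.20 |
| 7r6r | 203 | 12.72 |
| 1a3n | 287 | 22.54 |
| 8c4d | 331 | 13.80 |
| 7qsj | 375 | 15.24 |
| 8cpk | 384 | 18.33 |
| 8are | 530 | 20.60 |
| 8owf | 575 | 21.57 |
| 8aw3 | 590 | 34.50 |
| 7tpu | 616 | 19.78 |
| 7ylz | 623 | 26.78 |
| 8gpp | 628 | 25.15 |
| 8clz | 684 | 25.64 |
| 8k7x | 858 | 30.69 |
| 8ibx | 1286 | 39.58 |
| 8gi1 | 1464 | 56.50 |
| 8sm6 | 1496 | 63.45 |
| 8pso | 1499 | 54.01 |
| 8jue | 1657 | 73.99 |
| 8bsh | 1762 | 85.16 |
| 5xgo | 1869 | 97.41 |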

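| Test ID | Seq Length | predict_time (s) |
|---|---|---|
| 8eil | 186 | 20.24 |
| 7r6r | 203 | 20.81 |
| 1a3n | 287 | 27.13 |
| 8c4d | 331 | 21.21 |
| 7qsj | 375 | 22.12 |
| 8cpk | 384 | 23.99 |
| 8are | 530 | 26.07 |
| 8owf | 575 | 26.79 |
| 8aw3 | 590 | 35.74 |
| 7tpu | 616 | 25.55 |
| 7ylz | 623 | 29.52 |
| 8gpp | 628 | 28.70 |
| 8clz | 684 | 29.58 |
| 8k7x | 858 | 34.86 |
| 8ibx | 1286 | 44.12 |
| 8gi1 | 1464 | 59.46 |
| 8sm6 | 1496 | 63.62 |
| 8pso | 1499 | 58.95 |
| 8jue | 1657 | 74.93 |
| 8bsh | 1762 | 82.36 |
| 5xgo | 1869 | 93.29 |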

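| Test ID | Seq Length | predict_time (s) |
|---|---|---|
| 8eil | 186 | 10.38 |
| 7r6r | 203 | 10.86 |
| 1a3n | 287 | 19.52 |
| 8c4d | 331 | 12.02 |
| 7qsj | 375 | 13.14 |
| 8cpk | 384 | 15.80 |
| 8are | 530 | 17.77 |
| 8owf | 575 | 18.78 |
| 8aw3 | 590 | 30.08 |
| 7tpu | 616 | 17.17 |
| 7ylz | 623 | 23.28 |
| 8gpp | 628 | 21.72 |
| 8clz | 684 | 22.35 |
| 8k7x | 858 | 26.87 |
| 8ibx | 1286 | 35.22 |
| 8gi1 | 1464 | 50.12 |
| 8sm6 | 1496 | 56.29 |
| 8pso | 1499 | 48.47 |
| 8jue | 1657 | 65.19 |
| 8bsh | 1762 | 75.22 |
| 5xgo | 1869 | 87.13 |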

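| Test ID | Seq Length | predict_time (s) |
|---|---|---|
| 8eil | 186 | 12.34 |
| 7r6r | 203 | 13.09 |
| 1a3n | 287 | 28.57 |
| 8c4d | 331 | 14.88 |
| 7qsj | 375 | 16.89 |
| 8cpk | 384 | 20.17 |
| 8are | 530 | 25.78 |
| 8owf | 575 | 27.83 |
| 8aw3 | 590 | 45.18 |
| 7tpu | 616 | 27.33 |
| 7ylz | 623 | 34.09 |
| 8gpp | 628 | 32.77 |
| 8clz | 684 | 34.85 |
| 8k7x | 858 | 44.29 |
| 8ibx | 1286 | 65.76 |
| 8gi1 | 1464 | 103.55 |
| 8sm6 | 1496 | 112.75 |
| 8pso | 1499 | 98.41 |
| 8jue | 1657 | not reported |
| 8bsh | 1762 | not reported |
| 5xgo | 1869 | not reported |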

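| Test ID | Seq Length | predict_time (s) |
|---|---|---|
| 8eil | 186 | 13.52 |
| 7r6r | 203 | 18.96 |
| 1a3n | 287 | 73.84 |
| 8c4d | 331 | 27.44 |
| 7qsj | 375 | 32.58 |
| 8cpk | 384 | 34.30 |
| 8are | 530 | 62.91 |
| 8owf | 575 | 73.95 |
| 8aw3 | 590 | 127.72 |
| 7tpu | 616 | 81.40 |
| 7ylz | 623 | 88.03 |
| 8gpp | 628 | 87.25 |
| 8clz | 684 | 103.79 |
| 8k7x | 858 | 145.92 |
| 8ibx | 1286 | 259.91 |
| 8gi1 | 1464 | 615.61 |
| 8sm6 | 1496 | 657.24 |
| 8pso | 1499 | 567.58 |
| 8jue | 1657 | not reported |
| 8bsh | 1762 | not reported |
| 5xgo | 1869 | not reported |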

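| Test ID | Seq Length | predict_time (s) |
|---|---|---|
| 8eil | 186 | 8.90 |
| 7r6r | 203 | 9.67 |
| 1a3n | 287 | 19.52 |
| 8c4d | 331 | 10.71 |
| 7qsj | 375 | 11.83 |
| 8cpk | 384 | 13.96 |
| 8are | 530 | 17.56 |
| 8owf | 575 | 19.08 |
| 8aw3 | 590 | 30.86 |
| 7tpu | 616 | 18.51 |
| 7ylz | 623 | 23.38 |
| 8gpp | 628 | 22.31 |
| 8clz | 684 | 23.81 |
| 8k7x | 858 | 31.06 |
| 8ibx | 1286 | 45.42 |
| 8gi1 | 1464 | 70.29 |
| 8sm6 | 1496 | 76.67 |
| 8pso | 1499 | 68.76 |
| 8jue | 1657 | 92.27 |
| 8bsh | 1762 | 102.93 |
| 5xgo | 1869 | 118.07 |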

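| Test ID | Seq Length | predict_time (s) |
|---|---|---|
| 8eil | 186 | 19.18 |
| 7r6r | 203 | 19.63 |
| 1a3n | 287 | 25.52 |
| 8c4d | 331 | 20.26 |
| 7qsj | 375 | 21.23 |
| 8cpk | 384 | 22.75 |
| 8are | 530 | 24.87 |
| 8owf | 575 | 25.31 |
| 8aw3 | 590 | 34.04 |
| 7tpu | 616 | 24.15 |
| 7ylz | 623 | 28.26 |
| 8gpp | 628 | 27.37 |
| 8clz | 684 | 28.32 |
| 8k7x | 858 | 32.78 |
| 8ibx | 1286 | 39.84 |
| 8gi1 | 1464 | 53.61 |
| 8sm6 | 1496 | 58.12 |
| 8pso | 1499 | 52.10 |
| 8jue | 1657 | 67.16 |
| 8bsh | 1762 | 76.03 |
| 5xgo | 1869 | 86.87 |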

Table 2: Performance Across Optimization Backends#

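The table shows prediction times for test cases of varying sequence length on each optimization backend. Columns are named after the backend setting: torch_baseline (PyTorch + DeepSpeed), torch (PyTorch + cuEquivariance), and trt (TensorRT + cuEquivariance, the default). It features the following: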

  • Hardware: NVIDIA H100 80GB HBM3. The same comparison is supported on the other SKUs listed in Table 1.

  • Metric: End-to-end predict time (seconds).

  • Configuration: Default model; no structural templates. Inputs sorted by sequence length and annotated with PDB ID and sequence length.

| Test ID | Seq Length | torch_baseline (s) | torch (s) | trt (s) | trt-speedup-over-baseline |
|---|---|---|---|---|---|
| 8eil | 186 | 17.40 | 17.34 | 12.20 | 1.43x |
| 7r6r | 203 | 17.88 | 17.83 | 12.72 | 1.41x |
| 1a3n | 287 | 28.00 | 26.05 | 22.54 | 1.24x |
| 8c4d | 331 | 18.50 | 18.14 | 13.80 | 1.34x |
| 7qsj | 375 | 19.88 | 19.32 | 15.24 | 1.30x |
| 8cpk | 384 | 23.03 | 22.61 | 18.33 | 1.26x |
| 8are | 530 | 25.95 | 24.38 | 20.60 | 1.26x |
| 8owf | 575 | 27.26 | 25.26 | 21.57 | 1.26x |
| 8aw3 | 590 | 41.02 | 37.46 | 34.50 | 1.19x |
| 7tpu | 616 | 25.79 | 23.53 | 19.78 | 1.30x |
| 7ylz | 623 | 32.43 | 30.24 | 26.78 | 1.21x |
| 8gpp | 628 | 31.10 | 28.71 | 25.15 | 1.24x |
| 8clz | 684 | 32.16 | 28.98 | 25.64 | 1.25x |
| 8k7x | 858 | 39.10 | 33.46 | 30.69 | 1.27x |
| 8ibx | 1286 | 58.03 | 47.14 | 39.58 | 1.47x |
| 8gi1 | 1464 | 92.36 | 69.92 | 56.50 | 1.63x |
| 8sm6 | 1496 | 101.92 | 77.46 | 63.45 | 1.61x |
| 8pso | 1499 | 88.69 | 68.09 | 54.01 | 1.64x |
| 8jue | 1657 | 119.95 | 93.32 | 73.99 | 1.62x |
| 8bsh | 1762 | 137.31 | 105.20 | 85.16 | 1.61x |
| 5xgo | 1869 | 156.68 | 122.00 | 97.41 | 1.61x |

Table 3: Performance Impact From Structural Templates#

The table shows prediction times for test cases of varying sequence length, with and without structural templates. It features the following:

  • Hardware: NVIDIA H100 80GB HBM3. The same comparison is supported on the other SKUs listed in Table 1.

  • Metric: End-to-end predict time (seconds).

  • Configuration: Default model; default optimization backend (TensorRT + cuEquivariance). Inputs sorted by sequence length and annotated with PDB ID and sequence length.

| Test ID | Seq Length | Without structural templates (s) | With structural templates (s) |
|---|---|---|---|
| 8eil | 186 | 12.20 | 17.08 |
| 7r6r | 203 | 12.72 | 15.39 |
| 1a3n | 287 | 22.54 | 31.28 |
| 8c4d | 331 | 13.80 | 15.24 |
| 7qsj | 375 | 15.24 | 17.28 |
| 8cpk | 384 | 18.33 | 21.35 |
| 8are | 530 | 20.60 | 22.62 |
| 8owf | 575 | 21.57 | 23.24 |
| 8aw3 | 590 | 34.50 | 37.10 |
| 7tpu | 616 | 19.78 | 21.93 |
| 7ylz | 623 | 26.78 | 28.70 |
| 8gpp | 628 | 25.15 | 31.92 |
| 8clz | 684 | 25.64 | 27.67 |
| 8k7x | 858 | 30.69 | 33.49 |
| 8ibx | 1286 | 39.58 | 44.57 |
| 8gi1 | 1464 | 56.50 | 58.45 |
| 8sm6 | 1496 | 63.45 | 68.29 |
| 8pso | 1499 | 54.01 | 58.25 |
| 8jue | 1657 | 73.99 | 83.12 |
| 8bsh | 1762 | 85.16 | 93.68 |
| 5xgo | 1869 | 97.41 | 107.87 |

Performance Optimization Tips#

Backend Selection#

  • Default (TensorRT + cuEquivariance): Best for most use cases

    # This is the default, no environment variable needed
    
  • PyTorch + cuEquivariance: For flexibility with good performance

    export NIM_OPTIMIZED_BACKEND=torch
    
  • PyTorch + DeepSpeed: For debugging or sequences outside TRT range

    export NIM_OPTIMIZED_BACKEND=torch_baseline
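
As a rough sketch of how the choice above could be automated, the following helper suggests a NIM_OPTIMIZED_BACKEND value from the total sequence length, using the 4-2048 residue TensorRT range stated in this guide. The function name `choose_backend` is hypothetical, and the sketch assumes the backend is fixed when the NIM starts, so it only helps decide how to launch the service for a given workload.

```python
from typing import Optional

TRT_MIN_RESIDUES = 4     # lower bound of the TensorRT range stated in this guide
TRT_MAX_RESIDUES = 2048  # upper bound of the TensorRT range stated in this guide


def choose_backend(total_residues: int) -> Optional[str]:
    """Suggest a NIM_OPTIMIZED_BACKEND value for a total sequence length.

    Returns None when the default backend (TensorRT + cuEquivariance)
    applies and the variable can stay unset.
    """
    if TRT_MIN_RESIDUES <= total_residues <= TRT_MAX_RESIDUES:
        return None
    # Outside the TensorRT range, fall back to the PyTorch + DeepSpeed setting.
    return "torch_baseline"


for n in (186, 1869, 2500):
    backend = choose_backend(n)
    if backend is None:
        print(f"{n} residues: keep the default backend")
    else:
        print(f"{n} residues: export NIM_OPTIMIZED_BACKEND={backend}")
```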
    

General Optimization Tips#

  • GPU Selection: Use H100, H200, or B200 GPUs for optimal performance. A100 and L40S GPUs are also supported.

  • TRT Sequence Length Limits: TensorRT mode (default) supports sequences between 4 and 2048 residues. For sequences outside this range, use PyTorch backend.

  • Sequence Length: Performance scales with sequence length.

  • Multiple Samples: Setting diffusion_samples > 1 increases runtime approximately as a fixed cost plus a per-sample cost (affine scaling), because input featurization time is independent of diffusion_samples.

  • MSA Size: While larger MSAs can improve accuracy, they also increase memory usage and computation time. Consider filtering MSAs for very large proteins.

  • Structural Templates: Templates add modest overhead (roughly 1.5-10.5 seconds in Table 3) but can significantly improve prediction accuracy. See Template Processing for guidance.

  • Batch Processing: For multiple independent predictions, process them sequentially or use multiple NIM instances.

  • Memory Management: Ensure adequate GPU memory for your target sequence lengths. Very long sequences (>1800 residues) may require GPUs with 80GB+ memory (H100, H200, A100, or B200).
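
The relationship between diffusion_samples and runtime noted in the Multiple Samples tip amounts to a fixed cost plus a per-sample cost. A minimal sketch, assuming you have timed the same input at two different sample counts: the helper names are hypothetical and the timings below are made-up placeholders, not measurements.

```python
def fit_affine(n1, t1, n2, t2):
    """Fit t = fixed + per_sample * n through two (samples, seconds) points."""
    per_sample = (t2 - t1) / (n2 - n1)
    fixed = t1 - per_sample * n1
    return fixed, per_sample


def predict_runtime(fixed, per_sample, n):
    """Estimate the runtime in seconds for n diffusion samples."""
    return fixed + per_sample * n


# Placeholder timings: 20 s with 1 sample and 32 s with 5 samples.
fixed, per_sample = fit_affine(1, 20.0, 5, 32.0)
print(f"fixed cost ~{fixed:.1f} s, per-sample cost ~{per_sample:.1f} s")
print(f"estimate for 10 samples: {predict_runtime(fixed, per_sample, 10):.1f} s")
```

With the placeholder numbers this gives a fixed cost of 17.0 s, a per-sample cost of 3.0 s, and an estimate of 47.0 s for 10 samples. Real timings will carry noise, so treat such an estimate as a planning aid only.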

Reproducing Performance Benchmarks#

Overview#

This section provides scripts and instructions for reproducing the performance metrics reported above. The benchmarking process is as follows:

  1. Run inference with OpenFold3 NIM to generate predicted structures

  2. Use OpenStructure (OST) to compare predictions against reference structures

  3. Extract accuracy metrics like lDDT (local Distance Difference Test)

Prerequisites#

Run the OpenFold3 NIM#

Before running benchmarks, ensure that OpenFold3 NIM is deployed and running. The benchmarking scripts will send inference requests to the NIM service.

Verify NIM is Running:

# Check if NIM is accessible and ready
curl http://localhost:8000/v1/health/ready

# Expected response:
# {"object":"health.response","message":"ready","status":"ready"}
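
For scripted setups, the same readiness check can be polled from Python. This is a sketch using only the standard library; the function names, retry count, and delay are illustrative choices, and the only assumption about the service is the /v1/health/ready response shown above.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(base_url, timeout_s=5.0):
    """GET /v1/health/ready and return the decoded JSON body, or None on failure."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/health/ready", timeout=timeout_s) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None


def wait_until_ready(base_url, attempts=30, delay_s=2.0, fetch=fetch_health):
    """Poll the readiness endpoint until it reports status 'ready' or attempts run out."""
    for attempt in range(attempts):
        body = fetch(base_url)
        if isinstance(body, dict) and body.get("status") == "ready":
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

A typical call would be `wait_until_ready("http://localhost:8000")`, which returns True once the NIM reports ready and False if it never does within the allotted attempts. The `fetch` parameter exists so the polling logic can be exercised without a running service.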

Note

The default NIM URL is http://localhost:8000. If your NIM is running on a different host or port, you’ll need to specify it using the --nim-url parameter when running benchmarks. For detailed deployment instructions and configuration options, refer to the Getting Started guide.

Install the OpenStructure Docker Image#

OpenStructure is a computational structural biology framework that provides tools for structure comparison and validation. You’ll need the OpenStructure Docker image for running benchmarks:

# Pull the latest OpenStructure image from the OST registry
docker pull registry.scicore.unibas.ch/schwede/openstructure:latest

# Verify the installation
docker run --rm -v $(pwd):/home registry.scicore.unibas.ch/schwede/openstructure:latest --version

# Expected response:
# OpenStructure 2.11.1

Download Reference Structures#

You’ll need reference structures (ground truth) in CIF format for comparison. These are experimentally determined structures from the Protein Data Bank (PDB):

  • CIF (Crystallographic Information File): A standard format for representing molecular structures, including atomic coordinates, experimental metadata, and structural annotations

  • Obtaining reference structures: Download from RCSB PDB using the PDB IDs from the performance table (e.g., 8eil, 7r6r, 1a3n)

Example:

# Create directories
mkdir -p references

# Download a reference structure (e.g., 8eil) to references folder
wget https://files.rcsb.org/download/8eil.cif -O references/8eil.cif

Preparing Input JSON Files#

What are Input JSON Files?

Input JSON files contain the sequence information needed for OpenFold3 NIM to make predictions. Each file specifies:

  • Protein sequences (amino acid chains)

  • Chain IDs

  • Optional: DNA, RNA, ligands, MSAs, templates

Example Input JSON Structure#

{
  "name": "8eil",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKQHKAMIVALIVICITAVVAALVTRKDLCEVHIRTGQTEVAVF..."
      }
    }
  ]
}
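
Before sending a file like this to the NIM, it can help to check that it has the shape shown above. The following is a minimal sketch covering only the fields in this example (name, sequences, protein id, and sequence); `validate_input` is a hypothetical helper, not part of the NIM, and the short sequence in the usage lines is an illustrative fragment.

```python
def validate_input(data):
    """Return a list of problems in an input dict; an empty list means it looks usable."""
    problems = []
    name = data.get("name")
    if not isinstance(name, str) or not name:
        problems.append("missing or empty 'name'")
    sequences = data.get("sequences")
    if not isinstance(sequences, list) or not sequences:
        problems.append("'sequences' must be a non-empty list")
        return problems
    seen_ids = set()
    for i, entry in enumerate(sequences):
        protein = entry.get("protein") if isinstance(entry, dict) else None
        if not isinstance(protein, dict):
            continue  # non-protein entries are not checked in this sketch
        chain_id = protein.get("id")
        if not isinstance(chain_id, str) or not chain_id:
            problems.append(f"sequences[{i}]: protein needs a non-empty string 'id'")
        elif chain_id in seen_ids:
            problems.append(f"sequences[{i}]: duplicate chain id {chain_id!r}")
        else:
            seen_ids.add(chain_id)
        seq = protein.get("sequence")
        if not isinstance(seq, str) or not seq.isalpha():
            problems.append(f"sequences[{i}]: 'sequence' must be a non-empty string of letters")
    return problems


example = {
    "name": "8eil",
    "sequences": [{"protein": {"id": "A", "sequence": "MKQHKAMIVALIV"}}],
}
print(validate_input(example))
```

An empty list is printed for the example, since every checked field is present and well formed.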

Create Input JSON Files#

You can create input JSON files by extracting sequences from PDB structures. Save this script as generate_inputs.py:

#!/usr/bin/env python3
"""Generate input JSON files from PDB structures."""
import argparse
import json
import requests
import sys
from pathlib import Path

def fetch_sequences_from_pdb(pdb_id):
    """Fetch protein sequences from RCSB PDB FASTA endpoint."""
    url = f"https://www.rcsb.org/fasta/entry/{pdb_id}"
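    # A timeout keeps the script from hanging if the server does not respond
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        raise Exception(f"Failed to fetch FASTA for {pdb_id} (HTTP {response.status_code})")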
    
    sequences = []
    current_seq = []
    chain_index = 0
    
    for line in response.text.strip().split('\n'):
        if line.startswith('>'):
            if current_seq:
                # Use sequential chain IDs: A, B, C, D, ...
                chain_id = chr(65 + chain_index)
                sequences.append({
                    "protein": {
                        "id": chain_id,
                        "sequence": ''.join(current_seq)
                    }
                })
                chain_index += 1
            current_seq = []
        else:
            current_seq.append(line.strip())
    
    # Don't forget the last sequence
    if current_seq:
        chain_id = chr(65 + chain_index)
        sequences.append({
            "protein": {
                "id": chain_id,
                "sequence": ''.join(current_seq)
            }
        })
    
    return sequences

def create_input_json(pdb_id):
    """Create OpenFold3 input JSON file for a PDB structure."""
    output_dir = Path("inputs")
    output_dir.mkdir(parents=True, exist_ok=True)
    
    print(f"Fetching sequences for {pdb_id}...")
    sequences = fetch_sequences_from_pdb(pdb_id)
    
    input_data = {
        "name": pdb_id,
        "sequences": sequences
    }
    
    output_path = output_dir / f"{pdb_id}_input.json"
    with open(output_path, 'w') as f:
        json.dump(input_data, f, indent=2)
    
    print(f"Created {output_path} ({len(sequences)} chain(s))")
    return output_path

if __name__ == "__main__":
    # All benchmark PDB IDs from performance table
    benchmark_pdbs = [
        "8eil", "7r6r", "1a3n", "8c4d", "7qsj", "8cpk", 
        "8are", "8owf", "8aw3", "7tpu", "7ylz", "8gpp", 
        "8clz", "8k7x", "8ibx", "8gi1", "8sm6", "8pso", 
        "8jue", "8bsh", "5xgo"
    ]
    
    parser = argparse.ArgumentParser(
        description='Generate OpenFold3 input JSON files from PDB structures',
        epilog='Example: python generate_inputs.py --pdb 8eil'
    )
    parser.add_argument('--pdb', type=str, 
                        help='Specific PDB ID to generate input for (e.g., 8eil)')
    parser.add_argument('--all', action='store_true',
                        help='Generate input files for all benchmark PDB IDs')
    
    args = parser.parse_args()
    
    # For backwards compatibility: if no arguments provided, generate all
    if not args.pdb and not args.all:
        args.all = True
    
    if args.pdb:
        # Generate input for specific PDB ID
        try:
            create_input_json(args.pdb)
        except Exception as e:
            print(f"Error processing {args.pdb}: {e}", file=sys.stderr)
            sys.exit(1)
    elif args.all:
        # Generate all benchmark inputs
        print("Generating input files for all benchmark cases...")
        failed = []
        for pdb_id in benchmark_pdbs:
            try:
                create_input_json(pdb_id)
            except Exception as e:
                print(f"Error processing {pdb_id}: {e}")
                failed.append(pdb_id)
        
        if failed:
            print(f"\nFailed to generate inputs for: {', '.join(failed)}", file=sys.stderr)
            sys.exit(1)
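
Note that the script above creates one chain per FASTA record and assigns IDs A, B, C, ... in order, ignoring the chain labels in the FASTA header. If you need the labels themselves, they can be read from the header line. The sketch below assumes headers of the form `>ID_N|Chains A, C|description|organism`, which is the layout RCSB FASTA files commonly use; treat both the layout and the example header as assumptions to check against your own downloads. Expanding records into one entry per chain would also change the total sequence length relative to the inputs this script generates.

```python
import re


def chains_from_header(header):
    """Extract chain labels from a FASTA header line.

    Returns an empty list when the second '|' field is not a 'Chain(s)' field.
    """
    fields = header.lstrip(">").split("|")
    if len(fields) < 2:
        return []
    match = re.match(r"Chains?\s+(.*)", fields[1].strip())
    if not match:
        return []
    labels = []
    for part in match.group(1).split(","):
        # Drop an optional '[auth X]' suffix and keep the primary label.
        label = re.sub(r"\[.*?\]", "", part).strip()
        if label:
            labels.append(label)
    return labels


# Illustrative header in the assumed layout
print(chains_from_header(">1A3N_1|Chains A, C|HEMOGLOBIN (ALPHA CHAIN)|Homo sapiens (9606)"))
```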

Benchmarking Script#

The following is a complete script to benchmark OpenFold3 NIM predictions. Save this as benchmark_openfold3.py:

#!/usr/bin/env python3
"""
Benchmark OpenFold3 NIM predictions against reference structures.
"""

import argparse
import json
import subprocess
import time
from pathlib import Path
import requests

def run_inference(nim_url, input_json, output_dir):
    """
    Run OpenFold3 NIM inference and save the predicted structure.
    
    Args:
        nim_url: URL of the OpenFold3 NIM service (e.g., http://localhost:8000)
        input_json: Path to input JSON file with sequence information
        output_dir: Directory to save predicted structures
    
    Returns:
        tuple: (predicted_pdb_path, inference_time_seconds)
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    # Read input configuration
    with open(input_json, 'r') as f:
        input_data = json.load(f)
    
    # Convert to NIM API format
    molecules = []
    for seq in input_data.get("sequences", []):
        if "protein" in seq:
            protein = seq["protein"]
            # Create minimal MSA with just the query sequence
            msa_csv = f"key,sequence\n-1,{protein['sequence']}"
            molecules.append({
                "type": "protein",
                "id": [protein["id"]],
                "sequence": protein["sequence"],
                "msa": {
                    "main_db": {
                        "csv": {
                            "alignment": msa_csv,
                            "format": "csv"
                        }
                    }
                }
            })
    
    nim_request = {
        "inputs": [{
            "input_id": input_data.get("name", "prediction"),
            "molecules": molecules,
            "output_format": "pdb"
        }]
    }
    
    # Start timing
    start_time = time.time()
    
    # Run inference
    response = requests.post(
        f"{nim_url}/biology/openfold/openfold3/predict",
        json=nim_request,
        headers={"Content-Type": "application/json"}
    )
    
    # End timing
    inference_time = time.time() - start_time
    
    if response.status_code != 200:
        raise Exception(f"Inference failed: {response.text}")
    
    # Extract PDB data from response
    result = response.json()
    
    outputs = result.get('outputs', [])
    if not outputs:
        raise Exception("No outputs in response")
    
    structures = outputs[0].get('structures_with_scores', [])
    if not structures:
        raise Exception("No structures in response")
    
    # Use the first returned structure
    pdb_content = structures[0].get('structure', '')
    if not pdb_content:
        raise Exception("No structure content in response")
    
    # Save predicted structure
    pdb_id = input_data.get('name', 'prediction')
    pred_path = output_dir / f"{pdb_id}_pred.pdb"
    
    with open(pred_path, 'w') as f:
        f.write(pdb_content)
    
    return pred_path, inference_time

def compare_structures(pred_pdb, reference_cif, output_dir):
    """
    Compare predicted structure against reference using OpenStructure.
    
    Args:
        pred_pdb: Path to predicted structure (PDB format)
        reference_cif: Path to reference structure (CIF format)
        output_dir: Directory to save comparison results
    
    Returns:
        dict: Comparison metrics including lDDT score
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    out_path = output_dir / "comparison_results.json"
    
    # Build OpenStructure comparison command
    cmd = [
        "compare-structures",
        "-m", str(pred_pdb),           # Model (predicted structure)
        "-r", str(reference_cif),      # Reference (ground truth)
        "--fault-tolerant",            # Handle minor structural differences
        "--min-pep-length", "4",       # Minimum peptide chain length
        "--min-nuc-length", "4",       # Minimum nucleotide chain length
        "-o", str(out_path),           # Output JSON file
        "--lddt",                      # Calculate lDDT metric
    ]
    
    # Run comparison using OpenStructure Docker container
    docker_cmd = [
        "docker", "run", "--rm",
        "-v", f"{Path.cwd()}:/home",
        "registry.scicore.unibas.ch/schwede/openstructure:latest"
    ] + cmd
    
    result = subprocess.run(
        docker_cmd,
        capture_output=True,
        text=True
    )
    
    if result.returncode != 0:
        raise Exception(f"Structure comparison failed: {result.stderr}")
    
    # Read comparison results
    with open(out_path, 'r') as f:
        metrics = json.load(f)
    
    return metrics

def extract_lddt(comparison_results):
    """
    Extract lDDT score from comparison results.
    
    Args:
        comparison_results: Dictionary containing comparison metrics
    
    Returns:
        float: lDDT score (0.0 to 1.0, higher is better)
    """
    # lDDT is directly a float value in the comparison results
    lddt_score = comparison_results.get('lddt', 0.0)
    return lddt_score

def benchmark_structure(nim_url, input_json, reference_cif, output_dir):
    """
    Complete benchmark pipeline for a single structure.
    
    Args:
        nim_url: URL of OpenFold3 NIM service
        input_json: Input configuration for prediction
        reference_cif: Reference structure for validation
        output_dir: Output directory for results
    
    Returns:
        dict: Benchmark results including timing and accuracy
    """
    output_dir = Path(output_dir)
    
    print(f"Running inference...")
    pred_path, inference_time = run_inference(nim_url, input_json, output_dir)
    print(f"Inference completed in {inference_time:.2f} seconds")
    
    print(f"Comparing structures...")
    metrics = compare_structures(pred_path, reference_cif, output_dir)
    lddt_score = extract_lddt(metrics)
    print(f"lDDT score: {lddt_score:.4f}")
    
    return {
        'inference_time': inference_time,
        'lddt_score': lddt_score,
        'predicted_structure': str(pred_path),
        'full_metrics': metrics
    }

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Benchmark OpenFold3 NIM predictions')
    parser.add_argument('--nim-url', default='http://localhost:8000', 
                        help='URL of the OpenFold3 NIM service')
    parser.add_argument('--input', required=True, 
                        help='Path to input JSON file')
    parser.add_argument('--reference', required=True, 
                        help='Path to reference CIF file')
    parser.add_argument('--output', required=True, 
                        help='Output directory for results')
    
    args = parser.parse_args()
    
    results = benchmark_structure(args.nim_url, args.input, args.reference, args.output)
    
    print("\nBenchmark Results:")
    print(f"  Inference Time: {results['inference_time']:.2f}s")
    print(f"  lDDT Score: {results['lddt_score']:.4f}")
    print(f"  Predicted Structure: {results['predicted_structure']}")

Running a Single Benchmark#

To run a benchmark for a single structure, follow these steps in order:

# Step 1: Create directories
mkdir -p inputs results

# Step 2: Generate the input JSON file
python3 generate_inputs.py --pdb 8eil

# Step 3: Verify files exist before running benchmark
ls -l inputs/8eil_input.json references/8eil.cif

# Step 4: Run the benchmark (requires OpenFold3 NIM to be running)
python3 benchmark_openfold3.py \
    --nim-url http://localhost:8000 \
    --input inputs/8eil_input.json \
    --reference references/8eil.cif \
    --output results/8eil

Note

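Each step must complete successfully before proceeding to the next. Verify that the input and reference files exist (Step 3) before running the benchmark (Step 4). Step 3 assumes that references/8eil.cif was already downloaded as shown in Download Reference Structures.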

Understanding lDDT Metric#

lDDT (local Distance Difference Test) is a robust metric for assessing the quality of predicted protein structures:

  • Score Range: 0.0 to 1.0 (or 0 to 100 when expressed as percentage)

  • Interpretation:

    • lDDT > 0.90: Excellent accuracy, very high confidence

    • lDDT 0.70-0.90: Good accuracy, reliable predictions

    • lDDT 0.50-0.70: Moderate accuracy, some structural features correct

    • lDDT < 0.50: Low accuracy, limited reliability
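
When reporting many results, the bands above can be applied programmatically. A small sketch with a hypothetical helper name; it expects scores on the 0.0 to 1.0 scale, so divide by 100 first if your tool reports percentages.

```python
def lddt_band(score):
    """Map a global lDDT score in [0, 1] to the interpretation bands listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("lDDT is expected in the range 0.0 to 1.0")
    if score > 0.90:
        return "excellent"
    if score >= 0.70:
        return "good"
    if score >= 0.50:
        return "moderate"
    return "low"


for value in (0.95, 0.82, 0.61, 0.33):
    print(f"{value:.2f}: {lddt_band(value)}")
```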

How lDDT Works:

  • Measures local geometric agreement between predicted and reference structures

  • Evaluates distances between atoms within local neighborhoods (typically 15Å radius)

  • More robust to domain movements and flexible regions compared to global metrics like RMSD

  • Focuses on local structural correctness rather than global superposition
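
To make the idea concrete, here is a deliberately simplified lDDT-style score. It is a teaching sketch, not the OpenStructure implementation: it uses one point per residue (for example, C-alpha coordinates) instead of all atoms, assumes the two structures list residues in the same order, and uses the commonly cited tolerances of 0.5, 1, 2, and 4 Å with a 15 Å inclusion radius.

```python
import math

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # distance tolerances in angstroms
INCLUSION_RADIUS = 15.0            # only reference pairs closer than this are scored


def simple_lddt(reference, model):
    """Toy global lDDT over equal-length lists of (x, y, z) points.

    No superposition is needed because only distances within each
    structure are compared.
    """
    if len(reference) != len(model):
        raise ValueError("reference and model must have the same number of points")
    preserved = 0
    total = 0
    for i in range(len(reference)):
        for j in range(i + 1, len(reference)):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= INCLUSION_RADIUS:
                continue  # pair is outside the local neighborhood
            d_model = math.dist(model[i], model[j])
            diff = abs(d_ref - d_model)
            preserved += sum(diff < t for t in THRESHOLDS)
            total += len(THRESHOLDS)
    return preserved / total if total else 0.0


ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 10.0, y, z) for x, y, z in ref]          # rigid translation
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]  # last point moved
print(simple_lddt(ref, shifted))
print(simple_lddt(ref, bent))
```

The translated copy scores 1.0 because all internal distances are unchanged, which illustrates why the metric needs no superposition. The bent copy scores 0.75: two of the three pairs are preserved at every tolerance, and the third is off by about 2.2 Å and so passes only the 4 Å tolerance.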

Why lDDT for OpenFold3:

  • AlphaFold3 and OpenFold3 models are trained to optimize lDDT during the training process

  • Well-suited for evaluating multi-chain complexes and structures with flexible regions

  • Provides per-residue metrics in addition to global metrics

  • Standard metric used in CASP (Critical Assessment of protein Structure Prediction) competitions

Running Benchmarks for Performance Table#

To reproduce the performance metrics from the table above, follow these steps in order:

Important

Complete Steps 1 and 2 before running Step 3. The benchmark workflow requires:

  1. Both Python scripts (generate_inputs.py and benchmark_openfold3.py) saved as files

  2. The bash script (run_benchmarks.sh) saved as a file

  3. All scripts must be in the same directory before execution

Running steps out of order will result in “File Not Found” errors.

Step 1: Prepare the Python scripts#

Save the two Python scripts provided earlier in this document, generate_inputs.py and benchmark_openfold3.py, in the same working directory.

Step 2: Create the benchmark workflow script#

Save the following as run_benchmarks.sh:

#!/bin/bash
# Complete benchmark workflow for all test cases

NIM_URL="http://localhost:8000"
PDB_IDS=("8eil" "7r6r" "1a3n" "8c4d" "7qsj" "8cpk" "8are" "8owf" "8aw3" "7tpu" "7ylz" "8gpp" "8clz" "8k7x" "8ibx" "8gi1" "8sm6" "8pso" "8jue" "8bsh" "5xgo")

# Create directories
mkdir -p references inputs results

# Step 1: Generate input JSON files from PDB
echo "Step 1: Generating input JSON files..."
python3 generate_inputs.py || {
    echo "Error: input generation failed"
    exit 1
}
echo "✓ Input files created in inputs/"

# Step 2: Download reference structures
echo "Step 2: Downloading reference structures..."
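for pdb_id in "${PDB_IDS[@]}"; do
    # -s also treats an empty file (left by a failed download) as missing
    if [ ! -s "references/${pdb_id}.cif" ]; then
        echo "  Downloading ${pdb_id}.cif..."
        wget -q "https://files.rcsb.org/download/${pdb_id}.cif" -O "references/${pdb_id}.cif" || {
            echo "  Warning: download failed for ${pdb_id}"
            rm -f "references/${pdb_id}.cif"  # remove the partial file so a rerun retries
        }
    fi
done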
echo "✓ Reference structures downloaded"

# Step 3: Verify NIM is running
echo "Step 3: Checking NIM availability..."
curl -sf "${NIM_URL}/v1/health/ready" > /dev/null || {
    echo "Error: NIM not accessible at ${NIM_URL}"
    exit 1
}
echo "✓ NIM is ready"

# Step 4: Run benchmarks
echo "Step 4: Running benchmarks..."
for pdb_id in "${PDB_IDS[@]}"; do
    echo "  Benchmarking ${pdb_id}..."
    python3 benchmark_openfold3.py \
        --nim-url "$NIM_URL" \
        --input "inputs/${pdb_id}_input.json" \
        --reference "references/${pdb_id}.cif" \
        --output "results/${pdb_id}"
done

echo ""
echo "Benchmarking complete! Results saved in results/"

Step 3: Run the complete workflow#

# 1. Install dependencies
pip3 install requests

# 2. Make the benchmark script executable
chmod +x run_benchmarks.sh

# 3. Run the complete workflow (generates inputs, downloads references, and runs benchmarks)
./run_benchmarks.sh

Note

The run_benchmarks.sh script handles all steps automatically: generating input files, downloading reference structures, and running benchmarks for all test cases. Make sure the OpenFold3 NIM is running before executing the script.

Expected Output Format#

The comparison results JSON file contains detailed metrics:

{
  "lddt": 0.9234,
  "chain_mapping": {
    "A": "A",
    "B": "B"
  },
  "aln": [
    ">reference:A\nMKQLYGHSTI...",
    ">model:A\nMKQLYGHSTI..."
  ],
  "model_clashes": [],
  "model_bad_bonds": [],
  "model_bad_angles": [],
  "reference_clashes": [],
  "reference_bad_bonds": [],
  "reference_bad_angles": [],
  "status": "SUCCESS",
  "ost_version": "2.11.1"
}

The primary metric of interest is lddt, which is a float value (0.0 to 1.0) representing the overall structural quality of the prediction. Higher values indicate better agreement with the reference structure.
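
As a small sketch of how the lddt field might be consumed, the helper below parses the text of a comparison result and maps the score onto the interpretation bands from the lDDT section. Both function names are hypothetical. The assignment of boundary values (exactly 0.90, 0.70, or 0.50) is an assumption, because the bands in this document share their endpoints.

```python
import json

def lddt_band(score):
    """Map an lDDT score (0.0 to 1.0) to an interpretation band.

    Assumption: a score exactly on a shared boundary goes to the band
    that lists it as its lower bound (0.70 -> good, 0.50 -> moderate).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"lDDT must be in [0.0, 1.0], got {score}")
    if score > 0.90:
        return "excellent"
    if score >= 0.70:
        return "good"
    if score >= 0.50:
        return "moderate"
    return "low"

def summarize_result(result_text):
    """Return (status, lddt, band) from the text of a comparison results JSON."""
    result = json.loads(result_text)
    status = result.get("status")
    score = result.get("lddt")
    # A failed comparison may not contain a numeric lddt value
    band = lddt_band(score) if isinstance(score, (int, float)) else None
    return status, score, band
```

Applied to the example output above, `summarize_result` returns `("SUCCESS", 0.9234, "excellent")`.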

Troubleshooting#

General Performance Issues#

  • Out of memory errors: Reduce MSA depth, decrease sequence length, or upgrade to GPUs with more memory

  • Slow performance: Ensure fast storage (NVMe SSD), sufficient CPU cores, and adequate system RAM

  • Poor quality predictions: Check input sequence quality, increase MSA depth if available, or adjust diffusion parameters

Backend-Specific Issues#

  • TensorRT errors with long sequences: Use PyTorch backend for sequences >2048 residues

    export NIM_OPTIMIZED_BACKEND=torch
    
  • TensorRT errors with short sequences: Use PyTorch backend for sequences <4 residues

    export NIM_OPTIMIZED_BACKEND=torch
    
  • Inconsistent results between backends: This is expected; TensorRT uses optimizations that may produce slightly different numerical results while maintaining accuracy
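
The two length limits above can be checked before a request is submitted. The sketch below is illustrative only: the helper name is hypothetical, and it assumes the limits apply to the total residue count across all chains, which this document does not specify. It only reports whether the PyTorch fallback is advisable; it does not change the backend of a running NIM.

```python
def needs_torch_backend(chain_lengths, min_len=4, max_len=2048):
    """Return True if the PyTorch backend is advisable for this input.

    chain_lengths: iterable of residue counts, one per chain.
    Assumption: the limits apply to the summed length of all chains.
    """
    total = sum(chain_lengths)
    return total < min_len or total > max_len
```

For example, `needs_torch_backend([1500, 700])` returns `True`, in which case the NIM would be started with NIM_OPTIMIZED_BACKEND=torch as shown above.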

Benchmarking Issues#

The following are “File not found” errors.

  • “Input file not found”: Generate input JSON files first using generate_inputs.py

    python3 generate_inputs.py --all  # Creates all input files
    python3 generate_inputs.py --pdb 8eil  # Or for specific PDB
    
  • “Reference file not found”: Download CIF files from RCSB PDB

    wget https://files.rcsb.org/download/8eil.cif -O references/8eil.cif
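
Both errors can be caught before any prediction is submitted. The following sketch lists the files that are missing for a set of PDB IDs, assuming the inputs/<id>_input.json and references/<id>.cif layout used by run_benchmarks.sh. The function name is hypothetical.

```python
from pathlib import Path

def missing_benchmark_files(pdb_ids, root="."):
    """Return the expected input and reference paths that are absent or empty."""
    root = Path(root)
    missing = []
    for pdb_id in pdb_ids:
        for path in (
            root / "inputs" / f"{pdb_id}_input.json",
            root / "references" / f"{pdb_id}.cif",
        ):
            # An empty file usually means an interrupted download or write
            if not path.is_file() or path.stat().st_size == 0:
                missing.append(path)
    return missing
```

An empty return value means every listed test case has both files in place.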
    

The following are structure comparison issues.

  • OpenStructure comparison failures: Ensure both predicted and reference structures have compatible chain IDs and residue numbering

  • Missing atoms in predictions: Use the --fault-tolerant flag to handle incomplete structures (already included in the script)

  • lDDT score of 0.0: Check that sequences match between prediction and reference; may indicate alignment failure

The following are NIM connection issues.

  • “Connection refused”: Ensure OpenFold3 NIM is running

    curl http://localhost:8000/v1/health/ready  # Should return {"object":"health.response","message":"ready","status":"ready"}
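
Because the NIM can take some time to become ready after the container starts, a script may prefer to poll the readiness endpoint rather than check it once. The sketch below uses only the Python standard library; the function names, the number of attempts, and the polling interval are assumptions chosen for illustration.

```python
import json
import time
import urllib.error
import urllib.request

def fetch_health(base_url, timeout_s=5.0):
    """GET <base_url>/v1/health/ready and return the decoded JSON body."""
    url = f"{base_url}/v1/health/ready"
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))

def wait_until_ready(base_url, attempts=30, interval_s=10.0,
                     fetch=fetch_health, sleep=time.sleep):
    """Poll the readiness endpoint; return True once status is "ready"."""
    for attempt in range(attempts):
        try:
            if fetch(base_url).get("status") == "ready":
                return True
        except (urllib.error.URLError, OSError, ValueError, AttributeError):
            pass  # not up yet: connection refused, HTTP error, or unexpected body
        if attempt < attempts - 1:
            sleep(interval_s)
    return False
```

A call such as `wait_until_ready("http://localhost:8000")` returns `False` if the service never reports ready within the allotted attempts, which is the point at which the container logs are worth inspecting.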
    

Note

For detailed performance tuning guidance specific to your deployment, refer to the documentation on configuration and optimization.