Performance in Boltz-2 NIM#

NIM Accuracy#

The Boltz-2 NIM is based on the state-of-the-art Boltz-2 architecture for biomolecular structure prediction. The NIM’s accuracy should match that of the reference implementation when using equivalent parameters and inputs.

Note

Running on hardware that is not listed as supported in the prerequisites section may produce results that deviate from the expected accuracy.

The accuracy of the NIM is measured by structural quality metrics such as lDDT (local Distance Difference Test). These scores help assess the reliability of the predicted structures.
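To make the metric concrete, here is a simplified, CA-only sketch of the lDDT idea: the fraction of reference pairwise distances (below a 15 Å cutoff) that are preserved in the prediction within 0.5/1/2/4 Å tolerances, averaged over the tolerances. Real lDDT operates on all atoms and excludes pairs within the same residue; this is an illustration, not the scoring code the NIM uses.

```python
import math

def lddt_ca(ref: list, pred: list, cutoff: float = 15.0) -> float:
    """Simplified CA-only lDDT over two equal-length coordinate lists."""
    thresholds = (0.5, 1.0, 2.0, 4.0)
    kept = total = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= cutoff:
                continue  # only score pairs close in the reference structure
            delta = abs(d_ref - math.dist(pred[i], pred[j]))
            total += len(thresholds)
            kept += sum(delta < t for t in thresholds)
    return kept / total if total else 0.0

# A prediction identical to the reference scores 1.0.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
print(lddt_ca(ref, ref))  # 1.0
```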

Factors Affecting NIM Performance#

The performance of the Boltz-2 NIM is determined by several key factors:

Hardware Factors#

  • Number and type of GPUs: More GPUs generally improve throughput for concurrent requests

  • GPU memory: Larger proteins and complexes require more GPU memory

  • Storage speed: Fast NVMe SSD storage improves model loading and caching performance

Input Complexity#

  • Sequence length: Runtime scales approximately quadratically with total sequence length

  • Number of chains: Multi-chain complexes require more computation than single chains

  • Ligands and constraints: Additional molecular components increase computational cost
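The quadratic scaling rule of thumb can be used to extrapolate a rough runtime estimate from a single measured point. The reference length and time below are illustrative placeholders, not benchmark values; measure your own reference on your hardware.

```python
def estimate_runtime(seq_len: int, ref_len: int = 500, ref_time_s: float = 6.6) -> float:
    """Rough estimate assuming runtime grows with the square of total sequence length.

    ref_len and ref_time_s are placeholders; calibrate them with one real run.
    """
    return ref_time_s * (seq_len / ref_len) ** 2

# Doubling the sequence length roughly quadruples the estimated runtime.
print(round(estimate_runtime(1000) / estimate_runtime(500), 1))  # 4.0
```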

Model Parameters#

  • Sampling steps: Higher values improve quality but significantly increase runtime

  • Recycling steps: More iterations improve accuracy with modest runtime increase

  • Diffusion samples: Multiple samples provide diversity but multiply computational cost
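These knobs map directly onto request fields. A minimal sketch of the quality/runtime trade-off, using the parameter names from the request examples on this page (the specific values are illustrative):

```python
# Faster settings for iteration vs. higher-quality settings for final predictions.
fast_settings = {"recycling_steps": 2, "sampling_steps": 25, "diffusion_samples": 1}
quality_settings = {"recycling_steps": 5, "sampling_steps": 100, "diffusion_samples": 3}

# Diffusion samples multiply cost roughly linearly: 3 samples is about 3x the diffusion work.
relative_diffusion_cost = quality_settings["diffusion_samples"] / fast_settings["diffusion_samples"]
print(relative_diffusion_cost)  # 3.0
```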

Performance Characteristics#

Typical Runtimes#

For reference, runtime varies with sequence length, backend selection, and template usage.
Detailed benchmark metrics for the current release are reported in the tables below.

Performance Metrics (Boltz2 v1.6.0)#

The tables below report benchmark results from Boltz2 v1.6.0 performance runs.

Configuration#

| Parameter | Setting |
|---|---|
| workers | 1 |
| output_format | mmcif |
| benchmark_mode | structure prediction |
| compared_backends | OSS, TensorRT |
| structural_templates | compared with and without templates |

Table 1: Performance Across the Supported NVIDIA Hardware Units#

The tables below report predict_time (seconds) with the TensorRT backend and no structural templates, one table per supported GPU configuration.

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.90 |
| 8c4d | 331 | 4.94 |
| 7qsj | 375 | 6.26 |
| 8cpk | 384 | 7.87 |
| 8are | 530 | 10.60 |
| 8owf | 575 | 12.13 |
| 7tpu | 616 | 11.25 |
| 7ylz | 623 | 14.70 |
| 8gpp | 628 | 13.97 |
| 8clz | 684 | 15.98 |
| 8k7x | 858 | 24.88 |
| 8ibx | 1286 | 37.80 |
| 8gi1 | 1464 | 56.35 |
| 8sm6 | 1496 | 66.96 |
| 8pso | 1499 | 53.54 |
| msc1 | 1588 | 71.64 |
| bcor | 1755 | 96.21 |
| evpl | 2033 | 138.09 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.56 |
| 8c4d | 331 | 3.01 |
| 7qsj | 375 | 3.34 |
| 8cpk | 384 | 5.28 |
| 8are | 530 | 6.01 |
| 8owf | 575 | 6.98 |
| 7tpu | 616 | 6.29 |
| 7ylz | 623 | 8.22 |
| 8gpp | 628 | 7.77 |
| 8clz | 684 | 8.87 |
| 8k7x | 858 | 14.11 |
| 8ibx | 1286 | 29.63 |
| 8gi1 | 1464 | 35.55 |
| 8sm6 | 1496 | 40.23 |
| 8pso | 1499 | 35.48 |
| msc1 | 1588 | 44.42 |
| bcor | 1755 | 57.42 |
| evpl | 2033 | 83.43 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.72 |
| 8c4d | 331 | 2.88 |
| 7qsj | 375 | 3.81 |
| 8cpk | 384 | 5.47 |
| 8are | 530 | 6.63 |
| 8owf | 575 | 7.17 |
| 7tpu | 616 | 6.72 |
| 7ylz | 623 | 8.74 |
| 8gpp | 628 | 8.23 |
| 8clz | 684 | 9.39 |
| 8k7x | 858 | 14.56 |
| 8ibx | 1286 | 24.67 |
| 8gi1 | 1464 | 29.49 |
| 8sm6 | 1496 | 35.64 |
| 8pso | 1499 | 28.35 |
| msc1 | 1588 | 37.96 |
| bcor | 1755 | 51.69 |
| evpl | 2033 | 79.55 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 6.15 |
| 8c4d | 331 | 12.06 |
| 7qsj | 375 | 10.13 |
| 8cpk | 384 | 10.65 |
| 8are | 530 | 10.68 |
| 8owf | 575 | 11.76 |
| 7tpu | 616 | 10.71 |
| 7ylz | 623 | 14.61 |
| 8gpp | 628 | 12.71 |
| 8clz | 684 | 13.72 |
| 8k7x | 858 | 20.10 |
| 8ibx | 1286 | 37.65 |
| 8gi1 | 1464 | 40.06 |
| 8sm6 | 1496 | 45.58 |
| 8pso | 1499 | 39.31 |
| msc1 | 1588 | 49.02 |
| bcor | 1755 | 62.82 |
| evpl | 2033 | 109.54 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.44 |
| 8c4d | 331 | 2.48 |
| 7qsj | 375 | 3.22 |
| 8cpk | 384 | 4.56 |
| 8are | 530 | 5.66 |
| 8owf | 575 | 6.17 |
| 7tpu | 616 | 5.64 |
| 7ylz | 623 | 7.41 |
| 8gpp | 628 | 6.93 |
| 8clz | 684 | 7.92 |
| 8k7x | 858 | 12.47 |
| 8ibx | 1286 | 21.62 |
| 8gi1 | 1464 | 25.77 |
| 8sm6 | 1496 | 31.06 |
| 8pso | 1499 | 25.13 |
| msc1 | 1588 | 32.81 |
| bcor | 1755 | 45.32 |
| evpl | 2033 | 71.01 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.08 |
| 8c4d | 331 | 4.41 |
| 7qsj | 375 | 5.92 |
| 8cpk | 384 | 8.10 |
| 8are | 530 | 12.82 |
| 8owf | 575 | 14.40 |
| 7tpu | 616 | 14.17 |
| 7ylz | 623 | 17.14 |
| 8gpp | 628 | 17.08 |
| 8clz | 684 | 19.70 |
| 8k7x | 858 | 33.47 |
| 8ibx | 1286 | 47.45 |
| 8gi1 | 1464 | 71.14 |
| 8sm6 | 1496 | 86.33 |
| 8pso | 1499 | 67.11 |
| msc1 | 1588 | N/A |
| bcor | 1755 | N/A |
| evpl | 2033 | N/A |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 7.42 |
| 8c4d | 331 | 19.47 |
| 7qsj | 375 | 25.92 |
| 8cpk | 384 | 27.51 |
| 8are | 530 | 61.13 |
| 8owf | 575 | 72.77 |
| 7tpu | 616 | 77.48 |
| 7ylz | 623 | 87.51 |
| 8gpp | 628 | 86.83 |
| 8clz | 684 | 106.46 |
| 8k7x | 858 | 169.95 |
| 8ibx | 1286 | 353.91 |
| 8gi1 | 1464 | 892.86 |
| 8sm6 | 1496 | 999.09 |
| 8pso | 1499 | 730.60 |
| msc1 | 1588 | N/A |
| bcor | 1755 | N/A |
| evpl | 2033 | N/A |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.14 |
| 8c4d | 331 | 4.35 |
| 7qsj | 375 | 5.61 |
| 8cpk | 384 | 7.85 |
| 8are | 530 | 12.11 |
| 8owf | 575 | 13.48 |
| 7tpu | 616 | 12.58 |
| 7ylz | 623 | 16.27 |
| 8gpp | 628 | 15.58 |
| 8clz | 684 | 19.34 |
| 8k7x | 858 | 33.21 |
| 8ibx | 1286 | 47.70 |
| 8gi1 | 1464 | 67.22 |
| 8sm6 | 1496 | 81.59 |
| 8pso | 1499 | 63.74 |
| msc1 | 1588 | N/A |
| bcor | 1755 | N/A |
| evpl | 2033 | N/A |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.66 |
| 8c4d | 331 | 3.05 |
| 7qsj | 375 | 4.26 |
| 8cpk | 384 | 5.50 |
| 8are | 530 | 8.61 |
| 8owf | 575 | 10.07 |
| 7tpu | 616 | 9.64 |
| 7ylz | 623 | 12.04 |
| 8gpp | 628 | 11.90 |
| 8clz | 684 | 13.89 |
| 8k7x | 858 | 21.85 |
| 8ibx | 1286 | 39.79 |
| 8gi1 | 1464 | 54.29 |
| 8sm6 | 1496 | 63.36 |
| 8pso | 1499 | 53.84 |
| msc1 | 1588 | 66.75 |
| bcor | 1755 | 85.45 |
| evpl | 2033 | 117.66 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.22 |
| 8c4d | 331 | 3.09 |
| 7qsj | 375 | 4.47 |
| 8cpk | 384 | 5.43 |
| 8are | 530 | 6.85 |
| 8owf | 575 | 7.97 |
| 7tpu | 616 | 7.87 |
| 7ylz | 623 | 10.32 |
| 8gpp | 628 | 9.83 |
| 8clz | 684 | 12.15 |
| 8k7x | 858 | 18.69 |
| 8ibx | 1286 | 30.38 |
| 8gi1 | 1464 | 44.89 |
| 8sm6 | 1496 | 54.34 |
| 8pso | 1499 | 44.27 |
| msc1 | 1588 | 59.61 |
| bcor | 1755 | 85.19 |
| evpl | 2033 | 133.14 |

Table 2: Performance Across Optimization Backends#

The table below compares H100 performance between OSS and TensorRT backends without templates.

| Test ID | Sequence Length | OSS (s) | TensorRT (s) | Speed up |
|---|---|---|---|---|
| 8eil | 186 | 11.07 | 1.72 | 6.44x |
| 8c4d | 331 | 6.80 | 2.88 | 2.36x |
| 7qsj | 375 | 7.76 | 3.81 | 2.04x |
| 8cpk | 384 | 7.93 | 5.47 | 1.45x |
| 8are | 530 | 12.24 | 6.63 | 1.85x |
| 8owf | 575 | 13.45 | 7.17 | 1.88x |
| 7tpu | 616 | 19.78 | 6.72 | 2.94x |
| 7ylz | 623 | 16.59 | 8.74 | 1.90x |
| 8gpp | 628 | 15.40 | 8.23 | 1.87x |
| 8clz | 684 | 17.43 | 9.39 | 1.86x |
| 8k7x | 858 | 26.84 | 14.56 | 1.84x |
| 8ibx | 1286 | 50.32 | 24.67 | 2.04x |
| 8gi1 | 1464 | 66.87 | 29.49 | 2.27x |
| 8sm6 | 1496 | 77.02 | 35.64 | 2.16x |
| 8pso | 1499 | 66.41 | 28.35 | 2.34x |
| msc1 | 1588 | 64.28 | 37.96 | 1.69x |
| bcor | 1755 | 85.22 | 51.69 | 1.65x |
| evpl | 2033 | 123.07 | 79.55 | 1.55x |

Table 3: Performance Impact From Structural Templates#

The table below reports H100 TensorRT performance with and without templates.

| Test ID | Sequence Length | Without Templates (s) | With Templates (s) |
|---|---|---|---|
| 8eil | 186 | 1.72 | 7.27 |
| 8c4d | 331 | 2.88 | 3.42 |
| 7qsj | 375 | 3.81 | 5.05 |
| 8cpk | 384 | 5.47 | 5.93 |
| 8are | 530 | 6.63 | 7.82 |
| 8owf | 575 | 7.17 | 8.13 |
| 7tpu | 616 | 6.72 | 8.51 |
| 7ylz | 623 | 8.74 | 10.26 |
| 8gpp | 628 | 8.23 | 20.81 |
| 8clz | 684 | 9.39 | 11.20 |
| 8k7x | 858 | 14.56 | 14.65 |
| 8ibx | 1286 | 24.67 | 27.11 |
| 8gi1 | 1464 | 29.49 | 31.87 |
| 8sm6 | 1496 | 35.64 | 43.65 |
| 8pso | 1499 | 28.35 | 32.91 |
| msc1 | 1588 | 37.96 | 42.01 |
| bcor | 1755 | 51.69 | 56.17 |
| evpl | 2033 | 79.55 | 86.66 |
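The relative overhead of templates can be computed directly from the Table 3 values. The timings below are copied from the table; only a few representative entries are shown.

```python
# (without_templates_s, with_templates_s) for selected Table 3 entries.
timings = {
    "8eil": (1.72, 7.27),
    "8k7x": (14.56, 14.65),
    "evpl": (79.55, 86.66),
}

# Percentage slowdown introduced by templates for each test case.
for test_id, (without_t, with_t) in timings.items():
    overhead_pct = (with_t / without_t - 1) * 100
    print(f"{test_id}: +{overhead_pct:.1f}%")
```

Template overhead is largest in relative terms for short sequences, where fixed costs dominate, and shrinks to a few percent for long sequences.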

Boltz2 v1.6.0 Performance Results on H100#

Typical Runtimes#

For reference, approximate runtimes on NVIDIA H100 80GB HBM3:

Structure Prediction (Boltz2 v1.6.0 on H100):

  • ~200 residues: 1.72 seconds (TensorRT) / 11.07 seconds (OSS)

  • ~500-700 residues: 6.63-9.39 seconds (TensorRT) / 12.24-19.78 seconds (OSS)

  • ~1200-1500 residues: 24.67-35.64 seconds (TensorRT) / 50.32-77.02 seconds (OSS)

  • ~1500-1800 residues: 37.96-51.69 seconds (TensorRT) / 64.28-85.22 seconds (OSS)

  • ~2000 residues: 79.55 seconds (TensorRT) / 123.07 seconds (OSS)

Binding Affinity Prediction (Boltz2 v1.6.0 on H100):

  • ~200 residues: 6.21 seconds (TensorRT) / 18.52 seconds (OSS)

  • ~500-700 residues: 10.74-14.16 seconds (TensorRT) / 26.96-34.03 seconds (OSS)

  • ~1200-1500 residues: 29.69-43.99 seconds (TensorRT) / 53.13-87.57 seconds (OSS)

  • ~1500-1800 residues: 43.77-54.10 seconds (TensorRT) / 80.94-93.29 seconds (OSS)

  • ~2000 residues: 81.19 seconds (TensorRT) / 130.67 seconds (OSS)

Structure Prediction Performance#

Structure prediction runtimes for Boltz2 v1.6.0 on NVIDIA H100 GPUs across sequence lengths are identical to the OSS and TensorRT results reported in Table 2: Performance Across Optimization Backends above.

Binding Affinity Prediction Performance#

The following table shows runtime performance for binding affinity prediction on NVIDIA H100 GPUs:

| Test ID | Sequence Length | OSS Runtime (s) | TensorRT Runtime (s) | Speed up |
|---|---|---|---|---|
| 8eil | 186 | 18.52 | 6.21 | 2.98x |
| 8c4d | 331 | 22.65 | 7.91 | 2.86x |
| 7qsj | 375 | 23.44 | 7.89 | 2.97x |
| 8cpk | 384 | 24.43 | 8.89 | 2.75x |
| 8are | 530 | 26.96 | 11.15 | 2.42x |
| 8owf | 575 | 28.65 | 11.64 | 2.46x |
| 7tpu | 616 | 29.35 | 10.74 | 2.73x |
| 7ylz | 623 | 31.59 | 12.60 | 2.51x |
| 8gpp | 628 | 34.03 | 13.24 | 2.57x |
| 8clz | 684 | 32.75 | 14.16 | 2.31x |
| 8k7x | 858 | 38.65 | 19.48 | 1.98x |
| 8ibx | 1286 | 53.13 | 29.69 | 1.79x |
| 8gi1 | 1464 | 78.53 | 35.91 | 2.19x |
| 8sm6 | 1496 | 87.57 | 43.99 | 1.99x |
| 8pso | 1499 | 66.90 | 30.67 | 2.18x |
| msc1 | 1588 | 80.94 | 43.77 | 1.85x |
| bcor | 1755 | 93.29 | 54.10 | 1.72x |
| evpl | 2033 | 130.67 | 81.19 | 1.61x |

Performance Analysis#

Key Observations:

  • TensorRT Optimization: TensorRT consistently outperforms OSS for both structure and affinity prediction across all 18 H100 benchmark cases.

  • Structure Prediction: Speed up ranges from 1.45x to 6.44x; H100 runtime ranges from 1.72s to 79.55s (TensorRT) vs 6.80s to 123.07s (OSS).

  • Binding Affinity Prediction: Speed up ranges from 1.61x to 2.98x; H100 runtime ranges from 6.21s to 81.19s (TensorRT) vs 18.52s to 130.67s (OSS).

  • Scaling Behavior: Runtime increases with sequence length for both OSS and TensorRT backends.

Recommended Configuration:

  • For development/testing: Use OSS for easier debugging and TensorRT for latency validation.

  • For production: Use TensorRT to maximize throughput and minimize inference latency.

  • For large proteins (>1500 residues): Prefer TensorRT and provision sufficient GPU memory headroom.

Performance Testing#

You can test basic performance and functionality with protein structure and binding affinity prediction:

import requests
import json
import time

def test_boltz2_structure_performance():
    """Test basic protein structure prediction performance."""
    url = "http://localhost:8000/biology/mit/boltz2/predict"
    
    # Test protein: Green Fluorescent Protein (GFP) - ~240 residues
    test_sequence = (
        "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
        "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQLNKFCDKLPVMHYKQFFDSGNYNTLS"
        "AKAGFPFKVPHTYNNSSFVVKQKPGMVFKFIHGKDPGLNGQTVFLMVGGISQNLSGSSNLGVGYTFVQ"
        "KTSVLLESEIKKRLRGFHTRGAVTQGLHQFVNLPTLVTQVLDGDMSQLLQVT"
    )
    
    data = {
        "polymers": [
            {
                "id": "A",
                "molecule_type": "protein",
                "sequence": test_sequence
            }
        ],
        "recycling_steps": 3,
        "sampling_steps": 50,
        "diffusion_samples": 1,
        "step_scale": 1.638,
        "output_format": "mmcif"
    }
    
    print("Starting Boltz-2 structure prediction performance test...")
    start_time = time.time()
    
    response = requests.post(url, json=data)
    
    end_time = time.time()
    runtime = end_time - start_time
    
    if response.status_code == 200:
        result = response.json()
        confidence = result.get('confidence_scores', [0])[0]
        print(f"✓ Structure prediction successful")
        print(f"✓ Runtime: {runtime:.1f} seconds")
        print(f"✓ Confidence score: {confidence:.3f}")
        print(f"✓ Structure format: {result['structures'][0]['format']}")
        return True
    else:
        print(f"✗ Structure prediction failed: {response.status_code}")
        print(f"✗ Error: {response.text}")
        return False

def test_boltz2_affinity_performance():
    """Test binding affinity prediction performance."""
    url = "http://localhost:8000/biology/mit/boltz2/predict"
    
    # Smaller test protein for affinity testing - ~140 residues
    hemoglobin_alpha = (
        "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALT"
        "NAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTV"
        "LTSKYR"
    )
    
    data = {
        "polymers": [
            {
                "id": "A",
                "molecule_type": "protein",
                "sequence": hemoglobin_alpha
            }
        ],
        "ligands": [
            {
                "id": "HEME",
                "smiles": "[Fe+2].C1=CC2=NC1=CC3=NC(=CC4=NC(=CC5=NC(=C2)C=C5)C=C4)C=C3",
                "predict_affinity": True
            }
        ],
        "recycling_steps": 3,
        "sampling_steps": 50,
        "sampling_steps_affinity": 100,  # Reduced from default (200) for faster performance testing
        "diffusion_samples_affinity": 3,  # Reduced from default (5) for faster performance testing
        "output_format": "mmcif"
    }
    
    print("Starting Boltz-2 binding affinity performance test...")
    start_time = time.time()
    
    response = requests.post(url, json=data)
    
    end_time = time.time()
    runtime = end_time - start_time
    
    if response.status_code == 200:
        result = response.json()
        confidence = result.get('confidence_scores', [0])[0]
        print(f"✓ Affinity prediction successful")
        print(f"✓ Runtime: {runtime:.1f} seconds")
        print(f"✓ Confidence score: {confidence:.3f}")
        
        if result.get("affinities"):
            for ligand_id, affinity_data in result["affinities"].items():
                if affinity_data.get("affinity_pic50"):
                    print(f"✓ Predicted pIC50 for {ligand_id}: {affinity_data['affinity_pic50'][0]:.2f}")
        return True
    else:
        print(f"✗ Affinity prediction failed: {response.status_code}")
        print(f"✗ Error: {response.text}")
        return False

if __name__ == "__main__":
    # Test both structure and affinity prediction
    structure_success = test_boltz2_structure_performance()
    print("\n" + "="*50 + "\n")
    affinity_success = test_boltz2_affinity_performance()
    
    if structure_success and affinity_success:
        print("\n✓ All performance tests passed!")
    else:
        print("\n✗ Some performance tests failed.")

Performance Optimization Tips#

  1. For development/testing: Use faster settings

    "recycling_steps": 2,
    "sampling_steps": 25
    
  2. For production quality: Use higher quality settings

    "recycling_steps": 5,
    "sampling_steps": 100
    
  3. For batch processing: Submit multiple concurrent requests with default settings

  4. For very large proteins (>1000 residues): Consider domain-based approaches or consult the literature for handling strategies
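Tip 3 can be sketched with a small thread pool that keeps several independent requests in flight. The endpoint URL matches the examples above; the helper names, sequences, and pool size here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/biology/mit/boltz2/predict"

def predict(sequence: str) -> int:
    """Submit one structure prediction request and return the HTTP status code."""
    payload = {
        "polymers": [{"id": "A", "molecule_type": "protein", "sequence": sequence}],
        "output_format": "mmcif",
    }
    response = requests.post(URL, json=payload, timeout=600)
    return response.status_code

def predict_batch(sequences, max_workers: int = 4):
    # One worker per in-flight request; size the pool to your GPU count so
    # requests queue at the client rather than overloading the service.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict, sequences))
```

Results come back in input order, so `predict_batch` pairs cleanly with the submitted sequence list.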

Troubleshooting Performance Issues#

Common Issues and Solutions#

General Performance Issues#

  • Out of memory errors: Reduce sequence length, decrease sampling steps, or use fewer concurrent requests

  • Slow performance: Ensure fast storage (NVMe SSD), sufficient CPU cores (12+ per GPU), and adequate system RAM (48+ GB per GPU)

  • Poor quality predictions: Increase sampling steps, recycling steps, or check input sequence quality

Binding Affinity Specific Issues#

  • Affinity prediction timeouts: Reduce sampling_steps_affinity (default: 200) and diffusion_samples_affinity (default: 5) for faster results (e.g., 50 and 1)

  • Unrealistic affinity values: Enable affinity_mw_correction for metal-containing ligands and verify SMILES format

  • Memory errors with affinity: Binding affinity requires 2-3x more memory than structure prediction alone

  • Inconsistent affinity results: Keep the default (diffusion_samples_affinity=5), or increase it to 7-10 for more reliable estimates

  • Cannot predict affinity for multiple ligands: Only one ligand per request can have predict_affinity=True
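The affinity-specific fixes above translate into request fields like the following. The sequence and ligand SMILES are placeholders, the parameter names follow the troubleshooting entries above, and the top-level placement of affinity_mw_correction is an assumption of this sketch.

```python
# Faster affinity request: reduced sampling, single diffusion sample, MW correction on.
affinity_request = {
    "polymers": [{"id": "A", "molecule_type": "protein", "sequence": "MVLSPADKTNVKAAW"}],
    "ligands": [
        {
            "id": "LIG",
            "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin, as a placeholder ligand
            "predict_affinity": True,  # only one ligand per request may set this
        }
    ],
    "sampling_steps_affinity": 50,    # down from the default of 200
    "diffusion_samples_affinity": 1,  # down from the default of 5
    "affinity_mw_correction": True,   # enable when the ligand contains metals
}

# Exactly one ligand requests affinity prediction.
assert sum(l.get("predict_affinity", False) for l in affinity_request["ligands"]) == 1
```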

Note

For detailed performance tuning guidance specific to your deployment, refer to the Optimization section.