Performance in Boltz-2 NIM#
NIM Accuracy#
The Boltz-2 NIM is based on the state-of-the-art Boltz-2 architecture for biomolecular structure prediction. The NIM’s accuracy should match that of the reference implementation when using equivalent parameters and inputs.
Note
Running on hardware that is not listed as supported in the prerequisites section may produce results that deviate from the expected accuracy.
The accuracy of the NIM is measured by structural quality metrics such as lDDT (local Distance Difference Test). These scores help assess the reliability of the predicted structures.
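To make the metric concrete, the following is a minimal sketch of a simplified, global lDDT computed over one atom per residue (for example C-alpha). It assumes the commonly used 15 Å inclusion radius and the 0.5, 1, 2, and 4 Å tolerance thresholds; the function name `ca_lddt` is hypothetical, and this is not the scoring code used by the NIM or by the reference implementation.

```python
import math


def ca_lddt(reference, predicted, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global lDDT over one atom per residue.

    reference, predicted: equal-length lists of (x, y, z) tuples in angstroms.
    Returns the fraction of local reference distances that are preserved
    in the prediction, averaged over the tolerance thresholds.
    """
    if len(reference) != len(predicted):
        raise ValueError("coordinate lists must have the same length")
    preserved = 0
    total = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= cutoff:
                continue  # only pairs that are close in the reference count
            d_pred = math.dist(predicted[i], predicted[j])
            diff = abs(d_ref - d_pred)
            preserved += sum(diff < t for t in thresholds)
            total += len(thresholds)
    return preserved / total if total else 0.0
```

Because the score compares interatomic distances rather than coordinates, it does not require superimposing the two structures first.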
Factors Affecting NIM Performance#
The performance of the Boltz-2 NIM is determined by several key factors:
Hardware Factors#
Number and type of GPUs: More GPUs generally improve throughput for concurrent requests
GPU memory: Larger proteins and complexes require more GPU memory
Storage speed: Fast NVMe SSD storage improves model loading and caching performance
Input Complexity#
Sequence length: Runtime scales approximately quadratically with total sequence length
Number of chains: Multi-chain complexes require more computation than single chains
Ligands and constraints: Additional molecular components increase computational cost
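One way to check the scaling claim against a specific deployment is to fit a power law to measured runtimes. The sketch below fits log(t) = log(a) + b·log(n) by least squares, using the H100 TensorRT (no templates) values reported in this document's benchmark tables. The function names are hypothetical, and the fitted exponent is an empirical estimate for this dataset only.

```python
import math

# (sequence length, predict time in seconds): H100, TensorRT, no templates,
# copied from the benchmark tables in this document.
H100_TENSORRT = [
    (186, 1.72), (331, 2.88), (375, 3.81), (384, 5.47), (530, 6.63),
    (575, 7.17), (616, 6.72), (623, 8.74), (628, 8.23), (684, 9.39),
    (858, 14.56), (1286, 24.67), (1464, 29.49), (1496, 35.64),
    (1499, 28.35), (1588, 37.96), (1755, 51.69), (2033, 79.55),
]


def fit_power_law(points):
    """Least-squares fit of t = a * n**b in log-log space; returns (a, b)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_runtime(n, a, b):
    """Estimate runtime in seconds for a sequence of n residues."""
    return a * n ** b


if __name__ == "__main__":
    a, b = fit_power_law(H100_TENSORRT)
    print(f"fitted exponent: {b:.2f}")
    print(f"estimated runtime at 1000 residues: {predict_runtime(1000, a, b):.1f} s")
```

An exponent near 2 would indicate quadratic scaling over the measured range. Extrapolating beyond the longest benchmarked sequence is unreliable, since memory pressure can change the behavior.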
Model Parameters#
Sampling steps: Higher values improve quality but significantly increase runtime
Recycling steps: More iterations improve accuracy with modest runtime increase
Diffusion samples: Multiple samples provide diversity but multiply computational cost
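As an illustration of how these parameters appear in a request, the sketch below builds a payload from named presets. The preset names and the `build_request` helper are hypothetical; the field names and values are the ones quoted elsewhere in this document (the test script and the optimization tips), and the "baseline" values are simply those used in the test script, not confirmed defaults.

```python
# Hypothetical preset names; the values are quoted from this document.
PRESETS = {
    "fast": {"recycling_steps": 2, "sampling_steps": 25},
    "baseline": {"recycling_steps": 3, "sampling_steps": 50},
    "quality": {"recycling_steps": 5, "sampling_steps": 100},
}


def build_request(sequence, preset="baseline", diffusion_samples=1, chain_id="A"):
    """Build a single-chain structure prediction payload for the given preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    payload = {
        "polymers": [
            {"id": chain_id, "molecule_type": "protein", "sequence": sequence}
        ],
        # Each additional diffusion sample adds roughly one more sampling pass.
        "diffusion_samples": diffusion_samples,
        "output_format": "mmcif",
    }
    payload.update(PRESETS[preset])
    return payload
```

Keeping the presets in one place makes it easier to compare runtime and quality across settings without editing each request by hand.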
Performance Characteristics#
Typical Runtimes#
For reference, runtime varies with sequence length, backend selection, and template usage.
Detailed benchmark metrics for the current release are reported in the tables below.
Performance Metrics (Boltz2 v1.6.0)#
The tables below report benchmark results from Boltz2 v1.6.0 performance runs.
Configuration#
| Parameter | Setting |
|---|---|
| workers | 1 |
| output_format | mmcif |
| benchmark_mode | structure prediction |
| compared_backends | OSS, TensorRT |
| structural_templates | compared with and without templates |
Table 1: Performance Across the Supported NVIDIA Hardware Units#
The table below reports predict_time (seconds) with TensorRT and no templates.

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.90 |
| 8c4d | 331 | 4.94 |
| 7qsj | 375 | 6.26 |
| 8cpk | 384 | 7.87 |
| 8are | 530 | 10.60 |
| 8owf | 575 | 12.13 |
| 7tpu | 616 | 11.25 |
| 7ylz | 623 | 14.70 |
| 8gpp | 628 | 13.97 |
| 8clz | 684 | 15.98 |
| 8k7x | 858 | 24.88 |
| 8ibx | 1286 | 37.80 |
| 8gi1 | 1464 | 56.35 |
| 8sm6 | 1496 | 66.96 |
| 8pso | 1499 | 53.54 |
| msc1 | 1588 | 71.64 |
| bcor | 1755 | 96.21 |
| evpl | 2033 | 138.09 |


| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.56 |
| 8c4d | 331 | 3.01 |
| 7qsj | 375 | 3.34 |
| 8cpk | 384 | 5.28 |
| 8are | 530 | 6.01 |
| 8owf | 575 | 6.98 |
| 7tpu | 616 | 6.29 |
| 7ylz | 623 | 8.22 |
| 8gpp | 628 | 7.77 |
| 8clz | 684 | 8.87 |
| 8k7x | 858 | 14.11 |
| 8ibx | 1286 | 29.63 |
| 8gi1 | 1464 | 35.55 |
| 8sm6 | 1496 | 40.23 |
| 8pso | 1499 | 35.48 |
| msc1 | 1588 | 44.42 |
| bcor | 1755 | 57.42 |
| evpl | 2033 | 83.43 |


| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.72 |
| 8c4d | 331 | 2.88 |
| 7qsj | 375 | 3.81 |
| 8cpk | 384 | 5.47 |
| 8are | 530 | 6.63 |
| 8owf | 575 | 7.17 |
| 7tpu | 616 | 6.72 |
| 7ylz | 623 | 8.74 |
| 8gpp | 628 | 8.23 |
| 8clz | 684 | 9.39 |
| 8k7x | 858 | 14.56 |
| 8ibx | 1286 | 24.67 |
| 8gi1 | 1464 | 29.49 |
| 8sm6 | 1496 | 35.64 |
| 8pso | 1499 | 28.35 |
| msc1 | 1588 | 37.96 |
| bcor | 1755 | 51.69 |
| evpl | 2033 | 79.55 |


| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 6.15 |
| 8c4d | 331 | 12.06 |
| 7qsj | 375 | 10.13 |
| 8cpk | 384 | 10.65 |
| 8are | 530 | 10.68 |
| 8owf | 575 | 11.76 |
| 7tpu | 616 | 10.71 |
| 7ylz | 623 | 14.61 |
| 8gpp | 628 | 12.71 |
| 8clz | 684 | 13.72 |
| 8k7x | 858 | 20.10 |
| 8ibx | 1286 | 37.65 |
| 8gi1 | 1464 | 40.06 |
| 8sm6 | 1496 | 45.58 |
| 8pso | 1499 | 39.31 |
| msc1 | 1588 | 49.02 |
| bcor | 1755 | 62.82 |
| evpl | 2033 | 109.54 |


| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.44 |
| 8c4d | 331 | 2.48 |
| 7qsj | 375 | 3.22 |
| 8cpk | 384 | 4.56 |
| 8are | 530 | 5.66 |
| 8owf | 575 | 6.17 |
| 7tpu | 616 | 5.64 |
| 7ylz | 623 | 7.41 |
| 8gpp | 628 | 6.93 |
| 8clz | 684 | 7.92 |
| 8k7x | 858 | 12.47 |
| 8ibx | 1286 | 21.62 |
| 8gi1 | 1464 | 25.77 |
| 8sm6 | 1496 | 31.06 |
| 8pso | 1499 | 25.13 |
| msc1 | 1588 | 32.81 |
| bcor | 1755 | 45.32 |
| evpl | 2033 | 71.01 |


| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.08 |
| 8c4d | 331 | 4.41 |
| 7qsj | 375 | 5.92 |
| 8cpk | 384 | 8.10 |
| 8are | 530 | 12.82 |
| 8owf | 575 | 14.40 |
| 7tpu | 616 | 14.17 |
| 7ylz | 623 | 17.14 |
| 8gpp | 628 | 17.08 |
| 8clz | 684 | 19.70 |
| 8k7x | 858 | 33.47 |
| 8ibx | 1286 | 47.45 |
| 8gi1 | 1464 | 71.14 |
| 8sm6 | 1496 | 86.33 |
| 8pso | 1499 | 67.11 |
| msc1 | 1588 | — |
| bcor | 1755 | — |
| evpl | 2033 | — |


| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 7.42 |
| 8c4d | 331 | 19.47 |
| 7qsj | 375 | 25.92 |
| 8cpk | 384 | 27.51 |
| 8are | 530 | 61.13 |
| 8owf | 575 | 72.77 |
| 7tpu | 616 | 77.48 |
| 7ylz | 623 | 87.51 |
| 8gpp | 628 | 86.83 |
| 8clz | 684 | 106.46 |
| 8k7x | 858 | 169.95 |
| 8ibx | 1286 | 353.91 |
| 8gi1 | 1464 | 892.86 |
| 8sm6 | 1496 | 999.09 |
| 8pso | 1499 | 730.60 |
| msc1 | 1588 | — |
| bcor | 1755 | — |
| evpl | 2033 | — |


| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.14 |
| 8c4d | 331 | 4.35 |
| 7qsj | 375 | 5.61 |
| 8cpk | 384 | 7.85 |
| 8are | 530 | 12.11 |
| 8owf | 575 | 13.48 |
| 7tpu | 616 | 12.58 |
| 7ylz | 623 | 16.27 |
| 8gpp | 628 | 15.58 |
| 8clz | 684 | 19.34 |
| 8k7x | 858 | 33.21 |
| 8ibx | 1286 | 47.70 |
| 8gi1 | 1464 | 67.22 |
| 8sm6 | 1496 | 81.59 |
| 8pso | 1499 | 63.74 |
| msc1 | 1588 | — |
| bcor | 1755 | — |
| evpl | 2033 | — |


| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.66 |
| 8c4d | 331 | 3.05 |
| 7qsj | 375 | 4.26 |
| 8cpk | 384 | 5.50 |
| 8are | 530 | 8.61 |
| 8owf | 575 | 10.07 |
| 7tpu | 616 | 9.64 |
| 7ylz | 623 | 12.04 |
| 8gpp | 628 | 11.90 |
| 8clz | 684 | 13.89 |
| 8k7x | 858 | 21.85 |
| 8ibx | 1286 | 39.79 |
| 8gi1 | 1464 | 54.29 |
| 8sm6 | 1496 | 63.36 |
| 8pso | 1499 | 53.84 |
| msc1 | 1588 | 66.75 |
| bcor | 1755 | 85.45 |
| evpl | 2033 | 117.66 |


| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.22 |
| 8c4d | 331 | 3.09 |
| 7qsj | 375 | 4.47 |
| 8cpk | 384 | 5.43 |
| 8are | 530 | 6.85 |
| 8owf | 575 | 7.97 |
| 7tpu | 616 | 7.87 |
| 7ylz | 623 | 10.32 |
| 8gpp | 628 | 9.83 |
| 8clz | 684 | 12.15 |
| 8k7x | 858 | 18.69 |
| 8ibx | 1286 | 30.38 |
| 8gi1 | 1464 | 44.89 |
| 8sm6 | 1496 | 54.34 |
| 8pso | 1499 | 44.27 |
| msc1 | 1588 | 59.61 |
| bcor | 1755 | 85.19 |
| evpl | 2033 | 133.14 |

Table 2: Performance Across Optimization Backends#
The table below compares H100 performance between OSS and TensorRT backends without templates.

| Test ID | Sequence Length | OSS (s) | TensorRT (s) | Speed up |
|---|---|---|---|---|
| 8eil | 186 | 11.07 | 1.72 | 6.44x |
| 8c4d | 331 | 6.80 | 2.88 | 2.36x |
| 7qsj | 375 | 7.76 | 3.81 | 2.04x |
| 8cpk | 384 | 7.93 | 5.47 | 1.45x |
| 8are | 530 | 12.24 | 6.63 | 1.85x |
| 8owf | 575 | 13.45 | 7.17 | 1.88x |
| 7tpu | 616 | 19.78 | 6.72 | 2.94x |
| 7ylz | 623 | 16.59 | 8.74 | 1.90x |
| 8gpp | 628 | 15.40 | 8.23 | 1.87x |
| 8clz | 684 | 17.43 | 9.39 | 1.86x |
| 8k7x | 858 | 26.84 | 14.56 | 1.84x |
| 8ibx | 1286 | 50.32 | 24.67 | 2.04x |
| 8gi1 | 1464 | 66.87 | 29.49 | 2.27x |
| 8sm6 | 1496 | 77.02 | 35.64 | 2.16x |
| 8pso | 1499 | 66.41 | 28.35 | 2.34x |
| msc1 | 1588 | 64.28 | 37.96 | 1.69x |
| bcor | 1755 | 85.22 | 51.69 | 1.65x |
| evpl | 2033 | 123.07 | 79.55 | 1.55x |

Table 3: Performance Impact From Structural Templates#
The table below reports H100 TensorRT performance with and without templates.

| Test ID | Sequence Length | Without Templates (s) | With Templates (s) |
|---|---|---|---|
| 8eil | 186 | 1.72 | 7.27 |
| 8c4d | 331 | 2.88 | 3.42 |
| 7qsj | 375 | 3.81 | 5.05 |
| 8cpk | 384 | 5.47 | 5.93 |
| 8are | 530 | 6.63 | 7.82 |
| 8owf | 575 | 7.17 | 8.13 |
| 7tpu | 616 | 6.72 | 8.51 |
| 7ylz | 623 | 8.74 | 10.26 |
| 8gpp | 628 | 8.23 | 20.81 |
| 8clz | 684 | 9.39 | 11.20 |
| 8k7x | 858 | 14.56 | 14.65 |
| 8ibx | 1286 | 24.67 | 27.11 |
| 8gi1 | 1464 | 29.49 | 31.87 |
| 8sm6 | 1496 | 35.64 | 43.65 |
| 8pso | 1499 | 28.35 | 32.91 |
| msc1 | 1588 | 37.96 | 42.01 |
| bcor | 1755 | 51.69 | 56.17 |
| evpl | 2033 | 79.55 | 86.66 |

Boltz2 v1.6.0 Performance Results on H100#
Typical Runtimes#
For reference, approximate runtimes on NVIDIA H100 80GB HBM3:
Structure Prediction (Boltz2 v1.6.0 on H100):
~200 residues: 1.72 seconds (TensorRT) / 11.07 seconds (OSS)
~500-700 residues: 6.63-9.39 seconds (TensorRT) / 12.24-19.78 seconds (OSS)
~1200-1500 residues: 24.67-35.64 seconds (TensorRT) / 50.32-77.02 seconds (OSS)
~1500-1800 residues: 37.96-51.69 seconds (TensorRT) / 64.28-85.22 seconds (OSS)
~2000 residues: 79.55 seconds (TensorRT) / 123.07 seconds (OSS)
Binding Affinity Prediction (Boltz2 v1.6.0 on H100):
~200 residues: 6.21 seconds (TensorRT) / 18.52 seconds (OSS)
~500-700 residues: 10.74-14.16 seconds (TensorRT) / 26.96-34.03 seconds (OSS)
~1200-1500 residues: 29.69-43.99 seconds (TensorRT) / 53.13-87.57 seconds (OSS)
~1500-1800 residues: 43.77-54.10 seconds (TensorRT) / 80.94-93.29 seconds (OSS)
~2000 residues: 81.19 seconds (TensorRT) / 130.67 seconds (OSS)
Structure Prediction Performance#
The following table shows runtime performance for Boltz2 v1.6.0 on NVIDIA H100 GPUs for structure prediction across different sequence lengths:

| Test ID | Sequence Length | OSS Runtime (s) | TensorRT Runtime (s) | Speed up |
|---|---|---|---|---|
| 8eil | 186 | 11.07 | 1.72 | 6.44x |
| 8c4d | 331 | 6.80 | 2.88 | 2.36x |
| 7qsj | 375 | 7.76 | 3.81 | 2.04x |
| 8cpk | 384 | 7.93 | 5.47 | 1.45x |
| 8are | 530 | 12.24 | 6.63 | 1.85x |
| 8owf | 575 | 13.45 | 7.17 | 1.88x |
| 7tpu | 616 | 19.78 | 6.72 | 2.94x |
| 7ylz | 623 | 16.59 | 8.74 | 1.90x |
| 8gpp | 628 | 15.40 | 8.23 | 1.87x |
| 8clz | 684 | 17.43 | 9.39 | 1.86x |
| 8k7x | 858 | 26.84 | 14.56 | 1.84x |
| 8ibx | 1286 | 50.32 | 24.67 | 2.04x |
| 8gi1 | 1464 | 66.87 | 29.49 | 2.27x |
| 8sm6 | 1496 | 77.02 | 35.64 | 2.16x |
| 8pso | 1499 | 66.41 | 28.35 | 2.34x |
| msc1 | 1588 | 64.28 | 37.96 | 1.69x |
| bcor | 1755 | 85.22 | 51.69 | 1.65x |
| evpl | 2033 | 123.07 | 79.55 | 1.55x |

Binding Affinity Prediction Performance#
The following table shows runtime performance for binding affinity prediction on NVIDIA H100 GPUs:

| Test ID | Sequence Length | OSS Runtime (s) | TensorRT Runtime (s) | Speed up |
|---|---|---|---|---|
| 8eil | 186 | 18.52 | 6.21 | 2.98x |
| 8c4d | 331 | 22.65 | 7.91 | 2.86x |
| 7qsj | 375 | 23.44 | 7.89 | 2.97x |
| 8cpk | 384 | 24.43 | 8.89 | 2.75x |
| 8are | 530 | 26.96 | 11.15 | 2.42x |
| 8owf | 575 | 28.65 | 11.64 | 2.46x |
| 7tpu | 616 | 29.35 | 10.74 | 2.73x |
| 7ylz | 623 | 31.59 | 12.60 | 2.51x |
| 8gpp | 628 | 34.03 | 13.24 | 2.57x |
| 8clz | 684 | 32.75 | 14.16 | 2.31x |
| 8k7x | 858 | 38.65 | 19.48 | 1.98x |
| 8ibx | 1286 | 53.13 | 29.69 | 1.79x |
| 8gi1 | 1464 | 78.53 | 35.91 | 2.19x |
| 8sm6 | 1496 | 87.57 | 43.99 | 1.99x |
| 8pso | 1499 | 66.90 | 30.67 | 2.18x |
| msc1 | 1588 | 80.94 | 43.77 | 1.85x |
| bcor | 1755 | 93.29 | 54.10 | 1.72x |
| evpl | 2033 | 130.67 | 81.19 | 1.61x |

Performance Analysis#
Key Observations:
TensorRT Optimization: TensorRT consistently outperforms OSS for both structure and affinity prediction across all 18 H100 benchmark cases.
Structure Prediction: Speed up ranges from 1.45x to 6.44x; H100 runtime ranges from 1.72s to 79.55s (TensorRT) vs 6.80s to 123.07s (OSS).
Binding Affinity Prediction: Speed up ranges from 1.61x to 2.98x; H100 runtime ranges from 6.21s to 81.19s (TensorRT) vs 18.52s to 130.67s (OSS).
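Scaling Behavior: Runtime generally increases with sequence length for both OSS and TensorRT backends, though not strictly monotonically (for example, the 186-residue 8eil case takes 11.07s with OSS, longer than the 331-residue 8c4d case at 6.80s).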
Recommended Configuration:
For development/testing: Use OSS for easier debugging and TensorRT for latency validation.
For production: Use TensorRT to maximize throughput and minimize inference latency.
For large proteins (>1500 residues): Prefer TensorRT and provision sufficient GPU memory headroom.
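The speed-up figures quoted above are the ratio of OSS runtime to TensorRT runtime. As a minimal sketch, the snippet below recomputes that ratio for a few rows copied from the H100 structure prediction table. The helper names are hypothetical, and the same calculation can be applied to your own measurements.

```python
# (test id, OSS seconds, TensorRT seconds): a subset of the H100
# structure prediction table in this document.
ROWS = [
    ("8eil", 11.07, 1.72),
    ("8cpk", 7.93, 5.47),
    ("8ibx", 50.32, 24.67),
    ("evpl", 123.07, 79.55),
]


def speedup(oss_seconds, tensorrt_seconds):
    """Speed-up factor of TensorRT relative to OSS."""
    return oss_seconds / tensorrt_seconds


def summarize_speedups(rows):
    """Return (min, max, per-test ratios), each rounded to two decimals."""
    ratios = {test_id: round(speedup(oss, trt), 2) for test_id, oss, trt in rows}
    return min(ratios.values()), max(ratios.values()), ratios


if __name__ == "__main__":
    low, high, ratios = summarize_speedups(ROWS)
    for test_id, ratio in ratios.items():
        print(f"{test_id}: {ratio:.2f}x")
    print(f"range: {low:.2f}x to {high:.2f}x")
```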
Performance Testing#
You can test basic performance and functionality with protein structure and binding affinity prediction:
import time

import requests


def test_boltz2_structure_performance():
    """Test basic protein structure prediction performance."""
    url = "http://localhost:8000/biology/mit/boltz2/predict"
    # Test protein: Green Fluorescent Protein (GFP) - ~240 residues
    test_sequence = (
        "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
        "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQLNKFCDKLPVMHYKQFFDSGNYNTLS"
        "AKAGFPFKVPHTYNNSSFVVKQKPGMVFKFIHGKDPGLNGQTVFLMVGGISQNLSGSSNLGVGYTFVQ"
        "KTSVLLESEIKKRLRGFHTRGAVTQGLHQFVNLPTLVTQVLDGDMSQLLQVT"
    )
    data = {
        "polymers": [
            {
                "id": "A",
                "molecule_type": "protein",
                "sequence": test_sequence
            }
        ],
        "recycling_steps": 3,
        "sampling_steps": 50,
        "diffusion_samples": 1,
        "step_scale": 1.638,
        "output_format": "mmcif"
    }
    print("Starting Boltz-2 structure prediction performance test...")
    # Wall-clock time of the full HTTP round trip, not only model inference
    start_time = time.time()
    response = requests.post(url, json=data)
    end_time = time.time()
    runtime = end_time - start_time
    if response.status_code == 200:
        result = response.json()
        confidence = result.get('confidence_scores', [0])[0]
        print("✓ Structure prediction successful")
        print(f"✓ Runtime: {runtime:.1f} seconds")
        print(f"✓ Confidence score: {confidence:.3f}")
        print(f"✓ Structure format: {result['structures'][0]['format']}")
        return True
    else:
        print(f"✗ Structure prediction failed: {response.status_code}")
        print(f"✗ Error: {response.text}")
        return False


def test_boltz2_affinity_performance():
    """Test binding affinity prediction performance."""
    url = "http://localhost:8000/biology/mit/boltz2/predict"
    # Smaller test protein for affinity testing - ~140 residues
    hemoglobin_alpha = (
        "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALT"
        "NAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTV"
        "LTSKYR"
    )
    data = {
        "polymers": [
            {
                "id": "A",
                "molecule_type": "protein",
                "sequence": hemoglobin_alpha
            }
        ],
        "ligands": [
            {
                "id": "HEME",
                "smiles": "[Fe+2].C1=CC2=NC1=CC3=NC(=CC4=NC(=CC5=NC(=C2)C=C5)C=C4)C=C3",
                "predict_affinity": True
            }
        ],
        "recycling_steps": 3,
        "sampling_steps": 50,
        "sampling_steps_affinity": 100,  # Reduced from default (200) for faster performance testing
        "diffusion_samples_affinity": 3,  # Reduced from default (5) for faster performance testing
        "output_format": "mmcif"
    }
    print("Starting Boltz-2 binding affinity performance test...")
    start_time = time.time()
    response = requests.post(url, json=data)
    end_time = time.time()
    runtime = end_time - start_time
    if response.status_code == 200:
        result = response.json()
        confidence = result.get('confidence_scores', [0])[0]
        print("✓ Affinity prediction successful")
        print(f"✓ Runtime: {runtime:.1f} seconds")
        print(f"✓ Confidence score: {confidence:.3f}")
        if result.get("affinities"):
            for ligand_id, affinity_data in result["affinities"].items():
                if affinity_data.get("affinity_pic50"):
                    # pIC50 is a dimensionless log-scale value, so no unit is printed
                    print(f"✓ Predicted pIC50 for {ligand_id}: {affinity_data['affinity_pic50'][0]:.2f}")
        return True
    else:
        print(f"✗ Affinity prediction failed: {response.status_code}")
        print(f"✗ Error: {response.text}")
        return False


if __name__ == "__main__":
    # Test both structure and affinity prediction
    structure_success = test_boltz2_structure_performance()
    print("\n" + "=" * 50 + "\n")
    affinity_success = test_boltz2_affinity_performance()
    if structure_success and affinity_success:
        print("\n✓ All performance tests passed!")
    else:
        print("\n✗ Some performance tests failed.")
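A single timed request can be skewed by warm-up effects such as model loading or caching. As a hedged sketch, the helper below repeats a call several times after an untimed warm-up and reports the minimum, median, and maximum. The names `time_calls` and `summarize_timings` are hypothetical, and the repeat counts are arbitrary choices rather than recommended values.

```python
import statistics
import time


def time_calls(fn, repeats=5, warmup=1):
    """Call fn() warmup times untimed, then repeats times timed.

    Returns the list of wall-clock durations (seconds) of the timed calls.
    """
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize_timings(durations):
    """Reduce a list of durations to min, median, and max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

For example, `summarize_timings(time_calls(lambda: requests.post(url, json=data), repeats=3))` would time the request used in the script above, assuming `requests`, `url`, and `data` are defined as they are there. The measured time includes network and serialization overhead, so it will typically be somewhat higher than the server-side predict time.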
Performance Optimization Tips#
For development/testing: Use faster settings, for example "recycling_steps": 2, "sampling_steps": 25
For production quality: Use higher quality settings, for example "recycling_steps": 5, "sampling_steps": 100
For batch processing: Submit multiple concurrent requests with default settings
For very large proteins (>1000 residues): Consider domain-based approaches or consult the literature for handling strategies
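For the batch processing tip, one possible approach is to submit requests from a thread pool. The sketch below uses only the Python standard library. The endpoint is the one used in the test script in this document, while `post_json`, `submit_batch`, the 600-second timeout, and the worker count of 4 are assumptions for illustration that should be tuned to your deployment.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Endpoint taken from the test script in this document.
URL = "http://localhost:8000/biology/mit/boltz2/predict"


def post_json(payload, url=URL, timeout=600):
    """POST one JSON payload and return the decoded JSON response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def submit_batch(payloads, max_workers=4, send=post_json):
    """Send payloads concurrently; results are returned in input order.

    Any exception raised by send() propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, payloads))
```

The `send` parameter lets you substitute a different client or a stub for testing. Setting `max_workers` higher than the server can process in parallel only queues requests and may increase memory pressure.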
Troubleshooting Performance Issues#
Common Issues and Solutions#
General Performance Issues#
Out of memory errors: Reduce sequence length, decrease sampling steps, or use fewer concurrent requests
Slow performance: Ensure fast storage (NVMe SSD), sufficient CPU cores (12+ per GPU), and adequate system RAM (48+ GB per GPU)
Poor quality predictions: Increase sampling steps, recycling steps, or check input sequence quality
Binding Affinity Specific Issues#
Affinity prediction timeouts: Reduce sampling_steps_affinity (default: 200) and diffusion_samples_affinity (default: 5) for faster results (e.g., 50 and 1)
Unrealistic affinity values: Enable affinity_mw_correction for metal-containing ligands and verify SMILES format
Memory errors with affinity: Binding affinity requires 2-3x more memory than structure prediction alone
Inconsistent affinity results: Use the default diffusion_samples_affinity=5, or increase it to 7-10 for more reliable estimates
Cannot predict affinity for multiple ligands: Only one ligand per request can have predict_affinity=True
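Some of the affinity issues above can be caught on the client before a request is sent. The following is a minimal sketch of such a check, based only on the constraints stated in this section. `check_affinity_request` is a hypothetical helper, not part of the NIM API, and the server remains the authority on what is valid.

```python
def check_affinity_request(payload):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    ligands = payload.get("ligands", [])
    flagged = [
        str(ligand.get("id", "?"))
        for ligand in ligands
        if ligand.get("predict_affinity")
    ]
    # Constraint from this section: only one ligand per request may
    # have predict_affinity=True.
    if len(flagged) > 1:
        problems.append(
            f"predict_affinity is set on {len(flagged)} ligands "
            f"({', '.join(flagged)}); only one is allowed per request"
        )
    # Default of 5 is the value stated in this section.
    samples = payload.get("diffusion_samples_affinity", 5)
    if flagged and samples < 5:
        problems.append(
            "diffusion_samples_affinity is below the default of 5; "
            "affinity results may be less consistent"
        )
    return problems
```

A caller might print the returned messages and abort, or send the request anyway when only the consistency warning is present.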
Note
For detailed performance tuning guidance specific to your deployment, refer to the Optimization section.