Performance in Boltz-2 NIM#

NIM Accuracy#

The Boltz-2 NIM is based on the state-of-the-art Boltz-2 architecture for biomolecular structure prediction. The NIM’s accuracy should match that of the reference implementation when using equivalent parameters and inputs.

Note

Running on hardware that is not listed as supported in the prerequisites section may produce results that deviate from the expected accuracy.

The accuracy of the NIM is measured by structural quality metrics such as lDDT (local Distance Difference Test). These scores help assess the reliability of the predicted structures.
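To make the metric concrete, here is a simplified, CA-only sketch of the lDDT idea: the fraction of reference pairwise distances (below a 15 Å cutoff) that are preserved in the prediction within 0.5/1/2/4 Å tolerances, averaged over the tolerances. Real lDDT operates on all atoms and excludes pairs within the same residue; this is an illustration, not the scoring code the NIM uses.

```python
import math

def lddt_ca(ref: list, pred: list, cutoff: float = 15.0) -> float:
    """Simplified CA-only lDDT over two equal-length coordinate lists."""
    thresholds = (0.5, 1.0, 2.0, 4.0)
    kept = total = 0
    for i in range(len(ref)):
        for j in range(i + 1, len(ref)):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= cutoff:
                continue  # only score pairs close in the reference structure
            delta = abs(d_ref - math.dist(pred[i], pred[j]))
            total += len(thresholds)
            kept += sum(delta < t for t in thresholds)
    return kept / total if total else 0.0

# A prediction identical to the reference scores 1.0.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
print(lddt_ca(ref, ref))  # 1.0
```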

Factors Affecting NIM Performance#

The performance of the Boltz-2 NIM is determined by several key factors:

Hardware Factors#

  • Number and type of GPUs: More GPUs generally improve throughput for concurrent requests

  • GPU memory: Larger proteins and complexes require more GPU memory

  • Storage speed: Fast NVMe SSD storage improves model loading and caching performance

Input Complexity#

  • Sequence length: Runtime scales approximately quadratically with total sequence length

  • Number of chains: Multi-chain complexes require more computation than single chains

  • Ligands and constraints: Additional molecular components increase computational cost
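The quadratic scaling rule of thumb can be used to extrapolate a rough runtime estimate from a single measured point. The reference length and time below are illustrative placeholders, not benchmark values; measure your own reference on your hardware.

```python
def estimate_runtime(seq_len: int, ref_len: int = 500, ref_time_s: float = 6.6) -> float:
    """Rough estimate assuming runtime grows with the square of total sequence length.

    ref_len and ref_time_s are placeholders; calibrate them with one real run.
    """
    return ref_time_s * (seq_len / ref_len) ** 2

# Doubling the sequence length roughly quadruples the estimated runtime.
print(round(estimate_runtime(1000) / estimate_runtime(500), 1))  # 4.0
```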

Model Parameters#

  • Sampling steps: Higher values improve quality but significantly increase runtime

  • Recycling steps: More iterations improve accuracy with modest runtime increase

  • Diffusion samples: Multiple samples provide diversity but multiply computational cost
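These knobs map directly onto request fields. A minimal sketch of the quality/runtime trade-off, using the parameter names from the request examples on this page (the specific values are illustrative):

```python
# Faster settings for iteration vs. higher-quality settings for final predictions.
fast_settings = {"recycling_steps": 2, "sampling_steps": 25, "diffusion_samples": 1}
quality_settings = {"recycling_steps": 5, "sampling_steps": 100, "diffusion_samples": 3}

# Diffusion samples multiply cost roughly linearly: 3 samples is about 3x the diffusion work.
relative_diffusion_cost = quality_settings["diffusion_samples"] / fast_settings["diffusion_samples"]
print(relative_diffusion_cost)  # 3.0
```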

Performance Characteristics#

Typical Runtimes#

For reference, runtime varies with sequence length, backend selection, and template usage.
Detailed benchmark metrics for the current release are reported in the tables below.

Performance Metrics (Boltz2 v1.6.0)#

The tables below report benchmark results from Boltz2 v1.6.0 performance runs.

Configuration#

| Parameter | Setting |
|---|---|
| workers | 1 |
| output_format | mmcif |
| benchmark_mode | structure prediction |
| compared_backends | OSS, TensorRT |
| structural_templates | compared with and without templates |

Table 1: Performance Across the Supported NVIDIA Hardware Units#

The tables below report predict_time (seconds) with the TensorRT backend and no structural templates, one table per supported GPU configuration.

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.90 |
| 8c4d | 331 | 4.94 |
| 7qsj | 375 | 6.26 |
| 8cpk | 384 | 7.87 |
| 8are | 530 | 10.60 |
| 8owf | 575 | 12.13 |
| 7tpu | 616 | 11.25 |
| 7ylz | 623 | 14.70 |
| 8gpp | 628 | 13.97 |
| 8clz | 684 | 15.98 |
| 8k7x | 858 | 24.88 |
| 8ibx | 1286 | 37.80 |
| 8gi1 | 1464 | 56.35 |
| 8sm6 | 1496 | 66.96 |
| 8pso | 1499 | 53.54 |
| msc1 | 1588 | 71.64 |
| bcor | 1755 | 96.21 |
| evpl | 2033 | 138.09 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.56 |
| 8c4d | 331 | 3.01 |
| 7qsj | 375 | 3.34 |
| 8cpk | 384 | 5.28 |
| 8are | 530 | 6.01 |
| 8owf | 575 | 6.98 |
| 7tpu | 616 | 6.29 |
| 7ylz | 623 | 8.22 |
| 8gpp | 628 | 7.77 |
| 8clz | 684 | 8.87 |
| 8k7x | 858 | 14.11 |
| 8ibx | 1286 | 29.63 |
| 8gi1 | 1464 | 35.55 |
| 8sm6 | 1496 | 40.23 |
| 8pso | 1499 | 35.48 |
| msc1 | 1588 | 44.42 |
| bcor | 1755 | 57.42 |
| evpl | 2033 | 83.43 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.72 |
| 8c4d | 331 | 2.88 |
| 7qsj | 375 | 3.81 |
| 8cpk | 384 | 5.47 |
| 8are | 530 | 6.63 |
| 8owf | 575 | 7.17 |
| 7tpu | 616 | 6.72 |
| 7ylz | 623 | 8.74 |
| 8gpp | 628 | 8.23 |
| 8clz | 684 | 9.39 |
| 8k7x | 858 | 14.56 |
| 8ibx | 1286 | 24.67 |
| 8gi1 | 1464 | 29.49 |
| 8sm6 | 1496 | 35.64 |
| 8pso | 1499 | 28.35 |
| msc1 | 1588 | 37.96 |
| bcor | 1755 | 51.69 |
| evpl | 2033 | 79.55 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 6.15 |
| 8c4d | 331 | 12.06 |
| 7qsj | 375 | 10.13 |
| 8cpk | 384 | 10.65 |
| 8are | 530 | 10.68 |
| 8owf | 575 | 11.76 |
| 7tpu | 616 | 10.71 |
| 7ylz | 623 | 14.61 |
| 8gpp | 628 | 12.71 |
| 8clz | 684 | 13.72 |
| 8k7x | 858 | 20.10 |
| 8ibx | 1286 | 37.65 |
| 8gi1 | 1464 | 40.06 |
| 8sm6 | 1496 | 45.58 |
| 8pso | 1499 | 39.31 |
| msc1 | 1588 | 49.02 |
| bcor | 1755 | 62.82 |
| evpl | 2033 | 109.54 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.44 |
| 8c4d | 331 | 2.48 |
| 7qsj | 375 | 3.22 |
| 8cpk | 384 | 4.56 |
| 8are | 530 | 5.66 |
| 8owf | 575 | 6.17 |
| 7tpu | 616 | 5.64 |
| 7ylz | 623 | 7.41 |
| 8gpp | 628 | 6.93 |
| 8clz | 684 | 7.92 |
| 8k7x | 858 | 12.47 |
| 8ibx | 1286 | 21.62 |
| 8gi1 | 1464 | 25.77 |
| 8sm6 | 1496 | 31.06 |
| 8pso | 1499 | 25.13 |
| msc1 | 1588 | 32.81 |
| bcor | 1755 | 45.32 |
| evpl | 2033 | 71.01 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.08 |
| 8c4d | 331 | 4.41 |
| 7qsj | 375 | 5.92 |
| 8cpk | 384 | 8.10 |
| 8are | 530 | 12.82 |
| 8owf | 575 | 14.40 |
| 7tpu | 616 | 14.17 |
| 7ylz | 623 | 17.14 |
| 8gpp | 628 | 17.08 |
| 8clz | 684 | 19.70 |
| 8k7x | 858 | 33.47 |
| 8ibx | 1286 | 47.45 |
| 8gi1 | 1464 | 71.14 |
| 8sm6 | 1496 | 86.33 |
| 8pso | 1499 | 67.11 |
| msc1 | 1588 | N/A |
| bcor | 1755 | N/A |
| evpl | 2033 | N/A |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 7.42 |
| 8c4d | 331 | 19.47 |
| 7qsj | 375 | 25.92 |
| 8cpk | 384 | 27.51 |
| 8are | 530 | 61.13 |
| 8owf | 575 | 72.77 |
| 7tpu | 616 | 77.48 |
| 7ylz | 623 | 87.51 |
| 8gpp | 628 | 86.83 |
| 8clz | 684 | 106.46 |
| 8k7x | 858 | 169.95 |
| 8ibx | 1286 | 353.91 |
| 8gi1 | 1464 | 892.86 |
| 8sm6 | 1496 | 999.09 |
| 8pso | 1499 | 730.60 |
| msc1 | 1588 | N/A |
| bcor | 1755 | N/A |
| evpl | 2033 | N/A |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.14 |
| 8c4d | 331 | 4.35 |
| 7qsj | 375 | 5.61 |
| 8cpk | 384 | 7.85 |
| 8are | 530 | 12.11 |
| 8owf | 575 | 13.48 |
| 7tpu | 616 | 12.58 |
| 7ylz | 623 | 16.27 |
| 8gpp | 628 | 15.58 |
| 8clz | 684 | 19.34 |
| 8k7x | 858 | 33.21 |
| 8ibx | 1286 | 47.70 |
| 8gi1 | 1464 | 67.22 |
| 8sm6 | 1496 | 81.59 |
| 8pso | 1499 | 63.74 |
| msc1 | 1588 | N/A |
| bcor | 1755 | N/A |
| evpl | 2033 | N/A |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 1.66 |
| 8c4d | 331 | 3.05 |
| 7qsj | 375 | 4.26 |
| 8cpk | 384 | 5.50 |
| 8are | 530 | 8.61 |
| 8owf | 575 | 10.07 |
| 7tpu | 616 | 9.64 |
| 7ylz | 623 | 12.04 |
| 8gpp | 628 | 11.90 |
| 8clz | 684 | 13.89 |
| 8k7x | 858 | 21.85 |
| 8ibx | 1286 | 39.79 |
| 8gi1 | 1464 | 54.29 |
| 8sm6 | 1496 | 63.36 |
| 8pso | 1499 | 53.84 |
| msc1 | 1588 | 66.75 |
| bcor | 1755 | 85.45 |
| evpl | 2033 | 117.66 |

| Test ID | Sequence Length | Predict time (s) |
|---|---|---|
| 8eil | 186 | 2.22 |
| 8c4d | 331 | 3.09 |
| 7qsj | 375 | 4.47 |
| 8cpk | 384 | 5.43 |
| 8are | 530 | 6.85 |
| 8owf | 575 | 7.97 |
| 7tpu | 616 | 7.87 |
| 7ylz | 623 | 10.32 |
| 8gpp | 628 | 9.83 |
| 8clz | 684 | 12.15 |
| 8k7x | 858 | 18.69 |
| 8ibx | 1286 | 30.38 |
| 8gi1 | 1464 | 44.89 |
| 8sm6 | 1496 | 54.34 |
| 8pso | 1499 | 44.27 |
| msc1 | 1588 | 59.61 |
| bcor | 1755 | 85.19 |
| evpl | 2033 | 133.14 |

Table 2: Performance Across Optimization Backends#

The table below compares H100 performance between OSS and TensorRT backends without templates.

| Test ID | Sequence Length | OSS (s) | TensorRT (s) | Speed up |
|---|---|---|---|---|
| 8eil | 186 | 11.07 | 1.72 | 6.44x |
| 8c4d | 331 | 6.80 | 2.88 | 2.36x |
| 7qsj | 375 | 7.76 | 3.81 | 2.04x |
| 8cpk | 384 | 7.93 | 5.47 | 1.45x |
| 8are | 530 | 12.24 | 6.63 | 1.85x |
| 8owf | 575 | 13.45 | 7.17 | 1.88x |
| 7tpu | 616 | 19.78 | 6.72 | 2.94x |
| 7ylz | 623 | 16.59 | 8.74 | 1.90x |
| 8gpp | 628 | 15.40 | 8.23 | 1.87x |
| 8clz | 684 | 17.43 | 9.39 | 1.86x |
| 8k7x | 858 | 26.84 | 14.56 | 1.84x |
| 8ibx | 1286 | 50.32 | 24.67 | 2.04x |
| 8gi1 | 1464 | 66.87 | 29.49 | 2.27x |
| 8sm6 | 1496 | 77.02 | 35.64 | 2.16x |
| 8pso | 1499 | 66.41 | 28.35 | 2.34x |
| msc1 | 1588 | 64.28 | 37.96 | 1.69x |
| bcor | 1755 | 85.22 | 51.69 | 1.65x |
| evpl | 2033 | 123.07 | 79.55 | 1.55x |

Table 3: Performance Impact From Structural Templates#

The table below reports H100 TensorRT performance with and without templates.

| Test ID | Sequence Length | Without Templates (s) | With Templates (s) |
|---|---|---|---|
| 8eil | 186 | 1.72 | 7.27 |
| 8c4d | 331 | 2.88 | 3.42 |
| 7qsj | 375 | 3.81 | 5.05 |
| 8cpk | 384 | 5.47 | 5.93 |
| 8are | 530 | 6.63 | 7.82 |
| 8owf | 575 | 7.17 | 8.13 |
| 7tpu | 616 | 6.72 | 8.51 |
| 7ylz | 623 | 8.74 | 10.26 |
| 8gpp | 628 | 8.23 | 20.81 |
| 8clz | 684 | 9.39 | 11.20 |
| 8k7x | 858 | 14.56 | 14.65 |
| 8ibx | 1286 | 24.67 | 27.11 |
| 8gi1 | 1464 | 29.49 | 31.87 |
| 8sm6 | 1496 | 35.64 | 43.65 |
| 8pso | 1499 | 28.35 | 32.91 |
| msc1 | 1588 | 37.96 | 42.01 |
| bcor | 1755 | 51.69 | 56.17 |
| evpl | 2033 | 79.55 | 86.66 |
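The relative overhead of templates can be computed directly from the Table 3 values. The timings below are copied from the table; only a few representative entries are shown.

```python
# (without_templates_s, with_templates_s) for selected Table 3 entries.
timings = {
    "8eil": (1.72, 7.27),
    "8k7x": (14.56, 14.65),
    "evpl": (79.55, 86.66),
}

# Percentage slowdown introduced by templates for each test case.
for test_id, (without_t, with_t) in timings.items():
    overhead_pct = (with_t / without_t - 1) * 100
    print(f"{test_id}: +{overhead_pct:.1f}%")
```

Template overhead is largest in relative terms for short sequences, where fixed costs dominate, and shrinks to a few percent for long sequences.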

Boltz2 v1.6.0 Performance Results on H100#

Typical Runtimes#

For reference, approximate runtimes on NVIDIA H100 80GB HBM3:

Structure Prediction (Boltz2 v1.6.0 on H100):

  • ~200 residues: 1.72 seconds (TensorRT) / 11.07 seconds (OSS)

  • ~500-700 residues: 6.63-9.39 seconds (TensorRT) / 12.24-19.78 seconds (OSS)

  • ~1200-1500 residues: 24.67-35.64 seconds (TensorRT) / 50.32-77.02 seconds (OSS)

  • ~1500-1800 residues: 37.96-51.69 seconds (TensorRT) / 64.28-85.22 seconds (OSS)

  • ~2000 residues: 79.55 seconds (TensorRT) / 123.07 seconds (OSS)

Binding Affinity Prediction (Boltz2 v1.6.0 on H100):

  • ~200 residues: 6.21 seconds (TensorRT) / 18.52 seconds (OSS)

  • ~500-700 residues: 10.74-14.16 seconds (TensorRT) / 26.96-34.03 seconds (OSS)

  • ~1200-1500 residues: 29.69-43.99 seconds (TensorRT) / 53.13-87.57 seconds (OSS)

  • ~1500-1800 residues: 43.77-54.10 seconds (TensorRT) / 80.94-93.29 seconds (OSS)

  • ~2000 residues: 81.19 seconds (TensorRT) / 130.67 seconds (OSS)

Structure Prediction Performance#

Structure prediction runtimes for Boltz2 v1.6.0 on NVIDIA H100 GPUs across sequence lengths are identical to the OSS and TensorRT results reported in Table 2: Performance Across Optimization Backends above.

Binding Affinity Prediction Performance#

The following table shows runtime performance for binding affinity prediction on NVIDIA H100 GPUs:

| Test ID | Sequence Length | OSS Runtime (s) | TensorRT Runtime (s) | Speed up |
|---|---|---|---|---|
| 8eil | 186 | 18.52 | 6.21 | 2.98x |
| 8c4d | 331 | 22.65 | 7.91 | 2.86x |
| 7qsj | 375 | 23.44 | 7.89 | 2.97x |
| 8cpk | 384 | 24.43 | 8.89 | 2.75x |
| 8are | 530 | 26.96 | 11.15 | 2.42x |
| 8owf | 575 | 28.65 | 11.64 | 2.46x |
| 7tpu | 616 | 29.35 | 10.74 | 2.73x |
| 7ylz | 623 | 31.59 | 12.60 | 2.51x |
| 8gpp | 628 | 34.03 | 13.24 | 2.57x |
| 8clz | 684 | 32.75 | 14.16 | 2.31x |
| 8k7x | 858 | 38.65 | 19.48 | 1.98x |
| 8ibx | 1286 | 53.13 | 29.69 | 1.79x |
| 8gi1 | 1464 | 78.53 | 35.91 | 2.19x |
| 8sm6 | 1496 | 87.57 | 43.99 | 1.99x |
| 8pso | 1499 | 66.90 | 30.67 | 2.18x |
| msc1 | 1588 | 80.94 | 43.77 | 1.85x |
| bcor | 1755 | 93.29 | 54.10 | 1.72x |
| evpl | 2033 | 130.67 | 81.19 | 1.61x |

Performance Analysis#

Key Observations:

  • TensorRT Optimization: TensorRT consistently outperforms OSS for both structure and affinity prediction across all 18 H100 benchmark cases.

  • Structure Prediction: Speed up ranges from 1.45x to 6.44x; H100 runtime ranges from 1.72s to 79.55s (TensorRT) vs 6.80s to 123.07s (OSS).

  • Binding Affinity Prediction: Speed up ranges from 1.61x to 2.98x; H100 runtime ranges from 6.21s to 81.19s (TensorRT) vs 18.52s to 130.67s (OSS).

  • Scaling Behavior: Runtime increases with sequence length for both OSS and TensorRT backends.

Recommended Configuration:

  • For development/testing: Use OSS for easier debugging and TensorRT for latency validation.

  • For production: Use TensorRT to maximize throughput and minimize inference latency.

  • For large proteins (>1500 residues): Prefer TensorRT and provision sufficient GPU memory headroom.

Performance Testing#

You can test basic performance and functionality with protein structure and binding affinity prediction:

import requests
import json
import time

def test_boltz2_structure_performance():
    """Test basic protein structure prediction performance."""
    url = "http://localhost:8000/biology/mit/boltz2/predict"
    
    # Test protein: Green Fluorescent Protein (GFP) - ~240 residues
    test_sequence = (
        "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
        "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQLNKFCDKLPVMHYKQFFDSGNYNTLS"
        "AKAGFPFKVPHTYNNSSFVVKQKPGMVFKFIHGKDPGLNGQTVFLMVGGISQNLSGSSNLGVGYTFVQ"
        "KTSVLLESEIKKRLRGFHTRGAVTQGLHQFVNLPTLVTQVLDGDMSQLLQVT"
    )
    
    data = {
        "polymers": [
            {
                "id": "A",
                "molecule_type": "protein",
                "sequence": test_sequence
            }
        ],
        "recycling_steps": 3,
        "sampling_steps": 50,
        "diffusion_samples": 1,
        "step_scale": 1.638,
        "output_format": "mmcif"
    }
    
    print("Starting Boltz-2 structure prediction performance test...")
    start_time = time.time()
    
    response = requests.post(url, json=data)
    
    end_time = time.time()
    runtime = end_time - start_time
    
    if response.status_code == 200:
        result = response.json()
        confidence = result.get('confidence_scores', [0])[0]
        print(f"✓ Structure prediction successful")
        print(f"✓ Runtime: {runtime:.1f} seconds")
        print(f"✓ Confidence score: {confidence:.3f}")
        print(f"✓ Structure format: {result['structures'][0]['format']}")
        return True
    else:
        print(f"✗ Structure prediction failed: {response.status_code}")
        print(f"✗ Error: {response.text}")
        return False

def test_boltz2_affinity_performance():
    """Test binding affinity prediction performance."""
    url = "http://localhost:8000/biology/mit/boltz2/predict"
    
    # Smaller test protein for affinity testing - ~140 residues
    hemoglobin_alpha = (
        "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALT"
        "NAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTV"
        "LTSKYR"
    )
    
    data = {
        "polymers": [
            {
                "id": "A",
                "molecule_type": "protein",
                "sequence": hemoglobin_alpha
            }
        ],
        "ligands": [
            {
                "id": "HEME",
                "smiles": "[Fe+2].C1=CC2=NC1=CC3=NC(=CC4=NC(=CC5=NC(=C2)C=C5)C=C4)C=C3",
                "predict_affinity": True
            }
        ],
        "recycling_steps": 3,
        "sampling_steps": 50,
        "sampling_steps_affinity": 100,  # Reduced from default (200) for faster performance testing
        "diffusion_samples_affinity": 3,  # Reduced from default (5) for faster performance testing
        "output_format": "mmcif"
    }
    
    print("Starting Boltz-2 binding affinity performance test...")
    start_time = time.time()
    
    response = requests.post(url, json=data)
    
    end_time = time.time()
    runtime = end_time - start_time
    
    if response.status_code == 200:
        result = response.json()
        confidence = result.get('confidence_scores', [0])[0]
        print(f"✓ Affinity prediction successful")
        print(f"✓ Runtime: {runtime:.1f} seconds")
        print(f"✓ Confidence score: {confidence:.3f}")
        
        if result.get("affinities"):
            for ligand_id, affinity_data in result["affinities"].items():
                if affinity_data.get("affinity_pic50"):
                    print(f"✓ Predicted pIC50 for {ligand_id}: {affinity_data['affinity_pic50'][0]:.2f}")
        return True
    else:
        print(f"✗ Affinity prediction failed: {response.status_code}")
        print(f"✗ Error: {response.text}")
        return False

if __name__ == "__main__":
    # Test both structure and affinity prediction
    structure_success = test_boltz2_structure_performance()
    print("\n" + "="*50 + "\n")
    affinity_success = test_boltz2_affinity_performance()
    
    if structure_success and affinity_success:
        print("\n✓ All performance tests passed!")
    else:
        print("\n✗ Some performance tests failed.")

Performance Optimization Tips#

  1. For development/testing: Use faster settings

    "recycling_steps": 2,
    "sampling_steps": 25
    
  2. For production quality: Use higher quality settings

    "recycling_steps": 5,
    "sampling_steps": 100
    
  3. For batch processing: Submit multiple concurrent requests with default settings

  4. For very large proteins (>1000 residues): Consider domain-based approaches or consult the literature for handling strategies
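Tip 3 can be sketched with a small thread pool that keeps several independent requests in flight. The endpoint URL matches the examples above; the helper names, sequences, and pool size here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/biology/mit/boltz2/predict"

def predict(sequence: str) -> int:
    """Submit one structure prediction request and return the HTTP status code."""
    payload = {
        "polymers": [{"id": "A", "molecule_type": "protein", "sequence": sequence}],
        "output_format": "mmcif",
    }
    response = requests.post(URL, json=payload, timeout=600)
    return response.status_code

def predict_batch(sequences, max_workers: int = 4):
    # One worker per in-flight request; size the pool to your GPU count so
    # requests queue at the client rather than overloading the service.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict, sequences))
```

Results come back in input order, so `predict_batch` pairs cleanly with the submitted sequence list.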

Troubleshooting Performance Issues#

Common Issues and Solutions#

General Performance Issues#

  • Out of memory errors: Reduce sequence length, decrease sampling steps, or use fewer concurrent requests

  • Slow performance: Ensure fast storage (NVMe SSD), sufficient CPU cores (12+ per GPU), and adequate system RAM (48+ GB per GPU)

  • Poor quality predictions: Increase sampling steps, recycling steps, or check input sequence quality

Binding Affinity Specific Issues#

  • Affinity prediction timeouts: Reduce sampling_steps_affinity (default: 200) and diffusion_samples_affinity (default: 5) for faster results (e.g., 50 and 1)

  • Unrealistic affinity values: Enable affinity_mw_correction for metal-containing ligands and verify SMILES format

  • Memory errors with affinity: Binding affinity requires 2-3x more memory than structure prediction alone

  • Inconsistent affinity results: Keep the default (diffusion_samples_affinity=5), or increase it to 7-10 for more reliable estimates

  • Cannot predict affinity for multiple ligands: Only one ligand per request can have predict_affinity=True
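The affinity-specific fixes above translate into request fields like the following. The sequence and ligand SMILES are placeholders, the parameter names follow the troubleshooting entries above, and the top-level placement of affinity_mw_correction is an assumption of this sketch.

```python
# Faster affinity request: reduced sampling, single diffusion sample, MW correction on.
affinity_request = {
    "polymers": [{"id": "A", "molecule_type": "protein", "sequence": "MVLSPADKTNVKAAW"}],
    "ligands": [
        {
            "id": "LIG",
            "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin, as a placeholder ligand
            "predict_affinity": True,  # only one ligand per request may set this
        }
    ],
    "sampling_steps_affinity": 50,    # down from the default of 200
    "diffusion_samples_affinity": 1,  # down from the default of 5
    "affinity_mw_correction": True,   # enable when the ligand contains metals
}

# Exactly one ligand requests affinity prediction.
assert sum(l.get("predict_affinity", False) for l in affinity_request["ligands"]) == 1
```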

Note

For detailed performance tuning guidance specific to your deployment, refer to the Optimization section.