Performance in Boltz-2 NIM#

NIM Accuracy#

The Boltz-2 NIM is based on the state-of-the-art Boltz-2 architecture for biomolecular structure prediction. The NIM’s accuracy should match that of the reference implementation when using equivalent parameters and inputs.

Note

Running on hardware that is not listed as supported in the prerequisites section may produce results that deviate from the expected accuracy.

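The accuracy of the NIM is measured with structural quality metrics such as lDDT (local Distance Difference Test), which scores how well the local interatomic distances in a predicted structure agree with those in a reference structure. These scores help assess the reliability of the predicted structures.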
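To make the metric concrete, here is a minimal, simplified sketch of a C-alpha-only lDDT calculation. It is not the scoring code used by the NIM or by the Boltz-2 reference implementation. The function name `ca_lddt` is hypothetical, and the 15 Å inclusion radius and 0.5/1/2/4 Å tolerance thresholds are assumptions taken from the commonly published lDDT definition.

```python
import math

def ca_lddt(ref_coords, pred_coords, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha lDDT in [0, 1]; one (x, y, z) tuple per residue."""
    if len(ref_coords) != len(pred_coords):
        raise ValueError("reference and prediction must have the same length")
    preserved = 0
    total = 0
    n = len(ref_coords)
    for i in range(n):
        for j in range(i + 1, n):
            ref_d = math.dist(ref_coords[i], ref_coords[j])
            if ref_d >= cutoff:
                continue  # only local contacts in the reference are scored
            diff = abs(ref_d - math.dist(pred_coords[i], pred_coords[j]))
            # Count how many tolerance thresholds this distance pair satisfies.
            preserved += sum(diff <= t for t in thresholds)
            total += len(thresholds)
    return preserved / total if total else 0.0
```

A prediction identical to the reference scores 1.0. Production scoring tools typically work on all heavy atoms and handle stereochemistry checks, which this sketch omits.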

Factors Affecting NIM Performance#

The performance of the Boltz-2 NIM is determined by several key factors:

Hardware Factors#

Number and type of GPUs: More GPUs generally improve throughput for concurrent requests
GPU memory: Larger proteins and complexes require more GPU memory
Storage speed: Fast NVMe SSD storage improves model loading and caching performance

Input Complexity#

Sequence length: Runtime scales approximately quadratically with total sequence length
Number of chains: Multi-chain complexes require more computation than single chains
Ligands and constraints: Additional molecular components increase computational cost
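The approximately quadratic scaling noted above can be turned into a rough back-of-the-envelope estimate. The sketch below assumes runtime is proportional to total sequence length raised to an exponent of 2; the function name `scaled_runtime` is hypothetical, and fixed overheads such as request handling are not modeled, so short sequences in particular may deviate from the estimate.

```python
def scaled_runtime(known_length, known_runtime_min, new_length, exponent=2.0):
    """Extrapolate runtime assuming runtime ~ (total sequence length) ** exponent."""
    if known_length <= 0 or new_length <= 0:
        raise ValueError("sequence lengths must be positive")
    return known_runtime_min * (new_length / known_length) ** exponent

# Doubling the total length roughly quadruples the estimate.
print(scaled_runtime(500, 20.0, 1000))  # 80.0
```

For multi-chain complexes, pass the summed length of all chains as the sequence length.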

Model Parameters#

Sampling steps: Higher values improve quality but significantly increase runtime
Recycling steps: More iterations improve accuracy with modest runtime increase
Diffusion samples: Multiple samples provide diversity but multiply computational cost
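As an illustration of how these three parameters are passed to the service, the following sketch builds a request body for a single protein chain. The field names match the test script on this page; the helper name `build_request` and the positive-integer validation are assumptions, and the service may enforce different bounds.

```python
def build_request(sequence, recycling_steps=3, sampling_steps=50, diffusion_samples=1):
    """Return a request body for one protein chain with the given model parameters."""
    params = {
        "recycling_steps": recycling_steps,
        "sampling_steps": sampling_steps,
        "diffusion_samples": diffusion_samples,
    }
    for name, value in params.items():
        # Assumed client-side check; the server's accepted ranges may differ.
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    body = {
        "polymers": [{"id": "A", "molecule_type": "protein", "sequence": sequence}],
    }
    body.update(params)
    return body
```

Keeping the parameters in one place makes it easier to compare runs that differ only in sampling steps, recycling steps, or diffusion samples.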

Performance Characteristics#

Typical Runtimes#

For reference, approximate runtimes on high-end hardware (8x NVIDIA H100 80GB):

Protein Size	Configuration	Approximate Runtime
~100 residues	Default (steps=50, recycling=3)	2-3 minutes
~200 residues	Default	4-6 minutes
~500 residues	Default	15-25 minutes
~1000 residues	Default	60-120 minutes
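For sizes between the rows of the table, one option is to interpolate on a log-log scale. The sketch below uses the midpoint of each runtime range as a reference point; the midpoints, the function name `interpolate_runtime`, and the choice of log-log interpolation are assumptions, and the result applies only to the default configuration on the hardware listed above.

```python
import math

# Midpoints of the approximate runtime ranges in the table above (minutes).
REFERENCE_RUNTIMES = [(100, 2.5), (200, 5.0), (500, 20.0), (1000, 90.0)]

def interpolate_runtime(length):
    """Log-log interpolation between table rows; refuses to extrapolate."""
    points = REFERENCE_RUNTIMES
    if not points[0][0] <= length <= points[-1][0]:
        raise ValueError("length is outside the range covered by the table")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= length <= x1:
            # Fractional position of `length` between the two rows, in log space.
            t = (math.log(length) - math.log(x0)) / (math.log(x1) - math.log(x0))
            return math.exp(math.log(y0) + t * (math.log(y1) - math.log(y0)))
```

Treat the result as an order-of-magnitude guide rather than a guarantee, since the table itself lists approximate ranges.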

Performance Testing#

You can test basic performance and functionality with a simple protein structure prediction:

import requests
import time

def test_boltz2_performance():
    url = "http://localhost:8000/biology/mit/boltz2/predict"
    
    # Test protein: Green Fluorescent Protein (GFP) - ~240 residues
    test_sequence = (
        "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
        "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQLNKFCDKLPVMHYKQFFDSGNYNTLS"
        "AKAGFPFKVPHTYNNSSFVVKQKPGMVFKFIHGKDPGLNGQTVFLMVGGISQNLSGSSNLGVGYTFVQ"
        "KTSVLLESEIKKRLRGFHTRGAVTQGLHQFVNLPTLVTQVLDGDMSQLLQVT"
    )
    
    data = {
        "polymers": [
            {
                "id": "A",
                "molecule_type": "protein",
                "sequence": test_sequence
            }
        ],
        "recycling_steps": 3,
        "sampling_steps": 50,
        "diffusion_samples": 1,
        "step_scale": 1.638,
        "output_format": "mmcif"
    }
    
    print("Starting Boltz-2 performance test...")
    start_time = time.time()
    
    response = requests.post(url, json=data)
    
    end_time = time.time()
    runtime = end_time - start_time
    
    if response.status_code == 200:
        result = response.json()
        confidence = result.get('confidence_scores', [0])[0]
        print("✓ Prediction successful")
        print(f"✓ Runtime: {runtime:.1f} seconds")
        print(f"✓ Confidence score: {confidence:.3f}")
        print(f"✓ Structure format: {result['structures'][0]['format']}")
        return True
    else:
        print(f"✗ Prediction failed: {response.status_code}")
        print(f"✗ Error: {response.text}")
        return False

if __name__ == "__main__":
    test_boltz2_performance()

Expected Performance Baselines#

For the test sequence above (~240 residues), you should expect:

Runtime: 3-8 minutes on 4-8 NVIDIA A100/H100 GPUs
Confidence score: Typically > 0.7 for well-folded proteins
Memory usage: ~4-8 GB GPU memory per prediction
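A small helper can compare a measured run against the runtime and confidence baselines listed above. This is a sketch only: the function name `check_baseline` is hypothetical, and the default limits (8 minutes, confidence above 0.7) are taken from the upper runtime bound and the confidence guideline in this section.

```python
def check_baseline(runtime_seconds, confidence,
                   max_runtime_minutes=8.0, min_confidence=0.7):
    """Return a list of warnings; an empty list means the run is within baseline."""
    warnings = []
    if runtime_seconds > max_runtime_minutes * 60:
        warnings.append(
            f"runtime {runtime_seconds / 60:.1f} min exceeds {max_runtime_minutes:.0f} min"
        )
    if confidence <= min_confidence:
        warnings.append(
            f"confidence {confidence:.3f} is not above {min_confidence}"
        )
    return warnings
```

For example, `check_baseline(300, 0.85)` returns an empty list, while a slow, low-confidence run returns one message per failed check.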

Performance Optimization Tips#

For development/testing: Use faster settings

```
"recycling_steps": 2,
"sampling_steps": 25
```

For production quality: Use higher-quality settings

```
"recycling_steps": 5,
"sampling_steps": 100
```

For batch processing: Submit multiple concurrent requests with default settings
For very large proteins (>1000 residues): Consider domain-based approaches or consult the literature for handling strategies
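The batch-processing tip can be sketched with a thread pool, which suits this workload because each client thread mostly waits on the HTTP response. The endpoint URL is the one used in the test script on this page; the helper names `post_prediction` and `run_batch` and the default of four workers are assumptions, and a sensible worker count depends on how many GPUs the deployment has.

```python
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/biology/mit/boltz2/predict"

def post_prediction(payload):
    """Send one request; returns (status_code, parsed JSON or None on failure)."""
    import requests  # imported lazily so run_batch can be exercised without it
    response = requests.post(URL, json=payload)
    return response.status_code, (response.json() if response.ok else None)

def run_batch(payloads, max_workers=4, send=post_prediction):
    """Submit payloads concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `payloads` in its results.
        return list(pool.map(send, payloads))
```

The `send` parameter exists so the batching logic can be tested with a stand-in function before pointing it at a running NIM.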

Troubleshooting Performance Issues#

Common Issues and Solutions#

Out-of-memory errors: Reduce sequence length, decrease sampling steps, or send fewer concurrent requests
Slow performance: Ensure fast storage (NVMe SSD), sufficient CPU cores (12+ per GPU), and adequate system RAM (32+ GB per GPU)
Poor quality predictions: Increase sampling steps, recycling steps, or check input sequence quality
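The CPU and RAM guidance above (12+ cores and 32+ GB of system RAM per GPU) can be checked before starting a deployment. The sketch below is a hypothetical helper, `check_host_resources`; it takes the GPU count and RAM as arguments rather than detecting them, because detection methods vary by platform.

```python
import os

def check_host_resources(num_gpus, cpu_cores=None, ram_gb=None):
    """Compare host CPU and RAM with the per-GPU guidance; returns warnings."""
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 0  # os.cpu_count() may return None
    warnings = []
    if cpu_cores < 12 * num_gpus:
        warnings.append(
            f"{cpu_cores} CPU cores for {num_gpus} GPU(s); guidance is {12 * num_gpus}+"
        )
    if ram_gb is not None and ram_gb < 32 * num_gpus:
        warnings.append(
            f"{ram_gb} GB RAM for {num_gpus} GPU(s); guidance is {32 * num_gpus}+ GB"
        )
    return warnings
```

An empty list means the host meets both thresholds; storage speed is not covered by this check.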

Note

For detailed performance tuning guidance specific to your deployment, refer to the Optimization section.