Optimization with Boltz-2 NIM

This section details the options available for optimizing the Boltz-2 NIM. Note that achieving optimal performance depends on many factors and may require settings unique to your individual deployment.

Note

For most users, the default settings of the NIM provide good performance that balances throughput, latency, resource utilization, and ease of use. We recommend changing these options only after consulting with an expert about your specific use case and any unique performance requirements it may have.

Automatic Profile Selection

The Boltz-2 NIM is designed to automatically select the most suitable profile for the detected hardware from the list of available profiles. By default, the NIM attempts to use TensorRT and NVIDIA TensorFloat32 (TF32) for inference to maximize performance. If the attached GPU does not support TensorRT, the NIM automatically falls back to PyTorch and issues a warning.

Selecting a Profile Manually

While the NIM automatically selects a profile, you can manually override this selection to use a specific model configuration. This can be useful for ensuring reproducibility or forcing a particular performance characteristic. Use the NIM_MODEL_PROFILE environment variable at startup to specify the desired profile.

# Example: Start the NIM with a specific profile
export NIM_MODEL_PROFILE=<profile_name>

docker run --rm --name boltz2 --runtime=nvidia \
-e NGC_API_KEY \
-e NIM_MODEL_PROFILE \
-v $LOCAL_NIM_CACHE:/opt/nim/.cache \
-p 8000:8000 \
nvcr.io/nim/mit/boltz2:1.0.0

Note

Refer to the Support Matrix for a list of available profiles and their characteristics.

Enabling or Disabling TensorRT and TensorFloat32

For fine-grained control over performance, you can enable or disable specific acceleration features using the following environment variables.

NIM_BOLTZ_PAIRFORMER_BACKEND

  • Default: trt

  • Values: trt, torch

  • Description: Sets the backend for the pairformer module. trt uses the TensorRT backend for maximum inference performance. torch uses the native PyTorch backend, which serves as a fallback and may be useful for debugging.

NIM_BOLTZ_ENABLE_DIFFUSION_TF32

  • Default: 1

  • Values: 1 (enabled), 0 (disabled)

  • Description: Enables (1) or disables (0) TensorFloat32 precision for the diffusion model. Enabling TF32 provides a significant performance boost on NVIDIA Ampere and newer GPUs with minimal impact on accuracy. Disabling it can provide more deterministic numerical results.
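To make the precision trade-off more concrete: TF32 keeps FP32's 8-bit exponent but only 10 explicit mantissa bits, so each input value is rounded with a relative error of at most about 2^-11 (roughly 0.05%) for normal numbers. The following pure-Python sketch emulates that rounding step on a single value. It is an illustration only, not code from the NIM: `round_to_tf32` is a hypothetical helper, it assumes finite inputs within the FP32 range, and it models only how inputs are rounded, whereas on the GPU the Tensor Cores multiply TF32 inputs and accumulate the results in FP32.

```python
import struct


def round_to_tf32(x: float) -> float:
    """Round x to the nearest TF32 value (round half to even), returned as a Python float.

    TF32 keeps FP32's sign bit and 8 exponent bits but only the top 10 of FP32's
    23 mantissa bits, so the 13 low-order bits are rounded away. Intended for
    finite inputs within the FP32 range; NaN payloads are not handled.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))  # FP32 bit pattern of x
    dropped = bits & 0x1FFF                              # the 13 bits TF32 discards
    bits &= 0xFFFFE000                                   # truncate toward zero
    half = 0x1000                                        # half of one TF32 ulp
    if dropped > half or (dropped == half and bits & 0x2000):
        # Round the magnitude up by one TF32 ulp; a carry correctly bumps the exponent.
        bits += 0x2000
    (rounded,) = struct.unpack("<f", struct.pack("<I", bits))
    return rounded
```

For example, `round_to_tf32(0.1)` returns 0.0999755859375, a relative error of about 2.4e-4, which illustrates why TF32 usually has little effect on accuracy while still differing from full FP32 results.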

Usage Example

The following example starts the NIM with TF32 disabled and the pairformer backend set to PyTorch.

# Example: Disable TF32 and use the PyTorch backend for the pairformer
export NIM_BOLTZ_ENABLE_DIFFUSION_TF32=0
export NIM_BOLTZ_PAIRFORMER_BACKEND=torch

docker run --rm --name boltz2 --runtime=nvidia \
-e NGC_API_KEY \
-e NIM_BOLTZ_ENABLE_DIFFUSION_TF32 \
-e NIM_BOLTZ_PAIRFORMER_BACKEND \
-v $LOCAL_NIM_CACHE:/opt/nim/.cache \
-p 8000:8000 \
nvcr.io/nim/mit/boltz2:1.0.0
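If you start the container from a Python script rather than a shell, the same overrides can be checked before launch. The sketch below builds a `docker run` argument list equivalent to the commands above, but passes each override inline as `-e NAME=value` instead of exporting it first; both forms are standard Docker syntax. The helper name `build_docker_command` and its parameters are hypothetical, and only the values documented on this page are validated; other variables, such as NIM_MODEL_PROFILE, are passed through unchecked.

```python
from typing import Dict, List, Union

IMAGE = "nvcr.io/nim/mit/boltz2:1.0.0"

# Values documented for the acceleration switches; other variables pass through unchecked.
ALLOWED_VALUES = {
    "NIM_BOLTZ_PAIRFORMER_BACKEND": {"trt", "torch"},
    "NIM_BOLTZ_ENABLE_DIFFUSION_TF32": {"0", "1"},
}


def build_docker_command(
    overrides: Dict[str, Union[str, int]],
    cache_dir: str,
    image: str = IMAGE,
    name: str = "boltz2",
    host_port: int = 8000,
) -> List[str]:
    """Return a `docker run` argument list that sets each override with `-e NAME=value`."""
    cmd = ["docker", "run", "--rm", "--name", name, "--runtime=nvidia",
           "-e", "NGC_API_KEY"]  # value is taken from the calling environment
    for var, value in overrides.items():
        value = str(value)
        allowed = ALLOWED_VALUES.get(var)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{var}={value!r}; expected one of {sorted(allowed)}")
        cmd += ["-e", f"{var}={value}"]
    # Mount the model cache and map the chosen host port to the NIM's port 8000.
    cmd += ["-v", f"{cache_dir}:/opt/nim/.cache", "-p", f"{host_port}:8000", image]
    return cmd
```

To launch, pass the returned list to `subprocess.run(cmd, check=True)`; `shlex.join(cmd)` renders it as a shell command you can copy and paste.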

Deploying the NIM on a Multi-GPU System

The Boltz-2 NIM is designed to run with one or more NVIDIA GPUs. When you increase the number of GPUs allocated to the NIM, we recommend also increasing the number of CPU cores and the amount of RAM allocated to it. As a rule of thumb, allocate an additional 12 CPU cores and 32 GB of system RAM for each additional GPU.
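The rule of thumb above is simple arithmetic, sketched here as a hypothetical helper. The single-GPU baseline (`base_cpus`, `base_ram_gb`) is an assumption you supply for your own deployment; this page does not specify it.

```python
from typing import Tuple


def recommended_host_resources(num_gpus: int, base_cpus: int, base_ram_gb: int) -> Tuple[int, int]:
    """Scale CPU cores and RAM with the GPU count using the rule of thumb above.

    base_cpus and base_ram_gb are what you would allocate for a single-GPU
    deployment; each GPU beyond the first adds 12 CPU cores and 32 GB of RAM.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    extra_gpus = num_gpus - 1
    return base_cpus + 12 * extra_gpus, base_ram_gb + 32 * extra_gpus
```

For example, with an assumed single-GPU baseline of 12 cores and 64 GB, `recommended_host_resources(4, 12, 64)` returns `(48, 160)`, which you could pass to `docker run` as `--cpus=48 --memory=160g`.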

Adjusting Start-Time NIM Input Limits

The Boltz-2 NIM can be configured at startup time using environment variables to control input limits and resource usage. These settings help prevent excessively large requests that could impact performance or cause out-of-memory errors.

Environment Variables

NIM_MAX_POLYMER_INPUTS

  • Default: 12

  • Type: Integer

  • Description: Sets the maximum number of polymer chains (DNA, RNA, or protein) that can be included in a single prediction request.

NIM_MAX_LIGAND_INPUTS

  • Default: 20

  • Type: Integer

  • Description: Sets the maximum number of ligands that can be included in a single prediction request.

NIM_MAX_POLYMER_LENGTH

  • Default: 4096

  • Type: Integer

  • Description: Sets the maximum allowed length for individual polymer sequences (number of residues/nucleotides).

Usage

Set these environment variables before starting the NIM to customize the input limits:

export NIM_MAX_POLYMER_INPUTS=8
export NIM_MAX_LIGAND_INPUTS=15
export NIM_MAX_POLYMER_LENGTH=2048

# Start the NIM with custom limits
docker run --rm --name boltz2-nim --runtime=nvidia \
  -e NGC_API_KEY \
  -e NIM_MAX_POLYMER_INPUTS \
  -e NIM_MAX_LIGAND_INPUTS \
  -e NIM_MAX_POLYMER_LENGTH \
  -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/mit/boltz2:1.0.0

Note

Increasing these values may cause runtime instability in the NIM, particularly because larger inputs increase memory usage. These limits are applied at startup and cannot be changed without restarting the NIM. Choose values that balance your performance requirements with the computational resources available to your deployment.
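Because the NIM uses these limits to turn away oversized requests, a client can check a payload against the same limits before sending it and avoid a wasted round trip. The sketch below assumes the request layout used in the examples on this page (a `polymers` list whose entries carry a `sequence`) and additionally assumes that ligands are sent as a top-level `ligands` list; check the API reference for the exact schema. The client cannot read the server's environment, so pass the values your NIM was started with; the defaults mirror the documented defaults. `find_limit_violations` is a hypothetical helper.

```python
from typing import Any, Dict, List


def find_limit_violations(
    payload: Dict[str, Any],
    max_polymers: int = 12,          # NIM_MAX_POLYMER_INPUTS default
    max_ligands: int = 20,           # NIM_MAX_LIGAND_INPUTS default
    max_polymer_length: int = 4096,  # NIM_MAX_POLYMER_LENGTH default
) -> List[str]:
    """Return reasons why `payload` exceeds the input limits; an empty list means it fits."""
    violations = []
    polymers = payload.get("polymers", [])
    ligands = payload.get("ligands", [])  # assumed field name for ligand inputs
    if len(polymers) > max_polymers:
        violations.append(f"{len(polymers)} polymers exceeds the limit of {max_polymers}")
    if len(ligands) > max_ligands:
        violations.append(f"{len(ligands)} ligands exceeds the limit of {max_ligands}")
    for polymer in polymers:
        length = len(polymer.get("sequence", ""))
        if length > max_polymer_length:
            violations.append(
                f"polymer {polymer.get('id', '?')} has {length} residues, "
                f"above the limit of {max_polymer_length}"
            )
    return violations
```

With the custom limits from the usage example above, you would call `find_limit_violations(payload, max_polymers=8, max_ligands=15, max_polymer_length=2048)`.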

Optimization Parameters

The Boltz-2 NIM provides several parameters that can be tuned to optimize performance for your specific use case:

Recycling Steps

The recycling_steps parameter (range: 1-6, default: 3) controls the number of iterative refinement steps. Higher values generally improve accuracy but increase computation time.

Sampling Steps

The sampling_steps parameter (range: 10-1,000, default: 50) controls the number of diffusion sampling steps. More steps can improve quality but significantly increase runtime.

Diffusion Samples

The diffusion_samples parameter (range: 1-5, default: 1) controls how many independent structure predictions are generated. Multiple samples provide diversity but multiply the computational cost.

Step Scale

The step_scale parameter (range: 0.5-5.0, default: 1.638) affects the sampling temperature. Lower values increase diversity among samples, while higher values may improve convergence.

Note

These parameters offer a tradeoff between prediction quality and computational cost. For production workloads, consider starting with default values and adjusting based on your specific quality and latency requirements.

The following examples highlight how to use the various optimization parameters.

import requests

# Higher recycling_steps for improved accuracy
payload = {
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein",
            "sequence": "YOUR_PROTEIN_SEQUENCE_HERE"
        }
    ],
    "recycling_steps": 5  # Default is 3. Higher values may improve accuracy.
}

response = requests.post(
    "http://localhost:8000/biology/mit/boltz2/predict",
    json=payload
)

response.raise_for_status()
print(response.json())

import requests

# Higher sampling_steps for improved quality
payload = {
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein",
            "sequence": "YOUR_PROTEIN_SEQUENCE_HERE"
        }
    ],
    "sampling_steps": 100  # Default is 50. More steps can improve quality but increase runtime.
}

response = requests.post(
    "http://localhost:8000/biology/mit/boltz2/predict",
    json=payload
)

response.raise_for_status()
print(response.json())

import requests

# Generate multiple distinct structures
payload = {
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein",
            "sequence": "YOUR_PROTEIN_SEQUENCE_HERE"
        }
    ],
    "diffusion_samples": 3  # Default is 1. Generates 3 candidate structures.
}

response = requests.post(
    "http://localhost:8000/biology/mit/boltz2/predict",
    json=payload
)

response.raise_for_status()
print(response.json())

import requests

# Lower step_scale for more diversity among samples
payload = {
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein",
            "sequence": "YOUR_PROTEIN_SEQUENCE_HERE"
        }
    ],
    "step_scale": 1.5  # Default is 1.638. Lower values increase diversity.
}

response = requests.post(
    "http://localhost:8000/biology/mit/boltz2/predict",
    json=payload
)

response.raise_for_status()
print(response.json())
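The four examples above differ only in a single top-level field. A small helper can build the payload for any combination of these parameters and check each value against the documented ranges before the request is sent. `build_prediction_payload` and its argument names are hypothetical; parameters you omit are left out of the payload so that the NIM's defaults apply.

```python
from typing import Any, Dict

# Documented ranges for the optimization parameters (inclusive).
PARAMETER_RANGES = {
    "recycling_steps": (1, 6),
    "sampling_steps": (10, 1000),
    "diffusion_samples": (1, 5),
    "step_scale": (0.5, 5.0),
}
INTEGER_PARAMETERS = {"recycling_steps", "sampling_steps", "diffusion_samples"}


def build_prediction_payload(sequence: str, chain_id: str = "A", **params: Any) -> Dict[str, Any]:
    """Build a single-protein request body, validating any optimization parameters given."""
    for name, value in params.items():
        if name not in PARAMETER_RANGES:
            raise ValueError(f"unknown optimization parameter: {name}")
        if name in INTEGER_PARAMETERS and not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {value!r}")
        low, high = PARAMETER_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the documented range {low}-{high}")
    return {
        "polymers": [{"id": chain_id, "molecule_type": "protein", "sequence": sequence}],
        **params,  # only the parameters the caller set; omitted ones use the NIM defaults
    }
```

For example, `build_prediction_payload("YOUR_PROTEIN_SEQUENCE_HERE", recycling_steps=5, diffusion_samples=3)` produces a payload that can be sent with `requests.post(...)` exactly as in the examples above.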

Querying the NIM Repeatedly

In computational biology and drug discovery, it is common to analyze hundreds, thousands, or even millions of protein sequences to find candidates with the desired properties. This section describes useful patterns for users who want to analyze more than one input sequence.

Note

The Boltz-2 NIM is optimized to balance throughput and latency, so the performance of repeated queries may not match published benchmarks because of the complexity of scheduling such workloads in the NIM. Factors such as the underlying hardware and software stack, the number of concurrent users, and system load can affect the latency and throughput of the NIM.

Running Repeated Queries Against the NIM Serially

Repeated queries against the NIM can be submitted as sequential requests. For example, you can iterate over a list of sequences in a for loop and submit one request per sequence. Because each request blocks until the NIM returns a response, at most one request from a single submitter runs at a time when you call the NIM in this manner. The following is an example of submitting multiple requests to the NIM:

Note

In the following examples, we use small multiple sequence alignments to demonstrate the usage of the API. Actual multiple sequence alignments may be much larger.

import requests
import json

def main():
    url = "http://localhost:8000/biology/mit/boltz2/predict"

    proteins = [
        {
            "name": "Green Fluorescent Protein (GFP)",
            "sequence": (
                "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
                "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQ"
            ),
            "msa": {
                "uniref90": {
                    "a3m": {
                        "format": "a3m",
                        "alignment": (
                            ">seq1\nMSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
                            "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQ\n"
                            ">seq2\nMSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
                            "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQ"
                        )
                    }
                }
            }
        },
        {
            "name": "Tumor Protein p53",
            "sequence": (
                "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEA"
                "APPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLA"
                "KTCPVQLWVDSTPPPGTRVRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLD"
                "DPSKYLQW"
            ),
            "msa": {
                "uniref90": {
                    "a3m": {
                        "format": "a3m",
                        "alignment": (
                            ">seq1\nMEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEA"
                            "APPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLA"
                            "KTCPVQLWVDSTPPPGTRVRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLD"
                            "DPSKYLQW\n"
                            ">seq2\nMEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEA"
                            "APPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLA"
                            "KTCPVQLWVDSTPPPGTRVRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLD"
                            "DPSKYLQW"
                        )
                    }
                }
            }
        },
        {
            "name": "Lactose Operon Repressor (LacI)",
            "sequence": (
                "MKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQNLKDGDPTR"
                "ADKKSIEYSASVSRQQSYSIKKNLIDQFEAQKPSLTGMSADSQIGQVTKDAQAMIKAIGVNLLQFPRQ"
                "SPGDLEQGVNLTPCTLNTVTQTSLSVRGDKLIAEIGDKVAASEN"
            ),
            "msa": {
                "uniref90": {
                    "a3m": {
                        "format": "a3m",
                        "alignment": (
                            ">seq1\nMKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQNLKDGDPTR"
                            "ADKKSIEYSASVSRQQSYSIKKNLIDQFEAQKPSLTGMSADSQIGQVTKDAQAMIKAIGVNLLQFPRQ"
                            "SPGDLEQGVNLTPCTLNTVTQTSLSVRGDKLIAEIGDKVAASEN\n"
                            ">seq2\nMKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQNLKDGDPTR"
                            "ADKKSIEYSASVSRQQSYSIKKNLIDQFEAQKPSLTGMSADSQIGQVTKDAQAMIKAIGVNLLQFPRQ"
                            "SPGDLEQGVNLTPCTLNTVTQTSLSVRGDKLIAEIGDKVAASEN"
                        )
                    }
                }
            }
        },
        {
            "name": "Bovine Serum Albumin (BSA)",
            "sequence": (
                "MKWVTFISLLFLFSSAYSRGVFRRDTHKSEIAHRFKDLGEENFKALVLIAFAQYLQQCPFDEHVKLVNE"
                "GTKPVETVTKLVTDLTKVHTECCHGDLLECADDRADLAKYICDNQDTISSKLKECCDKPLLEKSHCIAE"
                "VFCKYKEHKEMPFPKCCETSLVNRRPCFSALTPDETYVPKAFDEKLFTFHADICTLPDTEKQIKKQTAL"
                "VELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL"
            ),
            "msa": {
                "uniref90": {
                    "a3m": {
                        "format": "a3m",
                        "alignment": (
                            ">seq1\nMKWVTFISLLFLFSSAYSRGVFRRDTHKSEIAHRFKDLGEENFKALVLIAFAQYLQQCPFDEHVKLVNE"
                            "GTKPVETVTKLVTDLTKVHTECCHGDLLECADDRADLAKYICDNQDTISSKLKECCDKPLLEKSHCIAE"
                            "VFCKYKEHKEMPFPKCCETSLVNRRPCFSALTPDETYVPKAFDEKLFTFHADICTLPDTEKQIKKQTAL"
                            "VELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL\n"
                            ">seq2\nMKWVTFISLLFLFSSAYSRGVFRRDTHKSEIAHRFKDLGEENFKALVLIAFAQYLQQCPFDEHVKLVNE"
                            "GTKPVETVTKLVTDLTKVHTECCHGDLLECADDRADLAKYICDNQDTISSKLKECCDKPLLEKSHCIAE"
                            "VFCKYKEHKEMPFPKCCETSLVNRRPCFSALTPDETYVPKAFDEKLFTFHADICTLPDTEKQIKKQTAL"
                            "VELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL"
                        )
                    }
                }
            }
        }
    ]

    responses = []

    # For each protein, submit the request and store the response
    for protein in proteins:
        data = {
            "polymers": [
                {
                    "id": "A",
                    "molecule_type": "protein",
                    "sequence": protein["sequence"],
                    "msa": protein["msa"]
                }
            ],
            "recycling_steps": 3,
            "sampling_steps": 50,
            "diffusion_samples": 1,
            "step_scale": 1.638,
            "output_format": "mmcif"
        }
        response = None  # Initialize response
        response_data = None  # Initialize response_data
        try:
            # Send the payload as the JSON request body
            response = requests.post(url, json=data)

            # Report the outcome before parsing, because error bodies may not be JSON
            if response.ok:
                print(f"Structure prediction for {protein['name']} succeeded.")
            else:
                print(f"Structure prediction for {protein['name']} failed: {response.status_code} {response.text}")

            # Attempt to parse the JSON response
            response_data = response.json()

        except ValueError:
            # response.json() raises a ValueError subclass for non-JSON bodies. Catch it
            # before RequestException, because recent versions of requests raise a JSON
            # error type that also subclasses RequestException.
            print(f"Response for {protein['name']} could not be decoded as JSON.")
        except requests.exceptions.RequestException as req_err:
            # Catch connection errors, timeouts, and other request-related errors
            print(f"Structure prediction for {protein['name']} failed: {req_err}")
        # Store the response along with the protein name and sequence.
        # Compare against None explicitly: a requests.Response is falsy for 4xx/5xx
        # status codes, so `if response` would hide the status of failed requests.
        responses.append({
            "protein": protein["name"],
            "sequence": protein["sequence"],
            "response": response_data,
            "status_code": response.status_code if response is not None else None,
            "text": response.text if response is not None else None
        })

    # Print the responses
    for res in responses:
        print(f"Protein: {res['protein']}")
        print(f"Status Code: {res['status_code']}")
        if res['response']:
            print("Response Data:", json.dumps(res['response'], indent=2))
        else:
            print("No response data available.")
        print("-" * 40)

if __name__ == "__main__":
    main()
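The same serial pattern can be packaged as a reusable function. The sketch below is a hypothetical variant of the loop above, not part of the NIM: it reuses one `requests.Session` so the HTTP connection can be kept alive between consecutive requests, sets a per-request timeout (the 600-second default is an assumption; predictions for large inputs may need longer), and accepts an optional `post` callable so the loop can be exercised without a running NIM. MSAs are omitted for brevity.

```python
from typing import Callable, Dict, Optional

URL = "http://localhost:8000/biology/mit/boltz2/predict"


def predict_serially(
    sequences: Dict[str, str],
    post: Optional[Callable] = None,
    url: str = URL,
    timeout: float = 600.0,  # assumed per-request timeout in seconds
) -> Dict[str, dict]:
    """Submit one prediction per sequence, waiting for each response before sending the next."""
    if post is None:
        import requests  # imported lazily so a stand-in `post` can be injected
        with requests.Session() as session:  # one session reuses pooled connections
            return predict_serially(sequences, post=session.post, url=url, timeout=timeout)

    results = {}
    for name, sequence in sequences.items():
        payload = {"polymers": [{"id": "A", "molecule_type": "protein", "sequence": sequence}]}
        try:
            response = post(url, json=payload, timeout=timeout)
            response.raise_for_status()  # turn 4xx/5xx status codes into exceptions
            results[name] = response.json()
        except (OSError, ValueError) as err:
            # requests' exceptions subclass OSError; invalid JSON raises a ValueError subclass.
            results[name] = {"error": str(err)}
    return results
```

Because `requests.exceptions.RequestException` is a subclass of `OSError`, catching `OSError` covers connection errors, timeouts, and the HTTP errors raised by `raise_for_status()`.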