Optimization with Boltz2-NIM#
This section details the options available for optimizing the Boltz-2 NIM. Note that achieving optimal performance depends on many factors and may require settings unique to your individual deployment.
Note
For most users, the default settings of the NIM will provide good performance that balances the throughput, latency, resource utilization, and complexity of using the NIM. We recommend only changing these options after consulting with an expert about your specific use case and any unique performance requirements it may have.
Automatic Profile Selection#
The Boltz-2 NIM is designed to automatically select the most suitable profile for the detected hardware from the list of available profiles. By default, the NIM will attempt to use TensorRT-LLM and NVIDIA TensorFloat32 (TF32) for inference for maximum performance. If the attached GPU does not support TensorRT-LLM, the NIM will automatically fall back to PyTorch and issue a warning.
Selecting a Profile Manually#
While the NIM automatically selects a profile, you can manually override this selection to use a specific model configuration. This can be useful for ensuring reproducibility or forcing a particular performance characteristic. Use the NIM_MODEL_PROFILE environment variable at startup to specify the desired profile.
# Example: Start the NIM with a specific profile
export NIM_MODEL_PROFILE=<profile_name>
docker run --rm --name boltz2 --runtime=nvidia \
    -e NGC_API_KEY \
    -e NIM_MODEL_PROFILE \
    -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
    -p 8000:8000 \
    nvcr.io/nim/mit/boltz2:1.0.0
Note
Refer to the Support Matrix for a list of available profiles and their characteristics.
Enabling or Disabling TensorRT and TensorFloat32#
For fine-grained control over performance, you can enable or disable specific acceleration features using the following environment variables.
NIM_BOLTZ_PAIRFORMER_BACKEND#
Default: trt
Values: trt, torch
Description: Sets the backend for the pairformer module. trt uses the TensorRT backend for maximum inference performance. torch uses the native PyTorch backend, which serves as a fallback and may be useful for debugging.
NIM_BOLTZ_ENABLE_DIFFUSION_TF32#
Default: 1
Values: 1 (enabled), 0 (disabled)
Description: Enables (1) or disables (0) TensorFloat32 precision for the diffusion model. Enabling TF32 provides a significant performance boost on NVIDIA Ampere and newer GPUs with minimal impact on accuracy. Disabling it can provide more deterministic numerical results.
Usage Example#
The following example starts the NIM with TF32 disabled and the pairformer backend set to PyTorch.
# Example: Disable TF32 and use the PyTorch backend for the pairformer
export NIM_BOLTZ_ENABLE_DIFFUSION_TF32=0
export NIM_BOLTZ_PAIRFORMER_BACKEND=torch
docker run --rm --name boltz2 --runtime=nvidia \
    -e NGC_API_KEY \
    -e NIM_BOLTZ_ENABLE_DIFFUSION_TF32 \
    -e NIM_BOLTZ_PAIRFORMER_BACKEND \
    -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
    -p 8000:8000 \
    nvcr.io/nim/mit/boltz2:1.0.0
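Because both settings accept only a small, fixed set of values, a launcher script can check them before the container starts. The following is a minimal sketch and not part of the NIM: the helper name build_docker_env_args is hypothetical, and the allowed values are copied from the descriptions in this section.

```python
# Hypothetical helper (not part of the NIM): validate the backend and TF32
# settings, then build the matching `docker run` -e flags.
ALLOWED_VALUES = {
    "NIM_BOLTZ_PAIRFORMER_BACKEND": {"trt", "torch"},
    "NIM_BOLTZ_ENABLE_DIFFUSION_TF32": {"0", "1"},
}


def build_docker_env_args(settings):
    """Return a list such as ['-e', 'NAME=value', ...] after validating each setting."""
    args = []
    for name, value in sorted(settings.items()):
        if name not in ALLOWED_VALUES:
            raise ValueError(f"Unknown setting: {name}")
        value = str(value)
        if value not in ALLOWED_VALUES[name]:
            allowed = ", ".join(sorted(ALLOWED_VALUES[name]))
            raise ValueError(f"{name} must be one of: {allowed} (got {value!r})")
        args += ["-e", f"{name}={value}"]
    return args


if __name__ == "__main__":
    flags = build_docker_env_args({
        "NIM_BOLTZ_ENABLE_DIFFUSION_TF32": 0,
        "NIM_BOLTZ_PAIRFORMER_BACKEND": "torch",
    })
    print(" ".join(flags))
```

The returned flags can be spliced into the docker run command shown above. Passing NAME=value to -e sets the variable directly, so no prior export is needed.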
Deploying the NIM on a Multi-GPU System#
The Boltz-2 NIM is designed to run with one or more NVIDIA GPUs. When you allocate more GPUs to the NIM, also increase the number of CPU cores and the amount of RAM allocated to it. As a rule of thumb, allocate 12 more CPU cores and 32 GB more system RAM for each additional GPU.
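The rule of thumb is simple arithmetic, sketched below. The helper name is hypothetical, and the result is only the increment over a single-GPU deployment, because this section does not state a single-GPU baseline.

```python
# Rule-of-thumb values from this section; the helper name is hypothetical.
CPU_CORES_PER_EXTRA_GPU = 12
RAM_GB_PER_EXTRA_GPU = 32


def extra_host_resources(num_gpus):
    """Return (extra_cpu_cores, extra_ram_gb) beyond a single-GPU deployment."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    extra_gpus = num_gpus - 1
    return extra_gpus * CPU_CORES_PER_EXTRA_GPU, extra_gpus * RAM_GB_PER_EXTRA_GPU


if __name__ == "__main__":
    for gpus in (1, 2, 4, 8):
        cores, ram_gb = extra_host_resources(gpus)
        print(f"{gpus} GPU(s): +{cores} CPU cores, +{ram_gb} GB RAM")
```

For example, a four-GPU deployment has three additional GPUs, which suggests 36 more CPU cores and 96 GB more RAM than the single-GPU allocation.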
Adjusting Start-Time NIM Input Limits#
The Boltz-2 NIM can be configured at startup time using environment variables to control input limits and resource usage. These settings help prevent excessively large requests that could impact performance or cause out-of-memory errors.
Environment Variables#
NIM_MAX_POLYMER_INPUTS#
Default: 12
Type: Integer
Description: Sets the maximum number of polymer chains (DNA, RNA, or protein) that can be included in a single prediction request.
NIM_MAX_LIGAND_INPUTS#
Default: 20
Type: Integer
Description: Sets the maximum number of ligands that can be included in a single prediction request.
NIM_MAX_POLYMER_LENGTH#
Default: 4096
Type: Integer
Description: Sets the maximum allowed length for individual polymer sequences (number of residues/nucleotides).
Usage#
Set these environment variables before starting the NIM to customize the input limits:
export NIM_MAX_POLYMER_INPUTS=8
export NIM_MAX_LIGAND_INPUTS=15
export NIM_MAX_POLYMER_LENGTH=2048

# Start the NIM with custom limits
docker run --rm --name boltz2-nim --runtime=nvidia \
    -e NGC_API_KEY \
    -e NIM_MAX_POLYMER_INPUTS \
    -e NIM_MAX_LIGAND_INPUTS \
    -e NIM_MAX_POLYMER_LENGTH \
    -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
    -p 8000:8000 \
    nvcr.io/nim/mit/boltz2:1.0.0
Note
Increasing these values may cause runtime instability in the NIM, especially with regard to memory usage. These limits are applied at startup and cannot be changed without restarting the NIM. Choose values that balance your performance requirements with the computational resources available to your deployment.
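Since the limits are fixed at startup, a client can check a request against them before sending it. The following is a minimal sketch under stated assumptions: the helper name is hypothetical, the "ligands" request field is assumed, and the defaults are the values listed in this section. It is a client-side convenience and does not describe how the NIM enforces its limits.

```python
# Defaults listed in this section; pass different values if the NIM was started
# with custom limits. The helper name and the "ligands" field are assumptions.
DEFAULT_LIMITS = {
    "max_polymers": 12,          # NIM_MAX_POLYMER_INPUTS
    "max_ligands": 20,           # NIM_MAX_LIGAND_INPUTS
    "max_polymer_length": 4096,  # NIM_MAX_POLYMER_LENGTH
}


def find_limit_violations(payload, limits=DEFAULT_LIMITS):
    """Return a list of messages describing which limits the payload exceeds."""
    problems = []
    polymers = payload.get("polymers", [])
    ligands = payload.get("ligands", [])  # Assumed request field for ligand inputs.
    if len(polymers) > limits["max_polymers"]:
        problems.append(
            f"{len(polymers)} polymers exceeds the limit of {limits['max_polymers']}"
        )
    if len(ligands) > limits["max_ligands"]:
        problems.append(
            f"{len(ligands)} ligands exceeds the limit of {limits['max_ligands']}"
        )
    for polymer in polymers:
        length = len(polymer.get("sequence", ""))
        if length > limits["max_polymer_length"]:
            problems.append(
                f"polymer {polymer.get('id', '?')} has length {length}, "
                f"limit is {limits['max_polymer_length']}"
            )
    return problems


if __name__ == "__main__":
    request = {"polymers": [{"id": "A", "molecule_type": "protein", "sequence": "M" * 5000}]}
    for message in find_limit_violations(request):
        print(message)
```

An empty list means the payload is within the limits that were passed in.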
Optimization Parameters#
The Boltz-2 NIM provides several parameters that can be tuned to optimize performance for your specific use case:
Recycling Steps#
The recycling_steps parameter (range: 1-6, default: 3) controls the number of iterative refinement steps. Higher values generally improve accuracy but increase computation time.
Sampling Steps#
The sampling_steps parameter (range: 10-1,000, default: 50) controls the number of diffusion sampling steps. More steps can improve quality but significantly increase runtime.
Diffusion Samples#
The diffusion_samples parameter (range: 1-5, default: 1) controls how many independent structure predictions are generated. Multiple samples provide diversity but multiply the computational cost.
Step Scale#
The step_scale parameter (range: 0.5-5.0, default: 1.638) affects the sampling temperature. Lower values increase diversity among samples, while higher values may improve convergence.
Note
These parameters offer a tradeoff between prediction quality and computational cost. For production workloads, consider starting with default values and adjusting based on your specific quality and latency requirements.
The following examples highlight how to use the various optimization parameters.
import requests

# Higher recycling_steps for improved accuracy
payload = {
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein",
            "sequence": "YOUR_PROTEIN_SEQUENCE_HERE"
        }
    ],
    "recycling_steps": 5  # Default is 3. Higher values may improve accuracy.
}
response = requests.post(
    "http://localhost:8000/biology/mit/boltz2/predict",
    json=payload
)
response.raise_for_status()
print(response.json())

import requests

# Higher sampling_steps for improved quality
payload = {
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein",
            "sequence": "YOUR_PROTEIN_SEQUENCE_HERE"
        }
    ],
    "sampling_steps": 100  # Default is 50. More steps can improve quality but increase runtime.
}
response = requests.post(
    "http://localhost:8000/biology/mit/boltz2/predict",
    json=payload
)
response.raise_for_status()
print(response.json())

import requests

# Generate multiple distinct structures
payload = {
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein",
            "sequence": "YOUR_PROTEIN_SEQUENCE_HERE"
        }
    ],
    "diffusion_samples": 3  # Default is 1. Generates 3 candidate structures.
}
response = requests.post(
    "http://localhost:8000/biology/mit/boltz2/predict",
    json=payload
)
response.raise_for_status()
print(response.json())

import requests

# Lower step_scale for more diversity among samples
payload = {
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein",
            "sequence": "YOUR_PROTEIN_SEQUENCE_HERE"
        }
    ],
    "step_scale": 1.5  # Default is 1.638. Lower values increase diversity.
}
response = requests.post(
    "http://localhost:8000/biology/mit/boltz2/predict",
    json=payload
)
response.raise_for_status()
print(response.json())
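Each parameter has a documented range, so a client can reject out-of-range values before a request is sent. The following is a minimal sketch: the helper name check_parameters is hypothetical, and the ranges are copied from the parameter descriptions in this section. It does not describe how the NIM itself validates requests.

```python
# Ranges as documented in this section; the helper name is hypothetical.
PARAMETER_RANGES = {
    "recycling_steps": (1, 6),
    "sampling_steps": (10, 1000),
    "diffusion_samples": (1, 5),
    "step_scale": (0.5, 5.0),
}


def check_parameters(payload):
    """Raise ValueError if any tuning parameter in the payload is out of range."""
    for name, (low, high) in PARAMETER_RANGES.items():
        if name not in payload:
            continue  # Omitted parameters are left for the NIM to default.
        value = payload[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the range {low}-{high}")
    return payload


if __name__ == "__main__":
    payload = check_parameters({
        "polymers": [
            {"id": "A", "molecule_type": "protein", "sequence": "YOUR_PROTEIN_SEQUENCE_HERE"}
        ],
        "recycling_steps": 5,
        "diffusion_samples": 3,
    })
    print("Parameters are within the documented ranges:", sorted(payload))
```

Because the function returns the payload unchanged, it can wrap the dictionary passed to requests.post in the examples above.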
Querying the NIM Repeatedly#
In computational biology and drug discovery, it is common to analyze hundreds, thousands, or even millions of protein sequences to find candidates with the desired properties. This section details some useful patterns for users who want to analyze more than one input sequence.
Note
The Boltz-2 NIM has been optimized for a balance of throughput and latency, and the performance of repeated queries may not be in line with published benchmarks due to the complexity of scheduling such workloads in the NIM. Factors such as the underlying hardware and software stack, the number of concurrent users, and system load may affect the latency and throughput of the NIM.
Running Repeated Queries Against the NIM Serially#
Repeated queries against the NIM can be submitted as consecutive requests. For example, requests from a list of sequences can be submitted using a for loop. Each request is blocking, which means that at most one request from a single submitter runs at a time when calling the NIM in this manner. The following is an example of submitting multiple requests to the NIM:
Note
In the following examples, we use small multiple sequence alignments to demonstrate the usage of the API. Actual multiple sequence alignments may be much larger.
import json

import requests


def main():
    url = "http://localhost:8000/biology/mit/boltz2/predict"
    proteins = [
        {
            "name": "Green Fluorescent Protein (GFP)",
            "sequence": (
                "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
                "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQ"
            ),
            "msa": {
                "uniref90": {
                    "a3m": {
                        "format": "a3m",
                        "alignment": (
                            ">seq1\nMSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
                            "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQ\n"
                            ">seq2\nMSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
                            "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQ"
                        )
                    }
                }
            }
        },
        {
            "name": "Tumor Protein p53",
            "sequence": (
                "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEA"
                "APPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLA"
                "KTCPVQLWVDSTPPPGTRVRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLD"
                "DPSKYLQW"
            ),
            "msa": {
                "uniref90": {
                    "a3m": {
                        "format": "a3m",
                        "alignment": (
                            ">seq1\nMEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEA"
                            "APPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLA"
                            "KTCPVQLWVDSTPPPGTRVRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLD"
                            "DPSKYLQW\n"
                            ">seq2\nMEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEA"
                            "APPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLA"
                            "KTCPVQLWVDSTPPPGTRVRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLD"
                            "DPSKYLQW"
                        )
                    }
                }
            }
        },
        {
            "name": "Lactose Operon Repressor (LacI)",
            "sequence": (
                "MKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQNLKDGDPTR"
                "ADKKSIEYSASVSRQQSYSIKKNLIDQFEAQKPSLTGMSADSQIGQVTKDAQAMIKAIGVNLLQFPRQ"
                "SPGDLEQGVNLTPCTLNTVTQTSLSVRGDKLIAEIGDKVAASEN"
            ),
            "msa": {
                "uniref90": {
                    "a3m": {
                        "format": "a3m",
                        "alignment": (
                            ">seq1\nMKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQNLKDGDPTR"
                            "ADKKSIEYSASVSRQQSYSIKKNLIDQFEAQKPSLTGMSADSQIGQVTKDAQAMIKAIGVNLLQFPRQ"
                            "SPGDLEQGVNLTPCTLNTVTQTSLSVRGDKLIAEIGDKVAASEN\n"
                            ">seq2\nMKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQNLKDGDPTR"
                            "ADKKSIEYSASVSRQQSYSIKKNLIDQFEAQKPSLTGMSADSQIGQVTKDAQAMIKAIGVNLLQFPRQ"
                            "SPGDLEQGVNLTPCTLNTVTQTSLSVRGDKLIAEIGDKVAASEN"
                        )
                    }
                }
            }
        },
        {
            "name": "Bovine Serum Albumin (BSA)",
            "sequence": (
                "MKWVTFISLLFLFSSAYSRGVFRRDTHKSEIAHRFKDLGEENFKALVLIAFAQYLQQCPFDEHVKLVNE"
                "GTKPVETVTKLVTDLTKVHTECCHGDLLECADDRADLAKYICDNQDTISSKLKECCDKPLLEKSHCIAE"
                "VFCKYKEHKEMPFPKCCETSLVNRRPCFSALTPDETYVPKAFDEKLFTFHADICTLPDTEKQIKKQTAL"
                "VELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL"
            ),
            "msa": {
                "uniref90": {
                    "a3m": {
                        "format": "a3m",
                        "alignment": (
                            ">seq1\nMKWVTFISLLFLFSSAYSRGVFRRDTHKSEIAHRFKDLGEENFKALVLIAFAQYLQQCPFDEHVKLVNE"
                            "GTKPVETVTKLVTDLTKVHTECCHGDLLECADDRADLAKYICDNQDTISSKLKECCDKPLLEKSHCIAE"
                            "VFCKYKEHKEMPFPKCCETSLVNRRPCFSALTPDETYVPKAFDEKLFTFHADICTLPDTEKQIKKQTAL"
                            "VELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL\n"
                            ">seq2\nMKWVTFISLLFLFSSAYSRGVFRRDTHKSEIAHRFKDLGEENFKALVLIAFAQYLQQCPFDEHVKLVNE"
                            "GTKPVETVTKLVTDLTKVHTECCHGDLLECADDRADLAKYICDNQDTISSKLKECCDKPLLEKSHCIAE"
                            "VFCKYKEHKEMPFPKCCETSLVNRRPCFSALTPDETYVPKAFDEKLFTFHADICTLPDTEKQIKKQTAL"
                            "VELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL"
                        )
                    }
                }
            }
        }
    ]
    responses = []
    # For each protein, submit the request and store the response
    for protein in proteins:
        data = {
            "polymers": [
                {
                    "id": "A",
                    "molecule_type": "protein",
                    "sequence": protein["sequence"],
                    "msa": protein["msa"]
                }
            ],
            "recycling_steps": 3,
            "sampling_steps": 50,
            "diffusion_samples": 1,
            "step_scale": 1.638,
            "output_format": "mmcif"
        }
        response = None  # Stays None if the request itself fails
        response_data = None  # Stays None if the body is not valid JSON
        try:
            # Use the 'json' parameter instead of 'data' so the payload is sent as JSON
            response = requests.post(url, json=data)
            # Attempt to parse the JSON response
            response_data = response.json()
            if response.ok:
                print(f"Structure prediction for {protein['name']} succeeded.")
            else:
                print(f"Structure prediction for {protein['name']} failed: {response.status_code} {response.text}")
        except json.JSONDecodeError:
            # Catch JSON parsing errors. This handler comes first because recent
            # versions of requests raise a JSON error that is also a RequestException.
            print(f"Response for {protein['name']} could not be decoded as JSON.")
        except requests.exceptions.RequestException as req_err:
            # Catch any other request-related errors (connection, timeout, and so on)
            print(f"Structure prediction for {protein['name']} failed: {req_err}")
        # Store the response along with the protein name and sequence.
        # Compare against None explicitly: a Response object is falsy for 4xx/5xx statuses.
        responses.append({
            "protein": protein["name"],
            "sequence": protein["sequence"],
            "response": response_data,
            "status_code": response.status_code if response is not None else None,
            "text": response.text if response is not None else None
        })
    # Print the responses
    for res in responses:
        print(f"Protein: {res['protein']}")
        print(f"Status Code: {res['status_code']}")
        if res['response']:
            print("Response Data:", json.dumps(res['response'], indent=2))
        else:
            print("No response data available.")
        print("-" * 40)


if __name__ == "__main__":
    main()
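When planning a larger serial run, it can help to record how long each blocking request takes. The following is a minimal sketch of that pattern, not an NIM feature: the helper run_serially is hypothetical and accepts any send callable, so it can be tried without a running NIM. The stand-in sender and the timeout value mentioned in the comments are assumptions for illustration.

```python
import time


def run_serially(payloads, send):
    """Call send(payload) for each payload in order; record outcome and wall time."""
    results = []
    for index, payload in enumerate(payloads):
        start = time.perf_counter()
        try:
            outcome = {"index": index, "ok": True, "result": send(payload)}
        except Exception as err:  # Keep going so one failure does not stop the batch.
            outcome = {"index": index, "ok": False, "error": str(err)}
        outcome["seconds"] = time.perf_counter() - start
        results.append(outcome)
    return results


if __name__ == "__main__":
    # Stand-in sender for demonstration only. To call the NIM, replace it with a
    # function that posts the payload, for example (timeout value is an assumption):
    #   lambda p: requests.post(url, json=p, timeout=600).json()
    def fake_send(payload):
        if not payload.get("polymers"):
            raise ValueError("no polymers in payload")
        return {"received": len(payload["polymers"])}

    for item in run_serially([{"polymers": [{"id": "A"}]}, {}], fake_send):
        status = "ok" if item["ok"] else f"failed: {item['error']}"
        print(f"request {item['index']}: {status} ({item['seconds']:.3f} s)")
```

The per-request timings give a rough estimate of how long a full list would take when submitted one request at a time.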