Deploy Using Docker

This section describes how to deploy using Docker.

Note

This page assumes you have prepared your environment as detailed in the Getting Started section (see the sections on setting up Docker, NGC CLI, and NGC registry access).

  1. Pull the Boltz-2 NIM container with the following command.

docker pull nvcr.io/nim/mit/boltz2:1.0.0

  2. Run the NIM container with the following command.

export LOCAL_NIM_CACHE=~/.cache/nim
export NGC_API_KEY=<Your NGC API Key>

docker run --rm --name boltz2 --runtime=nvidia \
    -e NGC_API_KEY \
    -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
    -p 8000:8000 \
    nvcr.io/nim/mit/boltz2:1.0.0

This command starts the NIM container and exposes port 8000 so that you can interact with the NIM. On first launch, it downloads the model to the cache at $LOCAL_NIM_CACHE on the local filesystem. Note: This download can take a long time (2-10 hours on a 100+ Mbps internet connection).
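Before launching the container, it can help to confirm that the API key is set and that the host cache directory exists and is writable. The following is a minimal Python sketch of such a pre-flight check. The function name `check_nim_prerequisites` is hypothetical and is not part of the NIM tooling; only the `NGC_API_KEY` variable and the cache path come from the command above.

```python
import os
from pathlib import Path


def check_nim_prerequisites(cache_dir: str, env: dict) -> list:
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration; not part of the NIM tooling.
    """
    problems = []

    # The docker run command forwards NGC_API_KEY from the host environment.
    if not env.get("NGC_API_KEY"):
        problems.append("NGC_API_KEY is not set")

    # Create the host directory that the command mounts at /opt/nim/.cache.
    path = Path(cache_dir).expanduser()
    path.mkdir(parents=True, exist_ok=True)
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable")

    return problems
```

For example, `check_nim_prerequisites("~/.cache/nim", dict(os.environ))` returns an empty list when both conditions hold.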

  3. Open a new terminal, leaving the current terminal open with the launched service.

  4. In the new terminal, wait until the health check endpoint returns an HTTP 200 response before proceeding. This may take a couple of minutes. You can use the following command to query the health check.

curl -X 'GET' \
    'http://localhost:8000/v1/health/ready' \
    -H 'accept: application/json'

To check the NIM’s status using Python, install the requests module with pip install requests, then run the following:

import requests

url = "http://localhost:8000/v1/health/ready"  # Replace with the actual URL

headers = {
    "accept": "application/json"
}
try:
    # A timeout keeps the check from hanging while the service is still starting
    response = requests.get(url, headers=headers, timeout=10)

    # Check if the request was successful
    if response.ok:
        print("Request succeeded:", response.json())
    else:
        print("Request failed:", response.status_code, response.text)
except Exception as e:
    # Covers connection errors, timeouts, and non-JSON response bodies
    print("Request failed:", e)
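Because the service may take a while to become ready, the single check above can be wrapped in a polling loop. The following is a minimal sketch that uses only the Python standard library. The function names, the default timeout, and the polling interval are assumptions chosen for illustration, not values recommended by the NIM.

```python
import time
import urllib.error
import urllib.request


def is_ready(url: str = "http://localhost:8000/v1/health/ready") -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not ready yet.
        return False


def wait_until_ready(probe=is_ready, timeout_s: float = 600.0,
                     interval_s: float = 5.0, sleep=time.sleep) -> bool:
    """Call probe() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` blocks until the endpoint responds or the timeout expires. On a first launch, when the model is still downloading, a much larger `timeout_s` may be needed.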

  5. Run inference to predict a structure from an amino acid sequence and its multiple sequence alignment. The following is an example using cURL in Bash:

#!/bin/bash

# Exit immediately if a command exits with a non-zero status.
set -e

# Define the JSON payload with a quoted here-document so the shell leaves its contents untouched
data=$(cat <<'EOF'
{
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein", 
            "sequence": "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
            "msa": {
                "uniref90": {
                    "a3m": {
                        "alignment": ">seq1\nMKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG\n>seq2\nMKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
                        "format": "a3m"
                    }
                }
            }
        }
    ]
}
EOF
)

# Make the POST request
echo "Making request to local NIM..."

# Perform the POST request with curl and save response to output.json
curl -s -X POST "http://localhost:8000/biology/mit/boltz2/predict" \
    -H "Content-Type: application/json" \
    -d "$data" > output.json

echo "Response saved to output.json"

The following example sends the same request from Python using the requests module:

import requests
import json

url = "http://localhost:8000/biology/mit/boltz2/predict"

headers = {
    "content-type": "application/json"
}

data = {
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein", 
            "sequence": "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
            "msa": {
                "uniref90": {
                    "a3m": {
                        "alignment": ">seq1\nMKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG\n>seq2\nMKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
                        "format": "a3m"
                    }
                }
            }
        }
    ]
}

response = requests.post(url, headers=headers, data=json.dumps(data))

# Check if the request was successful
if response.ok:
    print("Request succeeded:", response.json())
else:
    print("Request failed:", response.status_code, response.text)

  6. View the outputs. You can use the cat tool to print the outputs to the command line:

cat output.json

You can also view the output in a text viewer, or pretty-print it using Python’s built-in JSON module:

python -m json.tool output.json
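The response can be large, so a short summary is sometimes easier to read than the full file. The following is a minimal sketch that lists the top-level keys of a JSON file along with the type and size of each value. It makes no assumption about the field names in the NIM response, and `summarize_json` is a hypothetical helper name.

```python
import json


def summarize_json(path: str) -> dict:
    """Map each top-level key of a JSON object file to a short description."""
    with open(path, "r", encoding="utf-8") as f:
        payload = json.load(f)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")

    summary = {}
    for key, value in payload.items():
        if isinstance(value, (list, dict, str)):
            # Report only the size of containers and strings.
            summary[key] = f"{type(value).__name__} (length {len(value)})"
        else:
            summary[key] = f"{type(value).__name__}: {value!r}"
    return summary
```

For example, `summarize_json("output.json")` returns a dictionary that can be printed one key per line.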

Multiple Sequence Alignment Formatting

The Boltz-2 NIM uses a specific JSON structure for its MSA inputs to enable compatibility with the MSA NIM and other tools. The NIM accepts MSA inputs in the following formats:

| Format | Description |
|--------|-------------|
| A3M | A3M format is a variant of FASTA format that uses lowercase letters to indicate insertions relative to the first sequence. The first sequence in the alignment should be the query sequence. |
| CSV | CSV format requires a header row with columns “key” and “sequence”. Each row should contain an integer key (which may be -1) and the corresponding sequence. |

These formats can be generated using either the Boltz-2 Python package or the ColabFold server (CSV), or by using an MSA search tool of your choice (A3M).
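The A3M conventions described above can be checked before a request is sent. The following is a minimal sketch, assuming that removing the lowercase insertion letters from each aligned sequence should leave a string as long as the query. The function names are hypothetical, and the NIM may apply different or stricter validation.

```python
def parse_a3m(text: str) -> list:
    """Parse A3M text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Lines before the first header (such as comments) are ignored.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def strip_insertions(sequence: str) -> str:
    """Drop lowercase letters, which A3M uses to mark insertions."""
    return "".join(ch for ch in sequence if not ch.islower())


def check_a3m(text: str, query: str) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    records = parse_a3m(text)
    if not records:
        return ["alignment contains no sequences"]
    problems = []
    if records[0][1] != query:
        problems.append("first sequence is not the query sequence")
    for name, seq in records:
        if len(strip_insertions(seq)) != len(query):
            problems.append(f"{name}: aligned length differs from query length")
    return problems
```

For example, `strip_insertions("MKaTV")` gives `"MKTV"`, which has the same length as a four-residue query.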

The NIM accepts MSAs in a dictionary structure that maps database names to MSA formats and their corresponding alignments. The structure is as follows:

{
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein",
            "sequence": "SEQUENCE",
            "msa": {
                "database_name": {           # e.g., "uniref90", "pdb70"
                    "format_name": {         # e.g., "a3m", "csv"
                        "format": "FORMAT",  # e.g., "a3m", "csv"
                        "alignment": "DATA"  # The actual alignment data
                    }
                }
            }
        }
    ]
}

Example with actual data:

{
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein",
            "sequence": "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
            "msa": {
                "uniref90": {
                    "a3m": {
                        "format": "a3m",
                        "alignment": ">seq1\nMKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG\n>seq2\nMKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
                    }
                }
            }
        }
    ]
}
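The nesting shown above is easy to get wrong when payloads are assembled by hand. The following is a minimal sketch of a client-side check that walks the `polymers` list and inspects each MSA entry. It assumes that the inner key (for example, "a3m") should match the value of "format", which holds in both examples above but is not confirmed as a requirement of the NIM. The function name `validate_msa_payload` is hypothetical.

```python
def validate_msa_payload(payload: dict) -> list:
    """Return a list of structural problems; an empty list means none found."""
    problems = []
    for polymer in payload.get("polymers", []):
        pid = polymer.get("id", "?")
        # A polymer without an "msa" entry is skipped.
        for db_name, formats in polymer.get("msa", {}).items():
            for format_key, entry in formats.items():
                where = f"polymer {pid}, {db_name}/{format_key}"
                # Assumption: the inner key mirrors the "format" value.
                if entry.get("format") != format_key:
                    problems.append(f"{where}: 'format' does not match its key")
                if not entry.get("alignment"):
                    problems.append(f"{where}: 'alignment' is empty or missing")
    return problems
```

Running this check on a payload before sending it can catch a misplaced key without waiting for a round trip to the service.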

Note

For very large multiple sequence alignments, we recommend using functions like the following to read them directly from their formats on disk.

import csv  # Imported at module level so load_csv also works when this file is imported


def load_a3m(filename: str, database_name: str = "uniref90") -> dict:
    """
    Load an A3M format multiple sequence alignment from a file.
    
    Args:
        filename: Path to the A3M file
        database_name: Name of the database to use as the outer key (default: "uniref90")
        
    Returns:
        Dictionary containing the MSA data in the format expected by the Boltz2 API
    """
    try:
        with open(filename, 'r') as f:
            # Read the file and remove any null bytes
            content = f.read().replace('\0', '')
            
        # Create the MSA structure
        msa_data = {
            database_name: {
                "a3m": {
                    "format": "a3m",
                    "alignment": content.strip()  # Keep original A3M content
                }
            }
        }
        return msa_data
    except Exception as e:
        # Chain the original exception so the root cause stays in the traceback
        raise RuntimeError(f"Error loading A3M file {filename}: {e}") from e

def load_csv(filename: str, database_name: str = "uniref90") -> dict:
    """
    Load a CSV format multiple sequence alignment from a file.
    
    Args:
        filename: Path to the CSV file
        database_name: Name of the database to use as the outer key (default: "uniref90")
        
    Returns:
        Dictionary containing the MSA data in the format expected by the Boltz2 API
    """
    try:
        with open(filename, 'r') as f:
            # Read the file and remove any null bytes
            content = f.read().replace('\0', '')
            
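        # Parse the header row and check for the required columns
        csv_reader = csv.DictReader(content.splitlines())
        missing = {"key", "sequence"} - set(csv_reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing required columns: {sorted(missing)}")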
        
        # Create the MSA structure with CSV format
        msa_data = {
            database_name: {
                "csv": {
                    "format": "csv",
                    "alignment": content.strip()  # Keep original CSV content
                }
            }
        }
        return msa_data
    except Exception as e:
        # Chain the original exception so the root cause stays in the traceback
        raise RuntimeError(f"Error loading CSV file {filename}: {e}") from e

# Example usage:
if __name__ == "__main__":
    import json
    import csv
    
    # Example 1: Load MSA from A3M file with default database name
    a3m_data = load_a3m("path/to/alignment.a3m")
    
    # Example 2: Load MSA from A3M file with custom database name
    a3m_data_custom = load_a3m("path/to/alignment.a3m", database_name="pdb70")
    
    # Example 3: Load MSA from CSV file with default database name
    csv_data = load_csv("path/to/alignment.csv")
    
    # Example 4: Load MSA from CSV file with custom database name
    csv_data_custom = load_csv("path/to/alignment.csv", database_name="custom_db")
    
    # Example 5: Create request payload with loaded MSA
    request_data = {
        "polymers": [
            {
                "id": "A",
                "molecule_type": "protein",
                "sequence": "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG",
                "msa": a3m_data  # or csv_data, a3m_data_custom, csv_data_custom
            }
        ],
        "recycling_steps": 3,
        "sampling_steps": 50,
        "diffusion_samples": 1,
        "step_scale": 1.638,
        "output_format": "mmcif"
    }
    
    # Print the formatted request data
    print(json.dumps(request_data, indent=2))
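When the aligned sequences are already in memory rather than on disk, the CSV text can be assembled directly. The following is a minimal sketch that writes the "key" and "sequence" columns described earlier and wraps the result in the same dictionary structure that `load_csv` returns. The function name `rows_to_csv_msa` is hypothetical, and the key values in the usage example are placeholders, since this page only states that keys are integers and may be -1.

```python
import csv
import io


def rows_to_csv_msa(rows, database_name: str = "uniref90") -> dict:
    """Build the MSA dictionary from (key, sequence) pairs held in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row with the two required columns.
    writer.writerow(["key", "sequence"])
    for key, sequence in rows:
        writer.writerow([int(key), sequence])
    return {
        database_name: {
            "csv": {
                "format": "csv",
                "alignment": buffer.getvalue().strip(),
            }
        }
    }
```

For example, `rows_to_csv_msa([(-1, "MKTV"), (0, "MK-V")])` produces an alignment string with a header line followed by two data rows.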

Advanced Usage

For advanced usage of the NIM, see the sections on Optimization and Performance.

For details about available endpoints, see the API Reference section.