Example Requests#

Using the OpenFold3 NIM#

OpenFold3 NIM predicts 3D structures of biomolecular complexes including proteins, DNA, RNA, and ligands. The NIM supports multiple prediction modes: protein-only structure prediction, protein-DNA/RNA complexes, protein-ligand interactions, and template-guided predictions.

Try It Out#

Here’s a simple example to get started with protein structure prediction. This script predicts the structure of a small protein and saves the result.

import requests

# NIM endpoint
url = "http://localhost:8000/biology/openfold/openfold3/predict"
headers = {"Content-Type": "application/json"}

# Define protein sequence
protein_sequence = "MKTVRQERLKSIVR"

# Create minimal MSA (just the query sequence)
msa_content = f">query\n{protein_sequence}"

# Build the request
data = {
    "inputs": [{
        "input_id": "my_first_prediction",
        "molecules": [{
            "type": "protein",
            "sequence": protein_sequence,
            "msa": {
                "main": {
                    "a3m": {
                        "alignment": msa_content,
                        "format": "a3m"
                    }
                }
            }
        }],
        "output_format": "pdb"
    }]
}

# Submit prediction request
response = requests.post(url, json=data, headers=headers, timeout=300)

# Extract and save the predicted structure
if response.ok:
    result = response.json()
    structure = result['outputs'][0]['structures_with_scores'][0]['structure']
    
    # Save to file
    with open("predicted_structure.pdb", "w") as f:
        f.write(structure)
    
    print("✓ Prediction complete! Structure saved to predicted_structure.pdb")
else:
    print(f"✗ Prediction failed: {response.status_code} - {response.text}")

Save this script as predict.py and run it:

python predict.py

Try Out Other Prediction Use Cases#

The following examples show how to customize the data field for different prediction scenarios. To try one, replace the data dictionary in the script above with the dictionary from that example.

Protein-DNA Complex#

Use this example for modeling protein-DNA interactions. For double-stranded DNA (dsDNA), both complementary strands must be entered as separate molecules.

# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "protein_dna_complex",
        "molecules": [
            {
                "type": "protein",
                "id": "A",
                "sequence": "MKTVRQERLKSIVR",
                "msa": {
                    "main": {
                        "a3m": {
                            "alignment": ">query\nMKTVRQERLKSIVR",
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "dna",
                "id": "B",
                "sequence": "ATCGATCG"
            },
            {
                "type": "dna",
                "id": "C",
                "sequence": "CGATCGAT"  # Complementary strand (reverse complement of chain B)
            }
        ],
        "output_format": "pdb"
    }]
}
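
In the example above, chain C is the reverse complement of chain B. The following is a minimal sketch of a helper that derives the second strand from the first. It assumes both strands are written in the 5'→3' direction, which this page does not state explicitly; `reverse_complement` is a hypothetical helper name and not part of the NIM API.

```python
# Hypothetical client-side helper; not part of the OpenFold3 NIM API.
_DNA_COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence (A, T, C, G only)."""
    sequence = sequence.upper()
    invalid = set(sequence) - set("ATCG")
    if invalid:
        raise ValueError(f"Unsupported DNA characters: {sorted(invalid)}")
    # Complement every base, then reverse so the result also reads 5'->3'
    # (assumption: the service expects each strand in that orientation).
    return sequence.translate(_DNA_COMPLEMENT)[::-1]


forward_strand = "ATCGATCG"
reverse_strand = reverse_complement(forward_strand)
print(reverse_strand)  # CGATCGAT
```

The two strings can then be used as the `sequence` values of the two `"dna"` molecules.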

Protein-Ligand with SMILES#

Use this example to model protein-ligand interactions when the ligand is given in SMILES notation:

# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "protein_ligand",
        "molecules": [
            {
                "type": "protein",
                "sequence": "MKTVRQERLKSIVR",
                "msa": {
                    "main": {
                        "a3m": {
                            "alignment": ">query\nMKTVRQERLKSIVR",
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "ligand",
                "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
            }
        ],
        "output_format": "pdb"
    }]
}

Protein-Ligand with CCD Code#

Use this example to model protein-ligand interactions when the ligand is given as a Chemical Component Dictionary (CCD) code:

# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "protein_ligand_ccd",
        "molecules": [
            {
                "type": "protein",
                "sequence": "MKTVRQERLKSIVR",
                "msa": {
                    "main": {
                        "a3m": {
                            "alignment": ">query\nMKTVRQERLKSIVR",
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "ligand",
                "ccd_codes": "ATP"  # Adenosine triphosphate
            }
        ],
        "output_format": "pdb"
    }]
}

Protein with Structural Templates#

Use this example for template-guided protein structure prediction, where the template is an experimental or predicted structure:

Note

Structural templates can significantly improve prediction accuracy. For detailed information on template processing, selection strategies, and best practices, refer to Template Processing.

# First, read a template structure from a CIF file
with open("template.cif", "r") as f:
    template_cif_content = f.read()

# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "template_guided_prediction",
        "molecules": [{
            "type": "protein",
            "sequence": "MKTVRQERLKSIVR",
            "msa": {
                "main": {
                    "a3m": {
                        "alignment": ">query\nMKTVRQERLKSIVR",
                        "format": "a3m"
                    }
                }
            },
            "structural_templates": [
                {
                    "structure": template_cif_content,
                    "format": "cif",
                    "name": "template_1"
                }
            ]
        }],
        "output_format": "pdb"
    }]
}

Multi-Chain Protein Complex with MSAs#

Use this example to predict an assembly with multiple protein chains, where each chain has its own MSA and a paired MSA:

# Multi-chain protein sequences
protein1_sequence = "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR"

protein2_sequence = "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"

# MSA for protein 1 with multiple homologs (a3m format)
protein1_msa_a3m = """>101
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>UniRef100_A0A068F5F5
VLSAKDKTNIKTAWGKIGGHAAEYGAEALERMFVVYPTTKTYFPHFDVSHGSAQVKAHGKKVADALTNAVGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLANHIPADFTPAVHASLDKFLASVSTVLTSKYR
>UniRef100_A0A091II63
-LTQAEKAAVVAIWAKVAPQIDAIGAESLERLFFTYPQTKTYFPHFDLSHSSPQLRGHGSKVMNAIGEAVKNLDDLRGALVKLSELHAYILRVDPVNFKLLSHCILCSLAAHYPKDFTPEAHAAWDKFLSSVSSVLTEKYR
>UniRef100_UPI00162A5CD8
VLSPADKTNIKAAWDKVGGNVGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVGDALTNAVAHIDDLPGALSALSDLHAYKLRVDPVNFKLLSHCLLVTLASHLPSDFTPAVHASLDKFLASVSTVLTSKYR"""

# Paired MSA for protein 1 for modeling chain-chain interactions
protein1_paired_msa_a3m = """>101
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>UniRef100_UPI001EFC6C5E
-LTDEQKRLIQKSYAEIDRQSSNFAAIFYDCLFAMAPLIRPMFK-----SERPVFEYHFNELISTAATKVFQFEEIKPRLVVLGRKH-RGYGVTPAQFDVVRSALMLSIQDCLRDACNPAIEQAWSSYYDEIAKVM-----
>UniRef100_UPI0018E7FE8A
-LTEIEKEAITSSFTLINHQEQQFASFFYDCLFDLAPLIKPMFKR-----DRKLIEEHFYMIFCAAVDNIHHLDTIRSTLLELGSRH-RNYGVKVSHFPIVKSALILAIQHELKGQSNTDIENAWSNYYDELAAII-----"""

# MSA for protein 2 with multiple homologs (a3m format)
protein2_msa_a3m = """>102
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>UniRef100_UPI0018F66524
VHLSAEEKSAVNSLWGKVNVEEHGGEALARLLVVYPWTQRFFDSFGNLSSASAILGNPKVKAHGKKVLTSFGDAVKNLDNLKGTFAKLSELHCDKLHVDPENFRLLGNVLVVVLARHFGKDFTPEVQAAWQKLVAGVASALAHKYH
>UniRef100_UPI001CFCA915
VHFTAEEKSTITSLWGKVNVEETGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPKVKAHGKKVLTSLGDAVKNLDNLKGAFSKLSELHCDKLHVDPENFRLLGNVLIVVLAAHFGKEFTPEVQAAWQKLVTGVASALAHKYH
>UniRef100_A0A8C6RZS6
VNFTPEEKSLVTSLWSKVNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSASAIMGNPRVKAHGKKVLTSFGEAVKNMDNLKATFSKLSELHCDKLHVDPENFKLLGNVLVVVLASHFGKEFTPEVQAAWQKLVAGVANALSHKYH"""

# Paired MSA for protein 2 for modeling chain-chain interactions
protein2_paired_msa_a3m = """>102
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>UniRef100_UPI001EFC6C5E
--LTDEQKRLIQKSYAEIDrqSSNFAAIFYDCLFAMAPLIRPMFKS-----------ERPVFEYHFNELISTAATKVFQFEEIKPRLVVLGRKH-RGYGVTPAQFDVVRSALMLSIQDCLRDACNPAIEQAWSSYYDEIAKVM-----
>UniRef100_UPI0018E7FE8A
--LTEIEKEAITSSFTLINHQEqqFASFFYDCLFDLAPLIKPMFKRDRKL-----------IEEHFYMIFCAAVDNIHHLDTIRSTLLELGSRH-RNYGVKVSHFPIVKSALILAIQHELKGQSNTDIENAWSNYYDELAAIILEG--"""

# Replace the 'data' variable with this
data = {
    "request_id": "1a3n",
    "inputs": [{
        "input_id": "1a3n",
        "molecules": [
            {
                "type": "protein",
                "id": ["A", "C"],
                "sequence": protein1_sequence,
                "msa": {
                    "uniref": {
                        "a3m": {
                            "alignment": protein1_msa_a3m,
                            "format": "a3m"
                        }
                    }
                },
                "paired_msa": {
                    "paired": {
                        "a3m": {
                            "alignment": protein1_paired_msa_a3m,
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "protein",
                "id": ["B", "D"],
                "sequence": protein2_sequence,
                "msa": {
                    "uniref": {
                        "a3m": {
                            "alignment": protein2_msa_a3m,
                            "format": "a3m"
                        }
                    }
                },
                "paired_msa": {
                    "paired": {
                        "a3m": {
                            "alignment": protein2_paired_msa_a3m,
                            "format": "a3m"
                        }
                    }
                }
            }
        ],
        "diffusion_samples": 1,
        "output_format": "pdb"
    }]
}

Understanding the API#

This section provides detailed information about the OpenFold3 NIM API structure, requirements, and validation rules.

Endpoint#

POST /biology/openfold/openfold3/predict: Predicts the 3D structure of a biomolecular complex from input sequences

Request Structure#

A complete request consists of:

Request-Level Fields:

request_id (optional): Identifier for the entire request (max 128 characters)
inputs (required): List containing exactly one input specification

Input-Level Fields:

input_id (optional): Unique identifier for this structure prediction (max 128 characters, default: “input_id_0”)
molecules (required): List of molecules to predict as a complex (minimum 1)
diffusion_samples (optional): Number of independent structures to generate (1-5, default: 1)
output_format (optional): Output format - "cif" (default) or "pdb"
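
The request-level and input-level fields above can be assembled and checked on the client before sending. The following is a minimal sketch; `build_request` is a hypothetical helper name, and the checks only mirror the limits listed on this page (the server remains the authority on validation).

```python
from typing import Optional


def build_request(
    molecules: list,
    input_id: str = "input_id_0",
    diffusion_samples: int = 1,
    output_format: str = "cif",
    request_id: Optional[str] = None,
) -> dict:
    """Wrap a list of molecule dicts in the request envelope described above."""
    if not molecules:
        raise ValueError("molecules must contain at least one entry")
    if not 1 <= diffusion_samples <= 5:
        raise ValueError("diffusion_samples must be between 1 and 5")
    if output_format not in ("cif", "pdb"):
        raise ValueError('output_format must be "cif" or "pdb"')
    for name, value in (("input_id", input_id), ("request_id", request_id)):
        if value is not None and len(value) > 128:
            raise ValueError(f"{name} must be at most 128 characters")

    request = {
        # The inputs list must contain exactly one input specification.
        "inputs": [{
            "input_id": input_id,
            "molecules": molecules,
            "diffusion_samples": diffusion_samples,
            "output_format": output_format,
        }]
    }
    if request_id is not None:
        request["request_id"] = request_id
    return request


data = build_request(
    [{"type": "dna", "sequence": "ATCGATCG"}],
    input_id="dna_only",
    output_format="pdb",
)
```

The resulting `data` dictionary has the same shape as the hand-written dictionaries in the examples above.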

Molecule Specification#

Each molecule in the molecules list requires:

Common Fields:

type (required): Must be "protein", "dna", "rna", or "ligand"
id (optional): Chain identifier(s) - single string or list of strings (1-4 alphanumeric characters each)

For Proteins, DNA, RNA:

sequence (required): Amino acid or nucleotide sequence (1-4096 characters)
- Proteins: Standard single-letter amino acid codes
- DNA: A, T, C, G
- RNA: A, U, C, G
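
A client-side pre-check of the sequence rules above might look like the following sketch. `validate_sequence` is a hypothetical helper, and the protein alphabet is an assumption: it contains only the 20 canonical amino acids, whereas the service may accept additional codes.

```python
# Hypothetical pre-check; the server performs the authoritative validation.
ALPHABETS = {
    # Assumption: "standard" means the 20 canonical amino acids.
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
    "dna": set("ATCG"),
    "rna": set("AUCG"),
}
MAX_SEQUENCE_LENGTH = 4096


def validate_sequence(molecule_type: str, sequence: str) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    if molecule_type not in ALPHABETS:
        return [f"no sequence alphabet defined for type {molecule_type!r}"]
    problems = []
    if not 1 <= len(sequence) <= MAX_SEQUENCE_LENGTH:
        problems.append(
            f"length {len(sequence)} is outside 1-{MAX_SEQUENCE_LENGTH}"
        )
    unexpected = sorted(set(sequence) - ALPHABETS[molecule_type])
    if unexpected:
        problems.append(f"unexpected characters: {unexpected}")
    return problems


print(validate_sequence("protein", "MKTVRQERLKSIVR"))  # []
print(validate_sequence("dna", "AUCG"))  # reports the RNA base 'U'
```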

For Proteins (MSA required):

msa (conditional): Single-chain multiple sequence alignment (required unless paired_msa is provided)
paired_msa (optional): Joint MSA for modeling chain-chain interactions
structural_templates (optional): List of structural templates in CIF format

For RNA (MSA required):

msa (required): Single-chain multiple sequence alignment

For Ligands (exactly one of the following is required):

ccd_codes (conditional): Chemical Component Dictionary code (e.g., “ATP”, “CL”)
smiles (conditional): SMILES string (e.g., “CC(=O)OC1=CC=CC=C1C(=O)O”)
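
Because a ligand takes exactly one of `ccd_codes` or `smiles`, a small constructor can enforce the choice before the request is sent. This is a sketch with a hypothetical helper name (`make_ligand`); the CCD pattern follows the "1-5 uppercase letters/numbers" rule from the field reference table, and SMILES strings are passed through without any chemical validation.

```python
import re
from typing import Optional

# 1-5 uppercase letters or digits, per the field reference table.
_CCD_PATTERN = re.compile(r"[A-Z0-9]{1,5}")


def make_ligand(
    ccd_codes: Optional[str] = None,
    smiles: Optional[str] = None,
    chain_id: Optional[str] = None,
) -> dict:
    """Build a ligand molecule dict from exactly one of ccd_codes or smiles."""
    if (ccd_codes is None) == (smiles is None):
        raise ValueError("Provide exactly one of ccd_codes or smiles")
    molecule = {"type": "ligand"}
    if ccd_codes is not None:
        if not _CCD_PATTERN.fullmatch(ccd_codes):
            raise ValueError(f"Invalid CCD code: {ccd_codes!r}")
        molecule["ccd_codes"] = ccd_codes
    else:
        molecule["smiles"] = smiles  # not validated here
    if chain_id is not None:
        molecule["id"] = chain_id
    return molecule


atp = make_ligand(ccd_codes="ATP")
aspirin = make_ligand(smiles="CC(=O)OC1=CC=CC=C1C(=O)O", chain_id="L")
```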

Note

For proteins, either msa or paired_msa is required. If you want to provide an MSA with 0 hits, include an MSA containing only the query sequence.

MSA Structure#

The msa and paired_msa fields use a nested dictionary structure:

"msa": {
    "database_name": {        # Arbitrary name (e.g., "uniref90", "main_db")
        "format_name": {      # Must be "a3m" or "csv" (lowercase)
            "alignment": "...",   # MSA content as string
            "format": "csv"       # Must match format_name
        }
    }
}

MSA Requirements:

First sequence must exactly match the input protein/RNA sequence (without gaps)
Format names must be lowercase: "a3m" or "csv"
Maximum 3 MSA databases per protein
Gaps represented by - characters
A3M format: lowercase letters represent insertions relative to query
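
The nested dictionary is easy to get wrong by hand, since the format appears both as a key and as a value. The sketch below builds it from a flat mapping and applies the format-name and database-count rules listed above; `build_msa_field` is a hypothetical helper, not part of the NIM API.

```python
def build_msa_field(alignments: dict) -> dict:
    """Build the nested msa/paired_msa dict.

    `alignments` maps a database name to a (format, alignment_text) tuple.
    """
    if not 1 <= len(alignments) <= 3:
        raise ValueError("Provide between 1 and 3 MSA databases")
    msa = {}
    for database_name, (fmt, alignment_text) in alignments.items():
        if fmt not in ("a3m", "csv"):
            raise ValueError(f'Format must be "a3m" or "csv", got {fmt!r}')
        # The format is repeated: once as the key, once in the record.
        msa[database_name] = {
            fmt: {"alignment": alignment_text, "format": fmt}
        }
    return msa


msa = build_msa_field({"main": ("a3m", ">query\nMKTVRQERLKSIVR")})
```

Here `msa` equals the `"msa"` value used in the first example on this page.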

A3M Format Example:

>query
MKTVRQERLKSIVR
>hit1
MKTVRQERLKSIVR
>hit2
MKTVR-ERLKSIVR

CSV Format Example:

key,sequence
-1,MKTVRQERLKSIVR
-1,MKTVRQERLKSIVR
-1,MKTVR-ERLKSIVR

Structural Templates#

For template-guided predictions, each template requires:

structure (required): CIF file contents as string
format (required): Must be "cif"
name (optional): Template identifier
chain_id (optional): Specific chain to use from multi-chain CIF (1-10 alphanumeric characters)
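
A template entry with these fields could be built as in the sketch below. `make_template` is a hypothetical helper; it checks only that the CIF text is non-empty and that `chain_id` fits the 1-10 alphanumeric limit, and it does not parse the CIF content.

```python
import re
from typing import Optional

_TEMPLATE_CHAIN_ID = re.compile(r"[A-Za-z0-9]{1,10}")


def make_template(
    cif_text: str,
    name: Optional[str] = None,
    chain_id: Optional[str] = None,
) -> dict:
    """Build one entry for the structural_templates list."""
    if not cif_text.strip():
        raise ValueError("Template structure is empty")
    template = {"structure": cif_text, "format": "cif"}
    if name is not None:
        template["name"] = name
    if chain_id is not None:
        if not _TEMPLATE_CHAIN_ID.fullmatch(chain_id):
            raise ValueError(f"Invalid chain_id: {chain_id!r}")
        template["chain_id"] = chain_id
    return template


# Placeholder text standing in for the contents of a real CIF file.
template = make_template("data_example\n", name="template_1", chain_id="A")
```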

Preparing MSAs#

For production use, generate MSAs using sequence homology search tools:

Recommended Tools:

ColabFold - Fast MSA generation
- Note: For multi-chain inputs with pairing, post-process the pair.a3m file by filtering null characters and splitting by chain
OpenFold3 MSA script - Includes colabfold mode with automatic chain splitting
HHBlits - Traditional MSA generation

Workflow:

Run MSA tools for each distinct protein/RNA sequence
Convert results to A3M or CSV format
Ensure first sequence matches your input sequence exactly
For multi-chain complexes, optionally generate paired MSAs
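
The third step, confirming that the first MSA sequence matches the input, can be checked before submission. The following sketch handles A3M text only; both function names are hypothetical, and the comparison is a plain string equality on the first record.

```python
def first_a3m_sequence(a3m_text: str) -> str:
    """Return the first sequence record of an A3M alignment."""
    parts = []
    seen_header = False
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # reached the second record
            seen_header = True
            continue
        if seen_header:
            parts.append(line)  # a record may span several lines
    return "".join(parts)


def msa_matches_query(a3m_text: str, query_sequence: str) -> bool:
    """True if the first A3M record equals the query sequence exactly."""
    return first_a3m_sequence(a3m_text) == query_sequence


example_a3m = (
    ">query\nMKTVRQERLKSIVR\n"
    ">hit1\nMKTVRQERLKSIVR\n"
    ">hit2\nMKTVR-ERLKSIVR\n"
)
print(msa_matches_query(example_a3m, "MKTVRQERLKSIVR"))  # True
```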

Response Structure#

The response contains:

request_id: Echo of the request ID
outputs: List with a single output object:
- input_id: Echo of the input ID
- structures_with_scores: List of predicted structures (ranked, best first):
  - structure: Predicted structure in CIF or PDB format (string)
  - format: Output format ("cif" or "pdb")
  - confidence_score: Sample ranking score
  - complex_plddt_score: Average pLDDT score for the complex
  - complex_pde_score: Average PDE score for the complex
  - ptm_score: Predicted TM score for the complex
  - iptm_score: Predicted TM score for interfaces
- runtime_metrics (optional): Performance metrics
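
When `diffusion_samples` is greater than 1, the response carries several ranked structures. The sketch below walks them in ranking order and proposes one file name per structure. `ranked_structures` is a hypothetical helper, and `sample_result` is a hand-written stand-in with made-up scores, not real service output.

```python
def ranked_structures(result: dict, prefix: str = "prediction"):
    """Yield (filename, entry) pairs in the ranking order of the response."""
    entries = result["outputs"][0]["structures_with_scores"]
    for rank, entry in enumerate(entries):
        # The format field ("cif" or "pdb") doubles as the file extension.
        yield f"{prefix}_rank{rank}.{entry['format']}", entry


# Stand-in response with placeholder structure text and invented scores.
sample_result = {
    "request_id": "demo",
    "outputs": [{
        "input_id": "demo_input",
        "structures_with_scores": [
            {"structure": "...", "format": "pdb", "confidence_score": 0.9},
            {"structure": "...", "format": "pdb", "confidence_score": 0.8},
        ],
    }],
}

for filename, entry in ranked_structures(sample_result):
    print(filename, entry["confidence_score"])
```

With a real response, each `entry["structure"]` string can be written to `filename` in the same way the first example writes `predicted_structure.pdb`.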

Field Reference Tables#

Request-Level Fields#

Field	Required	Type	Description
`request_id`	No	string	Identifier for the entire request (max 128 characters)
`inputs`	Yes	list	Must contain exactly 1 input specification

Input-Level Fields#

Field	Required	Type	Default	Description
`input_id`	No	string	“input_id_0”	Unique identifier for this structure prediction (max 128 characters)
`molecules`	Yes	list	-	List of molecules to predict (minimum 1)
`diffusion_samples`	No	integer	1	Number of independent structures to generate (1-5)
`output_format`	No	string	“cif”	Output format: `"cif"` or `"pdb"`

Molecule Fields#

Field	Required	Type	Valid For	Description
`type`	Yes	string	All	Must be `"protein"`, `"dna"`, `"rna"`, or `"ligand"`
`sequence`	Conditional	string	Protein/DNA/RNA	Amino acid or nucleotide sequence (1-4096 characters)
`msa`	Conditional	dict	Protein/RNA	Required for proteins (unless paired_msa provided) and RNA. Nested dict: database → format → AlignmentFileRecord
`paired_msa`	No	dict	Protein	Joint MSA for modeling chain-chain interactions
`structural_templates`	No	list	Protein	List of structural templates in CIF format to guide prediction
`ccd_codes`	Conditional	string	Ligand	CCD code (1-5 uppercase letters/numbers). Mutually exclusive with `smiles`
`smiles`	Conditional	string	Ligand	SMILES string. Mutually exclusive with `ccd_codes`
`id`	No	string or list	All	Chain identifier(s). 1-4 alphanumeric characters each

AlignmentFileRecord Fields#

Field	Required	Type	Description
`alignment`	Yes	string	MSA content as a string
`format`	Yes	string	Format type: `"a3m"`, `"csv"` (lowercase required)
`rank`	No	integer	Ordering rank for concatenating alignments (default: -1)

StructuralTemplate Fields#

Field	Required	Type	Description
`structure`	Yes	string	The contents of the file containing the structural template, in CIF format
`format`	Yes	string	Format type: must be `"cif"`
`name`	No	string	Optional name to identify the template
`chain_id`	No	string	Optional chain ID to use from multi-chain CIF files. If not specified, the best matching chain is automatically selected. Accepts CIF-style chain IDs (e.g., ‘A’, ‘A1’, ‘B2’). 1-10 alphanumeric characters

Important Validation Rules#

Ensure you follow these rules when specifying field values:

Protein MSAs: The first sequence in the MSA must exactly match the input protein sequence (without gaps)
RNA MSAs: The first sequence in the MSA must exactly match the input RNA sequence (without gaps)
Chain IDs: Must be 1-4 alphanumeric characters. The PDB output format supports only single-character IDs
MSA Format Names: Must be lowercase ("a3m", not "A3M")
Ligand Specification: Must provide either ccd_codes or smiles, not both
Sequence Length: Maximum 4096 characters per molecule
Database Limit: Maximum 3 MSA databases per protein
Structural Templates: Only allowed for protein molecules and must be in CIF format
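
The chain ID rule interacts with the chosen output format, which makes it a common source of rejected requests. A minimal client-side check, assuming the hypothetical helper name `validate_chain_ids`, could look like this:

```python
import re

_CHAIN_ID = re.compile(r"[A-Za-z0-9]{1,4}")


def validate_chain_ids(ids, output_format: str = "cif") -> list:
    """Return a list of problems for a molecule's id value (str or list)."""
    if isinstance(ids, str):
        ids = [ids]
    problems = []
    for chain_id in ids:
        if not _CHAIN_ID.fullmatch(chain_id):
            problems.append(f"{chain_id!r}: must be 1-4 alphanumeric characters")
        elif output_format == "pdb" and len(chain_id) != 1:
            problems.append(f"{chain_id!r}: PDB output needs single-character IDs")
    return problems


print(validate_chain_ids(["A", "C"], output_format="pdb"))  # []
print(validate_chain_ids("A1", output_format="pdb"))  # one problem reported
```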