Example Requests

Using the OpenFold3 NIM

OpenFold3 NIM predicts 3D structures of biomolecular complexes, including proteins, DNA, RNA, and ligands. The NIM supports multiple prediction modes: protein-only structure prediction, protein-DNA/RNA complexes, protein-ligand interactions, and template-guided predictions.

Try It Out

Here’s a simple example to get started with protein structure prediction. This script predicts the structure of a small protein and saves the result.

```python
import requests

# NIM endpoint
url = "http://localhost:8000/biology/openfold/openfold3/predict"
headers = {"Content-Type": "application/json"}

# Define protein sequence
protein_sequence = "MKTVRQERLKSIVR"

# Create minimal MSA (just the query sequence)
msa_content = f">query\n{protein_sequence}"

# Build the request
data = {
    "inputs": [{
        "input_id": "my_first_prediction",
        "molecules": [{
            "type": "protein",
            "sequence": protein_sequence,
            "msa": {
                "main": {
                    "a3m": {
                        "alignment": msa_content,
                        "format": "a3m"
                    }
                }
            }
        }],
        "output_format": "pdb"
    }]
}

# Submit prediction request (wait up to 300 seconds for a response)
response = requests.post(url, json=data, headers=headers, timeout=300)

# Extract and save the predicted structure
if response.ok:
    result = response.json()
    # structures_with_scores is ranked best first, so take the first entry
    structure = result['outputs'][0]['structures_with_scores'][0]['structure']

    # Save to file
    with open("predicted_structure.pdb", "w") as f:
        f.write(structure)

    print("✓ Prediction complete! Structure saved to predicted_structure.pdb")
else:
    print(f"✗ Prediction failed: {response.status_code} - {response.text}")
```

Save this script as predict.py and run it:

```bash
python predict.py
```

Try Out Other Prediction Use Cases

The following examples show how to adapt the data dictionary for different prediction scenarios. To try one, replace the data dictionary in the script above with the example, including any variables or file reads that precede it.

Protein-DNA Complex

Use this example for modeling protein-DNA interactions. For double-stranded DNA (dsDNA), both complementary strands must be entered as separate molecules.

```python
# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "protein_dna_complex",
        "molecules": [
            {
                "type": "protein",
                "id": "A",
                "sequence": "MKTVRQERLKSIVR",
                "msa": {
                    "main": {
                        "a3m": {
                            "alignment": ">query\nMKTVRQERLKSIVR",
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "dna",
                "id": "B",
                "sequence": "ATCGATCG"
            },
            {
                "type": "dna",
                "id": "C",
                "sequence": "CGATCGAT"  # Complementary strand
            }
        ],
        "output_format": "pdb"
    }]
}
```
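In the example above, the second strand is the reverse complement of the first (both written 5′→3′). Assuming that convention, a small helper can derive the partner strand so the two DNA entries cannot drift apart. This is a sketch, not part of the NIM: `reverse_complement` and `dsdna_molecules` are hypothetical names, and the default chain IDs simply mirror the example.

```python
# Watson-Crick base-pairing partners for DNA.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError(f"non-ACGT characters in DNA sequence: {seq!r}")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return seq.translate(_DNA_COMPLEMENT)[::-1]

def dsdna_molecules(forward: str, ids=("B", "C")) -> list:
    """Build the two DNA molecule entries for one double-stranded DNA."""
    return [
        {"type": "dna", "id": ids[0], "sequence": forward.upper()},
        {"type": "dna", "id": ids[1], "sequence": reverse_complement(forward)},
    ]

print(reverse_complement("ATCGATCG"))  # CGATCGAT
```

The returned list can stand in for the two DNA entries of the molecules list.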

Protein-Ligand with SMILES

For modeling protein-ligand interactions using SMILES notation:

```python
# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "protein_ligand",
        "molecules": [
            {
                "type": "protein",
                "sequence": "MKTVRQERLKSIVR",
                "msa": {
                    "main": {
                        "a3m": {
                            "alignment": ">query\nMKTVRQERLKSIVR",
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "ligand",
                "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
            }
        ],
        "output_format": "pdb"
    }]
}
```

Protein-Ligand with CCD Code

For modeling protein-ligand interactions using Chemical Component Dictionary (CCD) codes:

```python
# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "protein_ligand_ccd",
        "molecules": [
            {
                "type": "protein",
                "sequence": "MKTVRQERLKSIVR",
                "msa": {
                    "main": {
                        "a3m": {
                            "alignment": ">query\nMKTVRQERLKSIVR",
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "ligand",
                "ccd_codes": "ATP"  # Adenosine triphosphate
            }
        ],
        "output_format": "pdb"
    }]
}
```

Protein with Structural Templates

For template-guided protein structure prediction using experimental or predicted structures:

Note

Structural templates can significantly improve prediction accuracy. For detailed information on template processing, selection strategies, and best practices, refer to Template Processing.

```python
# First, read a template structure from a CIF file
with open("template.cif", "r") as f:
    template_cif_content = f.read()

# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "template_guided_prediction",
        "molecules": [{
            "type": "protein",
            "sequence": "MKTVRQERLKSIVR",
            "msa": {
                "main": {
                    "a3m": {
                        "alignment": ">query\nMKTVRQERLKSIVR",
                        "format": "a3m"
                    }
                }
            },
            "structural_templates": [
                {
                    "structure": template_cif_content,
                    "format": "cif",
                    "name": "template_1"
                }
            ]
        }],
        "output_format": "pdb"
    }]
}
```

Multi-Chain Protein Complex with MSAs

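Use this example to predict a multi-chain protein assembly with both single-chain and paired MSAs. It contains two distinct sequences, each shared by two chains: listing several chain IDs in one molecule entry (for example, ["A", "C"]) applies that entry's sequence and MSAs to every listed chain.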

```python
# Multi-chain protein sequences
protein1_sequence = "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR"

protein2_sequence = "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"

# MSA for protein 1 with multiple homologs (a3m format)
protein1_msa_a3m = """>101
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>UniRef100_A0A068F5F5
VLSAKDKTNIKTAWGKIGGHAAEYGAEALERMFVVYPTTKTYFPHFDVSHGSAQVKAHGKKVADALTNAVGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLANHIPADFTPAVHASLDKFLASVSTVLTSKYR
>UniRef100_A0A091II63
-LTQAEKAAVVAIWAKVAPQIDAIGAESLERLFFTYPQTKTYFPHFDLSHSSPQLRGHGSKVMNAIGEAVKNLDDLRGALVKLSELHAYILRVDPVNFKLLSHCILCSLAAHYPKDFTPEAHAAWDKFLSSVSSVLTEKYR
>UniRef100_UPI00162A5CD8
VLSPADKTNIKAAWDKVGGNVGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVGDALTNAVAHIDDLPGALSALSDLHAYKLRVDPVNFKLLSHCLLVTLASHLPSDFTPAVHASLDKFLASVSTVLTSKYR"""

# Paired MSA for protein 1 for modeling chain-chain interactions
protein1_paired_msa_a3m = """>101
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>UniRef100_UPI001EFC6C5E
-LTDEQKRLIQKSYAEIDRQSSNFAAIFYDCLFAMAPLIRPMFK-----SERPVFEYHFNELISTAATKVFQFEEIKPRLVVLGRKH-RGYGVTPAQFDVVRSALMLSIQDCLRDACNPAIEQAWSSYYDEIAKVM-----
>UniRef100_UPI0018E7FE8A
-LTEIEKEAITSSFTLINHQEQQFASFFYDCLFDLAPLIKPMFKR-----DRKLIEEHFYMIFCAAVDNIHHLDTIRSTLLELGSRH-RNYGVKVSHFPIVKSALILAIQHELKGQSNTDIENAWSNYYDELAAII-----"""

# MSA for protein 2 with multiple homologs (a3m format)
protein2_msa_a3m = """>102
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>UniRef100_UPI0018F66524
VHLSAEEKSAVNSLWGKVNVEEHGGEALARLLVVYPWTQRFFDSFGNLSSASAILGNPKVKAHGKKVLTSFGDAVKNLDNLKGTFAKLSELHCDKLHVDPENFRLLGNVLVVVLARHFGKDFTPEVQAAWQKLVAGVASALAHKYH
>UniRef100_UPI001CFCA915
VHFTAEEKSTITSLWGKVNVEETGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPKVKAHGKKVLTSLGDAVKNLDNLKGAFSKLSELHCDKLHVDPENFRLLGNVLIVVLAAHFGKEFTPEVQAAWQKLVTGVASALAHKYH
>UniRef100_A0A8C6RZS6
VNFTPEEKSLVTSLWSKVNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSASAIMGNPRVKAHGKKVLTSFGEAVKNMDNLKATFSKLSELHCDKLHVDPENFKLLGNVLVVVLASHFGKEFTPEVQAAWQKLVAGVANALSHKYH"""

# Paired MSA for protein 2 for modeling chain-chain interactions
protein2_paired_msa_a3m = """>102
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>UniRef100_UPI001EFC6C5E
--LTDEQKRLIQKSYAEIDrqSSNFAAIFYDCLFAMAPLIRPMFKS-----------ERPVFEYHFNELISTAATKVFQFEEIKPRLVVLGRKH-RGYGVTPAQFDVVRSALMLSIQDCLRDACNPAIEQAWSSYYDEIAKVM-----
>UniRef100_UPI0018E7FE8A
--LTEIEKEAITSSFTLINHQEqqFASFFYDCLFDLAPLIKPMFKRDRKL-----------IEEHFYMIFCAAVDNIHHLDTIRSTLLELGSRH-RNYGVKVSHFPIVKSALILAIQHELKGQSNTDIENAWSNYYDELAAIILEG--"""

# Replace the 'data' variable with this
data = {
    "request_id": "1a3n",
    "inputs": [{
        "input_id": "1a3n",
        "molecules": [
            {
                "type": "protein",
                "id": ["A", "C"],
                "sequence": protein1_sequence,
                "msa": {
                    "uniref": {
                        "a3m": {
                            "alignment": protein1_msa_a3m,
                            "format": "a3m"
                        }
                    }
                },
                "paired_msa": {
                    "paired": {
                        "a3m": {
                            "alignment": protein1_paired_msa_a3m,
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "protein",
                "id": ["B", "D"],
                "sequence": protein2_sequence,
                "msa": {
                    "uniref": {
                        "a3m": {
                            "alignment": protein2_msa_a3m,
                            "format": "a3m"
                        }
                    }
                },
                "paired_msa": {
                    "paired": {
                        "a3m": {
                            "alignment": protein2_paired_msa_a3m,
                            "format": "a3m"
                        }
                    }
                }
            }
        ],
        "diffusion_samples": 1,
        "output_format": "pdb"
    }]
}
```

Understanding the API

This section provides detailed information about the OpenFold3 NIM API structure, requirements, and validation rules.

Endpoint

  • /biology/openfold/openfold3/predict: Predicts the 3D structure of a biomolecular complex from input sequences

Request Structure

A complete request consists of:

Request-Level Fields:

  • request_id (optional): Identifier for the entire request (max 128 characters)

  • inputs (required): List containing exactly one input specification

Input-Level Fields:

  • input_id (optional): Unique identifier for this structure prediction (max 128 characters, default: "input_id_0")

  • molecules (required): List of molecules to predict as a complex (minimum 1)

  • diffusion_samples (optional): Number of independent structures to generate (1-5, default: 1)

  • output_format (optional): Output format - "cif" (default) or "pdb"

Molecule Specification

Each molecule in the molecules list requires:

Common Fields:

  • type (required): Must be "protein", "dna", "rna", or "ligand"

  • id (optional): Chain identifier(s) - single string or list of strings (1-4 alphanumeric characters each)

For Proteins, DNA, RNA:

  • sequence (required): Amino acid or nucleotide sequence (1-4096 characters)

    • Proteins: Standard single-letter amino acid codes

    • DNA: A, T, C, G

    • RNA: A, U, C, G

For Proteins (msa or paired_msa required):

  • msa (conditional): Single-chain multiple sequence alignment (required unless paired_msa provided)

  • paired_msa (optional): Joint MSA for modeling chain-chain interactions

  • structural_templates (optional): List of structural templates in CIF format

For RNA (MSA required):

  • msa (required): Single-chain multiple sequence alignment

For Ligands (one required):

  • ccd_codes (conditional): Chemical Component Dictionary code (e.g., "ATP", "CL")

  • smiles (conditional): SMILES string (e.g., "CC(=O)OC1=CC=CC=C1C(=O)O")

Note

For proteins, either msa or paired_msa is required. If you want to provide an MSA with 0 hits, include an MSA containing only the query sequence.
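The per-molecule rules above can be checked locally before a request is sent. The following is a minimal sketch, not part of the NIM or any client library: `check_molecule` is a hypothetical helper, and it assumes protein sequences use only the 20 standard amino-acid letters (the service may accept additional codes). The CCD pattern follows the field reference for ccd_codes (1-5 uppercase letters or digits).

```python
import re

# Assumption: only the 20 standard amino acids; the NIM may accept more codes.
ALPHABETS = {
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
    "dna": set("ACGT"),
    "rna": set("ACGU"),
}

def check_molecule(mol: dict) -> list:
    """Return a list of problems with one molecule entry (empty if none found)."""
    mol_type = mol.get("type")
    if mol_type == "ligand":
        has_ccd, has_smiles = "ccd_codes" in mol, "smiles" in mol
        if has_ccd == has_smiles:  # both present or both missing
            return ["ligand needs exactly one of ccd_codes or smiles"]
        ccd = mol.get("ccd_codes")
        if has_ccd and not (isinstance(ccd, str) and re.fullmatch(r"[A-Z0-9]{1,5}", ccd)):
            return [f"invalid CCD code: {ccd!r}"]
        return []
    if mol_type not in ALPHABETS:
        return [f"unknown molecule type: {mol_type!r}"]

    problems = []
    seq = mol.get("sequence", "")
    if not 1 <= len(seq) <= 4096:
        problems.append("sequence must be 1-4096 characters")
    unexpected = set(seq) - ALPHABETS[mol_type]
    if unexpected:
        problems.append(f"unexpected {mol_type} characters: {''.join(sorted(unexpected))}")
    if mol_type == "protein" and not (mol.get("msa") or mol.get("paired_msa")):
        problems.append("protein needs msa or paired_msa")
    if mol_type == "rna" and not mol.get("msa"):
        problems.append("rna needs msa")
    if mol_type != "protein" and "structural_templates" in mol:
        problems.append("structural_templates are only allowed for proteins")
    return problems
```

For example, `[check_molecule(m) for m in data["inputs"][0]["molecules"]]` lists any problems per molecule before the request is posted.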

MSA Structure

The msa and paired_msa fields use a nested dictionary structure:

```python
"msa": {
    "database_name": {        # Arbitrary name (e.g., "uniref90", "main_db")
        "format_name": {      # Must be "a3m" or "csv" (lowercase)
            "alignment": "...",   # MSA content as string
            "format": "csv"       # Must match format_name
        }
    }
}
```
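Because this nesting is easy to get wrong by hand, a small builder can assemble it from (alignment, format) pairs. This is a sketch under stated assumptions: `build_msa` is a hypothetical helper, it stores one format per database, and it applies the 3-database limit from the MSA requirements to each msa or paired_msa value separately, which is an assumption about how the limit is counted.

```python
VALID_FORMATS = ("a3m", "csv")
MAX_DATABASES = 3

def build_msa(alignments: dict) -> dict:
    """Build the nested value for an msa or paired_msa field.

    `alignments` maps an arbitrary database name to (alignment_text, format),
    for example {"uniref90": (a3m_text, "a3m")}.
    """
    if not 1 <= len(alignments) <= MAX_DATABASES:
        raise ValueError(f"expected 1-{MAX_DATABASES} MSA databases, got {len(alignments)}")
    msa = {}
    for db_name, (alignment, fmt) in alignments.items():
        # Case-sensitive check: "A3M" is rejected, matching the lowercase rule.
        if fmt not in VALID_FORMATS:
            raise ValueError(f"format must be one of {VALID_FORMATS}, got {fmt!r}")
        # The inner key and the "format" field always carry the same value.
        msa[db_name] = {fmt: {"alignment": alignment, "format": fmt}}
    return msa
```

With it, a molecule entry can use `"msa": build_msa({"uniref": (protein1_msa_a3m, "a3m")})` instead of writing the three nested levels out.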

MSA Requirements:

  • First sequence must exactly match the input protein/RNA sequence (without gaps)

  • Format names must be lowercase: "a3m" or "csv"

  • Maximum 3 MSA databases per protein

  • Gaps are represented by - characters

  • A3M format: lowercase letters represent insertions relative to the query
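The first and last requirements can be checked with a few lines of Python. The sketch below uses hypothetical helper names and is not an official validator: it parses A3M text, compares the gap-stripped first row with the input sequence, and relies on the standard A3M convention that every row has exactly one uppercase letter or - per query position (lowercase insertions are not counted), so that count should equal the query length.

```python
def parse_a3m(text: str) -> list:
    """Split A3M text into (header, sequence) pairs; sequence lines may wrap."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    return [(header, seq) for header, seq in records]

def aligned_length(row: str) -> int:
    """Count match columns: uppercase letters and '-' (lowercase are insertions)."""
    return sum(1 for ch in row if ch.isupper() or ch == "-")

def check_a3m(a3m_text: str, sequence: str) -> list:
    """Return a list of problems with an A3M alignment for `sequence`."""
    records = parse_a3m(a3m_text)
    if not records:
        return ["no sequences found"]
    problems = []
    if records[0][1].replace("-", "") != sequence:
        problems.append("first sequence does not match the input sequence")
    for header, row in records[1:]:
        if aligned_length(row) != len(sequence):
            problems.append(f"{header}: {aligned_length(row)} aligned columns, expected {len(sequence)}")
    return problems

example = ">query\nMKTVRQERLKSIVR\n>hit1\nMKTVRQERLKSIVR\n>hit2\nMKTVR-ERLKSIVR"
print(check_a3m(example, "MKTVRQERLKSIVR"))  # []
```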

A3M Format Example:

```text
>query
MKTVRQERLKSIVR
>hit1
MKTVRQERLKSIVR
>hit2
MKTVR-ERLKSIVR
```

CSV Format Example:

```text
key,sequence
-1,MKTVRQERLKSIVR
-1,MKTVRQERLKSIVR
-1,MKTVR-ERLKSIVR
```
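The two examples above hold the same three sequences, so producing the CSV layout from A3M text is mostly mechanical. This sketch makes several assumptions: `a3m_to_csv` is a hypothetical helper, it copies the -1 key used in the example for every row (other key values are not described here), and it refuses rows with lowercase A3M insertions because this page does not say how the CSV format represents them.

```python
import csv
import io

def a3m_to_csv(a3m_text: str, key: int = -1) -> str:
    """Convert A3M text to the key,sequence CSV layout shown in the example."""
    sequences = []
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            sequences.append("")       # start a new record; headers are dropped
        elif sequences:
            sequences[-1] += line      # join wrapped sequence lines
    if any(ch.islower() for seq in sequences for ch in seq):
        raise ValueError("lowercase A3M insertions found; not handled by this sketch")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["key", "sequence"])
    for seq in sequences:
        writer.writerow([key, seq])
    return buffer.getvalue()
```

The result goes in the alignment field with "format": "csv" under a "csv" key.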

Structural Templates

For template-guided predictions, each template requires:

  • structure (required): CIF file contents as string

  • format (required): Must be "cif"

  • name (optional): Template identifier

  • chain_id (optional): Specific chain to use from multi-chain CIF (1-10 alphanumeric characters)
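A template entry can be built straight from a CIF file on disk. This is a sketch, not an official helper: `template_entry` is a hypothetical name, defaulting `name` to the file stem is an arbitrary choice, and the chain_id check mirrors the 1-10 alphanumeric rule above.

```python
import re
from pathlib import Path

def template_entry(cif_path, name=None, chain_id=None) -> dict:
    """Build one structural_templates entry from a CIF file on disk."""
    path = Path(cif_path)
    entry = {
        "structure": path.read_text(),  # CIF contents as a string
        "format": "cif",                # the only supported template format
        "name": name or path.stem,      # assumption: file stem as default name
    }
    if chain_id is not None:
        if not re.fullmatch(r"[A-Za-z0-9]{1,10}", chain_id):
            raise ValueError("chain_id must be 1-10 alphanumeric characters")
        entry["chain_id"] = chain_id
    return entry
```

For example, `"structural_templates": [template_entry("template.cif", chain_id="A")]` pins the template to chain A; omitting chain_id leaves the chain choice to the service.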

Preparing MSAs

For production use, generate MSAs using sequence homology search tools:

Recommended Tools:

  • ColabFold - Fast MSA generation

    • Note: For multi-chain inputs with pairing, post-process the pair.a3m file by filtering null characters and splitting by chain

  • OpenFold3 MSA script - Includes colabfold mode with automatic chain splitting

  • HHBlits - Traditional MSA generation
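The ColabFold note above mentions splitting pair.a3m by chain. A minimal sketch follows, assuming the ColabFold convention that per-chain blocks are separated by a NUL character (\x00) and that each block starts with a numeric query header such as >101 or >102, like the headers in the multi-chain example; check this against the ColabFold version you use. `split_pair_a3m` is a hypothetical helper.

```python
def split_pair_a3m(text: str) -> dict:
    """Split a ColabFold-style pair.a3m into one A3M alignment per chain.

    Returns a dict keyed by the first header of each block (e.g. "101").
    Splitting on the NUL character also removes it from the output.
    """
    blocks = {}
    for block in text.split("\x00"):
        block = block.strip()
        if not block:
            continue
        first_line = block.splitlines()[0]
        if not first_line.startswith(">"):
            raise ValueError(f"block does not start with a header: {first_line!r}")
        blocks[first_line[1:].strip()] = block
    return blocks
```

Each value can then be used as a paired alignment, for example `protein1_paired_msa_a3m = split_pair_a3m(text)["101"]`.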

Workflow:

  1. Run MSA tools for each distinct protein/RNA sequence

  2. Convert results to A3M or CSV format

  3. Ensure first sequence matches your input sequence exactly

  4. For multi-chain complexes, optionally generate paired MSAs
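Step 1 only needs one search per distinct sequence, and chains that share a sequence can share one molecule entry through a list-valued id, as in the multi-chain example. The sketch below groups chains that way; `group_chains` is a hypothetical helper, and the chain-to-sequence mapping is something you would supply.

```python
def group_chains(chains: dict) -> list:
    """Group chains that share a molecule type and sequence.

    `chains` maps chain ID -> (type, sequence). Returns one molecule entry per
    distinct (type, sequence), in first-seen order, without MSAs attached.
    """
    groups = {}
    for chain_id, (mol_type, sequence) in chains.items():
        groups.setdefault((mol_type, sequence), []).append(chain_id)
    return [
        # A single chain keeps a plain string ID; shared sequences get a list.
        {"type": mol_type, "id": ids if len(ids) > 1 else ids[0], "sequence": sequence}
        for (mol_type, sequence), ids in groups.items()
    ]
```

Run the MSA search once per returned entry, then attach msa (and optionally paired_msa) to it.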

Response Structure

The response contains:

  • request_id: Echo of the request ID

  • outputs: List with a single output object:

    • input_id: Echo of the input ID

    • structures_with_scores: List of predicted structures (ranked, best first):

      • structure: Predicted structure in CIF or PDB format (string)

      • format: Output format ("cif" or "pdb")

      • confidence_score: Sample ranking score

      • complex_plddt_score: Average pLDDT score for the complex

      • complex_pde_score: Average PDE score for the complex

      • ptm_score: Predicted TM score for the complex

      • iptm_score: Predicted TM score for interfaces

    • runtime_metrics (optional): Performance metrics
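When diffusion_samples is greater than 1, the response carries several ranked structures. The sketch below writes every one of them to disk and prints its scores; `save_structures` is a hypothetical helper that expects the parsed JSON body (`response.json()`) with the fields listed above, and the file-naming scheme is an arbitrary choice.

```python
from pathlib import Path

def save_structures(result: dict, out_dir: str = ".") -> list:
    """Write every returned structure to disk and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for output in result["outputs"]:
        # structures_with_scores is ranked best first, so rank 1 is the top sample.
        for rank, item in enumerate(output["structures_with_scores"], start=1):
            path = out / f"{output['input_id']}_rank{rank}.{item['format']}"
            path.write_text(item["structure"])
            print(f"{path.name}: confidence={item.get('confidence_score')}, "
                  f"pLDDT={item.get('complex_plddt_score')}, "
                  f"pTM={item.get('ptm_score')}, ipTM={item.get('iptm_score')}")
            paths.append(path)
    return paths
```

In the first script, `save_structures(response.json(), "predictions")` could replace the single-file save inside `if response.ok:`.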

Field Reference Tables

Request-Level Fields

| Field | Required | Type | Description |
|---|---|---|---|
| request_id | No | string | Identifier for the entire request (max 128 characters) |
| inputs | Yes | list | Must contain exactly 1 input specification |

Input-Level Fields

| Field | Required | Type | Default | Description |
|---|---|---|---|---|
| input_id | No | string | "input_id_0" | Unique identifier for this structure prediction (max 128 characters) |
| molecules | Yes | list | - | List of molecules to predict (minimum 1) |
| diffusion_samples | No | integer | 1 | Number of independent structures to generate (1-5) |
| output_format | No | string | "cif" | Output format: "cif" or "pdb" |

Molecule Fields

| Field | Required | Type | Valid For | Description |
|---|---|---|---|---|
| type | Yes | string | All | Must be "protein", "dna", "rna", or "ligand" |
| sequence | Conditional | string | Protein/DNA/RNA | Amino acid or nucleotide sequence (1-4096 characters) |
| msa | Conditional | dict | Protein/RNA | Required for proteins (unless paired_msa provided) and RNA. Nested dict: database → format → AlignmentFileRecord |
| paired_msa | No | dict | Protein | Joint MSA for modeling chain-chain interactions |
| structural_templates | No | list | Protein | List of structural templates in CIF format to guide prediction |
| ccd_codes | Conditional | string | Ligand | CCD code (1-5 uppercase letters/numbers). Mutually exclusive with smiles |
| smiles | Conditional | string | Ligand | SMILES string. Mutually exclusive with ccd_codes |
| id | No | string or list | All | Chain identifier(s). 1-4 alphanumeric characters each |

AlignmentFileRecord Fields

| Field | Required | Type | Description |
|---|---|---|---|
| alignment | Yes | string | MSA content as a string |
| format | Yes | string | Format type: "a3m" or "csv" (lowercase required) |
| rank | No | integer | Ordering rank for concatenating alignments (default: -1) |

StructuralTemplate Fields

| Field | Required | Type | Description |
|---|---|---|---|
| structure | Yes | string | Contents of the structural template file, in CIF format |
| format | Yes | string | Format type: must be "cif" |
| name | No | string | Optional name to identify the template |
| chain_id | No | string | Optional chain ID to use from a multi-chain CIF file, 1-10 alphanumeric characters; CIF-style IDs such as "A", "A1", or "B2" are supported. If not specified, the best-matching chain is selected automatically |

Important Validation Rules

Ensure you follow these rules when specifying field values:

  • Protein MSAs: The first sequence in the MSA must exactly match the input protein sequence (without gaps)

  • RNA MSAs: The first sequence in the MSA must exactly match the input RNA sequence (without gaps)

  • Chain IDs: Must be 1-4 alphanumeric characters. PDB output format only supports single-character IDs

  • MSA Format Names: Must be lowercase ("a3m", not "A3M")

  • Ligand Specification: Must provide either ccd_codes or smiles, not both

  • Sequence Length: Maximum 4096 characters per molecule

  • Database Limit: Maximum 3 MSA databases per protein

  • Structural Templates: Only allowed for protein molecules and must be in CIF format
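Several of these rules apply across the whole request, so it can help to check the body once before posting it. The following is a minimal sketch, not the NIM's own validation: `validate_request` is a hypothetical helper that covers the request-level limits, chain IDs (including the single-character PDB restriction), sequence length, MSA format keys, and the database limit, which it assumes is counted separately for msa and paired_msa.

```python
import re

CHAIN_ID = re.compile(r"[A-Za-z0-9]{1,4}")
MSA_FORMATS = ("a3m", "csv")

def validate_request(data: dict) -> list:
    """Return a list of rule violations found in a request body (empty if none)."""
    problems = []
    if len(data.get("request_id", "")) > 128:
        problems.append("request_id is longer than 128 characters")
    inputs = data.get("inputs", [])
    if len(inputs) != 1:
        return problems + [f"inputs must contain exactly 1 entry, got {len(inputs)}"]
    spec = inputs[0]
    if len(spec.get("input_id", "")) > 128:
        problems.append("input_id is longer than 128 characters")
    if not 1 <= spec.get("diffusion_samples", 1) <= 5:
        problems.append("diffusion_samples must be between 1 and 5")
    out_fmt = spec.get("output_format", "cif")
    if out_fmt not in ("cif", "pdb"):
        problems.append(f"output_format must be 'cif' or 'pdb', got {out_fmt!r}")
    molecules = spec.get("molecules", [])
    if not molecules:
        problems.append("molecules must contain at least 1 entry")

    for i, mol in enumerate(molecules):
        ids = mol.get("id", [])
        ids = [ids] if isinstance(ids, str) else ids
        for cid in ids:
            if not CHAIN_ID.fullmatch(cid):
                problems.append(f"molecule {i}: chain ID {cid!r} must be 1-4 alphanumeric characters")
            elif out_fmt == "pdb" and len(cid) > 1:
                problems.append(f"molecule {i}: chain ID {cid!r} is too long for PDB output")
        if len(mol.get("sequence", "")) > 4096:
            problems.append(f"molecule {i}: sequence is longer than 4096 characters")
        # Assumption: the 3-database limit applies to msa and paired_msa separately.
        for field in ("msa", "paired_msa"):
            databases = mol.get(field) or {}
            if len(databases) > 3:
                problems.append(f"molecule {i}: more than 3 {field} databases")
            for db_name, formats in databases.items():
                for fmt_name, record in formats.items():
                    if fmt_name not in MSA_FORMATS or record.get("format") != fmt_name:
                        problems.append(f"molecule {i}: {field}[{db_name!r}] needs a lowercase "
                                        f"'a3m' or 'csv' key matching its format field")
    return problems
```

An empty list means none of these particular checks failed; the service may still reject a request for reasons this sketch does not cover.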