Example Requests#
OpenFold3 NIM provides the following endpoint:
/biology/openfold/openfold3/predict: Predicts the 3D structure of a biomolecular complex from input sequences, including proteins, DNA, RNA, and ligands.
Using the OpenFold3 NIM#
This section provides real examples of requests that should run when the NIM is correctly configured.
Note: We recommend interacting with this endpoint using the Python requests module or similar HTTP client libraries for proper handling of complex JSON structures.
Predict Structure#
The endpoint accepts requests where the data is formatted as specified in the OpenAPI Specification.
Request Structure#
A request consists of:
request_id (optional): Identifier for the entire request.
inputs (required): List containing a single input specification. There can be at most one item in the list.
Each input contains:
input_id (optional, recommended): Unique identifier for this structure prediction (for example, “my_protein_001”)
molecules (required): List of molecules to predict as a complex (minimum 1 molecule)
diffusion_samples (optional): Number of independent structures to generate (1-5, default: 1)
output_format (optional): Output format, "cif" (default) or "pdb"
Molecule Specification#
Each molecule in the molecules list has a required field:
type: Must be "protein", "dna", "rna", or "ligand"
Each molecule can have the following optional field:
id: Molecule / chain identifier(s), given as a single string or a list of strings (1-4 alphanumeric characters each). Use a list of distinct identifiers if the input is composed of multiple molecules with the same sequence, CCD code, or SMILES string.
The following field is available and required for proteins, DNA, and RNA:
sequence: The amino acid or nucleotide sequence (1-4096 characters)
The following field is available for protein and RNA sequences (see MSA Requirements):
msa: Single-chain multiple sequence alignment, referred to as the main MSA in ColabFold
The following field is available for protein sequences (see MSA Requirements):
paired_msa: Single-chain MSA, where the rows have been re-ordered in concert with the rows in the MSAs for other chains
The following fields are available for ligands. Provide either ccd_codes or smiles, not both:
ccd_codes: Chemical Component Dictionary code (for example, “ATP”, “CL”)
smiles: SMILES string representation (for example, “CC(=O)OC1=CC=CC=C1C(=O)O”)
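The rule about listing several identifiers for identical molecules can be sketched in a few lines. The helper name `group_chains` and its input layout (a mapping from chain ID to a type and sequence pair) are assumptions made for illustration and are not part of the NIM API; only the `type`, `id`, and `sequence` keys come from the specification above.

```python
def group_chains(chains):
    """Collapse chains that share a type and sequence into one molecule entry.

    `chains` maps a chain ID to a (molecule_type, sequence) tuple.
    Returns molecule dicts in first-seen order.
    """
    grouped = {}
    for chain_id, (mol_type, sequence) in chains.items():
        grouped.setdefault((mol_type, sequence), []).append(chain_id)

    molecules = []
    for (mol_type, sequence), ids in grouped.items():
        molecules.append({
            "type": mol_type,
            # A single string for one copy, a list of IDs for several copies.
            "id": ids[0] if len(ids) == 1 else ids,
            "sequence": sequence,
        })
    return molecules


molecules = group_chains({
    "A": ("protein", "MKTVRQERLKSIVR"),
    "B": ("protein", "MKTVRQERLKSIVR"),
    "C": ("dna", "ATCGATCG"),
})
```

Here the two identical protein chains end up in one entry with `"id": ["A", "B"]`, while the DNA strand keeps a single string ID. Protein and RNA entries produced this way still need their MSA fields before they can be submitted.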
MSA Structure#
The msa field is a nested dictionary with the following structure:
{
  "database_name": {
    "format": {
      "alignment": "alignment content as string",
      "format": "a3m" or "csv"
    }
  }
}
MSA Requirements:
For each protein sequence, either the msa or paired_msa field is required. To provide an MSA with 0 hits, supply an msa or paired_msa input containing only the query sequence.
For each RNA sequence, the msa field is required. To provide an MSA with 0 hits, supply an msa input containing only the query sequence.
The first sequence in either the msa or paired_msa must exactly match the input protein or RNA sequence.
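As a minimal sketch of these requirements, the helper below wraps an A3M string in the nested structure and checks the first sequence against the query before anything is sent. The function names and the default database name `my_db` are hypothetical; when no alignment is given, it builds a query-only MSA, which is how a 0-hit MSA is described above.

```python
def first_a3m_sequence(a3m_text):
    """Return the first sequence record of an A3M string, joined across lines."""
    parts = []
    seen_header = False
    for line in a3m_text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' lines that may precede the first record.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if seen_header:
                break  # the second record starts here
            seen_header = True
            continue
        if seen_header:
            parts.append(line)
    return "".join(parts)


def build_msa_field(query, a3m_text=None, database_name="my_db"):
    """Build the nested msa dict, defaulting to a query-only (0-hit) MSA."""
    if a3m_text is None:
        a3m_text = f">query\n{query}\n"
    first = first_a3m_sequence(a3m_text)
    if first != query:
        raise ValueError("first MSA sequence does not match the query sequence")
    return {database_name: {"a3m": {"alignment": a3m_text, "format": "a3m"}}}


msa = build_msa_field("MKTVRQERLKSIVR")
```

The returned dictionary can be assigned to the `msa` key of a protein or RNA molecule. A mismatch raises `ValueError` locally, which is cheaper than waiting for the service to reject the request.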
Recommended Workflow#
To predict the structure of a biomolecular complex:
Prepare the protein and RNA MSAs:
Run MSA tools for the distinct protein sequences in the complex, for example:
- Note: If your input is composed of multiple non-identical protein sequences and you select a mode that uses pairing, you must post-process the pair.a3m file before adding it to the OpenFold3 NIM request. First filter out null characters, then split the file by protein chain.
- Note: Has a colabfold mode which, for inputs with multiple non-identical protein sequences, writes the output of pairing separately for each chain.
Run MSA tools for the distinct RNA sequences in the complex
Convert results to A3M or CSV format
Ensure the first sequence in each MSA exactly matches your input sequence
Prepare the input sequences:
Proteins: Standard single-letter amino acid codes
DNA: Use A, T, C, G
RNA: Use A, U, C, G
Ligands: Find CCD codes from PDB or provide SMILES strings
Structure your request:
Create a molecules list with all components of your complex
Add MSAs to each protein molecule and to each RNA molecule
Optionally assign molecule/chain IDs
Set diffusion_samples > 1 if you want multiple predictions
Submit the request and retrieve the structures:
POST the request to /biology/openfold/openfold3/predict
Parse the response to extract predicted structures
Structures are returned in ranked order (best first)
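The sequence conventions from the "Prepare the input sequences" step can be checked locally before a request is assembled. The sketch below assumes that proteins use only the 20 standard amino acid letters; the service may accept additional symbols, so treat a reported problem as a prompt to double-check rather than a definitive rejection. The names `ALPHABETS` and `check_sequence` are illustrative.

```python
# Assumed alphabets: 20 standard amino acids, and the nucleotide letters
# listed in the workflow above.
ALPHABETS = {
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
    "dna": set("ATCG"),
    "rna": set("AUCG"),
}
MAX_LENGTH = 4096  # documented maximum sequence length


def check_sequence(mol_type, sequence):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not 1 <= len(sequence) <= MAX_LENGTH:
        problems.append(f"length {len(sequence)} is outside 1-{MAX_LENGTH}")
    unexpected = sorted(set(sequence) - ALPHABETS[mol_type])
    if unexpected:
        problems.append(
            f"unexpected characters for {mol_type}: {''.join(unexpected)}"
        )
    return problems


dna_problems = check_sequence("dna", "ATCGATCG")
rna_problems = check_sequence("rna", "AUCGT")
```

In this example the DNA strand passes, while the RNA string is flagged because it contains a `T`.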
import requests
import json
# --------------------------------
# Parameters
# --------------------------------
url = "http://localhost:8000/biology/openfold/openfold3/predict"
headers = {"Content-Type": "application/json"}
# Protein sequence
protein_sequence = (
    "LLYTRNDVSDSEKKATVELLNRQVIQFIDLSLITKQAHWNMRGANFIAVHEMLDGFRTALIDHLDTMAERAVQLGGVALGTTQVINSKTPLKSYPLDIHNVQDHLKELADRYAIVANDVRKAIGEAKDDDTADILTAASRDLDKFLWFIECNLDL"
)
# MSA in CSV format - first sequence must match the query sequence
protein_msa_csv = """key,sequence
-1,LLYTRNDVSDSEKKATVELLNRQVIQFIDLSLITKQAHWNMRGANFIAVHEMLDGFRTALIDHLDTMAERAVQLGGVALGTTQVINSKTPLKSYPLDIHNVQDHLKELADRYAIVANDVRKAIGEAKDDDTADILTAASRDLDKFLWFIECNLDL
-1,-FDTRHDLPAEVRSRMIALLNQQLADTFDLMSQTKQAHWNVKGPQFIALHEMFDEFAEGLAGYVDEIAERITALGGYAAGTVRMAAKASTLPDYPADVIADMDHVRALADRYAALAASTRKAIDEAGDLDTADLFTEVSRDLDKWLWFLEAH---
-1,LHPTRIDIPAEVRSQIAEILNQSLASTLDLKTQVKQAHWNVKGMDFYQLHELFDEMATELEEFIDLIAERITALGGVALGTARIAAERSTLPEYPIDILDGKSHVTALAERYAPYAKLVRDAIDSLGDADTADLYTEVSRAIDKRLWFLEAHL--
-1,-YPTRIDLPVEVRSQVVNLLNQTLAATLDLKTQSKQAHWNVKGMDFYQLHELFDELASELEEYVDMVAERVTALGGTALGTARIAAAESILPEYPLEAIDGADHVTALAERFAVYAKHLREAIDEAGDADTADLYTEISRTIDKRLWFLEAHL--
-1,LYPTRIDIPAEARKQIAGILNQTLAATSDLKSQAKQAHWNVKGTDFYQLHELFDEIAGELEEYIDMFAERITALGGYACGTVRMAAANSFLPEYPTDILMGMEHVTALAERFAPYAKQLREAIDDLGDADTADLYTEVSRTIDKRLWFLEAHL--
-1,MFRTKNDLSESIRGKAVELLNARLADAIDLQTQTKQAHWNIKGPNFIALHELFDKVNEDVEDYVDEIAERAVQLGGIAEGTARMAAKRSSLNEYPANTADGRSHVEALSSALAAFGKTARKAIDELGDADTADIFTEISRGIDKWLWFVEAHL--
-1,-FPTRVGIPAETRRRMINLLNQHLADAFDLYSQTKQAHWNVKGLQFIALHELFDKLAEDLEEAIDDMAERVTALGGTALGTVRIVAASSSIAEYPLDITDGPQHIEALAERFGRLAGMVRSAIDEAGDADTADLFTEVSRMLDKNLWFLEAH---
-1,-HKTRIDLAPEVREAMIELLNQQLADTFDLFSQTKQAHWNVKGPEFIALHGLFDDLADQLRDYVDKIAERATTLGGTAAGTVRMAASSTRLPEYPVEVFDGMAHVEALAERYAALAETTRAAIEEAGDISTADLFTEVSRGLDKALWLLEAHL--
-1,MYPTRNDLPESARIKLVELLNARLADAIDLQTQCKQAHWNVKGPDFIALHKLFDEVNDAVEEYVDLIAERAVQLGGVADGTARVAAKRSSLPEYPVRRGDGREHVEALSAVLSAFGKLVRAEIDELSDADTADLFTEVSRGVDKWLWFVEAHL--
-1,MRPTRIDLAPATREAMVELLNRQLADTLDLYTQTKQAHWNVTGPQFQQLHELFDELAGQLIGHLDLLAERATALGGAARGTLRMAAAVSRLPEMPAGFHDGLAVVRLLADRYAALAASTRAAIERAGDAATADLFTEISRALDKALWFLEAHL--"""
# Paired MSA showing protein-protein interactions
paired_msa_csv = """key,sequence
-1,LLYTRNDVSDSEKKATVELLNRQVIQFIDLSLITKQAHWNMRGANFIAVHEMLDGFRTALIDHLDTMAERAVQLGGVALGTTQVINSKTPLKSYPLDIHNVQDHLKELADRYAIVANDVRKAIGEAKDDDTADILTAASRDLDKFLWFIECNLDL
-1,-----IGIPTEKRTAIAEGLSRLLADTYTLYLKTHNYHWNVTGPMFQTLHTMFMTQYTELSLAVDEVAERIRALGHPAPGSYAAFARLSSIAE-EEAVPPSREMIANLVKGHEAVTRTARQVAESASDEPTCDLLTQRMQVSEKTAWMLRSLLD-
-1,-----IGIEQADREAIAAGLNQLLADTYSLYLKTHSFHWNVTGPMFNTLHLMFEGQYTELALAVDVIAERVRALGARALGSYSAYAKLTQISE-DNGVSSAKAMIQELLEGQEIVIRNARPLVQKADDEATADLLTQRIQLHEKTAWMLRSLLE-
-1,MYRSPSPLPEKTRASVVESLNARLADGLDLHSQIKVAHWNIKGPQFAALHPLFETFAVSLANHNDSIAERAVTLGGKAYGTTRHVGKASRLPEYPQETTKDLEHVKLLAERIEVYLDGLRESFVEVDDADSEDLATGIIVEFEKHAWFLRASLE-
-1,-FPSHVNLPTDAREELIDSLNTLLADAIDLHWQIKQAHWNIRGRHFYSRHELFDDLAKHVRKQADEFAERAGTLGGYAEGTIRLAAKNSELPEYDLKAVDGDDHLKALVDRFARYGASIRTGIDELNDPVTSDLLTQTLGEVELDLWFLESHL--
-1,LYKSPSPLSEQARTAIAATLNERLSDGLDLHSQIKVAHWNIKGPQFAALHPLFETFAVSLANHNDSIAERAVTLGGRAYGTSRHVGKNSRLPEYPQETSRDLEHVKLLAERIEVYLSGLREAVEGHKDTDTVDLFTGIITEFEKHAWFLRASLE-
-1,MYRSPSPLSEQVRAPLAASLNERLADGLDLHSQIKVAHWNIKGPQFAALHPLFETFAVSLSNHNDAIAERAVTLGGRAYGTSRHVGKASRIPEYQQETVKDLDHVKLLAERFDVYLAGLRESGEQHQDTDTVDLLTGAITEFEKHTWFLRATL--
-1,-FPSHINLPREARSELIDLLNTCLATAVDLHWQVKQAHWNIRGNHFISRHLLFDKVADHVRDHADEFAERAGALGGYAEGTIRLATKNSELEEYDLSAVNGDDHVRVIVDRVSRYAATIRDGIDELNDPVTADLLTQTLGTVEEDLWFLESHL--
-1,MYRSPSPLSEQTRSAVSATLNERLADGLDLHSQIKVAHWNIKGPQFAALHPLFETFAVSLAAHNDSVAERAVTLGGRAYGTSRHVAKTSRLPDYPQDTSKDLEHVRLLADRIESYLTGVRKVAEQHQDTDTVDLLTGIITEFEKHAWFLRASLE-
-1,MYRSPSPLPAEARSQIVETLNARLIDGLDLHSQIKVAHWNIKGPHFAALHPLFETFAVSLAEFNDAIAERAVTLGGQIRATARHVASTSTIPDYDQSATRDLDHARLLADRFQKYLEGLRSIVDRLGDVDTSDLLTGIIGTFEKHTWFLRSTVE-"""
# --------------------------------
# Assemble request content
# --------------------------------
data = {
    # Request-level fields
    "request_id": "5xgo",  # Optional: identifier for the entire request
    
    # Must contain exactly 1 input
    "inputs": [
        {
            # Input-level fields
            "input_id": "5xgo",  # Optional: identifier for this structure prediction
            
            # Molecules list - can contain multiple molecules that form a complex
            "molecules": [
                {
                    # Protein molecule
                    "type": "protein",
                    "id": ["A1", "A2"],  # This protein spans two chains
                    "sequence": protein_sequence,
                    
                    # MSA structure: {database_name: {format: {alignment, format}}}
                    "msa": {
                        "main_db": {  # Arbitrary database name
                            "csv": {  # Format name (must be lowercase)
                                "alignment": protein_msa_csv,  # MSA content as string
                                "format": "csv"  # Format specification
                            }
                        }
                    },
                    
                    # Paired MSA for modeling chain-chain interactions
                    "paired_msa": {
                        "paired_db": {  # Arbitrary database name
                            "csv": {
                                "alignment": paired_msa_csv,
                                "format": "csv"
                            }
                        }
                    }
                },
                {
                    # Ligand molecule using CCD code
                    "type": "ligand",
                    "id": ["M1", "M2"],  # Two instances of this ligand
                    "ccd_codes": "CL"  # Chloride ion
                },
                {
                    # Another ligand molecule
                    "type": "ligand",
                    "id": ["N1", "N2"],  # Two more instances
                    "ccd_codes": "CL"
                }
            ],
            
            # Optional parameters
            "diffusion_samples": 1,  # Number of structures to generate (1-5)
            "output_format": "cif"  # Output format: "cif" or "pdb"
        }
    ]
}
# --------------------------------
# Post to server
# --------------------------------
response = requests.post(
    url=url,
    data=json.dumps(data),
    headers=headers,
    timeout=600,  # Longer timeout for complex structures
)
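# Check if the request was successful
if response.ok:
    result = response.json()
    print("Request succeeded!")
    # "outputs" holds one entry per input; the structures are nested inside it
    n_structures = sum(
        len(output.get("structures_with_scores", []))
        for output in result.get("outputs", [])
    )
    print(f"Generated {n_structures} structure(s)")
else:
    print("Request failed:", response.status_code, response.text)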
Response Structure#
The response contains:
request_id: Echo of the request ID
outputs: List with a single output object containing:
  input_id: Echo of the input ID
  structures_with_scores: List of predicted structures in ranked order (best first)
    structure: The predicted structure in CIF or PDB format (as a string).
    format: The format, cif or pdb.
    confidence_score: The 'sample_ranking_score', that is, the confidence score.
    complex_plddt_score: Average pLDDT score for the complex.
    complex_pde_score: Average PDE score for the complex.
    ptm_score: Predicted TM score for the complex.
    iptm_score: Predicted TM score for the interfaces.
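A parsed response can be written to disk with a short helper such as the one below. This is a sketch that assumes the score fields sit next to `structure` and `format` inside each `structures_with_scores` entry; the function name `save_structures` and the file naming scheme are illustrative choices.

```python
from pathlib import Path


def save_structures(result, out_dir="."):
    """Write each structure in a parsed response to disk.

    Returns a list of (path, confidence_score) tuples in ranked order.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for output in result.get("outputs", []):
        input_id = output.get("input_id", "input")
        entries = output.get("structures_with_scores", [])
        # Entries are documented as ranked best first, so rank 1 is the top model.
        for rank, entry in enumerate(entries, start=1):
            path = out_path / f"{input_id}_rank{rank}.{entry['format']}"
            path.write_text(entry["structure"])
            written.append((path, entry.get("confidence_score")))
    return written
```

Called as `save_structures(response.json(), "predictions")`, it would produce files such as `5xgo_rank1.cif` for the request shown earlier.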
Additional Examples#
Protein-Only Structure Prediction#
For predicting a simple protein structure without ligands:
data = {
    "inputs": [
        {
            "input_id": "my_protein",
            "molecules": [
                {
                    "type": "protein",
                    "sequence": "MKTVRQERLKSIVR",
                    "msa": {
                        "uniref90": {
                            "a3m": {
                                # Joined line by line so no indentation leaks into the alignment
                                "alignment": (
                                    ">query\n"
                                    "MKTVRQERLKSIVR\n"
                                    ">hit1\n"
                                    "MKTVRQERLKSIVR\n"
                                    ">hit2\n"
                                    "MKTVR-ERLKSIVR"
                                ),
                                "format": "a3m"
                            }
                        }
                    }
                }
            ]
        }
    ]
}
Protein-DNA Complex#
For modeling protein-DNA interactions, see below. For double-stranded DNA (dsDNA), both complementary strands must be entered as separate molecules.
data = {
    "inputs": [
        {
            "input_id": "protein_dna_complex",
            "molecules": [
                {
                    "type": "protein",
                    "id": "A",
                    "sequence": "MKTVRQERLKSIVR",
                    "msa": {
                        # ... MSA content ...
                    }
                },
                {
                    "type": "dna",
                    "id": "B",
                    "sequence": "ATCGATCG"
                },
                {
                    "type": "dna",
                    "id": "C",
                    "sequence": "TAGCTAGC"
                }
            ]
        }
    ]
}
Protein-Ligand with SMILES#
For modeling protein-ligand interactions using SMILES:
data = {
    "inputs": [
        {
            "input_id": "protein_ligand",
            "molecules": [
                {
                    "type": "protein",
                    "sequence": "MKTVRQERLKSIVR",
                    "msa": {
                        # ... MSA content ...
                    }
                },
                {
                    "type": "ligand",
                    "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
                }
            ]
        }
    ]
}
Field Reference#
The following sections describe the request-level, input-level, molecule-level, and AlignmentFileRecord fields, along with the important validation rules.
Request-Level Fields#
| Field | Required | Type | Description |
|---|---|---|---|
| request_id | No | string | Identifier for the entire request (max 128 characters) |
| inputs | Yes | list | Must contain exactly 1 input specification |
Input-Level Fields#
| Field | Required | Type | Default | Description |
|---|---|---|---|---|
| input_id | No | string | “input_id_0” | Unique identifier for this structure prediction (max 128 characters) |
| molecules | Yes | list | - | List of molecules to predict (minimum 1) |
| diffusion_samples | No | integer | 1 | Number of independent structures to generate (1-5) |
| output_format | No | string | “cif” | Output format: "cif" or "pdb" |
Molecule Fields#
| Field | Required | Type | Valid For | Description |
|---|---|---|---|---|
| type | Yes | string | All | Must be "protein", "dna", "rna", or "ligand" |
| sequence | Conditional | string | Protein/DNA/RNA | Amino acid or nucleotide sequence (1-4096 characters) |
| msa | Conditional | dict | Protein | Required for proteins. Nested dict: database → format → AlignmentFileRecord |
| paired_msa | No | dict | Protein/DNA/RNA | Joint MSA for modeling chain-chain interactions |
| ccd_codes | Conditional | string | Ligand | CCD code (1-5 uppercase letters/numbers). Mutually exclusive with smiles |
| smiles | Conditional | string | Ligand | SMILES string. Mutually exclusive with ccd_codes |
| id | No | string or list | All | Chain identifier(s). 1-4 alphanumeric characters each |
AlignmentFileRecord Fields#
| Field | Required | Type | Description |
|---|---|---|---|
| alignment | Yes | string | MSA content as a string |
| format | Yes | string | Format type: "a3m" or "csv" |
|  | No | integer | Ordering rank for concatenating alignments (default: -1) |
Important Validation Rules#
Ensure you follow the rules when indicating values for the fields below:
Protein MSAs: The first sequence in the MSA must exactly match the input protein sequence (without gaps)
Chain IDs: Must be 1-4 alphanumeric characters. PDB output format only supports single-character IDs
MSA Format Names: Must be lowercase ("a3m", not "A3M")
Ligand Specification: Must provide either ccd_codes or smiles, not both
Sequence Length: Maximum 4096 characters per molecule
Database Limit: Maximum 3 MSA databases per protein
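Several of these rules can be checked on the client before a request is sent. The sketch below covers chain IDs, the ligand specification, MSA format names, and the database limit; `validate_molecule` is a hypothetical helper, and the service remains the authority on what is accepted.

```python
import re

# 1-4 ASCII letters or digits, per the chain ID rule above.
CHAIN_ID = re.compile(r"[A-Za-z0-9]{1,4}")


def validate_molecule(molecule, output_format="cif"):
    """Return a list of rule violations for one molecule dict."""
    errors = []

    ids = molecule.get("id", [])
    if isinstance(ids, str):
        ids = [ids]
    for chain_id in ids:
        if not CHAIN_ID.fullmatch(chain_id):
            errors.append(f"chain ID {chain_id!r} is not 1-4 alphanumeric characters")
        elif output_format == "pdb" and len(chain_id) != 1:
            errors.append(f"chain ID {chain_id!r} is too long for PDB output")

    if molecule.get("type") == "ligand":
        # Exactly one of the two ligand fields must be present.
        if ("ccd_codes" in molecule) == ("smiles" in molecule):
            errors.append("ligand needs exactly one of ccd_codes or smiles")

    msa = molecule.get("msa", {})
    if len(msa) > 3:
        errors.append(f"{len(msa)} MSA databases given, maximum is 3")
    for database, formats in msa.items():
        for name in formats:
            if name != name.lower():
                errors.append(
                    f"MSA format name {name!r} in {database!r} is not lowercase"
                )
    return errors


errors = validate_molecule(
    {"type": "ligand", "id": "M1", "ccd_codes": "CL"}, output_format="pdb"
)
```

For the ligand above, the only reported problem is the two-character chain ID combined with PDB output; with the default `"cif"` output the same molecule passes.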
MSA Format Requirements#
For proteins, the MSA is required and must meet these criteria:
The first sequence in the MSA must exactly match the input protein sequence
Gaps are represented by - characters
In A3M format, lowercase letters represent insertions relative to the query sequence
Both A3M and CSV formats are supported
A3M Format Example:
>query
MKTVRQERLKSIVR
>hit1
MKTVRQERLKSIVR
>hit2
MKTVR-ERLKSIVR
CSV Format Example:
key,sequence
-1,MKTVRQERLKSIVR
-1,MKTVRQERLKSIVR
-1,MKTVR-ERLKSIVR
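The two examples above carry the same alignment, so one layout can be derived from the other. The following sketch converts A3M text into the CSV layout and reuses the key value `-1` from the example; the meaning of the `key` column is not described on this page, so that value is an assumption, as is copying sequences verbatim (including any lowercase insertion characters). The function name `a3m_to_csv` is illustrative.

```python
def a3m_to_csv(a3m_text, key="-1"):
    """Convert A3M text to the key,sequence CSV layout shown above."""
    rows = ["key,sequence"]
    current = []
    for line in a3m_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current:
                rows.append(f"{key},{''.join(current)}")
                current = []
        else:
            current.append(line)
    if current:
        rows.append(f"{key},{''.join(current)}")
    return "\n".join(rows)


a3m_example = """>query
MKTVRQERLKSIVR
>hit1
MKTVRQERLKSIVR
>hit2
MKTVR-ERLKSIVR"""

csv_text = a3m_to_csv(a3m_example)
```

Applied to the A3M example, this yields the same four lines as the CSV example. Header names from the A3M file are discarded because the CSV layout shown here has no column for them.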