Example Requests#
Using the OpenFold3 NIM#
OpenFold3 NIM predicts 3D structures of biomolecular complexes including proteins, DNA, RNA, and ligands. The NIM supports multiple prediction modes: protein-only structure prediction, protein-DNA/RNA complexes, protein-ligand interactions, and template-guided predictions.
Try It Out#
Here’s a simple example to get started with protein structure prediction. This script predicts the structure of a small protein and saves the result.
import requests
import json

# NIM endpoint
url = "http://localhost:8000/biology/openfold/openfold3/predict"
headers = {"Content-Type": "application/json"}

# Define protein sequence
protein_sequence = "MKTVRQERLKSIVR"

# Create minimal MSA (just the query sequence)
msa_content = f">query\n{protein_sequence}"

# Build the request
data = {
    "inputs": [{
        "input_id": "my_first_prediction",
        "molecules": [{
            "type": "protein",
            "sequence": protein_sequence,
            "msa": {
                "main": {
                    "a3m": {
                        "alignment": msa_content,
                        "format": "a3m"
                    }
                }
            }
        }],
        "output_format": "pdb"
    }]
}

# Submit prediction request
response = requests.post(url, json=data, headers=headers, timeout=300)

# Extract and save the predicted structure
if response.ok:
    result = response.json()
    structure = result['outputs'][0]['structures_with_scores'][0]['structure']
    # Save to file
    with open("predicted_structure.pdb", "w") as f:
        f.write(structure)
    print("✓ Prediction complete! Structure saved to predicted_structure.pdb")
else:
    print(f"✗ Prediction failed: {response.status_code} - {response.text}")
Save this script as predict.py and run it:
python predict.py
Try Out Other Prediction Use Cases#
The following examples show how to customize the data field for different prediction scenarios. To try them, replace the data dictionary in the script above with these examples.
Protein-DNA Complex#
Use this example for modeling protein-DNA interactions. For double-stranded DNA (dsDNA), both complementary strands must be entered as separate molecules.
# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "protein_dna_complex",
        "molecules": [
            {
                "type": "protein",
                "id": "A",
                "sequence": "MKTVRQERLKSIVR",
                "msa": {
                    "main": {
                        "a3m": {
                            "alignment": ">query\nMKTVRQERLKSIVR",
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "dna",
                "id": "B",
                "sequence": "ATCGATCG"
            },
            {
                "type": "dna",
                "id": "C",
                "sequence": "CGATCGAT"  # Complementary strand
            }
        ],
        "output_format": "pdb"
    }]
}
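The complementary strand in the example above is the reverse complement of the first strand. A minimal sketch for deriving it is shown below; `reverse_complement` is a hypothetical helper, not part of the NIM API, and it assumes both strands are written 5'→3', which is consistent with the example but is not stated explicitly in this document.

```python
# Hypothetical helper (not part of the NIM API): derive the complementary
# strand, written 5'->3', from one DNA strand.
_COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse_complement(strand: str) -> str:
    strand = strand.upper()
    invalid = set(strand) - set("ATCG")
    if invalid:
        raise ValueError(f"Unsupported DNA characters: {sorted(invalid)}")
    # Complement each base, then reverse to keep 5'->3' orientation.
    return strand.translate(_COMPLEMENT)[::-1]


forward = "ATCGATCG"
dna_molecules = [
    {"type": "dna", "id": "B", "sequence": forward},
    {"type": "dna", "id": "C", "sequence": reverse_complement(forward)},
]
print(dna_molecules[1]["sequence"])  # CGATCGAT
```

The two dictionaries can be appended to the molecules list in place of the hand-written DNA entries.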
Protein-Ligand with SMILES#
For modeling protein-ligand interactions using SMILES notation:
# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "protein_ligand",
        "molecules": [
            {
                "type": "protein",
                "sequence": "MKTVRQERLKSIVR",
                "msa": {
                    "main": {
                        "a3m": {
                            "alignment": ">query\nMKTVRQERLKSIVR",
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "ligand",
                "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
            }
        ],
        "output_format": "pdb"
    }]
}
Protein-Ligand with CCD Code#
For modeling protein-ligand interactions using Chemical Component Dictionary codes:
# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "protein_ligand_ccd",
        "molecules": [
            {
                "type": "protein",
                "sequence": "MKTVRQERLKSIVR",
                "msa": {
                    "main": {
                        "a3m": {
                            "alignment": ">query\nMKTVRQERLKSIVR",
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "ligand",
                "ccd_codes": "ATP"  # Adenosine triphosphate
            }
        ],
        "output_format": "pdb"
    }]
}
Protein with Structural Templates#
For template-guided protein structure prediction using experimental or predicted structures:
Note
Structural templates can significantly improve prediction accuracy. For detailed information on template processing, selection strategies, and best practices, refer to Template Processing.
# First, read a template structure from a CIF file
with open("template.cif", "r") as f:
    template_cif_content = f.read()

# Replace the 'data' variable with this
data = {
    "inputs": [{
        "input_id": "template_guided_prediction",
        "molecules": [{
            "type": "protein",
            "sequence": "MKTVRQERLKSIVR",
            "msa": {
                "main": {
                    "a3m": {
                        "alignment": ">query\nMKTVRQERLKSIVR",
                        "format": "a3m"
                    }
                }
            },
            "structural_templates": [
                {
                    "structure": template_cif_content,
                    "format": "cif",
                    "name": "template_1"
                }
            ]
        }],
        "output_format": "pdb"
    }]
}
Multi-Chain Protein Complex with MSAs#
For predicting complex biomolecular assemblies with multiple protein chains and detailed MSAs:
# Multi-chain protein sequences
protein1_sequence = "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR"
protein2_sequence = "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
# MSA for protein 1 with multiple homologs (a3m format)
protein1_msa_a3m = """>101
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>UniRef100_A0A068F5F5
VLSAKDKTNIKTAWGKIGGHAAEYGAEALERMFVVYPTTKTYFPHFDVSHGSAQVKAHGKKVADALTNAVGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLANHIPADFTPAVHASLDKFLASVSTVLTSKYR
>UniRef100_A0A091II63
-LTQAEKAAVVAIWAKVAPQIDAIGAESLERLFFTYPQTKTYFPHFDLSHSSPQLRGHGSKVMNAIGEAVKNLDDLRGALVKLSELHAYILRVDPVNFKLLSHCILCSLAAHYPKDFTPEAHAAWDKFLSSVSSVLTEKYR
>UniRef100_UPI00162A5CD8
VLSPADKTNIKAAWDKVGGNVGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVGDALTNAVAHIDDLPGALSALSDLHAYKLRVDPVNFKLLSHCLLVTLASHLPSDFTPAVHASLDKFLASVSTVLTSKYR"""
# Paired MSA for protein 1 for modeling chain-chain interactions
protein1_paired_msa_a3m = """>101
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
>UniRef100_UPI001EFC6C5E
-LTDEQKRLIQKSYAEIDRQSSNFAAIFYDCLFAMAPLIRPMFK-----SERPVFEYHFNELISTAATKVFQFEEIKPRLVVLGRKH-RGYGVTPAQFDVVRSALMLSIQDCLRDACNPAIEQAWSSYYDEIAKVM-----
>UniRef100_UPI0018E7FE8A
-LTEIEKEAITSSFTLINHQEQQFASFFYDCLFDLAPLIKPMFKR-----DRKLIEEHFYMIFCAAVDNIHHLDTIRSTLLELGSRH-RNYGVKVSHFPIVKSALILAIQHELKGQSNTDIENAWSNYYDELAAII-----"""
# MSA for protein 2 with multiple homologs (a3m format)
protein2_msa_a3m = """>102
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>UniRef100_UPI0018F66524
VHLSAEEKSAVNSLWGKVNVEEHGGEALARLLVVYPWTQRFFDSFGNLSSASAILGNPKVKAHGKKVLTSFGDAVKNLDNLKGTFAKLSELHCDKLHVDPENFRLLGNVLVVVLARHFGKDFTPEVQAAWQKLVAGVASALAHKYH
>UniRef100_UPI001CFCA915
VHFTAEEKSTITSLWGKVNVEETGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPKVKAHGKKVLTSLGDAVKNLDNLKGAFSKLSELHCDKLHVDPENFRLLGNVLIVVLAAHFGKEFTPEVQAAWQKLVTGVASALAHKYH
>UniRef100_A0A8C6RZS6
VNFTPEEKSLVTSLWSKVNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSASAIMGNPRVKAHGKKVLTSFGEAVKNMDNLKATFSKLSELHCDKLHVDPENFKLLGNVLVVVLASHFGKEFTPEVQAAWQKLVAGVANALSHKYH"""
# Paired MSA for protein 2 for modeling chain-chain interactions
protein2_paired_msa_a3m = """>102
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>UniRef100_UPI001EFC6C5E
--LTDEQKRLIQKSYAEIDrqSSNFAAIFYDCLFAMAPLIRPMFKS-----------ERPVFEYHFNELISTAATKVFQFEEIKPRLVVLGRKH-RGYGVTPAQFDVVRSALMLSIQDCLRDACNPAIEQAWSSYYDEIAKVM-----
>UniRef100_UPI0018E7FE8A
--LTEIEKEAITSSFTLINHQEqqFASFFYDCLFDLAPLIKPMFKRDRKL-----------IEEHFYMIFCAAVDNIHHLDTIRSTLLELGSRH-RNYGVKVSHFPIVKSALILAIQHELKGQSNTDIENAWSNYYDELAAIILEG--"""
# Replace the 'data' variable with this
data = {
    "request_id": "1a3n",
    "inputs": [{
        "input_id": "1a3n",
        "molecules": [
            {
                "type": "protein",
                "id": ["A", "C"],
                "sequence": protein1_sequence,
                "msa": {
                    "uniref": {
                        "a3m": {
                            "alignment": protein1_msa_a3m,
                            "format": "a3m"
                        }
                    }
                },
                "paired_msa": {
                    "paired": {
                        "a3m": {
                            "alignment": protein1_paired_msa_a3m,
                            "format": "a3m"
                        }
                    }
                }
            },
            {
                "type": "protein",
                "id": ["B", "D"],
                "sequence": protein2_sequence,
                "msa": {
                    "uniref": {
                        "a3m": {
                            "alignment": protein2_msa_a3m,
                            "format": "a3m"
                        }
                    }
                },
                "paired_msa": {
                    "paired": {
                        "a3m": {
                            "alignment": protein2_paired_msa_a3m,
                            "format": "a3m"
                        }
                    }
                }
            }
        ],
        "diffusion_samples": 1,
        "output_format": "pdb"
    }]
}
Understanding the API#
This section provides detailed information about the OpenFold3 NIM API structure, requirements, and validation rules.
Endpoint#
/biology/openfold/openfold3/predict: Predicts the 3D structure of a biomolecular complex from input sequences
Request Structure#
A complete request consists of:
Request-Level Fields:
request_id (optional): Identifier for the entire request (max 128 characters)
inputs (required): List containing exactly one input specification
Input-Level Fields:
input_id (optional): Unique identifier for this structure prediction (max 128 characters, default: “input_id_0”)
molecules (required): List of molecules to predict as a complex (minimum 1)
diffusion_samples (optional): Number of independent structures to generate (1-5, default: 1)
output_format (optional): Output format, either "cif" (default) or "pdb"
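The request- and input-level limits above can be checked on the client before a request is sent. The sketch below is one way to do that; `build_request` is a hypothetical helper rather than part of the NIM, and it only mirrors the limits listed in this section (the server remains the authority on validation).

```python
def build_request(molecules, input_id="input_id_0", diffusion_samples=1,
                  output_format="cif", request_id=None):
    """Assemble a request body and check the documented limits client-side."""
    if not molecules:
        raise ValueError("molecules must contain at least one entry")
    if len(input_id) > 128:
        raise ValueError("input_id is limited to 128 characters")
    if request_id is not None and len(request_id) > 128:
        raise ValueError("request_id is limited to 128 characters")
    if not 1 <= diffusion_samples <= 5:
        raise ValueError("diffusion_samples must be between 1 and 5")
    if output_format not in ("cif", "pdb"):
        raise ValueError('output_format must be "cif" or "pdb"')

    # "inputs" always holds exactly one input specification.
    body = {
        "inputs": [{
            "input_id": input_id,
            "molecules": list(molecules),
            "diffusion_samples": diffusion_samples,
            "output_format": output_format,
        }]
    }
    if request_id is not None:
        body["request_id"] = request_id
    return body


body = build_request(
    [{"type": "dna", "sequence": "ATCGATCG"}],
    input_id="dna_only",
    request_id="example_request",
)
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the script at the top of this page.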
Molecule Specification#
Each molecule in the molecules list requires:
Common Fields:
type (required): Must be "protein", "dna", "rna", or "ligand"
id (optional): Chain identifier(s), given as a single string or a list of strings (1-4 alphanumeric characters each)
For Proteins, DNA, RNA:
sequence (required): Amino acid or nucleotide sequence (1-4096 characters)
Proteins: Standard single-letter amino acid codes
DNA: A, T, C, G
RNA: A, U, C, G
For Proteins (MSA required):
msa (conditional): Single-chain multiple sequence alignment (required unless paired_msa is provided)
paired_msa (optional): Joint MSA for modeling chain-chain interactions
structural_templates (optional): List of structural templates in CIF format
For RNA (MSA required):
msa (required): Single-chain multiple sequence alignment
For Ligands (one required):
ccd_codes (conditional): Chemical Component Dictionary code (e.g., “ATP”, “CL”)
smiles (conditional): SMILES string (e.g., “CC(=O)OC1=CC=CC=C1C(=O)O”)
Note
For proteins, either msa or paired_msa is required. If you want to provide an MSA with 0 hits, include an MSA containing only the query sequence.
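For ligands, the rule above (one of ccd_codes or smiles) can be enforced when the molecule entry is built. The following is a minimal sketch; `make_ligand` is a hypothetical helper, not an API provided by the NIM.

```python
def make_ligand(ccd_codes=None, smiles=None, chain_id=None):
    """Build a ligand molecule entry from exactly one of ccd_codes or smiles."""
    if (ccd_codes is None) == (smiles is None):
        raise ValueError("provide exactly one of ccd_codes or smiles")

    molecule = {"type": "ligand"}
    if ccd_codes is not None:
        molecule["ccd_codes"] = ccd_codes
    else:
        molecule["smiles"] = smiles
    if chain_id is not None:
        molecule["id"] = chain_id
    return molecule


aspirin = make_ligand(smiles="CC(=O)OC1=CC=CC=C1C(=O)O")
atp = make_ligand(ccd_codes="ATP", chain_id="L")
```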
MSA Structure#
The msa and paired_msa fields use a nested dictionary structure:
"msa": {
    "database_name": {           # Arbitrary name (e.g., "uniref90", "main_db")
        "format_name": {         # Must be "a3m" or "csv" (lowercase)
            "alignment": "...",  # MSA content as string
            "format": "csv"      # Must match format_name
        }
    }
}
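The nested layout above (database name, then format name, then the alignment record) can be generated from a flat mapping. The sketch below assumes a hypothetical helper, `build_msa_field`; it also applies the lowercase-format and three-database limits listed under MSA Requirements.

```python
def build_msa_field(alignments):
    """Turn {database_name: (alignment_text, format)} into the nested MSA dict."""
    if not 1 <= len(alignments) <= 3:
        raise ValueError("provide between 1 and 3 MSA databases")

    field = {}
    for database_name, (alignment, fmt) in alignments.items():
        if fmt not in ("a3m", "csv"):
            raise ValueError(f'format must be "a3m" or "csv", got {fmt!r}')
        # The inner key and the "format" value must agree.
        field[database_name] = {fmt: {"alignment": alignment, "format": fmt}}
    return field


msa = build_msa_field({"main": (">query\nMKTVRQERLKSIVR", "a3m")})
```

The result can be assigned to the msa (or paired_msa) key of a protein or RNA molecule entry.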
MSA Requirements:
First sequence must exactly match the input protein/RNA sequence (without gaps)
Format names must be lowercase: "a3m" or "csv"
Maximum 3 MSA databases per protein
Gaps are represented by - characters
A3M format: lowercase letters represent insertions relative to the query
A3M Format Example:
>query
MKTVRQERLKSIVR
>hit1
MKTVRQERLKSIVR
>hit2
MKTVR-ERLKSIVR
CSV Format Example:
key,sequence
-1,MKTVRQERLKSIVR
-1,MKTVRQERLKSIVR
-1,MKTVR-ERLKSIVR
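The first-sequence requirement can be checked locally before submitting an A3M alignment. The parser below is a simplified sketch written for this page, not the NIM's own validation: `parse_a3m` and `first_sequence_matches` are hypothetical names, and the check only strips gap characters from the first record before comparing it with the query.

```python
def parse_a3m(text):
    """Return a list of (header, sequence) pairs from A3M/FASTA-style text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def first_sequence_matches(a3m_text, query):
    """True if the first record, with gaps removed, equals the query sequence."""
    records = parse_a3m(a3m_text)
    if not records:
        return False
    return records[0][1].replace("-", "") == query


example = ">query\nMKTVRQERLKSIVR\n>hit1\nMKTVRQERLKSIVR\n>hit2\nMKTVR-ERLKSIVR"
print(first_sequence_matches(example, "MKTVRQERLKSIVR"))  # True
```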
Structural Templates#
For template-guided predictions, each template requires:
structure(required): CIF file contents as stringformat(required): Must be"cif"name(optional): Template identifierchain_id(optional): Specific chain to use from multi-chain CIF (1-10 alphanumeric characters)
Preparing MSAs#
For production use, generate MSAs using sequence homology search tools:
Recommended Tools:
ColabFold - Fast MSA generation
Note: For multi-chain inputs with pairing, post-process the pair.a3m file by filtering null characters and splitting by chain
OpenFold3 MSA script - Includes colabfold mode with automatic chain splitting
HHBlits - Traditional MSA generation
Workflow:
Run MSA tools for each distinct protein/RNA sequence
Convert results to A3M or CSV format
Ensure first sequence matches your input sequence exactly
For multi-chain complexes, optionally generate paired MSAs
Response Structure#
The response contains:
request_id: Echo of the request ID
outputs: List with a single output object:
    input_id: Echo of the input ID
    structures_with_scores: List of predicted structures (ranked, best first):
        structure: Predicted structure in CIF or PDB format (string)
        format: Output format ("cif" or "pdb")
        confidence_score: Sample ranking score
        complex_plddt_score: Average pLDDT score for the complex
        complex_pde_score: Average PDE score for the complex
        ptm_score: Predicted TM score for the complex
        iptm_score: Predicted TM score for interfaces
runtime_metrics (optional): Performance metrics
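When diffusion_samples is greater than 1, it can help to tabulate the ranked structures before saving them. The sketch below walks the response layout described above; `summarize_predictions` is a hypothetical helper, and the response dictionary in the example holds placeholder values for illustration only, not real model output.

```python
def summarize_predictions(result):
    """List rank, a suggested filename, and key scores for each structure."""
    output = result["outputs"][0]
    rows = []
    # structures_with_scores is ranked best first, so rank 1 is the top sample.
    for rank, entry in enumerate(output["structures_with_scores"], start=1):
        rows.append({
            "rank": rank,
            "filename": f"{output['input_id']}_rank{rank}.{entry['format']}",
            "confidence_score": entry.get("confidence_score"),
            "ptm_score": entry.get("ptm_score"),
            "iptm_score": entry.get("iptm_score"),
        })
    return rows


# Placeholder response with made-up scores, shaped like the fields above.
example_result = {
    "request_id": "demo",
    "outputs": [{
        "input_id": "demo_input",
        "structures_with_scores": [
            {"structure": "...", "format": "pdb", "confidence_score": 0.9,
             "ptm_score": 0.8, "iptm_score": 0.7},
            {"structure": "...", "format": "pdb", "confidence_score": 0.6,
             "ptm_score": 0.5, "iptm_score": 0.4},
        ],
    }],
}

for row in summarize_predictions(example_result):
    print(row["rank"], row["filename"], row["confidence_score"])
```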
Field Reference Tables#
Request-Level Fields#
| Field | Required | Type | Description |
|---|---|---|---|
| request_id | No | string | Identifier for the entire request (max 128 characters) |
| inputs | Yes | list | Must contain exactly 1 input specification |
Input-Level Fields#
| Field | Required | Type | Default | Description |
|---|---|---|---|---|
| input_id | No | string | “input_id_0” | Unique identifier for this structure prediction (max 128 characters) |
| molecules | Yes | list | - | List of molecules to predict (minimum 1) |
| diffusion_samples | No | integer | 1 | Number of independent structures to generate (1-5) |
| output_format | No | string | “cif” | Output format: "cif" or "pdb" |
Molecule Fields#
| Field | Required | Type | Valid For | Description |
|---|---|---|---|---|
| type | Yes | string | All | Must be "protein", "dna", "rna", or "ligand" |
| sequence | Conditional | string | Protein/DNA/RNA | Amino acid or nucleotide sequence (1-4096 characters) |
| msa | Conditional | dict | Protein/RNA | Required for proteins (unless paired_msa provided) and RNA. Nested dict: database → format → AlignmentFileRecord |
| paired_msa | No | dict | Protein | Joint MSA for modeling chain-chain interactions |
| structural_templates | No | list | Protein | List of structural templates in CIF format to guide prediction |
| ccd_codes | Conditional | string | Ligand | CCD code (1-5 uppercase letters/numbers). Mutually exclusive with smiles |
| smiles | Conditional | string | Ligand | SMILES string. Mutually exclusive with ccd_codes |
| id | No | string or list | All | Chain identifier(s). 1-4 alphanumeric characters each |
AlignmentFileRecord Fields#
| Field | Required | Type | Description |
|---|---|---|---|
| alignment | Yes | string | MSA content as a string |
| format | Yes | string | Format type: "a3m" or "csv" |
| | No | integer | Ordering rank for concatenating alignments (default: -1) |
StructuralTemplate Fields#
| Field | Required | Type | Description |
|---|---|---|---|
| structure | Yes | string | The contents of the file containing the structural template, in CIF format |
| format | Yes | string | Format type: must be "cif" |
| name | No | string | Optional name to identify the template |
| chain_id | No | string | Optional chain ID to use from multi-chain CIF files. If not specified, the best matching chain is automatically selected. Supports CIF format (e.g., ‘A’, ‘A1’, ‘B2’). 1-10 alphanumeric characters |
Important Validation Rules#
Ensure you follow these rules when specifying field values:
Protein MSAs: The first sequence in the MSA must exactly match the input protein sequence (without gaps)
RNA MSAs: The first sequence in the MSA must exactly match the input RNA sequence (without gaps)
Chain IDs: Must be 1-4 alphanumeric characters. PDB output format only supports single-character IDs
MSA Format Names: Must be lowercase ("a3m", not "A3M")
Ligand Specification: Must provide either ccd_codes or smiles, not both
Sequence Length: Maximum 4096 characters per molecule
Database Limit: Maximum 3 MSA databases per protein
Structural Templates: Only allowed for protein molecules and must be in CIF format
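The chain ID rule interacts with the output format, so it is worth checking before a request is submitted. The sketch below is a client-side approximation of that rule only; `check_chain_ids` is a hypothetical helper, and it assumes "alphanumeric" means ASCII letters and digits.

```python
import re

# Assumption: "alphanumeric" is read as ASCII letters and digits.
_CHAIN_ID = re.compile(r"[A-Za-z0-9]{1,4}")


def check_chain_ids(chain_ids, output_format="cif"):
    """Return a list of problems found in one chain ID or a list of chain IDs."""
    ids = [chain_ids] if isinstance(chain_ids, str) else list(chain_ids)
    problems = []
    for chain_id in ids:
        if not _CHAIN_ID.fullmatch(chain_id):
            problems.append(f"{chain_id!r}: must be 1-4 alphanumeric characters")
        elif output_format == "pdb" and len(chain_id) != 1:
            problems.append(f"{chain_id!r}: PDB output needs single-character IDs")
    return problems


print(check_chain_ids(["A", "C"], output_format="pdb"))  # []
print(check_chain_ids("A1", output_format="pdb"))
```

An empty list means the IDs satisfy the rule as stated here; switching output_format to "cif" is one way to keep multi-character IDs.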