Example Requests

OpenFold3 NIM provides the following endpoint:

  • /biology/openfold/openfold3/predict: Predicts the 3D structure of a biomolecular complex from its components: protein, DNA, and RNA sequences, and ligands.

Using the OpenFold3 NIM

This section provides real examples of requests that should run when the NIM is correctly configured.

Note: We recommend interacting with this endpoint using the Python requests module or similar HTTP client libraries for proper handling of complex JSON structures.
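The following is a minimal sketch of sending a prediction request with only the Python standard library (urllib.request); the requests module works equally well. The base URL http://localhost:8000 is an assumption about where the NIM is listening, and the helper names build_http_request and predict are hypothetical, not part of the NIM.

```python
import json
import urllib.request

# Assumption: the NIM is reachable at this address; change it for your deployment.
BASE_URL = "http://localhost:8000"
ENDPOINT = "/biology/openfold/openfold3/predict"


def build_http_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Wrap a request body (a dict) in a JSON POST request for the predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def predict(payload: dict, base_url: str = BASE_URL, timeout: float = 600.0) -> dict:
    """Send the request and return the decoded JSON response.

    The 600-second timeout is an arbitrary choice; large complexes may need more.
    """
    req = build_http_request(payload, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: response = predict(data), where data is one of the request bodies shown in this section. Splitting request construction from sending makes the payload easy to inspect before anything goes over the network.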

Predict Structure

The endpoint accepts requests where the data is formatted as specified in the OpenAPI Specification.

Request Structure

A request consists of:

  • request_id (optional): Identifier for the entire request.

  • inputs (required): List containing exactly one input specification.

Each input contains:

  • input_id (optional, recommended): Unique identifier for this structure prediction (for example, “my_protein_001”)

  • molecules (required): List of molecules to predict as a complex (minimum 1 molecule)

  • diffusion_samples (optional): Number of independent structures to generate (1-5, default: 1)

  • output_format (optional): Output format, "cif" (default) or "pdb"
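The request and input fields above can be assembled and checked client-side before sending. The build_request function below is a hypothetical helper (not part of the NIM), sketched to enforce the limits listed here: exactly one input, at least one molecule, 1-5 diffusion samples, and "cif" or "pdb" output.

```python
from typing import Optional


def build_request(
    molecules: list,
    input_id: Optional[str] = None,
    diffusion_samples: int = 1,
    output_format: str = "cif",
    request_id: Optional[str] = None,
) -> dict:
    """Assemble a request body whose inputs list holds exactly one input specification."""
    if not molecules:
        raise ValueError("molecules must contain at least one molecule")
    if not 1 <= diffusion_samples <= 5:
        raise ValueError("diffusion_samples must be between 1 and 5")
    if output_format not in ("cif", "pdb"):
        raise ValueError('output_format must be "cif" or "pdb"')

    spec = {
        "molecules": molecules,
        "diffusion_samples": diffusion_samples,
        "output_format": output_format,
    }
    if input_id is not None:
        spec["input_id"] = input_id

    body = {"inputs": [spec]}  # a single input specification
    if request_id is not None:
        body["request_id"] = request_id
    return body
```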

Molecule Specification

Each molecule in the molecules list has a required field:

  • type: Must be "protein", "dna", "rna", or "ligand"

Each molecule can have the following optional field:

  • id: Molecule/chain identifier(s): a single string or a list of strings (1-4 alphanumeric characters each)

    • If the complex contains several copies of a molecule with the same sequence, CCD code, or SMILES string, specify the molecule once and give a list of distinct identifiers, one per copy.
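For example, a homodimer bound to two magnesium ions can be written as two molecule entries, each carrying one identifier per copy. The sketch below reuses the placeholder sequence from the examples in this section with a query-only MSA, uses the CCD code "MG" for the ion, and adds a hypothetical chain_ids helper that flattens every id (string or list) into the chain IDs you would expect in the output.

```python
# A homodimer (chains A and B) bound to two magnesium ions (chains C and D).
# Each repeated molecule appears once, with one distinct ID per copy.
molecules = [
    {
        "type": "protein",
        "id": ["A", "B"],
        "sequence": "MKTVRQERLKSIVR",
        # Query-only MSA (0 hits); replace with a real alignment.
        "msa": {"uniref90": {"a3m": {"alignment": ">query\nMKTVRQERLKSIVR", "format": "a3m"}}},
    },
    {"type": "ligand", "id": ["C", "D"], "ccd_codes": "MG"},
]


def chain_ids(molecules: list) -> list:
    """Expand every molecule's id (a string or a list) into one flat list of chain IDs."""
    flat = []
    for mol in molecules:
        ids = mol.get("id", [])
        flat.extend([ids] if isinstance(ids, str) else ids)
    return flat
```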

The following field is available and required for proteins, DNA, and RNA:

  • sequence: The amino acid or nucleotide sequence (1-4096 characters)

The following field is available for protein and RNA sequences (see MSA Requirements):

  • msa: Single-chain multiple sequence alignment, referred to as the main MSA in ColabFold

The following field is available for protein sequences (see MSA Requirements):

  • paired_msa: Single-chain MSA, where the rows have been re-ordered in concert with the rows in the MSAs for other chains

The following fields are available for ligands. Provide either ccd_codes or smiles, not both:

  • ccd_codes: Chemical Component Dictionary code (for example, “ATP”, “CL”)

  • smiles: SMILES string representation (for example, “CC(=O)OC1=CC=CC=C1C(=O)O”)
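Because ccd_codes and smiles are mutually exclusive, a small constructor can enforce the choice. The ligand function below is a hypothetical sketch, not part of the NIM; note that the request field is named ccd_codes even when it holds a single code string.

```python
from typing import Optional


def ligand(ccd_code: Optional[str] = None, smiles: Optional[str] = None, chain_id=None) -> dict:
    """Build a ligand molecule from exactly one of a CCD code or a SMILES string."""
    if (ccd_code is None) == (smiles is None):
        raise ValueError("provide exactly one of ccd_code or smiles")
    mol = {"type": "ligand"}
    if ccd_code is not None:
        mol["ccd_codes"] = ccd_code
    else:
        mol["smiles"] = smiles
    if chain_id is not None:
        mol["id"] = chain_id  # a string, or a list of strings for several copies
    return mol
```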

MSA Structure

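The msa field is a nested dictionary keyed first by database name and then by format name; each innermost value is an AlignmentFileRecord:

{
  "<database_name>": {
    "<format>": {
      "alignment": "<alignment content as a string>",
      "format": "<format>"
    }
  }
}

Here <database_name> is a label such as "uniref90", and <format> is "a3m" or "csv".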
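A minimal sketch of building this nesting in Python. make_msa is a hypothetical helper, and "uniref90" is only a default label borrowed from the protein-only example.

```python
def make_msa(alignment: str, database: str = "uniref90", fmt: str = "a3m") -> dict:
    """Wrap one alignment string in the database -> format -> record nesting."""
    if fmt not in ("a3m", "csv"):
        raise ValueError('format must be "a3m" or "csv" (lowercase)')
    return {database: {fmt: {"alignment": alignment, "format": fmt}}}
```

Usage: {"type": "protein", "sequence": seq, "msa": make_msa(a3m_text)}.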

MSA Requirements:

  • For each protein sequence, either the msa or the paired_msa field is required. To supply an MSA with 0 hits, provide an msa or paired_msa containing only the query sequence.

  • For each RNA sequence, the msa field is required. To supply an MSA with 0 hits, provide an msa containing only the query sequence.

  • The first sequence in the msa or paired_msa must exactly match the input protein or RNA sequence.
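The two MSA requirements above can be met and checked with a few lines of Python. In this sketch, query_only_a3m and first_sequence are hypothetical helpers: the first builds an A3M alignment containing only the query (the 0-hit case), and the second extracts the first sequence of an A3M string so it can be compared with the molecule's sequence before sending.

```python
def query_only_a3m(sequence: str) -> str:
    """An A3M alignment containing only the query, i.e. an MSA with 0 hits."""
    return f">query\n{sequence}"


def first_sequence(a3m: str) -> str:
    """Return the first sequence in an A3M string, joining lines if it is wrapped."""
    parts = []
    seen_header = False
    for line in a3m.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # the second header ends the first record
            seen_header = True
        elif seen_header and line:
            parts.append(line)
    return "".join(parts)
```

Usage: assert first_sequence(alignment) == sequence before building the request.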

Response Structure

The response contains:

  • request_id: Echo of the request ID

  • outputs: List with a single output object containing:

    • input_id: Echo of the input ID

    • structures_with_scores: List of predicted structures in ranked order (best first)

      • structure: The predicted structure in CIF or PDB format (as a string).

      • format: The format, cif or pdb.

      • confidence_score: The sample ranking score (sample_ranking_score), used as the overall confidence score.

      • complex_plddt_score: Average pLDDT score for the complex.

      • complex_pde_score: Average PDE score for the complex.

      • ptm_score: Predicted TM-score (pTM) for the complex.

      • iptm_score: Interface predicted TM-score (ipTM) for the complex.

Additional Examples

Protein-Only Structure Prediction

For predicting a simple protein structure without ligands:

data = {
    "inputs": [
        {
            "input_id": "my_protein",
            "molecules": [
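                {
                    "type": "protein",
                    "sequence": "MKTVRQERLKSIVR",
                    "msa": {
                        "uniref90": {
                            "a3m": {
                                # Adjacent string literals are concatenated without the
                                # indentation a triple-quoted string would embed, so the
                                # first alignment row matches "sequence" exactly.
                                "alignment": (
                                    ">query\n"
                                    "MKTVRQERLKSIVR\n"
                                    ">hit1\n"
                                    "MKTVRQERLKSIVR\n"
                                    ">hit2\n"
                                    "MKTVR-ERLKSIVR"
                                ),
                                "format": "a3m"
                            }
                        }
                    }
                }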
            ]
        }
    ]
}

Protein-DNA Complex

The following request models a protein-DNA complex. For double-stranded DNA (dsDNA), enter both complementary strands as separate molecules.

data = {
    "inputs": [
        {
            "input_id": "protein_dna_complex",
            "molecules": [
                {
                    "type": "protein",
                    "id": "A",
                    "sequence": "MKTVRQERLKSIVR",
                    "msa": {
                        # ... MSA content ...
                    }
                },
                {
                    "type": "dna",
                    "id": "B",
                    "sequence": "ATCGATCG"
                },
                {
                    "type": "dna",
                    "id": "C",
                    # Reverse complement of chain B, written 5'->3'
                    "sequence": "CGATCGAT"
                }
            ]
        }
    ]
}
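Assuming that, as is conventional for nucleic-acid sequences, each strand is written 5'→3', the partner strand of a duplex is the reverse complement of the first. The sketch below generates it; reverse_complement and dsdna are hypothetical helpers, and the chain IDs "B" and "C" simply mirror the example above.

```python
# Maps each DNA base to its Watson-Crick partner (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence, assuming both strands are written 5'->3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def dsdna(strand: str, ids=("B", "C")) -> list:
    """Two DNA molecules for a duplex: the given strand and its partner strand."""
    return [
        {"type": "dna", "id": ids[0], "sequence": strand},
        {"type": "dna", "id": ids[1], "sequence": reverse_complement(strand)},
    ]
```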

Protein-Ligand with SMILES

For modeling protein-ligand interactions using SMILES:

data = {
    "inputs": [
        {
            "input_id": "protein_ligand",
            "molecules": [
                {
                    "type": "protein",
                    "sequence": "MKTVRQERLKSIVR",
                    "msa": {
                        # ... MSA content ...
                    }
                },
                {
                    "type": "ligand",
                    "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
                }
            ]
        }
    ]
}

Field Reference

The following sections describe the request-level, input-level, molecule-level, and AlignmentFileRecord fields, and the important validation rules.

Request-Level Fields

| Field | Required | Type | Description |
|---|---|---|---|
| request_id | No | string | Identifier for the entire request (max 128 characters) |
| inputs | Yes | list | Must contain exactly 1 input specification |

Input-Level Fields

| Field | Required | Type | Default | Description |
|---|---|---|---|---|
| input_id | No | string | "input_id_0" | Unique identifier for this structure prediction (max 128 characters) |
| molecules | Yes | list | - | List of molecules to predict (minimum 1) |
| diffusion_samples | No | integer | 1 | Number of independent structures to generate (1-5) |
| output_format | No | string | "cif" | Output format: "cif" or "pdb" |

Molecule Fields

| Field | Required | Type | Valid For | Description |
|---|---|---|---|---|
| type | Yes | string | All | Must be "protein", "dna", "rna", or "ligand" |
| sequence | Conditional | string | Protein/DNA/RNA | Amino acid or nucleotide sequence (1-4096 characters); required for proteins, DNA, and RNA |
| msa | Conditional | dict | Protein/RNA | Nested dict: database → format → AlignmentFileRecord. Required for RNA; for proteins, either msa or paired_msa is required |
| paired_msa | Conditional | dict | Protein | Single-chain MSA whose rows are ordered in concert with the MSAs of other chains, for modeling chain-chain interactions |
| ccd_codes | Conditional | string | Ligand | CCD code (1-5 uppercase letters/numbers). Mutually exclusive with smiles |
| smiles | Conditional | string | Ligand | SMILES string. Mutually exclusive with ccd_codes |
| id | No | string or list | All | Chain identifier(s), 1-4 alphanumeric characters each |

AlignmentFileRecord Fields

| Field | Required | Type | Description |
|---|---|---|---|
| alignment | Yes | string | MSA content as a string |
| format | Yes | string | Format type: "a3m" or "csv" (lowercase required) |
| rank | No | integer | Ordering rank for concatenating alignments (default: -1) |

Important Validation Rules

Requests must satisfy the following validation rules:

  • Protein MSAs: The first sequence in the MSA must exactly match the input protein sequence (without gaps)

  • Chain IDs: Must be 1-4 alphanumeric characters. PDB output format only supports single-character IDs

  • MSA Format Names: Must be lowercase ("a3m", not "A3M")

  • Ligand Specification: Must provide either ccd_codes or smiles, not both

  • Sequence Length: Maximum 4096 characters per molecule

  • Database Limit: Maximum 3 MSA databases per protein
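The rules above, together with the MSA requirements listed earlier, can be checked before a request is sent. The validate_molecule function below is a hypothetical client-side sketch, not the NIM's own validator; it assumes "alphanumeric" means ASCII letters and digits, and the server remains the authority on what it accepts.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
_CHAIN_ID = re.compile(r"[A-Za-z0-9]{1,4}")


def validate_molecule(mol: dict, output_format: str = "cif") -> list:
    """Return the rule violations found in one molecule (an empty list if none)."""
    mtype = mol.get("type")
    if mtype not in ("protein", "dna", "rna", "ligand"):
        return [f"unknown molecule type {mtype!r}"]
    errors = []

    # Chain IDs: 1-4 alphanumeric characters; PDB output needs single characters.
    ids = mol.get("id", [])
    id_list = [ids] if isinstance(ids, str) else ids
    for cid in id_list:
        if not _CHAIN_ID.fullmatch(cid):
            errors.append(f"chain ID {cid!r} must be 1-4 alphanumeric characters")
        elif output_format == "pdb" and len(cid) > 1:
            errors.append(f"chain ID {cid!r} is too long for PDB output")

    # Ligands: exactly one of ccd_codes or smiles.
    if mtype == "ligand":
        if ("ccd_codes" in mol) == ("smiles" in mol):
            errors.append("ligand needs exactly one of ccd_codes or smiles")
        return errors

    # Polymers: sequence length limit.
    if not 1 <= len(mol.get("sequence", "")) <= 4096:
        errors.append("sequence must be 1-4096 characters")

    # MSA presence: proteins need msa or paired_msa; RNA needs msa.
    if mtype == "protein" and not (mol.get("msa") or mol.get("paired_msa")):
        errors.append("protein needs msa or paired_msa")
    if mtype == "rna" and not mol.get("msa"):
        errors.append("RNA needs msa")

    # MSA databases: at most 3 per protein; format names must be lowercase.
    for field in ("msa", "paired_msa"):
        databases = mol.get(field) or {}
        if mtype == "protein" and len(databases) > 3:
            errors.append(f"{field}: at most 3 MSA databases per protein")
        for db, records in databases.items():
            for record in records.values():
                if record.get("format") not in ("a3m", "csv"):
                    errors.append(f"{field}[{db!r}]: format must be lowercase 'a3m' or 'csv'")
    return errors
```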

MSA Format Requirements

Every MSA you supply (msa or paired_msa) must meet these criteria:

  • The first sequence in the MSA must exactly match the input protein or RNA sequence

  • Gaps are represented by - characters

  • In A3M format, lowercase letters represent insertions relative to the query sequence

  • Both A3M and CSV formats are supported

A3M Format Example:

>query
MKTVRQERLKSIVR
>hit1
MKTVRQERLKSIVR
>hit2
MKTVR-ERLKSIVR
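A short sketch of reading A3M text in Python, following the conventions above: gaps are "-", and lowercase letters are insertions relative to the query. parse_a3m and aligned_to_query are hypothetical helpers; removing lowercase characters is what makes each row line up with the query positions.

```python
def parse_a3m(text: str) -> list:
    """Parse A3M text into (header, sequence) pairs, joining wrapped sequence lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:  # ignore anything before the first header
            records[-1][1] += line
    return [(header, seq) for header, seq in records]


def aligned_to_query(seq: str) -> str:
    """Drop lowercase insertion characters so the row lines up with the query."""
    return "".join(c for c in seq if not c.islower())
```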

CSV Format Example:

key,sequence
-1,MKTVRQERLKSIVR
-1,MKTVRQERLKSIVR
-1,MKTVR-ERLKSIVR
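The CSV example above can be produced from a list of aligned sequences with Python's csv module. This hypothetical to_csv_msa helper mirrors the example by giving every row the key -1; the meaning of other key values is not described in this section, so the sketch does not use them.

```python
import csv
import io


def to_csv_msa(sequences: list, key: int = -1) -> str:
    """Build CSV MSA text with a key,sequence header and one row per sequence."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["key", "sequence"])
    for seq in sequences:
        writer.writerow([key, seq])
    return buf.getvalue()
```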