Template Processing#

OpenFold3 NIM supports template-guided structure prediction using pre-existing protein structures in CIF format. This page explains how templates are processed and used to guide predictions.

Overview#

Template-guided prediction lets you use existing structures (experimental or predicted) to improve the accuracy of your predictions. The OpenFold3 NIM uses a simplified CIF direct mode that processes template structures automatically, with no pre-computed alignments required.

How Template Processing Works#

When you provide structural templates, the NIM automatically performs the following steps:

Chain Extraction#

The system parses each CIF file and extracts all protein chains along with their amino acid sequences.
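
To make the extraction step concrete, here is a minimal sketch that reads chain sequences from the `_atom_site` loop of an mmCIF string using only the Python standard library. It is not the NIM's parser: the function name `extract_protein_chains` is hypothetical, and the sketch assumes whitespace-separated fields with no quoted values, standard residues only, and `label_asym_id` as the chain identifier. Residues without coordinates are not recovered. For real files, a dedicated mmCIF library is the safer choice.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def extract_protein_chains(cif_text):
    """Return {chain_id: sequence} built from ATOM records in the _atom_site loop."""
    columns = []   # _atom_site column names, in file order
    residues = {}  # chain_id -> {seq_id: one-letter code}, insertion-ordered
    in_atom_site = False
    for raw_line in cif_text.splitlines():
        line = raw_line.strip()
        if line.startswith("_atom_site."):
            columns.append(line.split(".", 1)[1])
            in_atom_site = True
            continue
        if not in_atom_site:
            continue
        if not line or line.startswith(("#", "loop_", "_", "data_")):
            break  # the _atom_site loop has ended
        fields = line.split()
        if len(fields) != len(columns):
            continue  # skip rows this simple splitter cannot handle
        row = dict(zip(columns, fields))
        code = THREE_TO_ONE.get(row.get("label_comp_id"))
        if row.get("group_PDB") != "ATOM" or code is None:
            continue  # ignore waters, ligands, and non-standard residues
        chain = residues.setdefault(row["label_asym_id"], {})
        chain[row["label_seq_id"]] = code  # one entry per residue
    return {chain_id: "".join(seq.values()) for chain_id, seq in residues.items()}


# Tiny illustrative input (made-up coordinates, not a real structure).
DEMO_CIF = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N MET A 1 0.0 0.0 0.0
ATOM 2 CA MET A 1 1.0 0.0 0.0
ATOM 3 CA LYS A 2 2.0 0.0 0.0
ATOM 4 CA GLY B 1 3.0 0.0 0.0
HETATM 5 O HOH C . 4.0 0.0 0.0
#
"""

chains = extract_protein_chains(DEMO_CIF)
print(chains)
```

The water record is dropped because it is a HETATM row with a non-amino-acid residue name, so only chains A and B appear in the result.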

Sequence Alignment#

Each template chain is aligned to your query sequence. The alignment identifies which regions of the template correspond to your query protein.
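
The page does not name the alignment algorithm the NIM uses. As an illustration of what a pairwise alignment produces, the sketch below implements a basic Needleman-Wunsch global alignment with a simple scoring scheme. The function name `global_align` and the match, mismatch, and gap values are assumptions chosen for readability, not the NIM's settings.

```python
def global_align(query, template, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment.

    Returns (aligned_query, aligned_template), two equal-length strings
    in which '-' marks a gap.
    """
    n, m = len(query), len(template)
    # score[i][j] is the best score for aligning query[:i] with template[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in the template
                score[i][j - 1] + gap,      # gap in the query
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_query, aligned_template = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if query[i - 1] == template[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned_query.append(query[i - 1])
            aligned_template.append(template[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_query.append(query[i - 1])
            aligned_template.append("-")
            i -= 1
        else:
            aligned_query.append("-")
            aligned_template.append(template[j - 1])
            j -= 1
    return "".join(reversed(aligned_query)), "".join(reversed(aligned_template))


aligned_q, aligned_t = global_align("MKTAYIAK", "MKTAIAK")
print(aligned_q)
print(aligned_t)
```

In this example the template lacks the query's tyrosine, so the alignment places a gap in the template at that position. Query positions that face a gap have no corresponding template residue.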

Chain Scoring#

Each template chain is scored using the formula:

score = sequence_identity × coverage

The variables are:

  • sequence_identity: Fraction of identical residues between template and query, expressed on a 0 to 1 scale so that the score is comparable with the default threshold of 0.1

  • coverage: Fraction of the query sequence covered by the alignment
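
As a worked example of the formula above, the sketch below computes both terms from a pair of gapped, equal-length aligned strings. The exact denominators the NIM uses are not stated on this page. This sketch assumes identity is measured over aligned (non-gap) columns and coverage over the full query length. The function name `score_alignment` is hypothetical.

```python
def score_alignment(aligned_query, aligned_template):
    """Return (sequence_identity, coverage, score) for two gapped aligned strings.

    Assumed definitions (not confirmed by the source):
      sequence_identity = identical columns / columns where both have a residue
      coverage          = columns where both have a residue / query length
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned strings must have the same length")

    query_length = sum(1 for residue in aligned_query if residue != "-")
    aligned = 0
    identical = 0
    for q, t in zip(aligned_query, aligned_template):
        if q != "-" and t != "-":
            aligned += 1
            if q == t:
                identical += 1

    if aligned == 0 or query_length == 0:
        return 0.0, 0.0, 0.0

    identity = identical / aligned
    coverage = aligned / query_length
    return identity, coverage, identity * coverage


# Query of 10 residues; the template covers the first 6 with one mismatch.
identity, coverage, score = score_alignment("ACDEFGHIKL", "ACDQFG----")
print(f"identity={identity:.3f} coverage={coverage:.3f} score={score:.3f}")
```

Here five of the six aligned positions match (identity about 0.833) and six of ten query residues are covered (coverage 0.6), giving a score of about 0.5, well above the default threshold of 0.1.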

Template Selection#

For each CIF file:

  • If a chain_id is specified, that specific chain is validated against the minimum score threshold

  • If no chain_id is specified, the chain with the highest score is automatically selected

  • A chain is used only if its score meets the minimum threshold (default: 0.1, configurable via CIF_DIRECT_MIN_SCORE)

  • If no chain meets the threshold, the template is excluded
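
The selection rules above can be sketched as a small function over per-chain scores. This is an illustration, not the NIM's code: the name `select_template_chain` is hypothetical, the threshold comparison is assumed to be inclusive, and a requested chain that fails validation is assumed to cause the template to be excluded (the service may instead report an error).

```python
def select_template_chain(chain_scores, chain_id=None, min_score=0.1):
    """Pick the chain to use from one CIF file.

    chain_scores: dict mapping chain identifier -> score (identity x coverage).
    chain_id:     optional chain requested by the user.
    Returns the selected chain identifier, or None if the template is excluded.
    """
    if chain_id is not None:
        # A requested chain is only validated; no other chain is substituted.
        requested = chain_scores.get(chain_id)
        if requested is None or requested < min_score:
            return None
        return chain_id

    if not chain_scores:
        return None

    # No chain requested: take the highest-scoring chain, then apply the threshold.
    best = max(chain_scores, key=chain_scores.get)
    return best if chain_scores[best] >= min_score else None


scores = {"A": 0.05, "B": 0.62, "C": 0.30}
print(select_template_chain(scores))                # best chain overall
print(select_template_chain(scores, chain_id="C"))  # requested chain passes
print(select_template_chain(scores, chain_id="A"))  # requested chain fails
```

Because each file yields at most one chain, using both B and C from the same structure would require supplying the file twice with different chain_id values, or splitting it into separate CIF files.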

Structure Guidance#

Selected templates are used during the diffusion process to guide the model toward structures similar to the provided templates.

Best Practices#

When to Use Templates#

Templates are most effective when:

  • You have structures of homologous proteins (>30% sequence identity)

  • Experimental structures are available for similar protein sequences

  • You want to constrain predictions to a specific conformation

  • High-accuracy predictions from other tools are available

Template Selection#

For best results:

  • High sequence identity: Use templates with >40% identity to your query

  • Good coverage: Templates should cover most of your query sequence

  • Structure quality: Prefer high-resolution experimental structures

  • Multiple templates: Provide 1-4 diverse templates when available

Multi-Chain CIF Files#

When using CIF files containing multiple chains:

  • By default, the system automatically selects the best matching chain based on sequence similarity

  • You can optionally specify which chain to use via the chain_id field in the template

  • Each template file contributes at most one chain

  • Provide multiple CIF files if you want templates from multiple chains

Template Requirements#

Format

  • Templates must be in CIF format (.cif files)

  • PDB format is not currently supported

Content

  • Template files must contain valid protein structure coordinates

  • Standard amino acid residues should be properly annotated

  • Chain identifiers should be present in the CIF file

Limitations

  • Templates are only supported for protein molecules

  • DNA, RNA, and ligand templates are not currently supported

  • Each protein molecule can have multiple templates

Configuration#

The template processing behavior can be customized using environment variables:

CIF_DIRECT_MIN_SCORE

Controls the minimum score threshold for chain selection (default: 0.1). Refer to Template Configuration Options for details.
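
How the NIM itself parses this variable is not described here. As a hedged illustration, the sketch below shows one way to read and validate a value for CIF_DIRECT_MIN_SCORE before launching the service. The helper name `read_min_score` is hypothetical, and the 0 to 1 range check is an assumption that follows from the score being a product of two fractions.

```python
import os


def read_min_score(environ=os.environ, default=0.1):
    """Read CIF_DIRECT_MIN_SCORE from a mapping, falling back to the default."""
    raw = environ.get("CIF_DIRECT_MIN_SCORE")
    if raw is None or raw.strip() == "":
        return default
    value = float(raw)  # raises ValueError for non-numeric input
    if not 0.0 <= value <= 1.0:
        # Assumed range: identity and coverage are both fractions in [0, 1].
        raise ValueError(f"CIF_DIRECT_MIN_SCORE must be in [0, 1], got {value}")
    return value


print(read_min_score({}))                                # falls back to the default
print(read_min_score({"CIF_DIRECT_MIN_SCORE": "0.25"}))  # stricter threshold
```

Raising the threshold excludes weaker matches: under the scoring formula, a chain with 50% identity and 40% coverage scores 0.2, which passes the default of 0.1 but would be excluded at 0.25.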