Template Processing#

OpenFold3 NIM supports template-guided structure prediction using pre-existing protein structures in CIF format. This page explains how templates are processed and used to guide predictions.

Overview#

Template-guided prediction lets you use existing structures (experimental or predicted) to improve the accuracy of your structure predictions. The OpenFold3 NIM uses a simplified CIF direct mode that processes template structures automatically, with no pre-computed alignments required.

How Template Processing Works#

When you provide structural templates, the NIM automatically performs the following steps:

Chain Extraction#

The system parses each CIF file and extracts all protein chains along with their amino acid sequences.
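As a rough illustration of what chain extraction involves, the sketch below reads the `_atom_site` table of a CIF string and builds one sequence per chain from its C-alpha atoms. It is not the NIM's parser: the function name `extract_protein_chains` is hypothetical, and the code assumes whitespace-separated rows without quoted values, a single model, and the presence of the `label_atom_id`, `label_comp_id`, `label_asym_id`, and `label_seq_id` columns. Production code would normally rely on a dedicated mmCIF parsing library.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def extract_protein_chains(cif_text):
    """Return {chain_id: sequence} read from the _atom_site loop.

    Simplified sketch: one residue per C-alpha atom, standard residues only.
    """
    columns = []
    chains = {}
    seen = set()
    in_atom_site = False
    for line in cif_text.splitlines():
        line = line.strip()
        if line.startswith("_atom_site."):
            # Column names of the atom table, in file order.
            columns.append(line.split(".", 1)[1])
            in_atom_site = True
            continue
        if not in_atom_site:
            continue
        if not line or line.startswith(("#", "loop_", "_")):
            break  # end of the atom_site table
        fields = dict(zip(columns, line.split()))
        if fields.get("label_atom_id") != "CA":
            continue
        residue = THREE_TO_ONE.get(fields.get("label_comp_id"))
        if residue is None:
            continue  # skip ligands, ions, and non-standard residues
        chain = fields["label_asym_id"]
        key = (chain, fields["label_seq_id"])
        if key not in seen:  # count each residue once (alternate conformations)
            seen.add(key)
            chains.setdefault(chain, []).append(residue)
    return {chain: "".join(seq) for chain, seq in chains.items()}


# Tiny hand-written example, not a real structure.
SAMPLE = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
ATOM N  MET A 1
ATOM CA MET A 1
ATOM CA LYS A 2
ATOM CA GLY B 1
HETATM CA CA C .
#
"""

print(extract_protein_chains(SAMPLE))
```

In this sample the calcium ion in chain C is ignored because its residue name is not a standard amino acid, so only chains A and B are returned.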

Sequence Alignment#

Each template chain is aligned to your query sequence. The alignment identifies which regions of the template correspond to your query protein.
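The page does not say which alignment algorithm the NIM uses. As one possible illustration, the sketch below implements a textbook Needleman-Wunsch global alignment with a simple match/mismatch/gap scheme. The function name `global_align` and the scoring values are assumptions chosen for readability, not the NIM's actual settings.

```python
def global_align(query, template, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns two gapped strings."""
    n, m = len(query), len(template)
    # score[i][j] = best score for aligning query[:i] with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # pair two residues
                score[i - 1][j] + gap,      # gap in the template
                score[i][j - 1] + gap,      # gap in the query
            )

    # Trace back from the bottom-right corner to recover the alignment.
    aligned_q, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if query[i - 1] == template[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                aligned_q.append(query[i - 1])
                aligned_t.append(template[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1
    return "".join(reversed(aligned_q)), "".join(reversed(aligned_t))


aligned_query, aligned_template = global_align("MKTAYIAK", "MKTAIAK")
print(aligned_query)
print(aligned_template)
```

Columns where both strings hold a residue are the regions of the template that correspond to the query. Columns with `-` in the template row mark query residues the template does not cover.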

Chain Scoring#

Each template chain is scored using the formula:

score = sequence_identity × coverage

The variables are:

sequence_identity: Percentage of identical residues between template and query
coverage: Fraction of the query sequence covered by the alignment
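The formula can be sketched in a few lines. The example below assumes that both quantities are expressed as fractions between 0 and 1 (so the product is directly comparable to the 0.1 default threshold) and that identity is measured over aligned residue pairs. Both are assumptions, since the page does not spell out the exact denominators, and `score_alignment` is a hypothetical helper name.

```python
def score_alignment(aligned_query, aligned_template):
    """Return (sequence_identity, coverage, score) for a gapped alignment.

    Both inputs are equal-length strings that use '-' for gaps.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    query_len = sum(1 for q in aligned_query if q != "-")
    # Columns where a query residue is paired with a template residue.
    pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if query_len == 0 or not pairs:
        return 0.0, 0.0, 0.0
    identical = sum(1 for q, t in pairs if q == t)
    sequence_identity = identical / len(pairs)  # assumed fraction, 0 to 1
    coverage = len(pairs) / query_len           # fraction of query covered
    return sequence_identity, coverage, sequence_identity * coverage


identity, coverage, score = score_alignment("MKTAYIAK", "MKTA-IAK")
print(identity, coverage, score)
```

Here seven of the eight query residues are paired with a template residue and all seven pairs are identical, so identity is 1.0, coverage is 0.875, and the score is 0.875.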

Template Selection#

For each CIF file:

If a chain_id is specified, that specific chain is validated against the minimum score threshold
If no chain_id is specified, the chain with the highest score is automatically selected
In both cases, the chosen chain must score above the minimum threshold (default: 0.1, configurable via CIF_DIRECT_MIN_SCORE)
If no chain meets the threshold, the template is excluded
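The selection rules above can be sketched as a small function operating on per-chain scores. The name `select_chain` and its signature are hypothetical. The handling of a `chain_id` that is absent from the file, and of a score exactly equal to the threshold, are assumptions that the page does not settle.

```python
import os

DEFAULT_MIN_SCORE = 0.1


def select_chain(chain_scores, chain_id=None, min_score=None):
    """Pick at most one chain from a CIF file, or None if it is excluded.

    chain_scores: {chain_id: score} for every protein chain in the file.
    """
    if min_score is None:
        # Fall back to the environment variable, then to the documented default.
        min_score = float(os.environ.get("CIF_DIRECT_MIN_SCORE", DEFAULT_MIN_SCORE))

    if chain_id is not None:
        # An explicit chain is validated, never swapped for a better one.
        candidate = chain_id if chain_id in chain_scores else None
    elif chain_scores:
        # No chain requested: take the highest-scoring chain.
        candidate = max(chain_scores, key=chain_scores.get)
    else:
        candidate = None

    # Assumption: a score below the threshold excludes the template.
    if candidate is None or chain_scores[candidate] < min_score:
        return None
    return candidate


scores = {"A": 0.05, "B": 0.60}
print(select_chain(scores, min_score=0.1))                # best chain
print(select_chain(scores, chain_id="A", min_score=0.1))  # fails validation
```

With these scores, automatic selection picks chain B, while explicitly requesting chain A excludes the template because A falls below the threshold.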

Structure Guidance#

Selected templates are used during the diffusion process to guide the model toward structures similar to the provided templates.

Best Practices#

When to Use Templates#

Templates are most effective when:

You have structures of homologous proteins (>30% sequence identity)
Experimental structures are available for similar protein sequences
You want to constrain predictions to a specific conformation
High-accuracy predictions from other tools are available

Template Selection#

For best results:

High sequence identity: Use templates with >40% identity to your query
Good coverage: Templates should cover most of your query sequence
Structure quality: Prefer high-resolution experimental structures
Multiple templates: Provide 1-4 diverse templates when available

Multi-Chain CIF Files#

When using CIF files containing multiple chains:

By default, the system automatically selects the best matching chain based on sequence similarity
You can optionally specify which chain to use via the chain_id field in the template
Each template file contributes at most one chain
Provide multiple CIF files if you want templates from multiple chains

Template Requirements#

Format

Templates must be in CIF format (.cif files)
PDB format is not currently supported

Content

Template files must contain valid protein structure coordinates
Standard amino acid residues should be properly annotated
Chain identifiers should be present in the CIF file

Limitations

Templates are only supported for protein molecules
DNA, RNA, and ligand templates are not currently supported
Each protein molecule can have multiple templates

Configuration#

The template processing behavior can be customized using environment variables:

CIF_DIRECT_MIN_SCORE

Controls the minimum score threshold for chain selection (default: 0.1). Refer to Template Configuration Options for details.
