Template Processing#
OpenFold3 NIM supports template-guided structure prediction using pre-existing protein structures in CIF format. This page explains how templates are processed and used to guide predictions.
Overview#
Template-guided prediction allows you to leverage other structures (experimental or predicted) to improve the accuracy of your structure predictions. The OpenFold3 NIM uses a simplified CIF direct mode that automatically processes template structures without requiring pre-computed alignments.
How Template Processing Works#
When you provide structural templates, the NIM automatically performs the following steps:
Chain Extraction#
The system parses each CIF file and extracts all protein chains along with their amino acid sequences.
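The extraction step can be pictured with a minimal sketch. It is not the NIM's parser: the function name `extract_protein_chains` is hypothetical, it reads only the `_atom_site` loop, it assumes whitespace-separated rows, and it counts one residue per C-alpha atom of a standard amino acid, so unresolved residues are simply absent from the sequence.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def extract_protein_chains(cif_text):
    """Return {chain_id: sequence} built from the _atom_site loop (hypothetical helper)."""
    columns = []   # column names of the _atom_site loop, in file order
    chains = {}    # chain id -> one-letter sequence
    seen = set()   # (chain, residue number) pairs already counted
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site."):
            columns.append(line.split(".", 1)[1])
            continue
        if not columns:
            continue  # the _atom_site loop has not started yet
        if not line or line[0] in "#_" or line.startswith("loop_"):
            break  # the _atom_site loop has ended
        fields = line.split()
        if len(fields) != len(columns):
            continue  # skip rows this simple tokenizer cannot handle
        row = dict(zip(columns, fields))
        # Keep one C-alpha atom per residue of a standard amino acid.
        if row.get("group_PDB") != "ATOM" or row.get("label_atom_id") != "CA":
            continue
        residue = THREE_TO_ONE.get(row.get("label_comp_id"))
        if residue is None:
            continue
        chain = row["label_asym_id"]
        key = (chain, row["label_seq_id"])
        if key in seen:
            continue  # alternate location or repeated model
        seen.add(key)
        chains[chain] = chains.get(chain, "") + residue
    return chains


demo_cif = """
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N MET A 1 0.0 0.0 0.0
ATOM 2 CA MET A 1 1.0 0.0 0.0
ATOM 3 CA LYS A 2 2.0 0.0 0.0
ATOM 4 CA GLY B 1 3.0 0.0 0.0
HETATM 5 O HOH C . 4.0 0.0 0.0
#
"""
demo_chains = extract_protein_chains(demo_cif)
```

In this toy file the water molecule in chain C is ignored, which mirrors the statement that only protein chains are extracted.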
Sequence Alignment#
Each template chain is aligned to your query sequence using sequence alignment algorithms. This identifies which regions of the template correspond to your query protein.
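The documentation does not name the alignment algorithm. As an illustration only, the sketch below uses a textbook Needleman-Wunsch global alignment with a flat scoring scheme (match +1, mismatch -1, gap -1). The function name and the scoring values are assumptions, not the NIM's actual settings.

```python
def global_align(query, template, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (aligned_query, aligned_template)."""
    n, m = len(query), len(template)
    # score[i][j] is the best score for aligning query[:i] with template[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if query[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in the template
                score[i][j - 1] + gap,       # gap in the query
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_q, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if query[i - 1] == template[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            aligned_q.append(query[i - 1])
            aligned_t.append(template[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1
    return "".join(reversed(aligned_q)), "".join(reversed(aligned_t))


aligned_query, aligned_template = global_align("ACDEF", "ACEF")
```

Columns where both strings hold a residue are the regions of the template that correspond to the query. A `-` marks a position that has no counterpart in the other sequence.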
Chain Scoring#
Each template chain is scored using the formula:
score = sequence_identity × coverage
The variables are:
sequence_identity: Percentage of identical residues between template and query
coverage: Fraction of the query sequence covered by the alignment
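As a worked example, a 100-residue query with 80 aligned residues, 40 of them identical, gives an identity of 0.5, a coverage of 0.8, and a score of 0.4. The sketch below computes the same quantities from a gapped alignment. It makes two assumptions that the text above does not state: identity is expressed as a fraction in [0, 1] (so that the product is comparable to a threshold such as 0.1), and identity is measured over aligned residue pairs only. The function name `chain_score` is hypothetical.

```python
def chain_score(aligned_query, aligned_template):
    """Return (sequence_identity, coverage, score) for a gapped pairwise alignment."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    query_length = sum(1 for q in aligned_query if q != "-")
    if query_length == 0:
        raise ValueError("query sequence is empty")
    # Columns where both the query and the template have a residue.
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    matches = sum(1 for q, t in pairs if q == t)
    # Assumption: identity is a fraction of aligned pairs, not a percentage.
    sequence_identity = matches / len(pairs) if pairs else 0.0
    coverage = len(pairs) / query_length
    return sequence_identity, coverage, sequence_identity * coverage


identity, coverage, score = chain_score("ACDEF", "AC-EF")
```

Under these definitions the product reduces to the number of identical residues divided by the query length, so a short but perfect match and a long but weak match can receive similar scores.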
Template Selection#
For each CIF file:
If a chain_id is specified, that specific chain is validated against the minimum score threshold
If no chain_id is specified, the chain with the highest score is automatically selected
In either case, the chosen chain must score above the minimum threshold (default: 0.1, configurable via CIF_DIRECT_MIN_SCORE)
If no chain meets the threshold, the template is excluded
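The selection rules for a single CIF file can be sketched as follows. The function `select_template_chain` and its arguments are hypothetical, and two behaviors are assumptions: a score exactly equal to the threshold is accepted, and a requested chain_id that is missing from the file excludes the template.

```python
def select_template_chain(chain_scores, chain_id=None, min_score=0.1):
    """Return the chain to use from one CIF file, or None to exclude the template.

    chain_scores maps each chain identifier to its identity x coverage score.
    """
    if chain_id is not None:
        # An explicit chain is only validated; no other chain is considered.
        score = chain_scores.get(chain_id)
        if score is None:
            return None  # assumption: an unknown chain excludes the template
        return chain_id if score >= min_score else None
    if not chain_scores:
        return None
    # No chain requested: take the best-scoring chain, then apply the threshold.
    best = max(chain_scores, key=chain_scores.get)
    return best if chain_scores[best] >= min_score else None


scores = {"A": 0.62, "B": 0.05, "C": 0.31}
auto_choice = select_template_chain(scores)               # best chain overall
explicit_choice = select_template_chain(scores, "C")      # validated as requested
rejected_choice = select_template_chain(scores, "B")      # below the threshold
```

Note that an explicit chain_id is never replaced by a better-scoring chain in this sketch: if the requested chain fails the threshold, the file contributes no template.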
Structure Guidance#
Selected templates are used during the diffusion process to guide the model toward structures similar to the provided templates.
Best Practices#
When to Use Templates#
Templates are most effective when:
You have structures of homologous proteins (>30% sequence identity)
Experimental structures are available for similar protein sequences
You want to constrain predictions to a specific conformation
High-accuracy predictions from other tools are available
Template Selection#
For best results:
High sequence identity: Use templates with >40% identity to your query
Good coverage: Templates should cover most of your query sequence
Structure quality: Prefer high-resolution experimental structures
Multiple templates: Provide 1-4 diverse templates when available
Multi-Chain CIF Files#
When using CIF files containing multiple chains:
By default, the system automatically selects the best matching chain based on sequence similarity
You can optionally specify which chain to use via the chain_id field in the template
Each template file contributes at most one chain
Provide multiple CIF files if you want templates from multiple chains
Template Requirements#
Format
Templates must be in CIF format (.cif files)
PDB format is not currently supported
Content
Template files must contain valid protein structure coordinates
Standard amino acid residues should be properly annotated
Chain identifiers should be present in the CIF file
Limitations
Templates are only supported for protein molecules
DNA, RNA, and ligand templates are not currently supported
Each protein molecule can have multiple templates
Configuration#
The template processing behavior can be customized using environment variables:
CIF_DIRECT_MIN_SCORE
Controls the minimum score threshold for chain selection (default: 0.1). Refer to Template Configuration Options for details.
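For local experiments that mirror this threshold outside the NIM, the variable could be read as in the sketch below. The helper `read_min_score` is hypothetical, and the restriction to the range 0 to 1 is an assumption based on the score being a product of two fractions. How the NIM itself parses and validates the variable is not described here.

```python
import os


def read_min_score(environ=None, default=0.1):
    """Read CIF_DIRECT_MIN_SCORE from a mapping, falling back to the documented default."""
    environ = os.environ if environ is None else environ
    raw = environ.get("CIF_DIRECT_MIN_SCORE")
    if raw is None or not raw.strip():
        return default
    value = float(raw)  # raises ValueError for non-numeric input
    # Assumption: scores are products of fractions, so the threshold lies in [0, 1].
    if not 0.0 <= value <= 1.0:
        raise ValueError("CIF_DIRECT_MIN_SCORE must be between 0 and 1")
    return value


default_threshold = read_min_score({})
stricter_threshold = read_min_score({"CIF_DIRECT_MIN_SCORE": "0.25"})
```

Raising the threshold excludes weaker templates, while lowering it admits templates that match the query less closely.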