Template Processing in OpenFold2#
OpenFold2 NIM supports multiple approaches for incorporating structural templates into protein structure prediction. This document provides technical details about how different template input formats are processed to generate template features.
Overview#
Template processing in OpenFold2 follows three main approaches:
No template input: Template atomic coordinates are not used for structure prediction
Database-driven template processing: Uses HHR strings containing template search results to access a curated structure database
Direct template processing: Uses mmCIF format strings provided directly in the request
This document focuses on the technical differences between database-driven and direct template processing, which represent fundamentally different computational pipelines.
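As a rough illustration of the three approaches, the sketch below classifies a request body by the fields it carries. The helper `template_mode` and its return labels are hypothetical and not part of the NIM API. How the service treats a request that supplies both `templates` and `explicit_templates` is not stated here, so the sketch only reports that case rather than resolving it.

```python
from typing import Any, Mapping


def template_mode(request: Mapping[str, Any]) -> str:
    """Return which template approach a request body appears to use.

    Hypothetical helper: the labels, and the handling of a request that
    carries both fields, are assumptions for illustration only.
    """
    if not request.get("use_templates"):
        return "none"
    has_hhr = bool(request.get("templates"))
    has_mmcif = bool(request.get("explicit_templates"))
    if has_hhr and has_mmcif:
        return "both"
    if has_hhr:
        return "database-driven"
    if has_mmcif:
        return "direct"
    # Templates enabled but no template data supplied.
    return "none"
```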
Database-Driven Template Processing#
This section helps you understand the core concepts of database-driven template processing.
Required Fields#
use_templates: true (required to enable template processing)
templates: HHR (HHSearch) format strings containing template search results
Processing Pipeline#
The database-driven template process uses the following pipeline:
Parse HHR strings: Extract template hits with alignment information and statistical confidence scores
Database lookup: Use template identifiers from HHR to locate corresponding structural data in the pre-installed database
Sequence alignment: Align the query sequence to each template structure, validating the HHR alignment and realigning when needed
Feature integration: Combine structural data from the database with alignment and confidence information from the HHR results
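The first pipeline step can be sketched as follows. This is a simplified illustration, not the parser the NIM uses. It assumes the usual HH-suite alignment-block layout (a `No <n>` line, a `>` header line, then a `Probab=... E-value=...` statistics line). The names `HhrHit` and `parse_hhr_hits` and the sample hits are made up for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class HhrHit:
    index: int
    name: str           # template identifier, used later for the database lookup
    probability: float
    e_value: float
    score: float
    aligned_cols: int


def parse_hhr_hits(hhr_text: str) -> list[HhrHit]:
    """Collect per-hit statistics from the alignment blocks of an HHR string."""
    lines = hhr_text.splitlines()
    hits: list[HhrHit] = []
    i = 0
    while i < len(lines):
        header = re.match(r"^No (\d+)\s*$", lines[i])
        if header and i + 2 < len(lines) and lines[i + 1].startswith(">"):
            name = lines[i + 1][1:].split()[0]
            # The statistics line is a run of key=value pairs.
            stats = dict(re.findall(r"(\w[\w-]*)=(\S+)", lines[i + 2]))
            hits.append(
                HhrHit(
                    index=int(header.group(1)),
                    name=name,
                    probability=float(stats["Probab"]),
                    e_value=float(stats["E-value"]),
                    score=float(stats["Score"]),
                    aligned_cols=int(stats["Aligned_cols"]),
                )
            )
            i += 3
        else:
            i += 1
    return hits


# Synthetic HHR fragment with invented hits, for illustration only.
SAMPLE_HHR = """\
No 1
>1abc_A Example protein; hypothetical entry
Probab=99.90  E-value=1.2e-25  Score=150.20  Aligned_cols=100  Identities=35%  Similarity=0.512  Sum_probs=92.1

No 2
>2xyz_B Another example
Probab=45.10  E-value=0.31  Score=22.50  Aligned_cols=40  Identities=18%  Similarity=0.101  Sum_probs=20.3
"""

hits = parse_hhr_hits(SAMPLE_HHR)
```

Each hit's `name` is the kind of identifier the database lookup step would use, and `probability` and `e_value` are the statistical confidence values that direct processing does not have.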
Template Feature Composition#
The resulting template features integrate multiple data sources:
Structural information from the database (atomic coordinates, residue types)
HHR-derived alignment with quality validation
Statistical confidence from HHR search results
Robust error handling with fallback mechanisms
Use Cases#
Production environments with curated template databases, such as PDB70
Leveraging pre-computed template search results from HHSearch/HHblits
When statistical confidence and alignment quality are critical
Example Request Structure#
{
  "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
  "use_templates": true,
  "templates": {
    "pdb70": {
      "hhr": {
        "format": "hhr",
        "templates": "HHR_STRING_CONTENT_HERE"
      }
    }
  }
}
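A request body of this shape could be assembled in Python along the following lines. The function name `build_database_request` is hypothetical, and the sketch includes only the fields shown in the example above. Other fields the service may require are out of scope here.

```python
import json


def build_database_request(sequence: str, hhr_text: str, database: str = "pdb70") -> dict:
    """Assemble a database-driven template request body (illustrative only)."""
    return {
        "sequence": sequence,
        "use_templates": True,
        "templates": {
            database: {
                "hhr": {"format": "hhr", "templates": hhr_text},
            }
        },
    }


# Placeholders: only the visible prefix of the example sequence, and a
# stand-in for real HHR content.
payload = build_database_request("GGSKENEISHHAKEIERLQKEIERHKQ", "HHR_STRING_CONTENT_HERE")
body = json.dumps(payload)  # JSON text ready to send as the request body
```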
Direct Template Processing#
This section explains the fundamental ideas behind direct template processing.
Required Fields#
use_templates: true (required to enable template processing)
explicit_templates: mmCIF format strings containing structural template data
Processing Pipeline#
Parse mmCIF strings: Extract structural data directly from the provided mmCIF content
Simple chain selection: Automatically select chains with _atom_site.label_asym_id=A from the template structure for featurization
Basic mapping: Create simple 1:1 residue mapping between query and template sequences
Direct feature extraction: Generate template features using only the structural information
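The chain selection and 1:1 mapping steps can be sketched as below. This is a minimal illustration under stated assumptions, not the NIM's implementation. It reads the `_atom_site` loop with a whitespace splitter, which does not handle quoted values containing spaces. It also truncates the mapping to the shorter of the two sequences, a choice this document does not specify. The names `select_chain_atoms` and `identity_mapping` are hypothetical.

```python
def select_chain_atoms(mmcif_text: str, chain_id: str = "A") -> list[dict[str, str]]:
    """Return _atom_site rows whose label_asym_id equals chain_id."""
    columns: list[str] = []
    selected: list[dict[str, str]] = []
    in_header = False
    in_rows = False
    for raw in mmcif_text.splitlines():
        line = raw.strip()
        if line == "loop_" and not in_rows:
            in_header, columns = True, []
            continue
        if in_header:
            if line.startswith("_atom_site."):
                columns.append(line[len("_atom_site."):])
                continue
            if not columns or not line or line.startswith(("_", "#")):
                in_header, columns = False, []  # a loop for another category
                continue
            in_header, in_rows = False, True  # first data row: fall through
        if in_rows:
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                break  # end of the _atom_site loop
            values = line.split()
            if len(values) != len(columns):
                continue  # row this simple splitter cannot handle
            row = dict(zip(columns, values))
            if row.get("label_asym_id") == chain_id:
                selected.append(row)
    return selected


def identity_mapping(query_len: int, template_len: int) -> dict[int, int]:
    """Map query index i to template index i (assumed truncation to the shorter length)."""
    return {i: i for i in range(min(query_len, template_len))}


# Tiny synthetic mmCIF fragment, for illustration only.
SAMPLE_MMCIF = """\
data_demo
#
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 CA GLY A 1 0.0 0.0 0.0
ATOM 2 CA SER A 2 3.8 0.0 0.0
ATOM 3 CA LYS B 1 9.0 9.0 9.0
#
"""

chain_a = select_chain_atoms(SAMPLE_MMCIF)
mapping = identity_mapping(query_len=5, template_len=len(chain_a))
```

Because the mapping is purely positional, a template whose residue numbering or length differs from the query is not realigned. That is the trade-off against the database-driven pipeline.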
Template Feature Composition#
The resulting template features are structurally-focused:
Structural information: From mmCIF content (atomic coordinates, residue types)
No HHR alignment: Bypasses sophisticated sequence alignment algorithms
Fixed confidence: Template confidence set to maximum value
No realignment: Uses direct sequence mapping without validation
Use Cases#
Custom template structures not available in curated databases
User-provided experimental or computational structures
Rapid prototyping without database infrastructure dependencies
When structural geometry is more important than alignment statistics
Example Request Structure#
{
  "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
  "use_templates": true,
  "explicit_templates": [
    {
      "name": "my_custom_template",
      "format": "mmcif",
      "structure": "MMCIF_STRING_CONTENT_HERE"
    }
  ]
}
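A request body of this shape could be built from one or more mmCIF strings roughly as follows. The function name `build_direct_request` is hypothetical, and the sketch covers only the fields shown in the example above.

```python
import json


def build_direct_request(sequence: str, templates: dict[str, str]) -> dict:
    """Assemble a direct template request body from {name: mmCIF text} (illustrative only)."""
    return {
        "sequence": sequence,
        "use_templates": True,
        "explicit_templates": [
            {"name": name, "format": "mmcif", "structure": mmcif_text}
            for name, mmcif_text in templates.items()
        ],
    }


# Placeholders: only the visible prefix of the example sequence, and a
# stand-in for real mmCIF content.
direct_payload = build_direct_request(
    "GGSKENEISHHAKEIERLQKEIERHKQ",
    {"my_custom_template": "MMCIF_STRING_CONTENT_HERE"},
)
direct_body = json.dumps(direct_payload)
```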
Key Technical Differences#
Aspect | Database-Driven | Direct Processing |
---|---|---|
Data Sources | HHR strings + pre-installed database | mmCIF strings only |
Alignment Method | Sophisticated alignment with validation | Simple index mapping |
Confidence Scoring | Statistical scores from HHSearch | Fixed maximum confidence |
Sequence Validation | Quality thresholds with realignment | No validation |
Dependencies | Pre-installed structure database | Self-contained |
Error Handling | Robust with fallbacks for alignment failures | Basic mmCIF parsing errors |
Processing Complexity | Multi-step with database integration | Direct single-step processing |
Impact on Structure Prediction#
This section highlights the impact different processing types have on structure prediction.
Database-Driven Advantages#
Statistical confidence: HHR search scores enable the model to weight template reliability appropriately
Alignment validation: Advanced alignment ensures sequence-structure consistency
Proven templates: Curated databases provide validated structural templates
Robust mappings: Sophisticated alignment handles sequence variations
Direct Processing Advantages#
Flexibility: Accepts any structural template, including novel or experimental structures
Speed: Bypasses database lookup and complex alignment algorithms
Custom structures: Incorporates user-generated models, NMR structures, or computational predictions
Simplicity: No external dependencies beyond mmCIF parsing
Performance Considerations#
Database-driven processing provides statistical confidence scoring and validated alignments from curated databases
Direct processing provides maximum flexibility and speed but uses simplified confidence weighting
Best Practices#
This section describes the best practices when using database-driven and direct processing.
When to Use Database-Driven Processing#
Production environments with established template databases, such as PDB70
When template search results are available from HHSearch/HHblits
When validated template alignments are needed
When statistical confidence scores should inform how much the model relies on each template
When to Use Direct Processing#
Development and testing with experimental structures
Novel template structures not available in curated databases
Rapid prototyping without database infrastructure setup
Custom computational models or user-provided structures
Technical Implementation Notes#
Both processing approaches produce template features in the same data structure, so either is fully compatible with the OpenFold2 model. They differ in the following ways:
Confidence weighting: Database processing provides statistical confidence scores from HHR search, while direct processing uses fixed maximum confidence
Alignment sophistication: Database processing uses advanced alignment with validation, while direct processing uses simple sequence mapping
Error resilience: Database processing has robust fallbacks for alignment failures, while direct processing has basic mmCIF parsing error handling
Choose between the approaches by deciding whether statistical template confidence and validated alignment matter more for your use case than flexibility and processing speed.