Template Processing in OpenFold2#

OpenFold2 NIM supports multiple approaches for incorporating structural templates into protein structure prediction. This document provides technical details about how different template input formats are processed to generate template features.

Overview#

OpenFold2 handles templates in one of three ways:

  • No template input: Template atomic coordinates are not used for structure prediction

  • Database-driven template processing: Uses HHR strings containing template search results to access a curated structure database

  • Direct template processing: Uses mmCIF format strings provided directly in the request

This document focuses on the technical differences between database-driven and direct template processing, which represent fundamentally different computational pipelines.
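As a rough illustration of the three approaches listed above, the sketch below classifies a request body by the fields it carries. The function name `template_mode` is hypothetical and not part of the NIM. Two behaviors are assumptions rather than documented facts: treating a request with `use_templates` set but no template data as "none", and rejecting a request that sets both template fields.

```python
from typing import Any, Mapping


def template_mode(request: Mapping[str, Any]) -> str:
    """Classify a request body as 'none', 'database', or 'direct'.

    Hypothetical helper for illustration; it is not part of the NIM API.
    """
    if not request.get("use_templates", False):
        return "none"

    has_hhr = bool(request.get("templates"))
    has_mmcif = bool(request.get("explicit_templates"))

    if has_hhr and has_mmcif:
        # Assumption: this document does not say how the service treats a
        # request carrying both fields, so the sketch refuses to guess.
        raise ValueError("both 'templates' and 'explicit_templates' are set")
    if has_hhr:
        return "database"
    if has_mmcif:
        return "direct"
    # Assumption: templates enabled but none supplied is treated as "none".
    return "none"
```

A client could call this before sending a request to catch a missing or duplicated template field early.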

Database-Driven Template Processing#

Database-driven processing combines HHR search results supplied in the request with structures from a pre-installed database.

Required Fields#

  • use_templates: true (required to enable template processing)

  • templates: HHR (HHSearch) format strings containing template search results

Processing Pipeline#

Database-driven template processing uses the following pipeline:

  1. Parse HHR strings: Extract template hits with alignment information and statistical confidence scores

  2. Database lookup: Use template identifiers from HHR to locate corresponding structural data in the pre-installed database

  3. Sequence alignment: Align the query sequence to each template structure, validating the alignment and realigning when needed

  4. Feature integration: Combine structural data from the database with alignment and confidence information from the HHR results
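The first two steps above can be sketched as follows. This is a minimal illustration, not the parser the NIM uses: it reads only the per-hit header line (`>NAME ...`) and the `Probab=... E-value=...` line that follows it in HHsearch output. The `HhrHit` class and `parse_hhr_hits` function are hypothetical names, and splitting the hit name into a PDB ID and chain assumes PDB70-style identifiers such as `1ABC_A`.

```python
import re
from dataclasses import dataclass


@dataclass
class HhrHit:
    name: str           # template identifier as written in the HHR file
    pdb_id: str         # assumed PDB70-style "<pdb id>_<chain>" naming
    chain_id: str
    probability: float  # HHsearch "Probab" value
    e_value: float      # HHsearch "E-value"


# Matches the start of the statistics line that follows each hit header.
_STATS = re.compile(r"Probab=(\S+)\s+E-value=(\S+)")


def parse_hhr_hits(hhr_text: str) -> list[HhrHit]:
    """Extract template identifiers and confidence scores from HHR text."""
    hits: list[HhrHit] = []
    lines = hhr_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if not fields or i + 1 >= len(lines):
            continue
        stats = _STATS.search(lines[i + 1])
        if stats is None:
            continue  # skip hits whose statistics line is missing
        name = fields[0]
        pdb_id, _, chain_id = name.partition("_")
        hits.append(
            HhrHit(
                name=name,
                pdb_id=pdb_id.lower(),
                chain_id=chain_id,
                probability=float(stats.group(1)),
                e_value=float(stats.group(2)),
            )
        )
    return hits
```

The `pdb_id` and `chain_id` fields stand in for the keys a database lookup would need; the alignment blocks that follow each statistics line are ignored here.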

Template Feature Composition#

The resulting template features integrate multiple data sources:

  • Structural information from the database (atomic coordinates, residue types)

  • HHR-derived alignment with quality validation

  • Statistical confidence from HHR search results

  • Robust error handling with fallback mechanisms

Use Cases#

  • Production environments with curated template databases, such as PDB70

  • Leveraging pre-computed template search results from HHSearch/HHblits

  • When statistical confidence and alignment quality are critical

Example Request Structure#

{
    "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
    "use_templates": true,
    "templates": {
        "pdb70": {
            "hhr": {
                "format": "hhr",
                "templates": "HHR_STRING_CONTENT_HERE"
            }
        }
    }
}
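
A request body with this shape can be assembled programmatically. The sketch below mirrors the field names in the example above; the function name `build_database_request` is hypothetical, and the `database` parameter assumes that the `pdb70` key names the template database, which the example suggests but does not state.

```python
import json


def build_database_request(sequence: str, hhr_text: str,
                           database: str = "pdb70") -> str:
    """Return a JSON request body for database-driven template processing.

    Hypothetical helper; field names follow the example request above.
    """
    payload = {
        "sequence": sequence,
        "use_templates": True,
        "templates": {
            database: {
                "hhr": {
                    "format": "hhr",
                    "templates": hhr_text,  # raw HHR file content
                }
            }
        },
    }
    return json.dumps(payload)
```

The HHR text would typically be read from an HHSearch output file before calling the function.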

Direct Template Processing#

Direct processing builds template features from mmCIF structures supplied in the request, without a database lookup.

Required Fields#

  • use_templates: true (required to enable template processing)

  • explicit_templates: mmCIF format strings containing structural template data

Processing Pipeline#

  1. Parse mmCIF strings: Extract structural data directly from the provided mmCIF content

  2. Simple chain selection: Automatically select chains with _atom_site.label_asym_id=A from the template structure for featurization

  3. Basic mapping: Create simple 1:1 residue mapping between query and template sequences

  4. Direct feature extraction: Generate template features using only the structural information
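The chain selection and mapping steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the NIM's implementation: the tiny parser handles only a whitespace-separated `_atom_site` loop without quoted values, and the index-to-index mapping is one reading of "simple 1:1 residue mapping". All function names are hypothetical; a production client would use a full mmCIF parser.

```python
def select_chain_atoms(mmcif_text: str, chain_id: str = "A") -> list[dict[str, str]]:
    """Return atom_site rows whose label_asym_id equals chain_id."""
    columns: list[str] = []
    rows: list[dict[str, str]] = []
    in_atom_site = False
    for raw in mmcif_text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site."):
            columns.append(line.split(".", 1)[1])
            in_atom_site = True
        elif in_atom_site:
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                break  # end of the atom_site loop
            values = line.split()
            if len(values) != len(columns):
                continue  # simplification: skip rows this parser cannot read
            row = dict(zip(columns, values))
            if row.get("label_asym_id") == chain_id:
                rows.append(row)
    return rows


def chain_residues(atoms: list[dict[str, str]]) -> list[str]:
    """Collapse atom rows into an ordered list of residue names."""
    seen: dict[str, str] = {}
    for atom in atoms:
        seen.setdefault(atom.get("label_seq_id", ""), atom.get("label_comp_id", ""))
    return list(seen.values())


def one_to_one_mapping(query_len: int, template_len: int) -> dict[int, int]:
    """Map query index i to template index i (assumed meaning of 1:1 mapping)."""
    return {i: i for i in range(min(query_len, template_len))}
```

Because no alignment is performed, a template whose residue numbering is offset from the query would be mapped incorrectly under this reading, which is why the template should already match the query position by position.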

Template Feature Composition#

The resulting template features are structurally focused:

  • Structural information: From mmCIF content (atomic coordinates, residue types)

  • No HHR alignment: Bypasses sophisticated sequence alignment algorithms

  • Fixed confidence: Template confidence set to maximum value

  • No realignment: Uses direct sequence mapping without validation

Use Cases#

  • Custom template structures not available in curated databases

  • User-provided experimental or computational structures

  • Rapid prototyping without database infrastructure dependencies

  • When structural geometry is more important than alignment statistics

Example Request Structure#

{
    "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
    "use_templates": true,
    "explicit_templates": [
        {
            "name": "my_custom_template",
            "format": "mmcif",
            "structure": "MMCIF_STRING_CONTENT_HERE"
        }
    ]
}
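
A request body with this shape can be built from mmCIF text already in memory. The sketch below mirrors the field names in the example above; the function name `build_direct_request` is hypothetical, and passing more than one template is an assumption based only on `explicit_templates` being a list.

```python
import json


def build_direct_request(sequence: str, templates: dict[str, str]) -> str:
    """Return a JSON request body for direct template processing.

    Hypothetical helper. `templates` maps a template name to its mmCIF text.
    """
    payload = {
        "sequence": sequence,
        "use_templates": True,
        "explicit_templates": [
            {"name": name, "format": "mmcif", "structure": mmcif_text}
            for name, mmcif_text in templates.items()
        ],
    }
    return json.dumps(payload)
```

Since chain selection uses `_atom_site.label_asym_id=A`, the structure of interest should be stored under that chain label before the mmCIF text is passed in.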

Key Technical Differences#

| Aspect                | Database-Driven                              | Direct Processing             |
|-----------------------|----------------------------------------------|-------------------------------|
| Data Sources          | HHR strings + pre-installed database         | mmCIF strings only            |
| Alignment Method      | Sophisticated alignment with validation      | Simple index mapping          |
| Confidence Scoring    | Statistical scores from HHSearch             | Fixed maximum confidence      |
| Sequence Validation   | Quality thresholds with realignment          | No validation                 |
| Dependencies          | Pre-installed structure database             | Self-contained                |
| Error Handling        | Robust with fallbacks for alignment failures | Basic mmCIF parsing errors    |
| Processing Complexity | Multi-step with database integration         | Direct single-step processing |

Impact on Structure Prediction#

The two processing types affect structure prediction in different ways.

Database-Driven Advantages#

  • Statistical confidence: HHR search scores enable the model to weight template reliability appropriately

  • Alignment validation: Advanced alignment ensures sequence-structure consistency

  • Proven templates: Curated databases provide validated structural templates

  • Robust mappings: Sophisticated alignment handles sequence variations

Direct Processing Advantages#

  • Flexibility: Accepts any structural template, including novel or experimental structures

  • Speed: Bypasses database lookup and complex alignment algorithms

  • Custom structures: Incorporates user-generated models, NMR structures, or computational predictions

  • Simplicity: No external dependencies beyond mmCIF parsing

Performance Considerations#

  • Database-driven processing provides statistical confidence scoring and validated alignments from curated databases

  • Direct processing provides maximum flexibility and speed but uses simplified confidence weighting

Best Practices#

Use the following guidelines to choose between database-driven and direct processing.

When to Use Database-Driven Processing#

  • Production environments with established template databases, such as PDB70

  • When template search results are available from HHSearch/HHblits

  • When validated template alignments and statistical confidence scores are important for the prediction

When to Use Direct Processing#

  • Development and testing with experimental structures

  • Novel template structures not available in curated databases

  • Rapid prototyping without database infrastructure setup

  • Custom computational models or user-provided structures

Technical Implementation Notes#

Both processing approaches produce template features with the same data structure, so both are fully compatible with the OpenFold2 model. The feature values differ in the following ways:

  • Confidence weighting: Database processing provides statistical confidence scores from HHR search, while direct processing uses fixed maximum confidence

  • Alignment sophistication: Database processing uses advanced alignment with validation, while direct processing uses simple sequence mapping

  • Error resilience: Database processing has robust fallbacks for alignment failures, while direct processing has basic mmCIF parsing error handling

Choose between the approaches based on whether statistical template confidence and sophisticated alignment matter more to your use case than flexibility and processing speed.