Migrate from HHR-based Templates to Explicit mmCIF Templates#

Purposes of 2.0.0 Changes#

OpenFold2 version 2.0.0 removes the internal structural-template search that relied on HHR-format input, but keeps support for explicitly provided mmCIF template structures. The ability to search a database of structure files moves to the NVIDIA MSA-NIM and will be supported there in future releases, which consolidates the management of sequence and structure databases in the MSA-NIM. This change provides several benefits for OpenFold2:

  • Simpler dependency set: No bundled PDB70 database required

  • Lower startup time & disk footprint: Reduced container size and faster initialization

  • Greater flexibility: Direct control over template structures without database limitations

Removed Features#

The following features have been removed in version 2.0.0:

  • All request fields named templates or sub-keys expecting HHSearch (HHR) data

  • Environment variables and NIM cache content for PDB70 are ignored

  • CLI flags or examples that mentioned use_templates=true with HHR format

  • Database-driven template processing pipeline

Step-by-Step Migration#

Follow the steps below to migrate from HHR-based templates to explicit mmCIF templates.

1. Prepare Your Local mmCIF Template Structure#

This guide assumes you already have mmCIF template files available locally. If you need to convert existing PDB files to mmCIF format, you can use tools like BioPython:

# Using BioPython to convert PDB to mmCIF format
python -c "
from Bio.PDB import PDBParser, MMCIFIO
parser = PDBParser()
structure = parser.get_structure('protein', 'input.pdb')
io = MMCIFIO()
io.set_structure(structure)
io.save('output.cif')
"

2. Update Your JSON Request Format#

Replace the old JSON request format with the new format.

Old format (≤1.2.0):

{
    "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
    "use_templates": true,
    "templates": {
        "pdb70": {
            "hhr": {
                "format": "hhr",
                "templates": "HHR_STRING_CONTENT_HERE"
            }
        }
    }
}

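New format (2.0.0). The // comments below are annotations only; strict JSON does not allow comments, so remove them before sending the request: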

{
    "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
    "use_templates": true,
    "explicit_templates": [
        {
            "name": "2yz6_A",
            "format": "mmcif",  // or "mmcif.gz" for compressed
            "structure": "STRINGIFIED_MMCIF_CONTENT_HERE",
            "source": "pdb",
            "rank": -1  // Default value for OpenFold2 monomer use-case
        }
    ]
}

Note

Both mmcif (uncompressed) and mmcif.gz (gzip-compressed) formats are supported. Use the compressed format for large template files to reduce the request payload size. For detailed information about supported formats, refer to the Template Processing documentation.
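The field mapping above can be applied mechanically to existing request dictionaries. The following is a minimal sketch, not part of the OpenFold2 NIM API: the helper name `migrate_request` is hypothetical, the sequence is a placeholder, and it assumes you already have the mmCIF text of the template you want to use, because HHR content cannot be converted into a structure.

```python
import copy


def migrate_request(old_request, template_name, mmcif_text):
    """Return a copy of a legacy request rewritten for the 2.0.0 schema.

    Hypothetical helper for illustration; not part of the OpenFold2 NIM.
    """
    new_request = copy.deepcopy(old_request)
    # The HHR-based "templates" field is removed in 2.0.0.
    new_request.pop("templates", None)
    new_request["use_templates"] = True
    new_request["explicit_templates"] = [
        {
            "name": template_name,
            "format": "mmcif",
            "structure": mmcif_text,
            "source": "user_provided",
            "rank": -1,  # Default value for the OpenFold2 monomer use-case
        }
    ]
    return new_request


legacy = {
    "sequence": "YOUR_SEQUENCE_HERE",  # placeholder, not a real sequence
    "use_templates": True,
    "templates": {
        "pdb70": {"hhr": {"format": "hhr", "templates": "HHR_STRING_CONTENT_HERE"}}
    },
}
migrated = migrate_request(legacy, "my_template", "STRINGIFIED_MMCIF_CONTENT_HERE")
print(sorted(migrated))
```

The helper works on a deep copy, so the legacy dictionary stays unchanged and can be kept for comparison while you test the new request.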

3. Update Python Code Examples#

Update your Python code to build the new request format. The following complete example submits a request with an explicit mmCIF template:

import requests
import json
import gzip

# Read local mmCIF file content
with open('path/to/your/template.cif', 'r') as f:
    mmcif_content = f.read()

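# Optional: compress for large files. Note that gzip.compress returns bytes,
# which json.dumps cannot serialize directly; refer to the Template Processing
# documentation for how "mmcif.gz" content must be encoded in the request.
# compressed_mmcif = gzip.compress(mmcif_content.encode('utf-8'))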

# Prepare request
data = {
    "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
    "input_id": "example_protein_001",
    "selected_models": [1, 2],
    "alignments": {
        "uniref90": {
            "a3m": {
                "alignment": "YOUR_A3M_ALIGNMENT_HERE",
                "format": "a3m"
            }
        }
    },
    "use_templates": True,
    "explicit_templates": [
        {
            "name": "my_template",
            "format": "mmcif",  # or "mmcif.gz" if using compressed_mmcif
            "structure": mmcif_content,  # or compressed_mmcif for compressed
            "source": "user_provided",
            "rank": -1  # Default value for OpenFold2 monomer use-case
        }
    ]
}

# Submit request
url = "http://localhost:8000/biology/openfold/openfold2/predict-structure-from-msa-and-template"
headers = {"Content-Type": "application/json"}
response = requests.post(url, data=json.dumps(data), headers=headers, timeout=300)

# Raise an exception on HTTP errors (for example, a rejected request schema)
response.raise_for_status()

4. Clean Cached Data (Optional)#

Remove old PDB70 cache data that is no longer needed:

# Remove old template database cache
rm -rf ~/.cache/nim/openfold2/pdb70
rm -rf ~/.cache/nim/openfold2/templates
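Before deleting anything, it can help to see which of these directories exist and how much space they occupy. The following is a small dry-run sketch that only reads the filesystem; the function names are hypothetical, and the directory names are taken from the commands above.

```python
from pathlib import Path


def find_stale_cache_dirs(cache_root):
    """Return the legacy template cache directories that exist under cache_root."""
    candidates = [Path(cache_root) / "pdb70", Path(cache_root) / "templates"]
    return [path for path in candidates if path.is_dir()]


def directory_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())
```

For example, calling `find_stale_cache_dirs(Path.home() / ".cache" / "nim" / "openfold2")` lists the directories that the `rm -rf` commands would remove, and `directory_size_bytes` reports how much disk space each one frees.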

Common Issues and Solutions#

| Symptom | Likely cause | Solution |
| --- | --- | --- |
| “templates field not allowed” | Using old JSON schema | Update to explicit_templates format |
| Sequence and template don’t align | Chain mismatch in mmCIF | Ensure template has label_asym_id=A |
| Large mmCIF string causes errors | Request payload too large | Use mmcif.gz format or smaller templates |
| Template not being used | use_templates set to false | Set use_templates: true |
| Missing structural information | Incomplete mmCIF file | Validate mmCIF contains atomic coordinates |
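Two of the checks above can be run locally before a request is sent. The sketch below is a plain-text heuristic, not a full mmCIF parser: it assumes the `_atom_site` loop lists `group_PDB` first, so coordinate rows start with `ATOM` or `HETATM`. Both function names are hypothetical, and `sample_mmcif` is a made-up single-atom record used only for illustration.

```python
import gzip


def has_atom_site_records(mmcif_text):
    """Heuristically check that mmCIF text declares and populates _atom_site."""
    declares_category = False
    for line in mmcif_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("_atom_site."):
            declares_category = True
        elif declares_category and stripped.startswith(("ATOM", "HETATM")):
            return True  # Found at least one coordinate row
    return False


def payload_sizes(mmcif_text):
    """Return (raw_bytes, gzip_bytes) to judge whether compression is worthwhile."""
    raw = mmcif_text.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Made-up minimal record for illustration only
sample_mmcif = "\n".join(
    [
        "data_example",
        "loop_",
        "_atom_site.group_PDB",
        "_atom_site.id",
        "_atom_site.label_atom_id",
        "_atom_site.label_comp_id",
        "_atom_site.label_asym_id",
        "_atom_site.label_seq_id",
        "_atom_site.Cartn_x",
        "_atom_site.Cartn_y",
        "_atom_site.Cartn_z",
        "ATOM 1 N GLY A 1 0.000 0.000 0.000",
    ]
)
print(has_atom_site_records(sample_mmcif))
print(payload_sizes(sample_mmcif))
```

Very small inputs can grow under gzip because of the format's fixed header, so the size comparison is only meaningful for realistic template files.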

Performance Considerations#

Consider the following performance improvements and accuracy impacts when migrating.

Expected Improvements in 2.0.0#

  • Faster startup: No database loading required

  • Reduced memory usage: No PDB70 database in memory

  • Simplified confidence scoring: Fixed maximum confidence instead of HHR scores

  • Direct processing: Single-step template feature extraction

Accuracy Impact#

Version 2.0.0 affects accuracy in the following ways:

  • Empirical tests show <1% RMSD change in most cases

  • pLDDT scores may vary slightly due to different confidence weighting

  • Overall structure quality remains comparable

Advanced Usage Patterns#

This section covers additional usage patterns: supplying several templates in one request and building template entries in code.

Multiple Templates#

"explicit_templates": [
    {
        "name": "template_1",
        "format": "mmcif",
        "structure": "MMCIF_CONTENT_1",
        "source": "user_provided",
        "rank": -1
    },
    {
        "name": "template_2", 
        "format": "mmcif",
        "structure": "MMCIF_CONTENT_2",
        "source": "user_provided",
        "rank": -1
    }
]

Programmatic Template Management#

def load_template_from_file(file_path, template_name):
    """Load a local mmCIF template file for OpenFold2"""
    with open(file_path, 'r') as f:
        mmcif_content = f.read()
    
    return {
        "name": template_name,
        "format": "mmcif", 
        "structure": mmcif_content,
        "source": "user_provided",
        "rank": -1
    }

# Usage
template = load_template_from_file("path/to/template.cif", "my_template")
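The same idea extends to a folder of template files, which combines the two patterns in this section. The following is a minimal sketch with a hypothetical function name; it assumes every `.cif` file in the directory is an uncompressed mmCIF template and uses the file stem as the template name.

```python
from pathlib import Path


def load_templates_from_directory(directory):
    """Build one explicit_templates entry per .cif file in directory."""
    templates = []
    # Sort for a deterministic template order across runs
    for cif_path in sorted(Path(directory).glob("*.cif")):
        templates.append(
            {
                "name": cif_path.stem,
                "format": "mmcif",
                "structure": cif_path.read_text(encoding="utf-8"),
                "source": "user_provided",
                "rank": -1,  # Default value for the OpenFold2 monomer use-case
            }
        )
    return templates
```

The returned list can be assigned directly to the `explicit_templates` field of a request dictionary.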

Helpful Resources#

  • Refer to the updated Template Processing documentation for technical details

  • Review Example Requests for complete working examples

  • Contact NVIDIA support for migration assistance

Migration Checklist#

  • [ ] Identify all HHR-based template usage in your code

  • [ ] Ensure you have template structures in mmCIF format locally available

  • [ ] Update JSON request format to use explicit_templates

  • [ ] Test with a small example to verify functionality

  • [ ] Update any automated pipelines or scripts

  • [ ] Clean up old PDB70 cache directories

  • [ ] Validate that structure prediction quality meets your requirements
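For the first checklist item, stored request payloads can be scanned for the legacy field. The following is a rough sketch under the assumption that your requests are saved as `.json` files with the request object at the top level; the function name is hypothetical, and requests built dynamically in code still need a manual search.

```python
import json
from pathlib import Path


def find_legacy_request_files(directory):
    """Return JSON files under directory whose top-level object has a "templates" key."""
    legacy_files = []
    for json_path in sorted(Path(directory).rglob("*.json")):
        try:
            payload = json.loads(json_path.read_text(encoding="utf-8"))
        except ValueError:
            continue  # Skip files that are not valid UTF-8 JSON
        if isinstance(payload, dict) and "templates" in payload:
            legacy_files.append(json_path)
    return legacy_files
```

Each file this function returns still uses the HHR-based schema and needs the `explicit_templates` update described in step 2.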