Migrate from HHR-based Templates to Explicit mmCIF Templates#
Purposes of 2.0.0 Changes#
OpenFold2 version 2.0.0 removes support for the internal structural-template search based on HHR-format input, but continues to support explicitly provided mmCIF template structures. The functionality to search a database of structure files will move to the NVIDIA MSA-NIM in a future release, which consolidates the management of sequence and structure databases in the MSA-NIM. This change provides several benefits for OpenFold2:
Simpler dependency set: No bundled PDB70 database required
Lower startup time & disk footprint: Reduced container size and faster initialization
Greater flexibility: Direct control over template structures without database limitations
Removed Features#
The following features have been removed in version 2.0.0:
All request fields named templates, or sub-keys expecting HHSearch (HHR) data
Environment variables and NIM cache content for PDB70 are ignored
CLI flags or examples that mentioned use_templates=true with HHR format
Database-driven template processing pipeline
Step-by-Step Migration#
Follow the steps below to migrate from HHR-based templates to explicit mmCIF templates.
1. Prepare Your Local mmCIF Template Structure#
This guide assumes you already have mmCIF template files available locally. If you need to convert existing PDB files to mmCIF format, you can use tools like BioPython:
# Using BioPython to convert PDB to mmCIF format
python -c "
from Bio.PDB import PDBParser, MMCIFIO
parser = PDBParser()
structure = parser.get_structure('protein', 'input.pdb')
io = MMCIFIO()
io.set_structure(structure)
io.save('output.cif')
"
2. Update Your JSON Request Format#
Replace the old JSON request format with the new format. Old format (≤1.2.0):
{
  "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
  "use_templates": true,
  "templates": {
    "pdb70": {
      "hhr": {
        "format": "hhr",
        "templates": "HHR_STRING_CONTENT_HERE"
      }
    }
  }
}
New format (2.0.0):
{
  "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
  "use_templates": true,
  "explicit_templates": [
    {
      "name": "2yz6_A",
      "format": "mmcif", // or "mmcif.gz" for compressed
      "structure": "STRINGIFIED_MMCIF_CONTENT_HERE",
      "source": "pdb",
      "rank": -1 // Default value for OpenFold2 monomer use-case
    }
  ]
}
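The change between the two formats can also be applied programmatically. The following is a minimal sketch, not part of the OpenFold2 API: the helper name migrate_request is hypothetical, and it assumes you supply the new template entries yourself, because an HHR string lists database hits and does not contain the coordinates that an explicit template needs.

```python
import copy


def migrate_request(old_request, explicit_templates):
    """Return a 2.0.0-style request dict built from a legacy (<=1.2.0) one.

    Hypothetical helper for illustration only. `explicit_templates` is a list
    of template dicts already in the new format.
    """
    new_request = copy.deepcopy(old_request)  # leave the caller's dict untouched
    # Drop the legacy HHR-based field, which 2.0.0 no longer accepts.
    new_request.pop("templates", None)
    new_request["explicit_templates"] = list(explicit_templates)
    return new_request


old_request = {
    "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
    "use_templates": True,
    "templates": {"pdb70": {"hhr": {"format": "hhr", "templates": "HHR_STRING_CONTENT_HERE"}}},
}
template = {
    "name": "2yz6_A",
    "format": "mmcif",
    "structure": "STRINGIFIED_MMCIF_CONTENT_HERE",
    "source": "pdb",
    "rank": -1,
}
new_request = migrate_request(old_request, [template])
print(sorted(new_request))
```

All other fields, such as sequence and use_templates, are carried over unchanged.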
Note
Both mmcif (uncompressed) and mmcif.gz (gzip-compressed) formats are supported. Use the compressed format for large template files to reduce request payload size. For detailed information about supported formats, refer to Template Processing.
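To gauge how much the compressed format saves, you can gzip the template text locally and compare sizes. The sketch below is illustrative: the function name compress_mmcif is hypothetical, and the base64 step is an assumption made only so that the compressed bytes fit in a JSON string. Confirm the encoding that the service expects for mmcif.gz in the Template Processing documentation before relying on it.

```python
import base64
import gzip


def compress_mmcif(mmcif_text):
    """Gzip-compress mmCIF text and return it as an ASCII string."""
    compressed = gzip.compress(mmcif_text.encode("utf-8"))
    # ASSUMPTION: base64 is used here only to make the bytes JSON-safe;
    # the transport encoding required by the service is not stated in this guide.
    return base64.b64encode(compressed).decode("ascii")


# Synthetic, highly repetitive sample text standing in for a real template file.
sample = "ATOM 1 N N . GLY A 1 1 ? 0.000 0.000 0.000 1.00 0.00\n" * 200
encoded = compress_mmcif(sample)
print(f"original: {len(sample)} chars, compressed+encoded: {len(encoded)} chars")
```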
3. Update Python Code Examples#
Update your Python code as follows. Complete Python example:
import requests
import json
import gzip

# Read local mmCIF file content
with open('path/to/your/template.cif', 'r') as f:
    mmcif_content = f.read()

# Optional: compress for large files. gzip.compress returns bytes, which are
# not JSON-serializable as-is; encode them as the service expects first.
# compressed_mmcif = gzip.compress(mmcif_content.encode('utf-8'))

# Prepare request
data = {
    "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
    "input_id": "example_protein_001",
    "selected_models": [1, 2],
    "alignments": {
        "uniref90": {
            "a3m": {
                "alignment": "YOUR_A3M_ALIGNMENT_HERE",
                "format": "a3m"
            }
        }
    },
    "use_templates": True,
    "explicit_templates": [
        {
            "name": "my_template",
            "format": "mmcif",  # or "mmcif.gz" if using compressed_mmcif
            "structure": mmcif_content,  # or compressed_mmcif for compressed
            "source": "user_provided",
            "rank": -1  # Default value for OpenFold2 monomer use-case
        }
    ]
}

# Submit request
url = "http://localhost:8000/biology/openfold/openfold2/predict-structure-from-msa-and-template"
headers = {"Content-Type": "application/json"}
response = requests.post(url, data=json.dumps(data), headers=headers, timeout=300)
4. Clean Cached Data (Optional)#
Remove old PDB70 cache data that is no longer needed:
# Remove old template database cache
rm -rf ~/.cache/nim/openfold2/pdb70
rm -rf ~/.cache/nim/openfold2/templates
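Before deleting, you may want to see whether these directories exist and how much space they occupy. The following sketch only reads the filesystem and removes nothing; the function name cache_report is hypothetical, and the paths are the two shown above.

```python
import os

LEGACY_DIRS = [
    "~/.cache/nim/openfold2/pdb70",
    "~/.cache/nim/openfold2/templates",
]


def cache_report(paths):
    """Map each path to its total size in bytes, or None if it is not a directory."""
    report = {}
    for raw in paths:
        path = os.path.expanduser(raw)
        if not os.path.isdir(path):
            report[raw] = None
            continue
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                if os.path.isfile(full):  # skips broken symlinks
                    total += os.path.getsize(full)
        report[raw] = total
    return report


for path, size in cache_report(LEGACY_DIRS).items():
    print(path, "not found" if size is None else f"{size} bytes")
```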
Common Issues and Solutions#
Symptom | Likely cause | Solution |
---|---|---|
“templates field not allowed” | Using old JSON schema | Update to explicit_templates |
Sequence and template don’t align | Chain mismatch in mmCIF | Ensure template has a chain that matches the query sequence |
Large mmCIF string causes errors | Request payload too large | Use mmcif.gz |
Template not being used | use_templates not set to true | Set use_templates to true |
Missing structural information | Incomplete mmCIF file | Validate mmCIF contains atomic coordinates |
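For the last row of the table, a quick local check can catch files that lack coordinate data before you send them. The sketch below is a lightweight heuristic, not the validation that OpenFold2 performs: it only confirms that the standard PDBx/mmCIF items _atom_site.Cartn_x, _atom_site.Cartn_y, and _atom_site.Cartn_z are declared, and it does not parse or count the atom rows. The function name has_atom_coordinates is hypothetical.

```python
REQUIRED_ITEMS = {
    "_atom_site.Cartn_x",
    "_atom_site.Cartn_y",
    "_atom_site.Cartn_z",
}


def has_atom_coordinates(mmcif_text):
    """Return True if the mmCIF text declares x, y, and z coordinate items."""
    found = set()
    for line in mmcif_text.splitlines():
        parts = line.split()
        # Item names appear as the first token on a line, inside or outside a loop_.
        if parts and parts[0] in REQUIRED_ITEMS:
            found.add(parts[0])
    return found == REQUIRED_ITEMS


good = """data_example
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 0.000 0.000 0.000
"""
print(has_atom_coordinates(good))
print(has_atom_coordinates("data_empty\n#\n"))
```

For a stricter check, a full mmCIF parser such as the one in BioPython can load the structure and count its atoms.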
Performance Considerations#
Consider the following improvements and impacts when migrating.
Expected Improvements in 2.0.0#
Faster startup: No database loading required
Reduced memory usage: No PDB70 database in memory
Simplified confidence scoring: Fixed maximum confidence instead of HHR scores
Direct processing: Single-step template feature extraction
Accuracy Impact#
Version 2.0.0 affects accuracy in the following ways:
Empirical tests show <1% RMSD change in most cases
pLDDT scores may vary slightly due to different confidence weighting
Overall structure quality remains comparable
Advanced Usage Patterns#
This section explores other usage patterns.
Multiple Templates#
"explicit_templates": [
  {
    "name": "template_1",
    "format": "mmcif",
    "structure": "MMCIF_CONTENT_1",
    "source": "user_provided",
    "rank": -1
  },
  {
    "name": "template_2",
    "format": "mmcif",
    "structure": "MMCIF_CONTENT_2",
    "source": "user_provided",
    "rank": -1
  }
]
Programmatic Template Management#
def load_template_from_file(file_path, template_name):
    """Load a local mmCIF template file for OpenFold2"""
    with open(file_path, 'r') as f:
        mmcif_content = f.read()
    return {
        "name": template_name,
        "format": "mmcif",
        "structure": mmcif_content,
        "source": "user_provided",
        "rank": -1
    }

# Usage
template = load_template_from_file("path/to/template.cif", "my_template")
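The same pattern extends to several files at once. As a sketch under stated assumptions, the hypothetical helper below builds an explicit_templates list from every .cif file in a directory, naming each template after its file stem and reusing the field values from the examples above. The temporary directory and its placeholder files exist only to make the example runnable.

```python
import tempfile
from pathlib import Path


def load_templates_from_dir(directory):
    """Build an explicit_templates list from all .cif files in a directory."""
    templates = []
    # Sort for a deterministic order across runs and platforms.
    for path in sorted(Path(directory).glob("*.cif")):
        templates.append({
            "name": path.stem,
            "format": "mmcif",
            "structure": path.read_text(encoding="utf-8"),
            "source": "user_provided",
            "rank": -1,
        })
    return templates


# Demo with two placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "b_template.cif").write_text("data_b\n", encoding="utf-8")
    Path(tmp, "a_template.cif").write_text("data_a\n", encoding="utf-8")
    templates = load_templates_from_dir(tmp)

print([t["name"] for t in templates])
```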
Helpful Resources#
Refer to the updated Template Processing documentation for technical details
Review Example Requests for complete working examples
Contact NVIDIA support for migration assistance
Migration Checklist#
[ ] Identify all HHR-based template usage in your code
[ ] Ensure you have template structures in mmCIF format locally available
[ ] Update JSON request format to use explicit_templates
[ ] Test with a small example to verify functionality
[ ] Update any automated pipelines or scripts
[ ] Clean up old PDB70 cache directories
[ ] Validate that structure prediction quality meets your requirements
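The first checklist item can be partly automated for request payloads that you store as JSON. The sketch below uses a hypothetical helper, find_legacy_fields, to flag a request that still carries the legacy templates field, or that enables use_templates without supplying any explicit_templates. It inspects only top-level keys and is not a substitute for the service's own schema validation.

```python
import json


def find_legacy_fields(request):
    """Return a list of migration problems found in a request dict."""
    problems = []
    if "templates" in request:
        problems.append("legacy 'templates' field present")
    if request.get("use_templates") and not request.get("explicit_templates"):
        problems.append("use_templates is set but no explicit_templates given")
    return problems


legacy_json = """
{
  "sequence": "GGSKENEISHHAKEIERLQKEIERHKQ...",
  "use_templates": true,
  "templates": {"pdb70": {"hhr": {"format": "hhr", "templates": "HHR_STRING_CONTENT_HERE"}}}
}
"""
legacy_problems = find_legacy_fields(json.loads(legacy_json))
for problem in legacy_problems:
    print(problem)
```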