Release Notes

Release 2.2.0

Summary

This release introduces a dedicated Structural Template Search endpoint for finding homologous protein structures and case-insensitive database name matching, along with bug fixes and documentation improvements.

New Features

Structural Template Search

A new Structural Template Search feature enables template search for protein structure prediction:

Template discovery: Search PDB70, PDB100, and other structural databases for proteins with structures similar to your query sequence
Structure retrieval: Automatically retrieve mmCIF files for the top template hits
Combined MSA output: Returns both structural templates and MSAs in a single request
Flexible database selection: Choose which structural databases to search and which sequence databases to use for MSA generation

This feature is essential for template-based protein structure prediction methods.

Case-Insensitive Database Names

Database names are now matched case-insensitively across all endpoints. You can specify database names in any case (for example, uniref30_2302, Uniref30_2302, or UNIREF30_2302) and the API will find the correct database. The response preserves the exact case you specified in the request.

This change simplifies API usage by eliminating case-sensitivity errors and provides flexibility in how database names are specified.

Note on ["all"] database resolution: When using databases: ["all"] (or not specifying MSA databases), the response currently uses display names configured via NIM_MSA_DB_NAME_MAPPINGS (for example, Uniref30_2302, PDB70_220313) for backwards compatibility. In MSA 3.0, ["all"] will resolve to all-lowercase names (for example, uniref30_2302, pdb70_220313). If your code parses database names from responses, we recommend using case-insensitive comparisons to ensure forward compatibility.
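Following that recommendation, a client can compare database names case-insensitively. The sketch below is a minimal example. The helper names `same_database` and `find_database` are hypothetical and not part of the API. It uses Python's `str.casefold()` so that a display name such as `PDB70_220313` and the all-lowercase `pdb70_220313` match either way.

```python
from typing import Optional, Sequence


def same_database(a: str, b: str) -> bool:
    """Return True if two database names match, ignoring case."""
    return a.casefold() == b.casefold()


def find_database(names: Sequence[str], wanted: str) -> Optional[str]:
    """Return the entry in `names` matching `wanted` case-insensitively, or None."""
    key = wanted.casefold()
    return next((name for name in names if name.casefold() == key), None)


# Display names, as a response may contain today ...
response_names = ["Uniref30_2302", "colabfold_envdb_202108", "PDB70_220313"]
# ... still match the all-lowercase names expected from ["all"] in MSA 3.0.
print(find_database(response_names, "pdb70_220313"))  # PDB70_220313
print(same_database("UNIREF30_2302", "uniref30_2302"))  # True
```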

Bug Fixes

Consistent Profile Generation Database Selection

Fixed an issue where the database used for profile generation depended on filesystem path ordering when using ["all"] for database selection. The fix ensures the configured default database for profile generation (uniref30_2302 by default) is always used when available, regardless of filesystem ordering.
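The fixed behavior can be pictured as a selection rule that no longer depends on the order in which databases are discovered. The following is a hypothetical sketch, not the NIM's code. The name `select_profile_database` is illustrative. The alphabetical fallback used when the default is absent is an assumption, because these notes only describe the case where the default is available.

```python
from typing import Iterable, Optional

DEFAULT_PROFILE_DB = "uniref30_2302"  # documented default for profile generation


def select_profile_database(
    available: Iterable[str], default: str = DEFAULT_PROFILE_DB
) -> Optional[str]:
    """Pick the profile-generation database independently of discovery order."""
    names = list(available)
    # Prefer the configured default whenever it is present (case-insensitive match).
    for name in names:
        if name.casefold() == default.casefold():
            return name
    # Assumption for illustration: otherwise fall back to a stable, sorted choice
    # rather than whichever path the filesystem happened to list first.
    return min(names, key=str.casefold) if names else None
```

Every ordering of the same database list yields the same choice, which is the property the fix restores.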

Merged Alignment Now Includes All Requested Databases

Fixed an issue where the merged “colabfold” alignment output included results only from expandable databases. Results from non-expandable databases (such as pdb70_220313) were omitted even when they were explicitly requested.

Previously, if a user requested databases: ["uniref30_2302", "pdb70_220313"], both databases were searched, but only the expandable database's results were merged into the combined “colabfold” output. The fix ensures that results from all requested databases are included in the merged alignment, as users would expect.
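The corrected merge behavior can be sketched as follows. This is a simplified, hypothetical illustration of combining per-database A3M alignments, not the NIM's actual merge logic. The helper names are invented. The sketch assumes each per-database A3M starts with the query record. It keeps the query once, then appends the hits from every requested database in request order, whether or not that database is expandable.

```python
from typing import Dict, List, Sequence, Tuple


def parse_a3m(text: str) -> List[Tuple[str, str]]:
    """Split A3M text into (header, sequence) records, skipping blank and '#' lines."""
    records: List[Tuple[str, str]] = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def merge_alignments(per_database: Dict[str, str], requested: Sequence[str]) -> str:
    """Merge the A3M results of every requested database, expandable or not."""
    merged: List[Tuple[str, str]] = []
    for db in requested:
        records = parse_a3m(per_database.get(db, ""))
        if not records:
            continue
        if not merged:
            merged.append(records[0])  # keep the query record only once
        merged.extend(records[1:])  # then append this database's hits
    return "".join(f">{header}\n{seq}\n" for header, seq in merged)
```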

Documentation

Added the Task-Specific Profiles and Custom Databases guide, covering database profiles for reduced storage, custom database configuration, and manually downloading individual databases from NGC.
Added the Indexing Databases for GPU Server guide for users who need to index custom or third-party databases for GPU Server support.
Added the Observability guide, covering metrics and telemetry configuration. Telemetry is disabled by default and can be enabled through NIM_TELEMETRY_MODE.

Deprecations

/biology/colabfold/msa-search/config/msa-database-configs endpoint: This endpoint is deprecated in favor of /v1/metadata and will be removed in MSA 3.0. Use /v1/metadata to retrieve information about available databases and model profiles.
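To move off the deprecated endpoint, a client can query /v1/metadata instead. The following is a minimal sketch using only Python's standard library. The base URL is a hypothetical placeholder. The sketch assumes the response body is JSON but relies on no particular field names.

```python
import json
import urllib.request

# Hypothetical address; replace with the host and port where your NIM is running.
BASE_URL = "http://localhost:8000"


def metadata_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the /v1/metadata endpoint."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/v1/metadata", method="GET")


def fetch_metadata(base_url: str = BASE_URL, timeout: float = 30.0):
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(metadata_request(base_url), timeout=timeout) as resp:
        return json.load(resp)
```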

Migration Notes

No migration steps required.

Release 2.1.0

Summary

This release introduces Paired MSA Search for protein complexes (multimers), enabling accurate co-evolutionary analysis across multiple chains. Paired search is now a first-class feature with its own API endpoint.

Key Changes

Paired MSA Search for Protein Complexes

A new endpoint /biology/colabfold/msa-search/paired/predict enables paired MSA search for protein complexes:

Multi-chain input: Submit multiple protein sequences (one per chain) in a single request
Species-based pairing: Homologous sequences are paired by species across chains, preserving co-evolutionary signals essential for accurate complex structure prediction
Pairing strategies: Choose between “greedy” (maximize coverage) and “complete” (require all chains) pairing modes
Flexible output: Return per-chain alignments or raw combined output via the unpack parameter

This feature is essential for accurate structure prediction of protein complexes using tools like AlphaFold-Multimer, Boltz, and similar methods.
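To make the two pairing strategies concrete, here is a simplified conceptual sketch of species-based pairing. It is not the NIM's implementation. It assumes each chain's search results have already been reduced to at most one hit per species. “complete” keeps only species found for every chain. As an illustrative assumption, “greedy” is modeled as also keeping species shared by at least two chains. Chains without a hit are left unpaired (shown as None and typically filled with gaps).

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, List[Optional[str]]]


def pair_by_species(chain_hits: List[Dict[str, str]], strategy: str = "greedy") -> List[Row]:
    """Pair per-chain hits that come from the same species.

    chain_hits holds one dict per chain, mapping species -> aligned hit sequence.
    """
    if strategy not in ("greedy", "complete"):
        raise ValueError("strategy must be 'greedy' or 'complete'")
    # "complete" needs a hit in every chain; this sketch's "greedy" needs two or more.
    min_chains = len(chain_hits) if strategy == "complete" else 2
    rows: List[Row] = []
    for species in sorted(set().union(*chain_hits)):
        hits = [chain.get(species) for chain in chain_hits]
        present = sum(hit is not None for hit in hits)
        if present >= min_chains:
            rows.append((species, hits))  # None marks a chain left unpaired
    return rows
```

The trade-off mirrors the descriptions above. “complete” yields fewer but fully paired rows. “greedy” yields more rows at the cost of gaps in some chains.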

API Changes

New endpoint: /biology/colabfold/msa-search/paired/predict for paired/multimer MSA search
New parameters for paired search:
- sequences: List of protein sequences (one per chain)
- pairing_strategy: Cross-chain pairing mode (“greedy” or “complete”)
- unpack: Control output format (per-chain or combined)
Output format: Paired search always returns A3M format
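A request to the paired endpoint can be assembled from the parameters listed above. The sketch below only builds the request, using Python's standard library. The host and port are hypothetical. The sketch assumes a JSON body and a boolean `unpack` where True selects per-chain alignments. Your deployment may require additional fields, such as database selection, that these notes do not cover.

```python
import json
import urllib.request
from typing import Sequence

# Hypothetical address; replace with the host and port where your NIM is running.
BASE_URL = "http://localhost:8000"
PAIRED_PATH = "/biology/colabfold/msa-search/paired/predict"


def build_paired_request(
    sequences: Sequence[str],
    pairing_strategy: str = "greedy",
    unpack: bool = True,  # assumption: boolean flag, True for per-chain output
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a paired MSA search request, one sequence per chain."""
    if len(sequences) < 2:
        raise ValueError("paired search expects at least two chains")
    if pairing_strategy not in ("greedy", "complete"):
        raise ValueError("pairing_strategy must be 'greedy' or 'complete'")
    payload = {
        "sequences": list(sequences),
        "pairing_strategy": pairing_strategy,
        "unpack": unpack,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + PAIRED_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` would return A3M-formatted output, as noted above. The exact response structure is not assumed here.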

New Features and Improvements

Paired MSA search for protein complexes with 2+ chains
Support for greedy and complete pairing strategies
Per-chain or combined alignment output options
Improved API documentation and examples

Release 2.0.0

Summary

This release represents a major update to the MSA Search NIM, featuring significant performance improvements and streamlined database management. The GPU Server is now enabled by default, providing better out-of-the-box performance with the included ColabFold databases. This release also includes an upgrade to MMSeqs2 version 18, enhanced stability, and expanded hardware support.

Key Changes

GPU Server Enabled by Default

The GPU Server is now enabled by default, improving performance for the default ColabFold databases. In the previous release, enabling the GPU Server required manual configuration and custom database indices created with the makepaddedseqdb command. This release provides the following benefits:

Reduced latency: The GPU Server intelligently stores and loads target sequence databases in GPU memory, reducing query latency
Simplified deployment: No additional configuration or separate database indexing steps needed to achieve optimal performance with default databases
Pre-indexed databases: The included ColabFold databases (Uniref30_2302, colabfold_envdb_202108, PDB70_220313) come pre-indexed and optimized for GPU Server use
Better resource utilization: Efficient GPU memory management enables faster search and alignment operations

Database Changes

With the GPU Server now optimized and enabled by default, this release focuses on the ColabFold databases for optimal performance:

Default databases: ColabFold databases (Uniref30_2302, colabfold_envdb_202108, PDB70_220313) provide better performance and are recommended for most use cases
AlphaFold2 databases: The CPU-only AlphaFold2 databases (uniref90, small_bfd, mgnify) are not included for GPU Server operation. However, both the alphafold2 (iterative) and colabfold (cascaded) search types remain fully supported when used with the provided ColabFold databases.
External database support: Users can still use custom or external databases by mounting them via the MODEL_PATH environment variable

MMSeqs2 Version Upgrade

This release upgrades to MMSeqs2 version 18, bringing:

Enhanced performance optimizations
Better stability and error handling

Hardware Support

Added support for NVIDIA B200 GPUs, expanding deployment options for the latest NVIDIA hardware.

New Features and Improvements

GPU Server enabled by default for optimal performance with ColabFold databases
Upgraded to MMSeqs2 version 18 with improved accuracy and performance
Enhanced stability and error handling throughout the pipeline
Added NVIDIA B200 GPU support
Optimized database loading and GPU memory management

Performance Enhancements

Reduced query latency through default GPU Server activation
Faster database loading with pre-indexed ColabFold databases
Better throughput scaling across multiple GPUs

Migration Notes

Configuration Changes

GPU Server: The GPU Server is now enabled by default; to disable it, set NIM_DISABLE_GPU_SERVER=True.
MSA Depth: With the GPU Server enabled by default, the maximum MSA depth is set globally for all requests, for performance reasons, through the NIM_GLOBAL_MAX_MSA_DEPTH environment variable at container startup. It cannot be changed per request; if you require a specific MSA depth, set this environment variable accordingly.
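Because both settings are read at container startup, they are passed as environment variables when the container is launched. The helper below is a hypothetical sketch that only assembles `docker run`-style `-e` flags for NIM_DISABLE_GPU_SERVER and NIM_GLOBAL_MAX_MSA_DEPTH. The depth value in the example is an arbitrary placeholder, not a recommendation. The image name and other launch options depend on your deployment.

```python
from typing import Dict, List, Optional


def gpu_server_env(
    disable_gpu_server: bool = False, global_max_msa_depth: Optional[int] = None
) -> Dict[str, str]:
    """Collect the startup environment variables described in these notes."""
    env: Dict[str, str] = {}
    if disable_gpu_server:
        env["NIM_DISABLE_GPU_SERVER"] = "True"
    if global_max_msa_depth is not None:
        if global_max_msa_depth <= 0:
            raise ValueError("global_max_msa_depth must be a positive integer")
        # Applies to every request; it cannot be overridden per request.
        env["NIM_GLOBAL_MAX_MSA_DEPTH"] = str(global_max_msa_depth)
    return env


def docker_env_flags(env: Dict[str, str]) -> List[str]:
    """Turn the variables into `-e NAME=value` arguments for `docker run`."""
    flags: List[str] = []
    for name, value in env.items():
        flags += ["-e", f"{name}={value}"]
    return flags


print(docker_env_flags(gpu_server_env(global_max_msa_depth=4096)))
# ['-e', 'NIM_GLOBAL_MAX_MSA_DEPTH=4096']
```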

Hardware Requirements

48 GB GPUs: For 48 GB GPUs (L40S, RTX 6000 Ada), at least 2 GPUs are now required when using the GPU Server (the default), because the GPU Server holds the databases in memory for optimal performance.

For Existing Users

No migration steps are required. The default ColabFold databases will automatically benefit from the enabled GPU Server, providing improved performance.

Note

The alphafold2 search type (iterative search) is still fully supported when using appropriate databases like the included ColabFold databases.

Release 1.0.0

Summary

This is the first release of MSA Search NIM. MSA Search supports accurate protein structure prediction from an input protein sequence by identifying previously observed proteins that may be structurally similar to the query.

This NIM is not a deep learning model but makes use of NVIDIA GPUs and software libraries for accelerated multiple sequence alignment. The NIM relies on GPU-accelerated MMSeqs2 to provide fast, accurate sequence database search and multiple sequence alignment.

Supported search_types: alphafold2 and colabfold. Note: Different search types require different database types.
Supported NVIDIA GPUs: GPUs with at least 48 GB of GPU memory (A100, H100, and L40S). GPUs with less than 48 GB of GPU memory are currently not officially supported.

Model Variants

ColabFold, which contains the databases uniref30_2302, colabfold_envdb_202108, and pdb70_220313.
AlphaFold, which contains the databases uniref90, mgnify, and smallbfd, as used in AlphaFold2.

Notes and Limitations

Use this NIM with GPUs that have at least 48 GB of VRAM. The NIM also requires roughly 1.4 TB (1,400 GB) of fast NVMe SSD storage for the databases.

Note: Although this NIM offers many options for tuning performance, the defaults provide balanced performance for most users.