Release Notes#

Release 2.1.0#

Summary#

This release introduces Paired MSA Search for protein complexes (multimers), enabling accurate co-evolutionary analysis across multiple chains. Paired search is now a first-class feature with its own API endpoint.

Key Changes#

Paired MSA Search for Protein Complexes#

A new endpoint /biology/colabfold/msa-search/paired/predict enables paired MSA search for protein complexes:

  • Multi-chain input: Submit multiple protein sequences (one per chain) in a single request

  • Species-based pairing: Homologous sequences are paired by species across chains, preserving co-evolutionary signals essential for accurate complex structure prediction

  • Pairing strategies: Choose between “greedy” (maximize coverage) or “complete” (require all chains) pairing modes

  • Flexible output: Return per-chain alignments or raw combined output via the unpack parameter

This feature is essential for accurate structure prediction of protein complexes using tools like AlphaFold-Multimer, Boltz, and similar methods.
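As a conceptual illustration of species-based pairing, the toy sketch below pairs per-chain hits by a species label. It is not the NIM's implementation: the species labels, the sequences, and the function name `pair_by_species` are invented, and the rule that "greedy" keeps any species found in at least two chains is an assumption about what "maximize coverage" means.

```python
from typing import Dict, List, Optional

# Toy input: for each chain, a mapping of species label -> homolog found.
# Labels and sequences are invented for illustration only.
ChainHits = Dict[str, str]


def pair_by_species(
    chains: List[ChainHits], strategy: str = "greedy"
) -> List[List[Optional[str]]]:
    """Return one row per species; each row has one entry per chain (None = no hit)."""
    if strategy not in ("greedy", "complete"):
        raise ValueError(f"unknown pairing strategy: {strategy!r}")

    all_species = sorted(set().union(*(hits.keys() for hits in chains)))
    # "complete": the species must have a hit in every chain.
    # "greedy": the species must have a hit in at least two chains (assumed threshold).
    min_chains = len(chains) if strategy == "complete" else min(2, len(chains))

    rows = []
    for species in all_species:
        row = [hits.get(species) for hits in chains]
        if sum(seq is not None for seq in row) >= min_chains:
            rows.append(row)
    return rows


chain_a = {"human": "MKTA", "mouse": "MKSA", "yeast": "MRTA"}
chain_b = {"human": "GLVE", "mouse": "GLIE"}
chain_c = {"human": "PQRS", "yeast": "PQKS"}

complete_rows = pair_by_species([chain_a, chain_b, chain_c], "complete")
greedy_rows = pair_by_species([chain_a, chain_b, chain_c], "greedy")
```

In this toy data, only "human" appears in all three chains, so the complete mode yields a single paired row, while the greedy mode also keeps the partially covered "mouse" and "yeast" rows with a gap for the missing chain.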

API Changes#

  • New endpoint: /biology/colabfold/msa-search/paired/predict for paired/multimer MSA search

  • New parameters for paired search:

    • sequences: List of protein sequences (one per chain)

    • pairing_strategy: Cross-chain pairing mode (“greedy” or “complete”)

    • unpack: Control output format (per-chain or combined)

  • Output format: Paired search always returns A3M format
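A request to the paired endpoint might be assembled as in the sketch below. Only the endpoint path and the three parameter names come from these notes. The host and port, the use of a JSON POST body, the boolean type of `unpack`, and the helper names are assumptions, so check them against the API reference for your deployment.

```python
import json
import urllib.request

# Assumption: the NIM is reachable at this host and port; adjust as needed.
BASE_URL = "http://localhost:8000"
PAIRED_ENDPOINT = "/biology/colabfold/msa-search/paired/predict"


def build_paired_payload(sequences, pairing_strategy="greedy", unpack=True):
    """Validate inputs and return a JSON-serializable request body."""
    if len(sequences) < 2:
        raise ValueError("paired search expects at least two chains")
    if pairing_strategy not in ("greedy", "complete"):
        raise ValueError("pairing_strategy must be 'greedy' or 'complete'")
    return {
        "sequences": list(sequences),
        "pairing_strategy": pairing_strategy,
        "unpack": unpack,  # assumed boolean: True = per-chain, False = combined
    }


def build_paired_request(payload, base_url=BASE_URL):
    """Create (but do not send) a POST request for the paired endpoint."""
    return urllib.request.Request(
        base_url + PAIRED_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder sequences, one per chain.
payload = build_paired_payload(["MKTAYIAKQR", "GSHMLEDPVA"], pairing_strategy="complete")
request = build_paired_request(payload)
# To send it against a running NIM: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the validation logic testable without a running service.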

New Features and Improvements#

  • Paired MSA search for protein complexes with 2+ chains

  • Support for greedy and complete pairing strategies

  • Per-chain or combined alignment output options

  • Improved API documentation and examples


Release 2.0.0#

Summary#

This release represents a major update to the MSA Search NIM, featuring significant performance improvements and streamlined database management. The GPU Server is now enabled by default, providing better out-of-the-box performance with the included ColabFold databases. This release also includes an upgrade to MMSeqs2 version 18, enhanced stability, and expanded hardware support.

Key Changes#

GPU Server Enabled by Default#

The GPU Server is now enabled by default, improving performance for the default ColabFold databases. In the previous release, enabling the GPU Server required manual configuration and custom database indices created with the makepaddedseqdb command. This release includes the following features:

  • Reduced latency: The GPU Server intelligently stores and loads target sequence databases in GPU memory, reducing query latency

  • Simplified deployment: No additional configuration or separate database indexing steps needed to achieve optimal performance with default databases

  • Pre-indexed databases: The included ColabFold databases (Uniref30_2302, colabfold_envdb_202108, PDB70_220313) come pre-indexed and optimized for GPU Server use

  • Better resource utilization: Efficient GPU memory management enables faster search and alignment operations

Database Changes#

With the GPU Server now optimized and enabled by default, this release focuses on the ColabFold databases for optimal performance:

  • Default databases: ColabFold databases (Uniref30_2302, colabfold_envdb_202108, PDB70_220313) provide better performance and are recommended for most use cases

  • AlphaFold2 databases: The CPU-only AlphaFold2 databases (uniref90, small_bfd, mgnify) are not included for GPU Server operation. However, both the alphafold2 (iterative) and colabfold (cascaded) search types remain fully supported when used with the provided ColabFold databases.

  • External database support: Users can still use custom or external databases by mounting them via the MODEL_PATH environment variable

MMSeqs2 Version Upgrade#

This release upgrades to MMSeqs2 version 18, bringing:

  • Enhanced performance optimizations

  • Better stability and error handling

Hardware Support#

Added support for NVIDIA B200 GPUs, expanding deployment options for the latest NVIDIA hardware.

New Features and Improvements#

  • GPU Server enabled by default for optimal performance with ColabFold databases

  • Upgraded to MMSeqs2 version 18 with improved performance and stability

  • Enhanced stability and error handling throughout the pipeline

  • Added NVIDIA B200 GPU support

  • Optimized database loading and GPU memory management

Performance Enhancements#

  • Reduced query latency through default GPU Server activation

  • Faster database loading with pre-indexed ColabFold databases

  • Better throughput scaling across multiple GPUs

Migration Notes#

Configuration Changes#

  • GPU Server: The GPU Server is now enabled by default. To disable it, set NIM_DISABLE_GPU_SERVER=True.

  • MSA Depth: With the GPU Server enabled by default, the maximum MSA depth is set globally for all requests through the NIM_GLOBAL_MAX_MSA_DEPTH environment variable at container startup. For performance reasons, it cannot be changed per request. If you require a specific MSA depth, set this variable before starting the container.
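The sketch below shows one way to assemble a container launch command that sets these two environment variables. The image reference, the depth value of 500, and the helper name `docker_run_args` are hypothetical, and a real deployment needs further options (credentials, ports, database mounts) that are omitted here.

```python
import shlex

# Hypothetical image reference; substitute the image you actually deploy.
IMAGE = "example.registry/msa-search:2.0.0"


def docker_run_args(max_msa_depth=None, disable_gpu_server=False, image=IMAGE):
    """Build a `docker run` argument list carrying the env vars described above."""
    args = ["docker", "run", "--rm", "--gpus", "all"]
    if max_msa_depth is not None:
        # Global cap applied to every request; fixed at container startup.
        args += ["-e", f"NIM_GLOBAL_MAX_MSA_DEPTH={int(max_msa_depth)}"]
    if disable_gpu_server:
        args += ["-e", "NIM_DISABLE_GPU_SERVER=True"]
    args.append(image)
    return args


# 500 is an arbitrary example value, not a recommended depth.
args = docker_run_args(max_msa_depth=500)
command = shlex.join(args)
```

Because the depth cannot be changed per request, changing it means restarting the container with a new value.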

Hardware Requirements#

  • 48 GB GPUs: GPUs with 48 GB of memory (L40S, RTX 6000 Ada) now require a minimum of 2 GPUs when the GPU Server is in use (the default), because the GPU Server holds the databases in GPU memory for optimal performance.

For Existing Users#

No migration steps are required. The default ColabFold databases will automatically benefit from the enabled GPU Server, providing improved performance.

Note

The alphafold2 search type (iterative search) is still fully supported when using appropriate databases like the included ColabFold databases.


Release 1.0.0#

Summary#

This is the first release of the MSA Search NIM. MSA Search enables accurate protein structure prediction from an input protein sequence by predicting potential structural similarities with previously observed proteins.

This NIM is not a deep learning model but makes use of NVIDIA GPUs and software libraries for accelerated multiple sequence alignment. The NIM relies on GPU-accelerated MMSeqs2 to provide fast, accurate sequence database search and multiple sequence alignment.

  • Supported search_types: alphafold2 and colabfold. Note: Different search types require different database types.

  • Supported NVIDIA GPUs: GPUs with at least 48 GB of memory (A100, H100, and L40S). GPUs with less than 48 GB of memory are not officially supported at this time.

Model Variants#

  • ColabFold, which contains the databases Uniref30_2302, colabfold_envdb_202108, and PDB70_220313.

  • AlphaFold, which contains the databases uniref90, mgnify, and small_bfd, as used in AlphaFold2.

Notes and Limitations#

Use this NIM only with GPUs that have at least 48 GB of VRAM. In addition, this NIM requires roughly 1.4 TB (1,400 GB) of fast NVMe SSD storage to store the databases.
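A quick pre-flight check of free disk space can catch storage problems before the databases are downloaded. The sketch below uses only the Python standard library. The helper name `has_enough_space` is illustrative, the threshold assumes decimal units (1.4 × 10^12 bytes), and the check does not verify that the disk is a fast NVMe SSD.

```python
import shutil

# Roughly 1.4 TB, per the storage requirement above (decimal units assumed).
REQUIRED_BYTES = int(1.4e12)


def has_enough_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # Point `path` at the directory where the databases will be stored.
    print("Enough space:", has_enough_space("."))
```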

Note: While there are many options for tuning this NIM’s performance, the defaults provide balanced performance for most users.