Benchmarking#

Accuracy#

The MSA Search NIM performs multiple sequence alignment by searching protein sequence databases for sequences similar to a query and aligning them to identify regions of similarity. The accuracy of the NIM is measured by comparing the search results against expected alignments and evaluating the sequence identity of the returned matches.

The benchmarking process evaluates the NIM’s ability to find and align relevant sequences across different databases (Uniref30_2302, colabfold_envdb_202108, and PDB70_220313) using both search types:

  • AlphaFold2 search (iterative): Performs a single-pass search against each database

  • ColabFold search (cascaded): Performs a cascaded search that reuses iteratively generated sequence profiles for higher sensitivity

Accuracy is assessed by the number of sequences found in each database and the mean sequence identity of the resulting alignments. The metrics in the tables below demonstrate the NIM’s ability to identify homologous sequences across different sequence length ranges.
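For reference, mean sequence identity can be computed directly from an alignment. The sketch below is a minimal illustration, assuming the search results are available as an A3M file on disk; the file name and parsing details are illustrative and are not taken from the NIM’s packaged benchmark.

# Minimal sketch (assumption: results saved as an A3M file): compute the
# mean sequence identity of an MSA against its query (first sequence).

def read_a3m(path):
    """Yield (header, sequence) pairs from an A3M file."""
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line[1:], []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)

def sequence_identity(query, hit):
    """Fraction of query positions where the hit residue matches the query.

    In A3M, lowercase letters are insertions relative to the query, so they
    are dropped to keep the two sequences aligned column-for-column.
    """
    hit_columns = [c for c in hit if not c.islower()]
    matches = sum(q == h for q, h in zip(query, hit_columns) if h != "-")
    return matches / len(query)

def mean_identity(path):
    records = list(read_a3m(path))
    query = records[0][1]
    identities = [sequence_identity(query, seq) for _, seq in records[1:]]
    return sum(identities) / len(identities) if identities else 0.0

print(f"mean sequence identity: {mean_identity('result.a3m'):.1%}")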

Accuracy Metrics#

The following tables show the number of sequences found and their mean sequence identity across different databases and sequence length ranges. Column headers indicate input sequence length ranges (in amino acids).

AlphaFold2 Search - Database Coverage#

Mean sequences found and sequence identity by database:

| Database               | 0-200    | 200-400   | 400-600   | 600-800   |
|------------------------|----------|-----------|-----------|-----------|
| PDB70_220313           | 6 (8%)   | 46 (34%)  | 129 (24%) | 36 (5%)   |
| Uniref30_2302          | 56 (50%) | 295 (27%) | 342 (20%) | 304 (16%) |
| colabfold_envdb_202108 | 98 (41%) | 370 (28%) | 402 (26%) | 385 (21%) |

Values show: mean sequences found (mean sequence identity)

ColabFold Search - Database Coverage#

Mean sequences found and sequence identity by database:

| Database                 | 0-200     | 200-400   | 400-600   | 600-800   |
|--------------------------|-----------|-----------|-----------|-----------|
| PDB70_220313             | 6 (7%)    | 64 (31%)  | 171 (21%) | 77 (3%)   |
| Uniref30_2302            | 75 (79%)  | 185 (48%) | 101 (51%) | 100 (46%) |
| colabfold_envdb_202108   | 76 (36%)  | 174 (26%) | 111 (24%) | 99 (26%)  |
| colabfold (final result) | 133 (72%) | 472 (37%) | 213 (38%) | 200 (37%) |

Values show: mean sequences found (mean sequence identity)

Note

The ColabFold search type demonstrates higher sequence identity percentages in Uniref30_2302 due to its cascaded search approach, which builds iterative profiles to find more sensitive matches. The “colabfold (final result)” entry represents the combined cascaded search results across all databases.

Performance#

MSA Search NIM’s performance primarily depends on:

  • Sequence length: The length of the input amino acid sequence

  • Search type: AlphaFold2 (iterative) or ColabFold (cascaded) search

  • Number of databases: How many databases are searched

Measurements are taken separately for each sequence length bin, and throughput is reported as sequences per second (seq/s) for each length range.
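As a sketch of how such a metric can be derived, the example below bins per-query wall-clock timings by sequence length and reports seq/s for each bin. The bin edges match the tables below; the timing data is assumed to come from an external client-side harness and is not produced by the NIM itself.

# Minimal sketch: derive sequences per second (seq/s) per length bin from
# externally collected (sequence_length, elapsed_seconds) measurements.
from collections import defaultdict

LENGTH_BINS = [(0, 200), (200, 400), (400, 600), (600, 800)]  # amino acids

def throughput_by_bin(timings):
    """timings: iterable of (sequence_length, elapsed_seconds) pairs."""
    totals = defaultdict(lambda: [0, 0.0])  # bin -> [query count, total seconds]
    for length, seconds in timings:
        for low, high in LENGTH_BINS:
            if low <= length < high:
                totals[(low, high)][0] += 1
                totals[(low, high)][1] += seconds
                break
    return {
        f"{low}-{high}": count / total_seconds
        for (low, high), (count, total_seconds) in sorted(totals.items())
        if total_seconds > 0
    }

# Example with made-up per-query timings (seconds):
print(throughput_by_bin([(150, 0.6), (180, 0.5), (350, 1.4), (650, 3.1)]))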

Performance Metrics#

The following tables show performance results for both search types across different GPU configurations. All benchmarks were conducted using the default ColabFold databases (Uniref30_2302, colabfold_envdb_202108, PDB70_220313) with GPU Server enabled. Column headers indicate input sequence length ranges (in amino acids).

AlphaFold2 Search Type (Iterative)#

Sequences per second by GPU and sequence length:

| GPU  | 0-200 | 200-400 | 400-600 | 600-800 |
|------|-------|---------|---------|---------|
| L40S | 1.83  | 0.98    | 0.67    | 0.47    |
| H100 | 1.19  | 0.63    | 0.42    | 0.30    |
| B200 | 1.43  | 0.73    | 0.48    | 0.33    |
| A100 | 0.73  | 0.36    | 0.24    | 0.17    |

ColabFold Search Type (Cascaded)#

Sequences per second by GPU and sequence length:

| GPU  | 0-200 | 200-400 | 400-600 | 600-800 |
|------|-------|---------|---------|---------|
| L40S | 0.55  | 0.29    | 0.21    | 0.15    |
| H100 | 0.35  | 0.19    | 0.13    | 0.09    |
| B200 | 0.45  | 0.23    | 0.15    | 0.11    |
| A100 | 0.23  | 0.11    | 0.08    | 0.05    |

Note

The AlphaFold2 search type is faster than ColabFold because it performs a single search pass per database, whereas ColabFold’s more sensitive cascaded approach requires multiple search iterations.

Note

All benchmarks were conducted with GPU Server enabled (default in version 2.0.0) and NIM_GLOBAL_MAX_MSA_DEPTH set to 500 sequences.

Sample Benchmarking Scripts#

The MSA Search NIM includes benchmarking capabilities that can measure both accuracy and performance.

The benchmarking script is packaged in the NIM’s docker image. To view the benchmark script, run the following command:

docker run --entrypoint cat nvcr.io/nim/colabfold/msa-search:2 /opt/nim/benchmark.py

To execute the benchmark:

  1. Ensure the NIM is running as described in the Getting Started Guide.

  2. Execute the benchmark by running the following command:

docker run -it --net host --entrypoint "" \
    nvcr.io/nim/colabfold/msa-search:2 \
    /opt/nim/benchmark.py --benchmark-type both
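For a rough external check of throughput outside the packaged script, a client-side timer like the sketch below can be used. The endpoint path and request fields shown here are assumptions for illustration only; consult the NIM’s API reference for the exact request schema.

# Minimal client-side timing sketch. ASSUMPTIONS: the endpoint path and the
# request fields below are illustrative placeholders, not the documented
# schema; see the NIM API reference for the actual request format.
import time
import requests

NIM_URL = "http://localhost:8000/biology/colabfold/msa-search/predict"  # assumed path

def time_query(sequence):
    payload = {"sequence": sequence}  # assumed minimal request body
    start = time.perf_counter()
    response = requests.post(NIM_URL, json=payload, timeout=600)
    response.raise_for_status()
    return time.perf_counter() - start

# Arbitrary example query sequence (not from the benchmark data set).
example = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKR"
elapsed = time_query(example)
print(f"{len(example)} aa query finished in {elapsed:.2f} s ({1.0 / elapsed:.2f} seq/s)")

Per-query timings gathered this way can be combined with the length-binning sketch from the Performance section to approximate per-bin seq/s figures.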