Benchmarking#

Accuracy#

MSA Search NIM performs multiple sequence alignment by searching protein sequence databases for sequences similar to a query and aligning them to identify regions of similarity. The accuracy of the NIM is measured by comparing the search results against expected alignments and evaluating the sequence identity of the returned matches.

The benchmarking process evaluates the NIM’s ability to find and align relevant sequences across different databases (Uniref30_2302, colabfold_envdb_202108, and PDB70_220313) using both search types:

AlphaFold2 search (iterative): Performs single-pass searches per database
ColabFold search (cascaded): Implements cascaded search of generated profiles for higher sensitivity

The NIM’s accuracy is assessed by the mean number of sequences found in each database and the mean sequence identity of the alignments. The accuracy metrics shown in the tables below demonstrate the NIM’s ability to identify homologous sequences across different sequence length ranges.
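To make the two metrics concrete, the following minimal sketch computes the number of hits and their mean sequence identity for one query. It is an illustration only, not the code in the NIM's benchmark script: the function names are hypothetical, and identity is assumed here to be the fraction of matching residues over columns where neither sequence has a gap (other definitions use a different denominator).

```python
def sequence_identity(query: str, hit: str) -> float:
    """Fraction of gap-free aligned columns where the hit matches the query.

    Assumption: both strings are rows of the same alignment (equal length,
    '-' marks a gap). The denominator choice is illustrative.
    """
    if len(query) != len(hit):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    aligned = [(q, h) for q, h in zip(query, hit) if q != "-" and h != "-"]
    if not aligned:
        return 0.0
    matches = sum(q == h for q, h in aligned)
    return matches / len(aligned)


def summarize_hits(query: str, hits: list[str]) -> tuple[int, float]:
    """Return (number of hits, mean sequence identity) for one query."""
    identities = [sequence_identity(query, h) for h in hits]
    mean_identity = sum(identities) / len(identities) if identities else 0.0
    return len(hits), mean_identity
```

Averaging these per-query values over all queries in a length range would give figures of the kind reported in the tables, under the assumptions stated above.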

Accuracy Metrics#

The following tables show the number of sequences found and their mean sequence identity across different databases and sequence length ranges. Column headers indicate input sequence length ranges (in amino acids).

AlphaFold2 Search - Database Coverage#

Mean sequences found and sequence identity by database:

Database	0-200	200-400	400-600	600-800
PDB70_220313	6 (8%)	46 (34%)	129 (24%)	36 (5%)
Uniref30_2302	56 (50%)	295 (27%)	342 (20%)	304 (16%)
colabfold_envdb_202108	98 (41%)	370 (28%)	402 (26%)	385 (21%)

Values show: mean sequences found (mean sequence identity)

ColabFold Search - Database Coverage#

Mean sequences found and sequence identity by database:

Database	0-200	200-400	400-600	600-800
PDB70_220313	6 (7%)	64 (31%)	171 (21%)	77 (3%)
Uniref30_2302	75 (79%)	185 (48%)	101 (51%)	100 (46%)
colabfold_envdb_202108	76 (36%)	174 (26%)	111 (24%)	99 (26%)
colabfold (final result)	133 (72%)	472 (37%)	213 (38%)	200 (37%)

Values show: mean sequences found (mean sequence identity)

Note

The ColabFold search type shows higher mean sequence identity in Uniref30_2302 (for example, 79% versus 50% in the 0-200 range) due to its cascaded search approach, which builds profiles iteratively so that later search stages are more sensitive. The “colabfold (final result)” entry represents the combined cascaded search results across all databases.

Paired MSA Search - Accuracy by Complex#

Paired MSA search accuracy is evaluated on 2-chain protein complexes. The table below shows mean sequence identity per chain and pairing quality metrics:

Complex	Chain A Identity	Chain B Identity	Identity Correlation
1A00	55%	55%	0.67
2A24	71%	63%	0.72
5G5G	38%	34%	0.82
6W2X	54%	52%	0.49
7AXZ	36%	40%	0.83
8CUB	38%	37%	0.88

Chain Identity: Mean sequence identity of alignments found in Uniref30_2302 for each chain (with NIM_GLOBAL_MAX_MSA_DEPTH set to 500).
Identity Correlation: Pearson correlation between chain A and chain B sequence identities across paired rows. It is assumed that, for co-evolving protein complexes, sequences from the same species diverge at correlated rates across chains. High correlation values are thought to reflect an underlying co-evolutionary signal and thus provide evidence for proper pairing. The benchmarks include a negative control (“not_real” complex) with unrelated sequences that shows negative identity correlation.
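As an illustration of the Identity Correlation metric, the sketch below computes a Pearson correlation between per-row identities of chain A and chain B. The function name and the sample values are hypothetical; the benchmark script's own implementation may differ.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-row identities for the two chains of one paired MSA.
chain_a = [0.90, 0.75, 0.60, 0.45, 0.30]
chain_b = [0.85, 0.70, 0.65, 0.40, 0.35]
print(f"identity correlation: {pearson(chain_a, chain_b):.2f}")
```

Each index in the two lists corresponds to one paired row, so the two chains' identities are compared for the same species.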

Performance#

MSA Search NIM’s performance primarily depends on:

Sequence length: The length of the input amino acid sequence
Search type: AlphaFold2 (iterative) or ColabFold (cascaded) search
Number of databases: The quantity of databases searched

Performance is measured separately for each sequence length bin and reported as sequences per second (seq/s).
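A minimal sketch of how a seq/s figure per length bin could be derived from timed requests is shown below. It assumes requests are issued one at a time and that throughput is the number of sequences divided by total elapsed time in the bin; the bin boundaries (a length of exactly 200 falls in 0-200) and all names are assumptions, not the benchmark script's documented behavior.

```python
from collections import defaultdict

BIN_WIDTH = 200  # matches the 0-200, 200-400, ... table columns


def length_bin(length: int) -> str:
    """Label the bin for a sequence length; upper bound inclusive (assumed)."""
    lower = ((length - 1) // BIN_WIDTH) * BIN_WIDTH if length > 0 else 0
    return f"{lower}-{lower + BIN_WIDTH}"


def throughput_by_bin(runs: list[tuple[int, float]]) -> dict[str, float]:
    """Compute sequences per second for each length bin.

    runs: (sequence_length, elapsed_seconds) for each completed request.
    """
    counts: dict[str, int] = defaultdict(int)
    seconds: dict[str, float] = defaultdict(float)
    for length, elapsed in runs:
        label = length_bin(length)
        counts[label] += 1
        seconds[label] += elapsed
    # Skip bins with zero recorded time to avoid division by zero.
    return {b: counts[b] / seconds[b] for b in counts if seconds[b] > 0}
```

For example, two sequences in the 0-200 bin that each take 0.5 s yield 2.0 seq/s for that bin.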

Performance Metrics#

The following tables show performance results for both search types across different GPU configurations. All benchmarks were conducted using the default ColabFold databases (Uniref30_2302, colabfold_envdb_202108, PDB70_220313) with GPU Server enabled. Column headers indicate input sequence length ranges (in amino acids).

AlphaFold2 Search Type (Iterative)#

Sequences per second by GPU and sequence length:

GPU	0-200	200-400	400-600	600-800
L40S	1.83	0.98	0.67	0.47
H100	1.19	0.63	0.42	0.30
B200	1.43	0.73	0.48	0.33
A100	0.73	0.36	0.24	0.17

ColabFold Search Type (Cascaded)#

Sequences per second by GPU and sequence length:

GPU	0-200	200-400	400-600	600-800
L40S	0.55	0.29	0.21	0.15
H100	0.35	0.19	0.13	0.09
B200	0.45	0.23	0.15	0.11
A100	0.23	0.11	0.08	0.05

Note

The AlphaFold2 search type is faster than ColabFold because it runs a single search pass per database, whereas ColabFold uses a more sensitive cascaded search approach that requires multiple search iterations.
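The size of this gap can be read off the two tables above. The short sketch below divides the AlphaFold2 throughput by the ColabFold throughput for each GPU and length range; the numbers are copied from the tables, and the variable and function names are illustrative.

```python
GPUS = ["L40S", "H100", "B200", "A100"]
BINS = ["0-200", "200-400", "400-600", "600-800"]

# Sequences per second, copied from the tables above.
ALPHAFOLD2 = {
    "L40S": [1.83, 0.98, 0.67, 0.47],
    "H100": [1.19, 0.63, 0.42, 0.30],
    "B200": [1.43, 0.73, 0.48, 0.33],
    "A100": [0.73, 0.36, 0.24, 0.17],
}
COLABFOLD = {
    "L40S": [0.55, 0.29, 0.21, 0.15],
    "H100": [0.35, 0.19, 0.13, 0.09],
    "B200": [0.45, 0.23, 0.15, 0.11],
    "A100": [0.23, 0.11, 0.08, 0.05],
}


def speedup(fast: dict[str, list[float]], slow: dict[str, list[float]]) -> dict[str, list[float]]:
    """Per-GPU, per-bin throughput ratio fast/slow."""
    return {gpu: [f / s for f, s in zip(fast[gpu], slow[gpu])] for gpu in fast}


ratios = speedup(ALPHAFOLD2, COLABFOLD)
print("GPU  ", "  ".join(BINS))
for gpu in GPUS:
    print(gpu, "  ".join(f"{r:.2f}x" for r in ratios[gpu]))
```

On these published figures the ratio falls between roughly 3.0x and 3.4x for every GPU and length range.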

Note

All benchmarks were conducted with GPU Server enabled (default in version 2.0.0) and NIM_GLOBAL_MAX_MSA_DEPTH set to 500 sequences.

Paired MSA Search (ColabFold Paired)#

Paired MSA search for protein complexes uses the ColabFold cascaded search approach with additional species-based pairing. Performance depends on:

Number of chains: More chains require more database searches and pairing operations
Sequence lengths: Longer sequences take more time to search
Pairing strategy: “greedy” and “complete” strategies have similar performance for 2-chain complexes

Sequences per second by GPU and sequence length (2-chain complexes):

GPU	0-200	200-400	400-600	600-800
L40S	0.99	0.14	0.12	0.05
H100	0.78	0.11	0.08	0.04
B200	1.05	0.15	0.10	0.05
A100	0.56	0.10	0.06	0.04

Note

Paired MSA search is slower than monomer searches due to the additional overhead of searching for each chain independently and performing species-based pairing to create paired alignments.
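The idea of species-based pairing can be illustrated with a small sketch. This is not the NIM's or ColabFold's implementation: it assumes each chain's hits have already been reduced to one aligned sequence per species identifier, and it keeps only species found for every chain. All names and the sample data are hypothetical.

```python
def pair_by_species(chain_hits: list[dict[str, str]]) -> list[tuple[str, list[str]]]:
    """Build paired rows from per-chain hits keyed by species identifier.

    chain_hits: one dict per chain, mapping species id -> aligned sequence
    (assumption: at most one hit per species has been kept for each chain).
    Returns (species id, [sequence for chain 0, chain 1, ...]) tuples for
    species that have a hit in every chain.
    """
    if not chain_hits:
        return []
    shared = set(chain_hits[0])
    for hits in chain_hits[1:]:
        shared &= set(hits)
    # Sort for a deterministic row order.
    return [(species, [hits[species] for hits in chain_hits]) for species in sorted(shared)]


# Hypothetical hits for a 2-chain complex, keyed by taxonomy id.
chain_a_hits = {"9606": "MKTAYIAK", "10090": "MKTAYLAK", "562": "MRTAYIGK"}
chain_b_hits = {"9606": "GSHMLEDP", "562": "GSHMIEDP", "7227": "GAHMLEDP"}
for species, row in pair_by_species([chain_a_hits, chain_b_hits]):
    print(species, row)
```

Each chain still needs its own database search before this step, which is the additional overhead the note above refers to.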

Sample Benchmarking Scripts#

The MSA Search NIM includes a benchmarking script that measures both accuracy and performance.

The script is packaged in the NIM’s Docker image. To view its contents, run the following command:

docker run --entrypoint cat nvcr.io/nim/colabfold/msa-search:2 /opt/nim/benchmark.py

To execute the benchmark:

1. Ensure the NIM is running as described in the Getting Started Guide.
2. Execute the benchmark by running the following command:

docker run -it --net host --entrypoint "" \
    nvcr.io/nim/colabfold/msa-search:2 \
    /opt/nim/benchmark.py --benchmark-type both