Benchmarking#

Accuracy#

The MSA Search NIM performs multiple sequence alignment by searching protein sequence databases for sequences similar to a query and aligning them to identify regions of similarity. The accuracy of the NIM is measured by comparing the search results against expected alignments and evaluating the sequence identity of the returned matches.

The benchmarking process evaluates the NIM’s ability to find and align relevant sequences across different databases (Uniref30_2302, colabfold_envdb_202108, and PDB70_220313) using both search types:

  • AlphaFold2 search (iterative): Performs a single-pass search against each database

  • ColabFold search (cascaded): Performs a cascaded search that reuses iteratively generated sequence profiles for higher sensitivity

Accuracy is assessed by the number of sequences found in each database and the mean sequence identity of the resulting alignments. The metrics in the tables below demonstrate the NIM’s ability to identify homologous sequences across different sequence length ranges.
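For reference, mean sequence identity can be computed directly from an alignment. The sketch below is a minimal illustration, assuming the search results are available as an A3M file on disk; the file name and parsing details are illustrative and are not taken from the NIM’s packaged benchmark.

# Minimal sketch (assumption: results saved as an A3M file): compute the
# mean sequence identity of an MSA against its query (first sequence).

def read_a3m(path):
    """Yield (header, sequence) pairs from an A3M file."""
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line[1:], []
            elif line:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)

def sequence_identity(query, hit):
    """Fraction of query positions where the hit residue matches the query.

    In A3M, lowercase letters are insertions relative to the query, so they
    are dropped to keep the two sequences aligned column-for-column.
    """
    hit_columns = [c for c in hit if not c.islower()]
    matches = sum(q == h for q, h in zip(query, hit_columns) if h != "-")
    return matches / len(query)

def mean_identity(path):
    records = list(read_a3m(path))
    query = records[0][1]
    identities = [sequence_identity(query, seq) for _, seq in records[1:]]
    return sum(identities) / len(identities) if identities else 0.0

print(f"mean sequence identity: {mean_identity('result.a3m'):.1%}")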

Accuracy Metrics#

The following tables show the number of sequences found and their mean sequence identity across different databases and sequence length ranges. Column headers indicate input sequence length ranges (in amino acids).

AlphaFold2 Search - Database Coverage#

Mean sequences found and sequence identity by database:

| Database               | 0-200    | 200-400   | 400-600   | 600-800   |
|------------------------|----------|-----------|-----------|-----------|
| PDB70_220313           | 6 (8%)   | 46 (34%)  | 129 (24%) | 36 (5%)   |
| Uniref30_2302          | 56 (50%) | 295 (27%) | 342 (20%) | 304 (16%) |
| colabfold_envdb_202108 | 98 (41%) | 370 (28%) | 402 (26%) | 385 (21%) |

Values show: mean sequences found (mean sequence identity)

ColabFold Search - Database Coverage#

Mean sequences found and sequence identity by database:

| Database                 | 0-200     | 200-400   | 400-600   | 600-800   |
|--------------------------|-----------|-----------|-----------|-----------|
| PDB70_220313             | 6 (7%)    | 64 (31%)  | 171 (21%) | 77 (3%)   |
| Uniref30_2302            | 75 (79%)  | 185 (48%) | 101 (51%) | 100 (46%) |
| colabfold_envdb_202108   | 76 (36%)  | 174 (26%) | 111 (24%) | 99 (26%)  |
| colabfold (final result) | 133 (72%) | 472 (37%) | 213 (38%) | 200 (37%) |

Values show: mean sequences found (mean sequence identity)

Note

The ColabFold search type demonstrates higher sequence identity percentages in Uniref30_2302 due to its cascaded search approach, which builds iterative profiles to find more sensitive matches. The “colabfold (final result)” entry represents the combined cascaded search results across all databases.

Performance#

MSA Search NIM’s performance primarily depends on:

  • Sequence length: The length of the input amino acid sequence

  • Search type: AlphaFold2 (iterative) or ColabFold (cascaded) search

  • Number of databases: How many databases are searched

Measurements are taken separately for each sequence length bin, and throughput is reported as sequences per second (seq/s) for each length range.
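As a sketch of how such a metric can be derived, the example below bins per-query wall-clock timings by sequence length and reports seq/s for each bin. The bin edges match the tables below; the timing data is assumed to come from an external client-side harness and is not produced by the NIM itself.

# Minimal sketch: derive sequences per second (seq/s) per length bin from
# externally collected (sequence_length, elapsed_seconds) measurements.
from collections import defaultdict

LENGTH_BINS = [(0, 200), (200, 400), (400, 600), (600, 800)]  # amino acids

def throughput_by_bin(timings):
    """timings: iterable of (sequence_length, elapsed_seconds) pairs."""
    totals = defaultdict(lambda: [0, 0.0])  # bin -> [query count, total seconds]
    for length, seconds in timings:
        for low, high in LENGTH_BINS:
            if low <= length < high:
                totals[(low, high)][0] += 1
                totals[(low, high)][1] += seconds
                break
    return {
        f"{low}-{high}": count / total_seconds
        for (low, high), (count, total_seconds) in sorted(totals.items())
        if total_seconds > 0
    }

# Example with made-up per-query timings (seconds):
print(throughput_by_bin([(150, 0.6), (180, 0.5), (350, 1.4), (650, 3.1)]))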

Performance Metrics#

The following tables show performance results for both search types across different GPU configurations. All benchmarks were conducted using the default ColabFold databases (Uniref30_2302, colabfold_envdb_202108, PDB70_220313) with GPU Server enabled. Column headers indicate input sequence length ranges (in amino acids).

AlphaFold2 Search Type (Iterative)#

Sequences per second by GPU and sequence length:

| GPU  | 0-200 | 200-400 | 400-600 | 600-800 |
|------|-------|---------|---------|---------|
| L40S | 1.83  | 0.98    | 0.67    | 0.47    |
| H100 | 1.19  | 0.63    | 0.42    | 0.30    |
| B200 | 1.43  | 0.73    | 0.48    | 0.33    |
| A100 | 0.73  | 0.36    | 0.24    | 0.17    |

ColabFold Search Type (Cascaded)#

Sequences per second by GPU and sequence length:

| GPU  | 0-200 | 200-400 | 400-600 | 600-800 |
|------|-------|---------|---------|---------|
| L40S | 0.55  | 0.29    | 0.21    | 0.15    |
| H100 | 0.35  | 0.19    | 0.13    | 0.09    |
| B200 | 0.45  | 0.23    | 0.15    | 0.11    |
| A100 | 0.23  | 0.11    | 0.08    | 0.05    |

Note

The AlphaFold2 search type is faster than ColabFold because it performs a single search pass per database, whereas ColabFold’s more sensitive cascaded approach requires multiple search iterations.

Note

All benchmarks were conducted with GPU Server enabled (default in version 2.0.0) and NIM_GLOBAL_MAX_MSA_DEPTH set to 500 sequences.

Sample Benchmarking Scripts#

The MSA Search NIM includes benchmarking capabilities that can measure both accuracy and performance.

The benchmarking script is packaged in the NIM’s docker image. To view the benchmark script, run the following command:

docker run --entrypoint cat nvcr.io/nim/colabfold/msa-search:2 /opt/nim/benchmark.py

To execute the benchmark:

  1. Ensure the NIM is running as described in the Getting Started Guide.

  2. Execute the benchmark by running the following command:

docker run -it --net host --entrypoint "" \
    nvcr.io/nim/colabfold/msa-search:2 \
    /opt/nim/benchmark.py --benchmark-type both
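For a rough external check of throughput outside the packaged script, a client-side timer like the sketch below can be used. The endpoint path and request fields shown here are assumptions for illustration only; consult the NIM’s API reference for the exact request schema.

# Minimal client-side timing sketch. ASSUMPTIONS: the endpoint path and the
# request fields below are illustrative placeholders, not the documented
# schema; see the NIM API reference for the actual request format.
import time
import requests

NIM_URL = "http://localhost:8000/biology/colabfold/msa-search/predict"  # assumed path

def time_query(sequence):
    payload = {"sequence": sequence}  # assumed minimal request body
    start = time.perf_counter()
    response = requests.post(NIM_URL, json=payload, timeout=600)
    response.raise_for_status()
    return time.perf_counter() - start

# Arbitrary example query sequence (not from the benchmark data set).
example = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKR"
elapsed = time_query(example)
print(f"{len(example)} aa query finished in {elapsed:.2f} s ({1.0 / elapsed:.2f} seq/s)")

Per-query timings gathered this way can be combined with the length-binning sketch from the Performance section to approximate per-bin seq/s figures.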