Indexing Databases for GPU Server
The MSA Search NIM includes GPU-accelerated search through the GPU Server, which requires databases to be indexed with specific parameters. This guide explains how to index custom databases using the MMseqs2 tools included in the NIM container.
When to Index
Databases downloaded from NGC are pre-indexed and ready for use with the GPU Server; they need no additional indexing.
Indexing is required only when:
- You have created a custom MMseqs2 database from FASTA sequences.
- You are using third-party databases that were not downloaded from NGC.
Using MMseqs2 from the NIM Container
The NIM container includes MMseqs2 with GPU support. Define a helper function to simplify running MMseqs2 commands:
```shell
mmseqs_run() {
  docker run --rm --runtime=nvidia --gpus all --entrypoint mmseqs \
    -v "$PWD":/work -w /work nvcr.io/nim/colabfold/msa-search:2 "$@"
}
```
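Any MMseqs2 command can be run through this helper. Because it mounts the current directory at /work and uses it as the working directory, databases and temporary directories must be located under the current directory and referenced by relative paths.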
For example, to see all available commands:
```shell
mmseqs_run --help
```
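If you need a different container tag, or want to inspect the exact docker command before it runs, the helper can read both from environment variables. This is a sketch only: MSA_SEARCH_IMAGE and MMSEQS_DRY_RUN are hypothetical variable names introduced here, not settings that the NIM itself reads.

```shell
# Variant of mmseqs_run with two hypothetical environment variables:
#   MSA_SEARCH_IMAGE  container image to use (defaults to the tag used in this guide)
#   MMSEQS_DRY_RUN=1  print the docker command instead of running it
mmseqs_run() {
  local image="${MSA_SEARCH_IMAGE:-nvcr.io/nim/colabfold/msa-search:2}"
  if [ "${MMSEQS_DRY_RUN:-0}" = "1" ]; then
    # Arguments are joined with spaces and not re-quoted, so the output is for reading only
    printf '%s ' docker run --rm --runtime=nvidia --gpus all --entrypoint mmseqs \
      -v "$PWD":/work -w /work "$image" "$@"
    printf '\n'
    return 0
  fi
  docker run --rm --runtime=nvidia --gpus all --entrypoint mmseqs \
    -v "$PWD":/work -w /work "$image" "$@"
}
```

For example, MMSEQS_DRY_RUN=1 mmseqs_run createindex pdb70_220313 tmp prints the full docker command without starting a container.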
Example: Re-indexing a Downloaded Database
This example demonstrates the indexing process by downloading the PDB70 database, removing its index files, and re-creating them.
1. Download the database
```shell
DATABASE=pdb70_220313-m18v1
ngc registry model download-version "nim/colabfold/msa-search:$DATABASE"
cd "msa-search_v$DATABASE/pdb70_220313"
```
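When adapting this step to another database version, both directory names can be derived from DATABASE with POSIX parameter expansion. This sketch assumes, based only on this PDB70 example, that the inner directory is the version string with its last hyphen-separated suffix removed (pdb70_220313-m18v1 becomes pdb70_220313); check the actual download before relying on it. db_dir is a hypothetical helper name.

```shell
# Hypothetical helper: print the directory implied by this guide's layout for an
# NGC version string. Assumes msa-search_v<VERSION>/<VERSION minus its last "-suffix">.
db_dir() {
  local version="$1"
  # ${version%-*} strips the shortest trailing "-..." part: pdb70_220313-m18v1 -> pdb70_220313
  printf 'msa-search_v%s/%s\n' "$version" "${version%-*}"
}
```

Usage: cd "$(db_dir "$DATABASE")" replaces the hard-coded cd above, provided the assumed layout holds.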
2. Examine the database files
```shell
ls -la
```
The directory structure is:
```text
msa-search_vpdb70_220313-m18v1/
└── pdb70_220313/
    ├── pdb70_220313
    ├── pdb70_220313.dbtype
    ├── pdb70_220313.idx
    ├── pdb70_220313.idx.dbtype
    ├── pdb70_220313.idx.index
    ├── pdb70_220313.index
    ├── pdb70_220313.lookup
    ├── pdb70_220313_h
    ├── pdb70_220313_h.dbtype
    └── pdb70_220313_h.index
```
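The files fall into two groups:
- Core database files: pdb70_220313, pdb70_220313.dbtype, pdb70_220313_h, and the other files whose names do not start with pdb70_220313.idx.
- Index files: pdb70_220313.idx, pdb70_220313.idx.index, and pdb70_220313.idx.dbtype.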
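To tell the two groups apart in any database directory, a short loop can sort the files by name. This is a sketch; classify_db_files is a hypothetical helper, and it assumes that every file belonging to the database is named after the database, optionally followed by a dot or an underscore and a suffix, as in the listing above.

```shell
# Hypothetical helper: label each file of a database as "core" or "index".
# Index files are those whose names start with "<db>.idx".
classify_db_files() {
  local db="$1" f
  for f in "$db" "$db".* "$db"_*; do
    [ -e "$f" ] || continue   # skip patterns that matched nothing
    case "$f" in
      "$db".idx*) printf 'index %s\n' "$f" ;;
      *)          printf 'core  %s\n' "$f" ;;
    esac
  done
}
```

Running classify_db_files pdb70_220313 in the directory above prints one labeled line per file.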
3. Remove existing index files
```shell
rm -f pdb70_220313.idx*
```
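The glob pdb70_220313.idx* matches only the index files; pdb70_220313.index is a core database file and is left in place. To see exactly what is deleted, a small wrapper can print each match before removing it. remove_index is a hypothetical helper name used only for this sketch.

```shell
# Hypothetical helper: print, then delete, each index file of a database.
remove_index() {
  local db="$1" f found=0
  for f in "$db".idx*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    printf 'removing %s\n' "$f"
    rm -f -- "$f"
    found=1
  done
  [ "$found" -eq 1 ] || printf 'no index files found for %s\n' "$db"
}
```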
4. Re-create the index for GPU Server
```shell
# tmp is a scratch directory for createindex; remove it once indexing finishes
mmseqs_run createindex pdb70_220313 tmp --remove-tmp-files 1 --split 1 --index-subset 2
rm -rf tmp
```
The key parameters for GPU Server indexing are:
- --split 1: build the index as a single split for GPU memory.
- --index-subset 2: create the GPU-compatible index subset.
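Step 4 can be wrapped in one function that creates a uniquely named scratch directory, runs createindex with the GPU Server parameters listed above, and removes the directory even if createindex fails. This is a minimal sketch: reindex_for_gpu and the MMSEQS override variable are hypothetical names, and MMSEQS defaults to the mmseqs_run helper defined earlier. The scratch directory is created under the current directory because that is the only host path the helper mounts into the container.

```shell
# Hypothetical wrapper around step 4. MMSEQS names the command to call and
# defaults to the mmseqs_run helper defined earlier in this guide.
reindex_for_gpu() {
  local db="$1" tmpdir status
  # Create the scratch directory under $PWD, the only host path mounted into the container
  tmpdir=$(mktemp -d ./tmp.XXXXXX) || return 1
  "${MMSEQS:-mmseqs_run}" createindex "$db" "$tmpdir" \
    --remove-tmp-files 1 --split 1 --index-subset 2
  status=$?
  # Clean up whether or not createindex succeeded, then report its exit status
  rm -rf -- "$tmpdir"
  return "$status"
}
```

Usage: run reindex_for_gpu pdb70_220313 from the database directory after removing the old index files.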
5. Verify the index was created
```shell
ls -la pdb70_220313.idx*
```
You should see newly created pdb70_220313.idx, pdb70_220313.idx.index, and pdb70_220313.idx.dbtype files.
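For scripted setups, the same check can fail loudly instead of relying on reading ls output. This is a sketch; check_index is a hypothetical helper, and it only confirms that the three index files from the listing above exist and are non-empty. It cannot tell which createindex options produced them.

```shell
# Hypothetical helper: return non-zero if any expected index file is missing or empty.
check_index() {
  local db="$1" f missing=0
  for f in "$db.idx" "$db.idx.index" "$db.idx.dbtype"; do
    if [ ! -s "$f" ]; then
      printf 'missing or empty: %s\n' "$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_index pdb70_220313 && echo "index present".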
MMseqs2 Command Reference
For comprehensive MMseqs2 documentation, run:
```shell
mmseqs_run --help
```
Or for help on a specific command:
```shell
mmseqs_run createindex --help
```
For additional information, refer to the MMseqs2 User Guide.