API Reference#
This documentation contains the API reference for the MSA Search NIM.
OpenAPI Specification#
You can download or view the OpenAPI specification when the NIM is running:
curl http://localhost:8000/openapi.json
You can also navigate to the interactive API documentation at http://localhost:8000/docs in your browser.
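To list the available operations programmatically, you can parse the same document. The sketch below assumes the NIM listens on localhost:8000, as in the curl example. It uses only the Python standard library. The function names are illustrative and are not part of the NIM.

```python
import json
import urllib.request
from typing import List, Tuple

# Assumption: the NIM listens on localhost:8000, as in the curl example above.
BASE_URL = "http://localhost:8000"

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def fetch_openapi_spec(base_url: str = BASE_URL) -> dict:
    """Download and parse openapi.json from a running NIM."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=30) as resp:
        return json.load(resp)


def list_operations(spec: dict) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared in an OpenAPI document."""
    operations = []
    for path, path_item in spec.get("paths", {}).items():
        for key in path_item:
            # Path items may also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


# Usage against a running NIM:
#   for method, path in list_operations(fetch_openapi_spec()):
#       print(method, path)
```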
Multiple Sequence Alignment Search#
Endpoint path: /biology/colabfold/msa-search/predict
Request type: POST
When GPU Server is enabled (the default in version 2.0.0 and later), each request’s max_msa_sequences must exactly equal the container’s NIM_GLOBAL_MAX_MSA_DEPTH (default 500). If they differ, the request fails. When GPU Server is disabled (NIM_DISABLE_GPU_SERVER=True), you may use any max_msa_sequences value in the documented range.
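A client can catch this mismatch before sending a request. The following is a minimal sketch that mirrors the rule above. It assumes you know the container's NIM_GLOBAL_MAX_MSA_DEPTH and whether GPU Server is disabled. The function name is hypothetical, and the server remains the final authority.

```python
def check_max_msa_sequences(
    requested: int,
    global_max_depth: int = 500,
    gpu_server_enabled: bool = True,
) -> int:
    """Mirror the documented max_msa_sequences rule on the client side.

    global_max_depth should match the container's NIM_GLOBAL_MAX_MSA_DEPTH
    (default 500); gpu_server_enabled should be False only when the container
    runs with NIM_DISABLE_GPU_SERVER=True.
    """
    if gpu_server_enabled:
        # GPU Server: the value must match the container setting exactly.
        if requested != global_max_depth:
            raise ValueError(
                f"max_msa_sequences must equal NIM_GLOBAL_MAX_MSA_DEPTH "
                f"({global_max_depth}) when GPU Server is enabled; got {requested}"
            )
    elif not 1 <= requested <= global_max_depth:
        # GPU Server disabled: any value in the documented range is accepted.
        raise ValueError(
            f"max_msa_sequences must be between 1 and {global_max_depth}; got {requested}"
        )
    return requested
```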
Input Parameters#
sequence (string, required): A sequence to search against the MSA databases. Must be a valid protein sequence composed of the 20 standard amino acids plus X for unknown residues (ARNDCQEGHILKMFPSTWYVX). Length: 1-4096 characters.
Example: "SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR"
databases (list[string], optional): Database names to search against. All databases are searched by default. Database names are case-insensitive; the response preserves the case you specify. Default: ["all"].
Important: For the ColabFold search type, the first database in the list is used for profile generation. When you use ["all"], uniref30 is automatically placed first.
Examples: ["all"], ["uniref30_2302"], ["uniref30_2302", "pdb70_220313"]
search_type (string, optional): The type of MSA search to run to produce the alignment. Default: "colabfold".
Options:
"colabfold": Cascaded search with higher sensitivity. The first database is used for profile generation.
"alphafold2": Single-pass iterative search.
Examples: "colabfold", "alphafold2"
e_value (float, optional): The e-value threshold for filtering hits when building the Multiple Sequence Alignment. Sequences with an e-value greater than this threshold are not included in the MSA. Range: 0.0-1.0. Default: 0.0001.
iterations (int, optional): The number of MSA iterations to perform; more iterations can find more distant homologs. Default: 1. Note: For cascaded search (search_type="colabfold"), the number of iterations is fixed at 3 and this parameter is ignored.
max_msa_sequences (int, optional): Maximum number of sequences per individual database in the response (N). Each database’s result is trimmed to at most N sequences. The merged colabfold entry is not trimmed. It concatenates the untrimmed results from all D databases, so its size can be up to D × U, where U is the untrimmed per-database count and can exceed N. The cascaded pipeline first accepts up to max_accept targets (default 100), then computes up to alt_ali alternative alignments per target (default 10), giving U ≤ max_accept × (1 + alt_ali) = 1100 with defaults. These limits are configurable using NIM_MMSEQS_PROFILE_ALIGN_MAX_ACCEPT/NIM_MMSEQS_FOLLOWUP_ALIGN_MAX_ACCEPT and NIM_MMSEQS_PROFILE_ALIGN_ALT_ALI/NIM_MMSEQS_FOLLOWUP_ALIGN_ALT_ALI. Range: 1 to NIM_GLOBAL_MAX_MSA_DEPTH (default 500); when GPU Server is enabled, you must set this field to exactly NIM_GLOBAL_MAX_MSA_DEPTH. For more information, refer to Multiple Sequence Alignment Search.
output_alignment_formats (list[string], optional): The output formats of the MSA. Supported formats: "a3m", "fasta". Default: ["a3m"].
Examples: ["a3m"], ["a3m", "fasta"]
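The parameters above map directly onto a JSON request body. This sketch validates inputs against the documented limits and posts the body with the standard library. The helper names, the localhost URL, and the 600-second timeout are assumptions chosen for illustration.

```python
import json
import re
import urllib.request
from typing import List, Optional

# Documented alphabet (20 standard amino acids plus X) and length limit (1-4096).
VALID_SEQUENCE = re.compile(r"[ARNDCQEGHILKMFPSTWYVX]{1,4096}")
SUPPORTED_FORMATS = {"a3m", "fasta"}
# Assumption: NIM running locally on port 8000.
MSA_SEARCH_URL = "http://localhost:8000/biology/colabfold/msa-search/predict"


def build_msa_request(
    sequence: str,
    databases: Optional[List[str]] = None,
    search_type: str = "colabfold",
    e_value: float = 0.0001,
    iterations: int = 1,
    max_msa_sequences: int = 500,
    output_alignment_formats: Optional[List[str]] = None,
) -> dict:
    """Validate inputs against the documented limits and return a JSON-ready body."""
    if not VALID_SEQUENCE.fullmatch(sequence):
        raise ValueError("sequence must be 1-4096 characters from ARNDCQEGHILKMFPSTWYVX")
    if search_type not in ("colabfold", "alphafold2"):
        raise ValueError(f"unsupported search_type: {search_type!r}")
    if not 0.0 <= e_value <= 1.0:
        raise ValueError("e_value must be in the range 0.0-1.0")
    formats = output_alignment_formats or ["a3m"]
    if not set(formats) <= SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format in {formats}")
    return {
        "sequence": sequence,
        "databases": databases or ["all"],
        "search_type": search_type,
        "e_value": e_value,
        "iterations": iterations,  # ignored by the server when search_type="colabfold"
        "max_msa_sequences": max_msa_sequences,
        "output_alignment_formats": formats,
    }


def post_json(url: str, payload: dict, timeout: float = 600.0) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Usage against a running NIM:
#   body = build_msa_request("SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR",
#                            databases=["uniref30_2302"])
#   result = post_json(MSA_SEARCH_URL, body)
```

The sequence check is strict: lowercase letters are rejected because the documented alphabet is uppercase only.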
Outputs#
alignments (Dictionary[string → Dictionary[string → AlignmentFileRecord]]): The MSA alignments organized by database and format. For example, alignments["uniref30_2302"]["a3m"] contains the A3M alignment for the uniref30 database. For the colabfold search type, when multiple databases are searched, an additional colabfold key contains the merged alignment. When only a single database is searched, no colabfold key is present. The merged colabfold alignment concatenates results from all databases, so the query sequence appears once per source database. The per-database entries are trimmed to max_msa_sequences (N), but the merged colabfold entry is not. It concatenates untrimmed results from all D databases, so its size can be up to D × U, where U is the untrimmed per-database count and can exceed N. The cascaded pipeline first accepts up to max_accept targets (default 100), then computes up to alt_ali alternative alignments per target (default 10), giving U ≤ max_accept × (1 + alt_ali) = 1100 with defaults. Refer to NIM_MMSEQS_PROFILE_ALIGN_MAX_ACCEPT/NIM_MMSEQS_FOLLOWUP_ALIGN_MAX_ACCEPT and NIM_MMSEQS_PROFILE_ALIGN_ALT_ALI/NIM_MMSEQS_FOLLOWUP_ALIGN_ALT_ALI. The merged result is not globally sorted by e-value: sequences from each database are sorted within their block, but the blocks are concatenated in database order. The merged colabfold key is provided for compatibility and may be removed in a future release; avoid using it if possible.
metrics (dictionary, optional): Contains information about the response that is useful for debugging and measuring performance. May be empty or null.
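The size bound for the merged entry is simple arithmetic. The sketch below only restates the documented formula; the function name is hypothetical. With the defaults and two databases, the merged entry can hold up to 2 × 100 × (1 + 10) = 2200 rows. By contrast, each per-database entry holds at most N.

```python
def merged_colabfold_upper_bound(num_databases: int, max_accept: int = 100, alt_ali: int = 10) -> int:
    """Documented upper bound on rows in the merged colabfold entry.

    Per database: U <= max_accept * (1 + alt_ali); merged: up to D * U.
    Defaults mirror the documented defaults (max_accept=100, alt_ali=10).
    """
    if num_databases < 1:
        raise ValueError("num_databases must be at least 1")
    per_database = max_accept * (1 + alt_ali)  # U
    return num_databases * per_database  # D * U
```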
Example response:
{
"alignments": {
"uniref30_2302": {
"a3m": {
"alignment": ">query\nMVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH\n>UniRef100_UPI0002FEA2E8\nMVLSPADKTNVKAAW...\n",
"format": "a3m"
}
},
"colabfold": {
"a3m": {
"alignment": ">query\nMVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH\n>...\n",
"format": "a3m"
}
}
},
"metrics": {}
}
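The example above is abbreviated and shows only one per-database entry. When a single database is selected, the merged colabfold key is omitted. When multiple databases are searched with the colabfold search type, the colabfold entry appears as described under alignments above.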
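Because the merged colabfold entry may be removed in a future release, clients can read the per-database entries instead. The following sketch works on a decoded response dictionary shaped like the example. Both helper names are hypothetical. The sequence count includes the query record.

```python
from typing import Dict


def per_database_alignments(response: dict, fmt: str = "a3m") -> Dict[str, str]:
    """Return {database name: alignment text}, skipping the merged colabfold entry."""
    result = {}
    for db_name, formats in response.get("alignments", {}).items():
        if db_name == "colabfold":
            continue  # merged, untrimmed, and possibly removed in a future release
        record = formats.get(fmt)
        if record is not None:
            result[db_name] = record["alignment"]
    return result


def count_sequences(alignment: str) -> int:
    """Count records in an A3M or FASTA string by their '>' header lines."""
    return sum(1 for line in alignment.splitlines() if line.startswith(">"))
```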
Paired Multiple Sequence Alignment Search#
Endpoint path: /biology/colabfold/msa-search/paired/predict
Request type: POST
Paired MSA search finds homologous sequences for each chain of a protein complex and pairs them by species, preserving co-evolutionary signals across chains. This is essential for accurate structure prediction of protein complexes.
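Pairing by species can be pictured as building one row per species, with one column per chain. The toy sketch below is a conceptual illustration and not the NIM's MMseqs2 implementation. It assumes each chain already has one best hit per species, and the hit IDs and species labels are made up. It models the greedy and complete strategies described under pairing_strategy below. A further assumption is that greedy drops species found in only one chain, which is consistent with the statement that both strategies agree for two chains.

```python
from typing import Dict, List, Optional

# Toy model: for each chain, the single best hit per species (species -> hit ID).
HitsByChain = Dict[str, Dict[str, str]]


def pair_by_species(
    hits_by_chain: HitsByChain, strategy: str = "greedy"
) -> List[Dict[str, Optional[str]]]:
    """Build paired rows: one row per species, one column per chain (None = gap)."""
    if strategy not in ("greedy", "complete"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    chains = list(hits_by_chain)
    all_species = sorted({sp for hits in hits_by_chain.values() for sp in hits})
    rows = []
    for species in all_species:
        covered = [c for c in chains if species in hits_by_chain[c]]
        if strategy == "complete" and len(covered) < len(chains):
            continue  # complete: every chain must have a hit from this species
        if strategy == "greedy" and len(covered) < 2:
            continue  # assumption: a species seen in only one chain pairs with nothing
        rows.append({c: hits_by_chain[c].get(species) for c in chains})
    return rows
```

For three chains, greedy keeps partially covered species and leaves None gaps, while complete keeps only species that every chain hits.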
Input Parameters#
sequences (list[string] or dict[string, string], required): Protein sequences, one per chain. Must contain at least 2 sequences. Each sequence must be composed of the 20 standard amino acids plus X for unknown residues (ARNDCQEGHILKMFPSTWYVX). You can provide the sequences as a list, in which case chain IDs such as “A” and “B” are assigned automatically, or as a dictionary keyed by chain ID.
Examples:
["VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH", "MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPYTQRFFESFGDLST"]
{"A": "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH", "B": "MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPYTQRFFESFGDLST"}
databases (list[string], optional): Databases to search against. Only databases with taxonomy information can be used for paired search. Database names are case-insensitive; the response preserves the case you specify. Default: ["all"].
Examples: ["all"], ["uniref30_2302"]
e_value (float, optional): The e-value threshold for filtering hits when building the Multiple Sequence Alignment. Sequences with an e-value greater than this threshold are not included in the MSA. Range: 0.0-1.0. Default: 0.0001.
max_msa_sequences (int, optional): Maximum number of sequences per individual database per chain (N). Each database’s result is trimmed to at most N sequences per chain. This limit applies only when unpack=True. When unpack=False, the row count may exceed N. In that case the cascaded search pipeline (expand + align) can produce up to D × U sequences, where D is the number of databases and U ≤ max_accept × (1 + alt_ali) = 1100 with defaults (refer to the monomer search documentation for details). Range: 1 to NIM_GLOBAL_MAX_MSA_DEPTH (default 500); when GPU Server is enabled, you must set this field to exactly NIM_GLOBAL_MAX_MSA_DEPTH. For more information, refer to Multiple Sequence Alignment Search.
pairing_strategy (string, optional): Pairing strategy for cross-chain sequence matching. Default: "greedy".
Options:
"greedy": Maximizes rows by including species with partial chain coverage, pairing all chains that have hits and leaving gaps in the others.
"complete": Only includes species where all chains have hits, producing fewer rows but with full coverage.
For 2-chain searches, both strategies produce identical results; differences arise only for three or more chains.
unpack (boolean, optional): Controls the output format. Default: True.
True: Returns one alignment per chain, keyed by chain ID (for example, “A”, “B”), each strictly limited to max_msa_sequences rows per database.
False: Returns the raw MMseqs2 output as a single “all_chains” alignment, with chains concatenated using null-byte separators. The row count may exceed NIM_GLOBAL_MAX_MSA_DEPTH due to MMseqs2 internal pipeline behavior (prefilter limits are per-step, not strict global caps).
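A paired request body can be assembled and checked the same way as a monomer request. In the sketch below, the helper name and the localhost URL are assumptions. No length limit is enforced because none is documented for paired sequences. Passing a dictionary keeps your own chain IDs, while a list leaves ID assignment to the server.

```python
import re
from typing import Dict, List, Optional, Union

VALID_SEQUENCE = re.compile(r"[ARNDCQEGHILKMFPSTWYVX]+")
# Assumption: NIM running locally on port 8000.
PAIRED_SEARCH_URL = "http://localhost:8000/biology/colabfold/msa-search/paired/predict"


def build_paired_request(
    sequences: Union[List[str], Dict[str, str]],
    databases: Optional[List[str]] = None,
    e_value: float = 0.0001,
    max_msa_sequences: int = 500,
    pairing_strategy: str = "greedy",
    unpack: bool = True,
) -> dict:
    """Validate inputs for the paired endpoint and return a JSON-ready body."""
    seqs = list(sequences.values()) if isinstance(sequences, dict) else list(sequences)
    if len(seqs) < 2:
        raise ValueError("paired search needs at least 2 sequences (one per chain)")
    for seq in seqs:
        if not VALID_SEQUENCE.fullmatch(seq):
            raise ValueError(f"invalid residue in sequence starting {seq[:10]!r}")
    if pairing_strategy not in ("greedy", "complete"):
        raise ValueError(f"unknown pairing_strategy: {pairing_strategy!r}")
    return {
        # A dict keeps your chain IDs; a list lets the server assign them.
        "sequences": dict(sequences) if isinstance(sequences, dict) else seqs,
        "databases": databases or ["all"],
        "e_value": e_value,
        "max_msa_sequences": max_msa_sequences,
        "pairing_strategy": pairing_strategy,
        "unpack": unpack,
    }
```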
Outputs#
Paired MSA search returns alignment payloads in A3M format only.
alignments_by_chain (Dictionary[string → Dictionary[string → Dictionary[string → AlignmentFileRecord]]]): Paired MSA alignments organized by chain ID. When unpack=True, the structure is {chain_id: {database_name: {format: AlignmentFileRecord}}}. When unpack=False, it contains a single “all_chains” key with the concatenated paired alignment.
Example response (with unpack=True):
{
  "alignments_by_chain": {
    "A": {
      "uniref30_2302": {
        "a3m": {
          "alignment": ">A|-|A\nVLSPADKTNVKAAWGKV...\n>UniRef100_UPI00148F070C...\nVLSPAD...",
          "format": "a3m"
        }
      }
    },
    "B": {
      "uniref30_2302": {
        "a3m": {
          "alignment": ">B|-|B\nMHLTPEEKSAVTALWGKV...\n>UniRef100_UPI0008DEA318...\nMHLTPE...",
          "format": "a3m"
        }
      }
    }
  },
  "metrics": {}
}
To access the A3M-formatted alignment for chain A from the uniref30_2302 database:
alignments_by_chain["A"]["uniref30_2302"]["a3m"]["alignment"]
metrics(dictionary, optional): Contains information about the response useful for debugging and measuring performance. May be empty or null.
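To process every chain rather than a single one, you can walk the nested dictionary. This sketch targets responses produced with unpack=True, whose layout is documented above. It does not interpret the “all_chains” layout returned with unpack=False. The generator name is hypothetical.

```python
from typing import Iterator, Tuple


def iter_chain_alignments(response: dict, fmt: str = "a3m") -> Iterator[Tuple[str, str, str]]:
    """Yield (chain_id, database, alignment) for an unpack=True paired response."""
    for chain_id, by_database in response.get("alignments_by_chain", {}).items():
        for db_name, formats in by_database.items():
            record = formats.get(fmt)
            if record is not None:
                yield chain_id, db_name, record["alignment"]
```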
Structural Template Search#
Endpoint path: /biology/colabfold/msa-search/structure-templates/predict
Request type: POST
Structural template search finds homologous protein structures by searching PDB-based databases and retrieves the corresponding mmCIF structure files. This endpoint combines MSA generation with template discovery in a single request, providing all inputs needed for template-based structure prediction.
Input Parameters#
sequence (string, required): A protein sequence to search against the databases. Must be composed of the 20 standard amino acids plus X for unknown residues (ARNDCQEGHILKMFPSTWYVX). Length: 1-4096 characters.
Example: "SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR"
structural_template_databases (list[string], optional): The databases to search for structural templates. Database names are case-insensitive; the response preserves the case you specify. Default: the value of the NIM_MSA_API_DEFAULT_STRUCTURAL_TEMPLATE_DBS environment variable (typically ["pdb70_220313"]).
Examples: ["pdb70_220313"], ["pdb70_220313", "pdb100_230517"]
msa_databases (list[string], optional): Database names to search for MSA generation. The first database is used for profile generation, which determines the template search results. Database names are case-insensitive; the response preserves the case you specify. Default: ["all"].
Examples: ["all"], ["uniref30_2302"]
e_value (float, optional): The e-value threshold for filtering hits. Range: 0.0-1.0. Default: 0.0001.
max_structures (int, optional): Maximum number of PDB structures to return from the template search. Default: 20.
max_msa_sequences (int, optional): Maximum number of sequences per individual database in the response (N). Each database’s result is trimmed to at most N sequences. The merged colabfold entry is not trimmed. It concatenates the untrimmed results from all D databases, so its size can be up to D × U, where U is the untrimmed per-database count and can exceed N. The cascaded pipeline first accepts up to max_accept targets (default 100), then computes up to alt_ali alternative alignments per target (default 10), giving U ≤ max_accept × (1 + alt_ali) = 1100 with defaults. These limits are configurable using NIM_MMSEQS_PROFILE_ALIGN_MAX_ACCEPT/NIM_MMSEQS_FOLLOWUP_ALIGN_MAX_ACCEPT and NIM_MMSEQS_PROFILE_ALIGN_ALT_ALI/NIM_MMSEQS_FOLLOWUP_ALIGN_ALT_ALI. Range: 1 to NIM_GLOBAL_MAX_MSA_DEPTH (default 500); when GPU Server is enabled, you must set this field to exactly NIM_GLOBAL_MAX_MSA_DEPTH. For more information, refer to Multiple Sequence Alignment Search.
output_alignment_formats (list[string], optional): Output formats for the MSA fields in alignments (same as standard MSA search). Supported values: "a3m", "fasta". Default: ["a3m"]. This setting does not affect template hit tables (search_hits, M8) or structure payloads (structures, mmCIF).
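One design choice when building a template request is whether to send structural_template_databases at all. The sketch below omits the field when you do not set it, so the server falls back to NIM_MSA_API_DEFAULT_STRUCTURAL_TEMPLATE_DBS. The helper name and the localhost URL are illustrative assumptions.

```python
import re
from typing import List, Optional

VALID_SEQUENCE = re.compile(r"[ARNDCQEGHILKMFPSTWYVX]{1,4096}")
# Assumption: NIM running locally on port 8000.
TEMPLATE_SEARCH_URL = (
    "http://localhost:8000/biology/colabfold/msa-search/structure-templates/predict"
)


def build_template_request(
    sequence: str,
    structural_template_databases: Optional[List[str]] = None,
    msa_databases: Optional[List[str]] = None,
    e_value: float = 0.0001,
    max_structures: int = 20,
    max_msa_sequences: int = 500,
    output_alignment_formats: Optional[List[str]] = None,
) -> dict:
    """Return a JSON-ready body for the structural template endpoint."""
    if not VALID_SEQUENCE.fullmatch(sequence):
        raise ValueError("sequence must be 1-4096 characters from ARNDCQEGHILKMFPSTWYVX")
    if not 0.0 <= e_value <= 1.0:
        raise ValueError("e_value must be in the range 0.0-1.0")
    payload = {
        "sequence": sequence,
        # The first MSA database drives profile generation and template hits.
        "msa_databases": msa_databases or ["all"],
        "e_value": e_value,
        "max_structures": max_structures,
        "max_msa_sequences": max_msa_sequences,
        "output_alignment_formats": output_alignment_formats or ["a3m"],
    }
    # Leave the field out so the server applies its configured default.
    if structural_template_databases is not None:
        payload["structural_template_databases"] = structural_template_databases
    return payload
```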
Outputs#
The response includes the same fields as the standard MSA search endpoint, plus template-specific outputs.
MSA fields in alignments follow the same output_alignment_formats behavior as Multiple Sequence Alignment Search (default A3M; optional FASTA). Template hit strings in search_hits are always M8 (BLAST tabular); structures in structures are always mmCIF. Template search runs the ColabFold cascaded search approach internally, and the profile from the first MSA database (for example, uniref30_2302) is used to find structural templates.
alignments: MSA alignments organized by database and format (same as standard MSA search). For details on the merged colabfold entry, refer to the Outputs of Multiple Sequence Alignment Search.
search_hits (Dictionary[string → Dictionary[string → SearchHitRecord]]): Structural template hits organized by database and format. Each entry contains template hits in M8 (BLAST tabular) format.
Each SearchHitRecord contains:
  hits (string): Template hits with the columns query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits, and cigar. You can customize the output format using the NIM_MMSEQS_TEMPLATE_CONVERTALIS_FORMAT environment variable.
  format (string): Always "m8".
structures (Dictionary[string → StructuralTemplate]): Retrieved PDB structures for template hits, organized by PDB ID.
Each StructuralTemplate contains:
  structure (string): The mmCIF file content.
  format (string): Always "mmcif".
metrics (dictionary, optional): Performance and debugging information.
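The hits string is tab-separated, with one hit per line. This sketch parses it using the default column order listed above. If NIM_MMSEQS_TEMPLATE_CONVERTALIS_FORMAT has been customized, you must pass the matching column list. One helper maps a hit target to a structures key. It is only an assumption inferred from the example response ("1abc_A" and key "1abc"). Both function names are hypothetical.

```python
from typing import Dict, List, Union

# Default column order documented for template hits.
M8_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits", "cigar",
]
INT_COLUMNS = {"alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend"}
FLOAT_COLUMNS = {"fident", "evalue", "bits"}


def parse_m8(hits: str, columns: List[str] = M8_COLUMNS) -> List[Dict[str, Union[str, int, float]]]:
    """Parse tab-separated M8 lines into dictionaries with typed numeric fields."""
    records = []
    for line in hits.splitlines():
        if not line.strip():
            continue  # tolerate blank or trailing lines
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} columns, got {len(fields)}: {line!r}")
        record: Dict[str, Union[str, int, float]] = {}
        for name, value in zip(columns, fields):
            if name in INT_COLUMNS:
                record[name] = int(value)
            elif name in FLOAT_COLUMNS:
                record[name] = float(value)
            else:
                record[name] = value
        records.append(record)
    return records


def pdb_id_from_target(target: str) -> str:
    """Assumed mapping from a hit target such as '1abc_A' to a structures key '1abc'."""
    return target.split("_", 1)[0].lower()
```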
Example response:
{
"alignments": {
"uniref30_2302": {
"a3m": {
"alignment": ">query\nMVPSAGQLALF...",
"format": "a3m"
}
},
"colabfold": {
"a3m": {
"alignment": ">query\nMVPSAGQLALF...",
"format": "a3m"
}
}
},
"search_hits": {
"pdb70_220313": {
"m8": {
"hits": "query\t1abc_A\t85.0\t150\t...",
"format": "m8"
}
}
},
"structures": {
"1abc": {
"structure": "data_1ABC\n_entry.id 1ABC\n...",
"format": "mmcif"
}
},
"metrics": {}
}
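Structure prediction tools usually read templates from files. This sketch writes each structures entry to disk. Naming files "<pdb_id>.cif" is an arbitrary choice, and the function name is hypothetical. Entries whose format is not "mmcif" are skipped defensively, although the documentation states that the format is always mmcif.

```python
from pathlib import Path
from typing import List


def save_structures(response: dict, out_dir: str) -> List[Path]:
    """Write each mmCIF template in response["structures"] to <out_dir>/<pdb_id>.cif."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdb_id, template in response.get("structures", {}).items():
        if template.get("format") != "mmcif":
            continue  # defensive: documented as always "mmcif"
        path = out / f"{pdb_id}.cif"
        path.write_text(template["structure"], encoding="utf-8")
        written.append(path)
    return written
```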
Get Database Configuration#
Endpoint path: /biology/colabfold/msa-search/config/msa-database-configs
Request type: GET
Input Parameters#
None.
Outputs#
Returns a list of database configuration objects, each containing:
name (string): The database identifier used in API requests (for example, uniref30_2302).
display_name (string): A human-readable display name suitable for UI presentation (for example, UniProt Reference Clusters (30% identity) 2023-02). Falls back to name for custom databases without a configured display name. Customizable using the NIM_MSA_DB_DISPLAY_NAMES environment variable or by mounting a custom /opt/nim/msa/config.py.
relative_path (string, deprecated): Always N/A. Refer to the /v1/metadata endpoint.
index_relative_path (string, deprecated): Always N/A. Refer to the /v1/metadata endpoint.
ngc_model (string, deprecated): Always N/A. Refer to the /v1/metadata endpoint.
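A UI can map API database names to display names with this endpoint. The sketch below fetches the list and builds that mapping. It ignores the deprecated fields. The fallback to name repeats on the client side what the server already documents. The helper names and the localhost URL are assumptions.

```python
import json
import urllib.request
from typing import Dict, List

# Assumption: NIM running locally on port 8000.
CONFIG_URL = "http://localhost:8000/biology/colabfold/msa-search/config/msa-database-configs"


def fetch_database_configs(url: str = CONFIG_URL) -> List[dict]:
    """GET the list of database configuration objects."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def display_names(configs: List[dict]) -> Dict[str, str]:
    """Map each database name to its display name, falling back to the name itself."""
    return {c["name"]: c.get("display_name") or c["name"] for c in configs}


# Usage against a running NIM:
#   for name, label in display_names(fetch_database_configs()).items():
#       print(f"{name}: {label}")
```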
Get MMseqs2 Version#
Endpoint path: /biology/colabfold/msa-search/mmseqs2/version
Request type: GET
Input Parameters#
None.
Outputs#
mmseqs2_version (string): The version string of the MMseqs2 installation, typically a git commit hash. Use this value when building custom database indices; for compatibility, indices must be created with the same MMseqs2 version as the NIM.
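Before you build custom indices, you can compare the NIM's version with your local MMseqs2 build. The sketch below assumes that the local binary is on PATH and reports its version through `mmseqs version`. It compares strings exactly, apart from whitespace, so an abbreviated hash on either side will not match. The helper names and the localhost URL are illustrative.

```python
import json
import urllib.request

# Assumption: NIM running locally on port 8000.
VERSION_URL = "http://localhost:8000/biology/colabfold/msa-search/mmseqs2/version"


def fetch_mmseqs2_version(url: str = VERSION_URL) -> str:
    """Return the mmseqs2_version string reported by a running NIM."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["mmseqs2_version"]


def versions_match(nim_version: str, local_version: str) -> bool:
    """Compare two version strings, ignoring surrounding whitespace such as a trailing newline."""
    return nim_version.strip() == local_version.strip()


# Usage sketch (assumes `mmseqs version` prints the local build's version):
#   import subprocess
#   local = subprocess.run(["mmseqs", "version"], capture_output=True, text=True).stdout
#   if not versions_match(fetch_mmseqs2_version(), local):
#       print("Rebuild indices with the MMseqs2 version used by the NIM")
```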
Health Endpoints#
Readiness Check#
Endpoint path: /v1/health/ready
Request type: GET
Description: Checks if the service is ready to handle requests.
Outputs#
Status code 200: Service is ready.
Status code 503: Service is not ready.
Response includes a JSON object with:
message (string): Status message.
object (string): Always “health.response”.
status (string, optional): Status string for backwards compatibility.
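Deployment scripts often poll the readiness endpoint before they send work. The sketch below treats 200 as ready and anything else as not ready yet. That includes 503 and the case where the server is unreachable while the container starts. The probe and sleep functions are injectable so the loop can be tested without a server. The function names, the 600-second timeout, and the 5-second interval are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET, including error codes such as 503."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the service is not ready
    except OSError:
        return 0  # unreachable or timed out (URLError and socket errors)


def wait_until_ready(
    probe: Callable[[], int],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it returns 200 or timeout_s elapses; return True if ready."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe() == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Usage against a running NIM:
#   ready = wait_until_ready(lambda: http_status("http://localhost:8000/v1/health/ready"))
```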
Liveness Check#
Endpoint path: /v1/health/live
Request type: GET
Description: Checks if the service is live (running).
Outputs#
Status code 200: Service is live.
Status code 503: Service is not live.
Response format is the same as the readiness check.
NIM Metadata Endpoints#
Version#
Endpoint path: /v1/version
Request type: GET
Description: Returns version information for the NIM.
Outputs#
release (string): The product release version of the NIM.
api (string): The server API version running inside the NIM.
License#
Endpoint path: /v1/license
Request type: GET
Description: Returns license information for the NIM.
Outputs#
name (string): The name of the license.
path (string): The file path within the container that contains the license.
sha (string): SHA-1 hash of the license contents.
size (integer): Number of characters in the license content.
url (string): URL where the license is hosted externally.
type (string): Always “file”.
content (string): The full license text.
Metadata#
Endpoint path: /v1/metadata
Request type: GET
Description: Returns comprehensive metadata about the NIM deployment.
Outputs#
assetInfo (list[string]): Required container assets, excluding model artifacts.
licenseInfo (LicenseEndpointModel): License information.
modelInfo (list[ModelInfo]): Information about the models being served.
repository_override (string): Alternate location for retrieving artifacts.
version (string): NIM service version.
selectedModelProfileId (string): ID of the currently selected model profile.
Manifest#
Endpoint path: /v1/manifest
Request type: GET
Description: Returns the manifest file describing required model artifacts.
Outputs#
manifest_file (string): Content of the manifest file.
repository_override (string): Alternate location for retrieving artifacts.
Metrics#
Endpoint path: /v1/metrics
Request type: GET
Description: Exposes Prometheus metrics for monitoring.
Outputs#
Returns metrics in Prometheus format.
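Prometheus normally scrapes this endpoint, but sometimes a quick look from a script is enough. This sketch parses the common lines of the Prometheus text exposition format. It skips "# HELP" and "# TYPE" comments, keeps labels as raw strings, and ignores optional timestamps. It is a simplified parser, not a full implementation of the format. This page does not list the metric names the NIM actually exposes, so the names in the usage comment are placeholders.

```python
from typing import Dict, Tuple


def parse_prometheus_text(text: str) -> Dict[Tuple[str, str], float]:
    """Parse sample lines into {(metric name, raw label string): value}."""
    samples: Dict[Tuple[str, str], float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            label_body, rest = rest.rsplit("}", 1)  # last '}' closes the label set
            labels = "{" + label_body + "}"
        else:
            name, rest = line.split(None, 1)
            labels = ""
        value = rest.split()[0]  # any trailing token is an optional timestamp
        samples[(name.strip(), labels)] = float(value)  # float() accepts NaN, +Inf
    return samples


# Usage sketch (metric names below are placeholders):
#   import urllib.request
#   with urllib.request.urlopen("http://localhost:8000/v1/metrics") as resp:
#       metrics = parse_prometheus_text(resp.read().decode("utf-8"))
#   metrics.get(("some_metric_total", ""))
```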