API Reference#
This documentation contains the API reference for the MSA Search NIM.
OpenAPI Specification#
You can download or view the OpenAPI specification when the NIM is running:
curl http://localhost:8000/openapi.json
You can also navigate to the interactive API documentation at http://localhost:8000/docs in your browser.
Multiple Sequence Alignment Search#
Endpoint path: /biology/colabfold/msa-search/predict
Request type: POST
Input Parameters#
sequence (string, required): A sequence to search against the MSA databases. Must be a valid protein sequence composed of the 20 standard amino acids plus X for unknown residues (ARNDCQEGHILKMFPSTWYVX). Length: 1-4096 characters.
Example: "SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR"
databases (list[string], optional): Database names to search against. All databases are searched by default. Database names are case-insensitive; the response preserves the case you specify. Default: ["all"].
Important: For the ColabFold search type, the first database in the list is used for profile generation. When using ["all"], uniref30 is automatically placed first.
Examples: ["all"], ["uniref30_2302"], ["uniref30_2302", "pdb70_220313"]
search_type (string, optional): Which type of MSA search to run for alignment production. Default: "colabfold".
Options:
  "colabfold": Cascaded search with higher sensitivity. The first database is used for profile generation.
  "alphafold2": Single-pass iterative search.
Examples: "colabfold", "alphafold2"
e_value (float, optional): The e-value threshold for filtering hits when building the Multiple Sequence Alignment. Sequences with an e-value greater than this are not included in the MSA. Range: 0.0-1.0. Default: 0.0001.
iterations (int, optional): The number of MSA iterations to perform, where more iterations find more distant homologs. Default: 1. Note: For cascaded search (search_type="colabfold"), the number of iterations is fixed to 3 and this parameter is ignored.
max_msa_sequences (int, optional): Maximum sequences per individual database in the response (N). Each database's result is trimmed to at most N sequences. The merged colabfold entry is not trimmed. It concatenates untrimmed results from all D databases, so its size can be up to D × U, where U is the untrimmed per-database sequence count and can exceed N. The cascaded pipeline first accepts up to max_accept targets (default 100), then computes up to alt_ali alternative alignments per target (default 10), giving U ≤ max_accept × (1 + alt_ali) = 1100 with defaults. Configurable using NIM_MMSEQS_PROFILE_ALIGN_MAX_ACCEPT / NIM_MMSEQS_FOLLOWUP_ALIGN_MAX_ACCEPT and NIM_MMSEQS_PROFILE_ALIGN_ALT_ALI / NIM_MMSEQS_FOLLOWUP_ALIGN_ALT_ALI. Range: 1 to NIM_GLOBAL_MAX_MSA_DEPTH (default 500). When GPU Server is enabled (default in version 2.0.0), the value must match NIM_GLOBAL_MAX_MSA_DEPTH exactly.
output_alignment_formats (list[string], optional): The output format of the MSA. Supported formats: "a3m", "fasta". Default: ["a3m"].
Examples: ["a3m"], ["a3m", "fasta"]
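The parameters above can be assembled into a request body as in the following minimal sketch. The helper names build_msa_request and post_json are hypothetical, the client-side checks only mirror the constraints listed above (the server performs its own validation), and the uppercase-only residue check is an assumption. The base URL http://localhost:8000 is the one used earlier on this page.

```python
import json
import urllib.request

# Alphabet and endpoint path taken from this page.
VALID_RESIDUES = set("ARNDCQEGHILKMFPSTWYVX")
MSA_SEARCH_PATH = "/biology/colabfold/msa-search/predict"


def build_msa_request(sequence, databases=None, search_type="colabfold",
                      e_value=0.0001, iterations=None, max_msa_sequences=None,
                      output_alignment_formats=("a3m",)):
    """Check inputs against the documented constraints and return a payload dict."""
    if not 1 <= len(sequence) <= 4096:
        raise ValueError("sequence length must be 1-4096 characters")
    invalid = set(sequence) - VALID_RESIDUES  # assumption: uppercase letters only
    if invalid:
        raise ValueError(f"invalid residues: {sorted(invalid)}")
    if search_type not in ("colabfold", "alphafold2"):
        raise ValueError("search_type must be 'colabfold' or 'alphafold2'")
    if not 0.0 <= e_value <= 1.0:
        raise ValueError("e_value must be in the range 0.0-1.0")

    payload = {
        "sequence": sequence,
        "databases": list(databases) if databases else ["all"],
        "search_type": search_type,
        "e_value": e_value,
        "output_alignment_formats": list(output_alignment_formats),
    }
    # Optional fields are sent only when set, so server-side defaults apply otherwise.
    if iterations is not None:
        payload["iterations"] = iterations
    if max_msa_sequences is not None:
        payload["max_msa_sequences"] = max_msa_sequences
    return payload


def post_json(base_url, path, payload, timeout=600):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running NIM, a call might look like post_json("http://localhost:8000", MSA_SEARCH_PATH, build_msa_request("SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR", databases=["uniref30_2302"])). The 600-second timeout is an arbitrary choice for this sketch.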
Outputs#
alignments (Dictionary[string → Dictionary[string → AlignmentFileRecord]]): The MSA alignments organized by database and format. For example, alignments["uniref30_2302"]["a3m"] contains the A3M alignment for the uniref30 database. For the colabfold search type, when multiple databases are searched, an additional colabfold key contains the merged alignment. When only a single database is searched, no colabfold key is present. The merged colabfold alignment concatenates results from all databases, so the query sequence appears once per source database. Unlike per-database entries, which are trimmed to max_msa_sequences (N), the merged colabfold entry concatenates untrimmed results from all D databases, so its size can be up to D × U, where U is the untrimmed per-database sequence count and can exceed N. The cascaded pipeline first accepts up to max_accept targets (default 100), then computes up to alt_ali alternative alignments per target (default 10), giving U ≤ max_accept × (1 + alt_ali) = 1100 with defaults. Refer to NIM_MMSEQS_PROFILE_ALIGN_MAX_ACCEPT / NIM_MMSEQS_FOLLOWUP_ALIGN_MAX_ACCEPT and NIM_MMSEQS_PROFILE_ALIGN_ALT_ALI / NIM_MMSEQS_FOLLOWUP_ALIGN_ALT_ALI. The merged result is not globally sorted by e-value; sequences from each database are sorted within their block, but blocks are concatenated in database order. The merged colabfold key is provided for compatibility and may be removed in a future release. Avoid using the colabfold key if possible.
metrics (dictionary, optional): Contains information about the response useful for debugging and measuring performance. May be empty or null.
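As a minimal sketch of consuming this output, the following code splits each per-database A3M alignment into (header, sequence) pairs and skips the merged colabfold key, as recommended above. The function names are hypothetical. The "alignment" field name is taken from the example responses on this page.

```python
def parse_a3m(text):
    """Split A3M (or FASTA) text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None and line.strip():
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def per_database_alignments(response, fmt="a3m", skip_merged=True):
    """Return {database_name: [(header, sequence), ...]} for one format."""
    result = {}
    for db_name, by_format in response.get("alignments", {}).items():
        if skip_merged and db_name == "colabfold":
            continue  # merged entry is untrimmed and kept only for compatibility
        if fmt in by_format:
            result[db_name] = parse_a3m(by_format[fmt]["alignment"])
    return result
```

The number of records per database can then be compared against max_msa_sequences, for example with len(per_database_alignments(response)["uniref30_2302"]).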
Paired Multiple Sequence Alignment Search#
Endpoint path: /biology/colabfold/msa-search/paired/predict
Request type: POST
Paired MSA search finds homologous sequences for each chain of a protein complex and pairs them by species, preserving co-evolutionary signals across chains. This is essential for accurate structure prediction of protein complexes.
Input Parameters#
sequences (list[string] or dict[string, string], required): Protein sequences, one per chain. Must contain at least 2 sequences. Each sequence must be composed of the 20 standard amino acids plus X for unknown residues (ARNDCQEGHILKMFPSTWYVX). Can be provided as a list (chain IDs are assigned automatically, for example "A" and "B") or as a dictionary keyed by chain ID.
Examples:
  ["VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH", "MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPYTQRFFESFGDLST"]
  {"A": "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH", "B": "MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPYTQRFFESFGDLST"}
databases (list[string], optional): Databases to search against. Only databases with taxonomy information can be used for paired search. Database names are case-insensitive; the response preserves the case you specify. Default: ["all"].
Examples: ["all"], ["uniref30_2302"]
e_value (float, optional): The e-value threshold for filtering hits when building the Multiple Sequence Alignment. Sequences with an e-value greater than this are not included in the MSA. Range: 0.0-1.0. Default: 0.0001.
max_msa_sequences (int, optional): Maximum sequences per individual database per chain (N). Each database's result is trimmed to at most N sequences per chain. Only applies when unpack=True; when unpack=False, the row count may exceed N because the cascaded search pipeline (expand + align) can produce up to D × U sequences, where D is the number of databases and U ≤ max_accept × (1 + alt_ali) = 1100 with defaults (refer to the monomer search documentation for details). Range: 1 to NIM_GLOBAL_MAX_MSA_DEPTH (default 500). When GPU Server is enabled (default since version 2.0.0), the value must match NIM_GLOBAL_MAX_MSA_DEPTH exactly.
pairing_strategy (string, optional): Pairing strategy for cross-chain sequence matching. Default: "greedy".
Options:
  "greedy": Maximizes rows by including species with partial chain coverage, pairing all chains that have hits and leaving gaps in others.
  "complete": Only includes species where all chains have hits, producing fewer rows but with full coverage.
Note
For 2-chain searches, both strategies produce identical results. Differences only arise for 3+ chains.
unpack (boolean, optional): Controls output format. Default: True.
  True: Returns one alignment per chain, keyed by chain ID (for example, "A", "B"), each strictly limited to max_msa_sequences rows.
  False: Returns raw MMseqs2 output as a single "all_chains" alignment with chains concatenated using null-byte separators. Row count may exceed NIM_GLOBAL_MAX_MSA_DEPTH due to MMseqs2 internal pipeline behavior (prefilter limits are per-step, not strict global caps).
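The difference between the two pairing strategies can be illustrated with a toy model. This is not the NIM's pairing code: the function select_paired_species and its input shape are hypothetical, and the rule that "greedy" keeps a species once at least two chains have a hit is an assumption chosen to be consistent with the note that both strategies agree for 2-chain searches.

```python
def select_paired_species(species_by_chain, strategy="greedy"):
    """Toy model: return {species: [chain IDs with a hit]} for the species kept.

    species_by_chain maps each chain ID to the set of species with a hit
    for that chain.
    """
    chains = list(species_by_chain)
    all_species = set().union(*species_by_chain.values())
    kept = {}
    for species in sorted(all_species):
        covered = [c for c in chains if species in species_by_chain[c]]
        if strategy == "complete":
            keep = len(covered) == len(chains)  # every chain must have a hit
        elif strategy == "greedy":
            keep = len(covered) >= 2  # assumption: at least two chains to pair
        else:
            raise ValueError("strategy must be 'greedy' or 'complete'")
        if keep:
            kept[species] = covered
    return kept


hits = {
    "A": {"human", "mouse", "yeast"},
    "B": {"human", "mouse"},
    "C": {"human", "yeast", "fly"},
}
greedy_rows = select_paired_species(hits, "greedy")
complete_rows = select_paired_species(hits, "complete")
```

In this toy input, "complete" keeps only the species found for all three chains, while "greedy" also keeps species found for two of the three, leaving a gap for the missing chain.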
Outputs#
alignments_by_chain (Dictionary[string → Dictionary[string → Dictionary[string → AlignmentFileRecord]]]): Paired MSA alignments organized by chain ID. When unpack=True, the structure is {chain_id: {database_name: {format: AlignmentFileRecord}}}. When unpack=False, it contains a single "all_chains" key with the concatenated paired alignment.
Example response (with unpack=True):
{
  "alignments_by_chain": {
    "A": {
      "uniref30_2302": {
        "a3m": {
          "alignment": ">A|-|A\nVLSPADKTNVKAAWGKV...\n>UniRef100_UPI00148F070C...\nVLSPAD...",
          "format": "a3m"
        }
      }
    },
    "B": {
      "uniref30_2302": {
        "a3m": {
          "alignment": ">B|-|B\nMHLTPEEKSAVTALWGKV...\n>UniRef100_UPI0008DEA318...\nMHLTPE...",
          "format": "a3m"
        }
      }
    }
  },
  "metrics": {}
}
To access the A3M-formatted alignment for chain A from the uniref30_2302 database:
alignments_by_chain["A"]["uniref30_2302"]["a3m"]["alignment"]
metrics(dictionary, optional): Contains information about the response useful for debugging and measuring performance. May be empty or null.
Note
The output format for paired MSA search is always A3M.
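As a minimal sketch of reading an unpack=True response, the following hypothetical helper counts the alignment rows (header lines starting with ">") per chain for one database. It assumes the response shape shown in the example above.

```python
def rows_per_chain(response, database, fmt="a3m"):
    """Count A3M rows per chain ID for one database in an unpack=True response."""
    counts = {}
    for chain_id, by_database in response["alignments_by_chain"].items():
        if chain_id == "all_chains":
            # unpack=False responses hold one concatenated alignment instead.
            raise ValueError("expected an unpack=True response keyed by chain ID")
        alignment = by_database[database][fmt]["alignment"]
        counts[chain_id] = sum(
            1 for line in alignment.splitlines() if line.startswith(">")
        )
    return counts
```

Each count includes the query row and can be checked against the max_msa_sequences value sent with the request.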
Structural Template Search#
Endpoint path: /biology/colabfold/msa-search/structure-templates/predict
Request type: POST
Structural template search finds homologous protein structures by searching PDB-based databases and retrieves the corresponding mmCIF structure files. This endpoint combines MSA generation with template discovery in a single request, providing all inputs needed for template-based structure prediction.
Input Parameters#
sequence (string, required): A protein sequence to search against the databases. Must be composed of the 20 standard amino acids plus X for unknown residues (ARNDCQEGHILKMFPSTWYVX). Length: 1-4096 characters.
Example: "SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR"
structural_template_databases (list[string], optional): List of databases to search for structural templates. Database names are case-insensitive; the response preserves the case you specify. Default: value of the NIM_MSA_API_DEFAULT_STRUCTURAL_TEMPLATE_DBS environment variable (typically ["pdb70_220313"]).
Examples: ["pdb70_220313"], ["pdb70_220313", "pdb100_230517"]
msa_databases (list[string], optional): Database names to search for MSA generation. The first database is used for profile generation, which determines template search results. Database names are case-insensitive; the response preserves the case you specify. Default: ["all"].
Examples: ["all"], ["uniref30_2302"]
e_value (float, optional): The e-value threshold for filtering hits. Range: 0.0-1.0. Default: 0.0001.
max_structures (int, optional): Maximum number of PDB structures to return from template search. Default: 20.
max_msa_sequences (int, optional): Maximum sequences per individual database in the response (N). Each database's result is trimmed to at most N sequences. The merged colabfold entry is not trimmed. It concatenates untrimmed results from all D databases, so its size can be up to D × U, where U is the untrimmed per-database sequence count and can exceed N. The cascaded pipeline first accepts up to max_accept targets (default 100), then computes up to alt_ali alternative alignments per target (default 10), giving U ≤ max_accept × (1 + alt_ali) = 1100 with defaults. Configurable using NIM_MMSEQS_PROFILE_ALIGN_MAX_ACCEPT / NIM_MMSEQS_FOLLOWUP_ALIGN_MAX_ACCEPT and NIM_MMSEQS_PROFILE_ALIGN_ALT_ALI / NIM_MMSEQS_FOLLOWUP_ALIGN_ALT_ALI. Range: 1 to NIM_GLOBAL_MAX_MSA_DEPTH (default 500). When GPU Server is enabled (default), the value must match NIM_GLOBAL_MAX_MSA_DEPTH exactly.
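The size bound for the merged colabfold entry described under max_msa_sequences is simple arithmetic, sketched below. The function name is hypothetical; the defaults (max_accept = 100, alt_ali = 10) are the ones stated above.

```python
def merged_entry_upper_bound(num_databases, max_accept=100, alt_ali=10):
    """Upper bound D x U on rows in the merged colabfold entry.

    U <= max_accept * (1 + alt_ali): each accepted target contributes itself
    plus up to alt_ali alternative alignments.
    """
    per_database_bound = max_accept * (1 + alt_ali)
    return num_databases * per_database_bound


single_db_bound = merged_entry_upper_bound(1)
three_db_bound = merged_entry_upper_bound(3)
```

With the defaults, one database gives a bound of 1100 rows and three databases give 3300, regardless of the max_msa_sequences value applied to the per-database entries.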
Outputs#
The response includes the same fields as the standard MSA search endpoint, plus template-specific outputs:
alignments: MSA alignments organized by database and format (same as standard MSA search). Refer to Outputs above for details on the merged colabfold entry.
search_hits (Dictionary[string → Dictionary[string → SearchHitRecord]]): Structural template hits organized by database. Each entry contains template hits in M8 (BLAST tabular) format.
Each SearchHitRecord contains:
  hits (string): Template hits with columns: query, target, fident, alnlen, mismatch, gapopen, qstart, qend, tstart, tend, evalue, bits, and cigar. The output format can be customized using the NIM_MMSEQS_TEMPLATE_CONVERTALIS_FORMAT environment variable.
  format (string): Always "m8".
structures (Dictionary[string → StructuralTemplate]): Retrieved PDB structures for template hits, organized by PDB ID.
Each StructuralTemplate contains:
  structure (string): The mmCIF file content
  format (string): Always "mmcif"
metrics (dictionary, optional): Performance and debugging information
Example response:
{
  "alignments": {
    "uniref30_2302": {
      "a3m": {
        "alignment": ">query\nMVPSAGQLALF...",
        "format": "a3m"
      }
    },
    "colabfold": {
      "a3m": {
        "alignment": ">query\nMVPSAGQLALF...",
        "format": "a3m"
      }
    }
  },
  "search_hits": {
    "pdb70_220313": {
      "m8": {
        "hits": "query\t1abc_A\t85.0\t150\t...",
        "format": "m8"
      }
    }
  },
  "structures": {
    "1abc": {
      "structure": "data_1ABC\n_entry.id 1ABC\n...",
      "format": "mmcif"
    }
  },
  "metrics": {}
}
Note
Template search uses the ColabFold cascaded search approach internally. The profile generated from the first MSA database (for example, uniref30_2302) is used to find structural templates.
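The hits string can be turned into typed records with a short parser. The following is a minimal sketch that assumes the default column order listed above; if NIM_MMSEQS_TEMPLATE_CONVERTALIS_FORMAT has been customized, pass a matching column list. The function name and the choice of numeric types are assumptions of this sketch.

```python
M8_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits", "cigar",
]
INT_COLUMNS = {"alnlen", "mismatch", "gapopen", "qstart", "qend", "tstart", "tend"}
FLOAT_COLUMNS = {"fident", "evalue", "bits"}


def parse_m8_hits(hits_text, columns=M8_COLUMNS):
    """Parse tab-separated M8 hit lines into a list of dicts keyed by column name."""
    rows = []
    for line in hits_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = dict(zip(columns, line.split("\t")))
        for name, value in row.items():
            if name in INT_COLUMNS:
                row[name] = int(value)
            elif name in FLOAT_COLUMNS:
                row[name] = float(value)
        rows.append(row)
    return rows
```

For a response like the example above, parse_m8_hits(response["search_hits"]["pdb70_220313"]["m8"]["hits"]) returns one dict per template hit, which can then be filtered by evalue or fident.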
Get Database Configuration#
Endpoint path: /biology/colabfold/msa-search/config/msa-database-configs
Request type: GET
Input Parameters#
None.
Outputs#
Returns a list of database configuration objects, each containing:
name (string): The database identifier used in API requests (for example, uniref30_2302).
display_name (string): A human-readable display name suitable for UI presentation (for example, UniProt Reference Clusters (30% identity) 2023-02). Falls back to name for custom databases without a configured display name. Customizable using the NIM_MSA_DB_DISPLAY_NAMES environment variable or by mounting a custom /opt/nim/msa/config.py.
relative_path (string, deprecated): Always N/A. Refer to the /v1/metadata endpoint.
index_relative_path (string, deprecated): Always N/A. Refer to the /v1/metadata endpoint.
ngc_model (string, deprecated): Always N/A. Refer to the /v1/metadata endpoint.
Get MMseqs2 Version#
Endpoint path: /biology/colabfold/msa-search/mmseqs2/version
Request type: GET
Input Parameters#
None.
Outputs#
mmseqs2_version (string): The version string of the MMseqs2 installation, typically a git commit hash.
Note
Use this endpoint to get the exact MMseqs2 version when you need to create custom database indices. Custom indices must be created with the same MMseqs2 version as the one running in the NIM to ensure compatibility.
Health Endpoints#
Readiness Check#
Endpoint path: /v1/health/ready
Request type: GET
Description: Checks if the service is ready to handle requests.
Outputs#
Status code 200: Service is ready
Status code 503: Service is not ready
Response includes a JSON object with:
message (string): Status message
object (string): Always "health.response"
status (string, optional): Status string for backwards compatibility
Liveness Check#
Endpoint path: /v1/health/live
Request type: GET
Description: Checks if the service is live (running).
Outputs#
Status code 200: Service is live
Status code 503: Service is not live
Response format is the same as the readiness check.
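A client can poll the readiness endpoint before sending search requests. The following is a minimal sketch using only the Python standard library; the function names, the retry count, and the delay are hypothetical choices, and the probe and sleep arguments exist so the loop can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request


def check_health(url, timeout=5.0):
    """Return the HTTP status code of a health endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # for example, 503 while the service is starting
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, and so on


def wait_until_ready(base_url, attempts=30, delay=2.0,
                     probe=check_health, sleep=time.sleep):
    """Poll /v1/health/ready until it returns 200 or the attempts run out."""
    url = base_url.rstrip("/") + "/v1/health/ready"
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, wait_until_ready("http://localhost:8000") returns True once the readiness check reports status code 200, and False if it never does within the configured attempts.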
NIM Metadata Endpoints#
Version#
Endpoint path: /v1/version
Request type: GET
Description: Returns version information for the NIM.
Outputs#
release (string): The product release version of the NIM
api (string): The server API version running inside the NIM
License#
Endpoint path: /v1/license
Request type: GET
Description: Returns license information for the NIM.
Outputs#
name (string): The name of the license
path (string): The filepath within the container containing the license content
sha (string): SHA1 hash of the license contents
size (integer): Number of characters in the license content
url (string): URL where the license is hosted externally
type (string): Always "file"
content (string): The full license text
Metadata#
Endpoint path: /v1/metadata
Request type: GET
Description: Returns comprehensive metadata about the NIM deployment.
Outputs#
assetInfo (list[string]): Required container assets excluding model artifacts
licenseInfo (LicenseEndpointModel): License information
modelInfo (list[ModelInfo]): Information about models being served
repository_override (string): Alternate location for retrieving artifacts
version (string): NIM service version
selectedModelProfileId (string): ID of the currently selected model profile
Manifest#
Endpoint path: /v1/manifest
Request type: GET
Description: Returns the manifest file describing required model artifacts.
Outputs#
manifest_file (string): Content of the manifest file
repository_override (string): Alternate location for retrieving artifacts
Metrics#
Endpoint path: /v1/metrics
Request type: GET
Description: Exposes Prometheus metrics for monitoring.
Outputs#
Returns metrics in Prometheus format.