API Reference#
This documentation contains the API reference for the MSA Search NIM.
OpenAPI Specification#
You can download or view the OpenAPI specification when the NIM is running:
curl http://localhost:8000/openapi.json
You can also navigate to the interactive API documentation at http://localhost:8000/docs in your browser.
Multiple Sequence Alignment Search#
Endpoint path: /biology/colabfold/msa-search/predict
Request type: POST
Input Parameters#
sequence (string, required): A sequence to search against the MSA databases. Must be a valid protein sequence composed of the 20 standard amino acids (ARNDCQEGHILKMFPSTWYV). Length: 1-4096 characters. Example:
"SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR"
databases (list[string], optional): Database names to search against. All databases are searched by default. Default: ["all"]. Examples:
["all"], ["Uniref30_2302"], ["Uniref30_2302", "PDB70_220313"]
search_type (string, optional): Which type of MSA search to run for alignment production. Default: "colabfold". Options:
"colabfold": Cascaded search with higher sensitivity
"alphafold2": Single-pass iterative search
Examples:
"colabfold", "alphafold2"
e_value (float, optional): The e-value threshold for filtering hits when building the Multiple Sequence Alignment. Sequences with an e-value greater than this are not included in the MSA. Range: 0.0-1.0. Default: 0.0001.
iterations (int, optional): The number of MSA iterations to perform, where more iterations find more distant homologs. Default: 1. Note: For cascaded search (search_type="colabfold"), the number of iterations is fixed to 3 and this parameter is ignored.
max_msa_sequences (int, optional): The maximum sequences taken from the MSA for model prediction. Default: 500. Note: When GPU Server is enabled (default in version 2.0.0), this parameter must be set globally using the NIM_GLOBAL_MAX_MSA_DEPTH environment variable at container startup.
output_alignment_formats (list[string], optional): The output format of the MSA. Supported formats: "a3m", "fasta". Default: ["a3m"]. Examples:
["a3m"], ["a3m", "fasta"]
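As a minimal sketch of how the parameters above fit together, the following Python builds a request body and posts it to the endpoint. The field names and defaults come from this reference; the helper names (build_msa_request, post_json), the client-side validation, the timeout, and the localhost:8000 address are assumptions for illustration, not part of the NIM.

```python
import json
import urllib.request

# Endpoint path from this reference; host and port assume a local deployment.
MSA_SEARCH_URL = "http://localhost:8000/biology/colabfold/msa-search/predict"
STANDARD_AMINO_ACIDS = set("ARNDCQEGHILKMFPSTWYV")


def build_msa_request(sequence, databases=None, search_type="colabfold",
                      e_value=0.0001, iterations=1, max_msa_sequences=500,
                      output_alignment_formats=None):
    """Check inputs against the documented constraints and return the JSON body.

    Hypothetical helper: the server performs its own validation regardless.
    """
    if not 1 <= len(sequence) <= 4096:
        raise ValueError("sequence length must be between 1 and 4096")
    invalid = set(sequence) - STANDARD_AMINO_ACIDS
    if invalid:
        raise ValueError(f"non-standard residues: {sorted(invalid)}")
    if search_type not in ("colabfold", "alphafold2"):
        raise ValueError("search_type must be 'colabfold' or 'alphafold2'")
    if not 0.0 <= e_value <= 1.0:
        raise ValueError("e_value must be in the range 0.0-1.0")
    return {
        "sequence": sequence,
        "databases": databases or ["all"],
        "search_type": search_type,
        "e_value": e_value,
        "iterations": iterations,  # ignored by the server for "colabfold"
        "max_msa_sequences": max_msa_sequences,
        "output_alignment_formats": output_alignment_formats or ["a3m"],
    }


def post_json(url, payload, timeout=600):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


payload = build_msa_request("SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR")
```

With a running NIM, post_json(MSA_SEARCH_URL, payload) would send the request; the call is left out of the sketch so that it runs without a server.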
Outputs#
alignments (Dictionary[string → Dictionary[string → AlignmentFileRecord]]): The MSA alignments organized by database and format. Structure: {database_name: {format: AlignmentFileRecord}}. For example, alignments['Uniref30_2302']['a3m'] contains the A3M alignment for the Uniref30 database.
Each AlignmentFileRecord contains:
alignment (string): The contents of a single MSA. For "a3m" format, sequences are in compact A3M format with insertions in lowercase. For "fasta" format, sequences are in standard aligned FASTA format with gaps.
format (string): The format of the alignment record(s). Values: "a3m" or "fasta".
Example response:
{ "alignments": { "Uniref30_2302": { "a3m": { "alignment": ">query\nMVPSAGQLALFALGIVLAACQALENS\n>hit1\nMVPSAGQLALFALGIV---CQALENS\n>hit2\nMVPSAGQLALF-LGIV---CQALENS", "format": "a3m" } } }, "metrics": {} }
To access the A3M-formatted alignment for the Uniref30_2302 database:
alignments["Uniref30_2302"]["a3m"]["alignment"]
metrics(dictionary, optional): Contains information about the response useful for debugging and measuring performance. May be empty or null.
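The nested response structure can be unpacked with a few lines of Python. This is a sketch that reuses the example response above; get_alignment and parse_records are hypothetical helper names, and the parser assumes only that each record starts with a ">" header line.

```python
def get_alignment(response, database, fmt="a3m"):
    """Return the alignment text for one database and format."""
    return response["alignments"][database][fmt]["alignment"]


def parse_records(text):
    """Split A3M or FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif line.strip() and header is not None:
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The example response from this section.
example_response = {
    "alignments": {
        "Uniref30_2302": {
            "a3m": {
                "alignment": (
                    ">query\nMVPSAGQLALFALGIVLAACQALENS\n"
                    ">hit1\nMVPSAGQLALFALGIV---CQALENS\n"
                    ">hit2\nMVPSAGQLALF-LGIV---CQALENS"
                ),
                "format": "a3m",
            }
        }
    },
    "metrics": {},
}

records = parse_records(get_alignment(example_response, "Uniref30_2302"))
for header, sequence in records:
    print(header, sequence)
```

The first record is the query itself, followed by one record per hit.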
Paired Multiple Sequence Alignment Search#
Endpoint path: /biology/colabfold/msa-search/paired/predict
Request type: POST
Paired MSA search finds homologous sequences for each chain of a protein complex and pairs them by species, preserving co-evolutionary signals across chains. This is essential for accurate structure prediction of protein complexes.
Input Parameters#
sequences (list[string] or dict[string, string], required): Protein sequences, one per chain. Must contain at least 2 sequences. Each sequence must be composed of the 20 standard amino acids (ARNDCQEGHILKMFPSTWYV). Can be provided as a list (chain IDs assigned automatically as identifiers such as “A” and “B”) or as a dictionary keyed by chain ID. Examples:
["VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH", "MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPYTQRFFESFGDLST"]
{"A": "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH", "B": "MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPYTQRFFESFGDLST"}
databases (list[string], optional): Databases to search against. Only databases with taxonomy information can be used for paired search. Default: ["all"]. Examples:
["all"], ["Uniref30_2302"]
e_value (float, optional): The e-value threshold for filtering hits when building the Multiple Sequence Alignment. Sequences with an e-value greater than this are not included in the MSA. Range: 0.0-1.0. Default: 0.0001.
max_msa_sequences (int, optional): Maximum sequences taken from the MSA per chain (only applies when unpack=True; when unpack=False, row count may exceed NIM_GLOBAL_MAX_MSA_DEPTH due to MMseqs2 internal pipeline behavior where prefilter limits are per-step, not strict global caps). Default: 500. Note: When GPU Server is enabled (default since version 2.0.0), this parameter must be set globally using the NIM_GLOBAL_MAX_MSA_DEPTH environment variable at container startup.
pairing_strategy (string, optional): Pairing strategy for cross-chain sequence matching. Default: "greedy". Options:
"greedy": Maximizes rows by including species with partial chain coverage, pairing all chains that have hits and leaving gaps in others.
"complete": Only includes species where all chains have hits, producing fewer rows but with full coverage.
Note
For 2-chain searches, both strategies produce identical results. Differences only arise for 3+ chains.
unpack (boolean, optional): Controls output format. Default: True.
True: Returns N alignments (one per chain) keyed by chain ID (for example, “A”, “B”), each strictly limited to max_msa_sequences rows.
False: Returns raw MMseqs2 output as a single “all_chains” alignment with chains concatenated using null-byte separators. Row count may exceed NIM_GLOBAL_MAX_MSA_DEPTH due to MMseqs2 internal pipeline behavior (prefilter limits are per-step, not strict global caps).
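A paired request body can be assembled in the same way as a single-sequence one. The sketch below accepts either input form for sequences and checks the documented constraints before building the payload. The function name build_paired_msa_request and the client-side checks are illustrative assumptions; only the field names, defaults, and endpoint path come from this reference.

```python
# Endpoint path from this reference; host and port assume a local deployment.
PAIRED_MSA_URL = "http://localhost:8000/biology/colabfold/msa-search/paired/predict"
STANDARD_AMINO_ACIDS = set("ARNDCQEGHILKMFPSTWYV")


def build_paired_msa_request(sequences, databases=None, e_value=0.0001,
                             max_msa_sequences=500, pairing_strategy="greedy",
                             unpack=True):
    """Return the JSON body for a paired search (hypothetical helper).

    `sequences` may be a list (chain IDs assigned by the server) or a
    dict keyed by chain ID, as described above.
    """
    chains = list(sequences.values()) if isinstance(sequences, dict) else list(sequences)
    if len(chains) < 2:
        raise ValueError("paired search needs at least 2 sequences")
    for chain in chains:
        invalid = set(chain) - STANDARD_AMINO_ACIDS
        if not chain or invalid:
            raise ValueError(f"invalid chain sequence: {chain!r}")
    if pairing_strategy not in ("greedy", "complete"):
        raise ValueError("pairing_strategy must be 'greedy' or 'complete'")
    return {
        "sequences": sequences,
        "databases": databases or ["all"],
        "e_value": e_value,
        "max_msa_sequences": max_msa_sequences,
        "pairing_strategy": pairing_strategy,
        "unpack": unpack,
    }


paired_payload = build_paired_msa_request({
    "A": "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH",
    "B": "MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPYTQRFFESFGDLST",
})
```

The resulting dictionary can be serialized with json.dumps and sent as the POST body to PAIRED_MSA_URL.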
Outputs#
alignments_by_chain (Dictionary[string → Dictionary[string → Dictionary[string → AlignmentFileRecord]]]): Paired MSA alignments organized by chain ID. When unpack=True, structure is {chain_id: {database_name: {format: AlignmentFileRecord}}}. When unpack=False, contains a single “all_chains” key with the concatenated paired alignment.
Example response (with unpack=True):
{ "alignments_by_chain": { "A": { "Uniref30_2302": { "a3m": { "alignment": ">A|-|A\nVLSPADKTNVKAAWGKV...\n>UniRef100_UPI00148F070C...\nVLSPAD...", "format": "a3m" } } }, "B": { "Uniref30_2302": { "a3m": { "alignment": ">B|-|B\nMHLTPEEKSAVTALWGKV...\n>UniRef100_UPI0008DEA318...\nMHLTPE...", "format": "a3m" } } } }, "metrics": {} }
To access the A3M-formatted alignment for chain A from the Uniref30_2302 database:
alignments_by_chain["A"]["Uniref30_2302"]["a3m"]["alignment"]
metrics(dictionary, optional): Contains information about the response useful for debugging and measuring performance. May be empty or null.
Note
The output format for paired MSA search is always A3M.
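For an unpack=True response, the per-chain alignments can be collected as sketched below. The data is the truncated example response from this section, so the sequences are placeholders rather than real alignments; chain_alignments and count_rows are hypothetical helper names.

```python
def chain_alignments(response, database, fmt="a3m"):
    """Return {chain_id: alignment_text} for one database (unpack=True layout)."""
    result = {}
    for chain_id, by_database in response["alignments_by_chain"].items():
        result[chain_id] = by_database[database][fmt]["alignment"]
    return result


def count_rows(alignment_text):
    """Count alignment rows by counting header lines, which start with '>'."""
    return sum(1 for line in alignment_text.splitlines() if line.startswith(">"))


# Truncated example response from this section ("..." is kept as-is).
example_response = {
    "alignments_by_chain": {
        "A": {"Uniref30_2302": {"a3m": {
            "alignment": ">A|-|A\nVLSPADKTNVKAAWGKV...\n>UniRef100_UPI00148F070C...\nVLSPAD...",
            "format": "a3m",
        }}},
        "B": {"Uniref30_2302": {"a3m": {
            "alignment": ">B|-|B\nMHLTPEEKSAVTALWGKV...\n>UniRef100_UPI0008DEA318...\nMHLTPE...",
            "format": "a3m",
        }}},
    },
    "metrics": {},
}

per_chain = chain_alignments(example_response, "Uniref30_2302")
row_counts = {chain_id: count_rows(text) for chain_id, text in per_chain.items()}
print(row_counts)
```

This sketch does not handle unpack=False responses, where the single “all_chains” entry would need to be split on the null-byte separators.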
Get Database Configuration#
Endpoint path: /biology/colabfold/msa-search/config/msa-database-configs
Request type: GET
Input Parameters#
None.
Outputs#
configs(dictionary): A nested dictionary containing information about all configured MSA databases. The configuration can be converted to YAML format to view the original database configuration.
Get MMSeqs2 Version#
Endpoint path: /biology/colabfold/msa-search/mmseqs2/version
Request type: GET
Input Parameters#
None.
Outputs#
Returns version information for the MMSeqs2 installation used by the NIM.
Note
Use this endpoint to get the exact MMSeqs2 version when you need to create custom database indices. Custom indices must be created with the same MMSeqs2 version as the one running in the NIM to ensure compatibility.
Health Endpoints#
Readiness Check#
Endpoint path: /v1/health/ready
Request type: GET
Description: Checks if the service is ready to handle requests.
Outputs#
Status code 200: Service is ready
Status code 503: Service is not ready
Response includes a JSON object with:
message (string): Status message
object (string): Always "health.response"
status (string, optional): Status string for backwards compatibility
Liveness Check#
Endpoint path: /v1/health/live
Request type: GET
Description: Checks if the service is live (running).
Outputs#
Status code 200: Service is live
Status code 503: Service is not live
Response format is the same as the readiness check.
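A common use of the readiness endpoint is to wait for the service before sending search requests. The following sketch polls until a 200 status is seen or a time limit passes. The function names, the polling interval, the time limit, and the localhost:8000 address are assumptions; only the endpoint path and the 200/503 status codes come from this reference.

```python
import time
import urllib.error
import urllib.request

# Endpoint path from this reference; host and port assume a local deployment.
READY_URL = "http://localhost:8000/v1/health/ready"


def http_status(url, timeout=5):
    """Return the HTTP status code for a GET, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # for example 503 while the service is starting
    except OSError:
        return None  # connection refused, DNS failure, or timeout


def wait_until_ready(get_status, max_wait=300, interval=5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns 200; give up after max_wait seconds."""
    deadline = clock() + max_wait
    while True:
        if get_status() == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A typical call would be wait_until_ready(lambda: http_status(READY_URL)). The status getter is passed in as a callable so the loop can also be exercised without a running service.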
NIM Metadata Endpoints#
Version#
Endpoint path: /v1/version
Request type: GET
Description: Returns version information for the NIM.
Outputs#
release (string): The product release version of the NIM
api (string): The server API version running inside the NIM
License#
Endpoint path: /v1/license
Request type: GET
Description: Returns license information for the NIM.
Outputs#
name (string): The name of the license
path (string): The filepath within the container containing the license content
sha (string): SHA1 hash of the license contents
size (integer): Number of characters in the license content
url (string): URL where the license is hosted externally
type (string): Always "file"
content (string): The full license text
Metadata#
Endpoint path: /v1/metadata
Request type: GET
Description: Returns comprehensive metadata about the NIM deployment.
Outputs#
assetInfo (list[string]): Required container assets excluding model artifacts
licenseInfo (LicenseEndpointModel): License information
modelInfo (list[ModelInfo]): Information about models being served
repository_override (string): Alternate location for retrieving artifacts
version (string): NIM service version
selectedModelProfileId (string): ID of the currently selected model profile
Manifest#
Endpoint path: /v1/manifest
Request type: GET
Description: Returns the manifest file describing required model artifacts.
Outputs#
manifest_file (string): Content of the manifest file
repository_override (string): Alternate location for retrieving artifacts
Metrics#
Endpoint path: /v1/metrics
Request type: GET
Description: Exposes Prometheus metrics for monitoring.
Outputs#
Returns metrics in Prometheus format.