API Reference
This documentation contains the API reference for the MSA Search NIM.
OpenAPI Specification
You can download the complete OpenAPI Specification for the MSA Search NIM here.
Predict a Multiple Sequence Alignment Search
Endpoint path: /biology/colabfold/msa-search/predict
Request type: post
Input parameters
sequence (string)
: Required. A string representing a protein sequence, composed of valid IUPAC amino acid single-letter codes.
databases (list[string])
: Optional (default: all available databases). The names of the databases to search.
e_value (float)
: Optional. The e-value threshold for the search. Sequences with an e-value greater than this threshold are not included in the MSA.
iterations (int)
: Optional. The number of search iterations. Running more iterations produces more sensitive results.
output_alignment_formats (list[string])
: A list of output alignment formats, which can contain any of a3m or fasta.
max_msa_sequences (int)
: Optional. The maximum number of sequences included in the output MSA of each database. Note: This parameter is overridden by the NIM_GLOBAL_MAX_MSA_SEQUENCES environment variable, and the upper limit on the number of sequences is 10,000.
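The parameters above can be assembled into a request body as in the following minimal sketch. The helper name `build_msa_request` and its client-side checks are illustrative assumptions, not part of the NIM. The service performs its own validation, and the set of residue codes it accepts may be wider than the 20 standard letters assumed here.

```python
import json

# Assumption: the 20 standard IUPAC amino acid one-letter codes. The service
# may accept additional codes; check the OpenAPI specification.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
ALLOWED_FORMATS = {"a3m", "fasta"}


def build_msa_request(sequence, databases=None, e_value=None, iterations=None,
                      output_alignment_formats=None, max_msa_sequences=None):
    """Build a JSON-serializable request body (hypothetical helper)."""
    sequence = sequence.strip().upper()
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if not sequence or invalid:
        raise ValueError(f"invalid sequence; unexpected characters: {invalid}")

    if output_alignment_formats is not None:
        unknown = sorted(set(output_alignment_formats) - ALLOWED_FORMATS)
        if unknown:
            raise ValueError(f"unsupported alignment formats: {unknown}")

    payload = {"sequence": sequence}
    optional = {
        "databases": databases,
        "e_value": e_value,
        "iterations": iterations,
        "output_alignment_formats": output_alignment_formats,
        "max_msa_sequences": max_msa_sequences,
    }
    # Omit unset optional parameters so the service applies its own defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


if __name__ == "__main__":
    body = build_msa_request(
        "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN",
        e_value=0.0001,
        iterations=1,
        output_alignment_formats=["a3m"],
    )
    print(json.dumps(body, indent=2))
```

Leaving unset options out of the payload, rather than sending null values, keeps the request minimal and lets the server-side defaults apply.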
Outputs
alignments (Dictionary[string, Dictionary[string, Alignment Record]])
: A dictionary keyed by search database name. Each value is itself a dictionary that maps an output file format to an Alignment Record. Each Alignment Record contains the raw alignment data and a format field. The following example accesses the A3M-formatted alignment for the Uniref30_2302 database:
Example
#!/bin/bash

# Define protein sequence
SEQUENCE="MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"

# Create JSON payload
JSON='{
  "sequence": "'"$SEQUENCE"'",
  "e_value": 0.0001,
  "iterations": 1,
  "databases": ["Uniref30_2302", "colabfold_envdb_202108", "PDB70_220313"],
  "search_type": "alphafold2",
  "output_alignment_formats": ["fasta", "a3m"],
  "max_msa_sequences": 1000
}'

# Make request
echo "Making request..."
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d "$JSON" \
  http://localhost:8000/biology/colabfold/msa-search/predict

import requests
import json

if __name__ == "__main__":
    sequence = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"  # Replace with your sequence value of interest
    headers = {
        "content-type": "application/json"
    }
    data = {
        "sequence": sequence,
        "e_value": 0.0001,
        "iterations": 1,
        "databases": ["Uniref30_2302", "colabfold_envdb_202108", "PDB70_220313"],
        "search_type": "alphafold2",
        "output_alignment_formats": ["fasta", "a3m"],
        "max_msa_sequences": 1000
    }
    print("Making request...")
    response = requests.post("http://localhost:8000/biology/colabfold/msa-search/predict", headers=headers, data=json.dumps(data))

    # Get the A3M-formatted record
    record = response.json()["alignments"]["Uniref30_2302"]["a3m"]
    print(record["format"])  # prints "a3m"
    print(record["alignment"])  # prints the full A3M-formatted alignment file, which may be very large.
templates (dictionary)
: Always returned, but may be empty. This would contain structural templates. This field is maintained for compatibility with the OpenFold NIM API.
metrics (dictionary)
: Always returned, but may be empty. This dictionary contains information about the response that may be useful for debugging and measuring performance.
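To illustrate the nesting of the alignments output, the following sketch walks the dictionary and counts the records in each alignment. The function name `summarize_alignments` and the sample response are made up for illustration. Counting header lines that start with `>` assumes the usual FASTA/A3M layout, and the shape of real responses should be checked against the OpenAPI specification.

```python
def summarize_alignments(response_json):
    """Return {(database, format): record_count} for a predict response."""
    summary = {}
    for database, by_format in response_json.get("alignments", {}).items():
        for fmt, record in by_format.items():
            text = record.get("alignment", "")
            # Assumption: each sequence record starts with a ">" header line.
            summary[(database, fmt)] = sum(
                1 for line in text.splitlines() if line.startswith(">")
            )
    return summary


if __name__ == "__main__":
    # Made-up response fragment, not real service output.
    sample = {
        "alignments": {
            "Uniref30_2302": {
                "a3m": {"format": "a3m", "alignment": ">query\nMALW\n>hit_1\nMA-W\n"},
            }
        },
        "templates": {},
        "metrics": {},
    }
    for (database, fmt), count in summarize_alignments(sample).items():
        print(f"{database} [{fmt}]: {count} sequences")
```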
Get the configured databases
Endpoint path: /biology/colabfold/msa-search/config/msa-database-configs
Request type: get
Input parameters
None: This endpoint takes no inputs.
Outputs
configs (dictionary)
: Always returned, except on error. This is a nested dictionary that contains information about all of the configured MSA databases. This can be converted to YAML to see the original configuration file.
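The YAML conversion mentioned above could be sketched as follows. This assumes the third-party PyYAML package is installed; if it is not, the sketch falls back to indented JSON. The function name `configs_to_text` and the sample keys are hypothetical, and the real key names come from the service.

```python
import json

try:
    import yaml  # PyYAML, a third-party package (pip install pyyaml)
except ImportError:
    yaml = None


def configs_to_text(configs):
    """Render the configs dictionary as YAML, or as indented JSON without PyYAML."""
    if yaml is not None:
        # sort_keys=False keeps the key order returned by the service.
        return yaml.safe_dump(configs, sort_keys=False, default_flow_style=False)
    return json.dumps(configs, indent=2)


if __name__ == "__main__":
    # Made-up config fragment for illustration only.
    sample = {"configs": {"Uniref30_2302": {"example_key": "example_value"}}}
    print(configs_to_text(sample["configs"]))
```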
Example
#!/bin/bash

# Note: This script requires jq to be installed for JSON processing
echo "Making request..."

# Make the GET request (this endpoint takes no inputs)
response=$(curl -s -X GET \
  -H "Content-Type: application/json" \
  http://localhost:8000/biology/colabfold/msa-search/config/msa-database-configs)

# Pretty-print the configured databases using jq
echo "$response" | jq '.'

import requests
import json

if __name__ == "__main__":
    headers = {
        "content-type": "application/json"
    }
    print("Making request...")
    response = requests.get("http://localhost:8000/biology/colabfold/msa-search/config/msa-database-configs", headers=headers)
    print(json.dumps(response.json(), indent=4))
Readiness check
Endpoint path: /v1/health/ready
Request type: get
Input parameters
None.
Outputs
The endpoint returns a JSON response whose value indicates the readiness of the microservice. When the NIM is ready, the endpoint responds with HTTP status code 200.
Example
#!/bin/bash

# Use NIM_URL if it is set; otherwise fall back to the local default
URL=${NIM_URL:-"http://localhost:8000/v1/health/ready"}
curl -s -w "\nStatus code: %{http_code}\n" -H "Content-Type: application/json" "$URL"

import requests
import os

if __name__ == "__main__":
    # Use NIM_URL if it is set; otherwise fall back to the local default
    url = os.environ.get("NIM_URL", "http://localhost:8000/v1/health/ready")
    headers = {
        "content-type": "application/json"
    }
    try:
        response = requests.get(url, headers=headers)
        print(f"NIM readiness check returned {response.status_code}")
        assert response.status_code == 200, f"Unexpected status code: {response.status_code}"
    except Exception as e:
        print(f"Health query failed: {e}")
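A single readiness check can fail while the NIM is still starting up, so a client may want to poll the endpoint until it returns 200. The following is a minimal sketch of that pattern using only the Python standard library. The function names, retry count, and delay are illustrative assumptions rather than documented behavior of the NIM.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, and so on


def wait_until_ready(url, attempts=30, delay=2.0, probe=http_status, sleep=time.sleep):
    """Poll url until it returns 200; return False after `attempts` tries."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next try, but not after the last one
    return False
```

For example, `wait_until_ready("http://localhost:8000/v1/health/ready")` returns True as soon as the endpoint responds with 200, and False if it has not done so after 30 tries. The `probe` and `sleep` arguments exist so the loop can be exercised without a running service.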