API Reference#

This documentation contains the API reference for the MSA Search NIM.

OpenAPI Specification#

You can download the complete OpenAPI Specification for the MSA Search NIM here.

Predict a Multiple Sequence Alignment Search#

Endpoint path: /biology/colabfold/msa-search/predict

Request type: POST

Input parameters#

sequence (string): Required. An input string representing a protein sequence and composed of valid IUPAC amino acid single-letter codes.
databases (list[string]): Optional (default: all available databases). The names of the databases to search.
e_value (float): Optional. The e-value threshold. Sequences with an e-value greater than this are not included in the MSA.
iterations (int): Optional. The number of search iterations. Running more iterations produces more sensitive results.
output_alignment_formats (list[string]): A list of output alignment formats, which can contain a3m, fasta, or both.
max_msa_sequences (int): Optional. The maximum number of sequences included in the output MSA for each database. Note: This parameter is overridden by the NIM_GLOBAL_MAX_MSA_SEQUENCES environment variable, and the NIM limits the number of sequences to 10,000.
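
The parameters above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the NIM: build_payload is a hypothetical helper, the set of accepted letters is an assumption (only the 20 standard amino acid codes; the service may accept more), and capping max_msa_sequences at 10,000 simply mirrors the limit noted above.

```python
# Assumption: only the 20 standard amino acid letters are accepted here;
# the service itself may also accept ambiguity codes such as X, B, or Z.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MAX_MSA_SEQUENCES_LIMIT = 10_000  # limit stated in the parameter list


def build_payload(sequence, databases=None, e_value=None, iterations=None,
                  output_alignment_formats=None, max_msa_sequences=None):
    """Hypothetical helper: validate inputs and build the request body."""
    sequence = sequence.strip().upper()
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if not sequence or invalid:
        raise ValueError(f"Invalid sequence; unexpected characters: {invalid}")

    # Optional parameters are only sent when the caller sets them,
    # so the service defaults apply otherwise.
    payload = {"sequence": sequence}
    if databases is not None:
        payload["databases"] = list(databases)
    if e_value is not None:
        payload["e_value"] = float(e_value)
    if iterations is not None:
        payload["iterations"] = int(iterations)
    if output_alignment_formats is not None:
        unknown = set(output_alignment_formats) - {"a3m", "fasta"}
        if unknown:
            raise ValueError(f"Unsupported alignment formats: {sorted(unknown)}")
        payload["output_alignment_formats"] = list(output_alignment_formats)
    if max_msa_sequences is not None:
        payload["max_msa_sequences"] = min(int(max_msa_sequences),
                                           MAX_MSA_SEQUENCES_LIMIT)
    return payload
```

The returned dictionary can be passed to json.dumps and sent as the request body, in the same way as the data dictionary in the Python example for this endpoint.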

Outputs#

alignments (Dictionary[string, Dictionary[string, Alignment Record]]): This structure holds a dictionary whose keys are the names of the searched databases. The value for each database name is a dictionary that maps each requested file format to an output Alignment Record. Each Alignment Record contains the raw alignment data and a format field. The following is an example of accessing the A3M-formatted alignment for the Uniref30_2302 database:

Example#

Curl

#!/bin/bash

# Define protein sequence
SEQUENCE="MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"

# Create JSON payload
JSON='{
"sequence": "'"$SEQUENCE"'",
"e_value": 0.0001,
"iterations": 1,
"databases": ["Uniref30_2302", "colabfold_envdb_202108", "PDB70_220313"],
"search_type": "alphafold2",
"output_alignment_formats": ["fasta", "a3m"],
"max_msa_sequences": 1000
}'

# Make request
echo "Making request..."
curl -s -X POST \
-H "Content-Type: application/json" \
-d "$JSON" \
http://localhost:8000/biology/colabfold/msa-search/predict

Python

import requests
import json


if __name__ == "__main__":
    sequence = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"  # Replace with your sequence value of interest
    headers = {
        "content-type": "application/json"
    }
    data = {
        "sequence": sequence,
        "e_value": 0.0001,
        "iterations": 1,
        "databases": ["Uniref30_2302", "colabfold_envdb_202108", "PDB70_220313"],
        "search_type": "alphafold2",
        "output_alignment_formats": ["fasta", "a3m"],
        "max_msa_sequences": 1000
    }
    print("Making request...")
    response = requests.post("http://localhost:8000/biology/colabfold/msa-search/predict", headers=headers, data=json.dumps(data))
    response.raise_for_status()  ## Stop here if the request failed
    ## Get the A3M-formatted record
    record = response.json()["alignments"]["Uniref30_2302"]["a3m"]
    print(record["format"]) ## prints "a3m"
    print(record["alignment"]) ## prints the full A3M-formatted alignment file, which may be very large.

templates (dictionary): Always returned, but may be empty. This field is intended to hold structural templates and is maintained for compatibility with the OpenFold NIM API.
metrics (dictionary): Always returned, but may be empty. This dictionary contains information about the response that may be useful for debugging and measuring performance.
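
Because the alignments output is nested by database and then by format, a small walker can help when inspecting a response. The sketch below is illustrative only: summarize_alignments and count_alignment_sequences are hypothetical helpers, and they assume each Alignment Record exposes its text under the alignment key, as in the Python example above. The count relies on FASTA and A3M records each starting with a ">" header line.

```python
def count_alignment_sequences(alignment_text):
    """Count records in FASTA or A3M text; each record starts with '>'."""
    return sum(1 for line in alignment_text.splitlines() if line.startswith(">"))


def summarize_alignments(response_body):
    """Return {(database, format): sequence_count} for a predict response."""
    summary = {}
    for database, records in response_body.get("alignments", {}).items():
        for fmt, record in records.items():
            summary[(database, fmt)] = count_alignment_sequences(record["alignment"])
    return summary


if __name__ == "__main__":
    # Made-up response body that only mimics the documented shape.
    example = {
        "alignments": {
            "Uniref30_2302": {
                "a3m": {"format": "a3m", "alignment": ">query\nMALW\n>hit_1\nMA-W\n"},
            }
        },
        "templates": {},
        "metrics": {},
    }
    print(summarize_alignments(example))
```

With a real response, pass response.json() to summarize_alignments in place of the made-up dictionary.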

Get the configured databases#

Endpoint path: /biology/colabfold/msa-search/config/msa-database-configs

Request type: GET

Input parameters#

None: This endpoint takes no inputs.

Outputs#

configs (dictionary): Always returned, except on error. This is a nested dictionary that contains information about all of the configured MSA databases. This can be converted to YAML to see the original configuration file.
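
The YAML conversion mentioned above could look like the following sketch. It assumes the third-party PyYAML package, which this reference does not list as a requirement, and falls back to indented JSON when PyYAML is not installed. configs_to_text is a hypothetical helper name, and the dictionary in the usage block is made up for illustration and does not reflect the real configuration schema.

```python
import json


def configs_to_text(configs):
    """Render the configs dictionary as YAML, or as JSON if PyYAML is absent."""
    try:
        import yaml  # third-party PyYAML; an assumption, not required by the NIM
    except ImportError:
        return json.dumps(configs, indent=2)
    return yaml.safe_dump(configs, default_flow_style=False)


if __name__ == "__main__":
    # Made-up configuration used only to show the call.
    example_configs = {"Uniref30_2302": {"note": "hypothetical entry"}}
    print(configs_to_text(example_configs))
```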

Example#

Curl

#!/bin/bash

echo "Making request..."

# Make the GET request; this endpoint takes no inputs
# Note: This script requires jq to be installed for JSON processing
curl -s -X GET \
-H "Content-Type: application/json" \
http://localhost:8000/biology/colabfold/msa-search/config/msa-database-configs | jq .

Python

import requests
import json


if __name__ == "__main__":
    headers = {
        "content-type": "application/json"
    }
    print("Making request...")
    response = requests.get("http://localhost:8000/biology/colabfold/msa-search/config/msa-database-configs", headers=headers)
    print(json.dumps(response.json(), indent=4))

Readiness check#

Endpoint path: /v1/health/ready

Request type: GET

Input parameters#

None.

Outputs#

The endpoint returns a JSON response with a value that indicates the readiness of the microservice. When the NIM is ready, the endpoint responds with HTTP status code 200.
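
A client that starts alongside the NIM may need to wait until the readiness endpoint reports status 200 before sending requests. The following is a minimal polling sketch under stated assumptions: wait_until_ready and http_ready_check are hypothetical helpers, and the default timeout and polling interval are arbitrary values rather than recommendations from this reference.

```python
import time


def wait_until_ready(check, timeout_s=300.0, interval_s=5.0, sleep=time.sleep):
    """Call check() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if check():
                return True
        except Exception:
            # Treat connection errors as "not ready yet" while the service starts.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def http_ready_check(url="http://localhost:8000/v1/health/ready"):
    """Return True when the readiness endpoint answers with status 200."""
    import requests  # same client library as the other examples
    return requests.get(url, timeout=10).status_code == 200


# Usage (requires a running NIM):
#     if wait_until_ready(http_ready_check):
#         print("NIM is ready")
```

Passing the check as a callable keeps the polling loop independent of the HTTP client, so it can be exercised without a running service.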

Example#

Curl

#!/bin/bash
URL=${NIM_URL:-"http://localhost:8000/v1/health/ready"}
curl -s -w "\nStatus code: %{http_code}\n" -H "Content-Type: application/json" "$URL"

Python

import requests
import os

if __name__ == "__main__":
    url = os.environ.get("NIM_URL", "http://localhost:8000/v1/health/ready")
    headers = {
        "content-type": "application/json"
    }
    try:
        response = requests.get(url, headers=headers)
        print(f"NIM readiness check returned {response.status_code}")
        assert response.status_code == 200, f"Unexpected status code: {response.status_code}"
    except Exception as e:
        print(f"Health query failed: {e}")