Getting Started#

Prerequisites#

Before you begin, ensure you have completed the setup described in the Prerequisites and Support Matrix, including:

  • Compatible hardware and software

  • NGC account and API key

  • Docker authentication with NGC

  • Python 3 with the requests module (for running examples)

Starting the NIM Container#

  1. Ensure you have logged in to Docker and set your NGC_API_KEY environment variable as described in the Prerequisites.

  2. Create a local cache directory for the NIM.

export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
sudo chmod 0777 -R "$LOCAL_NIM_CACHE"

  3. Start the NIM container:

docker run -it --rm \
  --runtime=nvidia \
  --gpus all \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/colabfold/msa-search:2

Note

The first time you start the container, it downloads approximately 1.4 TB of database files to the local cache. This can take several hours, depending on your internet connection speed. Subsequent starts reuse the cached data.

  4. Confirm the service is ready to respond to inference requests:

curl http://localhost:8000/v1/health/ready
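
Because the first start can spend hours downloading databases, a script may need to wait for readiness rather than check once. The following is a minimal sketch, not part of the NIM tooling. The names `http_probe` and `wait_until_ready` and their parameters are hypothetical, and the sketch assumes the health endpoint answers with HTTP 200 once the service is ready.

```python
import time
import urllib.error
import urllib.request

# Readiness endpoint from the step above.
READY_URL = "http://localhost:8000/v1/health/ready"


def http_probe(url=READY_URL):
    """Return True if the endpoint answers with HTTP 200.

    Assumption: a 200 status means the service is ready.
    """
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(probe=http_probe, timeout_s=600.0, interval_s=10.0, sleep=time.sleep):
    """Call probe() until it returns True or timeout_s seconds have elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready(timeout_s=6 * 3600)` would poll every ten seconds for up to six hours, which may suit a first start. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running container.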

MSA Search Examples#

The following examples demonstrate how to use the MSA Search NIM to perform multiple sequence alignment on protein sequences.

For comprehensive descriptions of all available API parameters, see the API Reference. For information on assessing model performance, see Performance.

Python Client Example#

The following example shows how to search for similar sequences and generate a multiple sequence alignment.

  1. Save the following Python example to a file named nim_client.py:

#!/usr/bin/env python3
"""Query the MSA Search NIM and preview the returned alignments."""

from textwrap import indent

import requests

url = "http://localhost:8000/biology/colabfold/msa-search/predict"

# Request body: the query sequence plus search options.
payload = {
    "sequence": "SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR",
    "e_value": 0.0001,
    "iterations": 1,
    "output_alignment_formats": ["a3m", "fasta"],
}

response = requests.post(url, json=payload)
response.raise_for_status()  # Stop early on an HTTP error status.

# Show the start of the raw response body.
print(response.text[:100], "...")

result = response.json()
print("Response keys:", list(result.keys()), "\n")
print("Alignments by database:\n")

# Response layout: alignments -> database -> format -> alignment object.
for db, formats in result["alignments"].items():
    print(" ", db)
    for fmt, alignment_obj in formats.items():
        aln = alignment_obj["alignment"]
        print("   ", alignment_obj["format"], "lines:", len(aln.split("\n")))
        print(indent(aln[:300] + "...", "        | "))

  2. Execute the example:

chmod +x nim_client.py
./nim_client.py

The example will display the alignment results for each database, showing the format and a preview of the alignment content.
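
The preview above counts raw lines, which is not the same as counting sequences. As a rough sketch, the alignment text can be split into records, assuming each entry starts with a FASTA-style `>` header line (true of both the a3m and FASTA formats). The helper name `parse_records` is hypothetical, and the sample alignment is made up for illustration.

```python
def parse_records(alignment_text):
    """Split FASTA-style text into a list of (header, sequence) tuples.

    Sequence lines that follow a header are concatenated; any lines that
    appear before the first header are skipped.
    """
    records = []
    header = None
    chunks = []
    for line in alignment_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up two-entry alignment, not real NIM output.
sample = ">query\nSGSMK\n>hit_1\nSG-\nMK\n"
records = parse_records(sample)
print(len(records), "records")
```

In the client script, the same helper could be applied to each `alignment_obj["alignment"]` string to report the number of aligned sequences per database and format.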

Shell Client Example#

You can also use curl to send requests directly:

curl http://localhost:8000/biology/colabfold/msa-search/predict \
  -H "Content-Type: application/json" \
  -d '{
    "sequence": "SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR",
    "e_value": 0.0001,
    "iterations": 1,
    "output_alignment_formats": ["a3m", "fasta"]
  }' -o response.json && sed 's/\\n/\n/g' response.json | head -n 25 && echo "...trimmed..."

Paired MSA Search for Protein Complexes#

The MSA Search NIM also supports paired MSA search for protein complexes (multimers). Paired search finds homologous sequences for each chain and pairs them by species, preserving co-evolutionary signals essential for accurate complex structure prediction.
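
To make the idea of pairing by species concrete, here is a toy illustration. It is a conceptual sketch only and does not reproduce how the NIM performs pairing. It assumes one hit per species per chain, and the function name `pair_by_species`, the species labels, and the sequences are all made up.

```python
def pair_by_species(hits_a, hits_b):
    """Return (species, seq_a, seq_b) rows for species that have a hit in both chains.

    hits_a and hits_b map a species label to that species' hit for one chain.
    Species found for only one chain are dropped, so both chains end up with
    the same number of rows, in the same order.
    """
    return [(species, hits_a[species], hits_b[species])
            for species in hits_a if species in hits_b]


# Made-up hits for two chains of a complex.
chain_a_hits = {"human": "VLSP", "mouse": "VLSG", "yeast": "VMSP"}
chain_b_hits = {"mouse": "MHLS", "human": "MHLT", "zebrafish": "MHLA"}

for species, seq_a, seq_b in pair_by_species(chain_a_hits, chain_b_hits):
    print(species, seq_a, seq_b)
```

Only species present for both chains survive, so row i of one chain's alignment and row i of the other's come from the same organism. That shared row order is what lets a structure predictor read co-evolution across chains.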

Python Client Example (Paired)#

Follow the example below to perform a paired MSA search for a two-chain protein complex.

  1. Save the following Python example to a file named nim_client_paired.py:

#!/usr/bin/env python3
"""Run a paired MSA search and save each chain's alignments to disk."""

import json
from pathlib import Path
from textwrap import indent

import requests

url = "http://localhost:8000/biology/colabfold/msa-search/paired/predict"

# Hemoglobin alpha and beta chains (truncated for this example)
seq1 = "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVA"
seq2 = "MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPYTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAF"

response = requests.post(
    url,
    json={
        "sequences": [seq1, seq2],
        "e_value": 0.0001,
    },
)
response.raise_for_status()  # Stop early on an HTTP error status.

print(response.text[:100], "...")
result = response.json()

# Keep the full response on disk for later inspection.
out_dir = Path("msa_outputs_paired")
out_dir.mkdir(exist_ok=True)
(out_dir / "response.json").write_text(json.dumps(result, indent=4, sort_keys=True))
print(f"Saved response to {out_dir / 'response.json'}\n")

print("Response keys:", list(result.keys()), "\n")
print("Alignments by chain:\n")

# Response layout: alignments_by_chain -> chain -> database -> format -> alignment object.
for chain_id, alignments in result["alignments_by_chain"].items():
    print(f" Chain {chain_id}:")
    for db, formats in alignments.items():
        print(f"   {db}")
        for fmt, alignment_obj in formats.items():
            aln = alignment_obj["alignment"]
            fmt_name = alignment_obj["format"]
            print("     ", fmt_name, "lines:", len(aln.split("\n")))
            print(indent(aln[:300] + "...", "          | "))

            # Write one file per chain, database, and format.
            out_file = out_dir / f"{chain_id}_{db}.{fmt_name}"
            out_file.write_text(aln)
            print(f"          Saved to {out_file}")

  2. Execute the example:

chmod +x nim_client_paired.py
./nim_client_paired.py

Shell Client Example (Paired)#

You can also use curl to send paired MSA requests:

curl http://localhost:8000/biology/colabfold/msa-search/paired/predict \
  -H "Content-Type: application/json" \
  -d '{
    "sequences": [
        "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVA",
        "MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPYTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAF"
    ],
    "e_value": 0.0001
  }' -o response.json && sed 's/\\n/\n/g' response.json | head -n 25 && echo "...trimmed..."

The response contains alignments_by_chain with paired alignments for each chain (for example, “A”, “B”), where each chain has the same number of sequences paired by species.
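
The equal-count property can be checked on a saved response. The sketch below assumes the layout the paired client iterates over (chain, then database, then format, then an object with an `alignment` string) and counts `>` header lines as sequences. The function names and the sample data, including the database name `example_db`, are hypothetical.

```python
def sequences_per_chain(response):
    """Map (chain, database, format) to the number of '>' header lines."""
    counts = {}
    for chain_id, dbs in response["alignments_by_chain"].items():
        for db, formats in dbs.items():
            for fmt, obj in formats.items():
                n = sum(1 for line in obj["alignment"].splitlines()
                        if line.startswith(">"))
                counts[(chain_id, db, fmt)] = n
    return counts


def chains_are_paired(response):
    """Return True if, for each (database, format), all chains have the same count."""
    by_key = {}
    for (chain_id, db, fmt), n in sequences_per_chain(response).items():
        by_key.setdefault((db, fmt), set()).add(n)
    return all(len(counts) == 1 for counts in by_key.values())


# Made-up response fragment in the assumed layout, not real NIM output.
sample_response = {
    "alignments_by_chain": {
        "A": {"example_db": {"a3m": {"format": "a3m",
                                     "alignment": ">q\nVLSP\n>h1\nVLSG\n"}}},
        "B": {"example_db": {"a3m": {"format": "a3m",
                                     "alignment": ">q\nMHLT\n>h1\nMHLS\n"}}},
    }
}
print(chains_are_paired(sample_response))
```

To run the check on real output, load `response.json` with `json.load` and pass the resulting dictionary to `chains_are_paired`.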

Next Steps#

  • Performance - View benchmarking results and learn how to run your own performance tests

  • Configuration - Configure environment variables, GPU selection, and volume mounting

  • Optimization and Scaling - Learn about scaling strategies for production deployments

  • API Reference - Reference the comprehensive API documentation with all available parameters