Getting Started
Prerequisites
Before you begin, ensure you have completed the setup described in the Prerequisites and Support Matrix, including:
Compatible hardware and software
An NGC account and API key
Docker authentication with NGC
Python 3 with the requests module (for running the examples)
Starting the NIM Container
Ensure you have logged in to Docker and set your NGC_API_KEY environment variable as described in the Prerequisites.
Create a local cache directory for the NIM:
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
sudo chmod 0777 -R "$LOCAL_NIM_CACHE"
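The first start downloads roughly 1.4 TB of databases into this cache directory (see the Note after the docker run command), so it may be worth confirming there is enough free space before starting the container. The following is a minimal sketch using only the Python standard library. The helper name has_enough_space is hypothetical, and the 1.4 TB threshold is taken from that Note and treated as decimal terabytes; the actual on-disk footprint may differ.

```python
import os
import shutil
from pathlib import Path

# Approximate database size quoted in this guide, as decimal bytes (assumption: 1 TB = 10**12 bytes).
REQUIRED_BYTES = int(1.4e12)


def has_enough_space(cache_dir: str, required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding cache_dir has at least required_bytes free."""
    path = Path(cache_dir).expanduser()
    # shutil.disk_usage needs an existing path, so walk up to the nearest existing parent.
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # Mirrors the shell step above: LOCAL_NIM_CACHE defaults to ~/.cache/nim.
    cache = os.environ.get("LOCAL_NIM_CACHE", "~/.cache/nim")
    if has_enough_space(cache):
        print(f"OK: at least 1.4 TB free for {cache}")
    else:
        print(f"Warning: less than 1.4 TB free for {cache}")
```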
Start the NIM container:
docker run -it --rm \
    --runtime=nvidia \
    --gpus all \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -p 8000:8000 \
    nvcr.io/nim/colabfold/msa-search:2
Note
The first time you start the container, it will download approximately 1.4 TB of database files to the local cache. This process may take several hours depending on your internet connection speed. Subsequent starts will use the cached data.
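To turn "several hours" into a rough number for your own link, the transfer time is the download size in bits divided by the sustained bandwidth. The sketch below assumes decimal terabytes, a steady sustained rate, and no protocol or extraction overhead, so treat the result as an optimistic estimate; the function name is illustrative only.

```python
def estimated_download_hours(size_tb: float, link_mbps: float) -> float:
    """Hours to transfer size_tb (decimal TB) at a sustained link_mbps (megabits per second)."""
    size_bits = size_tb * 1e12 * 8      # TB -> bytes -> bits
    rate_bits_per_s = link_mbps * 1e6   # Mbit/s -> bit/s
    return size_bits / rate_bits_per_s / 3600  # seconds -> hours


if __name__ == "__main__":
    for mbps in (100, 1000):
        print(f"{mbps} Mbit/s: ~{estimated_download_hours(1.4, mbps):.1f} h")
```

For 1.4 TB this works out to roughly 3.1 hours at a sustained 1 Gbit/s and roughly 31 hours at 100 Mbit/s.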
Confirm the service is ready to respond to inference requests:
curl http://localhost:8000/v1/health/ready
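Because the first start can take hours, a script may want to wait for the readiness endpoint instead of checking it once. The following sketch polls the same /v1/health/ready URL with the standard library and treats any HTTP 200 as ready. The function name, the 30-second interval, and the 6-hour default timeout are assumptions chosen for illustration, not values from the NIM.

```python
import time
import urllib.request

READY_URL = "http://localhost:8000/v1/health/ready"


def wait_until_ready(url: str = READY_URL, timeout: float = 6 * 3600, interval: float = 30) -> bool:
    """Poll url until it answers HTTP 200 (return True) or timeout seconds pass (return False)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError (e.g. 503 while loading), refused connections and
            # socket timeouts are all OSError subclasses: the service is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Call wait_until_ready() before sending the first request; it returns False rather than raising if the service does not come up in time.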
MSA Search Examples
The following examples demonstrate how to use the MSA Search NIM to perform multiple sequence alignment on protein sequences.
For comprehensive descriptions of all available API parameters, see the API Reference. For information on assessing model performance, see Performance.
Python Client Example
The following example shows how to search for similar sequences and generate a multiple sequence alignment.
Save the following Python example to a file named nim_client.py:
#!/usr/bin/env python3
from textwrap import indent

import requests

url = "http://localhost:8000/biology/colabfold/msa-search/predict"

# Search a single protein sequence and request both A3M and FASTA output.
r = requests.post(
    url=url,
    json={
        "sequence": "SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR",
        "e_value": 0.0001,
        "iterations": 1,
        "output_alignment_formats": ["a3m", "fasta"],
    },
)
r.raise_for_status()  # stop on HTTP errors instead of failing later in r.json()
print(r.text[:100], "...")

r = r.json()
print("Response keys:", list(r.keys()), "\n")

# r["alignments"] maps each database to its alignments, keyed by output format.
print("Alignments by dbs:\n")
for db, formats in r["alignments"].items():
    print("  ", db)
    for fmt, alignment_obj in formats.items():
        aln = alignment_obj["alignment"]
        print("    ", alignment_obj["format"], "lines:", len(aln.split("\n")))
        # Preview the first 300 characters of each alignment.
        print(indent(aln[:300] + "...", "      | "))
Execute the example:
chmod +x nim_client.py
./nim_client.py
The example will display the alignment results for each database, showing the format and a preview of the alignment content.
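To go beyond a text preview, the a3m output can be parsed. In the standard A3M convention (as used by HH-suite and ColabFold), the file is FASTA-like, the first record is the query, uppercase letters and "-" occupy query columns, and lowercase letters are insertions relative to the query. The sketch below assumes the NIM's a3m output follows that convention; skipping "#" lines is a defensive assumption, and both helper names are hypothetical.

```python
def parse_a3m(text: str) -> list[tuple[str, str]]:
    """Split A3M text into (header, sequence) records; the first record is the query."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and optional '#' metadata lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_query_columns(seq: str) -> str:
    """Drop lowercase insertions (and any '.' gap padding) so the row matches query columns."""
    return "".join(c for c in seq if not (c.islower() or c == "."))
```

For example, len(parse_a3m(aln)) - 1 gives the number of hits in one a3m alignment printed by nim_client.py, and applying to_query_columns to every row yields rows of equal length.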
Shell Client Example
You can also use curl to send requests directly:
curl http://localhost:8000/biology/colabfold/msa-search/predict \
    -H "Content-Type: application/json" \
    -d '{
        "sequence": "SGSMKTAISLPDETFDRVSRRASELGMSRSEFFTKAAQR",
        "e_value": 0.0001,
        "iterations": 1,
        "output_alignment_formats": ["a3m", "fasta"]
    }' -o response.json && sed 's/\\n/\n/g' response.json | head -n 25 && echo "...trimmed..."
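The sed one-liner expands the escaped "\n" sequences in the raw JSON, which relies on GNU sed and may not behave the same with BSD/macOS sed. A more portable option is to load response.json and print each alignment directly. This sketch assumes the same response layout used by nim_client.py (alignments, then database, then format, with "format" and "alignment" fields); preview_alignments is a hypothetical helper.

```python
import json
from pathlib import Path


def preview_alignments(response: dict, max_lines: int = 25) -> str:
    """Render a header plus the first max_lines lines of every alignment in the response."""
    out = []
    for db, formats in response["alignments"].items():
        for alignment_obj in formats.values():
            lines = alignment_obj["alignment"].splitlines()
            out.append(f"== {db} ({alignment_obj['format']}, {len(lines)} lines) ==")
            out.extend(lines[:max_lines])
    return "\n".join(out)


if __name__ == "__main__":
    path = Path("response.json")  # written by the curl command above
    if path.exists():
        print(preview_alignments(json.loads(path.read_text())))
```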
Paired MSA Search for Protein Complexes
The MSA Search NIM also supports paired MSA search for protein complexes (multimers). Paired search finds homologous sequences for each chain and pairs them by species, preserving co-evolutionary signals essential for accurate complex structure prediction.
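The pairing idea can be illustrated with a toy example. This is a simplified sketch, not the NIM's implementation: it assumes each hit for each chain already carries a species label (real pipelines typically derive this from taxonomy information in the database headers), keeps the best-ranked hit per species for each chain, and emits one paired row for every species that has a hit in all chains.

```python
def pair_by_species(hits_per_chain: list[list[tuple[str, str]]]) -> list[list[str]]:
    """Toy species pairing: one row per species that has a hit in every chain.

    hits_per_chain[i] lists (species, hit_id) for chain i, ordered best hit first.
    Each returned row holds one hit_id per chain, in chain order.
    """
    if not hits_per_chain:
        return []
    # For each chain, keep only the best-ranked (first-seen) hit per species.
    best: list[dict[str, str]] = []
    for hits in hits_per_chain:
        per_species: dict[str, str] = {}
        for species, hit_id in hits:
            per_species.setdefault(species, hit_id)
        best.append(per_species)
    # Species shared by all chains, in the first chain's ranking order.
    shared = [s for s in best[0] if all(s in other for other in best[1:])]
    return [[per_chain[s] for per_chain in best] for s in shared]
```

Every row contains exactly one hit per chain, which is why paired alignments for all chains end up with the same depth. Hits whose species has no partner in the other chains are simply dropped in this sketch.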
Python Client Example (Paired)
Follow the example below to perform a paired MSA search for a two-chain protein complex.
Save the following Python example to a file named nim_client_paired.py:
#!/usr/bin/env python3
import json
from pathlib import Path
from textwrap import indent

import requests

url = "http://localhost:8000/biology/colabfold/msa-search/paired/predict"

# Hemoglobin alpha and beta chains (truncated for example)
seq1 = "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVA"
seq2 = "MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPYTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAF"

r = requests.post(
    url=url,
    json={
        "sequences": [seq1, seq2],
        "e_value": 0.0001,
    },
)
r.raise_for_status()  # stop on HTTP errors instead of failing later in r.json()
print(r.text[:100], "...")
r = r.json()

# Save the full response for later inspection.
Path("msa_outputs_paired").mkdir(exist_ok=True)
Path("msa_outputs_paired/response.json").write_text(json.dumps(r, indent=4, sort_keys=True))
print("Saved response to msa_outputs_paired/response.json\n")

print("Response keys:", list(r.keys()), "\n")

# r["alignments_by_chain"] maps each chain ID to databases, then to output formats.
print("Alignments by chain:\n")
for chain_id, alignments in r["alignments_by_chain"].items():
    print(f"  Chain {chain_id}:")
    for db, formats in alignments.items():
        print(f"    {db}")
        for fmt, alignment_obj in formats.items():
            aln = alignment_obj["alignment"]
            fmt_name = alignment_obj["format"]
            print("      ", fmt_name, "lines:", len(aln.split("\n")))
            print(indent(aln[:300] + "...", "        | "))
            # Write each alignment to msa_outputs_paired/<chain>_<db>.<format>
            f = Path(f"msa_outputs_paired/{chain_id}_{db}.{fmt_name}")
            f.write_text(aln)
            print(f"      Saved to {f}")
Execute the example:
chmod +x nim_client_paired.py
./nim_client_paired.py
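If your complex is stored as a multi-record FASTA file rather than hard-coded strings, the request body can be built from it. This sketch assumes one FASTA record per chain, sent in file order, and reuses only the "sequences" and "e_value" fields shown in the example above; the helper names are hypothetical, and the two-chain minimum is a check made by this sketch, not a documented API rule.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) records, joining wrapped sequence lines."""
    records = []
    for block in text.split(">")[1:]:  # anything before the first '>' is ignored
        header, _, body = block.partition("\n")
        records.append((header.strip(), "".join(body.split())))
    return records


def paired_payload(fasta_text: str, e_value: float = 0.0001) -> dict:
    """Build a paired-search request body with one entry in "sequences" per FASTA record."""
    sequences = [seq for _, seq in read_fasta(fasta_text)]
    if len(sequences) < 2:
        raise ValueError("expected at least two chains for a paired search")
    return {"sequences": sequences, "e_value": e_value}
```

The result can be passed as json= to requests.post in nim_client_paired.py, for example paired_payload(Path("complex.fasta").read_text()), where complex.fasta is a hypothetical input file.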
Shell Client Example (Paired)
You can also use curl to send paired MSA requests:
curl http://localhost:8000/biology/colabfold/msa-search/paired/predict \
    -H "Content-Type: application/json" \
    -d '{
        "sequences": [
            "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVA",
            "MHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPYTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAF"
        ],
        "e_value": 0.0001
    }' -o response.json && sed 's/\\n/\n/g' response.json | head -n 25 && echo "...trimmed..."
The response contains an alignments_by_chain object with a paired alignment for each chain (for example, “A” and “B”). Sequences are paired by species, so every chain's alignment contains the same number of sequences.
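A quick sanity check on a paired response is to count the sequences in each chain's alignment and confirm the counts match per database. This sketch assumes the alignments_by_chain layout used by nim_client_paired.py (chain, then database, then format, with "format" and "alignment" fields), that the same database names appear for every chain, and that an a3m alignment is present with a "format" value of "a3m"; it counts records by their ">" header lines.

```python
def paired_depths(alignments_by_chain: dict, fmt: str = "a3m") -> dict:
    """Return {database: {chain_id: number_of_sequences}} for alignments in format fmt."""
    depths: dict = {}
    for chain_id, dbs in alignments_by_chain.items():
        for db, formats in dbs.items():
            for alignment_obj in formats.values():
                if alignment_obj["format"] != fmt:
                    continue
                lines = alignment_obj["alignment"].splitlines()
                depths.setdefault(db, {})[chain_id] = sum(1 for ln in lines if ln.startswith(">"))
    return depths


def is_consistently_paired(alignments_by_chain: dict, fmt: str = "a3m") -> bool:
    """True if, for every database, all chains report the same number of sequences."""
    per_db = paired_depths(alignments_by_chain, fmt)
    return all(len(set(per_chain.values())) == 1 for per_chain in per_db.values())
```

In nim_client_paired.py this could be called as is_consistently_paired(r["alignments_by_chain"]) after r = r.json().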
Next Steps
Performance - View benchmarking results and learn how to run your own performance tests
Configuration - Configure environment variables, GPU selection, and volume mounting
Optimization and Scaling - Learn about scaling strategies for production deployments
API Reference - Reference the comprehensive API documentation with all available parameters