Optimization#
This section details the options available for optimizing the MSA Search NIM. Note that achieving optimal performance depends on many factors and may require settings unique to your individual deployment.
Note
For most users, the default settings of the NIM will provide good performance that balances the throughput, latency, resource utilization, and complexity of using the NIM. We recommend only changing these options after consulting with an expert about your specific use case and any unique performance requirements it may have.
Automatic Profile Selection#
The MSA Search NIM is designed to automatically select the most suitable profile for the detected hardware from the list of available profiles.
Deploying the NIM on a multi-GPU System#
The MSA Search NIM is designed to run with one or more NVIDIA GPUs. When increasing the number of GPUs allocated to the NIM, it is recommended to also increase the number of allocated CPU cores and the amount of RAM. As a rule of thumb, for each additional GPU allocated, also allocate an additional 12 CPU cores and an additional 32 GB of system RAM.
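As an illustrative sketch, this rule of thumb can be applied directly when launching the NIM container. The flags below (--gpus, --cpus, --memory) are standard Docker resource options; the two-GPU figures of 24 CPU cores and 64 GB of RAM are hypothetical values derived from the per-GPU guideline and should be adjusted to your deployment:
export LOCAL_NIM_CACHE=~/.cache/nim
export NGC_API_KEY=<Your NGC API Key>
# Hypothetical two-GPU allocation: 2 x 12 CPU cores and 2 x 32 GB of RAM,
# following the per-GPU rule of thumb above.
docker run --rm --name msa-search --runtime=nvidia \
    --gpus 2 \
    --cpus 24 \
    --memory 64g \
    -e NGC_API_KEY \
    -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
    -p 8000:8000 \
    nvcr.io/nim/colabfold/msa-search:1.0.0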
Using the MMSeqs2 GPU Server with Compatible Databases#
MMSeqs2 supports both GPU-accelerated MSA and the GPU Server. This client-server implementation dramatically reduces the latency associated with MSA queries by intelligently storing and loading the target sequence databases in GPU memory.
Using a Compatible Database#
Note
Currently, databases that support expandable search (including the default ones used by the NIM) do not offer a performance benefit when using the GPU Server. These databases will run at the same performance as when they are not using the GPU Server. For maximum performance, use the GPU Server with a set of databases created using the MMSeqs2 makepaddedseqdb command.
For a database to be compatible with the GPU Server, it must be processed using the MMSeqs2 makepaddedseqdb command and must be indexed accordingly. Currently, only seq databases are supported, which means cascaded searches do not support the GPU Server.
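For reference, the following is a minimal sketch of preparing a GPU Server-compatible database from an existing MMSeqs2 sequence database. The database names are placeholders, and the indexing flags (in particular --index-subset 2, used for GPU-compatible indexes in recent MMSeqs2 releases) should be checked against the MMSeqs2 documentation for your version:
# Convert an existing MMSeqs2 sequence database (placeholder name "targetDB")
# into a padded database that the GPU Server can load.
mmseqs makepaddedseqdb targetDB targetDB_padded
# Index the padded database accordingly; "tmp" is a scratch directory.
mmseqs createindex targetDB_padded tmp --index-subset 2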
When using compatible databases, each GPU Server instance serves a single database and requires its own GPU. A dedicated GPU is also reserved for processing queries when the number of assigned GPUs is less than the number of databases. This means for optimal performance, the following guideline should be observed:
Allocate at least one GPU per database when using the GPU server.
Note
The NIM will still run with fewer than one allocated GPU per database, but without the optimal performance of the GPU Server.
NIM Behavior without the GPU Server#
Note
The GPU Server is disabled by default to ensure optimal performance with databases that support cascaded search.
When running with the GPU Server disabled, the NIM will intelligently assign any available GPU to the next database search. This means that the NIM's performance can scale automatically with the number of GPUs on the system. As a guideline:
When you are not using the GPU Server, performance should increase with each additional GPU allocated to the NIM.
Note
The NIM's performance is also impacted by the number of CPU cores and the amount of system RAM. When allocating additional GPUs, consider allocating at least 12 additional CPU cores and 32 GB of additional RAM to the NIM to ensure good performance scaling.
Enabling the GPU Server#
The GPU Server is disabled by default to ensure compatibility with databases that support cascaded search. The GPU Server can be enabled by setting the NIM_DISABLE_GPU_SERVER environment variable to False. The following is an example of starting the NIM with the GPU Server enabled using Docker:
export LOCAL_NIM_CACHE=~/.cache/nim
export NGC_API_KEY=<Your NGC API Key>
docker run --rm --name msa-search --runtime=nvidia \
    -e NGC_API_KEY \
    -e NIM_DISABLE_GPU_SERVER=False \
    -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
    -p 8000:8000 \
    nvcr.io/nim/colabfold/msa-search:1.0.0
Support for the GPU Server and NIM behavior may change in future versions of the NIM.
Querying the NIM Repeatedly or with Batches of Inputs#
In computational drug discovery, it is common to analyze hundreds, thousands, or even millions of sequences to find candidates with the desired properties. This section details some useful patterns for users who want to analyze more than one input sequence.
Note
The MSA Search NIM has been optimized for a balance of throughput and latency, and the performance of repeated queries may not match published benchmarks due to the complexity of scheduling such workloads in the NIM. Factors such as the underlying hardware and software stack, the number of concurrent users, and system load may impact the latency and throughput of the NIM.
When running repeated queries, the NIM will intelligently handle incoming requests from multiple users in parallel. If you want to run multiple requests in parallel, refer to Using Client-Side Batching to Improve Throughput below.
Running Repeated Queries Against the NIM Serially#
Repeated queries against the NIM can be submitted as subsequent requests. For example, requests from a list of sequences can be submitted using a for loop. Each request is blocking, which means a maximum of one request from a single submitter will run at a time when calling the NIM in this manner. The following is an example of submitting multiple requests to the NIM:
import requests
import json


def main():
    url = "http://localhost:8000/biology/colabfold/msa-search/predict"

    proteins = [
        {
            "name": "Green Fluorescent Protein (GFP)",
            "sequence": (
                "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
                "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQ"
            )
        },
        {
            "name": "Tumor Protein p53",
            "sequence": (
                "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEA"
                "APPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLA"
                "KTCPVQLWVDSTPPPGTRVRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLD"
                "DPSKYLQW"
            )
        },
        {
            "name": "Lactose Operon Repressor (LacI)",
            "sequence": (
                "MKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQNLKDGDPTR"
                "ADKKSIEYSASVSRQQSYSIKKNLIDQFEAQKPSLTGMSADSQIGQVTKDAQAMIKAIGVNLLQFPRQ"
                "SPGDLEQGVNLTPCTLNTVTQTSLSVRGDKLIAEIGDKVAASEN"
            )
        },
        {
            "name": "Bovine Serum Albumin (BSA)",
            "sequence": (
                "MKWVTFISLLFLFSSAYSRGVFRRDTHKSEIAHRFKDLGEENFKALVLIAFAQYLQQCPFDEHVKLVNE"
                "GTKPVETVTKLVTDLTKVHTECCHGDLLECADDRADLAKYICDNQDTISSKLKECCDKPLLEKSHCIAE"
                "VFCKYKEHKEMPFPKCCETSLVNRRPCFSALTPDETYVPKAFDEKLFTFHADICTLPDTEKQIKKQTAL"
                "VELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL"
            )
        }
    ]

    responses = []

    # For each protein, submit the request and store the response
    for protein in proteins:
        data = {
            "sequence": protein["sequence"],
            "databases": ["Uniref30_2302", "PDB70_220313", "colabfold_envdb_202108"],
            "e_value": 0.0001,
            "iterations": 2
        }
        response = None  # Initialize response
        response_data = None  # Initialize response_data
        try:
            # Use the 'json' parameter instead of 'data'
            response = requests.post(url, json=data)
            # Attempt to parse the JSON response
            response_data = response.json()
            if response.ok:
                print(f"Request for {protein['name']} succeeded.")
            else:
                print(f"Request for {protein['name']} failed: {response.status_code} {response.text}")
        except json.JSONDecodeError:
            # Catch JSON parsing errors. This clause must come before the
            # RequestException clause because requests' JSONDecodeError is a
            # subclass of RequestException.
            print(f"Response for {protein['name']} could not be decoded as JSON.")
        except requests.exceptions.RequestException as req_err:
            # Catch any other request-related errors
            print(f"Request for {protein['name']} failed: {req_err}")

        # Store the response along with the protein name and sequence
        responses.append({
            "protein": protein["name"],
            "sequence": protein["sequence"],
            "response": response_data,
            "status_code": response.status_code if response else None,
            "text": response.text if response else None
        })

    # Print the responses
    for res in responses:
        print(f"Protein: {res['protein']}")
        print(f"Status Code: {res['status_code']}")
        if res['response']:
            print("Response Data:", json.dumps(res['response'], indent=2))
        else:
            print("No response data available.")
        print("-" * 40)


if __name__ == "__main__":
    main()
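Because each call to requests.post in this example blocks until its response arrives, the total runtime grows linearly with the number of sequences. For larger workloads, consider the client-side batching pattern described in the next section.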
Using Client-Side Batching to Improve Throughput#
For users who want to submit multiple requests, client-side batching can help improve throughput when working with the MSA Search NIM. Client-side batching involves opening multiple request connections to the NIM simultaneously. The NIM can handle a sizeable number of concurrent requests and is designed to remain stable at least until the number of simultaneous requests equals the number of GPUs allocated to the NIM. With client-side batching, the total throughput of the NIM can scale with little impact on the latency of individual requests. The following is an example of querying the NIM using client-side batching:
import requests
import json
from concurrent.futures import ThreadPoolExecutor

url = "http://localhost:8000/biology/colabfold/msa-search/predict"

proteins = [
    {
        "name": "Green Fluorescent Protein (GFP)",
        "sequence": (
            "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFCYGD"
            "QIQEQYKGIPLDGDQVQAVNGHEFEIEGEGEGRPYEGTQTAQ"
        ),
    },
    {
        "name": "Tumor Protein p53",
        "sequence": (
            "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEA"
            "APPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLA"
            "KTCPVQLWVDSTPPPGTRVRAMAIYKKSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLD"
            "DPSKYLQW"
        ),
    },
    {
        "name": "Lactose Operon Repressor (LacI)",
        "sequence": (
            "MKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQNLKDGDPTR"
            "ADKKSIEYSASVSRQQSYSIKKNLIDQFEAQKPSLTGMSADSQIGQVTKDAQAMIKAIGVNLLQFPRQ"
            "SPGDLEQGVNLTPCTLNTVTQTSLSVRGDKLIAEIGDKVAASEN"
        ),
    },
    {
        "name": "Bovine Serum Albumin (BSA)",
        "sequence": (
            "MKWVTFISLLFLFSSAYSRGVFRRDTHKSEIAHRFKDLGEENFKALVLIAFAQYLQQCPFDEHVKLVNE"
            "VTKPVETVTKLVTDLTKVHTECCHGDLLECADDRADLAKYICDNQDTISSKLKECCDKPLLEKSHCIAE"
            "VFCKYKEHKEMPFPKCCETSLVNRRPCFSALTPDETYVPKAFDEKLFTFHADICTLPDTEKQIKKQTAL"
            "VELVKHKPKATKEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL"
        ),
    },
]

headers = {
    "Content-Type": "application/json"
}


def send_request(protein):
    data = {
        "sequence": protein["sequence"],
        "databases": ["Uniref30_2302", "PDB70_220313", "colabfold_envdb_202108"],
        "e_value": 0.0001,
        "iterations": 2,
    }
    try:
        response = requests.post(url, headers=headers, data=json.dumps(data))
        if response.ok:
            print(f"Request for {protein['name']} succeeded.")
        else:
            print(f"Request for {protein['name']} failed: {response.status_code} {response.text}")
        return {
            "protein": protein["name"],
            "sequence": protein["sequence"],
            "response": response.json() if response.ok else None,
            "status_code": response.status_code,
            "text": response.text,
        }
    except Exception as e:
        print(f"Request for {protein['name']} encountered an error: {e}")
        return {
            "protein": protein["name"],
            "sequence": protein["sequence"],
            "response": None,
            "status_code": None,
            "text": str(e),
        }


def main():
    responses = []
    # Create a ThreadPoolExecutor with 4 workers to
    # open four concurrent requests to the NIM.
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Submit all requests to the executor, which queues them and
        # returns a Future object for each one.
        futures = [executor.submit(send_request, protein) for protein in proteins]
        for future in futures:
            # result() blocks, so any unfinished future causes the loop to wait.
            result = future.result()
            responses.append(result)

    for res in responses:
        print(f"Protein: {res['protein']}")
        print(f"Status Code: {res['status_code']}")
        if res['response']:
            print("Response Data:", res['response'])
        else:
            print("No response data available.")
        print("-" * 40)


if __name__ == "__main__":
    main()
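In this sketch, max_workers=4 is an illustrative value. Because the NIM is designed to remain stable with at least as many simultaneous requests as it has allocated GPUs, setting max_workers to the number of GPUs allocated to the NIM is a reasonable starting point, which you can then tune against your observed latency and throughput.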
Client-side batching provides a flexible way of controlling the concurrency of the NIM from the client side. While there are performance tradeoffs, for some users this may offer an increase in the throughput of processing their requests.
Below are some examples of speedups from client-side batching in a small test:
| Dataset | Client-side Batching | Compute Environment | Runtime | Speedup Relative to 1-way |
|---|---|---|---|---|
| GFP, p53, LacI, and BSA (4 sequences) | 1-way (no batching) | 6x NVIDIA A100 80GB | 269 seconds | N/A |
| GFP, p53, LacI, and BSA (4 sequences) | 2-way | 6x NVIDIA A100 80GB | 154 seconds | 1.74X |
| GFP, p53, LacI, and BSA (4 sequences) | 4-way | 6x NVIDIA A100 80GB | 92 seconds | 2.9X |
Note
The speedup associated with multi-way batching may be impacted by hardware factors such as the number of CPU cores and the speed of the SSD holding the databases. Users may not experience linear scaling in throughput with the number of GPUs.