Overview#

The MSA Search NIM supports GPU-accelerated Multiple Sequence Alignment (MSA) of a query amino acid sequence against a set of protein sequence databases. These databases are searched for similar sequences to the query and then the collection of sequences are aligned to establish similar regions even when the proteins have different lengths and motifs.

The outputs of the MSA process are used to inform structural prediction models such as AlphaFold2 and OpenFold. This tends to improve structural prediction accuracy because similar sequences often have similar structures. MSA Search is also used by evolutionary biologists to look for homology between protein sequences that may indicate a common evolutionary origin.

The MSA NIM implements two search styles.

  • The AlphaFold2 search type was first used in the AlphaFold2 paper in Nature[1] and performs a single-pass search per database.

  • The ColabFold search process in the MSA Search NIM was first introduced in ColabFold and implements a cascaded search of generated profiles, providing even higher sensitivity and generally better throughput.

Both methods utilize GPU-accelerated MMSeqs2 for improved accuracy and reduced latency. Combined with AlphaFold2 or OpenFold, the MSA Search NIM enables a sensitive and high-throughput protein structure prediction pipeline.

Note

A more detailed description of the ColabFold search process can be found in Mirdita et al. 2022 .[2]

Note

More details about the acceleration available in GPU-accelerated MMSeqs2 are available in Kallenborn et al. 2024.[3]

Advantages of NIMs#

In general, NIMs offer a simple and easy-to-deploy route for self-hosted AI applications. Two major advantages that NIMs offer for system administrators and developers are:

  • Increased productivity: NIMs enable developers to build generative AI applications quickly, in minutes rather than weeks, by providing a standardized way to add AI capabilities to their applications.

  • Simplified deployment: NIMs provide containers that can be easily deployed on various platforms, including clouds, data centers, or workstations, making it convenient for developers to test and deploy their applications.

In the context of protein design and drug development, these advantages can:

  • Accelerate lead optimization: Researchers can use NIMs to accelerate the lead optimization process by quickly generating and testing multiple molecular structures, enabling them to identify potential leads more efficiently.

  • Streamline data analysis: Researchers can use NIMs to analyze large datasets generated during the drug discovery process, such as molecular dynamics simulations or high-throughput screening data, to identify patterns and trends that can inform the development of new drugs.

  • Improve collaboration: NIMs can facilitate collaboration among researchers by providing a standardized platform for sharing and integrating AI models, enabling teams to work together more effectively and efficiently.

  • Enhance predictive modeling: Researchers can use NIMs to develop and deploy predictive models that can accurately predict the properties and behavior of molecules, such as their binding affinity or toxicity, enabling them to make more informed decisions during the drug development process.