Optimization and Scaling#

Starting with version 2.0.0, configuring the MSA Search NIM for optimal performance is drastically simpler. The GPU Server is now enabled by default with pre-indexed ColabFold databases, eliminating the need for manual database preparation and complex configuration tuning. For most users, the default settings provide excellent performance out of the box.
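
For example, once a container is running you can confirm that an instance is ready before sending requests. The following is a minimal sketch that assumes the standard NIM readiness endpoint (/v1/health/ready) on the default port 8000; adjust the host and port to match your deployment.

```python
import requests

# Readiness probe against a running MSA Search NIM instance.
# The host, port, and endpoint path are assumptions based on common
# NIM conventions; adjust them to match your deployment.
NIM_URL = "http://localhost:8000"

response = requests.get(f"{NIM_URL}/v1/health/ready", timeout=10)
response.raise_for_status()
print(f"NIM instance ready: {response.status_code}")
```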

This section focuses on scaling strategies for production deployments that require high throughput or need to handle variable workloads.

Scaling and Load Balancing#

With version 2.0.0, the MSA Search NIM no longer internally manages GPU allocation across multiple GPUs in a node. Instead, the NIM is designed to work with external load balancers and orchestration platforms that can scale NIM instances up and down based on demand. Examples include NVIDIA Cloud Functions and popular container orchestration platforms that support scalable, distributed deployments of containerized applications.
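
To illustrate this model, the sketch below spreads requests across several independently deployed NIM instances using a simple client-side round-robin with a readiness filter. In a production deployment this role is typically handled by the load balancer or orchestration platform itself; the instance URLs and the /v1/health/ready path are assumptions used only for illustration.

```python
import itertools
import requests

# Hypothetical endpoints of independently scaled MSA Search NIM instances.
# In production, an external load balancer or orchestration platform
# would front these instances instead of the client doing this itself.
INSTANCE_URLS = [
    "http://msa-nim-0:8000",
    "http://msa-nim-1:8000",
    "http://msa-nim-2:8000",
]

def ready_instances(urls):
    """Return only the instances that pass a readiness probe."""
    ready = []
    for url in urls:
        try:
            r = requests.get(f"{url}/v1/health/ready", timeout=5)
            if r.ok:
                ready.append(url)
        except requests.RequestException:
            continue  # Skip instances that are down or still starting.
    return ready

healthy = ready_instances(INSTANCE_URLS)
if not healthy:
    raise RuntimeError("No MSA Search NIM instances are ready")

# Simple round-robin over the healthy instances.
pool = itertools.cycle(healthy)
print(f"Routing next request to {next(pool)}")
```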

The recommended approach is to allocate the minimum number of GPUs to each individual NIM instance and rely on horizontal scaling through orchestration, as sketched after the list below. This allows for:

  • Better resource utilization across your infrastructure

  • Dynamic scaling based on actual workload demands

  • Improved fault tolerance through instance redundancy

  • Simplified deployment and maintenance
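
As a concrete illustration of this approach, the sketch below starts one single-GPU NIM container per device so that an external orchestrator or load balancer can scale the instance count independently. The container image name is a placeholder, and the port mapping, GPU count, and NGC_API_KEY environment variable are assumptions; substitute the values from your NGC catalog entry and deployment environment.

```python
import subprocess

# Illustrative sketch: one single-GPU instance per device, each on its own
# host port, so instances can be added or removed independently.
IMAGE = "nvcr.io/nim/<msa-search-image>:<tag>"  # placeholder image reference

for gpu_id in range(2):  # e.g., GPUs 0 and 1
    subprocess.run(
        [
            "docker", "run", "-d", "--rm",
            "--gpus", f"device={gpu_id}",   # pin exactly one GPU per instance
            "-p", f"{8000 + gpu_id}:8000",  # unique host port per instance
            "-e", "NGC_API_KEY",            # forward the credential from the host env
            IMAGE,
        ],
        check=True,
    )
```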