AlphaFold2 NIM Performance#
The AlphaFold2 NIM has been tuned to improve performance. However, the Multiple Sequence Alignment (MSA) and structure prediction processes are still computationally expensive. Performance will also vary significantly depending on:
- Which NVIDIA GPUs are attached and available to the NIM 
- How many CPU cores are available to the NIM 
- The speed of the Solid State Drive (SSD) available to the NIM 
- The parameters used to configure the NIM at runtime. 
Below, we detail some performance expectations and provide general tips. Treat these figures as rough guidelines rather than guarantees; performance on your system will vary from these values.
Recommended System Requirements#
- At least 1x NVIDIA A100 GPU 
- At least 1.3 Terabytes (1300 Gigabytes) of available SSD storage (preferably NVMe gen4 or later with a speed of >3,500MB/s) 
- At least 24 CPU cores (physical cores) 
- An internet connection with a download speed of at least 200 Mbps to download the AlphaFold2 model and databases. 
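As a quick pre-flight check, the CPU and storage thresholds above can be compared against a host with a few lines of Python. This is a minimal sketch using only the standard library: the function names and the default path are hypothetical, `os.cpu_count()` reports logical rather than physical cores, and GPU and network checks are left out.

```python
import os
import shutil

# Thresholds copied from the recommended requirements listed above.
MIN_CPU_CORES = 24
MIN_FREE_DISK_GB = 1300


def find_shortfalls(logical_cores, free_disk_gb):
    """Return human-readable warnings for each unmet requirement."""
    warnings = []
    if logical_cores < MIN_CPU_CORES:
        warnings.append(
            f"{logical_cores} logical CPU cores found; "
            f"{MIN_CPU_CORES} physical cores recommended"
        )
    if free_disk_gb < MIN_FREE_DISK_GB:
        warnings.append(
            f"{free_disk_gb:.0f} GB free; {MIN_FREE_DISK_GB} GB recommended"
        )
    return warnings


def check_host(cache_path="."):
    """Inspect this machine; cache_path should point at the NIM cache disk."""
    # os.cpu_count() counts logical cores, so it may overstate physical cores.
    logical_cores = os.cpu_count() or 0
    free_disk_gb = shutil.disk_usage(cache_path).free / 1e9
    return find_shortfalls(logical_cores, free_disk_gb)


if __name__ == "__main__":
    for warning in check_host():
        print("WARNING:", warning)
```

Pass the directory that will hold the NIM cache as `cache_path` so that the free-space figure refers to the correct disk.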
General Performance Guidelines#
In general, users should expect the following performance:
- A100 and >=64 CPU cores and >=180GB of RAM (Best): sequences up to 3,000 amino acids should succeed, but the NIM may take several hours to run. Fastest MSA configuration. 
- A100 and >=32 CPU cores and >=128GB of RAM (Good): similar speed to the Best configuration, but with more limited MSA performance. Use fewer runners per MSA and fewer cores per MSA (see the Sequence to MSA section).
- L40S and >=32 CPU cores and >=128GB of RAM (Good): reduced structural prediction speed compared to A100. Similar MSA performance compared to using the same number of cores with A100 GPU. 
- NVIDIA CUDA GPU with >=32GB of VRAM and >=12 CPU cores and >=64GB of RAM (Minimum, Poor Experience): short sequences will run on this configuration, but sequences longer than 600 amino acids may fail in either MSA or structural prediction. MSA will be very slow, and the number of MSA runners should be set to 1 (see the Sequence to MSA section).
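The tiers above can be expressed as a small lookup, which may help when comparing candidate machines. This is an illustrative sketch only: `classify_tier` is a hypothetical helper, matching the GPU by substring of its name is a simplification, and the thresholds are copied directly from the list.

```python
def classify_tier(gpu_name, cpu_cores, ram_gb, vram_gb=0):
    """Map a hardware description onto the guideline tiers listed above."""
    name = gpu_name.upper()
    if "A100" in name and cpu_cores >= 64 and ram_gb >= 180:
        return "Best"
    if ("A100" in name or "L40S" in name) and cpu_cores >= 32 and ram_gb >= 128:
        return "Good"
    # Any CUDA GPU with enough VRAM qualifies for the minimum tier.
    if vram_gb >= 32 and cpu_cores >= 12 and ram_gb >= 64:
        return "Minimum"
    return "Below minimum"


if __name__ == "__main__":
    print(classify_tier("NVIDIA A100-SXM4-80GB", 64, 256, vram_gb=80))
    print(classify_tier("NVIDIA L40S", 32, 128, vram_gb=48))
```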
Sequence to MSA#
- MSA performance is largely dependent on disk speed and CPU cores. Make sure your NIM cache is on a fast SSD, and that you have allocated at least 24 cores to the NIM. See the “Configuring the NIM at Runtime” section for details. 
- Use the mmseqs2 algorithm for Multiple Sequence Alignment. This algorithm produces similar results but is significantly faster (particularly with longer sequence lengths). You can select mmseqs2 by passing algorithm=mmseqs2 in your request.
- Increase the threads used per MSA process. This can be done using the NIM_PARALLEL_THREADS_PER_MSA environment variable. See the “Configuring the NIM at Runtime” section for details.
- Increase the number of concurrent MSA processes. This can be done using the NIM_PARALLEL_MSA_RUNNERS environment variable. See the “Configuring the NIM at Runtime” section for details. Note: we recommend a maximum of three parallel MSA runners.
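One way to choose values for the two environment variables is to cap the runner count at three and split the available cores evenly between runners. The sketch below does that and formats the result as `docker run -e` flags. Dividing cores evenly is an assumption made for illustration, not documented guidance, and the helper names and the `sequence` field in the request body are hypothetical; only `algorithm=mmseqs2` and the two variable names come from the text above.

```python
MAX_MSA_RUNNERS = 3  # recommended upper bound from the note above


def plan_msa_env(cpu_cores, runners=3):
    """Pick runner and thread counts so runners * threads <= cpu_cores."""
    runners = max(1, min(runners, MAX_MSA_RUNNERS, cpu_cores))
    threads = max(1, cpu_cores // runners)  # assumption: even split of cores
    return {
        "NIM_PARALLEL_MSA_RUNNERS": str(runners),
        "NIM_PARALLEL_THREADS_PER_MSA": str(threads),
    }


def docker_env_flags(env):
    """Turn an environment mapping into `docker run` -e arguments."""
    flags = []
    for key, value in env.items():
        flags += ["-e", f"{key}={value}"]
    return flags


def build_request_body(sequence):
    """Request body selecting mmseqs2; the `sequence` key is an assumption."""
    return {"sequence": sequence, "algorithm": "mmseqs2"}


if __name__ == "__main__":
    print(" ".join(docker_env_flags(plan_msa_env(cpu_cores=64))))
```

On a minimum-tier machine, call `plan_msa_env(cpu_cores, runners=1)` to follow the single-runner recommendation.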
MSA runtime for short sequences can vary dramatically, but runtime generally scales linearly with sequence length. For sequences with many hits in the database (that is, sequences with motifs observed in many other proteins), the MSA runtime can grow significantly regardless of sequence length.
In general, expect MSA to take between one minute and six hours per query. Note that these are not exact numbers and that performance will vary significantly between machines and input sequences.
MSA to Structure and Sequence to Structure#
- Structure prediction performance is mostly dependent on GPU capability. If you find structure prediction to be a bottleneck, consider utilizing a more powerful GPU (for example, switching from A30 to A100). 
The time required for structure prediction grows faster than linearly with sequence length (at least quadratically). As a rule of thumb, a sequence that is twice as long will take at least four times as long during the structural prediction phase.
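The rule of thumb above, where doubling the length at least quadruples the time, corresponds to a quadratic lower bound. The sketch below extrapolates a lower bound from one measured run; `min_structure_minutes` is a hypothetical helper and the reference timing must come from your own hardware.

```python
def min_structure_minutes(ref_minutes, ref_length, new_length):
    """Lower-bound estimate assuming time grows with length squared."""
    if ref_length <= 0 or new_length <= 0:
        raise ValueError("sequence lengths must be positive")
    return ref_minutes * (new_length / ref_length) ** 2


if __name__ == "__main__":
    # Illustrative numbers only: a 500-residue run measured at 10 minutes.
    print(min_structure_minutes(10, 500, 1000))
```

Because this is only a lower bound, actual runtimes for long sequences may be considerably higher.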
In general, expect the structural prediction component to take between four minutes and 24 hours. Note that these are not exact numbers and that performance will vary significantly between machines and input sequences.