Performance#
Version 2.0.0#
Compared to Version 1.0.0, Version 2.0.0:
Removes HHR-based template processing.
Adds TensorRT support for a wide set of GPU architectures. Refer to the Release Notes.
Adds support for ‘explicit_templates’. Refer to Migrate from HHR-based Templates to Explicit mmCIF Templates.
Performance and Accuracy#
Faster startup: Reduced initialization time due to removal of large template database loading (~300GB)
Enhanced GPU support: TensorRT optimization for L40S, B200, and RTX 6000 Ada Generation
Reduced storage footprint: Container and cache requirements significantly reduced (from 380GB to 80GB total)
Performance Comparison (Time in seconds) on H100#
Below are benchmark times, measured for each input chain, in sequential execution on a single NVIDIA H100 80GB HBM3 device.
protein chain id | metric | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A |
---|---|---|---|---|---|
sequence length | | 98 | 304 | 562 | 914 |
Version 1.0.0 | pipeline_time* | 26.6 | 50.9 | 98.2 | 205.9 |
Version 1.0.0 | pipeline_time_per_model** | 5.3 | 10.2 | 19.6 | 41.2 |
Version 2.0.0 | pipeline_time* | 4.92 | 19.4 | 44.9 | 113 |
Version 2.0.0 | pipeline_time_per_model** | 0.984 | 3.90 | 8.99 | 22.6 |
Version 2.0.0 | speed-up vs 1.0.0 | 5.4x | 2.6x | 2.2x | 1.8x |
*pipeline_time is the time to load parameter sets and compute features, plus the sum of the forward-pass times for each model in [model_1, model_2, model_3, model_4, model_5].
**pipeline_time_per_model is pipeline_time divided by 5, the number of models.
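The derived rows in the table follow from simple arithmetic on pipeline_time. The following is a minimal sketch that recomputes them from the published values; the helper names `per_model_time` and `speed_up` are illustrative and are not part of the NIM.

```python
# Benchmark values copied from the table above (seconds, single NVIDIA H100).
NUM_MODELS = 5  # model_1 through model_5

pipeline_time_v1 = {"7WBN_A": 26.6, "7ONG_A": 50.9, "7ZHT_A": 98.2, "7Y4I_A": 205.9}
pipeline_time_v2 = {"7WBN_A": 4.92, "7ONG_A": 19.4, "7ZHT_A": 44.9, "7Y4I_A": 113.0}


def per_model_time(pipeline_time: float, num_models: int = NUM_MODELS) -> float:
    """pipeline_time_per_model: pipeline_time divided by the number of models."""
    return pipeline_time / num_models


def speed_up(old_time: float, new_time: float) -> float:
    """Speed-up of the new version, as a ratio of pipeline times."""
    return old_time / new_time


for chain, t_v1 in pipeline_time_v1.items():
    t_v2 = pipeline_time_v2[chain]
    print(
        f"{chain}: per-model {per_model_time(t_v2):.3f} s, "
        f"speed-up {speed_up(t_v1, t_v2):.1f}x"
    )
```

Because the table entries are rounded, recomputed per-model values can differ in the last digit from the listed ones (for example, 19.4 / 5 = 3.88 compared with the listed 3.90).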
Accuracy Metrics on H100#
Metric | Version 1.0.0 | Version 2.0.0 |
---|---|---|
CADS | 0.744 | 0.740 |
LDDT | 0.861 | 0.860 |
STRIDE | 4.24 | 4.24 |
MP | 0.882 | 0.878 |
Version 1.0.0#
Performance varies significantly depending on:
The type of NVIDIA GPUs that are attached and available to the NIM
The CPU type
System RAM available
The following section gives reference benchmark values and general tips. These values are not a guarantee of performance; results on your system will vary.
Recommended System Requirements#
Refer to Supported Hardware for requirements.
Performance Benchmarks#
The following are performance benchmarks for OpenFold2 version 1.0.0.
Structure prediction performance depends mostly on GPU capability and memory. If structure prediction is a bottleneck, consider using a GPU with more memory.
The time required for structure prediction grows with sequence length.
The time required for structure prediction grows with the total number of sequences in the alignments.
Below are benchmark times, measured for each input chain, in sequential execution on a single NVIDIA H100 80GB HBM3 device.
Across these protein chains, the LDDT-CA score averaged over 2 runs is 0.86.
protein chain id | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A |
---|---|---|---|---|
sequence length | 98 | 304 | 562 | 914 |
pipeline_time* | 26.6 | 50.9 | 98.2 | 205.9 |
pipeline_time_per_model** | 5.3 | 10.2 | 19.6 | 41.2 |
*pipeline_time is the time to load parameter sets and compute features, plus the sum of the forward-pass times for each model in [model_1, model_2, model_3, model_4, model_5].
**pipeline_time_per_model is pipeline_time divided by 5, the number of models.
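The statement that prediction time grows with sequence length can be checked against the table above. The sketch below compares each chain to the shortest one; the helper name `relative_growth` is illustrative. With only four data points, and with fixed costs such as parameter loading included in pipeline_time, this is a rough comparison and not a scaling law.

```python
# Version 1.0.0 values copied from the table above (single NVIDIA H100).
# Each entry: (protein chain id, sequence length, pipeline_time in seconds)
benchmarks = [
    ("7WBN_A", 98, 26.6),
    ("7ONG_A", 304, 50.9),
    ("7ZHT_A", 562, 98.2),
    ("7Y4I_A", 914, 205.9),
]


def relative_growth(rows):
    """Return (chain, length ratio, time ratio) relative to the shortest chain."""
    _, base_length, base_time = min(rows, key=lambda row: row[1])
    return [
        (chain, length / base_length, time / base_time)
        for chain, length, time in rows
    ]


for chain, length_ratio, time_ratio in relative_growth(benchmarks):
    print(f"{chain}: {length_ratio:.2f}x length, {time_ratio:.2f}x pipeline_time")
```

For these chains, pipeline_time increases with every increase in sequence length, although the longest chain takes less than proportionally longer than the shortest one.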
Version 1.0.0 Configuration#
algo feature / parameter | setting |
---|---|
use_templates | False |
selected_models | [1,2,3,4,5] |
relax_prediction | False |
deepspeed evoformer kernel | active |
precision for deepspeed evoformer kernel | bf16 |
precision for the rest of the model | fp32 |
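To reproduce a comparable run, it can help to record the benchmark settings in machine-readable form. The following is a minimal sketch: the keys `use_templates`, `selected_models`, and `relax_prediction` come from the table above, while treating them as a JSON object is an assumption, so check the API reference for the exact request schema. The kernel and precision rows are kept as descriptive notes and are not assumed to be request parameters.

```python
import json

# Parameter names and values taken from the configuration table above.
# Whether the service accepts them in exactly this shape is an assumption.
benchmark_settings = {
    "use_templates": False,
    "selected_models": [1, 2, 3, 4, 5],
    "relax_prediction": False,
}

# Descriptive rows from the same table; the key names here are hypothetical
# labels chosen for this sketch, not documented parameters.
runtime_notes = {
    "deepspeed_evoformer_kernel": "active",
    "deepspeed_evoformer_kernel_precision": "bf16",
    "rest_of_model_precision": "fp32",
}

print(json.dumps(benchmark_settings, indent=2))
```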