Performance#

Version 2.4.0#

This section reports OpenFold2 v2.4.0 performance using internal benchmark artifacts.

The benchmark set contains four protein chains:

| Test ID | Seq Length |
|---------|------------|
| 7WBN_A  | 98         |
| 7ONG_A  | 304        |
| 7ZHT_A  | 562        |
| 7Y4I_A  | 914        |

Benchmark Configuration#

| Parameter       | Setting     |
|-----------------|-------------|
| selected_models | [1,2,3,4,5] |
| num_trials      | 2           |
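The configuration above can be sketched as a plain dictionary. Note that the key names simply mirror the table; this is an illustration, not the service's actual configuration schema.

```python
# Benchmark configuration from the table above.
# NOTE: illustrative only -- the key names mirror the table,
# not the actual configuration schema of the service.
benchmark_config = {
    "selected_models": [1, 2, 3, 4, 5],  # all five model parameter sets
    "num_trials": 2,                     # each timing is averaged over 2 runs
}

# Each input chain therefore runs 5 models x 2 trials = 10 forward passes.
total_runs = len(benchmark_config["selected_models"]) * benchmark_config["num_trials"]
print(total_runs)
```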

Table 1: Performance Across Supported NVIDIA Hardware#

The table below reports pipeline_time_mean (seconds) using the TensorRT backend without structural templates.

| Hardware                      | 7WBN_A (98) | 7ONG_A (304) | 7ZHT_A (562) | 7Y4I_A (914) |
|-------------------------------|-------------|--------------|--------------|--------------|
| NVIDIA A100 80GB              | 7.08        | 34.16        | 73.82        | 171.75       |
| NVIDIA B200                   | 5.68        | 22.56        | 48.49        | 104.95       |
| NVIDIA H100 80GB HBM3         | 4.60        | 20.03        | 42.66        | 97.04        |
| NVIDIA GB200                  | 7.15        | 18.16        | 39.93        | 83.19        |
| NVIDIA H200                   | 4.35        | 16.20        | 37.60        | 85.28        |
| NVIDIA L40S                   | 12.67       | 36.40        | 81.56        | 183.18       |
| NVIDIA GB10 (DGX Spark)       | 29.70       | 137.10       | 402.87       | 1131.11      |
| NVIDIA RTX 6000 Ada           | 11.28       | 33.56        | 80.64        | 187.23       |
| NVIDIA RTX PRO 6000 Blackwell | 5.68        | 27.59        | 66.81        | 154.09       |
| NVIDIA GH200                  | 5.63        | 13.81        | 29.65        | 69.50        |

Table 2: Performance Across Optimization Backends#

The table below compares H100 performance between the PyTorch and TensorRT backends, without structural templates.

| Test ID | Seq Length | torch (s) | trt (s) | TRT speedup over Torch |
|---------|------------|-----------|---------|------------------------|
| 7WBN_A  | 98         | 29.74     | 4.60    | 6.47x                  |
| 7ONG_A  | 304        | 53.34     | 20.03   | 2.66x                  |
| 7ZHT_A  | 562        | 98.38     | 42.66   | 2.31x                  |
| 7Y4I_A  | 914        | 199.99    | 97.04   | 2.06x                  |
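As a sanity check, the speedup column can be recomputed from the raw per-chain times. The numbers below are copied from the table; the script itself is just a sketch.

```python
# Recompute the TRT-over-Torch speedup column from the raw per-chain
# times (seconds) measured on H100, as reported in the table above.
h100_times = {
    # test_id: (torch_seconds, trt_seconds)
    "7WBN_A": (29.74, 4.60),
    "7ONG_A": (53.34, 20.03),
    "7ZHT_A": (98.38, 42.66),
    "7Y4I_A": (199.99, 97.04),
}

speedups = {tid: torch_s / trt_s for tid, (torch_s, trt_s) in h100_times.items()}
for tid, s in speedups.items():
    print(f"{tid}: {s:.2f}x")  # matches the speedup column above
```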

Table 3: Performance Impact From Structural Templates#

The table below reports H100 TensorRT performance with and without structural templates.

| Test ID | Seq Length | Without structural templates (s) | With structural templates (s) |
|---------|------------|----------------------------------|-------------------------------|
| 7WBN_A  | 98         | 4.60                             | 5.29                          |
| 7ONG_A  | 304        | 20.03                            | 19.53                         |
| 7ZHT_A  | 562        | 42.66                            | 43.06                         |
| 7Y4I_A  | 914        | 97.04                            | 97.77                         |

Version 2.3.0#

Version 2.3.0 adds support for the GB10 (DGX Spark) GPU architecture, with performance optimized for this platform.

Performance on GB10 (DGX Spark)#

Below are benchmark times, measured for each input chain, in sequential execution on a single NVIDIA GB10 (DGX Spark) device.

| protein chain id | metric                    | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A  |
|------------------|---------------------------|--------|--------|--------|---------|
|                  | sequence length           | 98     | 304    | 562    | 914     |
| Version 2.3.0    | pipeline_time*            | 46.51  | 170.78 | 441.02 | 1079.12 |
| Version 2.3.0    | pipeline_time_per_model** | 9.30   | 34.16  | 88.20  | 215.82  |

*pipeline_time is the time to load parameter sets, compute features, and complete the forward pass for each model in [model_1, model_2, model_3, model_4, model_5]

**pipeline_time_per_model is pipeline_time divided by 5
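The per-model metric follows directly from the definition in the footnotes. A minimal sketch using the Version 2.3.0 GB10 numbers above:

```python
# Derive pipeline_time_per_model (pipeline_time / 5) from the
# Version 2.3.0 GB10 pipeline_time values reported above.
pipeline_time = {
    "7WBN_A": 46.51,
    "7ONG_A": 170.78,
    "7ZHT_A": 441.02,
    "7Y4I_A": 1079.12,
}
NUM_MODELS = 5  # [model_1, model_2, model_3, model_4, model_5]

pipeline_time_per_model = {cid: t / NUM_MODELS for cid, t in pipeline_time.items()}
print(pipeline_time_per_model)
```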

Accuracy on GB10#

Accuracy metrics for GB10 (DGX Spark) remain consistent with other supported GPU architectures. For detailed accuracy benchmarks, refer to the Accuracy Metrics in the Version 2.0.0 section below.

For performance benchmarks on other GPU architectures, refer to the Version 2.0.0 section below.

Version 2.2.0#

Version 2.2.0 maintains the same performance characteristics as Version 2.1.0. All performance benchmarks and accuracy metrics from Version 2.1.0 apply to Version 2.2.0.

Version 2.1.0#

Version 2.1.0 maintains the same performance characteristics as Version 2.0.0. All performance benchmarks and accuracy metrics from Version 2.0.0 apply to Version 2.1.0.

For detailed performance comparisons and benchmarks, see the Version 2.0.0 section below.

Version 2.0.0#

Compared to Version 1.0.0, Version 2.0.0 delivers the following improvements.

Performance and Accuracy#

  • Faster startup: Reduced initialization time due to removal of large template database loading (~300GB)

  • Enhanced GPU support: TensorRT optimization for L40S, B200, and RTX 6000 Ada Generation

  • Reduced storage footprint: Container and cache requirements significantly reduced (from 380GB to 80GB total)

Performance Comparison (Time in seconds) on H100#

Below are benchmark times, measured for each input chain, in sequential execution on a single NVIDIA H100 80GB HBM3 device.

| protein chain id | metric                    | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A |
|------------------|---------------------------|--------|--------|--------|--------|
|                  | sequence length           | 98     | 304    | 562    | 914    |
| Version 1.0.0    | pipeline_time*            | 26.6   | 50.9   | 98.2   | 205.9  |
| Version 1.0.0    | pipeline_time_per_model** | 5.3    | 10.2   | 19.6   | 41.2   |
| Version 2.0.0    | pipeline_time*            | 4.92   | 19.4   | 44.9   | 113    |
| Version 2.0.0    | pipeline_time_per_model** | 0.984  | 3.90   | 8.99   | 22.6   |
| Version 2.0.0    | speed-up vs 1.0.0         | 5.4x   | 2.6x   | 2.2x   | 1.8x   |

*pipeline_time is the time to load parameter sets, compute features, and complete the forward pass for each model in [model_1, model_2, model_3, model_4, model_5]

**pipeline_time_per_model is pipeline_time divided by 5

Accuracy Metrics on H100#

| Metric | Version 1.0.0 | Version 2.0.0 |
|--------|---------------|---------------|
| CADS   | 0.744         | 0.740         |
| LDDT   | 0.861         | 0.860         |
| STRIDE | 4.24          | 4.24          |
| MP     | 0.882         | 0.878         |

Version 1.0.0#

Performance will also vary significantly depending on:

  • The type of NVIDIA GPUs that are attached and available to the NIM

  • The CPU type

  • System RAM available

The following section details some performance expectations and provides general tips. These values are illustrative only; performance on your system will vary.

Performance Benchmarks#

The following are performance benchmarks for OpenFold2 version 1.0.0.

  • Structure prediction performance is mostly dependent on GPU capability and memory. If you find structure prediction to be a bottleneck, consider using a higher memory device.

  • The time required for structure prediction grows with sequence length.

  • The time required for structure prediction grows with the total number of sequences in the alignments.

  • Below are benchmark times, measured for each input chain, in sequential execution on a single NVIDIA H100 80GB HBM3 device.

  • The average value of LDDT-CA, for these protein chains, averaged over 2 runs, is 0.86.

| protein chain id          | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A |
|---------------------------|--------|--------|--------|--------|
| sequence length           | 98     | 304    | 562    | 914    |
| pipeline_time*            | 26.6   | 50.9   | 98.2   | 205.9  |
| pipeline_time_per_model** | 5.3    | 10.2   | 19.6   | 41.2   |

*pipeline_time is the time to load parameter sets, compute features, and complete the forward pass for each model in [model_1, model_2, model_3, model_4, model_5]

**pipeline_time_per_model is pipeline_time divided by 5

Version 1.0.0 Configuration#

| Algo feature / parameter                 | Setting     |
|------------------------------------------|-------------|
| use_templates                            | False       |
| selected_models                          | [1,2,3,4,5] |
| relax_prediction                         | False       |
| deepspeed evoformer kernel               | active      |
| precision for deepspeed evoformer kernel | bf16        |
| precision for the rest of the model      | fp32        |