Performance

Version 1.2.0

Version 1.2.0 adds support for user-supplied mmCIF templates. Refer to the Release Notes.

Note: The core inference pipeline, model weights, and TensorRT optimizations are the same as in Version 1.1.0.

Performance and Accuracy

  • For the configurations described below for Versions 1.1.0 and 1.0.0, internal benchmarks of Version 1.2.0 [data not shown] yield latency and accuracy similar to those reported below for Versions 1.1.0 and 1.0.0 on NVIDIA H100 80GB HBM3.

  • For a configuration in which user-supplied mmCIF templates replace the templates identified by the input HHR file and loaded from the local database, internal benchmarks show comparable accuracy and runtime speedups of about 1%, consistent with expectations.


Version 1.1.0

This version introduces performance improvements through TensorRT (TRT) optimization, with speedups of up to 1.32x on the shorter benchmark chains, while keeping accuracy comparable to Version 1.0.0. The optimization focuses on:

  • Enhanced inference speed through TensorRT engine optimization

  • Improved memory utilization during inference

  • Accuracy metrics comparable to Version 1.0.0

Recommended System Requirements

Performance Benchmarks

The following are performance benchmarks comparing OpenFold2 versions 1.0.0 and 1.1.0.

  • Structure prediction performance depends mostly on GPU capability and memory. If structure prediction is a bottleneck, consider using a GPU with more memory.

  • The time required for structure prediction grows with sequence length.

  • The time required for structure prediction grows with the total number of sequences in the alignments.

  • Below are benchmark times, measured for each input chain, in sequential execution on a single ‘NVIDIA H100 80GB HBM3’ device.

Performance Comparison (Time in Seconds)

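| Protein chain ID | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A |
|------------------|--------|--------|--------|--------|
| Sequence length  | 98     | 304    | 562    | 914    |
| Version 1.0.0    | 5.31   | 10.2   | 19.6   | 41.2   |
| Version 1.1.0    | 4.33   | 7.73   | 19.36  | 41.19  |
| Speedup          | 1.23x  | 1.32x  | 1.01x  | 1.00x  |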
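The Speedup row is the ratio of the Version 1.0.0 time to the Version 1.1.0 time for each chain. A minimal Python sketch that reproduces it from the table values, rounding to two decimals to match the table's precision:

```python
# Benchmark times in seconds, copied from the table (chain id -> time).
V1_0_0 = {"7WBN_A": 5.31, "7ONG_A": 10.2, "7ZHT_A": 19.6, "7Y4I_A": 41.2}
V1_1_0 = {"7WBN_A": 4.33, "7ONG_A": 7.73, "7ZHT_A": 19.36, "7Y4I_A": 41.19}


def speedup(baseline_s: float, new_s: float) -> float:
    """Speedup of the new version relative to the baseline (>1 means faster)."""
    return baseline_s / new_s


def speedup_table(baseline: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    # Round to two decimals, matching the precision of the Speedup row.
    return {chain: round(speedup(baseline[chain], new[chain]), 2) for chain in baseline}


if __name__ == "__main__":
    for chain, s in speedup_table(V1_0_0, V1_1_0).items():
        print(f"{chain}: {s:.2f}x")
```

The ratios show that the TensorRT gains are concentrated on the two shorter chains; for the 562- and 914-residue chains the two versions are essentially tied.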

Accuracy Metrics

| Metric | Version 1.0.0 | Version 1.1.0 |
|--------|---------------|---------------|
| CADS   | 0.7443        | 0.7424        |
| LDDT   | 0.8618        | 0.8614        |
| STRIDE | 0.8672        | 0.8782        |
| MP     | 4.2445        | 4.2439        |
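To gauge how small the version-to-version differences are, the per-metric change (Version 1.1.0 minus Version 1.0.0) can be computed directly from the table. This sketch is plain arithmetic on the reported values; it says nothing about statistical significance or about which direction is better for each metric.

```python
# Accuracy metrics copied from the table: metric -> (v1.0.0, v1.1.0).
METRICS = {
    "CADS": (0.7443, 0.7424),
    "LDDT": (0.8618, 0.8614),
    "STRIDE": (0.8672, 0.8782),
    "MP": (4.2445, 4.2439),
}


def deltas(metrics: dict[str, tuple[float, float]]) -> dict[str, float]:
    # Change from v1.0.0 to v1.1.0, rounded to the table's four decimal places.
    return {name: round(new - old, 4) for name, (old, new) in metrics.items()}


if __name__ == "__main__":
    for name, delta in deltas(METRICS).items():
        print(f"{name}: {delta:+.4f}")
```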

Version 1.0.0

Performance varies significantly depending on:

  • The type of NVIDIA GPUs attached to and available to the NIM

  • The CPU type

  • The amount of available system RAM

The following section describes some performance expectations and provides general tips. These values are not guarantees; performance on your system will differ from them.

Recommended System Requirements

Performance Benchmarks

The following are performance benchmarks for OpenFold2 version 1.0.0.

  • Structure prediction performance depends mostly on GPU capability and memory. If structure prediction is a bottleneck, consider using a GPU with more memory.

  • The time required for structure prediction grows with sequence length.

  • The time required for structure prediction grows with the total number of sequences in the alignments.

  • Below are benchmark times, measured for each input chain, in sequential execution on a single ‘NVIDIA H100 80GB HBM3’ device.

  • The LDDT-CA for these protein chains, averaged over 2 runs, is 0.86.

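| Protein chain ID | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A |
|------------------|--------|--------|--------|--------|
| Sequence length  | 98     | 304    | 562    | 914    |
| Time (s)*        | 5.31   | 10.2   | 19.6   | 41.2   |

*Time to load the parameter sets, compute features, and run the forward pass, averaged over the 5 models (1, 2, 3, 4, 5).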
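The footnote describes a per-chain, sequential measurement averaged over five models. A minimal Python sketch of such a timing harness follows, assuming a hypothetical `predict(sequence, model_id)` callable that performs one complete run (parameter loading, feature computation, forward pass); that callable is a placeholder and not part of the NIM's API.

```python
import statistics
import time
from typing import Callable, Sequence

MODEL_IDS = (1, 2, 3, 4, 5)

# Hypothetical signature: one full prediction for a sequence with a given model.
PredictFn = Callable[[str, int], object]


def time_chain(predict: PredictFn, sequence: str,
               model_ids: Sequence[int] = MODEL_IDS) -> float:
    """Mean wall-clock seconds per model for a single chain."""
    durations = []
    for model_id in model_ids:
        start = time.perf_counter()
        predict(sequence, model_id)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


def benchmark(predict: PredictFn, chains: dict[str, str]) -> dict[str, float]:
    # Chains are timed one after another (sequential execution), never concurrently.
    return {chain_id: time_chain(predict, seq) for chain_id, seq in chains.items()}


if __name__ == "__main__":
    def fake_predict(sequence: str, model_id: int) -> None:
        # Placeholder workload that scales with length; replace with a real inference call.
        time.sleep(0.001 * len(sequence) / 100)

    print(benchmark(fake_predict, {"toy_A": "M" * 98}))
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock suited to measuring elapsed time.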

Version 1.0.0 Configuration

| Algorithm feature / parameter            | Setting     |
|------------------------------------------|-------------|
| use_templates                            | False       |
| selected_models                          | [1,2,3,4,5] |
| relax_prediction                         | False       |
| DeepSpeed Evoformer kernel               | active      |
| Precision for DeepSpeed Evoformer kernel | bf16        |
| Precision for the rest of the model      | fp32        |
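When reproducing these numbers, it can help to record the exact settings alongside the timings. The sketch below mirrors the configuration table as an immutable Python dataclass; the class and field names are hypothetical and do not correspond to the NIM's actual configuration interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    """Hypothetical record of the v1.0.0 benchmark settings.

    Field names follow the rows of the configuration table; they are NOT the NIM's API.
    """
    use_templates: bool = False
    selected_models: tuple[int, ...] = (1, 2, 3, 4, 5)
    relax_prediction: bool = False
    deepspeed_evoformer_kernel: bool = True   # "active" in the table
    evoformer_kernel_precision: str = "bf16"  # precision for the DeepSpeed Evoformer kernel
    model_precision: str = "fp32"             # precision for the rest of the model


V1_0_0_CONFIG = BenchmarkConfig()

if __name__ == "__main__":
    print(V1_0_0_CONFIG)
```

Freezing the dataclass keeps a recorded configuration from being modified after a benchmark run starts.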