Performance#

Version 2.0.0#

Compared to Version 1.0.0, Version 2.0.0 provides the following improvements.

Performance and Accuracy#

  • Faster startup: Reduced initialization time due to removal of large template database loading (~300GB)

  • Enhanced GPU support: TensorRT optimization for L40S, B200, and RTX 6000 Ada Generation

  • Reduced storage footprint: Container and cache requirements significantly reduced (from 380GB to 80GB total)
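As a rough illustration of the storage figure above, the sketch below checks whether a directory's filesystem has enough free space before the container and cache are downloaded. The cache location and the use of decimal gigabytes are assumptions made for this example; they are not documented settings.

```python
import shutil

# Total footprint quoted above for Version 2.0.0 (container plus cache).
REQUIRED_GB = 80


def has_enough_space(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    # Assumption: "GB" is read as decimal gigabytes (10**9 bytes).
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    cache_dir = "."  # hypothetical cache location, replace with your own
    print(f"Enough space for Version 2.0.0: {has_enough_space(cache_dir)}")
```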

Performance Comparison (Time in seconds) on H100#

  • Below are benchmark times, measured for each input chain, in sequential execution on a single NVIDIA H100 80GB HBM3 device.

| protein chain id | metric | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A |
|---|---|---|---|---|---|
| sequence length | sequence length | 98 | 304 | 562 | 914 |
| Version 1.0.0 | pipeline_time* | 26.6 | 50.9 | 98.2 | 205.9 |
| Version 1.0.0 | pipeline_time_per_model** | 5.3 | 10.2 | 19.6 | 41.2 |
| Version 2.0.0 | pipeline_time* | 4.92 | 19.4 | 44.9 | 113 |
| Version 2.0.0 | pipeline_time_per_model** | 0.984 | 3.90 | 8.99 | 22.6 |
| Version 2.0.0 | speed-up vs 1.0.0 | 5.4x | 2.6x | 2.2x | 1.8x |

*pipeline_time is the total time to load parameter sets, compute features, and complete the forward pass for each model in [model_1, model_2, model_3, model_4, model_5].

**pipeline_time_per_model is pipeline_time divided by 5, the number of models.
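The speed-up row can be reproduced from the pipeline_time values in the table. The following is a minimal sketch; the variable and function names are illustrative and not part of the NIM.

```python
# pipeline_time values (seconds) copied from the comparison table above.
chains = ["7WBN_A", "7ONG_A", "7ZHT_A", "7Y4I_A"]
v1_pipeline_time = [26.6, 50.9, 98.2, 205.9]
v2_pipeline_time = [4.92, 19.4, 44.9, 113.0]


def speed_up(old_seconds: float, new_seconds: float) -> float:
    """Ratio of the old time to the new time, rounded to one decimal place."""
    return round(old_seconds / new_seconds, 1)


speed_ups = {
    chain: speed_up(old, new)
    for chain, old, new in zip(chains, v1_pipeline_time, v2_pipeline_time)
}

for chain, ratio in speed_ups.items():
    print(f"{chain}: {ratio}x")
```

Because the published times are already rounded, values recomputed from them (for example, per-model times) may differ from the table in the last digit.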

Accuracy Metrics on H100#

| Metric | Version 1.0.0 | Version 2.0.0 |
|---|---|---|
| CADS | 0.744 | 0.740 |
| LDDT | 0.861 | 0.860 |
| STRIDE | 4.24 | 4.24 |
| MP | 0.882 | 0.878 |

Version 1.0.0#

Performance varies significantly depending on:

  • The type of NVIDIA GPUs that are attached and available to the NIM

  • The CPU type

  • System RAM available

The following section reports benchmark results and provides general tips. These values are not guarantees of expected performance; performance on your system will vary from them.

Performance Benchmarks#

The following are performance benchmarks for OpenFold2 version 1.0.0.

  • Structure prediction performance is mostly dependent on GPU capability and memory. If you find structure prediction to be a bottleneck, consider using a higher memory device.

  • The time required for structure prediction grows with sequence length.

  • The time required for structure prediction grows with the total number of sequences in the alignments.

  • Below are benchmark times, measured for each input chain, in sequential execution on a single NVIDIA H100 80GB HBM3 device.

  • The mean LDDT-CA for these protein chains, averaged over 2 runs, is 0.86.

| protein chain id | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A |
|---|---|---|---|---|
| sequence length | 98 | 304 | 562 | 914 |
| pipeline_time* (s) | 26.6 | 50.9 | 98.2 | 205.9 |
| pipeline_time_per_model** (s) | 5.3 | 10.2 | 19.6 | 41.2 |

*pipeline_time is the total time to load parameter sets, compute features, and complete the forward pass for each model in [model_1, model_2, model_3, model_4, model_5].

**pipeline_time_per_model is pipeline_time divided by 5, the number of models.
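The observation above that prediction time grows with sequence length can be roughly quantified from the four benchmark points. The sketch below fits a straight line in log-log space, so its slope estimates the exponent k in time ≈ c · length^k. This is only an illustration: four points are too few for a firm conclusion, and pipeline_time includes costs such as parameter loading that do not depend on sequence length.

```python
import math

# Values copied from the Version 1.0.0 benchmark table above.
sequence_lengths = [98, 304, 562, 914]
pipeline_times = [26.6, 50.9, 98.2, 205.9]  # seconds


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    return sxy / sxx


exponent = loglog_slope(sequence_lengths, pipeline_times)
print(f"Estimated scaling exponent: {exponent:.2f}")
```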

Version 1.0.0 Configuration#

| algo feature / parameter | setting |
|---|---|
| use_templates | False |
| selected_models | [1,2,3,4,5] |
| relax_prediction | False |
| deepspeed evoformer kernel | active |
| precision for deepspeed evoformer kernel | bf16 |
| precision for the rest of the model | fp32 |
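For reference, the three request-level settings in the table (use_templates, selected_models, relax_prediction) could be collected as follows when reproducing the benchmark. This is a sketch under stated assumptions: the "sequence" key, the payload shape, and the helper name are hypothetical, so check the API reference for the actual request schema. The kernel and precision rows are omitted because nothing here indicates they are request parameters.

```python
import json

# Request-level settings taken from the configuration table above.
benchmark_settings = {
    "use_templates": False,
    "selected_models": [1, 2, 3, 4, 5],
    "relax_prediction": False,
}


def build_payload(sequence: str, settings: dict) -> str:
    """Merge a sequence with the settings and serialize the result to JSON.

    Assumption: the "sequence" key and the flat payload layout are
    illustrative and may not match the service's request schema.
    """
    payload = {"sequence": sequence, **settings}
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # Hypothetical short sequence, used only to show the output shape.
    print(build_payload("MKTAYIAKQR", benchmark_settings))
```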