Performance#

Version 2.4.0#

This section reports OpenFold2 v2.4.0 performance using internal benchmark artifacts.

The benchmark set contains four protein chains:

| Test ID | Seq Length |
|---------|------------|
| 7WBN_A  | 98         |
| 7ONG_A  | 304        |
| 7ZHT_A  | 562        |
| 7Y4I_A  | 914        |

Benchmark Configuration#

| Parameter       | Setting     |
|-----------------|-------------|
| selected_models | [1,2,3,4,5] |
| num_trials      | 2           |
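The configuration above can be sketched as a plain dictionary. Note that the key names simply mirror the table; this is an illustration, not the service's actual configuration schema.

```python
# Benchmark configuration from the table above.
# NOTE: illustrative only -- the key names mirror the table,
# not the actual configuration schema of the service.
benchmark_config = {
    "selected_models": [1, 2, 3, 4, 5],  # all five model parameter sets
    "num_trials": 2,                     # each timing is averaged over 2 runs
}

# Each input chain therefore runs 5 models x 2 trials = 10 forward passes.
total_runs = len(benchmark_config["selected_models"]) * benchmark_config["num_trials"]
print(total_runs)
```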

Table 1: Performance Across Supported NVIDIA Hardware#

The table below reports pipeline_time_mean (seconds) using the TensorRT backend without structural templates.

| Hardware                      | 7WBN_A (98) | 7ONG_A (304) | 7ZHT_A (562) | 7Y4I_A (914) |
|-------------------------------|-------------|--------------|--------------|--------------|
| NVIDIA A100 80GB              | 7.08        | 34.16        | 73.82        | 171.75       |
| NVIDIA B200                   | 5.68        | 22.56        | 48.49        | 104.95       |
| NVIDIA H100 80GB HBM3         | 4.60        | 20.03        | 42.66        | 97.04        |
| NVIDIA GB200                  | 7.15        | 18.16        | 39.93        | 83.19        |
| NVIDIA H200                   | 4.35        | 16.20        | 37.60        | 85.28        |
| NVIDIA L40S                   | 12.67       | 36.40        | 81.56        | 183.18       |
| NVIDIA GB10 (DGX Spark)       | 29.70       | 137.10       | 402.87       | 1131.11      |
| NVIDIA RTX 6000 Ada           | 11.28       | 33.56        | 80.64        | 187.23       |
| NVIDIA RTX PRO 6000 Blackwell | 5.68        | 27.59        | 66.81        | 154.09       |
| NVIDIA GH200                  | 5.63        | 13.81        | 29.65        | 69.50        |

Table 2: Performance Across Optimization Backends#

The table below compares H100 performance between the PyTorch and TensorRT backends, without structural templates.

| Test ID | Seq Length | torch (s) | trt (s) | TRT speedup over Torch |
|---------|------------|-----------|---------|------------------------|
| 7WBN_A  | 98         | 29.74     | 4.60    | 6.47x                  |
| 7ONG_A  | 304        | 53.34     | 20.03   | 2.66x                  |
| 7ZHT_A  | 562        | 98.38     | 42.66   | 2.31x                  |
| 7Y4I_A  | 914        | 199.99    | 97.04   | 2.06x                  |
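As a sanity check, the speedup column can be recomputed from the raw per-chain times. The numbers below are copied from the table; the script itself is just a sketch.

```python
# Recompute the TRT-over-Torch speedup column from the raw per-chain
# times (seconds) measured on H100, as reported in the table above.
h100_times = {
    # test_id: (torch_seconds, trt_seconds)
    "7WBN_A": (29.74, 4.60),
    "7ONG_A": (53.34, 20.03),
    "7ZHT_A": (98.38, 42.66),
    "7Y4I_A": (199.99, 97.04),
}

speedups = {tid: torch_s / trt_s for tid, (torch_s, trt_s) in h100_times.items()}
for tid, s in speedups.items():
    print(f"{tid}: {s:.2f}x")  # matches the speedup column above
```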

Table 3: Performance Impact From Structural Templates#

The table below reports H100 TensorRT performance with and without structural templates.

| Test ID | Seq Length | Without structural templates (s) | With structural templates (s) |
|---------|------------|----------------------------------|-------------------------------|
| 7WBN_A  | 98         | 4.60                             | 5.29                          |
| 7ONG_A  | 304        | 20.03                            | 19.53                         |
| 7ZHT_A  | 562        | 42.66                            | 43.06                         |
| 7Y4I_A  | 914        | 97.04                            | 97.77                         |

Version 2.3.0#

Version 2.3.0 adds support for the GB10 (DGX Spark) GPU architecture, with performance optimized for this platform.

Performance on GB10 (DGX Spark)#

Below are benchmark times, measured for each input chain, in sequential execution on a single NVIDIA GB10 (DGX Spark) device.

| protein chain id | metric                    | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A  |
|------------------|---------------------------|--------|--------|--------|---------|
|                  | sequence length           | 98     | 304    | 562    | 914     |
| Version 2.3.0    | pipeline_time*            | 46.51  | 170.78 | 441.02 | 1079.12 |
| Version 2.3.0    | pipeline_time_per_model** | 9.30   | 34.16  | 88.20  | 215.82  |

*pipeline_time is the time to load parameter sets, compute features, and complete the forward pass for each model in [model_1, model_2, model_3, model_4, model_5]

**pipeline_time_per_model is pipeline_time divided by 5
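The per-model metric follows directly from the definition in the footnotes. A minimal sketch using the Version 2.3.0 GB10 numbers above:

```python
# Derive pipeline_time_per_model (pipeline_time / 5) from the
# Version 2.3.0 GB10 pipeline_time values reported above.
pipeline_time = {
    "7WBN_A": 46.51,
    "7ONG_A": 170.78,
    "7ZHT_A": 441.02,
    "7Y4I_A": 1079.12,
}
NUM_MODELS = 5  # [model_1, model_2, model_3, model_4, model_5]

pipeline_time_per_model = {cid: t / NUM_MODELS for cid, t in pipeline_time.items()}
print(pipeline_time_per_model)
```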

Accuracy on GB10#

Accuracy metrics for GB10 (DGX Spark) remain consistent with other supported GPU architectures. For detailed accuracy benchmarks, refer to the Accuracy Metrics in the Version 2.0.0 section below.

For performance benchmarks on other GPU architectures, refer to the Version 2.0.0 section below.

Version 2.2.0#

Version 2.2.0 maintains the same performance characteristics as Version 2.1.0. All performance benchmarks and accuracy metrics from Version 2.1.0 apply to Version 2.2.0.

Version 2.1.0#

Version 2.1.0 maintains the same performance characteristics as Version 2.0.0. All performance benchmarks and accuracy metrics from Version 2.0.0 apply to Version 2.1.0.

For detailed performance comparisons and benchmarks, see the Version 2.0.0 section below.

Version 2.0.0#

Compared to Version 1.0.0, Version 2.0.0 delivers the following improvements.

Performance and Accuracy#

  • Faster startup: Reduced initialization time due to removal of large template database loading (~300GB)

  • Enhanced GPU support: TensorRT optimization for L40S, B200, and RTX 6000 Ada Generation

  • Reduced storage footprint: Container and cache requirements significantly reduced (from 380GB to 80GB total)

Performance Comparison (Time in seconds) on H100#

Below are benchmark times, measured for each input chain, in sequential execution on a single NVIDIA H100 80GB HBM3 device.

| protein chain id | metric                    | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A |
|------------------|---------------------------|--------|--------|--------|--------|
|                  | sequence length           | 98     | 304    | 562    | 914    |
| Version 1.0.0    | pipeline_time*            | 26.6   | 50.9   | 98.2   | 205.9  |
| Version 1.0.0    | pipeline_time_per_model** | 5.3    | 10.2   | 19.6   | 41.2   |
| Version 2.0.0    | pipeline_time*            | 4.92   | 19.4   | 44.9   | 113    |
| Version 2.0.0    | pipeline_time_per_model** | 0.984  | 3.90   | 8.99   | 22.6   |
| Version 2.0.0    | speed-up vs 1.0.0         | 5.4x   | 2.6x   | 2.2x   | 1.8x   |

*pipeline_time is the time to load parameter sets, compute features, and complete the forward pass for each model in [model_1, model_2, model_3, model_4, model_5]

**pipeline_time_per_model is pipeline_time divided by 5

Accuracy Metrics on H100#

| Metric | Version 1.0.0 | Version 2.0.0 |
|--------|---------------|---------------|
| CADS   | 0.744         | 0.740         |
| LDDT   | 0.861         | 0.860         |
| STRIDE | 4.24          | 4.24          |
| MP     | 0.882         | 0.878         |

Version 1.0.0#

Performance will also vary significantly depending on:

  • The type of NVIDIA GPUs that are attached and available to the NIM

  • The CPU type

  • System RAM available

The following section details some performance expectations and provides general tips. These values are illustrative only; performance on your system will vary.

Performance Benchmarks#

The following are performance benchmarks for OpenFold2 version 1.0.0.

  • Structure prediction performance is mostly dependent on GPU capability and memory. If you find structure prediction to be a bottleneck, consider using a higher memory device.

  • The time required for structure prediction grows with sequence length.

  • The time required for structure prediction grows with the total number of sequences in the alignments.

  • Below are benchmark times, measured for each input chain, in sequential execution on a single NVIDIA H100 80GB HBM3 device.

  • The average value of LDDT-CA, for these protein chains, averaged over 2 runs, is 0.86.

| protein chain id          | 7WBN_A | 7ONG_A | 7ZHT_A | 7Y4I_A |
|---------------------------|--------|--------|--------|--------|
| sequence length           | 98     | 304    | 562    | 914    |
| pipeline_time*            | 26.6   | 50.9   | 98.2   | 205.9  |
| pipeline_time_per_model** | 5.3    | 10.2   | 19.6   | 41.2   |

*pipeline_time is the time to load parameter sets, compute features, and complete the forward pass for each model in [model_1, model_2, model_3, model_4, model_5]

**pipeline_time_per_model is pipeline_time divided by 5

Version 1.0.0 Configuration#

| Algo feature / parameter                 | Setting     |
|------------------------------------------|-------------|
| use_templates                            | False       |
| selected_models                          | [1,2,3,4,5] |
| relax_prediction                         | False       |
| deepspeed evoformer kernel               | active      |
| precision for deepspeed evoformer kernel | bf16        |
| precision for the rest of the model      | fp32        |