Benchmarks

Following the tests from SAFE-GPT, a benchmark of both performance and accuracy is built on two fragment-completion tasks, motif-extension and scaffold-decoration, each evaluated on 10 tests from the SAFE-DRUGS dataset. More information about these tasks can be found in this article.

For the motif-extension task, the following parameters were used:

  • mask_length = 17

  • temperature = 1.2

  • noise = 1.6

  • step_size = 1

For the scaffold-decoration task, the following parameters were used:

  • mask_length = 17

  • temperature = 1.2

  • noise = 2.0

  • step_size = 1
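The two parameter sets above differ only in noise. As a minimal sketch, the settings can be captured in a small Python dataclass and compared field by field. The class and function names are hypothetical, and the field names simply mirror the parameter names in this section rather than any confirmed request schema.

```python
from dataclasses import dataclass, asdict


# Hypothetical container for the benchmark settings listed above; field names
# mirror this section's parameter names, not a confirmed API schema.
@dataclass(frozen=True)
class FragmentCompletionSettings:
    mask_length: int
    temperature: float
    noise: float
    step_size: int


MOTIF_EXTENSION = FragmentCompletionSettings(
    mask_length=17, temperature=1.2, noise=1.6, step_size=1
)
SCAFFOLD_DECORATION = FragmentCompletionSettings(
    mask_length=17, temperature=1.2, noise=2.0, step_size=1
)


def differing_fields(a, b):
    """Return {field: (a_value, b_value)} for every field whose values differ."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


print(differing_fields(MOTIF_EXTENSION, SCAFFOLD_DECORATION))
```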

Performance

The table below lists the average wall-time (in seconds) to generate 1000 molecules across the tests of each task, showing the relative performance of the model on different GPU models.

GPU       motif-extension (s)  scaffold-decoration (s)
-------   -------------------  -----------------------
A10G      2.487                1.750
RTX6000   1.108                1.235
L40S      1.892                1.207
A100      2.378                1.703
H100      1.677                1.006
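To make the relative performance explicit, the wall-times above can be turned into speed-up ratios against a reference GPU. This is a minimal sketch; the choice of A10G as the baseline and the helper name speedup_vs are illustrative assumptions, and the numbers are copied from the table.

```python
# Average wall-times (seconds) for 1000 molecules, copied from the table above.
WALLTIME_S = {
    "A10G":    {"motif-extension": 2.487, "scaffold-decoration": 1.750},
    "RTX6000": {"motif-extension": 1.108, "scaffold-decoration": 1.235},
    "L40S":    {"motif-extension": 1.892, "scaffold-decoration": 1.207},
    "A100":    {"motif-extension": 2.378, "scaffold-decoration": 1.703},
    "H100":    {"motif-extension": 1.677, "scaffold-decoration": 1.006},
}


def speedup_vs(baseline, task, table=WALLTIME_S):
    """Speed-up of each GPU relative to `baseline` for one task (>1 means faster)."""
    base = table[baseline][task]
    return {gpu: base / times[task] for gpu, times in table.items()}


for task in ("motif-extension", "scaffold-decoration"):
    ratios = speedup_vs("A10G", task)
    print(task, {gpu: round(r, 2) for gpu, r in ratios.items()})
```

By this measure the RTX6000 is fastest on motif-extension and the H100 on scaffold-decoration.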

The end-to-end time of a molecular generation request (measured on an RTX6000) depends on the following parameters:

  • num_molecules: number of molecules to generate.

  • steps: number of steps taken to recover all masked tokens.

  • context_length: number of tokens obtained from the input molecular sequence (SMILES), including the end and mask tokens.

Comparison of wall-times across num_molecules:

context_length  num_molecules  steps  wall-time (s)
--------------  -------------  -----  -------------
20              500            5      1.7304
20              1000           5      3.3841
20              2000           5      6.9061
20              10000          5      33.8946
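One way to read the table above is to fit a straight line to wall-time versus num_molecules. The following is a minimal ordinary-least-squares sketch using only the standard library; the helper name fit_line is illustrative, and the data points are copied from the table.

```python
# (num_molecules, wall-time in seconds) at context_length=20, steps=5,
# copied from the table above.
points = [(500, 1.7304), (1000, 3.3841), (2000, 6.9061), (10000, 33.8946)]


def fit_line(pts):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(points)
print(f"~{slope * 1000:.2f} ms per molecule, intercept {intercept:.3f} s")
```

On these four points the fit gives roughly 3.4 ms per additional molecule with a near-zero intercept, i.e. wall-time grows approximately linearly with num_molecules over this range.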

Comparison of wall-times across steps:

context_length  num_molecules  steps  wall-time (s)
--------------  -------------  -----  -------------
20              1000           5      1.7304
20              1000           10     4.5902
20              1000           20     7.0649

context_length  num_molecules  steps  wall-time (s)
--------------  -------------  -----  -------------
40              1000           5      5.8414
40              1000           10     7.9896
40              1000           20     13.0631
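To compare how steps affects wall-time at the two context lengths, each series can be normalized to its smallest steps value. This is a minimal sketch; the helper name scaling_vs_baseline is illustrative, and the numbers are copied from the two tables above.

```python
# Wall-times (seconds) at num_molecules=1000, keyed by context_length and steps,
# copied from the two tables above.
WALLTIME_BY_STEPS = {
    20: {5: 1.7304, 10: 4.5902, 20: 7.0649},
    40: {5: 5.8414, 10: 7.9896, 20: 13.0631},
}


def scaling_vs_baseline(series):
    """Map each steps value to (steps ratio, wall-time ratio) vs the smallest steps value."""
    base_steps = min(series)
    base_time = series[base_steps]
    return {s: (s / base_steps, t / base_time) for s, t in sorted(series.items())}


for ctx, series in WALLTIME_BY_STEPS.items():
    for steps, (step_ratio, time_ratio) in scaling_vs_baseline(series).items():
        print(f"context_length={ctx} steps={steps}: "
              f"{step_ratio:.0f}x steps -> {time_ratio:.2f}x wall-time")
```

Quadrupling steps from 5 to 20 raises wall-time about 4.1x at context_length 20 and about 2.2x at context_length 40.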

Comparison of wall-times across context_length:

context_length  num_molecules  steps  wall-time (s)
--------------  -------------  -----  -------------
20              1000           5      1.7304
40              1000           5      5.8414
80              1000           5      10.6091
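A compact way to summarize the context_length table is an empirical power-law exponent k in wall-time ≈ c · context_length^k, estimated as the least-squares slope in log-log space. This is a minimal sketch under that assumed model; the helper name power_law_exponent is illustrative, and the data points are copied from the table above.

```python
import math

# (context_length, wall-time in seconds) at num_molecules=1000, steps=5,
# copied from the table above.
ctx_points = [(20, 1.7304), (40, 5.8414), (80, 10.6091)]


def power_law_exponent(pts):
    """Least-squares slope of log(wall-time) vs log(context_length)."""
    xs = [math.log(x) for x, _ in pts]
    ys = [math.log(y) for _, y in pts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


print(f"empirical exponent ~ {power_law_exponent(ctx_points):.2f}")
```

The estimate is about 1.3, but it is only a coarse summary: doubling context_length from 20 to 40 costs about 3.4x in wall-time, while doubling from 40 to 80 costs about 1.8x, so the three points do not follow a single power law.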

Accuracy

The accuracy of the model was evaluated by generating 100 molecules and computing the following metrics:

  • validity: fraction of generated SMILES that are valid.

  • uniqueness: fraction of valid molecules that are unique.

  • diversity: average pairwise distance between the molecular fingerprints of the generated molecules.

  • novelty: average distance between the molecular fingerprint of the input molecule and those of the generated molecules.

  • quality: fraction of generated molecules with QED_score > 0.6 and SA_score < 4.
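The metric definitions above can be sketched in plain Python. This sketch makes several assumptions that the text does not state: fingerprints are represented as sets of on-bit indices, "distance" is taken to be Tanimoto distance (1 minus Tanimoto similarity), SMILES parsing and canonicalization happen upstream (an invalid SMILES is represented as None), and QED_score and SA_score are precomputed (in practice, for example, with RDKit). All function names are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(fp_a, fp_b):
    """1 - Tanimoto similarity; fingerprints are sets of on-bit indices (assumption)."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union


def validity(parsed):
    """Fraction of generated SMILES that parsed; `parsed` holds canonical SMILES or None."""
    return sum(p is not None for p in parsed) / len(parsed) if parsed else 0.0


def uniqueness(parsed):
    """Fraction of valid (non-None) canonical SMILES that are distinct."""
    valid = [p for p in parsed if p is not None]
    return len(set(valid)) / len(valid) if valid else 0.0


def diversity(fps):
    """Average Tanimoto distance over all unordered pairs of generated fingerprints."""
    pairs = list(combinations(fps, 2))
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def novelty(input_fp, fps):
    """Average Tanimoto distance from the input fingerprint to each generated one."""
    return sum(tanimoto_distance(input_fp, fp) for fp in fps) / len(fps) if fps else 0.0


def quality(scores, qed_min=0.6, sa_max=4.0):
    """Fraction of (QED_score, SA_score) pairs with QED above qed_min and SA below sa_max.

    The denominator is len(scores), i.e. one entry per molecule being counted.
    """
    if not scores:
        return 0.0
    return sum(q > qed_min and sa < sa_max for q, sa in scores) / len(scores)


# Toy inputs (made up for illustration, not benchmark data).
parsed = ["CCO", "CCO", None, "c1ccccc1"]
print(validity(parsed))    # 0.75
print(uniqueness(parsed))  # 2 unique out of 3 valid
```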

A summary of the accuracy evaluations is below:

metric       motif-extension  scaffold-decoration
-----------  ---------------  -------------------
validity     0.902 (0.009)    0.977 (0.004)
uniqueness   0.690 (0.015)    0.770 (0.012)
diversity    0.606 (0.003)    0.560 (0.003)
novelty      0.684 (0.002)    0.657 (0.001)
quality      0.278 (0.012)    0.332 (0.011)