Benchmarks

Following the tests used for SAFE-GPT, the benchmark measures both performance and accuracy on two fragment-completion tasks: motif-extension and scaffold-decoration. For each task, 10 tests from the SAFE-DRUGS dataset were used. More information about these tasks can be found in this article.

For the motif-extension task, the parameters are:

mask_length = 17
temperature = 1.2
noise = 1.6
step_size = 1

For the scaffold-decoration task, the parameters are:

mask_length = 17
temperature = 1.2
noise = 2.0
step_size = 1
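For reference, the two parameter sets can be kept side by side as plain Python dictionaries. This is only a sketch: the parameter names and values come from the lists above, while the dictionary layout and the names `BENCHMARK_PARAMS`, `params_for`, and `differing_keys` are hypothetical and not part of any documented API.

```python
# Benchmark parameters per task, copied from the lists above.
# The container layout and all names here are illustrative assumptions.
BENCHMARK_PARAMS = {
    "motif-extension": {
        "mask_length": 17,
        "temperature": 1.2,
        "noise": 1.6,
        "step_size": 1,
    },
    "scaffold-decoration": {
        "mask_length": 17,
        "temperature": 1.2,
        "noise": 2.0,
        "step_size": 1,
    },
}


def params_for(task: str) -> dict:
    """Return a copy of the parameter set for one benchmark task."""
    if task not in BENCHMARK_PARAMS:
        raise KeyError(f"unknown task: {task!r}")
    return dict(BENCHMARK_PARAMS[task])


def differing_keys(task_a: str, task_b: str) -> list:
    """List the parameter names whose values differ between two tasks."""
    a, b = params_for(task_a), params_for(task_b)
    return sorted(k for k in a if a[k] != b[k])


# With the values above, the two tasks differ only in the noise setting.
print(differing_keys("motif-extension", "scaffold-decoration"))
```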

Performance

The table below lists the average wall-time (in seconds) to generate 1000 molecules across the tests for each task, showing the relative performance of the model on different GPUs.

GPU	motif-extension	scaffold-decoration
A10G	2.487	1.750
RTX6000	1.108	1.235
L40S	1.892	1.207
A100	2.378	1.703
H100	1.677	1.006
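Because every entry in the table is the time to generate 1000 molecules, it can be converted to throughput in molecules per second. The snippet below is a small sketch of that conversion using the values from the table; the names `WALLTIME_S`, `throughput`, and `fastest_gpu` are chosen here for illustration.

```python
# Average wall-time in seconds to generate 1000 molecules (from the table above).
WALLTIME_S = {
    "A10G": {"motif-extension": 2.487, "scaffold-decoration": 1.750},
    "RTX6000": {"motif-extension": 1.108, "scaffold-decoration": 1.235},
    "L40S": {"motif-extension": 1.892, "scaffold-decoration": 1.207},
    "A100": {"motif-extension": 2.378, "scaffold-decoration": 1.703},
    "H100": {"motif-extension": 1.677, "scaffold-decoration": 1.006},
}
NUM_MOLECULES = 1000  # molecules generated per timed run


def throughput(gpu: str, task: str) -> float:
    """Molecules generated per second for one GPU and task."""
    return NUM_MOLECULES / WALLTIME_S[gpu][task]


def fastest_gpu(task: str) -> str:
    """GPU with the lowest average wall-time for the given task."""
    return min(WALLTIME_S, key=lambda gpu: WALLTIME_S[gpu][task])


for task in ("motif-extension", "scaffold-decoration"):
    gpu = fastest_gpu(task)
    print(f"{task}: {gpu} at {throughput(gpu, task):.0f} molecules/s")
```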

The end-to-end time of a molecular generation request (measured on RTX6000) depends on the following parameters:

num_molecules: number of molecules to be generated.
steps: number of steps to recover all masks.
context_length: number of tokens, including the ending and masking tokens, converted from the input molecular sequence (SMILES).
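A wall-time measurement of this kind could be sketched as below. The `generate` callable is a hypothetical stand-in for whatever client call issues the generation request, and `fake_generate` is a placeholder workload so the sketch runs without a model or GPU. It takes `steps` and `context_length` as direct arguments purely for illustration, whereas in a real request the context length is derived from the input SMILES.

```python
import time
from statistics import mean
from typing import Callable


def time_request(generate: Callable[..., object], repeats: int = 3, **params) -> float:
    """Return the mean wall-time in seconds of calling generate(**params)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        generate(**params)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def fake_generate(num_molecules: int, steps: int, context_length: int) -> list:
    """Hypothetical placeholder: builds dummy strings instead of molecules."""
    return ["C" * context_length for _ in range(num_molecules * steps)]


elapsed = time_request(
    fake_generate, repeats=2, num_molecules=100, steps=5, context_length=20
)
print(f"mean wall-time: {elapsed:.6f} s")
```

Averaging over several repeats reduces the effect of one-off delays such as warm-up on the reported figure.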

Comparison of wall-times on num_molecules:

context_length	num_molecules	steps	walltime (s)
20	500	5	1.7304
20	1000	5	3.3841
20	2000	5	6.9061
20	10000	5	33.8946
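Dividing each wall-time in this table by the number of molecules gives the cost per molecule, which is a quick way to check how the time scales with request size. The sketch below does this for the four rows above, assuming the wall-times are in seconds as in the GPU table; the helper names are illustrative.

```python
# (num_molecules, wall-time) pairs from the table above
# (context_length = 20, steps = 5). Wall-times are assumed to be in seconds.
MEASUREMENTS = [
    (500, 1.7304),
    (1000, 3.3841),
    (2000, 6.9061),
    (10000, 33.8946),
]


def per_molecule_ms(measurements):
    """Wall-time per generated molecule, in milliseconds."""
    return [1000.0 * walltime / n for n, walltime in measurements]


def relative_spread(values):
    """(max - min) / min, a simple measure of how constant the values are."""
    return (max(values) - min(values)) / min(values)


costs = per_molecule_ms(MEASUREMENTS)
for (n, _), cost in zip(MEASUREMENTS, costs):
    print(f"{n:>6} molecules: {cost:.3f} ms per molecule")
print(f"relative spread: {relative_spread(costs):.1%}")
```

For these four rows the per-molecule cost stays close to constant, which is consistent with wall-time growing roughly linearly in num_molecules over this range.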

Comparison of wall-times on steps:

context_length	num_molecules	steps	walltime (s)
20	1000	5	1.7304
20	1000	10	4.5902
20	1000	20	7.0649

context_length	num_molecules	steps	walltime (s)
40	1000	5	5.8414
40	1000	10	7.9896
40	1000	20	13.0631

Comparison of wall-times on context_length:

context_length	num_molecules	steps	walltime (s)
20	1000	5	1.7304
40	1000	5	5.8414
80	1000	5	10.6091

Accuracy

The accuracy of the model was evaluated by generating 100 molecules and computing the following metrics:

validity: fraction of generated SMILES that are valid.
uniqueness: fraction of valid molecules that are unique.
diversity: average pairwise distance between the molecular fingerprints of the generated molecules.
novelty: average distance between the molecular fingerprint of the input molecule and those of the generated molecules.
quality: fraction of generated molecules with QED_score > 0.6 and SA_score < 4.
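The metric definitions above can be sketched in plain Python as follows. Several choices here are assumptions rather than details confirmed by this page: fingerprints are represented as sets of "on" bit indices, the distance is taken to be Tanimoto distance, invalid SMILES are marked as `None`, and the QED and SA scores are assumed to be precomputed by a cheminformatics toolkit. All function names are illustrative.

```python
from itertools import combinations


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """1 - |intersection| / |union|; two empty fingerprints count as identical."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def validity(canonical) -> float:
    """Fraction of generated SMILES that are valid (None marks an invalid one)."""
    return sum(s is not None for s in canonical) / len(canonical)


def uniqueness(canonical) -> float:
    """Fraction of valid molecules that are unique."""
    valid = [s for s in canonical if s is not None]
    return len(set(valid)) / len(valid) if valid else 0.0


def diversity(fingerprints) -> float:
    """Average pairwise distance between generated fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


def novelty(input_fp, fingerprints) -> float:
    """Average distance from the input fingerprint to each generated one."""
    if not fingerprints:
        return 0.0
    return sum(tanimoto_distance(input_fp, f) for f in fingerprints) / len(fingerprints)


def quality(qed_scores, sa_scores) -> float:
    """Fraction of molecules with QED > 0.6 and SA < 4."""
    passed = sum(q > 0.6 and s < 4 for q, s in zip(qed_scores, sa_scores))
    return passed / len(qed_scores)


# Toy inputs, invented purely to exercise the functions.
canonical = ["CCO", "CCO", None, "c1ccccc1"]
fps = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
input_fp = frozenset({1, 2})

print(validity(canonical), uniqueness(canonical))
print(diversity(fps), novelty(input_fp, fps))
print(quality([0.7, 0.5, 0.9], [3.0, 2.0, 4.5]))
```

In practice the canonical SMILES, fingerprints, and scores would come from a toolkit such as RDKit; the sketch takes them as inputs so that the metric arithmetic stays visible.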

A summary of the accuracy evaluations is below:

metric	motif-extension	scaffold-decoration
validity	0.902 (0.009)	0.977 (0.004)
uniqueness	0.690 (0.015)	0.770 (0.012)
diversity	0.606 (0.003)	0.560 (0.003)
novelty	0.684 (0.002)	0.657 (0.001)
quality	0.278 (0.012)	0.332 (0.011)