Benchmarks
Following the tests used for SAFE-GPT, the benchmark measures both performance and accuracy on two fragment completion tasks: motif-extension and scaffold-decoration. Each task uses 10 tests from the SAFE-DRUGS dataset. More information about these tasks can be found in this article.
For the task of motif-extension, parameters are chosen as:
mask_length = 17
temperature = 1.2
noise = 1.6
step_size = 1
For the task of scaffold-decoration, parameters are chosen as:
mask_length = 17
temperature = 1.2
noise = 2.0
step_size = 1
Performance
The average wall-time (in seconds) to generate 1000 molecules, taken over the tests of each task, is listed below to show the relative performance of the model on different GPUs.
| GPU | motif-extension | scaffold-decoration |
|---|---|---|
| A10G | 2.487 | 1.750 |
| RTX6000 | 1.108 | 1.235 |
| L40S | 1.892 | 1.207 |
| A100 | 2.378 | 1.703 |
| H100 | 1.677 | 1.006 |
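To compare GPUs on a common scale, the timings in the table above can be turned into speed ratios against a baseline. The following is a minimal sketch: the helper names `relative_speed` and `fastest_gpu` and the choice of A10G as the default baseline are illustrative assumptions, not part of the benchmark itself.

```python
# Average wall-time (seconds) to generate 1000 molecules, copied from the table above.
WALLTIME_S = {
    "A10G": {"motif-extension": 2.487, "scaffold-decoration": 1.750},
    "RTX6000": {"motif-extension": 1.108, "scaffold-decoration": 1.235},
    "L40S": {"motif-extension": 1.892, "scaffold-decoration": 1.207},
    "A100": {"motif-extension": 2.378, "scaffold-decoration": 1.703},
    "H100": {"motif-extension": 1.677, "scaffold-decoration": 1.006},
}


def relative_speed(task, baseline="A10G"):
    """Return {gpu: baseline_time / gpu_time}; a value above 1 means faster than the baseline.

    The baseline GPU is an arbitrary choice made for this example.
    """
    base = WALLTIME_S[baseline][task]
    return {gpu: base / times[task] for gpu, times in WALLTIME_S.items()}


def fastest_gpu(task):
    """Return the GPU with the lowest average wall-time for the given task."""
    return min(WALLTIME_S, key=lambda gpu: WALLTIME_S[gpu][task])


if __name__ == "__main__":
    for task in ("motif-extension", "scaffold-decoration"):
        ratios = relative_speed(task)
        print(task, "fastest:", fastest_gpu(task))
        for gpu, ratio in ratios.items():
            print(f"  {gpu}: {ratio:.2f}x vs A10G")
```

Ratios are used rather than differences so that the two tasks, which have different absolute timings, can be read side by side.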
The end-to-end time of a molecular generation request (measured on RTX6000) depends on the following parameters:
num_molecules: number of molecules to generate.
steps: number of steps used to recover all masks.
context_length: number of tokens, including the ending and masking tokens, obtained by converting the input molecular sequence (SMILES).
Comparison of wall-times for different values of num_molecules:
| context_length | num_molecules | steps | walltime (s) |
|---|---|---|---|
| 20 | 500 | 5 | 1.7304 |
| 20 | 1000 | 5 | 3.3841 |
| 20 | 2000 | 5 | 6.9061 |
| 20 | 10000 | 5 | 33.8946 |
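One way to read the table above is as throughput, that is, molecules generated per second. The sketch below only does arithmetic on the tabulated values; the helper name `throughput` is an illustrative assumption, and the walltime column is assumed to be in seconds.

```python
# (num_molecules, walltime) pairs at context_length = 20 and steps = 5,
# copied from the table above. Walltime is assumed to be in seconds.
RUNS = [(500, 1.7304), (1000, 3.3841), (2000, 6.9061), (10000, 33.8946)]


def throughput(runs):
    """Return molecules generated per second for each (num_molecules, walltime) pair."""
    return [n / t for n, t in runs]


if __name__ == "__main__":
    for (n, t), rate in zip(RUNS, throughput(RUNS)):
        print(f"num_molecules={n:>5}  walltime={t:>8.4f}  molecules/s={rate:.1f}")
```

If the rate stays roughly constant across rows, the wall-time grows approximately linearly with num_molecules over the tested range.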
Comparison of wall-times for different values of steps:
| context_length | num_molecules | steps | walltime (s) |
|---|---|---|---|
| 20 | 1000 | 5 | 1.7304 |
| 20 | 1000 | 10 | 4.5902 |
| 20 | 1000 | 20 | 7.0649 |

| context_length | num_molecules | steps | walltime (s) |
|---|---|---|---|
| 40 | 1000 | 5 | 5.8414 |
| 40 | 1000 | 10 | 7.9896 |
| 40 | 1000 | 20 | 13.0631 |
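To estimate how much each additional step costs, the two tables above can be reduced to a marginal wall-time per step between neighbouring rows. This is a minimal sketch; `marginal_cost_per_step` is a hypothetical helper, and the walltime column is assumed to be in seconds.

```python
# (steps, walltime) pairs for num_molecules = 1000, copied from the two tables above.
CONTEXT_20 = [(5, 1.7304), (10, 4.5902), (20, 7.0649)]
CONTEXT_40 = [(5, 5.8414), (10, 7.9896), (20, 13.0631)]


def marginal_cost_per_step(points):
    """Return the extra wall-time per additional step between consecutive rows.

    `points` must be a list of (steps, walltime) pairs sorted by steps.
    """
    return [
        (t1 - t0) / (s1 - s0)
        for (s0, t0), (s1, t1) in zip(points, points[1:])
    ]


if __name__ == "__main__":
    for label, points in (("context_length=20", CONTEXT_20),
                          ("context_length=40", CONTEXT_40)):
        costs = marginal_cost_per_step(points)
        print(label, [round(c, 4) for c in costs])
```

With only three measurements per context length, these figures are rough indications rather than a fitted cost model.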
Comparison of wall-times for different values of context_length:
| context_length | num_molecules | steps | walltime (s) |
|---|---|---|---|
| 20 | 1000 | 5 | 1.7304 |
| 40 | 1000 | 5 | 5.8414 |
| 80 | 1000 | 5 | 10.6091 |
Accuracy
The accuracy of the model was evaluated by generating 100 molecules and computing the following metrics:
validity: fraction of generated SMILES that are valid.
uniqueness: fraction of valid molecules that are unique.
diversity: average pairwise distance between the molecular fingerprints of the generated molecules.
novelty: average distance between the molecular fingerprint of the input molecule and those of the generated molecules.
quality: fraction of generated molecules with QED_score > 0.6 and SA_score < 4.
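The metric definitions above can be sketched in plain Python. Several assumptions are made here that the text does not state: fingerprints are represented as sets of on-bit indices, the distance is the Tanimoto distance, SMILES strings have already been parsed and canonicalized by a cheminformatics toolkit (with `None` marking an invalid string), and the QED and SA scores are precomputed. All function names are illustrative.

```python
from itertools import combinations


def tanimoto_distance(fp_a, fp_b):
    """1 - |A & B| / |A | B| for fingerprints given as sets of on-bit indices (assumed metric)."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union


def validity(canonical_smiles):
    """Fraction of generated strings that parsed; None marks an invalid SMILES."""
    if not canonical_smiles:
        return 0.0
    return sum(s is not None for s in canonical_smiles) / len(canonical_smiles)


def uniqueness(canonical_smiles):
    """Fraction of valid molecules that are distinct after canonicalization."""
    valid = [s for s in canonical_smiles if s is not None]
    return len(set(valid)) / len(valid) if valid else 0.0


def diversity(fingerprints):
    """Average pairwise distance between the generated molecules' fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


def novelty(input_fp, fingerprints):
    """Average distance from the input molecule's fingerprint to each generated one."""
    if not fingerprints:
        return 0.0
    return sum(tanimoto_distance(input_fp, fp) for fp in fingerprints) / len(fingerprints)


def quality(scores, qed_min=0.6, sa_max=4.0):
    """Fraction of (qed, sa) pairs with qed > qed_min and sa < sa_max."""
    if not scores:
        return 0.0
    return sum(q > qed_min and s < sa_max for q, s in scores) / len(scores)


if __name__ == "__main__":
    smiles = ["CCO", None, "CCO", "c1ccccc1"]  # toy input, not benchmark data
    print("validity:", validity(smiles))
    print("uniqueness:", uniqueness(smiles))
```

Keeping parsing, fingerprinting, and scoring outside these functions means the same definitions can be reused with whichever toolkit produces those inputs.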
The accuracy evaluations are summarized below:
| metric | motif-extension | scaffold-decoration |
|---|---|---|
| validity | 0.902 (0.009) | 0.977 (0.004) |
| uniqueness | 0.690 (0.015) | 0.770 (0.012) |
| diversity | 0.606 (0.003) | 0.560 (0.003) |
| novelty | 0.684 (0.002) | 0.657 (0.001) |
| quality | 0.278 (0.012) | 0.332 (0.011) |