ZINC-15 Dataset
ZINC-15 is a free database of commercially available compounds for virtual screening, and it was used as the training data. Approximately 1.54 billion molecules (SMILES strings) were selected from tranches meeting the following constraints: molecular weight ≤ 500 daltons, LogP ≤ 5, reactivity level “reactive,” and purchasability “annotated.” Molecules whose SMILES strings were longer than 512 characters were then removed. The remaining data were randomly split into training, validation, and test sets in a 99% / 0.5% / 0.5% ratio.
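A minimal Python sketch of the last two preprocessing steps, the 512-character length filter and the random 99% / 0.5% / 0.5% split, may make the procedure concrete. The tranche selection (molecular weight, LogP, reactivity, purchasability) happens when choosing which ZINC-15 tranches to download and is not reproduced here. The function names `filter_by_length` and `random_split`, the fixed seed, and the choice to give any rounding remainder to the test set are illustrative assumptions, not the original pipeline.

```python
import random
from typing import List, Sequence, Tuple

# Maximum SMILES length stated in the dataset description.
MAX_SMILES_LEN = 512


def filter_by_length(smiles: Sequence[str], max_len: int = MAX_SMILES_LEN) -> List[str]:
    """Keep only SMILES strings of at most max_len characters."""
    return [s for s in smiles if len(s) <= max_len]


def random_split(
    items: Sequence[str],
    fractions: Tuple[float, float, float] = (0.99, 0.005, 0.005),
    seed: int = 0,  # hypothetical seed; the original seed (if any) is not documented
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle items and split them into train/validation/test by fraction."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n = len(shuffled)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # rounding remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Toy corpus: strings of "C" with lengths 1..1000 (not real ZINC data).
    corpus = ["C" * k for k in range(1, 1001)]
    kept = filter_by_length(corpus)
    train, val, test = random_split(kept)
    print(len(kept), len(train), len(val), len(test))  # prints: 512 506 2 4
```

At the scale of roughly 1.54 billion SMILES strings an in-memory shuffle like this would not be practical; a streaming variant that assigns each molecule to a split with a seeded random draw would be the more realistic design, at the cost of split sizes that match the target fractions only approximately.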