MolMIM is a state-of-the-art generative model for small-molecule drug development that learns an informative and clustered latent space. It is a probabilistic auto-encoder that maps variable-length SMILES strings to a fixed-length representation. MolMIM is trained with Mutual Information Machine (MIM) learning and can sample valid SMILES strings by perturbing points in its clustered latent space.

MolMIM can:

Learn an informative and meaningfully clustered latent space

Sample valid molecules from this latent space using an initial seed molecule

Generate novel small molecules with desired properties under specific constraints
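Seed-based sampling can be pictured as adding random noise to the seed molecule's latent code and then decoding each perturbed point. The sketch below is a minimal, model-free illustration under stated assumptions: `perturb_latent` is a hypothetical helper, the latent code is a plain list of floats, and isotropic Gaussian noise scaled by `scaled_radius` is an assumed noise model, not MolMIM's documented sampling procedure. In a real pipeline, each perturbed vector would be passed to the model's decoder.

```python
import random
from typing import List


def perturb_latent(seed_code: List[float], num_samples: int,
                   scaled_radius: float, rng: random.Random) -> List[List[float]]:
    """Return `num_samples` noisy copies of `seed_code`.

    Hypothetical helper: each coordinate receives independent Gaussian noise
    with standard deviation `scaled_radius` (an assumed noise model).
    """
    if scaled_radius < 0:
        raise ValueError("scaled_radius must be non-negative")
    samples = []
    for _ in range(num_samples):
        samples.append([z + rng.gauss(0.0, scaled_radius) for z in seed_code])
    return samples


# Usage with a toy 4-dimensional latent code (real codes would come from the encoder).
rng = random.Random(0)
seed = [0.1, -0.3, 0.5, 0.0]
neighbors = perturb_latent(seed, num_samples=3, scaled_radius=0.5, rng=rng)
for code in neighbors:
    # Each perturbed vector would be decoded into a SMILES string.
    print([round(v, 3) for v in code])
```

Passing a seeded `random.Random` makes the draws reproducible, which helps when comparing results across different radii.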

Note: A more detailed description of the model can be found in the MolMIM manuscript.

MolMIM’s training procedure promotes a dense latent space, which simplifies sampling valid SMILES strings. In comparisons with competing techniques, MolMIM shows superior molecular generation capabilities as measured by the validity, uniqueness, and novelty of the sampled SMILES strings.
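Validity, uniqueness, and novelty can all be computed from a batch of sampled strings. The sketch below uses definitions that are common in molecule-generation benchmarks; the MolMIM manuscript may define them slightly differently, so treat them as assumptions. `generation_metrics` is a hypothetical helper, and `is_valid` is supplied by the caller (for example, a function returning `Chem.MolFromSmiles(s) is not None` with RDKit). Strings are compared verbatim, so in practice they should be canonicalized first; otherwise two spellings of the same molecule count as distinct.

```python
from typing import Callable, Dict, Iterable, Set


def generation_metrics(samples: Iterable[str], training_set: Set[str],
                       is_valid: Callable[[str], bool]) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty for sampled SMILES strings.

    Assumed definitions:
      validity   = valid samples / all samples
      uniqueness = distinct valid samples / valid samples
      novelty    = distinct valid samples not in training_set / distinct valid samples
    """
    samples = list(samples)
    valid = [s for s in samples if is_valid(s)]
    unique_valid = set(valid)
    novel = unique_valid - training_set

    def ratio(num: int, den: int) -> float:
        # Guard against empty batches.
        return num / den if den else 0.0

    return {
        "validity": ratio(len(valid), len(samples)),
        "uniqueness": ratio(len(unique_valid), len(valid)),
        "novelty": ratio(len(novel), len(unique_valid)),
    }


# Toy usage with a placeholder validity check that rejects one malformed string.
metrics = generation_metrics(
    samples=["CCO", "CCO", "c1ccccc1", "C1CC"],
    training_set={"CCO"},
    is_valid=lambda s: s != "C1CC",  # placeholder; a real check would parse the SMILES
)
print(metrics)
```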

MolMIM Capabilities

Embedding

Retrieve the embeddings from MolMIM for a given input molecule, allowing for:

Molecular representation in a high-dimensional space

Similarity analysis and clustering of molecules

Use as input for other AI models or algorithms

Hidden State

Retrieve the hidden state from MolMIM for a given input molecule, also known as the “latent code,” allowing for:

Analysis of the molecular structure’s underlying properties and patterns

Manipulation and modification of the molecular structure

Use as input for other AI models or algorithms

Decode

Decode a hidden state representation into a SMILES string, allowing for:

Generation of novel molecules from a given latent code

Reconstruction of the original input molecule from its hidden state

Use as a tool for understanding the relationship between molecular structure and latent space

Sample

Sample the latent space within a given scaled radius from a seed molecule, generating new molecule samples in an unguided fashion, allowing for:

Exploration of the molecular space around a given molecule

Use for generating diverse molecule sets or libraries

Use as a starting point for guided sampling or optimization

Generate

Generate novel molecules, optionally while optimizing a chosen property, using sampling guided by CMA-ES (Covariance Matrix Adaptation Evolution Strategy), allowing for:

Optimization of molecules against specific properties or criteria

Use for generating molecules with desired properties or characteristics

Use for improving the quality or performance of generated molecules
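Guided generation couples latent-space sampling with an optimizer. The sketch below is a deliberately simplified stand-in for CMA-ES: a (mu, lambda) evolution strategy with an isotropic Gaussian and a fixed step-size decay, so it omits the covariance-matrix and step-size adaptation that give CMA-ES its name. `evolve_latent` and `toy_score` are hypothetical; in a real setup, the score would decode each latent code to SMILES and evaluate a property of the resulting molecule (penalizing invalid decodes), which this toy objective does not do.

```python
import random
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def evolve_latent(seed_code: Vector, score: Callable[[Vector], float],
                  iterations: int = 50, population: int = 20, parents: int = 5,
                  sigma: float = 0.5,
                  rng: Optional[random.Random] = None) -> Tuple[Vector, float]:
    """Simplified (mu, lambda) evolution strategy in latent space (not full CMA-ES).

    `score` stands in for "decode the latent code, then evaluate a property".
    Returns the best latent code seen and its score (higher is better).
    """
    if not 1 <= parents <= population:
        raise ValueError("need 1 <= parents <= population")
    rng = rng or random.Random()
    mean = list(seed_code)
    best_code, best_score = list(seed_code), score(seed_code)
    for _ in range(iterations):
        # Sample a population around the current mean.
        candidates = [[m + rng.gauss(0.0, sigma) for m in mean]
                      for _ in range(population)]
        ranked = sorted(candidates, key=score, reverse=True)
        elite = ranked[:parents]
        # Recombine: the new mean is the average of the best candidates.
        mean = [sum(col) / parents for col in zip(*elite)]
        top_score = score(ranked[0])
        if top_score > best_score:
            best_code, best_score = ranked[0], top_score
        sigma *= 0.95  # fixed decay; CMA-ES would adapt sigma and a covariance matrix
    return best_code, best_score


# Toy objective: prefer latent codes near a hypothetical "high-property" point.
target = [1.0, -1.0, 0.5]


def toy_score(z: Vector) -> float:
    return -sum((a - b) ** 2 for a, b in zip(z, target))


best, value = evolve_latent([0.0, 0.0, 0.0], toy_score, rng=random.Random(42))
print([round(v, 2) for v in best], round(value, 4))
```

Swapping in a full CMA-ES implementation (for example, the `cma` Python package) would replace the sampling and update steps inside the loop while keeping the same ask, evaluate, and update structure.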