Controlled Small Molecule Generation (Latest)

MolMIM Endpoints

MolMIM provides the following endpoints and associated functions:

  • /embedding - Retrieve the embeddings from MolMIM for a given input molecule.

  • /hidden - Retrieve the hidden state from MolMIM for a given input molecule (shown as the “latent code” in Figure 1 of the MolMIM manuscript).

  • /decode - Decode a hidden state representation into a SMILES string sequence.

  • /sampling - Sample the latent space within a given scaled radius of a seed molecule. This method generates new molecules from the input in an unguided fashion.

  • /generate - Generate novel molecules, optionally optimizing for a chosen property. When CMA-ES-guided sampling is enabled, the generated molecules are optimized for that property.
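Every endpoint above is called with an HTTP POST carrying a JSON body. As a minimal sketch, assuming the NIM listens on http://localhost:8000 as in the examples on this page, the hypothetical helper `build_request` below uses only the Python standard library to construct such a request without sending it. `build_request` and `send` are illustrative names, not part of the MolMIM NIM.

```python
import json
import urllib.request

# Assumption: the NIM is reachable at this base URL, as in the examples on this page.
BASE_URL = "http://localhost:8000"

# Endpoint paths used in the examples on this page.
ENDPOINTS = {"embedding", "hidden", "decode", "sampling", "generate"}


def build_request(endpoint: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a MolMIM endpoint.

    Hypothetical helper for illustration; not part of the MolMIM NIM.
    """
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{endpoint}",
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON response (requires a running NIM)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_request("embedding", {"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}))` would issue the same request as the first /embedding example below. Separating construction from sending keeps the payload easy to inspect before anything touches the network.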

Below, we provide example notebooks that demonstrate how each of these endpoints might be used in a drug discovery context.

  1. Using MolMIM Embeddings to Cluster Molecules - Use MolMIM’s /embedding endpoint to cluster molecules by similarity in MolMIM’s embedding space

    ClusterMolMIMEmbeddings.ipynb

  2. Interpolating Between Molecules by Manipulating MolMIM Hidden States - Use MolMIM’s /hidden and /decode endpoints to interpolate new molecules between two distinct seed molecules

    MolMIMInterpolation.ipynb

  3. Sampling Chemical Space for Drug Discovery using the MolMIM NIM - Use MolMIM’s /sampling and /generate endpoints to explore the molecular space around a seed molecule and improve its Quantitative Estimate of Drug-likeness (QED) score

    MolMIMGeneration.ipynb
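The clustering notebook is not reproduced here. As a rough sketch of the idea, assuming each molecule's embedding has already been parsed from an /embedding response into a flat list of floats (the response field names are not shown on this page), a simple greedy "leader" clustering on cosine similarity could look like this. The function names and the 0.9 threshold are illustrative choices, not MolMIM defaults, and the notebook may use a different clustering method.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compute cosine similarity of a zero vector")
    return dot / (norm_a * norm_b)


def greedy_cluster(embeddings: List[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Assign each embedding to the first cluster whose representative is similar enough.

    The first member of each cluster acts as its representative. Returns one
    cluster label per input embedding, in input order.
    """
    representatives: List[int] = []  # index of each cluster's first member
    labels: List[int] = []
    for i, emb in enumerate(embeddings):
        for label, rep in enumerate(representatives):
            if cosine_similarity(emb, embeddings[rep]) >= threshold:
                labels.append(label)
                break
        else:
            # No existing cluster is close enough: start a new one.
            representatives.append(i)
            labels.append(len(representatives) - 1)
    return labels
```

Leader clustering is order-dependent but needs no preset number of clusters, which suits a quick look at a small batch of molecules.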

The following examples include cURL and Python commands to test each endpoint. Where applicable, the examples test the endpoint with both a single SMILES sequence and multiple SMILES sequences.

The MolMIM NIM logs requests and additional information to stdout in the terminal where it is running. You can use this output to diagnose failed requests or to verify that requests were handled correctly.

Embedding

The following commands send a POST request to the /embedding endpoint, providing a JSON object with a single molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve its embeddings from MolMIM.

Bash:

```bash
curl -X 'POST' \
  -i \
  "http://localhost:8000/embedding" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
```

Python:

```python
import requests
import json

url = "http://localhost:8000/embedding"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
# Request embeddings for a single SMILES sequence
data = json.dumps({"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]})
response = requests.post(url, headers=headers, data=data)
print(response.text)
```

The next commands send a POST request to the /embedding endpoint, providing a JSON object with two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve their embeddings from MolMIM.

Bash:

```bash
curl -X 'POST' \
  -i \
  "http://localhost:8000/embedding" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
```

Python:

```python
import requests
import json

url = "http://localhost:8000/embedding"
# Two SMILES sequences in a single request
data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
# json= serializes the dict into the JSON request body
response = requests.post(url, headers=headers, json=data)
print(response.text)
```

Hidden

The following commands send a POST request to the /hidden endpoint, providing a JSON object with a single molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve its hidden state representation from MolMIM. The response is saved to the local file local-hidden-single.json.

Bash:

```bash
curl -X 'POST' \
  "http://localhost:8000/hidden" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}' > local-hidden-single.json
```

Python:

```python
import requests
import json

url = "http://localhost:8000/hidden"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
data = '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
response = requests.post(url, headers=headers, data=data)

# Save the hidden state so it can be passed to /decode later
with open('local-hidden-single.json', 'w') as f:
    json.dump(response.json(), f)
```

The following commands send a POST request to the /hidden endpoint, providing a JSON object with two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve their hidden state representations from MolMIM. The response is saved to the local file local-hidden-multiple.json.

Bash:

```bash
curl -X 'POST' \
  "http://localhost:8000/hidden" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}' > local-hidden-multiple.json
```

Python:

```python
import requests
import json

url = "http://localhost:8000/hidden"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
data = {
    "sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]
}
response = requests.post(url, headers=headers, json=data)

# Save both hidden states so they can be passed to /decode later
with open('local-hidden-multiple.json', 'w') as f:
    json.dump(response.json(), f)
```

Decode

Note

For each of the /decode commands below, you will need the saved output from the previous calls to the /hidden endpoint.

The following commands send a POST request to the /decode endpoint, providing the contents of the local-hidden-single.json file (which contains a single molecule’s hidden state representation) to decode the hidden state into a SMILES string sequence.

Bash:

```bash
curl -X 'POST' \
  -i \
  "http://localhost:8000/decode" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '@./local-hidden-single.json'
```

Python:

```python
import requests
import json

# Load the hidden state saved from the /hidden call
with open('./local-hidden-single.json') as f:
    data = json.load(f)

response = requests.post(
    'http://localhost:8000/decode',
    headers={'accept': 'application/json', 'Content-Type': 'application/json'},
    json=data
)
print(response.text)
```

The following commands send a POST request to the /decode endpoint, providing the contents of the local-hidden-multiple.json file (which contains multiple molecules’ hidden state representations) to decode the hidden states into SMILES string sequences.

Bash:

```bash
curl -X 'POST' \
  -i \
  "http://localhost:8000/decode" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '@./local-hidden-multiple.json'
```

Python:

```python
import requests
import json

# Load the hidden states saved from the /hidden call
with open('./local-hidden-multiple.json', 'r') as f:
    data = json.load(f)

response = requests.post(
    'http://localhost:8000/decode',
    headers={'accept': 'application/json', 'Content-Type': 'application/json'},
    json=data
)
print(response.text)
```
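The interpolation notebook combines /hidden and /decode. As a minimal sketch, assuming the two hidden states have already been extracted from a saved /hidden response as flat lists of floats of equal length (the actual response may nest them or carry additional fields, so inspect the saved JSON first), linear interpolation yields intermediate latent points that could then be written back into the same structure and sent to /decode. The helper name `interpolate` is hypothetical, and the notebook may interpolate differently.

```python
from typing import List, Sequence


def interpolate(start: Sequence[float], end: Sequence[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced points from `start` to `end`, both included."""
    if len(start) != len(end):
        raise ValueError("hidden states must have the same length")
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    points = []
    for k in range(steps):
        t = k / (steps - 1)  # interpolation weight, from 0.0 to 1.0
        points.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    return points
```

The first and last points reproduce the two seed hidden states, so decoding them is a quick sanity check that the round trip through /decode recovers the original molecules.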

Sampling

The following commands send a POST request to the /sampling endpoint, providing a JSON object with one molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C). The MolMIM server samples the latent space within a given scaled radius of this seed molecule, generating new molecule samples in an unguided fashion.

Bash:

```bash
curl -X POST \
  localhost:8000/sampling \
  --header 'Content-Type: application/json' \
  -d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
```

Python:

```python
import requests
import json

url = "http://localhost:8000/sampling"
# One seed molecule to sample around
data = {"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}
headers = {
    'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```

The following commands send a POST request to the /sampling endpoint, providing a JSON object with two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C). The MolMIM server samples the latent space within a given scaled radius from each of these seed molecules, generating new molecule samples in an unguided fashion.

Bash:

```bash
curl -X POST \
  localhost:8000/sampling \
  --header 'Content-Type: application/json' \
  -d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
```

Python:

```python
import requests
import json

url = "http://localhost:8000/sampling"
# Two seed molecules; the server samples around each one
data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}
headers = {"Content-Type": "application/json"}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```
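Conceptually, sampling within a scaled radius means drawing latent points near the seed molecule's hidden state and decoding them. The sketch below illustrates that idea with independent Gaussian noise whose standard deviation equals the scaled radius. This is an assumption made for illustration; this page does not describe how the MolMIM server actually perturbs the latent code, and `perturb_latent` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence


def perturb_latent(
    seed_latent: Sequence[float],
    num_samples: int,
    scaled_radius: float,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Draw latent vectors around `seed_latent` with isotropic Gaussian noise.

    Each coordinate receives independent noise with standard deviation
    `scaled_radius`. Illustrative only; not MolMIM's sampling code.
    """
    if scaled_radius < 0:
        raise ValueError("scaled_radius must be non-negative")
    rng = rng or random.Random()
    return [
        [x + rng.gauss(0.0, scaled_radius) for x in seed_latent]
        for _ in range(num_samples)
    ]
```

Under this reading, a larger radius trades similarity to the seed for diversity, and a radius of 0 returns the seed's latent unchanged. Passing a seeded `random.Random` makes the draws reproducible.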

Generate

The /generate endpoint supports two alternative algorithms:

  1. CMA-ES (Covariance Matrix Adaptation Evolution Strategy) - a black-box optimization algorithm that can guide MolMIM sampling to optimize a specific property, either QED or plogP.

  2. Random sampling - functions similarly to the /sampling endpoint, but with less flexibility for the sampling parameters.
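CMA-ES adapts a full covariance matrix (and step size) of its search distribution between iterations, which is too long to show here. The simplified (μ, λ) evolution strategy below is a hypothetical stand-in, not CMA-ES and not MolMIM's implementation. It only illustrates how the `particles` and `iterations` parameters interact: each iteration draws `particles` candidates around the current mean, keeps the best few, and moves the mean to their average. In MolMIM, candidates would presumably be latent vectors scored by decoding them and computing QED or plogP.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def simple_es(
    score: Callable[[Sequence[float]], float],
    mean: Sequence[float],
    particles: int,
    iterations: int,
    sigma: float = 0.5,
    elite_fraction: float = 0.25,
    minimize: bool = False,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float]:
    """Simplified (mu, lambda) evolution strategy with a fixed isotropic step size.

    Returns the best candidate seen and its score. No covariance adaptation.
    """
    if particles < 1:
        raise ValueError("particles must be at least 1")
    rng = rng or random.Random()
    mean = list(mean)
    num_elite = max(1, int(particles * elite_fraction))
    sign = -1.0 if minimize else 1.0  # always maximize sign * score
    best, best_score = list(mean), score(mean)
    for _ in range(iterations):
        # Sample `particles` candidates around the current mean.
        candidates = [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(particles)]
        ranked = sorted(candidates, key=lambda c: sign * score(c), reverse=True)
        elite = ranked[:num_elite]
        # Move the mean to the average of the elite candidates.
        mean = [sum(values) / num_elite for values in zip(*elite)]
        top_score = score(elite[0])
        if sign * top_score > sign * best_score:
            best, best_score = elite[0], top_score
    return best, best_score
```

More particles give each iteration a better picture of the neighborhood, while more iterations let the mean travel further from the seed; both increase the number of property evaluations.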

In addition to the algorithm field (“CMA-ES” or “none”), each algorithm requires the following parameters:

  • For the “CMA-ES” algorithm:

    • smi: SMILES string of the seed molecule

    • num_molecules: Number of molecules to generate

    • property_name: Property to optimize (QED or plogP)

    • minimize: Whether to minimize (true) or maximize (false) the property

    • min_similarity: Minimum similarity to the seed molecule

    • particles: Number of particles to use for the controlled generation algorithm (must be greater than or equal to num_molecules)

    • iterations: Number of iterations to run the controlled generation algorithm

  • For the random sampling (“none”) algorithm:

    • smi: SMILES string of the seed molecule

    • num_molecules: Number of molecules to generate

    • particles: Number of particles to use for sampling (must be greater than or equal to num_molecules).

    • scaled_radius: Scaled radius for sampling
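The parameter lists above can be turned into a client-side sanity check before a request is sent. The sketch below is a hypothetical helper built only from the requirements stated on this page; the NIM performs its own validation, and its error messages will differ.

```python
from typing import Any, Dict, List

# Required fields per algorithm, as listed above ("algorithm" itself is also sent).
REQUIRED_FIELDS = {
    "CMA-ES": ["smi", "num_molecules", "property_name", "minimize",
               "min_similarity", "particles", "iterations"],
    "none": ["smi", "num_molecules", "particles", "scaled_radius"],
}
SUPPORTED_PROPERTIES = {"QED", "plogP"}


def validate_generate_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a /generate payload (empty if none).

    Hypothetical client-side check; not part of the MolMIM NIM.
    """
    algorithm = payload.get("algorithm")
    if algorithm not in REQUIRED_FIELDS:
        return [f"algorithm must be one of {sorted(REQUIRED_FIELDS)}, got {algorithm!r}"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS[algorithm]
                if name not in payload]
    if (algorithm == "CMA-ES" and "property_name" in payload
            and payload["property_name"] not in SUPPORTED_PROPERTIES):
        problems.append(f"property_name must be QED or plogP, got {payload['property_name']!r}")
    if ("particles" in payload and "num_molecules" in payload
            and payload["particles"] < payload["num_molecules"]):
        problems.append("particles must be greater than or equal to num_molecules")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a payload at once.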

This first set of commands uses the CMA-ES algorithm to generate 5 molecules, maximizing the QED property, with a minimum similarity of 0.4, 8 particles, and 3 iterations.

Bash:

```bash
curl --request POST \
  localhost:8000/generate \
  --header 'Content-Type: application/json' \
  --data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"CMA-ES", "num_molecules":5, "property_name":"QED", "minimize": false, "min_similarity": 0.4, "particles": 8, "iterations": 3}'
```

Python:

```python
import requests
import json

url = 'http://localhost:8000/generate'
data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "CMA-ES",
    "num_molecules": 5,
    "property_name": "QED",
    "minimize": False,  # maximize QED
    "min_similarity": 0.4,
    "particles": 8,
    "iterations": 3
}
headers = {'Content-Type': 'application/json'}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```

This second set of commands uses the CMA-ES algorithm to generate 5 molecules, minimizing plogP, with a minimum similarity of 0.4, 8 particles, and 3 iterations.

Bash:

```bash
curl --request POST \
  localhost:8000/generate \
  --header 'Content-Type: application/json' \
  --data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"CMA-ES", "num_molecules":5, "property_name":"plogP", "minimize": true, "min_similarity": 0.4, "particles": 8, "iterations": 3}'
```

Python:

```python
import requests
import json

url = "http://localhost:8000/generate"
data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "CMA-ES",
    "num_molecules": 5,
    "property_name": "plogP",
    "minimize": True,  # minimize plogP
    "min_similarity": 0.4,
    "particles": 8,
    "iterations": 3
}
headers = {
    'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```

The last set of commands uses the random sampling (“none”) algorithm to generate 5 molecules from the seed molecule specified by the SMILES string (CN1C=NC2=C1C(=O)N(C(=O)N2C)C), using 8 particles and a scaled radius of 1.0.

Bash:

```bash
curl --request POST \
  localhost:8000/generate \
  --header 'Content-Type: application/json' \
  --data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"none", "num_molecules":5, "particles": 8, "scaled_radius": 1.0}'
```

Python:

```python
import requests
import json

url = "http://localhost:8000/generate"
data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "none",  # random sampling, no property optimization
    "num_molecules": 5,
    "particles": 8,
    "scaled_radius": 1.0
}
headers = {
    'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```
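Unlike the /embedding, /hidden, and /sampling examples, the /generate examples pass a single seed molecule in `smi`. To apply the same settings to several seeds, one option is to build one payload per seed and send them one at a time. The helper below is a hypothetical sketch that only builds the payloads; it assumes /generate accepts exactly one seed per request, as the examples suggest.

```python
from typing import Any, Dict, List, Sequence


def payloads_for_seeds(seeds: Sequence[str], settings: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one /generate payload per seed SMILES, all sharing `settings`.

    Hypothetical helper; `settings` holds every field except `smi`.
    """
    if "smi" in settings:
        raise ValueError("settings should not include 'smi'; it is set per seed")
    return [{**settings, "smi": smi} for smi in seeds]
```

Each payload can then be sent with `requests.post("http://localhost:8000/generate", json=payload)`, exactly as in the example above.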

Last updated on Jul 25, 2024.