MolMIM Endpoints
MolMIM provides the following endpoints and associated functions:
- /embedding: Retrieve the embeddings from MolMIM for a given input molecule.
- /hidden: Retrieve the hidden state from MolMIM for a given input molecule (shown as the “latent code” in Figure 1 of the MolMIM manuscript).
- /decode: Decode a hidden state representation into a SMILES string sequence.
- /sampling: Sample the latent space within a given scaled radius from a seed molecule. This method generates new molecule samples from the given input in an unguided fashion.
- /generate: Generate novel molecules, optionally while optimizing a chosen property. When CMA-ES-guided sampling is enabled, this method generates molecules optimized for that property.
Notebooks
Below, we provide example notebooks that demonstrate how each of these endpoints can be used in a drug discovery context:
- Using MolMIM Embeddings to Cluster Molecules: Use MolMIM’s /embedding endpoint to cluster molecules by similarity in MolMIM’s embedding space.
- Interpolating Between Molecules by Manipulating MolMIM Hidden States: Use MolMIM’s /hidden and /decode endpoints to interpolate new molecules between two distinct seed molecules.
- Sampling Chemical Space for Drug Discovery using the MolMIM NIM: Use MolMIM’s /sampling and /generate endpoints to explore the molecular space around a seed molecule and improve its Quantitative Estimate of Drug-likeness (QED) score.
Usage
The following examples include cURL and Python commands to test each endpoint. Where applicable, the examples test the endpoint with both a single SMILES sequence and multiple SMILES sequences.
The MolMIM NIM logs requests and additional information to the stdout of the terminal in which it is running. Check that output to diagnose failed requests or to confirm that requests were handled correctly.
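Every example that follows repeats the same base URL and JSON headers. As a minimal sketch, assuming the NIM is reachable at http://localhost:8000 as in those examples, the helper below centralizes the base URL, sets a timeout, and surfaces HTTP errors as exceptions instead of printing an error body. It uses only the standard library's urllib.request so it has no third-party dependencies; the names MOLMIM_BASE_URL, endpoint_url, build_request, and post_molmim are hypothetical conveniences, not part of the MolMIM API.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed default: the host and port used by every example on this page.
MOLMIM_BASE_URL = "http://localhost:8000"


def endpoint_url(endpoint: str, base_url: str = MOLMIM_BASE_URL) -> str:
    """Join a base URL and an endpoint name such as "embedding" or "/embedding"."""
    return base_url.rstrip("/") + "/" + endpoint.lstrip("/")


def build_request(endpoint: str, payload: Dict[str, Any],
                  base_url: str = MOLMIM_BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request with the same headers as the cURL examples."""
    return urllib.request.Request(
        endpoint_url(endpoint, base_url),
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def post_molmim(endpoint: str, payload: Dict[str, Any],
                base_url: str = MOLMIM_BASE_URL, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and return the decoded JSON response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx replies; the NIM's
    stdout log is the place to look for the reason.
    """
    with urllib.request.urlopen(build_request(endpoint, payload, base_url),
                                timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the NIM running, post_molmim("embedding", {"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]})["embeddings"] would return the embedding list from the first example.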
Embedding
/embedding
Request Body:
- sequences: array of strings (SMILES strings)
Response:
- embeddings: array of arrays of floating point numbers (embeddings)
The following commands send a POST request to the /embedding endpoint, providing a JSON object with a single molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve its embeddings from MolMIM.
Bash:
curl -X 'POST' \
-i \
"http://localhost:8000/embedding" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
Python:
import requests
import json
url = "http://localhost:8000/embedding"
headers = {
'accept': 'application/json',
'Content-Type': 'application/json'
}
data = json.dumps({"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]})
response = requests.post(url, headers=headers, data=data)
print(response.text)
The next commands send a POST request to the /embedding endpoint, providing a JSON object with two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve their embeddings from MolMIM.
Bash:
curl -X 'POST' \
-i \
"http://localhost:8000/embedding" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
Python:
import requests
import json
url = "http://localhost:8000/embedding"
data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}
headers = {
'accept': 'application/json',
'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
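The embeddings field holds one vector per input SMILES string, which we assume is returned in request order. As a minimal sketch of how those vectors can be compared (the clustering notebook builds on the same idea of similarity in embedding space), cosine similarity between pairs of embeddings can be computed with the standard library alone. The helper names are hypothetical, and cosine similarity is one common choice of metric, not necessarily the one the notebook uses.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def pairwise_similarities(smiles: List[str],
                          embeddings: List[List[float]]) -> Dict[Tuple[str, str], float]:
    """Map each pair of input SMILES strings to the cosine similarity of their embeddings.

    Assumes embeddings[i] belongs to smiles[i] (response order == request order).
    """
    if len(smiles) != len(embeddings):
        raise ValueError("expected one embedding per SMILES string")
    result = {}
    for i in range(len(smiles)):
        for j in range(i + 1, len(smiles)):
            result[(smiles[i], smiles[j])] = cosine_similarity(embeddings[i], embeddings[j])
    return result
```

For the two-molecule example above, pairwise_similarities(data["sequences"], response.json()["embeddings"]) returns a single entry for the caffeine/ibuprofen pair.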
Decode
/decode
Request Body:
- hiddens: array of arrays of arrays of floating point numbers (hidden states)
- mask: array of arrays of booleans (mask)
Response:
- generated: array of strings (SMILES strings)
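/decode consumes hidden states produced by the /hidden endpoint, saved to files such as local-hidden-single.json. The sketch below shows one way to create such a file. It assumes, without this page confirming it, that the /hidden response is a JSON object with hiddens and mask fields in the shapes /decode expects, and that each molecule's mask has one flag per row of its hidden state. The function names are hypothetical.

```python
import json
from typing import Any, Dict


def decode_payload_from_hidden(hidden_response: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields /decode expects from a /hidden response.

    Assumption: the /hidden response has "hiddens" (one 2-D array per
    molecule) and "mask" (one 1-D boolean array per molecule).
    """
    hiddens = hidden_response["hiddens"]
    mask = hidden_response["mask"]
    if len(hiddens) != len(mask):
        raise ValueError("expected one mask entry per molecule")
    for h, m in zip(hiddens, mask):
        if len(h) != len(m):
            raise ValueError("each mask must have one flag per hidden-state row")
    return {"hiddens": hiddens, "mask": mask}


def save_decode_payload(hidden_response: Dict[str, Any], path: str) -> None:
    """Write the /decode request body, e.g. to ./local-hidden-single.json."""
    with open(path, "w") as f:
        json.dump(decode_payload_from_hidden(hidden_response), f)
```

For example, with the requests library used elsewhere on this page and assuming /hidden takes the same sequences body as /embedding: save_decode_payload(requests.post("http://localhost:8000/hidden", json={"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}).json(), "local-hidden-single.json").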
The following commands send a POST request to the /decode endpoint, providing the contents of the local-hidden-single.json file (which contains a single molecule’s hidden state representation) to decode the hidden state into a SMILES string sequence.
Note
For each of the /decode commands below, you need the output of a /hidden request saved to a local JSON file (local-hidden-single.json or local-hidden-multiple.json).
Bash:
curl -X 'POST' \
-i \
"http://localhost:8000/decode" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '@./local-hidden-single.json'
Python:
import requests
import json
with open('./local-hidden-single.json') as f:
    data = json.load(f)
response = requests.post('http://localhost:8000/decode',
headers={'accept': 'application/json', 'Content-Type': 'application/json'},
json=data)
print(response.text)
The following commands send a POST request to the /decode endpoint, providing the contents of the local-hidden-multiple.json file (which contains multiple molecules’ hidden state representations) to decode the hidden states into SMILES string sequences.
Bash:
curl -X 'POST' \
-i \
"http://localhost:8000/decode" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '@./local-hidden-multiple.json'
Python:
import requests
import json
with open('./local-hidden-multiple.json', 'r') as f:
    data = json.load(f)
response = requests.post('http://localhost:8000/decode',
headers={'accept': 'application/json', 'Content-Type': 'application/json'},
json=data)
print(response.text)
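Because /decode accepts any hidden state of the right shape, it can also decode points lying between two molecules' hidden states, which is the idea behind the interpolation notebook. A minimal sketch of linear interpolation, assuming both molecules' hidden states (from /hidden) have identical shapes and that the first molecule's mask is valid for every interpolated point; the notebook itself may use a different scheme, and the helper names are hypothetical.

```python
from typing import Dict, List

Hidden = List[List[float]]  # one molecule's hidden state (rows x hidden size)


def lerp_hidden(h_a: Hidden, h_b: Hidden, t: float) -> Hidden:
    """Elementwise (1 - t) * h_a + t * h_b for two equally shaped hidden states."""
    if len(h_a) != len(h_b) or any(len(r_a) != len(r_b) for r_a, r_b in zip(h_a, h_b)):
        raise ValueError("hidden states must have the same shape")
    return [[(1 - t) * x + t * y for x, y in zip(r_a, r_b)]
            for r_a, r_b in zip(h_a, h_b)]


def interpolation_payload(h_a: Hidden, h_b: Hidden, mask: List[bool],
                          steps: int) -> Dict[str, list]:
    """Build a /decode body with `steps` evenly spaced points from h_a to h_b (inclusive).

    Assumption: the first molecule's mask is valid for every interpolated point.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    ts = [i / (steps - 1) for i in range(steps)]
    return {
        "hiddens": [lerp_hidden(h_a, h_b, t) for t in ts],
        "mask": [list(mask) for _ in ts],
    }
```

Given a /hidden response r for two seed molecules, interpolation_payload(r["hiddens"][0], r["hiddens"][1], r["mask"][0], steps=5) produces a body that can be posted to /decode in the same way as local-hidden-multiple.json.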
Sampling
/sampling
Request Body:
- sequences: array of strings (SMILES strings)
- beam_size: integer (beam width, between 1 and 10, default: 1)
- num_molecules: integer (number of molecules, between 1 and 10, default: 1)
- scaled_radius: floating point number (scaled radius, between 0 and 2, default: 0.7)
Response:
- generated: array of arrays of strings (SMILES strings)
The following commands send a POST request to the /sampling endpoint, providing a JSON object with one molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C). The MolMIM server samples the latent space within the default scaled radius (0.7) from this seed molecule, generating new molecule samples in an unguided fashion.
Bash:
curl -X POST \
localhost:8000/sampling \
--header 'Content-Type: application/json' \
-d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
Python:
import requests
import json
url = "http://localhost:8000/sampling"
data = {"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}
headers = {
'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
The following commands send a POST request to the /sampling endpoint, providing a JSON object with two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C). The MolMIM server samples the latent space within a given scaled radius from each of these seed molecules, generating new molecule samples in an unguided fashion.
Bash:
curl -X POST \
localhost:8000/sampling \
--header 'Content-Type: application/json' \
-d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
Python:
import requests
import json
url = "http://localhost:8000/sampling"
data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}
headers = {"Content-Type": "application/json"}
response = requests.post(url, headers=headers, json=data)
print(response.text)
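The /sampling examples above rely on the server defaults for beam_size, num_molecules, and scaled_radius. As a minimal sketch, the hypothetical helper below builds a request body that sets them explicitly and checks each one against the ranges listed in the request schema, so an out-of-range value fails locally rather than at the server. Treating the range bounds as inclusive is an assumption, and the sketch checks each parameter independently; it does not know about any cross-parameter constraints the server may apply. A second helper pairs each seed with its list in the generated response field.

```python
from typing import Dict, List


def sampling_payload(sequences: List[str], beam_size: int = 1,
                     num_molecules: int = 1, scaled_radius: float = 0.7) -> Dict:
    """Build a /sampling request body, checking the documented ranges.

    Defaults mirror the documented server defaults; bounds are assumed inclusive.
    """
    if not sequences:
        raise ValueError("provide at least one seed SMILES string")
    if not 1 <= beam_size <= 10:
        raise ValueError("beam_size must be between 1 and 10")
    if not 1 <= num_molecules <= 10:
        raise ValueError("num_molecules must be between 1 and 10")
    if not 0 <= scaled_radius <= 2:
        raise ValueError("scaled_radius must be between 0 and 2")
    return {
        "sequences": list(sequences),
        "beam_size": beam_size,
        "num_molecules": num_molecules,
        "scaled_radius": scaled_radius,
    }


def samples_by_seed(sequences: List[str], generated: List[List[str]]) -> Dict[str, List[str]]:
    """Pair each seed with its samples from the "generated" response field.

    Assumes the response lists are in request order. Duplicate seeds collapse
    into one key, with the later entry winning.
    """
    if len(sequences) != len(generated):
        raise ValueError("expected one list of samples per seed")
    return dict(zip(sequences, generated))
```

The returned dictionary can be passed as json= to requests.post in place of the data dictionaries used above.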
Generate
/generate
Request Body:
- smi: string (SMILES string)
- algorithm: string (algorithm to use, either “CMA-ES” or “none”, default: “CMA-ES”)
- iterations: integer (number of iterations, between 1 and 1000, default: 10)
- min_similarity: floating point number (minimum similarity, between 0 and 0.7, default: 0.7)
- minimize: boolean (whether to minimize the property, default: false)
- num_molecules: integer (number of molecules, between 1 and 100, default: 10)
- particles: integer (number of particles, between 2 and 1000, default: 30)
- property_name: string (property to optimize, either “QED” or “plogP”, default: “QED”)
- scaled_radius: floating point number (scaled radius, between 0 and 2, default: 1.0)
Response:
- generated: array of strings (SMILES strings)
The /generate endpoint provides two alternative algorithms:
- CMA-ES: a black-box optimization algorithm that guides MolMIM sampling to optimize a specific property, in this case either QED or plogP.
- Random sampling (“none”): works similarly to the /sampling endpoint, but with less flexibility in the sampling parameters.
Required parameters for each algorithm type:
- For the “CMA-ES” algorithm: smi, num_molecules, property_name, minimize, min_similarity, particles, iterations
- For the random sampling (“none”) algorithm: smi, num_molecules, particles, scaled_radius
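The per-algorithm parameter lists above translate directly into a request builder. As a minimal sketch (generate_payload and its error messages are hypothetical), the helper below rejects unknown algorithms, checks that every parameter the chosen algorithm requires is present, and rejects values outside the documented ranges, which it assumes are inclusive. It does not reproduce any server-side logic beyond what the request schema states.

```python
from typing import Any, Dict

# Parameters listed as required for each algorithm on this page.
REQUIRED = {
    "CMA-ES": ("smi", "num_molecules", "property_name", "minimize",
               "min_similarity", "particles", "iterations"),
    "none": ("smi", "num_molecules", "particles", "scaled_radius"),
}

# Documented (low, high) ranges from the request schema; assumed inclusive.
RANGES = {
    "iterations": (1, 1000),
    "min_similarity": (0.0, 0.7),
    "num_molecules": (1, 100),
    "particles": (2, 1000),
    "scaled_radius": (0.0, 2.0),
}


def generate_payload(algorithm: str = "CMA-ES", **params: Any) -> Dict[str, Any]:
    """Build a /generate request body after checking required parameters and ranges."""
    if algorithm not in REQUIRED:
        raise ValueError(f"algorithm must be one of {sorted(REQUIRED)}")
    missing = [name for name in REQUIRED[algorithm] if name not in params]
    if missing:
        raise ValueError(f"missing parameters for {algorithm}: {missing}")
    if params.get("property_name", "QED") not in ("QED", "plogP"):
        raise ValueError('property_name must be "QED" or "plogP"')
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return {"algorithm": algorithm, **params}
```

The result can be sent with the same requests.post(url, headers=headers, json=data) call used in the Python examples, with data replaced by the returned dictionary.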
This first set of commands uses the CMA-ES algorithm to generate five molecules from the seed molecule CN1C=NC2=C1C(=O)N(C(=O)N2C)C, maximizing the QED property (minimize: false), with a minimum similarity of 0.4, eight particles, and three iterations.
Bash:
curl --request POST \
localhost:8000/generate \
--header 'Content-Type: application/json' \
--data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"CMA-ES", "num_molecules":5, "property_name":"QED", "minimize": false, "min_similarity": 0.4, "particles": 8, "iterations": 3}'
Python:
import requests
import json
url = 'http://localhost:8000/generate'
data = {
"smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
"algorithm": "CMA-ES",
"num_molecules": 5,
"property_name": "QED",
"minimize": False,
"min_similarity": 0.4,
"particles": 8,
"iterations": 3
}
headers = {'Content-Type': 'application/json'}
response = requests.post(url, headers=headers, json=data)
print(response.text)
This second set of commands uses the CMA-ES algorithm to generate five molecules from the same seed molecule, minimizing plogP (minimize: true), with a minimum similarity of 0.4, eight particles, and three iterations.
Bash:
curl --request POST \
localhost:8000/generate \
--header 'Content-Type: application/json' \
--data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"CMA-ES", "num_molecules":5, "property_name":"plogP", "minimize": true, "min_similarity": 0.4, "particles": 8, "iterations": 3}'
Python:
import requests
import json
url = "http://localhost:8000/generate"
data = {
"smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
"algorithm": "CMA-ES",
"num_molecules": 5,
"property_name": "plogP",
"minimize": True,
"min_similarity": 0.4,
"particles": 8,
"iterations": 3
}
headers = {
'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
The last set of commands uses the random sampling (“none”) algorithm to generate five molecules from the seed molecule CN1C=NC2=C1C(=O)N(C(=O)N2C)C, using eight particles and a scaled radius of 1.0.
Bash:
curl --request POST \
localhost:8000/generate \
--header 'Content-Type: application/json' \
--data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"none", "num_molecules":5, "particles": 8, "scaled_radius": 1.0}'
Python:
import requests
import json
url = "http://localhost:8000/generate"
data = {
"smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
"algorithm": "none",
"num_molecules": 5,
"particles": 8,
"scaled_radius": 1.0
}
headers = {
'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, json=data)
print(response.text)