MolMIM Endpoints
MolMIM provides the following endpoints and associated functions:
- /embedding: Retrieve the embeddings from MolMIM for a given input molecule.
- /hidden: Retrieve the hidden state from MolMIM for a given input molecule (shown as the “latent code” in Figure 1 of the MolMIM manuscript).
- /decode: Decode a hidden state representation into a SMILES string sequence.
- /sampling: Sample the latent space within a given scaled radius from a seed molecule. This method generates new molecule samples from the given input in an unguided fashion.
- /generate: Generate novel molecules, optionally while optimizing a chosen property. This method generates new optimized molecules if CMA-ES-guided sampling is enabled.
Below, we provide example notebooks that demonstrate how each of these endpoints might be used in a drug discovery context.
- Using MolMIM Embeddings to Cluster Molecules: Use MolMIM’s /embedding endpoint to cluster molecules by similarity in MolMIM’s embedding space.
- Interpolating Between Molecules by Manipulating MolMIM Hidden States: Use MolMIM’s /hidden and /decode endpoints to interpolate new molecules between two distinct seed molecules.
- Sampling Chemical Space for Drug Discovery using the MolMIM NIM: Use MolMIM’s /sampling and /generate endpoints to explore the molecular space around a seed molecule and improve its Quantitative Estimate of Drug-likeness (QED) score.
The following examples include curl and Python commands to test each endpoint. Where applicable, the examples exercise the endpoint with both a single SMILES sequence and multiple SMILES sequences.
The MolMIM NIM logs requests and additional information to stdout of the terminal in which it is running. You can check that output to diagnose failed requests or to confirm that requests were handled correctly.
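Because every example on this page posts a JSON body to an endpoint on the same host, the pattern can be factored into a small helper. The following is a minimal sketch that uses only the Python standard library rather than `requests`. The function names `build_request` and `post_json` and the 60-second timeout are illustrative choices, not part of the MolMIM NIM. `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, which you can cross-check against the NIM's stdout log.

```python
import json
import urllib.request


def build_request(base_url, endpoint, payload):
    """Build (but do not send) a JSON POST request for a NIM endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}",
        data=body,
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def post_json(base_url, endpoint, payload, timeout=60):
    """Send the request and return the parsed JSON response body.

    Raises urllib.error.HTTPError if the server answers with an error status.
    """
    request = build_request(base_url, endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `post_json("http://localhost:8000", "/embedding", {"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]})` would send the same body as the first embedding example, assuming the NIM is running locally on port 8000.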
Embedding
The following commands send a POST request to the /embedding endpoint, providing a JSON object with a single molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve its embeddings from MolMIM.
Bash:
curl -X 'POST' \
-i \
"http://localhost:8000/embedding" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
Python:
import requests
import json
url = "http://localhost:8000/embedding"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
data = json.dumps({"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]})
response = requests.post(url, headers=headers, data=data)
print(response.text)
The next commands send a POST request to the /embedding endpoint, providing a JSON object with two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve their embeddings from MolMIM.
Bash:
curl -X 'POST' \
-i \
"http://localhost:8000/embedding" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
Python:
import requests
import json
url = "http://localhost:8000/embedding"
data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
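Once embeddings for several molecules are available, a common next step is to compare them. The sketch below computes the cosine similarity between the first two vectors in a parsed response. It assumes the response body is a JSON object with an `embeddings` key holding one numeric vector per input sequence. That key name is an assumption, so check it against the response printed by the commands above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pairwise_similarity(payload, key="embeddings"):
    """Similarity of the first two vectors in a parsed response.

    `key` is an assumed field name, not confirmed by this page.
    """
    vectors = payload[key]
    return cosine_similarity(vectors[0], vectors[1])
```

With the two-molecule request above, `pairwise_similarity(response.json())` would give a single score for the pair, provided the assumed key matches.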
Hidden
The following commands send a POST request to the /hidden endpoint, providing a JSON object with a single molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve its hidden state representation from MolMIM. The response is saved to the local file local-hidden-single.json.
Bash:
curl -X 'POST' \
"http://localhost:8000/hidden" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}' > local-hidden-single.json
Python:
import requests
import json
url = "http://localhost:8000/hidden"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
data = '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
response = requests.post(url, headers=headers, data=data)
# Save the parsed response so it can be passed to /decode later
with open('local-hidden-single.json', 'w') as f:
    json.dump(response.json(), f)
The following commands send a POST request to the /hidden endpoint, providing a JSON object with two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve their hidden state representations from MolMIM. The response is saved to the local file local-hidden-multiple.json.
Bash:
curl -X 'POST' \
"http://localhost:8000/hidden" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}' > local-hidden-multiple.json
Python:
import requests
import json
url = "http://localhost:8000/hidden"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
data = {
    "sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]
}
response = requests.post(url, headers=headers, json=data)
# Save the parsed response so it can be passed to /decode later
with open('local-hidden-multiple.json', 'w') as f:
    json.dump(response.json(), f)
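Before passing a saved hidden state on to another endpoint, it can be useful to confirm what was actually written to disk. The following sketch reports the nested-list dimensions of every top-level field in a parsed response without assuming any particular field names. The helper names `nested_shape` and `summarize_response` are illustrative.

```python
def nested_shape(value):
    """Return the dimensions of a rectangular nested list.

    Only the first element at each level is inspected, so ragged
    lists are reported by the length of their first branch.
    """
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def summarize_response(payload):
    """Map each top-level field name of a parsed response to its shape."""
    return {name: nested_shape(field) for name, field in payload.items()}
```

Loading local-hidden-multiple.json with `json.load` and passing the result to `summarize_response` shows how the stored arrays grow when a second molecule is added.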
Decode
For each of the /decode commands below, you will need the saved output from the previous calls to the /hidden endpoint.
The following commands send a POST request to the /decode endpoint, providing the contents of the local-hidden-single.json file (which contains a single molecule’s hidden state representation) to decode the hidden state into a SMILES string sequence.
Bash:
curl -X 'POST' \
-i \
"http://localhost:8000/decode" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '@./local-hidden-single.json'
Python:
import requests
import json
# Load the hidden state saved by the single-molecule /hidden example
with open('./local-hidden-single.json') as f:
    data = json.load(f)
response = requests.post('http://localhost:8000/decode',
                         headers={'accept': 'application/json', 'Content-Type': 'application/json'},
                         json=data)
print(response.text)
The following commands send a POST request to the /decode endpoint, providing the contents of the local-hidden-multiple.json file (which contains multiple molecules’ hidden state representations) to decode the hidden states into SMILES string sequences.
Bash:
curl -X 'POST' \
-i \
"http://localhost:8000/decode" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '@./local-hidden-multiple.json'
Python:
import requests
import json
# Load the hidden states saved by the two-molecule /hidden example
with open('./local-hidden-multiple.json', 'r') as f:
    data = json.load(f)
response = requests.post('http://localhost:8000/decode',
                         headers={'accept': 'application/json', 'Content-Type': 'application/json'},
                         json=data)
print(response.text)
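Interpolating between two molecules, as mentioned for the /hidden and /decode endpoints, comes down to blending two hidden states before decoding. The following is a minimal sketch of element-wise linear interpolation over nested lists of numbers. It assumes the two hidden states have identical shapes and that you apply it only to the numeric hidden-state field of the saved response. How that field is named and how the blended result must be packaged for /decode are not specified on this page, so treat those steps as something to verify. The name `lerp_nested` is illustrative.

```python
def lerp_nested(a, b, t):
    """Linearly interpolate between two same-shaped nested lists.

    t = 0.0 returns the values of `a`, t = 1.0 returns the values of `b`.
    """
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            raise ValueError("inputs must have the same shape")
        return [lerp_nested(x, y, t) for x, y in zip(a, b)]
    if isinstance(a, list) or isinstance(b, list):
        raise ValueError("inputs must have the same shape")
    return (1 - t) * a + t * b


# Blend two toy "hidden states" at the midpoint.
midpoint = lerp_nested([[0.0, 2.0]], [[1.0, 4.0]], 0.5)
```

Sweeping `t` over several values between 0 and 1 and decoding each blended state gives a path of candidate molecules between the two seeds.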
Sampling
The following commands send a POST request to the /sampling endpoint, providing a JSON object with one molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C). The MolMIM server samples the latent space within a given scaled radius from this seed molecule, generating new molecule samples in an unguided fashion.
Bash:
curl -X POST \
localhost:8000/sampling \
--header 'Content-Type: application/json' \
-d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
Python:
import requests
import json
url = "http://localhost:8000/sampling"
data = {"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}
headers = {
    'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
The following commands send a POST request to the /sampling endpoint, providing a JSON object with two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C). The MolMIM server samples the latent space within a given scaled radius from each of these seed molecules, generating new molecule samples in an unguided fashion.
Bash:
curl -X POST \
localhost:8000/sampling \
--header 'Content-Type: application/json' \
-d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
Python:
import requests
import json
url = "http://localhost:8000/sampling"
data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}
headers = {"Content-Type": "application/json"}
response = requests.post(url, headers=headers, json=data)
print(response.text)
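Unguided sampling may return the seed itself or the same molecule more than once. The sketch below filters a flat list of sampled SMILES strings down to unique, non-seed entries. How that list is extracted from the /sampling response is left out because the response layout is not described in this section. The comparison is plain string equality, so two different SMILES spellings of the same molecule are not merged unless you canonicalize them first (for example with a cheminformatics toolkit). The name `unique_novel` is illustrative.

```python
def unique_novel(samples, seeds):
    """Drop duplicates and any sample identical to a seed, preserving order."""
    seen = set(seeds)
    kept = []
    for smiles in samples:
        if smiles not in seen:
            seen.add(smiles)
            kept.append(smiles)
    return kept


# Toy data: "CC" is the seed, and "CCO" appears twice.
filtered = unique_novel(["CCO", "CCN", "CCO", "CC"], ["CC"])
```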
Generate
The /generate endpoint provides two alternative options:
- CMA-ES: a black-box optimization algorithm that can guide MolMIM sampling to optimize for a specific property, in this case either QED or plogP.
- Random sampling: functions similarly to the /sampling endpoint, but with less flexibility for the sampling parameters.
Required parameters for each algorithm type:
For the “CMA-ES” algorithm:
- smi: SMILES string of the seed molecule
- num_molecules: Number of molecules to generate
- property_name: Property to optimize (QED or plogP)
- minimize: Whether to minimize or maximize the property
- min_similarity: Minimum similarity to the seed molecule
- particles: Number of particles to use for the controlled generation algorithm (must be greater than or equal to num_molecules)
- iterations: Number of iterations to run the controlled generation algorithm
For the random sampling (“none”) algorithm:
- smi: SMILES string of the seed molecule
- num_molecules: Number of molecules to generate
- particles: Number of particles to use for sampling (must be greater than or equal to num_molecules)
- scaled_radius: Scaled radius for sampling
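The parameter rules above can be checked on the client before a request is sent. The following sketch builds a /generate payload for each algorithm and enforces the two constraints stated in the lists: `particles` must be at least `num_molecules`, and `property_name` must be QED or plogP. The function names are illustrative, and the `algorithm` values ("CMA-ES" and "none") are taken from the request examples on this page.

```python
ALLOWED_PROPERTIES = ("QED", "plogP")


def _check_particles(num_molecules, particles):
    """Enforce particles >= num_molecules, as required for both algorithms."""
    if particles < num_molecules:
        raise ValueError("particles must be greater than or equal to num_molecules")


def cma_es_payload(smi, num_molecules, property_name, minimize,
                   min_similarity, particles, iterations):
    """Build a request body for CMA-ES-guided generation."""
    if property_name not in ALLOWED_PROPERTIES:
        raise ValueError(f"property_name must be one of {ALLOWED_PROPERTIES}")
    _check_particles(num_molecules, particles)
    return {
        "smi": smi,
        "algorithm": "CMA-ES",
        "num_molecules": num_molecules,
        "property_name": property_name,
        "minimize": bool(minimize),
        "min_similarity": min_similarity,
        "particles": particles,
        "iterations": iterations,
    }


def random_sampling_payload(smi, num_molecules, particles, scaled_radius):
    """Build a request body for the random sampling ("none") algorithm."""
    _check_particles(num_molecules, particles)
    return {
        "smi": smi,
        "algorithm": "none",
        "num_molecules": num_molecules,
        "particles": particles,
        "scaled_radius": scaled_radius,
    }
```

Either dictionary can be passed as the `json=` argument of `requests.post`, in the same way as the literal payloads in the examples that follow.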
This first set of commands uses the CMA-ES algorithm to generate 5 molecules, maximizing the QED property (minimize is set to false), with a minimum similarity of 0.4, 8 particles, and 3 iterations.
Bash:
curl --request POST \
localhost:8000/generate \
--header 'Content-Type: application/json' \
--data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"CMA-ES", "num_molecules":5, "property_name":"QED", "minimize": false, "min_similarity": 0.4, "particles": 8, "iterations": 3}'
Python:
import requests
import json
url = 'http://localhost:8000/generate'
data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "CMA-ES",
    "num_molecules": 5,
    "property_name": "QED",
    "minimize": False,
    "min_similarity": 0.4,
    "particles": 8,
    "iterations": 3
}
headers = {'Content-Type': 'application/json'}
response = requests.post(url, headers=headers, json=data)
print(response.text)
This second set of commands uses the CMA-ES algorithm to generate 5 molecules, minimizing plogP (minimize is set to true), with a minimum similarity of 0.4, 8 particles, and 3 iterations.
Bash:
curl --request POST \
localhost:8000/generate \
--header 'Content-Type: application/json' \
--data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"CMA-ES", "num_molecules":5, "property_name":"plogP", "minimize": true, "min_similarity": 0.4, "particles": 8, "iterations": 3}'
Python:
import requests
import json
url = "http://localhost:8000/generate"
data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "CMA-ES",
    "num_molecules": 5,
    "property_name": "plogP",
    "minimize": True,
    "min_similarity": 0.4,
    "particles": 8,
    "iterations": 3
}
headers = {
    'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, json=data)
print(response.text)
The last set of commands uses the random sampling (“none”) algorithm to generate 5 molecules from a seed molecule specified by the SMILES string CN1C=NC2=C1C(=O)N(C(=O)N2C)C, using 8 particles and a scaled radius of 1.0.
Bash:
curl --request POST \
localhost:8000/generate \
--header 'Content-Type: application/json' \
--data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"none", "num_molecules":5, "particles": 8, "scaled_radius": 1.0}'
Python:
import requests
import json
url = "http://localhost:8000/generate"
data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "none",
    "num_molecules": 5,
    "particles": 8,
    "scaled_radius": 1.0
}
headers = {
    'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, json=data)
print(response.text)