MolMIM Endpoints#

MolMIM provides the following endpoints and associated functions:

  • /embedding - Retrieve the embeddings from MolMIM for a given input molecule.

  • /hidden - Retrieve the hidden state from MolMIM for a given input molecule (shown as the “latent code” in Figure 1 of the MolMIM manuscript).

  • /decode - Decode a hidden state representation into a SMILES string sequence.

  • /sampling - Sample the latent space within a given scaled radius from a seed molecule. This method generates new molecule samples from the given input in an unguided fashion.

  • /generate - Generate novel molecules, optionally optimizing for a given property. When CMA-ES-guided sampling is enabled, this method returns molecules optimized for that property.

Notebooks#

Below, we provide example notebooks that demonstrate how each of these endpoints could be used in a drug discovery context.

  1. Using MolMIM Embeddings to Cluster Molecules - Use MolMIM’s /embedding endpoint to cluster molecules by similarity in MolMIM’s embedding space

    ClusterMolMIMEmbeddings.ipynb

  2. Interpolating Between Molecules by Manipulating MolMIM Hidden States - Use MolMIM’s /hidden and /decode endpoints to interpolate new molecules between two distinct seed molecules

    MolMIMInterpolation.ipynb

  3. Sampling Chemical Space for Drug Discovery using the MolMIM NIM - Use MolMIM’s /sampling and /generate endpoints to explore the molecular space around a seed molecule and improve its Quantitative Estimate of Drug-likeness (QED) score

    MolMIMGeneration.ipynb

Usage#

The following examples include curl and Python commands to test each endpoint. Where applicable, the examples test the endpoint with both a single SMILES sequence and multiple SMILES sequences.

The MolMIM NIM logs requests and additional information to stdout of the terminal in which it is running. You can refer to that output to identify issues with a request or to verify that a request was handled correctly.
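Since the NIM reports request problems in its own log, it can help to surface the HTTP status on the client side as well, so a failed call is easy to match against that log. The following is a minimal sketch of one way to do this. It uses only the Python standard library (urllib) instead of requests, the helper names build_post and post_json are hypothetical, and the base URL assumes the local deployment used throughout this page.

```python
import json
import urllib.error
import urllib.request

# Assumption: the NIM listens on the same local address as the other examples.
BASE_URL = "http://localhost:8000"


def build_post(endpoint, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for a MolMIM endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{endpoint.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint, payload, base_url=BASE_URL, timeout=60):
    """Send the request and return the parsed JSON response body.

    On an HTTP error, raise RuntimeError carrying the status code and the
    server's message so the failure can be looked up in the NIM's stdout log.
    """
    request = build_post(endpoint, payload, base_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        detail = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"{endpoint} returned HTTP {err.code}: {detail}") from err
```

With the server running, a call such as post_json("/embedding", {"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}) would return the response as a Python dictionary.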

Embedding#

/embedding

  • Request Body:

    • sequences: array of strings (SMILES strings)

  • Response:

    • embeddings: array of arrays of floating point numbers (embeddings)

The following commands send a POST request to the /embedding endpoint, providing a JSON object with a single molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve its embeddings from MolMIM.

Bash:

curl -X 'POST' \
    -i \
    "http://localhost:8000/embedding" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'

Python:

import requests
import json

url = "http://localhost:8000/embedding"

headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}

data = json.dumps({"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]})

response = requests.post(url, headers=headers, data=data)

print(response.text)

The next commands send a POST request to the /embedding endpoint, providing a JSON object with two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve their embeddings from MolMIM.

Bash:

curl -X 'POST' \
    -i \
    "http://localhost:8000/embedding" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'

Python:

import requests

url = "http://localhost:8000/embedding"

data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}

headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)
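Once embeddings for several molecules have been retrieved, a common next step is to compare them. The sketch below computes pairwise cosine similarity over the vectors under the embeddings key of a response. Cosine similarity is our choice of measure for illustration, not something the endpoint prescribes; the function names are hypothetical, and the input is a made-up stand-in for a real response.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pairwise_similarity(response_json):
    """Return an n x n similarity matrix for the vectors under 'embeddings'."""
    embeddings = response_json["embeddings"]
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Made-up 3-dimensional vectors standing in for a real /embedding response;
# real embeddings have many more dimensions.
example = {"embeddings": [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]}
matrix = pairwise_similarity(example)
print(matrix[0][1])
```

To use this with live data, pass response.json() from the two-molecule request above in place of example.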

Hidden#

/hidden

  • Request Body:

    • sequences: array of strings (SMILES strings)

  • Response:

    • hiddens: array of arrays of arrays of floating point numbers (hidden states)

    • mask: array of arrays of booleans (mask)

The following commands send a POST request to the /hidden endpoint, providing a JSON object with a single molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve its hidden state representation from MolMIM. The response is saved to the local file local-hidden-single.json.

Bash:

curl -X 'POST' \
    "http://localhost:8000/hidden" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}' > local-hidden-single.json

Python:

import requests
import json

url = "http://localhost:8000/hidden"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
data = '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
response = requests.post(url, headers=headers, data=data)

with open('local-hidden-single.json', 'w') as f:
    json.dump(response.json(), f)

The following commands send a POST request to the /hidden endpoint, providing a JSON object with two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C) to retrieve their hidden state representations from MolMIM. The response is saved to the local file local-hidden-multiple.json.

Bash:

curl -X 'POST' \
    "http://localhost:8000/hidden" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'  > local-hidden-multiple.json

Python:

import requests
import json

url = "http://localhost:8000/hidden"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}

data = {
    "sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]
}

response = requests.post(url, headers=headers, json=data)

with open('local-hidden-multiple.json', 'w') as f:
    json.dump(response.json(), f)
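Before passing a saved /hidden response on to another endpoint, it can be useful to check its dimensions. The sketch below reports the nesting sizes of hiddens and verifies that mask lines up with it. It assumes that mask holds one boolean per entry along the second axis of hiddens, which the field descriptions above suggest but do not state outright. The function name hidden_shape is hypothetical, and the sample payload is made up.

```python
def hidden_shape(payload):
    """Return (n_molecules, n_positions, hidden_size) for a /hidden response.

    Also checks that 'mask' has one entry per position for every molecule
    (an assumption about how the two fields relate).
    """
    hiddens = payload["hiddens"]
    mask = payload["mask"]
    if len(hiddens) != len(mask):
        raise ValueError("hiddens and mask describe different numbers of molecules")
    for states, flags in zip(hiddens, mask):
        if len(states) != len(flags):
            raise ValueError("mask length does not match the number of hidden positions")
    n_positions = len(hiddens[0]) if hiddens else 0
    hidden_size = len(hiddens[0][0]) if n_positions else 0
    return len(hiddens), n_positions, hidden_size


# Made-up payload with two molecules, one position each, and 4 values per
# position; a real response holds much longer vectors.
example = {
    "hiddens": [[[0.1, 0.2, 0.3, 0.4]], [[0.5, 0.6, 0.7, 0.8]]],
    "mask": [[True], [True]],
}
print(hidden_shape(example))
```

To inspect real output, load local-hidden-multiple.json with json.load and pass the resulting dictionary to hidden_shape.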

Decode#

/decode

  • Request Body:

    • hiddens: array of arrays of arrays of floating point numbers (hidden states)

    • mask: array of arrays of booleans (mask)

  • Response:

    • generated: array of strings (SMILES strings)

The following commands send a POST request to the /decode endpoint, providing the contents of the local-hidden-single.json file (which contains a single molecule’s hidden state representation) to decode the hidden state into a SMILES string sequence.

Note

For each of the /decode commands below, you will need the saved output from the previous calls to the /hidden endpoint.

Bash:

curl -X 'POST' \
    -i \
    "http://localhost:8000/decode" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '@./local-hidden-single.json'

Python:

import requests
import json

with open('./local-hidden-single.json') as f:
    data = json.load(f)

response = requests.post('http://localhost:8000/decode', 
                         headers={'accept': 'application/json', 'Content-Type': 'application/json'}, 
                         json=data)

print(response.text)

The following commands send a POST request to the /decode endpoint, providing the contents of the local-hidden-multiple.json file (which contains multiple molecules’ hidden state representations) to decode the hidden states into SMILES string sequences.

Bash:

curl -X 'POST' \
    -i \
    "http://localhost:8000/decode" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '@./local-hidden-multiple.json'

Python:

import requests
import json

with open('./local-hidden-multiple.json', 'r') as f:
    data = json.load(f)

response = requests.post('http://localhost:8000/decode', 
                         headers={'accept': 'application/json', 'Content-Type': 'application/json'}, 
                         json=data)

print(response.text)
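Because /decode accepts the same hiddens and mask structure that /hidden returns, the hidden states can be modified between the two calls. The sketch below linearly interpolates between the first two molecules of a /hidden response and packages the result as a /decode request body. Linear interpolation is an assumption chosen for illustration, as is reusing the first molecule's mask for every interpolated state; the function name interpolate_hiddens is hypothetical, and the sample payload is made up.

```python
def interpolate_hiddens(payload, steps=5):
    """Linearly interpolate between the first two molecules of a /hidden response.

    Returns a dict with 'hiddens' and 'mask' keys, shaped like a /decode
    request body, containing `steps` states from the first molecule (t=0)
    to the second (t=1).
    """
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    start, end = payload["hiddens"][0], payload["hiddens"][1]
    if len(start) != len(end) or any(len(s) != len(e) for s, e in zip(start, end)):
        raise ValueError("hidden states must have identical shapes")
    hiddens = []
    for i in range(steps):
        t = i / (steps - 1)
        hiddens.append([
            [(1.0 - t) * s + t * e for s, e in zip(row_s, row_e)]
            for row_s, row_e in zip(start, end)
        ])
    # Assumption: the first molecule's mask is valid for every interpolated state.
    mask = [list(payload["mask"][0]) for _ in range(steps)]
    return {"hiddens": hiddens, "mask": mask}


# Made-up two-molecule payload with 2 values per hidden state.
example = {"hiddens": [[[0.0, 2.0]], [[1.0, 0.0]]], "mask": [[True], [True]]}
body = interpolate_hiddens(example, steps=3)
print(body["hiddens"])
```

With real data, the returned dictionary could be sent to /decode in the same way as the contents of local-hidden-multiple.json above.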

Sampling#

/sampling

  • Request Body:

    • sequences: array of strings (SMILES strings)

    • beam_size: integer (beam width, between 1 and 10, default: 1)

    • num_molecules: integer (number of molecules, between 1 and 10, default: 1)

    • scaled_radius: floating point number (scaled radius, between 0 and 2, default: 0.7)

  • Response:

    • generated: array of arrays of strings (SMILES strings)

The following commands send a POST request to the /sampling endpoint, providing a JSON object with one molecule sequence (CC(Cc1ccc(cc1)C(C(=O)O)C)C). The MolMIM server samples the latent space within a given scaled radius of this seed molecule, generating new molecule samples in an unguided fashion. Because the request does not set scaled_radius, the default of 0.7 applies.

Bash:

curl -X POST \
    localhost:8000/sampling \
    --header 'Content-Type: application/json' \
    -d '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'

Python:

import requests

url = "http://localhost:8000/sampling"
data = {"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)

The following commands send a POST request to the /sampling endpoint, providing a JSON object with two molecule sequences (CN1C=NC2=C1C(=O)N(C(=O)N2C)C and CC(Cc1ccc(cc1)C(C(=O)O)C)C). The MolMIM server samples the latent space within a given scaled radius from each of these seed molecules, generating new molecule samples in an unguided fashion.

Bash:

curl -X POST \
    localhost:8000/sampling \
    --header 'Content-Type: application/json' \
    -d '{"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'

Python:

import requests

url = "http://localhost:8000/sampling"
data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}

headers = {"Content-Type": "application/json"}

response = requests.post(url, headers=headers, json=data)

print(response.text)
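The requests above rely on the defaults for beam_size, num_molecules, and scaled_radius. The sketch below shows one way to set these optional fields while checking them against the documented ranges before sending, and to flatten the nested generated array afterwards. It assumes the documented bounds are inclusive; both function names are hypothetical.

```python
def build_sampling_body(sequences, beam_size=1, num_molecules=1, scaled_radius=0.7):
    """Build a /sampling request body, checking the documented ranges first.

    Assumption: the documented bounds are inclusive.
    """
    if not sequences or not all(isinstance(s, str) for s in sequences):
        raise ValueError("sequences must be a non-empty list of SMILES strings")
    if not 1 <= beam_size <= 10:
        raise ValueError("beam_size must be between 1 and 10")
    if not 1 <= num_molecules <= 10:
        raise ValueError("num_molecules must be between 1 and 10")
    if not 0 <= scaled_radius <= 2:
        raise ValueError("scaled_radius must be between 0 and 2")
    return {
        "sequences": list(sequences),
        "beam_size": beam_size,
        "num_molecules": num_molecules,
        "scaled_radius": scaled_radius,
    }


def unique_samples(response_json):
    """Flatten 'generated' (one list per seed) into unique SMILES in first-seen order."""
    seen = {}
    for per_seed in response_json["generated"]:
        for smiles in per_seed:
            seen.setdefault(smiles, None)
    return list(seen)


body = build_sampling_body(["CC(Cc1ccc(cc1)C(C(=O)O)C)C"], num_molecules=5, scaled_radius=1.0)
print(body)
```

The resulting dictionary can be passed as json=body to requests.post, and unique_samples(response.json()) then gives a de-duplicated list of the returned SMILES strings.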

Generate#

/generate

  • Request Body:

    • smi: string (SMILES string)

    • algorithm: string (algorithm to use, either “CMA-ES” or “none”, default: “CMA-ES”)

    • iterations: integer (number of iterations, between 1 and 1000, default: 10)

    • min_similarity: floating point number (minimum similarity, between 0 and 0.7, default: 0.7)

    • minimize: boolean (whether to minimize the property, default: false)

    • num_molecules: integer (number of molecules, between 1 and 100, default: 10)

    • particles: integer (number of particles, between 2 and 1000, default: 30)

    • property_name: string (property to optimize, either “QED” or “plogP”, default: “QED”)

    • scaled_radius: floating point number (scaled radius, between 0 and 2, default: 1.0)

  • Response:

    • generated: array of strings (SMILES strings)

The /generate endpoint provides two alternative options:

  1. CMA-ES - a black-box optimization algorithm that can guide MolMIM sampling to optimize for a specific property; in this case, either QED or plogP.

  2. Random sampling - functions similarly to the /sampling endpoint, but with less flexibility for the sampling parameters.

Required parameters for each algorithm type:

  • For the “CMA-ES” algorithm:

    • smi

    • num_molecules

    • property_name

    • minimize

    • min_similarity

    • particles

    • iterations

  • For the random sampling (“none”) algorithm:

    • smi

    • num_molecules

    • particles

    • scaled_radius
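The two parameter lists above can be encoded directly so that a request body is checked before it is sent. The following is a minimal sketch under that reading; the names REQUIRED and build_generate_body are hypothetical. It drops any parameter not listed for the chosen algorithm, since this page does not say whether the server ignores or rejects extra fields.

```python
# Parameter names copied from the lists above, keyed by the "algorithm" value.
REQUIRED = {
    "CMA-ES": ("smi", "num_molecules", "property_name", "minimize",
               "min_similarity", "particles", "iterations"),
    "none": ("smi", "num_molecules", "particles", "scaled_radius"),
}


def build_generate_body(algorithm, **params):
    """Return a /generate body holding only the parameters listed for `algorithm`."""
    if algorithm not in REQUIRED:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    missing = [name for name in REQUIRED[algorithm] if name not in params]
    if missing:
        raise ValueError(f"missing parameters for {algorithm}: {', '.join(missing)}")
    body = {"algorithm": algorithm}
    body.update({name: params[name] for name in REQUIRED[algorithm]})
    return body


body = build_generate_body(
    "none",
    smi="CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    num_molecules=5,
    particles=8,
    scaled_radius=1.0,
    iterations=3,  # not listed for "none", so it is left out of the body
)
print(body)
```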

This first set of commands uses the CMA-ES algorithm to generate five molecules, maximizing the QED property, with a minimum similarity of 0.4, eight particles, and three iterations.

Bash:

curl --request POST \
    localhost:8000/generate \
    --header 'Content-Type: application/json' \
    --data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"CMA-ES", "num_molecules":5, "property_name":"QED", "minimize": false, "min_similarity": 0.4, "particles": 8, "iterations": 3}'

Python:

import requests

url = 'http://localhost:8000/generate'

data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "CMA-ES",
    "num_molecules": 5,
    "property_name": "QED",
    "minimize": False,
    "min_similarity": 0.4,
    "particles": 8,
    "iterations": 3
}

headers = {'Content-Type': 'application/json'}

response = requests.post(url, headers=headers, json=data)

print(response.text)

This second set of commands uses the CMA-ES algorithm to generate five molecules, minimizing plogP (minimize is set to true), with a minimum similarity of 0.4, eight particles, and three iterations.

Bash:

curl --request POST \
    localhost:8000/generate \
    --header 'Content-Type: application/json' \
    --data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"CMA-ES", "num_molecules":5, "property_name":"plogP", "minimize": true, "min_similarity": 0.4, "particles": 8, "iterations": 3}'

Python:

import requests

url = "http://localhost:8000/generate"

data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "CMA-ES",
    "num_molecules": 5,
    "property_name": "plogP",
    "minimize": True,
    "min_similarity": 0.4,
    "particles": 8,
    "iterations": 3
}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)

The last set of commands uses the random sampling (“none”) algorithm to generate five molecules from the seed molecule given by the SMILES string CN1C=NC2=C1C(=O)N(C(=O)N2C)C, using eight particles and a scaled radius of 1.0.

Bash:

curl --request POST \
    localhost:8000/generate \
    --header 'Content-Type: application/json' \
    --data-raw '{"smi":"CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "algorithm":"none", "num_molecules":5, "particles": 8, "scaled_radius": 1.0}'

Python:

import requests

url = "http://localhost:8000/generate"

data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "none",
    "num_molecules": 5,
    "particles": 8,
    "scaled_radius": 1.0
}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)