Basic Usage
We recommend Python 3.9 or later and Jupyter Notebook for working through the steps in the Generative Virtual Screening Blueprint. Run the following command to make sure you have all the required Python dependencies. For a more complete example, also refer to these files on GitHub.
pip install jupyterlab pandas numpy requests
In a terminal, launch Jupyter Notebook with the command:
jupyter-notebook
This command opens the Jupyter Notebook UI in your default web browser.
Click the “New” button, and select “Notebook” from the dropdown list.
A new notebook is created, in which you can follow the rest of this guide by copying each code block into a cell and running it.
Because all the models used here are generative AI models with intentional randomness, you may not obtain exactly the same values as those shown in the rest of this document. The format of the results, however, should be the same.
Copy the following code into a new code cell in Jupyter Notebook and run it:
import requests
AF2_HOST = 'http://localhost:8081'
DIFFDOCK_HOST = 'http://localhost:8082'
MOLMIM_HOST = 'http://localhost:8083'
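def is_ready(name, endpoint, expected):
    """Query a service's readiness endpoint and report its status."""
    try:
        r = requests.get(f'{endpoint}/v1/health/ready')
        # READY only if the response body matches the expected text exactly.
        return name, 'READY' if r.text == expected else 'FAILED'
    except requests.RequestException:
        # The service could not be reached at all.
        return name, 'OFFLINE'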
print(is_ready('AlphaFold2', AF2_HOST, '{"status":"ready"}'))
print(is_ready('MolMIM', MOLMIM_HOST, '{"status":"ready"}'))
print(is_ready('DiffDock', DIFFDOCK_HOST, 'true'))
Expected output:
('AlphaFold2', 'READY')
('MolMIM', 'READY')
('DiffDock', 'READY')
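The services can take a while to come up after launch, so a single check may report OFFLINE or FAILED at first. The following is a minimal sketch of a retry loop around such a check. The function name wait_until_ready and its retries, delay, and sleep parameters are hypothetical and not part of the Blueprint; the check is passed in as a plain callable so the loop does not depend on any particular service.

```python
import time

def wait_until_ready(check, retries=30, delay=10.0, sleep=time.sleep):
    """Call check() until it returns 'READY' or the retries run out.

    check is any zero-argument callable that returns a status string.
    sleep is injectable so the loop can be exercised without waiting.
    Returns the number of attempts that were needed.
    """
    status = 'UNKNOWN'
    for attempt in range(1, retries + 1):
        status = check()
        if status == 'READY':
            return attempt
        if attempt < retries:
            sleep(delay)  # wait before the next attempt
    raise TimeoutError(f'service still {status} after {retries} attempts')
```

With the is_ready function defined above, one possible call is wait_until_ready(lambda: is_ready('DiffDock', DIFFDOCK_HOST, 'true')[1]), which polls every 10 seconds for up to 30 attempts.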
This example notebook demonstrates how to connect BioNeMo NIMs to carry out a few key steps of a virtual screening workflow. Each step is powered by a high-performance AI model: AlphaFold2 for protein folding, MolMIM for molecular generation, and DiffDock for protein-ligand docking. Below, we illustrate this workflow using an example protein and an example molecule of interest: the SARS-CoV-2 main protease and Nirmatrelvir.
Protein Folding with AlphaFold2
Copy the following code into a new code cell in Jupyter Notebook and run it:
protein = "SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVTFQ"
af2_response = requests.post(
    f'{AF2_HOST}/protein-structure/alphafold2/predict-structure-from-sequence',
    json={
        'sequence': protein,
        'databases': ['uniref90', 'mgnify', 'small_bfd'],
        'msa_algorithm': 'jackhmmer',
        'e_value': 0.0001,
        'bit_score': -1,  # -1 means to fall back to the e-value
        'msa_iterations': 1,
        'relax_prediction': True,
    }).json()
print(af2_response[0][:485])
This step can take about 15 to 20 minutes, depending on the GPU type. It prints the first six lines of the resulting PDB file. Example output:
ATOM 1 N SER A 1 22.994 7.615 -6.454 1.00 78.58 N
ATOM 2 H SER A 1 23.381 6.685 -6.517 1.00 78.58 H
ATOM 3 H2 SER A 1 22.366 7.739 -7.236 1.00 78.58 H
ATOM 4 H3 SER A 1 23.716 8.318 -6.519 1.00 78.58 H
ATOM 5 CA SER A 1 22.213 7.766 -5.199 1.00 78.58 C
ATOM 6 HA SER A 1 22.898 7.757 -4.351 1.00 78.58 H
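AlphaFold2 conventionally stores its per-residue confidence (pLDDT) in the B-factor column of the PDB output, which is the value 78.58 in the rows above. Assuming the NIM follows that convention, the sketch below averages that column over the alpha-carbon atoms to give a rough overall confidence for the prediction. The helper name mean_bfactor is hypothetical, and the parser relies on the standard fixed-column PDB layout rather than on anything specific to the NIM.

```python
def mean_bfactor(pdb_text, atom_name='CA'):
    """Average the B-factor column over ATOM records with the given atom name."""
    values = []
    for line in pdb_text.splitlines():
        # PDB is a fixed-column format: the atom name occupies columns 13-16
        # and the temperature factor (B-factor) occupies columns 61-66.
        if not line.startswith('ATOM') or len(line) < 66:
            continue
        if line[12:16].strip() == atom_name:
            values.append(float(line[60:66]))
    if not values:
        raise ValueError(f'no {atom_name} atoms found')
    return sum(values) / len(values)
```

For the response above, the call would be mean_bfactor(af2_response[0]).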
Molecular Generation with MolMIM
Copy the following code into a new code cell in Jupyter Notebook and run it:
molecule = "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C"
molmim_response = requests.post(
    f'{MOLMIM_HOST}/generate',
    json={
        'smi': molecule,
        'num_molecules': 5,
        'algorithm': 'CMA-ES',
        'property_name': 'QED',
        'min_similarity': 0.7,  # Ignored if algorithm is not "CMA-ES".
        'iterations': 10,
    }).json()
import pandas
pandas.DataFrame(molmim_response['generated'])
Example output:
smiles score
0 CC(C)(C)[C@H](NC(=O)C(F)(F)c1ccccc1Cl)C1CC1 0.877810
1 CC(C)(C)C(NC(=O)Cc1cccc(F)c1Br)C(N)=O 0.877725
2 CCCC(C)(C)NC(=O)C(F)(F)c1ccccc1Cl 0.867557
3 CC(C)(C)C(NC(=O)C(F)(F)F)C(=O)NN1Cc2ccccc2C1=O 0.865407
4 CCCC(C)(C)NC(=O)C(F)(F)c1ccccc1N1CCCC1 0.855187
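Before docking, it can be useful to narrow the generated list down to the most promising candidates. The sketch below removes duplicate SMILES strings, drops entries below a score threshold, and sorts the rest best first. The function name top_candidates and its min_score and limit parameters are hypothetical. It assumes each entry is a dict with 'smiles' and 'score' keys, matching the columns shown above, and it compares SMILES as plain strings without canonicalizing them.

```python
def top_candidates(generated, min_score=0.0, limit=None):
    """Deduplicate by SMILES string, filter by score, and sort best first.

    Returns a list of (smiles, score) tuples.
    """
    best = {}
    for entry in generated:
        smiles, score = entry['smiles'], entry['score']
        # Keep only the highest score seen for each distinct SMILES string.
        if score >= min_score and score > best.get(smiles, float('-inf')):
            best[smiles] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]  # limit=None keeps every remaining entry
```

For the response above, top_candidates(molmim_response['generated'], min_score=0.86) would keep only the molecules whose score is at least 0.86.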
Molecular Docking with DiffDock
Note that docking is a downstream task that follows the generation of the protein structure and the ligand molecules, so make sure the previous two steps (AlphaFold2 and MolMIM) have finished before starting this step. Copy the following code into a new code cell in Jupyter Notebook and run it:
folded_protein = af2_response[0]
generated_ligands = '\n'.join([v['smiles'] for v in molmim_response['generated']])
diffdock_response = requests.post(
    f'{DIFFDOCK_HOST}/molecular-docking/diffdock/generate',
    json={
        'protein': folded_protein,
        'ligand': generated_ligands,
        'ligand_file_type': 'txt',
        'num_poses': 10,
        'time_divisions': 20,
        'num_steps': 18,
    }).json()
# Print the first (top-ranked) pose for each ligand.
for poses in diffdock_response['ligand_positions']:
    print(poses[0])
The code above prints the best pose (top-1, with the highest confidence score) in SDF format for every generated molecule. Example output:
protein_ligand_0
RDKit 3D
21 22 0 0 0 0 0 0 0 0999 V2000
-11.1359 -7.9280 8.5774 C 0 0 0 0 0 0 0 0 0 0 0 0
-11.1762 -9.3281 9.1767 C 0 0 0 0 0 0 0 0 0 0 0 0
-9.7296 -9.6812 9.6177 C 0 0 0 0 0 0 0 0 0 0 0 0
-11.9626 -9.2559 10.4524 C 0 0 0 0 0 0 0 0 0 0 0 0
-11.6435 -10.3516 8.2080 C 0 0 2 0 0 0 0 0 0 0 0 0
-13.0176 -10.1552 7.7807 N 0 0 0 0 0 0 0 0 0 0 0 0
-13.3862 -9.6916 6.4945 C 0 0 0 0 0 0 0 0 0 0 0 0
-12.5155 -9.2515 5.7192 O 0 0 0 0 0 0 0 0 0 0 0 0
-14.8015 -9.7369 6.0902 C 0 0 0 0 0 0 0 0 0 0 0 0
-15.5618 -10.5302 6.9412 F 0 0 0 0 0 0 0 0 0 0 0 0
...
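Printed SDF text is hard to inspect, so you may prefer to write each top pose to its own file and open it in a molecular viewer. The following is a minimal sketch under the same assumption as the loop above: ligand_positions holds one list of SDF strings per ligand, with the best pose first. The function name save_top_poses and the output file naming are hypothetical.

```python
from pathlib import Path

def save_top_poses(ligand_positions, out_dir):
    """Write the first pose of each ligand to <out_dir>/ligand_<i>_top_pose.sdf.

    Returns the list of paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, poses in enumerate(ligand_positions):
        if not poses:
            continue  # no pose was returned for this ligand
        path = out / f'ligand_{i}_top_pose.sdf'
        path.write_text(poses[0])
        paths.append(path)
    return paths
```

For the response above, the call would be save_top_poses(diffdock_response['ligand_positions'], 'docked_poses').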