Agent Blueprint: Generative Virtual Screening

Basic Usage

Python 3.9 or later and Jupyter Notebook are recommended for working through the steps of the Generative Virtual Screening Blueprint. Run the following command to make sure you have all the required Python dependencies. For a more complete example, refer to the files on GitHub.


pip install jupyterlab pandas numpy


In a terminal, please launch a jupyter-notebook with the command:


jupyter-notebook


The command opens the Jupyter Notebook UI in your default web browser.


Click the “New” button, and select “Notebook” from the dropdown list.


A new notebook is created. You can follow the rest of this guide by copying each code block into the notebook and running it.


Note

Because all the models used here are generative AI models with intentional randomness, the values in your results may not exactly match those shown in the rest of this document. The format, however, should be the same.


To check that the AlphaFold2, DiffDock, and MolMIM services are running, copy the following code into a new code block in the notebook and run it:


import requests

# Base URLs of the three locally running NIM services.
AF2_HOST = 'http://localhost:8081'
DIFFDOCK_HOST = 'http://localhost:8082'
MOLMIM_HOST = 'http://localhost:8083'

def is_ready(name, endpoint, expected):
    """Return (name, status) based on the service's health endpoint."""
    try:
        r = requests.get(f'{endpoint}/v1/health/ready')
        return name, 'READY' if r.text == expected else 'FAILED'
    except requests.exceptions.RequestException:
        # The service could not be reached at all.
        return name, 'OFFLINE'

print(is_ready('AlphaFold2', AF2_HOST, '{"status":"ready"}'))
print(is_ready('MolMIM', MOLMIM_HOST, '{"status":"ready"}'))
print(is_ready('DiffDock', DIFFDOCK_HOST, 'true'))


Expected output:


('AlphaFold2', 'READY')
('MolMIM', 'READY')
('DiffDock', 'READY')
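
A service may need some time to load its model after its container starts, so the check can report OFFLINE or FAILED at first. The following is a minimal sketch of a polling helper. `wait_until_ready`, its parameters, and the default timeout values are hypothetical and not part of the blueprint. It accepts any zero-argument callable that returns a `(name, status)` tuple in the same shape as `is_ready`.

```python
import time


def wait_until_ready(check, timeout=600.0, interval=10.0, sleep=time.sleep):
    """Poll check() until it reports 'READY' or `timeout` seconds elapse.

    `check` is any zero-argument callable returning a (name, status) tuple.
    Hypothetical convenience helper, not part of the blueprint.
    """
    deadline = time.monotonic() + timeout
    while True:
        name, status = check()
        if status == 'READY':
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

In the notebook it could be called as `wait_until_ready(lambda: is_ready('MolMIM', MOLMIM_HOST, '{"status":"ready"}'))`, which returns `True` once the service is ready and `False` if the timeout expires first.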


This example notebook demonstrates how to connect BioNeMo NIMs to carry out a few key steps of a virtual screening workflow. Each step is powered by a high-performance AI model: AlphaFold2 for protein folding, MolMIM for molecular generation, and DiffDock for protein-ligand docking. Below, we illustrate this workflow using an example protein and an example molecule of interest: the SARS-CoV-2 main protease and Nirmatrelvir.

Protein Folding with AlphaFold2

Copy the following code into a new code block in the notebook and run it. It sends the protein sequence to the AlphaFold2 service and requests a predicted structure:


# Amino-acid sequence of the example protein.
protein = "SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVTFQ"

af2_response = requests.post(
    f'{AF2_HOST}/protein-structure/alphafold2/predict-structure-from-sequence',
    json={
        'sequence': protein,
        'databases': ['uniref90', 'mgnify', 'small_bfd'],
        'msa_algorithm': 'jackhmmer',
        'e_value': 0.0001,
        'bit_score': -1,  # -1 means to fallback to the e-value
        'msa_iterations': 1,
        'relax_prediction': True,
    }).json()

# Show the beginning of the first predicted structure.
print(af2_response[0][:485])


This step can take about 15 to 20 minutes, depending on the GPU type. The final line prints the first 485 characters of the resulting PDB file, which corresponds to roughly the first six ATOM records. Example output:


ATOM      1  N   SER A   1      22.994   7.615  -6.454  1.00 78.58           N
ATOM      2  H   SER A   1      23.381   6.685  -6.517  1.00 78.58           H
ATOM      3  H2  SER A   1      22.366   7.739  -7.236  1.00 78.58           H
ATOM      4  H3  SER A   1      23.716   8.318  -6.519  1.00 78.58           H
ATOM      5  CA  SER A   1      22.213   7.766  -5.199  1.00 78.58           C
ATOM      6  HA  SER A   1      22.898   7.757  -4.351  1.00 78.58           H
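
The second-to-last numeric column of each record (78.58 in the example output) is the PDB B-factor field, which AlphaFold2 conventionally uses to store its per-residue confidence score, pLDDT, on a 0 to 100 scale. Assuming the service follows that convention, which this guide does not state, the sketch below averages the score over the C-alpha atoms. `mean_plddt` is a hypothetical helper, and it expects standard fixed-width PDB records.

```python
def mean_plddt(pdb_text):
    """Average the B-factor column over C-alpha atoms of a PDB string.

    Assumes fixed-width PDB ATOM records and that the B-factor field
    (columns 61-66) holds pLDDT, as in standard AlphaFold2 output.
    """
    scores = []
    for line in pdb_text.splitlines():
        # The atom name occupies columns 13-16; keep one atom per residue.
        if line.startswith('ATOM') and line[12:16].strip() == 'CA':
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError('no C-alpha ATOM records found')
    return sum(scores) / len(scores)
```

In the notebook, `mean_plddt(af2_response[0])` would give a single summary number for the predicted structure.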


Molecular Generation with MolMIM

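Copy the following code into a new code block in the notebook and run it. It asks MolMIM for five molecules generated from the seed molecule's SMILES string, optimizing the QED property with the CMA-ES algorithm: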


import pandas

# SMILES string of the seed molecule.
molecule = "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C"

molmim_response = requests.post(
    f'{MOLMIM_HOST}/generate',
    json={
        'smi': molecule,
        'num_molecules': 5,
        'algorithm': 'CMA-ES',
        'property_name': 'QED',
        'min_similarity': 0.7,  # Ignored if algorithm is not "CMA-ES".
        'iterations': 10,
    }).json()

# Display the generated molecules and their scores as a table.
pandas.DataFrame(molmim_response['generated'])


Example output:


   smiles                                         score
0  CC(C)(C)[C@H](NC(=O)C(F)(F)c1ccccc1Cl)C1CC1    0.877810
1  CC(C)(C)C(NC(=O)Cc1cccc(F)c1Br)C(N)=O          0.877725
2  CCCC(C)(C)NC(=O)C(F)(F)c1ccccc1Cl              0.867557
3  CC(C)(C)C(NC(=O)C(F)(F)F)C(=O)NN1Cc2ccccc2C1=O 0.865407
4  CCCC(C)(C)NC(=O)C(F)(F)c1ccccc1N1CCCC1         0.855187
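
Before docking, it may be useful to rank the generated molecules and drop repeats. The sketch below sorts by score and removes exact duplicate SMILES strings. `top_candidates`, `min_score`, and `limit` are hypothetical names, and the only assumption about the response is the `generated` list of `smiles`/`score` records shown in the table.

```python
def top_candidates(response, min_score=0.0, limit=None):
    """Return unique generated molecules sorted by descending score."""
    seen = set()
    ranked = []
    ordered = sorted(response['generated'], key=lambda v: v['score'], reverse=True)
    for item in ordered:
        # Skip low-scoring molecules and SMILES strings already kept.
        if item['score'] < min_score or item['smiles'] in seen:
            continue
        seen.add(item['smiles'])
        ranked.append(item)
    return ranked[:limit]  # limit=None keeps everything
```

For example, `top_candidates(molmim_response, min_score=0.86)` keeps only molecules scoring at least 0.86. The comparison is string-based, so two different SMILES spellings of the same molecule are not treated as duplicates.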


Molecular Docking with DiffDock

Docking is a downstream task that takes the predicted protein structure and the generated ligand molecules as input, so make sure the AlphaFold2 and MolMIM steps above have finished before starting this step. Copy the following code into a new code block in the notebook and run it:


# Inputs come from the two previous steps.
folded_protein = af2_response[0]
generated_ligands = '\n'.join([v['smiles'] for v in molmim_response['generated']])

diffdock_response = requests.post(
    f'{DIFFDOCK_HOST}/molecular-docking/diffdock/generate',
    json={
        'protein': folded_protein,
        'ligand': generated_ligands,
        'ligand_file_type': 'txt',
        'num_poses': 10,
        'time_divisions': 20,
        'num_steps': 18,
    }).json()

# Print the first pose of every ligand.
for i in range(len(diffdock_response['ligand_positions'])):
    print(diffdock_response['ligand_positions'][i][0])


The code above prints the best pose (the top-1 pose, with the highest confidence score) in SDF format for every generated molecule. Example output:


protein_ligand_0
     RDKit          3D

 21 22  0  0  0  0  0  0  0  0999 V2000
  -11.1359   -7.9280    8.5774 C   0  0  0  0  0  0  0  0  0  0  0  0
  -11.1762   -9.3281    9.1767 C   0  0  0  0  0  0  0  0  0  0  0  0
   -9.7296   -9.6812    9.6177 C   0  0  0  0  0  0  0  0  0  0  0  0
  -11.9626   -9.2559   10.4524 C   0  0  0  0  0  0  0  0  0  0  0  0
  -11.6435  -10.3516    8.2080 C   0  0  2  0  0  0  0  0  0  0  0  0
  -13.0176  -10.1552    7.7807 N   0  0  0  0  0  0  0  0  0  0  0  0
  -13.3862   -9.6916    6.4945 C   0  0  0  0  0  0  0  0  0  0  0  0
  -12.5155   -9.2515    5.7192 O   0  0  0  0  0  0  0  0  0  0  0  0
  -14.8015   -9.7369    6.0902 C   0  0  0  0  0  0  0  0  0  0  0  0
  -15.5618  -10.5302    6.9412 F   0  0  0  0  0  0  0  0  0  0  0  0
...
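
To inspect the docked poses in a molecular viewer, each best pose can be written to its own SDF file. This is a minimal sketch: `save_top_poses`, the output directory, and the file naming are hypothetical, and it relies only on the `ligand_positions` structure used in the docking code (one list of SDF strings per ligand, best pose first).

```python
from pathlib import Path


def save_top_poses(response, out_dir='docked_poses'):
    """Write the first (best) pose of every ligand to <out_dir>/ligand_<i>.sdf."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, poses in enumerate(response['ligand_positions']):
        if not poses:
            continue  # no pose was returned for this ligand
        path = out / f'ligand_{i}.sdf'
        path.write_text(poses[0])
        paths.append(path)
    return paths
```

Calling `save_top_poses(diffdock_response)` in the notebook returns the list of files written, numbered in the same order as the ligands that were submitted.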


© Copyright 2024, NVIDIA Corporation. Last updated on Aug 29, 2024.