Basic Usage

Python 3.9 or later and Jupyter Notebook are recommended for working through the steps in the Generative Virtual Screening Blueprint. Run the following command to install the required Python dependencies. For a more complete example, refer to the files on GitHub.

pip install jupyterlab pandas numpy requests
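To confirm the environment before continuing, a small check such as the sketch below can be run with the same Python interpreter. The package list is an assumption based on the imports used later in this guide (`requests` and `pandas`) plus `numpy` from the install command; `missing_packages` is a hypothetical helper name.

```python
import importlib.util
import sys

# Assumed list: packages imported later in this guide, plus numpy.
REQUIRED_PACKAGES = ["requests", "pandas", "numpy"]

def missing_packages(names):
    """Return the names from `names` that cannot be imported here."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if sys.version_info < (3, 9):
    print(f"Python 3.9+ is recommended; found {sys.version.split()[0]}")

missing = missing_packages(REQUIRED_PACKAGES)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If any package is reported as missing, rerun the `pip install` command above in the same environment.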

In a terminal, launch Jupyter Notebook with the following command:

jupyter-notebook

The command opens the Jupyter Notebook UI in your default web browser.

Click the “New” button, and select “Notebook” from the dropdown list.

A new notebook is created. Follow the rest of this guide by copying each code block into a notebook cell and running it.

Note

Because all the models used here are generative AI models with intentional randomness, you may not obtain exactly the same values as those shown in the rest of this document. The format of the results, however, should be the same.

NIM Health Check

Copy the following code into a new code cell in Jupyter Notebook and run it:

import requests

AF2_HOST      = 'http://localhost:8081'
DIFFDOCK_HOST = 'http://localhost:8082'
MOLMIM_HOST   = 'http://localhost:8083'

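def is_ready(name, endpoint, expected):
    try:
        # A timeout keeps the check from hanging on an unresponsive host.
        r = requests.get(f'{endpoint}/v1/health/ready', timeout=10)
        return name, 'READY' if r.text == expected else 'FAILED'
    except requests.exceptions.RequestException:
        # Connection errors and timeouts mean the service is unreachable.
        return name, "OFFLINE"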

print(is_ready('AlphaFold2', AF2_HOST, '{"status":"ready"}'))
print(is_ready('MolMIM', MOLMIM_HOST, '{"status":"ready"}'))
print(is_ready('DiffDock', DIFFDOCK_HOST, 'true'))

Expected output:

('AlphaFold2', 'READY')
('MolMIM', 'READY')
('DiffDock', 'READY')
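If a NIM reports FAILED or OFFLINE shortly after startup, it may still be loading. The sketch below polls a readiness check until it succeeds or a retry budget is exhausted. `wait_until_ready`, its parameters, and the default retry values are hypothetical, not part of the blueprint; the helper accepts any zero-argument callable so it can wrap the `is_ready` function defined above.

```python
import time

def wait_until_ready(check, attempts=30, delay_seconds=10.0):
    """Call check() until it returns True or the attempts run out.

    Returns the 1-based attempt number that succeeded, or None if
    the service never became ready. The defaults are illustrative.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            time.sleep(delay_seconds)  # Pause before the next probe.
    return None

# Example use in the notebook, reusing is_ready from the cell above:
# wait_until_ready(
#     lambda: is_ready('MolMIM', MOLMIM_HOST, '{"status":"ready"}')[1] == 'READY')
```

Passing the check as a callable keeps the retry logic independent of any particular NIM or expected response body.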

Example Code

This example notebook demonstrates how to connect BioNeMo NIMs to carry out a few key steps of a virtual screening workflow. Each step is powered by a high-performance AI model: AlphaFold2 for protein folding, MolMIM for molecular generation, and DiffDock for protein-ligand docking. Below, we illustrate this workflow using an example protein and an example molecule of interest: the SARS-CoV-2 main protease and Nirmatrelvir.

Protein Folding with AlphaFold2

Copy the following code into a new code cell in Jupyter Notebook and run it:

protein = "SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVTFQ"

af2_response = requests.post(
   f'{AF2_HOST}/protein-structure/alphafold2/predict-structure-from-sequence',
   json={
     'sequence': protein,
     'databases': ['uniref90', 'mgnify', 'small_bfd'],
     'msa_algorithm': 'jackhmmer',
     'e_value': 0.0001,
     'bit_score': -1, # -1 means fall back to the e-value
     'msa_iterations': 1,
     'relax_prediction': True,
   }).json()

print(af2_response[0][:485])

This step can take about 15 to 20 minutes, depending on the GPU type. The code prints the first 485 characters of the predicted PDB file, which corresponds to the first six lines. Example output:

ATOM      1  N   SER A   1      22.994   7.615  -6.454  1.00 78.58           N  
ATOM      2  H   SER A   1      23.381   6.685  -6.517  1.00 78.58           H  
ATOM      3  H2  SER A   1      22.366   7.739  -7.236  1.00 78.58           H  
ATOM      4  H3  SER A   1      23.716   8.318  -6.519  1.00 78.58           H  
ATOM      5  CA  SER A   1      22.213   7.766  -5.199  1.00 78.58           C  
ATOM      6  HA  SER A   1      22.898   7.757  -4.351  1.00 78.58           H
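AlphaFold2 conventionally writes its per-residue confidence (pLDDT) into the B-factor column of the PDB file, which is the second-to-last numeric column in the output above. Assuming the NIM preserves that convention, the sketch below averages the value over C-alpha atoms to give a quick quality summary. `mean_plddt` is a hypothetical helper, and the two sample lines are taken from the example output.

```python
def mean_plddt(pdb_text):
    """Average the B-factor column over C-alpha atoms of a PDB string.

    Assumes the B-factor column holds pLDDT, as in standard
    AlphaFold2 output.
    """
    values = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no C-alpha ATOM records found")
    return sum(values) / len(values)

sample = "\n".join([
    "ATOM      1  N   SER A   1      22.994   7.615  -6.454  1.00 78.58           N  ",
    "ATOM      5  CA  SER A   1      22.213   7.766  -5.199  1.00 78.58           C  ",
])
print(round(mean_plddt(sample), 2))
```

In the notebook, `mean_plddt(af2_response[0])` would summarize the full prediction.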

Molecular Generation with MolMIM

Copy the following code into a new code cell in Jupyter Notebook and run it:

molecule = "CC1(C2C1C(N(C2)C(=O)C(C(C)(C)C)NC(=O)C(F)(F)F)C(=O)NC(CC3CCNC3=O)C#N)C"

molmim_response = requests.post(
  f'{MOLMIM_HOST}/generate',
  json={
    'smi': molecule,
    'num_molecules': 5,
    'algorithm': 'CMA-ES',
    'property_name': 'QED',
    'min_similarity': 0.7, # Ignored if algorithm is not "CMA-ES".
    'iterations': 10,
  }).json()

import pandas 
pandas.DataFrame(molmim_response['generated'])

Example output:

                                                       smiles     score
0     CC(C)(C)[C@H](NC(=O)C(F)(F)c1ccccc1Cl)C1CC1  0.877810
1           CC(C)(C)C(NC(=O)Cc1cccc(F)c1Br)C(N)=O  0.877725
2               CCCC(C)(C)NC(=O)C(F)(F)c1ccccc1Cl  0.867557
3  CC(C)(C)C(NC(=O)C(F)(F)F)C(=O)NN1Cc2ccccc2C1=O  0.865407
4          CCCC(C)(C)NC(=O)C(F)(F)c1ccccc1N1CCCC1  0.855187
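Before docking, it can be useful to deduplicate the generated molecules and keep only those above a score threshold. The sketch below works on the list of dictionaries with `smiles` and `score` keys shown above; `select_molecules` and its parameters are hypothetical, and the sample values are copied from the example output with one duplicate added for illustration.

```python
def select_molecules(generated, min_score=0.0, limit=None):
    """Deduplicate by SMILES, drop low scores, and sort best-first."""
    best = {}
    for entry in generated:
        smiles, score = entry["smiles"], entry["score"]
        # Keep the highest score seen for each distinct SMILES string.
        if score >= min_score and score > best.get(smiles, float("-inf")):
            best[smiles] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [{"smiles": s, "score": v} for s, v in ranked[:limit]]

sample = [
    {"smiles": "CCCC(C)(C)NC(=O)C(F)(F)c1ccccc1Cl", "score": 0.867557},
    {"smiles": "CC(C)(C)[C@H](NC(=O)C(F)(F)c1ccccc1Cl)C1CC1", "score": 0.877810},
    {"smiles": "CCCC(C)(C)NC(=O)C(F)(F)c1ccccc1Cl", "score": 0.867557},
]
for row in select_molecules(sample, min_score=0.86):
    print(row["smiles"], row["score"])
```

In the notebook, `select_molecules(molmim_response['generated'], min_score=0.86)` would apply the same filter to the real response. Note that this compares SMILES as plain strings, so two different spellings of the same molecule are not merged.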

Molecular Docking with DiffDock

Docking is a downstream task that requires both the predicted protein structure and the generated ligand molecules, so make sure the previous AlphaFold2 and MolMIM steps have finished before starting this step. Copy the following code into a new code cell in Jupyter Notebook and run it:

folded_protein    = af2_response[0]
generated_ligands = '\n'.join([v['smiles'] for v in molmim_response['generated']])

diffdock_response = requests.post(
  f'{DIFFDOCK_HOST}/molecular-docking/diffdock/generate',
  json={
    'protein': folded_protein,
    'ligand': generated_ligands,
    'ligand_file_type': 'txt',
    'num_poses': 10,
    'time_divisions': 20,
    'num_steps': 18,
  }).json()

# Print the first (top-ranked) pose for each ligand.
for poses in diffdock_response['ligand_positions']:
  print(poses[0])

The code above prints the best pose (top-1, with the highest confidence score) in SDF format for every generated molecule. Example output:

protein_ligand_0
     RDKit          3D

 21 22  0  0  0  0  0  0  0  0999 V2000
  -11.1359   -7.9280    8.5774 C   0  0  0  0  0  0  0  0  0  0  0  0
  -11.1762   -9.3281    9.1767 C   0  0  0  0  0  0  0  0  0  0  0  0
   -9.7296   -9.6812    9.6177 C   0  0  0  0  0  0  0  0  0  0  0  0
  -11.9626   -9.2559   10.4524 C   0  0  0  0  0  0  0  0  0  0  0  0
  -11.6435  -10.3516    8.2080 C   0  0  2  0  0  0  0  0  0  0  0  0
  -13.0176  -10.1552    7.7807 N   0  0  0  0  0  0  0  0  0  0  0  0
  -13.3862   -9.6916    6.4945 C   0  0  0  0  0  0  0  0  0  0  0  0
  -12.5155   -9.2515    5.7192 O   0  0  0  0  0  0  0  0  0  0  0  0
  -14.8015   -9.7369    6.0902 C   0  0  0  0  0  0  0  0  0  0  0  0
  -15.5618  -10.5302    6.9412 F   0  0  0  0  0  0  0  0  0  0  0  0
...
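To inspect the docked poses in a molecular viewer, the top pose for each ligand can be written to its own SDF file. The sketch below assumes only what the code above shows: `ligand_positions` is a list with one entry per ligand, and each entry is a list of SDF strings with the best pose first. `save_best_poses`, the output directory, and the file naming scheme are hypothetical.

```python
from pathlib import Path

def save_best_poses(ligand_positions, out_dir="docked_poses"):
    """Write the first pose of each ligand to <out_dir>/ligand_<i>_best_pose.sdf."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, poses in enumerate(ligand_positions):
        if not poses:
            # Skip ligands for which no pose was returned.
            continue
        path = out / f"ligand_{index}_best_pose.sdf"
        path.write_text(poses[0])
        paths.append(path)
    return paths

# Example use in the notebook:
# save_best_poses(diffdock_response['ligand_positions'])
```

The file index matches the ligand's position in the request, so files can be traced back to the rows of the MolMIM table.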
