Advanced Usage
Run Inference with Bash Script
In this example, we create a simple bash script that launches inference using two local files as input and dumps the generated poses in the output folder.
Create a new blank file in the same folder, name it diffdock.sh, and copy the content below into it.
#!/bin/bash
# Script: diffdock.sh - Run inference using local files as input
# Usage: ./diffdock.sh [receptor].pdb [ligand].sdf

protein_file=$1
ligand_file=$2

# Keep only the ATOM records of the protein and escape newlines as literal \n for JSON
protein_bytes=`grep -E ^ATOM $protein_file | sed -z 's/\n/\\\n/g'`
ligand_bytes=`sed -z 's/\n/\\\n/g' $ligand_file`
# The file extension of the ligand file is sent as ligand_file_type
ligand_format=`basename $ligand_file | awk -F. '{print $NF}'`

echo "{
    \"ligand\": \"${ligand_bytes}\",
    \"ligand_file_type\": \"${ligand_format}\",
    \"protein\": \"${protein_bytes}\",
    \"num_poses\": 10,
    \"time_divisions\": 20,
    \"steps\": 18,
    \"save_trajectory\": false,
    \"is_staged\": false
}" > diffdock.json

curl --header "Content-Type: application/json" \
    --request POST \
    --data @diffdock.json \
    --output output.json \
    http://localhost:8000/molecular-docking/diffdock/generate
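The same request body can also be assembled in Python, where the standard `json` module handles newline and quote escaping instead of `sed`. The following is a minimal sketch, not part of the NIM itself: the function names `build_payload` and `write_request` are hypothetical, and the field names and default values are copied from diffdock.sh above.

```python
import json


def build_payload(protein_pdb_text, ligand_text, ligand_format):
    """Build the request body that diffdock.sh writes to diffdock.json.

    Hypothetical helper: field names and values mirror the bash script.
    """
    # Keep only ATOM records, mirroring `grep -E ^ATOM` in the script.
    atom_lines = [ln for ln in protein_pdb_text.splitlines() if ln.startswith("ATOM")]
    protein = "".join(ln + "\n" for ln in atom_lines)
    return {
        "ligand": ligand_text,
        "ligand_file_type": ligand_format,
        "protein": protein,
        "num_poses": 10,
        "time_divisions": 20,
        "steps": 18,
        "save_trajectory": False,
        "is_staged": False,
    }


def write_request(payload, path="diffdock.json"):
    """Serialize the payload; json.dump escapes newlines and quotes."""
    with open(path, "w") as fh:
        json.dump(payload, fh)
```

The resulting diffdock.json can be posted with the same `curl` command used in the script.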
Make the script executable.
chmod +x diffdock.sh
Download the input files from the RCSB database and launch the inference.
curl -o 8G43.pdb https://files.rcsb.org/download/8G43.pdb
curl -o ZU6.sdf https://files.rcsb.org/ligands/download/ZU6_ideal.sdf
./diffdock.sh 8G43.pdb ZU6.sdf
Dump the output using the Python script created in Getting Started.
python3 dump_output.py
ls output
Example of output
rank01_confidence_0.57.sdf  rank06_confidence_-0.25.sdf
rank02_confidence_0.57.sdf  rank07_confidence_-0.69.sdf
rank03_confidence_0.55.sdf  rank08_confidence_-1.31.sdf
rank04_confidence_0.41.sdf  rank09_confidence_-1.90.sdf
rank05_confidence_0.38.sdf  rank10_confidence_-2.07.sdf
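Each file name in the listing carries the pose rank and its confidence score. If the poses need to be filtered or sorted programmatically, the two values can be recovered from the name. The sketch below assumes the `rankNN_confidence_X.sdf` pattern seen in the example listing; the helper name `parse_pose_name` is hypothetical.

```python
import re

# Pattern inferred from the example listing, e.g. rank06_confidence_-0.25.sdf
POSE_NAME = re.compile(r"^rank(\d+)_confidence_(-?\d+(?:\.\d+)?)\.sdf$")


def parse_pose_name(filename):
    """Return (rank, confidence) parsed from a pose file name."""
    match = POSE_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected pose file name: {filename}")
    return int(match.group(1)), float(match.group(2))
```

For example, `parse_pose_name("rank06_confidence_-0.25.sdf")` yields rank 6 and confidence -0.25.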
Run Inference for Batch-Docking
The DiffDock NIM supports a batch-docking mode that docks a group of ligand molecules against the same protein receptor in a single inference request when the request contains a multi-molecule SDF file. Batch-docking mode is much more efficient than running separate inference requests. The example below illustrates batch-docking using a protein PDB file and five ligand SDF files downloaded from RCSB.
Prepare the SDF input file with multiple ligand molecules. Create a new blank file, name it make-multiligand.sh, and copy the content below into it.
#!/bin/bash
# Script: make-multiligand.sh
# Usage: ./make-multiligand.sh [Ligand1_CCD_ID] [Ligand2_CCD_ID] ...
# Example: ./make-multiligand.sh COM Q4H QPK R4W SIN

ligand_files=""
for lig in $*
do
    ligand_file=${lig}.sdf
    echo "Download ligand file: ${ligand_file}"
    curl -o $ligand_file "https://files.rcsb.org/ligands/download/${lig}_ideal.sdf"
    ligand_files="${ligand_files} ${ligand_file}"
done

# Combine ligand files into a single SDF file
cat $ligand_files > multi_ligands.sdf
Run the commands below to generate the multi_ligands.sdf input file.
chmod +x make-multiligand.sh
./make-multiligand.sh COM Q4H QPK R4W SIN
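Before submitting the combined file, it can be worth confirming that it holds the expected number of molecules, since a failed download would silently produce a shorter file. In the SDF format each record ends with a line containing only `$$$$`, so counting those lines gives the molecule count. This is a minimal sketch; the function name `count_sdf_records` is hypothetical.

```python
def count_sdf_records(sdf_text):
    """Count molecules in SDF text by counting '$$$$' record terminators."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")
```

For the example above, calling it on the contents of multi_ligands.sdf should return 5 if all five ligand files were downloaded successfully.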
Download the protein PDB file and launch the inference.
curl -o 7RWO.pdb "https://files.rcsb.org/download/7RWO.pdb"
./diffdock.sh 7RWO.pdb multi_ligands.sdf
Dump the result. An example of the output is shown below.
python3 dump_output.py
ls output/*
diffdock-output/ligand0:
rank01_confidence_-0.74.sdf  rank05_confidence_-1.15.sdf  rank09_confidence_-1.55.sdf
rank02_confidence_-0.92.sdf  rank06_confidence_-1.25.sdf  rank10_confidence_-1.93.sdf
rank03_confidence_-0.93.sdf  rank07_confidence_-1.46.sdf
rank04_confidence_-1.04.sdf  rank08_confidence_-1.46.sdf

diffdock-output/ligand1:
rank01_confidence_-0.25.sdf  rank05_confidence_-0.55.sdf  rank09_confidence_-0.72.sdf
rank02_confidence_-0.28.sdf  rank06_confidence_-0.55.sdf  rank10_confidence_-0.77.sdf
rank03_confidence_-0.34.sdf  rank07_confidence_-0.56.sdf
rank04_confidence_-0.49.sdf  rank08_confidence_-0.57.sdf
...
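With one sub-folder per ligand, a common follow-up is to pick the top-ranked pose of each ligand. The sketch below walks an output folder laid out like the listing above (one `ligandN` directory per molecule, files named `rankNN_confidence_X.sdf`) and returns the best-ranked file and its confidence for each ligand. The function name `best_pose_per_ligand` is hypothetical, and the folder layout is assumed from the example output rather than from a documented contract.

```python
import re
from pathlib import Path

# File name pattern assumed from the example listing.
POSE_NAME = re.compile(r"^rank(\d+)_confidence_(-?\d+(?:\.\d+)?)\.sdf$")


def best_pose_per_ligand(output_dir):
    """Map each ligand folder name to (file name, confidence) of its lowest rank number."""
    best = {}
    for ligand_dir in sorted(Path(output_dir).iterdir()):
        if not ligand_dir.is_dir():
            continue
        poses = []
        for pose_file in ligand_dir.iterdir():
            match = POSE_NAME.match(pose_file.name)
            if match:
                poses.append((int(match.group(1)), float(match.group(2)), pose_file.name))
        if poses:
            # Tuples sort by rank first, so min() picks the top-ranked pose.
            _, confidence, name = min(poses)
            best[ligand_dir.name] = (name, confidence)
    return best
```

Pass the folder that dump_output.py writes to as `output_dir`.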
Batch-Docking using SMILES
Besides the SDF format for ligand molecules, DiffDock also supports SMILES text strings as input. DiffDock uses RDKit to generate random molecular conformers from the SMILES information. To conduct batch-docking, use a plain text file as the ligand input, with one SMILES string per line, each representing one molecule.
Create a new blank file, name it ligands.txt, and copy the content below into it.
Cc1cc(F)c(NC(=O)NCCC(C)(C)C)cc1Nc1ccc2ncn(C)c(=O)c2c1F
COc1cccc(NC(=O)c2ccc(C)c(Nc3nc(-c4cccnc4)nc4c3cnn4C)c2)c1
Cc1nn(C)c(C)c1CCOc1cc(F)ccc1-c1ccc2n[nH]c(CN(C)C)c2c1
Cc1c(C(=O)c2cccc3ccccc23)c2cccc3c2n1[C@H](CN1CCOCC1)CO3
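Because each line of the file is treated as one molecule, stray blank lines or trailing whitespace are worth removing before submission. The sketch below reads the text of a SMILES file and returns the cleaned list of strings; the helper name `read_smiles_lines` is hypothetical. It only tidies the text and does not check chemical validity.

```python
def read_smiles_lines(text):
    """Return the non-blank lines of a SMILES file, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

The length of the returned list is the number of ligands submitted. For the four-line ligands.txt above, that would presumably correspond to four `ligandN` folders in the output, assuming the output follows the input order.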
Run the command below to invoke the DiffDock model. The script generates an input JSON file and saves the inference result, in JSON format, to the file output.json.
./diffdock.sh 8G43.pdb ligands.txt
Dump the result and check the output folder.
$ python3 dump_output.py
$ ls output/*
diffdock-output/ligand0:
rank01_confidence_-0.98.sdf  rank05_confidence_-1.30.sdf  rank09_confidence_-1.77.sdf
rank02_confidence_-1.00.sdf  rank06_confidence_-1.36.sdf  rank10_confidence_-2.27.sdf
rank03_confidence_-1.03.sdf  rank07_confidence_-1.58.sdf
rank04_confidence_-1.21.sdf  rank08_confidence_-1.61.sdf

diffdock-output/ligand1:
rank01_confidence_-0.15.sdf  rank05_confidence_-1.25.sdf  rank09_confidence_-1.55.sdf
rank02_confidence_-0.54.sdf  rank06_confidence_-1.29.sdf  rank10_confidence_-1.66.sdf
rank03_confidence_-0.91.sdf  rank07_confidence_-1.38.sdf
rank04_confidence_-1.03.sdf  rank08_confidence_-1.39.sdf
...