Advanced Usage
In this example we create a simple bash
script to launch inference using two local files as input and dump the generated poses in the output
folder.
Create a new blank file in the same folder, name it diffdock.sh, and copy the content below into it.
#!/bin/bash
# Script: diffdock.sh - Run inference using local files as input
# Usage: ./diffdock.sh [receptor].pdb [ligand].sdf
protein_file=$1
ligand_file=$2
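# Keep only ATOM records and turn real newlines into literal \n for JSON
protein_bytes=`grep -E '^ATOM' "$protein_file" | sed -z 's/\n/\\\n/g'`
ligand_bytes=`sed -z 's/\n/\\\n/g' "$ligand_file"`
# The file extension (sdf, txt, ...) is sent as the ligand file type
ligand_format=`basename "$ligand_file" | awk -F. '{print $NF}'`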
echo "{
\"ligand\": \"${ligand_bytes}\",
\"ligand_file_type\": \"${ligand_format}\",
\"protein\": \"${protein_bytes}\",
\"num_poses\": 10,
\"time_divisions\": 20,
\"steps\": 18,
\"save_trajectory\": false,
\"is_staged\": false
}" > diffdock.json
curl --header "Content-Type: application/json" \
--request POST \
--data @diffdock.json \
--output output.json \
http://localhost:8000/molecular-docking/diffdock/generate
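The two `sed -z` lines are the least obvious part of the script: they replace every real newline with the two characters `\n` so that the file contents fit inside a JSON string. The sketch below isolates that step, assuming GNU sed (the `-z` option is a GNU extension). The helper name `escape_newlines` is hypothetical and not part of the NIM. Inside a function body with single quotes, two backslashes are enough, whereas the backtick form used in the script needs three because backticks consume one level of escaping.

```shell
#!/bin/bash
# Hypothetical helper: turn real newlines into the two characters '\' and 'n'.
# Assumes GNU sed; -z makes sed read the whole input as a single record,
# so \n in the pattern can match the embedded newlines.
escape_newlines() {
    sed -z 's/\n/\\n/g' "$@"
}

# Example: printf 'line1\nline2\n' | escape_newlines
```

Only newlines are handled here. Double quotes or backslashes in the input would still need JSON escaping, which matters for SMILES strings that use `\` as a bond-direction marker.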
Make the script executable.
chmod +x diffdock.sh
Download the input files from the RCSB database and launch the inference.
curl -o 8G43.pdb https://files.rcsb.org/download/8G43.pdb
curl -o ZU6.sdf https://files.rcsb.org/ligands/download/ZU6_ideal.sdf
./diffdock.sh 8G43.pdb ZU6.sdf
Dump the output using the Python script created in Getting Started.
python3 dump_output.py
ls output
Example of output
rank01_confidence_0.57.sdf rank06_confidence_-0.25.sdf
rank02_confidence_0.57.sdf rank07_confidence_-0.69.sdf
rank03_confidence_0.55.sdf rank08_confidence_-1.31.sdf
rank04_confidence_0.41.sdf rank09_confidence_-1.90.sdf
rank05_confidence_0.38.sdf rank10_confidence_-2.07.sdf
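Because the confidence score is part of each file name, poses can be filtered without opening the files. The following is a small sketch, assuming the `rankNN_confidence_X.sdf` naming shown in the listing above. The function name `poses_above` and any threshold passed to it are illustrative only.

```shell
#!/bin/bash
# Hypothetical helper: print the pose files in a directory whose confidence
# (parsed from the file name) is greater than or equal to a threshold.
poses_above() {
    local dir=$1 threshold=$2 f name score
    for f in "$dir"/rank*_confidence_*.sdf; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=${f##*/}                    # strip the directory part
        score=${name#*_confidence_}      # drop "rankNN_confidence_"
        score=${score%.sdf}              # drop the extension
        # awk performs the floating-point comparison
        if awk -v s="$score" -v t="$threshold" 'BEGIN { exit !(s >= t) }'; then
            echo "$name $score"
        fi
    done
}

# Example: poses_above output 0.4
```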
DiffDock NIM provides a batch-docking mode, which docks a group of ligand molecules against the same protein receptor in a single inference request when a multi-molecule SDF file is submitted. This is much more efficient than running multiple inference requests one by one. The example below runs batch docking with one protein PDB file and five ligand SDF files downloaded from RCSB.
Prepare the SDF input file with multiple ligand molecules. Create a new blank file, name it make-multiligand.sh, and copy the content below into it.
#!/bin/bash
# Script: make-multiligand.sh
# Usage: ./make-multiligand.sh [Ligand1_CCD_ID] [Ligand2_CCD_ID] ...
# Example: ./make-multiligand.sh COM Q4H QPK R4W SIN
ligand_files=""
for lig in "$@"
do
    ligand_file=${lig}.sdf
    echo "Download ligand file: ${ligand_file}"
    curl -o "$ligand_file" "https://files.rcsb.org/ligands/download/${lig}_ideal.sdf"
    # Separate the file names with a space so cat sees each one
    ligand_files="${ligand_files} ${ligand_file}"
done
# Combine ligand files into a single SDF file
# ($ligand_files is intentionally unquoted so it splits into separate names)
cat $ligand_files > multi_ligands.sdf
Run the commands below to generate the multi_ligands.sdf input file.
chmod +x make-multiligand.sh
./make-multiligand.sh COM Q4H QPK R4W SIN
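Each record in an SDF file ends with a line containing `$$$$`, so counting those lines gives a quick check that the combined file holds the expected number of molecules (five in this example). This is a minimal sketch, and the function name `count_molecules` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count SDF records by their "$$$$" terminator lines.
count_molecules() {
    grep -c '^\$\$\$\$' "$1"
}

# Example: count_molecules multi_ligands.sdf
```

If the count is lower than the number of CCD IDs passed to the script, one of the downloads most likely failed and should be checked before launching the inference.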
Download the protein PDB file and launch the inference.
curl -o 7RWO.pdb "https://files.rcsb.org/download/7RWO.pdb"
./diffdock.sh 7RWO.pdb multi_ligands.sdf
Dump the result. Example output is shown below.
python3 dump_output.py
ls output/*
diffdock-output/ligand0:
rank01_confidence_-0.74.sdf rank05_confidence_-1.15.sdf rank09_confidence_-1.55.sdf
rank02_confidence_-0.92.sdf rank06_confidence_-1.25.sdf rank10_confidence_-1.93.sdf
rank03_confidence_-0.93.sdf rank07_confidence_-1.46.sdf
rank04_confidence_-1.04.sdf rank08_confidence_-1.46.sdf
diffdock-output/ligand1:
rank01_confidence_-0.25.sdf rank05_confidence_-0.55.sdf rank09_confidence_-0.72.sdf
rank02_confidence_-0.28.sdf rank06_confidence_-0.55.sdf rank10_confidence_-0.77.sdf
rank03_confidence_-0.34.sdf rank07_confidence_-0.56.sdf
rank04_confidence_-0.49.sdf rank08_confidence_-0.57.sdf
...
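For a batch run it is often enough to look at the top-ranked pose of each ligand. The sketch below assumes one `ligandN` sub-directory per input molecule and a `rank01_` prefix for the best pose, as in the listing above. The name `top_poses` is hypothetical, and the directory to pass is whichever one the dump script writes to.

```shell
#!/bin/bash
# Hypothetical helper: print the rank01 pose of every ligand sub-directory.
top_poses() {
    local root=$1 d f
    for d in "$root"/ligand*/; do
        [ -d "$d" ] || continue
        for f in "$d"rank01_confidence_*.sdf; do
            [ -e "$f" ] || continue      # no rank01 file in this directory
            echo "$(basename "$d") $(basename "$f")"
        done
    done
}

# Example: top_poses output
```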
Besides the SDF format for ligand molecules, DiffDock also supports SMILES text strings as input. DiffDock uses RDKit to generate random molecular conformers from the SMILES information. To conduct batch docking, use a plain text file as the ligand input, with one SMILES string per line, each representing a molecule.
Create a new blank file, name it ligands.txt, and copy the content below into it.
Cc1cc(F)c(NC(=O)NCCC(C)(C)C)cc1Nc1ccc2ncn(C)c(=O)c2c1F
COc1cccc(NC(=O)c2ccc(C)c(Nc3nc(-c4cccnc4)nc4c3cnn4C)c2)c1
Cc1nn(C)c(C)c1CCOc1cc(F)ccc1-c1ccc2n[nH]c(CN(C)C)c2c1
Cc1c(C(=O)c2cccc3ccccc23)c2cccc3c2n1[C@H](CN1CCOCC1)CO3
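Since diffdock.sh places the file contents directly inside a JSON string, a quick pre-flight check on the SMILES file can save a failed request. The sketch below counts the non-empty lines (one ligand per line) and flags backslashes or double quotes, which the script does not escape. The function name `check_smiles_file` is hypothetical and the check is an assumption about what is useful here, not part of the NIM.

```shell
#!/bin/bash
# Hypothetical pre-flight check for a SMILES input file.
# Prints the number of non-empty lines and returns non-zero if any line
# holds a backslash or double quote that would break the JSON payload.
check_smiles_file() {
    local file=$1 count
    count=$(grep -c '[^[:space:]]' "$file")
    echo "ligands: $count"
    if grep -q '[\\"]' "$file"; then
        echo "warning: backslash or double quote found; JSON escaping needed" >&2
        return 1
    fi
    return 0
}

# Example: check_smiles_file ligands.txt
```

The four SMILES strings above contain neither character, so they should pass this check unchanged.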
Run the command below to invoke the DiffDock model. The script generates an input JSON file and saves the inference result, in JSON format, to the file output.json.
./diffdock.sh 8G43.pdb ligands.txt
Dump the result and check the output folder.
python3 dump_output.py
ls output/*
diffdock-output/ligand0:
rank01_confidence_-0.98.sdf rank05_confidence_-1.30.sdf rank09_confidence_-1.77.sdf
rank02_confidence_-1.00.sdf rank06_confidence_-1.36.sdf rank10_confidence_-2.27.sdf
rank03_confidence_-1.03.sdf rank07_confidence_-1.58.sdf
rank04_confidence_-1.21.sdf rank08_confidence_-1.61.sdf
diffdock-output/ligand1:
rank01_confidence_-0.15.sdf rank05_confidence_-1.25.sdf rank09_confidence_-1.55.sdf
rank02_confidence_-0.54.sdf rank06_confidence_-1.29.sdf rank10_confidence_-1.66.sdf
rank03_confidence_-0.91.sdf rank07_confidence_-1.38.sdf
rank04_confidence_-1.03.sdf rank08_confidence_-1.39.sdf
...