Advanced Usage

Run Inference with Bash Script

In this example, we create a simple bash script to launch inference using two local files as input and dump the generated poses in the output folder.

  1. Create a new blank file in the same folder, name it diffdock.sh, and copy the content below into it.

#!/bin/bash

# Script: diffdock.sh - Run inference using local files as input
# Usage: ./diffdock.sh [receptor].pdb [ligand].sdf

protein_file=$1
ligand_file=$2

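# Keep only the ATOM records and escape newlines as literal "\n" for JSON
protein_bytes=$(grep -E '^ATOM' "$protein_file" | sed -z 's/\n/\\n/g')
# Escape newlines in the ligand file the same way
ligand_bytes=$(sed -z 's/\n/\\n/g' "$ligand_file")
# Use the file extension (for example, sdf or txt) as the ligand file type
ligand_format=$(basename "$ligand_file" | awk -F. '{print $NF}')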

echo "{
   \"ligand\": \"${ligand_bytes}\",
   \"ligand_file_type\": \"${ligand_format}\",
   \"protein\": \"${protein_bytes}\",
   \"num_poses\": 10,
   \"time_divisions\": 20,
   \"steps\": 18,
   \"save_trajectory\": false,
   \"is_staged\": false
}" > diffdock.json

curl --header "Content-Type: application/json" \
   --request POST \
   --data @diffdock.json \
   --output output.json \
   http://localhost:8000/molecular-docking/diffdock/generate
  2. Make the script executable.

chmod +x diffdock.sh
  3. Download the input files from the RCSB database and launch the inference.

curl -o 8G43.pdb https://files.rcsb.org/download/8G43.pdb
curl -o ZU6.sdf https://files.rcsb.org/ligands/download/ZU6_ideal.sdf
./diffdock.sh 8G43.pdb ZU6.sdf
  4. Dump the output using the Python script created in Getting Started.

python3 dump_output.py
ls output
  5. Example output:

rank01_confidence_0.57.sdf  rank06_confidence_-0.25.sdf
rank02_confidence_0.57.sdf  rank07_confidence_-0.69.sdf
rank03_confidence_0.55.sdf  rank08_confidence_-1.31.sdf
rank04_confidence_0.41.sdf  rank09_confidence_-1.90.sdf
rank05_confidence_0.38.sdf  rank10_confidence_-2.07.sdf
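
The request body written by diffdock.sh can also be assembled with a JSON library instead of string substitution, which avoids hand-escaping newlines and quotes. The following is a minimal Python sketch rather than part of the NIM: the function name build_payload is hypothetical, and the field names and default values are copied from diffdock.sh above.

```python
import json


def build_payload(protein_pdb, ligand_text, ligand_file_type, num_poses=10):
    """Build the same request body that diffdock.sh writes to diffdock.json.

    build_payload is a hypothetical helper; the keys mirror the shell script.
    """
    # Keep only ATOM records, as the grep in diffdock.sh does.
    atom_lines = [ln for ln in protein_pdb.splitlines() if ln.startswith("ATOM")]
    return {
        "ligand": ligand_text,
        "ligand_file_type": ligand_file_type,
        "protein": "\n".join(atom_lines) + "\n",
        "num_poses": num_poses,
        "time_divisions": 20,
        "steps": 18,
        "save_trajectory": False,
        "is_staged": False,
    }


if __name__ == "__main__":
    # Tiny inline inputs for illustration only; a real run reads the PDB and ligand files.
    pdb = "HEADER  demo\nATOM      1  N   MET A   1\nHETATM    2  O   HOH A   2\n"
    payload = build_payload(pdb, "CCO", "txt")
    # json.dumps escapes newlines and quotes, so no sed step is needed.
    print(json.dumps(payload, indent=3))
```

Writing the printed JSON to diffdock.json lets the curl command from diffdock.sh be reused unchanged.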

Run Inference for Batch-Docking

DiffDock NIM supports a batch-docking mode, which docks a group of ligand molecules against the same protein receptor in a single inference request when a multi-molecule SDF file is submitted. Batch-docking mode is much more efficient than running separate inference requests. The example below illustrates batch-docking using one protein PDB file and five ligand SDF files downloaded from RCSB.

  1. Prepare the SDF input file with multiple ligand molecules. Create a new blank file, name it make-multiligand.sh, and copy the content below into it.

#!/bin/bash

# Script: make-multiligand.sh
# Usage: ./make-multiligand.sh [Ligand1_CCD_ID] [Ligand2_CCD_ID] ...
# Example: ./make-multiligand.sh COM Q4H QPK R4W SIN

ligand_files=""

for lig in "$@"
do
    ligand_file=${lig}.sdf
    echo "Download ligand file: ${ligand_file}"
    curl -o "$ligand_file" "https://files.rcsb.org/ligands/download/${lig}_ideal.sdf"
    ligand_files="${ligand_files} ${ligand_file}"
done

# Combine ligand files into a single SDF file
cat $ligand_files > multi_ligands.sdf
  2. Run the commands below to generate the multi_ligands.sdf input file.

chmod +x make-multiligand.sh
./make-multiligand.sh COM Q4H QPK R4W SIN
  3. Download the protein PDB file and launch the inference.

curl -o 7RWO.pdb "https://files.rcsb.org/download/7RWO.pdb"
./diffdock.sh 7RWO.pdb multi_ligands.sdf
  4. Dump the result. An example of the output is shown below.

python3 dump_output.py
ls output/*

diffdock-output/ligand0:
rank01_confidence_-0.74.sdf  rank05_confidence_-1.15.sdf  rank09_confidence_-1.55.sdf
rank02_confidence_-0.92.sdf  rank06_confidence_-1.25.sdf  rank10_confidence_-1.93.sdf
rank03_confidence_-0.93.sdf  rank07_confidence_-1.46.sdf
rank04_confidence_-1.04.sdf  rank08_confidence_-1.46.sdf

diffdock-output/ligand1:
rank01_confidence_-0.25.sdf  rank05_confidence_-0.55.sdf  rank09_confidence_-0.72.sdf
rank02_confidence_-0.28.sdf  rank06_confidence_-0.55.sdf  rank10_confidence_-0.77.sdf
rank03_confidence_-0.34.sdf  rank07_confidence_-0.56.sdf
rank04_confidence_-0.49.sdf  rank08_confidence_-0.57.sdf

...
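
Concatenating files with cat relies on every downloaded SDF file already ending with the `$$$$` record terminator and a newline. As a hedged illustration, the Python sketch below joins SDF texts while adding a missing terminator, and counts how many molecules the result holds. The helper names combine_sdf and count_molecules are hypothetical and not part of DiffDock.

```python
def combine_sdf(sdf_texts):
    """Join SDF texts into one multi-molecule SDF string (hypothetical helper)."""
    parts = []
    for text in sdf_texts:
        if not text.strip():
            continue  # skip empty inputs
        # Strip trailing whitespace only; the first header line may be blank.
        body = text.rstrip()
        # Each SDF record ends with a line containing only "$$$$".
        if body.splitlines()[-1].strip() != "$$$$":
            body += "\n$$$$"
        parts.append(body + "\n")
    return "".join(parts)


def count_molecules(sdf_text):
    """Count records by counting "$$$$" terminator lines."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


if __name__ == "__main__":
    # Two toy records for illustration; the second lacks its terminator on purpose.
    mol_a = "mol_a\n  demo\n\nM  END\n$$$$\n"
    mol_b = "mol_b\n  demo\n\nM  END"
    merged = combine_sdf([mol_a, mol_b])
    print(count_molecules(merged))  # prints 2
```

Checking the molecule count before submitting a request is a quick way to confirm that the number of ligand folders in the output matches the number of ligands sent.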

Batch-Docking using SMILES

Besides the SDF format for ligand molecules, DiffDock also supports SMILES text strings as input. DiffDock uses RDKit to generate random molecular conformers from the SMILES information. To conduct batch-docking, use a plain text file as the ligand input, with one SMILES string per line, each representing one molecule.

  1. Create a new blank file, name it ligands.txt, and copy the content below into it.

Cc1cc(F)c(NC(=O)NCCC(C)(C)C)cc1Nc1ccc2ncn(C)c(=O)c2c1F
COc1cccc(NC(=O)c2ccc(C)c(Nc3nc(-c4cccnc4)nc4c3cnn4C)c2)c1
Cc1nn(C)c(C)c1CCOc1cc(F)ccc1-c1ccc2n[nH]c(CN(C)C)c2c1
Cc1c(C(=O)c2cccc3ccccc23)c2cccc3c2n1[C@H](CN1CCOCC1)CO3
  2. Run the command below to invoke the DiffDock model. The script generates an input JSON file and saves the inference result in JSON format to the file output.json.

./diffdock.sh 8G43.pdb ligands.txt
  3. Dump the result and check the output folder.

python3 dump_output.py
ls output/*

diffdock-output/ligand0:
rank01_confidence_-0.98.sdf  rank05_confidence_-1.30.sdf  rank09_confidence_-1.77.sdf
rank02_confidence_-1.00.sdf  rank06_confidence_-1.36.sdf  rank10_confidence_-2.27.sdf
rank03_confidence_-1.03.sdf  rank07_confidence_-1.58.sdf
rank04_confidence_-1.21.sdf  rank08_confidence_-1.61.sdf

diffdock-output/ligand1:
rank01_confidence_-0.15.sdf  rank05_confidence_-1.25.sdf  rank09_confidence_-1.55.sdf
rank02_confidence_-0.54.sdf  rank06_confidence_-1.29.sdf  rank10_confidence_-1.66.sdf
rank03_confidence_-0.91.sdf  rank07_confidence_-1.38.sdf
rank04_confidence_-1.03.sdf  rank08_confidence_-1.39.sdf

...
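
The file names in the listings above encode each pose's rank and confidence score. As a convenience, the following Python sketch parses such names and selects the top-ranked pose. The rankNN_confidence_X.sdf pattern is inferred only from the example listings on this page, and parse_pose_name and best_pose are hypothetical helpers.

```python
import re

# Pattern inferred from the example listings, e.g. rank06_confidence_-0.25.sdf
POSE_RE = re.compile(r"^rank(\d+)_confidence_(-?\d+(?:\.\d+)?)\.sdf$")


def parse_pose_name(name):
    """Return (rank, confidence) for a pose file name, or None if it does not match."""
    match = POSE_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def best_pose(names):
    """Return the file name with the lowest rank number, or None if none match."""
    parsed = [(parse_pose_name(n), n) for n in names]
    parsed = [(info, n) for info, n in parsed if info is not None]
    if not parsed:
        return None
    return min(parsed, key=lambda item: item[0][0])[1]


if __name__ == "__main__":
    names = [
        "rank02_confidence_-0.54.sdf",
        "rank01_confidence_-0.15.sdf",
        "rank10_confidence_-1.66.sdf",
    ]
    print(best_pose(names))  # prints rank01_confidence_-0.15.sdf
```

Applying best_pose to the file names in each ligand folder gives one top-ranked pose per ligand.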