Advanced Usage

Run Inference with Bash Script

In this example, we create a simple bash script to launch inference using two local files as input and dump the generated poses in the output folder.

  1. Create a new blank file in the same folder, name it diffdock.sh, and copy the content below into it.

#!/bin/bash

# Script: diffdock.sh - Run inference using local files as input
# Usage: ./diffdock.sh [receptor].pdb [ligand].sdf

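protein_file=$1
ligand_file=$2

# Keep only ATOM records from the PDB file and escape each newline as a literal \n
# so the content fits in a JSON string (sed -z requires GNU sed)
protein_bytes=$(grep -E '^ATOM' "$protein_file" | sed -z 's/\n/\\n/g')
ligand_bytes=$(sed -z 's/\n/\\n/g' "$ligand_file")
# Use the file extension (for example sdf or txt) as the ligand file type
ligand_format=$(basename "$ligand_file" | awk -F. '{print $NF}')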

echo "{
   \"ligand\": \"${ligand_bytes}\",
   \"ligand_file_type\": \"${ligand_format}\",
   \"protein\": \"${protein_bytes}\",
   \"num_poses\": 10,
   \"time_divisions\": 20,
   \"steps\": 18,
   \"save_trajectory\": false,
   \"is_staged\": false
}" > diffdock.json

curl --header "Content-Type: application/json" \
   --request POST \
   --data @diffdock.json \
   --output output.json \
   http://localhost:8000/molecular-docking/diffdock/generate

  2. Make the script executable.

chmod +x diffdock.sh

  3. Download the input files from the RCSB database and launch the inference.

curl -o 8G43.pdb https://files.rcsb.org/download/8G43.pdb
curl -o ZU6.sdf https://files.rcsb.org/ligands/download/ZU6_ideal.sdf
./diffdock.sh 8G43.pdb ZU6.sdf

  4. Dump the output using the Python script created in Getting Started.

python3 dump_output.py
ls output

  5. Example output:

rank01_confidence_0.57.sdf  rank06_confidence_-0.25.sdf
rank02_confidence_0.57.sdf  rank07_confidence_-0.69.sdf
rank03_confidence_0.55.sdf  rank08_confidence_-1.31.sdf
rank04_confidence_0.41.sdf  rank09_confidence_-1.90.sdf
rank05_confidence_0.38.sdf  rank10_confidence_-2.07.sdf
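
Because diffdock.sh passes its two arguments straight to grep and sed, a missing file or a PDB file without ATOM records only shows up later as a failed request. A minimal sketch of a pre-flight check is shown below. The function name check_inputs and its messages are hypothetical and not part of the NIM; it assumes only the behavior visible in the script above (two file arguments, ATOM records extracted from the protein file).

```shell
#!/bin/bash
# Hypothetical pre-flight check for diffdock.sh inputs (not part of the NIM).
# Returns 0 if both files look usable, 2 on wrong usage, 1 on a bad file.
check_inputs() {
    if [ "$#" -ne 2 ]; then
        echo "Usage: check_inputs [receptor].pdb [ligand-file]" >&2
        return 2
    fi
    local protein_file=$1
    local ligand_file=$2
    local f
    for f in "$protein_file" "$ligand_file"; do
        # -s is true only for an existing file with non-zero size
        if [ ! -s "$f" ]; then
            echo "Missing or empty input file: $f" >&2
            return 1
        fi
    done
    # diffdock.sh keeps only ATOM records, so at least one must be present
    if ! grep -q '^ATOM' "$protein_file"; then
        echo "No ATOM records found in: $protein_file" >&2
        return 1
    fi
    return 0
}
```

One possible use is to call it before launching the request, for example: check_inputs 8G43.pdb ZU6.sdf && ./diffdock.sh 8G43.pdb ZU6.sdf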

Run Inference for Batch-Docking

DiffDock NIM supports a batch-docking mode, which docks a group of ligand molecules against the same protein receptor in a single inference request when the request contains a multi-molecule SDF file. Batch-docking is much more efficient than running a separate inference request for each ligand. The example below illustrates batch-docking using one protein PDB file and five ligand SDF files downloaded from RCSB.

  1. Prepare the SDF input file with multiple ligand molecules. Create a new blank file, name it make-multiligand.sh, and copy the content below into it.

#!/bin/bash

# Script: make-multiligand.sh
# Usage: ./make-multiligand.sh [Ligand1_CCD_ID] [Ligand2_CCD_ID] ...
# Example: ./make-multiligand.sh COM Q4H QPK R4W SIN

ligand_files=""

for lig in $*
do
    ligand_file=${lig}.sdf
    echo "Download ligand file: ${ligand_file}"
    curl -o $ligand_file "https://files.rcsb.org/ligands/download/${lig}_ideal.sdf"
    ligand_files="${ligand_files} ${ligand_file}"
done

# Combine ligand files into a single SDF file
cat $ligand_files > multi_ligands.sdf

  2. Run the commands below to generate the multi_ligands.sdf input file.

chmod +x make-multiligand.sh
./make-multiligand.sh COM Q4H QPK R4W SIN

  3. Download the protein PDB file and launch the inference.

curl -o 7RWO.pdb "https://files.rcsb.org/download/7RWO.pdb"
./diffdock.sh 7RWO.pdb multi_ligands.sdf

  4. Dump the result. Example output is shown below.

python3 dump_output.py
ls output/*

diffdock-output/ligand0:
rank01_confidence_-0.74.sdf  rank05_confidence_-1.15.sdf  rank09_confidence_-1.55.sdf
rank02_confidence_-0.92.sdf  rank06_confidence_-1.25.sdf  rank10_confidence_-1.93.sdf
rank03_confidence_-0.93.sdf  rank07_confidence_-1.46.sdf
rank04_confidence_-1.04.sdf  rank08_confidence_-1.46.sdf

diffdock-output/ligand1:
rank01_confidence_-0.25.sdf  rank05_confidence_-0.55.sdf  rank09_confidence_-0.72.sdf
rank02_confidence_-0.28.sdf  rank06_confidence_-0.55.sdf  rank10_confidence_-0.77.sdf
rank03_confidence_-0.34.sdf  rank07_confidence_-0.56.sdf
rank04_confidence_-0.49.sdf  rank08_confidence_-0.57.sdf

...
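
Before submitting a combined file, it can help to confirm how many molecules it actually contains, since a failed download would silently shrink the batch. The sketch below relies on the SDF convention that each record ends with a line starting with "$$$$". The function name count_sdf_molecules is hypothetical and not part of the NIM.

```shell
#!/bin/bash
# Hypothetical helper: count the molecule records in an SDF file.
# Each record in an SDF file is terminated by a line starting with "$$$$".
count_sdf_molecules() {
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c '^\$\$\$\$' "$1"
}
```

For the five-ligand example above, count_sdf_molecules multi_ligands.sdf would be expected to print 5, assuming each downloaded file holds exactly one record.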

Batch-Docking using SMILES

Besides the SDF format for ligand molecules, DiffDock also supports SMILES text strings as input. DiffDock uses RDKit to generate random molecular conformers from the SMILES information. To conduct batch-docking, use a plain text file as the ligand input, with one SMILES string per line, each representing a molecule.

  1. Create a new blank file, name it ligands.txt, and copy the content below into it.

Cc1cc(F)c(NC(=O)NCCC(C)(C)C)cc1Nc1ccc2ncn(C)c(=O)c2c1F
COc1cccc(NC(=O)c2ccc(C)c(Nc3nc(-c4cccnc4)nc4c3cnn4C)c2)c1
Cc1nn(C)c(C)c1CCOc1cc(F)ccc1-c1ccc2n[nH]c(CN(C)C)c2c1
Cc1c(C(=O)c2cccc3ccccc23)c2cccc3c2n1[C@H](CN1CCOCC1)CO3

  2. Run the command below to invoke the DiffDock model. The script generates an input JSON file and writes the inference result, in JSON format, to the file output.json.

./diffdock.sh 8G43.pdb ligands.txt

  3. Dump the result and check the output folder.

python3 dump_output.py
ls output/*

diffdock-output/ligand0:
rank01_confidence_-0.98.sdf  rank05_confidence_-1.30.sdf  rank09_confidence_-1.77.sdf
rank02_confidence_-1.00.sdf  rank06_confidence_-1.36.sdf  rank10_confidence_-2.27.sdf
rank03_confidence_-1.03.sdf  rank07_confidence_-1.58.sdf
rank04_confidence_-1.21.sdf  rank08_confidence_-1.61.sdf

diffdock-output/ligand1:
rank01_confidence_-0.15.sdf  rank05_confidence_-1.25.sdf  rank09_confidence_-1.55.sdf
rank02_confidence_-0.54.sdf  rank06_confidence_-1.29.sdf  rank10_confidence_-1.66.sdf
rank03_confidence_-0.91.sdf  rank07_confidence_-1.38.sdf
rank04_confidence_-1.03.sdf  rank08_confidence_-1.39.sdf

...
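
With several ligands, scanning the listings by eye becomes tedious. The sketch below prints the top-ranked confidence for each ligand folder by parsing the file names. It assumes the layout and naming shown in the listings above (one ligandN folder per molecule, with a rank01_confidence_<score>.sdf file in each). The function name top_confidences is hypothetical, and the output folder is passed as an argument rather than hard-coded.

```shell
#!/bin/bash
# Hypothetical helper: print "<ligand folder> <confidence>" for the
# top-ranked pose of every ligand.
# Assumes the layout <dir>/ligandN/rank01_confidence_<score>.sdf
top_confidences() {
    local out_dir=$1
    local pose name score ligand
    for pose in "$out_dir"/ligand*/rank01_confidence_*.sdf; do
        # Skip the unexpanded pattern when nothing matches
        [ -e "$pose" ] || continue
        name=$(basename "$pose" .sdf)       # e.g. rank01_confidence_-0.98
        score=${name#rank01_confidence_}    # e.g. -0.98
        ligand=$(basename "$(dirname "$pose")")
        printf '%s %s\n' "$ligand" "$score"
    done
}
```

Called with the folder that contains the ligand0, ligand1, ... subfolders, it prints one line per ligand, which makes it easier to compare molecules from the same batch.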