DiffDock NIM
Important
NVIDIA NIM is currently in limited availability. Sign up here to be notified when the latest NIMs are available to download.
DiffDock is a state-of-the-art generative model for drug discovery that predicts the three-dimensional structure of a protein-ligand complex, a crucial step in the drug discovery process. Specifically, it predicts how a small-molecule ligand binds to a protein, a task known as molecular docking or pose prediction.
DiffDock:
Supports AI drug discovery pipelines and opens new research avenues for downstream task integration.
Is highly accurate and computationally efficient.
Offers fast inference times and provides confidence estimates with high selective accuracy.
![_images/diffdock-example.png](_images/diffdock-example.png)
Example output.
Note
A more detailed description of the model can be found in the Model Card.
DiffDock is an equivariant geometric model for blind molecular docking pose estimation. It requires protein and molecule 3D structures as input and does not require any information about a binding pocket. During its diffusion process, the molecule’s position relative to the protein, its orientation, and the torsion angles can change. By running the learned reverse diffusion process, it transforms a distribution of noisy prior molecule poses to the one learned by the model. As a result, it outputs many sampled poses and ranks them via its confidence model. More information can be found at NGC collections.
Model Specific Requirements
The following are specific requirements for DiffDock NIM.
Important
If you have not done so already, refer to the NVIDIA NIM documentation for the necessary hardware, operating system, and software prerequisites.
Hardware
Supported GPU models: see the GPU Model column in the Available Models table.
Minimum GPU memory (GB): 16
Software
Minimum driver version: 535.104.05
Once the above requirements have been met, use the Quickstart Guide to pull the NIM container and model, perform a health check, and then run inference.
Quickstart Guide
Note
This page assumes Prerequisite Software (Docker, NGC CLI, NGC registry access) is installed and set up.
Pull the NIM container.
docker pull nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04
Pull the DiffDock model.
ngc registry model download-version "nvidia/nim/bionemo-diffdock:molecular-docking_noarchx1_fp32_24.03.04"
Run container.
docker run --rm -it --name diffdock \
  --runtime=nvidia -e CUDA_VISIBLE_DEVICES=0 \
  --shm-size=2G \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v $(pwd)/bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04:/nim_model \
  -e MODEL_PATH=/nim_model \
  -p 8008:8008 \
  nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04
Wait until the health check returns true before proceeding. This may take a couple of minutes.
curl localhost:8008/v2/health/ready
...
true
Important
Note the path that contains the downloaded model folder; you will use this same path in the upcoming step.
Open a new terminal, leaving the current terminal open with the launched service.
Note
The “sed” command is used to convert the multi-line text file into a single line for JSON encoding.
Prepare the JSON-formatted POST data. Run this step in a bash shell. To check whether the current session is bash, run echo $0; if it is not, run /bin/bash before continuing.
protein_bytes=`sed -z 's/\n/\\\n/g' bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/protein.pdb`; \
  ligand_bytes=`sed -z 's/\n/\\\n/g' bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/ligand.sdf`; \
  echo "{
  \"ligand\": \"${ligand_bytes}\",
  \"ligand_file_type\": \"sdf\",
  \"protein\": \"${protein_bytes}\",
  \"num_poses\": 1,
  \"time_divisions\": 20,
  \"steps\": 18,
  \"save_trajectory\": false,
  \"is_staged\": false
}" > diffdock.json
Run Inference and save to output.json.
curl --header "Content-Type: application/json" \
  --request POST \
  --data @diffdock.json \
  --output output.json \
  http://localhost:8008/molecular-docking/diffdock/generate
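The sed step above escapes only newlines, so a double quote or backslash inside either input file would produce invalid JSON. As a sketch of a more defensive alternative, the helper below lets jq do the escaping. It assumes jq 1.6 or newer is installed, which is not among the prerequisites listed on this page. The function name build_payload is hypothetical, and the field names and values are copied from the request above.

```shell
# Build the DiffDock request body with jq instead of sed.
# Assumption: jq 1.6+ is available (--rawfile reads a whole file into a string).
build_payload() {
  # $1: protein PDB file, $2: ligand SDF file; the JSON is written to stdout.
  jq -n \
    --rawfile protein "$1" \
    --rawfile ligand "$2" \
    '{
      ligand: $ligand,
      ligand_file_type: "sdf",
      protein: $protein,
      num_poses: 1,
      time_divisions: 20,
      steps: 18,
      save_trajectory: false,
      is_staged: false
    }'
}

# Usage:
#   build_payload protein.pdb ligand.sdf > diffdock.json
```

Because jq serializes the strings itself, the result is valid JSON regardless of which characters the structure files contain.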
Available Models
Version | GPU Model | Number of GPUs | Precision | Memory Footprint | File Size |
---|---|---|---|---|---|
molecular-docking_noarchx1_fp32_24.03.04 | Volta, Ampere, Ada or Hopper | 1 | FP32 | 16 GB | 2.52 GB |
Detailed Instructions
This section provides additional details beyond the scope of the Quickstart Guide.
Pull the NIM Container
Container image tags can be seen with the command below, similar to other container images on NGC.
ngc registry image info nvcr.io/nvidia/nim/bionemo_diffdock_nim
Image Repository Information
  Name: bionemo_diffdock_nim
  Display Name: bionemo_diffdock_nim
  Short Description: Diffdock predicts the 3D structure of the interaction between a molecule and a protein.
  Built By:
  Publisher: MIT
  Multinode Support: False
  Multi-Arch Support: False
  Logo: https://assets.ngc.nvidia.com/products/api-catalog/images/diffdock.jpg
  Labels: NVIDIA AI Enterprise Supported, NVIDIA NIM
  Public: No
  Last Updated: May 03, 2024
  Latest Image Size: 10.84 GB
  Signed Tag?: False
  Latest Tag: 24.03.04
  Tags:
      24.03.04
      ...
Pull the container image with either of the following commands.
docker pull nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04
ngc registry image pull nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04
Pull the DiffDock Model
Model tags follow the format repository:version. The model is called bionemo-diffdock, and the version follows the naming pattern <VARIANT>_<GPU_TYPE>x<NUM_GPUS>_<precision>_YY.MM.x. Additional versions are available and can be listed by running the following NGC CLI command:
ngc registry model list "nvidia/nim/bionemo-diffdock:*"
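To make the naming pattern concrete, here is a minimal sketch that splits a version tag into its fields with POSIX parameter expansion. The function name parse_model_version is hypothetical, and the sketch assumes the last three fields contain no underscores and that the GPU count follows the last "x" of its field, as in the tag used on this page.

```shell
# Split <VARIANT>_<GPU_TYPE>x<NUM_GPUS>_<precision>_YY.MM.x into its fields.
# Hypothetical helper; field boundaries are inferred from the pattern above.
parse_model_version() {
  tag=$1
  release=${tag##*_}          # text after the last underscore
  rest=${tag%_*}              # everything before the release
  precision=${rest##*_}
  rest=${rest%_*}
  gpu_part=${rest##*_}        # for example: noarchx1
  variant=${rest%_*}          # may itself contain hyphens or underscores
  num_gpus=${gpu_part##*x}    # text after the last "x"
  gpu_type=${gpu_part%x*}     # text before the last "x"
  echo "variant=$variant gpu_type=$gpu_type num_gpus=$num_gpus precision=$precision release=$release"
}

# Usage:
#   parse_model_version molecular-docking_noarchx1_fp32_24.03.04
#   prints: variant=molecular-docking gpu_type=noarch num_gpus=1 precision=fp32 release=24.03.04
```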
Download the model
ngc registry model download-version "nvidia/nim/bionemo-diffdock:molecular-docking_noarchx1_fp32_24.03.04"
After downloading the model, you should see the model folder named bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/ in your working directory. This folder contains the assets required for running the service.
Launch Microservice
Launch the container. Start-up may take a couple of minutes until the service is available.
Note
In this example, the inference endpoint and health check are hosted on port 8008. After you run the docker command below, open another terminal session on the same host and proceed to the next step.
docker run --rm -it --name diffdock \
  --runtime=nvidia -e CUDA_VISIBLE_DEVICES=0 \
  --shm-size=2G \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v $(pwd)/bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04:/nim_model \
  -e MODEL_PATH=/nim_model \
  -p 8008:8008 \
  nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04
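The command above mounts the model folder through $(pwd) because Docker expects an absolute host path for a bind mount; a bare name would be interpreted as a named volume. The sketch below is a hypothetical dry-run helper (diffdock_run_cmd is not part of the NIM tooling) that checks that the model folder exists, resolves it to an absolute path, and prints the command instead of running it. It assumes the service keeps listening on port 8008 inside the container, so only the host side of -p changes.

```shell
# Print the docker run command for a model folder and an optional host port.
# Hypothetical helper: it only prints the command, it does not start anything.
diffdock_run_cmd() {
  model_dir=$1
  host_port=${2:-8008}
  if [ ! -d "$model_dir" ]; then
    echo "model folder not found: $model_dir" >&2
    return 1
  fi
  # Resolve to an absolute path, as required for a bind mount with -v.
  abs_dir=$(cd "$model_dir" && pwd)
  # Each argument is printed on its own line followed by a line continuation.
  printf '%s \\\n' \
    "docker run --rm -it --name diffdock" \
    "  --runtime=nvidia -e CUDA_VISIBLE_DEVICES=0" \
    "  --shm-size=2G" \
    "  --ulimit memlock=-1" \
    "  --ulimit stack=67108864" \
    "  -v $abs_dir:/nim_model" \
    "  -e MODEL_PATH=/nim_model" \
    "  -p $host_port:8008"
  printf '  %s\n' "nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04"
}

# Usage:
#   diffdock_run_cmd bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04
```

Printing rather than executing keeps the helper safe to try on a machine without a GPU; review the printed command before running it.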
Health and Liveness Checks
The container exposes a health endpoint for integration into existing systems such as Kubernetes. The readiness endpoint returns an HTTP 200 OK status code only when the service is ready.
curl localhost:8008/v2/health/ready
...
true
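Since start-up can take a couple of minutes, it may be convenient to poll the readiness endpoint instead of retrying by hand. The following is a minimal sketch, not an official utility: wait_for_ready, its default of 60 attempts, and the 5-second delay are all illustrative choices. It relies on curl's --fail option, which makes curl exit with a non-zero status on HTTP error responses as well as on connection failures.

```shell
# Poll a readiness URL until it answers successfully or attempts run out.
# Hypothetical helper; the defaults (60 attempts, 5 s apart) are arbitrary.
wait_for_ready() {
  ready_url=$1
  max_attempts=${2:-60}
  delay_seconds=${3:-5}
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    # --fail: non-zero exit on HTTP errors; --max-time: do not hang on one probe.
    if curl --silent --fail --output /dev/null --max-time 5 "$ready_url"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
  echo "service not ready after $max_attempts attempts" >&2
  return 1
}

# Usage:
#   wait_for_ready http://localhost:8008/v2/health/ready && echo "service is ready"
```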
Run Inference
Open a new terminal and navigate to the path that contains the downloaded model folder.
Create a new blank file in the current folder (where the bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04 folder is located), name it diffdock.sh, and copy the content below into it.
#!/bin/bash

# Script: diffdock.sh
# Usage: ./diffdock.sh bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/protein.pdb bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/ligand.sdf
# Doc: http://localhost:8008/docs#/Default/molecular_docking

protein_file=$1
ligand_file=$2

# The next two commands convert each multi-line text file into a single line for JSON encoding.
# The file names are quoted so that paths containing spaces still work.
protein_bytes=`sed -z 's/\n/\\\n/g' "$protein_file"`
ligand_bytes=`sed -z 's/\n/\\\n/g' "$ligand_file"`

# Write the request body to diffdock.json.
echo "{
  \"ligand\": \"${ligand_bytes}\",
  \"ligand_file_type\": \"sdf\",
  \"protein\": \"${protein_bytes}\",
  \"num_poses\": 1,
  \"time_divisions\": 20,
  \"steps\": 18,
  \"save_trajectory\": false,
  \"is_staged\": false
}" > diffdock.json

# Send the request and save the response to output.json.
curl --header "Content-Type: application/json" \
  --request POST \
  --data @diffdock.json \
  --output output.json \
  http://localhost:8008/molecular-docking/diffdock/generate
Run the commands below to invoke the DiffDock model. The script generates an input JSON file and saves the inference result in JSON format to the file output.json.
chmod +x diffdock.sh
./diffdock.sh bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/protein.pdb bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/ligand.sdf
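The curl call in the script writes whatever the server returns to output.json, including an error body if the request is rejected. As a sketch of one way to detect that, the helper below captures the HTTP status code with curl's --write-out option. The name post_diffdock is hypothetical, and treating only status 200 as success is an assumption about this service rather than something this page states.

```shell
# POST a payload file and fail unless the server answers with HTTP 200.
# Hypothetical helper; "200 means success" is an assumption about the service.
post_diffdock() {
  payload_file=$1
  output_file=$2
  generate_url=${3:-http://localhost:8008/molecular-docking/diffdock/generate}
  # --write-out prints the status code to stdout; the body goes to --output.
  http_status=$(curl --silent \
    --header "Content-Type: application/json" \
    --request POST \
    --data @"$payload_file" \
    --output "$output_file" \
    --write-out '%{http_code}' \
    "$generate_url")
  if [ "$http_status" != "200" ]; then
    echo "request failed with HTTP status $http_status; see $output_file for the response body" >&2
    return 1
  fi
}

# Usage:
#   post_diffdock diffdock.json output.json
```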
The output file output.json contains JSON-formatted content with the predicted docking poses (coordinates of ligand atoms), structured as shown below:
{
"trajectory":"", // This is for intermediate poses during the diffusion sampling. It's empty here because we didn't save the trajectory.
"ligand_positions": // This is the most important part: the predicted docking pose(s) in SDF format.
["
7RWO_7WN_A_3001_rank1
RDKit 3D

35 37 0 0 0 0 0 0 0 0999 V2000
-2.2332 0.8777 -4.4707 N 0 0 0 0 0 0 0 0 0 0 0 0
-3.0333 -0.3266 -4.2170 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.3251 -0.4852 -2.7199 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.0866 -0.2877 -1.8852 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.0960 -0.6741 -0.5378 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.9775 -0.4557 0.2690 C 0 0 0 0 0 0 0 0 0 0 0 0
1.7812 0.5416 2.9476 C 0 0 0 0 0 0 0 0 0 0 0 0
1.8978 -2.1976 3.2004 C 0 0 0 0 0 0 0 0 0 0 0 0
1.6589 -1.6404 1.9373 C 0 0 0 0 0 0 0 0 0 0 0 0
1.5768 -0.2441 1.8142 C 0 0 0 0 0 0 0 0 0 0 0 0
0.1740 0.1524 -0.2611 C 0 0 0 0 0 0 0 0 0 0 0 0
0.1733 0.5644 -1.6014 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.9527 0.3624 -2.4140 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.8963 0.7499 -3.8739 C 0 0 0 0 0 0 0 0 0 0 0 0
2.3100 -1.9356 5.6138 C 0 0 0 0 0 0 0 0 0 0 0 0
1.3387 0.3920 0.5497 N 0 0 0 0 0 0 0 0 0 0 0 0
2.0085 -0.0501 4.1471 N 0 0 0 0 0 0 0 0 0 0 0 0
2.0591 -1.3779 4.2848 N 0 0 0 0 0 0 0 0 0 0 0 0
1.9693 -3.4511 3.3302 O 0 0 0 0 0 0 0 0 0 0 0 0
1.5365 -2.6902 0.5250 Cl 0 0 0 0 0 0 0 0 0 0 0 0
-2.7161 1.7061 -4.0482 H 0 0 0 0 0 0 0 0 0 0 0 0
-3.9937 -0.2503 -4.7704 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.5001 -1.2263 -4.5979 H 0 0 0 0 0 0 0 0 0 0 0 0
-4.0828 0.2681 -2.4134 H 0 0 0 0 0 0 0 0 0 0 0 0
-3.7483 -1.4972 -2.5394 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.9746 -1.1413 -0.1100 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.0179 -0.7548 1.3080 H 0 0 0 0 0 0 0 0 0 0 0 0
1.7451 1.6212 2.8812 H 0 0 0 0 0 0 0 0 0 0 0 0
1.0570 1.0349 -2.0157 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.3146 -0.0168 -4.4301 H 0 0 0 0 0 0 0 0 0 0 0 0
-0.3617 1.7187 -3.9839 H 0 0 0 0 0 0 0 0 0 0 0 0
2.3844 -1.1262 6.3720 H 0 0 0 0 0 0 0 0 0 0 0 0
1.4777 -2.6116 5.9042 H 0 0 0 0 0 0 0 0 0 0 0 0
3.2645 -2.5036 5.6106 H 0 0 0 0 0 0 0 0 0 0 0 0
1.9946 1.1533 0.2617 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0
2 3 1 0
3 4 1 0
4 5 1 0
5 6 2 0
8 9 1 0
7 10 1 0
9 10 2 0
6 11 1 0
11 12 2 0
4 13 2 0
12 13 1 0
1 14 1 0
13 14 1 0
11 16 1 0
10 16 1 0
7 17 2 0
17 18 1 0
8 18 1 0
15 18 1 0
8 19 2 0
M END
$$
"],

"position_confidence": [-0.6001075506210327], // The confidence estimation from the DiffDock confidence model.

"protein": "CRYST1 45.330 73.690 52.740 90.00 109.70 90.00 P 1 21 1 0..." // The remainder is the content of the input.
}
The ligand_positions field is a list of predicted poses, each of which is text in SDF format.
Note
In the real output, the JSON content is in a compact (minified) form rather than the formatted style shown above, and it does not contain the // comments, which are annotations added here for explanation.
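When more than one pose is requested, it can be handy to pull the top-scoring pose out of output.json as a standalone SDF file. The sketch below uses jq for this and rests on several assumptions that this page does not confirm: that jq is installed, that ligand_positions and position_confidence are parallel lists (one score per pose, in the same order), and that a higher score indicates a more confident pose. The function name best_pose is hypothetical.

```shell
# Print the SDF text of the pose with the highest confidence score.
# Assumptions: jq is installed; the two lists are parallel; higher is better.
best_pose() {
  # Pair each pose with its score, keep the pair with the largest score,
  # and print the pose as raw text (-r) so newlines are real line breaks.
  jq -r '[.ligand_positions, .position_confidence]
         | transpose
         | max_by(.[1])
         | .[0]' "$1"
}

# Usage:
#   best_pose output.json > best_pose.sdf
```

To inspect the scores on their own, `jq '.position_confidence' output.json` prints the list without the large protein and ligand strings.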
Stopping the Container
When you're done testing the endpoint, you can bring down the container by running docker stop diffdock in a new terminal.
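Because the container was started with --rm, Docker removes it automatically once it stops, so no separate clean-up step is needed. As a small sketch, the hypothetical helper below (stop_diffdock is not part of the NIM tooling) stops the container only if a container with that exact name is currently running, which avoids an error message when it has already exited.

```shell
# Stop the named container if it is running; otherwise report and do nothing.
# Hypothetical helper; the default name matches --name diffdock used above.
stop_diffdock() {
  container_name=${1:-diffdock}
  # docker ps lists running containers; grep -x requires an exact name match.
  if docker ps --format '{{.Names}}' | grep -qx "$container_name"; then
    docker stop "$container_name"
  else
    echo "container $container_name is not running" >&2
  fi
}

# Usage:
#   stop_diffdock
```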