DiffDock NIM

Important

NVIDIA NIM is currently in limited availability. Sign up here to be notified when the latest NIMs are available to download.

DiffDock is a state-of-the-art generative model for drug discovery that predicts the three-dimensional structure of a protein-ligand complex, a crucial step in the drug discovery process. Given a protein and a small-molecule ligand, it predicts how the ligand binds to the protein, a task known as molecular docking or pose prediction.

DiffDock offers the following benefits:

- Supports AI drug discovery pipelines and opens new research avenues for downstream task integration.
- Delivers high accuracy and computational efficiency.
- Provides fast inference times and confidence estimates with high selective accuracy.

Figure: Example output (_images/diffdock-example.png)

Note

A more detailed description of the model can be found in the Model Card.

DiffDock is an equivariant geometric model for blind molecular docking pose estimation. It requires 3D structures of the protein and the molecule as input and does not require any information about a binding pocket. During the diffusion process, the molecule’s position relative to the protein, its orientation, and its torsion angles can change. By running the learned reverse diffusion process, DiffDock transforms a distribution of noisy prior molecule poses into the distribution learned by the model. As a result, it outputs many sampled poses and ranks them using its confidence model. More information can be found at NGC collections.

Model Specific Requirements

The following are specific requirements for DiffDock NIM.

Important

If you have not done so already, refer to the NVIDIA NIM documentation for the necessary hardware, operating system, and software prerequisites.

Hardware

Supported GPU models:
- Hopper GPUs (H100)
- Ampere GPUs (e.g., A100 and A6000, details can be found here)
- Ada GPUs (e.g., L40S, details can be found here)
- Volta GPUs (e.g., V100, details can be found here)
Minimum GPU memory (GB): 16

Software

Minimum driver version: 535.104.05

Once the above requirements have been met, use the Quickstart Guide to pull the NIM container and model, perform a health check, and then run inference.

Quickstart Guide

Note

This page assumes Prerequisite Software (Docker, NGC CLI, NGC registry access) is installed and set up.

Pull the NIM container.

docker pull nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04

Pull the DiffDock model.

ngc registry model download-version "nvidia/nim/bionemo-diffdock:molecular-docking_noarchx1_fp32_24.03.04"

Run container.

docker run --rm -it --name diffdock \
   --runtime=nvidia -e CUDA_VISIBLE_DEVICES=0 \
   --shm-size=2G \
   --ulimit memlock=-1 \
   --ulimit stack=67108864 \
   -v $(pwd)/bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04:/nim_model \
   -e MODEL_PATH=/nim_model \
   -p 8008:8008 \
   nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04

Wait until the health check returns true before proceeding. This may take a couple of minutes.

curl localhost:8008/v2/health/ready
...
true

Important

Note the current working directory. You will use this same path in the upcoming step.

Note

Open a new terminal, leaving the current terminal open with the launched service.

Note

The “sed” command replaces each newline with a literal \n, which converts the multi-line text file into a single line for JSON encoding.

Prepare the JSON-formatted POST data. This step must be run in a bash shell. To check whether the current session is bash, run echo $0. If it is not, run /bin/bash before continuing.

protein_bytes=`sed -z 's/\n/\\\n/g' bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/protein.pdb`; \
   ligand_bytes=`sed -z 's/\n/\\\n/g'  bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/ligand.sdf`; \
   echo "{
      \"ligand\": \"${ligand_bytes}\",
      \"ligand_file_type\": \"sdf\",
      \"protein\": \"${protein_bytes}\",
      \"num_poses\": 1,
      \"time_divisions\": 20,
      \"steps\": 18,
      \"save_trajectory\": false,
      \"is_staged\": false
   }" > diffdock.json
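To see the newline-escaping step in isolation, the following sketch applies the same substitution to a small throwaway file. The file contents are made up for illustration, and GNU sed is assumed because of the -z option.

```shell
#!/bin/bash
# Sketch: show what the newline-escaping step does on a small throwaway file.
# Requires GNU sed (-z treats the whole file as a single record).
sample=$(mktemp)
printf 'HEADER\nATOM 1\nEND\n' > "$sample"

# $(...) is used here instead of backticks, so the replacement is written
# as \\n (an escaped backslash followed by n). Inside backticks the shell
# consumes one level of backslashes first, hence the \\\n form used above.
escaped=$(sed -z 's/\n/\\n/g' "$sample")

printf '%s\n' "$escaped"   # prints HEADER\nATOM 1\nEND\n on a single line
rm -f "$sample"
```

The escaped string can be placed between double quotes in a JSON document, provided the input contains no double quotes or backslashes of its own.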

Run inference and save the result to output.json.

curl --header "Content-Type: application/json" \
    --request POST \
    --data @diffdock.json \
    --output output.json \
    http://localhost:8008/molecular-docking/diffdock/generate

Available Models

Version	GPU Model	Number of GPUs	Precision	Memory Footprint	File Size
molecular-docking_noarchx1_fp32_24.03.04	Volta, Ampere, Ada or Hopper	1	FP32	16 GB	2.52 GB

Detailed Instructions

This section provides additional details beyond the scope of the Quickstart Guide.

Pull the NIM Container

Container image tags can be seen with the command below, similar to other container images on NGC.

ngc registry image info nvcr.io/nvidia/nim/bionemo_diffdock_nim

Image Repository Information
   Name: bionemo_diffdock_nim
   Display Name: bionemo_diffdock_nim
   Short Description: Diffdock predicts the 3D structure of the interaction between a molecule and a protein.
   Built By:
   Publisher: MIT
   Multinode Support: False
   Multi-Arch Support: False
   Logo: https://assets.ngc.nvidia.com/products/api-catalog/images/diffdock.jpg
   Labels: NVIDIA AI Enterprise  Supported, NVIDIA NIM
   Public: No
   Last Updated: May 03, 2024
   Latest Image Size: 10.84 GB
   Signed Tag?: False
   Latest Tag: 24.03.04
   Tags:
       24.03.04
       ...

Pull the container image

Docker

docker pull nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04

NGC

ngc registry image pull nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04

Pull the DiffDock Model

Model tags follow the repository:version convention. The model is called bionemo-diffdock, and the version follows the naming pattern <VARIANT>_<GPU_TYPE>x<NUM_GPUS>_<PRECISION>_YY.MM.x. To list the additional versions that are available, run the following NGC CLI command:

ngc registry model list "nvidia/nim/bionemo-diffdock:*"

Download the model

ngc registry model download-version "nvidia/nim/bionemo-diffdock:molecular-docking_noarchx1_fp32_24.03.04"

After downloading the model, you should see the model folder named bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/ in your working directory. This folder contains the assets required for running the service.
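As an illustration of the naming pattern, the sketch below splits the version tag used on this page into its fields with shell parameter expansion and derives the download folder name. The variable names are illustrative, and the folder name rule (model name, then _v, then the version) is inferred from the single example above.

```shell
#!/bin/bash
# Sketch: split a model version tag into the fields of the naming pattern
# <VARIANT>_<GPU_TYPE>x<NUM_GPUS>_<PRECISION>_YY.MM.x.
tag="molecular-docking_noarchx1_fp32_24.03.04"

release=${tag##*_}        # text after the last underscore
rest=${tag%_*}            # everything before the last underscore
precision=${rest##*_}
rest=${rest%_*}
gpu_spec=${rest##*_}      # <GPU_TYPE>x<NUM_GPUS>
variant=${rest%_*}
gpu_type=${gpu_spec%x*}   # part before the final "x"
num_gpus=${gpu_spec##*x}  # part after the final "x"

# Assumption: the download folder is named <model>_v<version>.
model_dir="bionemo-diffdock_v${tag}"

printf '%s\n' "variant=$variant gpu_type=$gpu_type num_gpus=$num_gpus"
printf '%s\n' "precision=$precision release=$release"
printf '%s\n' "model_dir=$model_dir"
```

Deriving model_dir from the tag keeps the volume mount in the docker run command in sync with the version that was downloaded.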

Launch Microservice

Launch the container. Start-up may take a couple of minutes until the service is available.

Note

In this example, the inference endpoint and the health check are hosted on port 8008. After you run the Docker command below, you may open another terminal session on the same host and proceed to the next step.

docker run --rm -it --name diffdock \
--runtime=nvidia -e CUDA_VISIBLE_DEVICES=0 \
--shm-size=2G \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-v $(pwd)/bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04:/nim_model \
-e MODEL_PATH=/nim_model \
-p 8008:8008 \
nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04

Health and Liveness Checks

The container exposes a health endpoint for integration into existing systems such as Kubernetes. This endpoint returns an HTTP 200 OK status code only if the service is ready or live, respectively.

curl localhost:8008/v2/health/ready
...
true
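Because start-up can take a couple of minutes, it may be convenient to poll the readiness check instead of re-running it by hand. The following is a minimal sketch of such a loop. The wait_until helper is hypothetical and not part of the NIM tooling, and the timeout and interval values in the usage comment are arbitrary.

```shell
#!/bin/bash
# Sketch: run a command repeatedly until it succeeds or a timeout expires.
# Arguments: <timeout in seconds> <interval in seconds> <command...>
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1   # gave up: the command never succeeded
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Example (requires the running container):
#   wait_until 300 5 curl --silent --fail localhost:8008/v2/health/ready \
#     && echo "DiffDock NIM is ready"
```

The --fail option makes curl exit with a non-zero status on an HTTP error response, so the loop keeps waiting until the endpoint answers successfully.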

Run Inference

Open a new terminal and navigate to the directory that contains the downloaded model folder.

In that directory (where the bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04 folder is located), create a new file named diffdock.sh and copy the content below into it.

#!/bin/bash

# Script: diffdock.sh
# Usage: ./diffdock.sh bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/protein.pdb bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/ligand.sdf
# Doc: http://localhost:8008/docs#/Default/molecular_docking

protein_file=$1
ligand_file=$2

# The next two commands convert each multi-line text file into a single line for JSON encoding.
# The file names are quoted so that paths containing spaces are handled correctly.
protein_bytes=`sed -z 's/\n/\\\n/g' "$protein_file"`
ligand_bytes=`sed -z 's/\n/\\\n/g' "$ligand_file"`

echo "{
   \"ligand\": \"${ligand_bytes}\",
   \"ligand_file_type\": \"sdf\",
   \"protein\": \"${protein_bytes}\",
   \"num_poses\": 1,
   \"time_divisions\": 20,
   \"steps\": 18,
   \"save_trajectory\": false,
   \"is_staged\": false
}" > diffdock.json

curl --header "Content-Type: application/json" \
   --request POST \
   --data @diffdock.json \
   --output output.json \
   http://localhost:8008/molecular-docking/diffdock/generate

Run the commands below to invoke the DiffDock model. The script generates the input JSON file diffdock.json and writes the inference result, in JSON format, to the file output.json.

chmod +x diffdock.sh
./diffdock.sh bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/protein.pdb bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/ligand.sdf
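If you want to request more than one pose without editing the script, the request body can be produced by a small function instead. This is a sketch only: make_payload is a hypothetical helper, the field names and other values are copied from diffdock.sh, and it assumes the service accepts num_poses values greater than 1.

```shell
#!/bin/bash
# Sketch: print a DiffDock request body with a configurable number of poses.
# Arguments: <protein file> <ligand file> [num_poses, default 1]
make_payload() {
  local protein_file=$1 ligand_file=$2 num_poses=${3:-1}
  local protein_bytes ligand_bytes
  # Replace newlines with literal \n so each file fits in a JSON string.
  protein_bytes=$(sed -z 's/\n/\\n/g' "$protein_file") || return 1
  ligand_bytes=$(sed -z 's/\n/\\n/g' "$ligand_file") || return 1
  cat <<EOF
{
  "ligand": "${ligand_bytes}",
  "ligand_file_type": "sdf",
  "protein": "${protein_bytes}",
  "num_poses": ${num_poses},
  "time_divisions": 20,
  "steps": 18,
  "save_trajectory": false,
  "is_staged": false
}
EOF
}

# Example:
#   make_payload protein.pdb ligand.sdf 10 > diffdock.json
```

The resulting diffdock.json can be posted with the same curl command used in diffdock.sh.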

The output file output.json contains JSON with the predicted docking poses (coordinates of the ligand atoms), structured as shown below:

{
"trajectory":"", // This is for intermediate poses during the diffusion sampling. It's empty here because we didn't save the trajectory.
"ligand_positions": // This is the most important part: the predicted docking pose(s) in SDF format.
["
    7RWO_7WN_A_3001_rank1
    RDKit          3D

    35 37  0  0  0  0  0  0  0  0999 V2000
       -2.2332    0.8777   -4.4707 N   0  0  0  0  0  0  0  0  0  0  0  0
       -3.0333   -0.3266   -4.2170 C   0  0  0  0  0  0  0  0  0  0  0  0
       -3.3251   -0.4852   -2.7199 C   0  0  0  0  0  0  0  0  0  0  0  0
       -2.0866   -0.2877   -1.8852 C   0  0  0  0  0  0  0  0  0  0  0  0
       -2.0960   -0.6741   -0.5378 C   0  0  0  0  0  0  0  0  0  0  0  0
       -0.9775   -0.4557    0.2690 C   0  0  0  0  0  0  0  0  0  0  0  0
        1.7812    0.5416    2.9476 C   0  0  0  0  0  0  0  0  0  0  0  0
        1.8978   -2.1976    3.2004 C   0  0  0  0  0  0  0  0  0  0  0  0
        1.6589   -1.6404    1.9373 C   0  0  0  0  0  0  0  0  0  0  0  0
        1.5768   -0.2441    1.8142 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.1740    0.1524   -0.2611 C   0  0  0  0  0  0  0  0  0  0  0  0
        0.1733    0.5644   -1.6014 C   0  0  0  0  0  0  0  0  0  0  0  0
       -0.9527    0.3624   -2.4140 C   0  0  0  0  0  0  0  0  0  0  0  0
       -0.8963    0.7499   -3.8739 C   0  0  0  0  0  0  0  0  0  0  0  0
        2.3100   -1.9356    5.6138 C   0  0  0  0  0  0  0  0  0  0  0  0
        1.3387    0.3920    0.5497 N   0  0  0  0  0  0  0  0  0  0  0  0
        2.0085   -0.0501    4.1471 N   0  0  0  0  0  0  0  0  0  0  0  0
        2.0591   -1.3779    4.2848 N   0  0  0  0  0  0  0  0  0  0  0  0
        1.9693   -3.4511    3.3302 O   0  0  0  0  0  0  0  0  0  0  0  0
        1.5365   -2.6902    0.5250 Cl  0  0  0  0  0  0  0  0  0  0  0  0
       -2.7161    1.7061   -4.0482 H   0  0  0  0  0  0  0  0  0  0  0  0
       -3.9937   -0.2503   -4.7704 H   0  0  0  0  0  0  0  0  0  0  0  0
       -2.5001   -1.2263   -4.5979 H   0  0  0  0  0  0  0  0  0  0  0  0
       -4.0828    0.2681   -2.4134 H   0  0  0  0  0  0  0  0  0  0  0  0
       -3.7483   -1.4972   -2.5394 H   0  0  0  0  0  0  0  0  0  0  0  0
       -2.9746   -1.1413   -0.1100 H   0  0  0  0  0  0  0  0  0  0  0  0
       -1.0179   -0.7548    1.3080 H   0  0  0  0  0  0  0  0  0  0  0  0
        1.7451    1.6212    2.8812 H   0  0  0  0  0  0  0  0  0  0  0  0
        1.0570    1.0349   -2.0157 H   0  0  0  0  0  0  0  0  0  0  0  0
       -0.3146   -0.0168   -4.4301 H   0  0  0  0  0  0  0  0  0  0  0  0
       -0.3617    1.7187   -3.9839 H   0  0  0  0  0  0  0  0  0  0  0  0
        2.3844   -1.1262    6.3720 H   0  0  0  0  0  0  0  0  0  0  0  0
        1.4777   -2.6116    5.9042 H   0  0  0  0  0  0  0  0  0  0  0  0
        3.2645   -2.5036    5.6106 H   0  0  0  0  0  0  0  0  0  0  0  0
        1.9946    1.1533    0.2617 H   0  0  0  0  0  0  0  0  0  0  0  0
    1  2  1  0
    2  3  1  0
    3  4  1  0
    4  5  1  0
    5  6  2  0
    8  9  1  0
    7 10  1  0
    9 10  2  0
    6 11  1  0
   11 12  2  0
    4 13  2  0
   12 13  1  0
    1 14  1  0
   13 14  1  0
   11 16  1  0
   10 16  1  0
    7 17  2  0
   17 18  1  0
    8 18  1  0
   15 18  1  0
    8 19  2  0
    M  END
    $$
"],

"position_confidence": [-0.6001075506210327], // The confidence estimate from the DiffDock confidence model.

"protein": "CRYST1   45.330   73.690   52.740  90.00 109.70  90.00 P 1 21 1      0..." // The remaining fields contain the input content.
}

The ligand_positions field is a list of predicted poses, each of which is text in SDF format.

Note

The actual output is compact (minified) JSON, not pretty-printed as shown above.
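To work with the poses in other tools, it can help to write each entry of ligand_positions to its own SDF file. The sketch below does this with a small shell function. It assumes python3 is available on the host and uses it only as a JSON parser (a tool such as jq would work equally well). The split_poses name and the pose_<n>.sdf file names are illustrative.

```shell
#!/bin/bash
# Sketch: write each predicted pose in a DiffDock response to its own file
# and print the matching confidence value.
# Arguments: <response JSON file> [output directory, default "."]
split_poses() {
  local response=$1 out_dir=${2:-.}
  mkdir -p "$out_dir"
  python3 - "$response" "$out_dir" <<'EOF'
import json
import os
import sys

response_path, out_dir = sys.argv[1], sys.argv[2]
with open(response_path) as handle:
    result = json.load(handle)

# Poses and confidence values are parallel lists in the response.
poses = result["ligand_positions"]
scores = result["position_confidence"]
for index, (pose, score) in enumerate(zip(poses, scores), start=1):
    path = os.path.join(out_dir, f"pose_{index}.sdf")
    with open(path, "w") as sdf:
        sdf.write(pose)
    print(f"{path}\tconfidence={score}")
EOF
}

# Example:
#   split_poses output.json poses
```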

Stopping the Container

When you’re done testing the endpoint, you can bring down the container by running docker stop diffdock in a new terminal.