DiffDock NIM

Important

NVIDIA NIM is currently in limited availability. Sign up here to be notified when the latest NIMs are available to download.

DiffDock is a state-of-the-art generative model for drug discovery. It predicts the three-dimensional structure of a protein-ligand complex, that is, how a small-molecule ligand binds to a protein. This task, known as molecular docking or pose prediction, is a crucial step in the drug discovery process.

DiffDock:

  • Supports AI drug discovery pipelines and opens new research avenues for downstream task integrations.

  • Is highly accurate and computationally efficient.

  • Offers fast inference times and provides confidence estimates with high selective accuracy.

Figure: Example output (diffdock-example.png).

Note

A more detailed description of the model can be found in the Model Card.

DiffDock is an equivariant geometric model for blind molecular docking pose estimation. It requires the 3D structures of the protein and the molecule as input and does not require any information about a binding pocket. During the diffusion process, the molecule’s position relative to the protein, its orientation, and its torsion angles can change. By running the learned reverse diffusion process, the model transforms a distribution of noisy prior molecule poses into the distribution it has learned. As a result, it outputs many sampled poses and ranks them with its confidence model. More information can be found at NGC collections.

Model Specific Requirements

The following are specific requirements for DiffDock NIM.

Important

Refer to the NVIDIA NIM documentation for the necessary hardware, operating system, and software prerequisites if you have not already done so.

Hardware

  • Supported GPU models:

    • Hopper GPUs (H100)

    • Ampere GPUs (e.g., A100 and A6000, details can be found here)

    • Ada GPUs (e.g., L40S, details can be found here)

    • Volta GPUs (e.g., V100, details can be found here)

  • Minimum GPU memory (GB): 16

Software

  • Minimum Driver version: 535.104.05

Once the above requirements have been met, use the Quickstart Guide to pull the NIM container and model, perform a health check, and then run inference.
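
The hardware and software minimums above can be checked on the host before pulling anything. The following is a minimal sketch, assuming that nvidia-smi is installed with the driver and that GNU sort (for -V version ordering) is available. The helper names version_ge and check_gpus are illustrative and are not part of any NVIDIA tool.

```shell
#!/bin/bash
# Minimum driver version from the Software requirements above.
MIN_DRIVER="535.104.05"

# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on GNU sort's -V (version sort) ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report each GPU's name, driver version, and memory.
check_gpus() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; cannot check GPU requirements." >&2
        return 1
    fi
    nvidia-smi --query-gpu=name,driver_version,memory.total \
               --format=csv,noheader,nounits |
    while IFS=, read -r name driver mem_mib; do
        # Fields are separated by ", "; drop the leading space.
        driver="${driver# }"
        mem_mib="${mem_mib# }"
        if version_ge "$driver" "$MIN_DRIVER"; then
            status="driver OK"
        else
            status="driver older than $MIN_DRIVER"
        fi
        echo "$name: driver $driver ($status), memory ${mem_mib} MiB"
    done
}

check_gpus || true
```

The reported memory is in MiB, so compare it against the 16 GB minimum by eye rather than relying on an exact threshold.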

Quickstart Guide

Note

This page assumes Prerequisite Software (Docker, NGC CLI, NGC registry access) is installed and set up.

  1. Pull the NIM container.

    docker pull nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04
    
  2. Pull the DiffDock model.

    ngc registry model download-version "nvidia/nim/bionemo-diffdock:molecular-docking_noarchx1_fp32_24.03.04"
    
  3. Run container.

    docker run --rm -it --name diffdock \
       --runtime=nvidia -e CUDA_VISIBLE_DEVICES=0 \
       --shm-size=2G \
       --ulimit memlock=-1 \
       --ulimit stack=67108864 \
       -v $(pwd)/bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04:/nim_model \
       -e MODEL_PATH=/nim_model \
       -p 8008:8008 \
       nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04
    
  4. Wait until the gRPC health check returns true before proceeding. This may take a couple of minutes.

    curl localhost:8008/v2/health/ready
    ...
    true
    

    Important

    Note the directory you are working in (the one that contains the downloaded model folder); you will use this same path in the upcoming steps.

  5. Open a new terminal, leaving the current terminal open with the launched service.

    Note

    The “sed” command is used to convert the multi-line text file into a single line for JSON encoding.

  6. Prepare the JSON-formatted POST data. This step must be run in a bash shell. To check whether the current session is bash, run echo $0. If it is not, run /bin/bash before this step.

    protein_bytes=`sed -z 's/\n/\\\n/g' bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/protein.pdb`; \
       ligand_bytes=`sed -z 's/\n/\\\n/g'  bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/ligand.sdf`; \
       echo "{
          \"ligand\": \"${ligand_bytes}\",
          \"ligand_file_type\": \"sdf\",
          \"protein\": \"${protein_bytes}\",
          \"num_poses\": 1,
          \"time_divisions\": 20,
          \"steps\": 18,
          \"save_trajectory\": false,
          \"is_staged\": false
       }" > diffdock.json
    
  7. Run inference and save the result to output.json.

    curl --header "Content-Type: application/json" \
        --request POST \
        --data @diffdock.json \
        --output output.json \
        http://localhost:8008/molecular-docking/diffdock/generate
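
The sed expression in step 6 rewrites only newlines, so a double quote, backslash, or tab in an input file would produce invalid JSON. The sketch below extends the same idea to those characters. It assumes GNU sed (for the -z option), and the function name json_escape_file is hypothetical rather than part of the NIM tooling. Note that it also drops carriage returns, which changes files with Windows line endings.

```shell
#!/bin/bash
# Hypothetical helper: print a text file as a single line that is safe to
# embed between double quotes in a JSON document. Requires GNU sed (-z).
json_escape_file() {
    # Order matters: escape existing backslashes first, so the backslashes
    # added by the later substitutions are not escaped a second time.
    sed -z 's/\\/\\\\/g; s/"/\\"/g; s/\r//g; s/\t/\\t/g; s/\n/\\n/g' "$1"
}

# Usage (same sample files as in step 6):
#   protein_bytes=$(json_escape_file path/to/protein.pdb)
#   ligand_bytes=$(json_escape_file path/to/ligand.sdf)
```

Because the function is called with $(...) rather than backticks, the sed expression needs no extra level of backslash escaping.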
    

Available Models

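Version                                  | GPU Model                     | Number of GPUs | Precision | Memory Footprint | File Size
-----------------------------------------|-------------------------------|----------------|-----------|------------------|----------
molecular-docking_noarchx1_fp32_24.03.04 | Volta, Ampere, Ada, or Hopper | 1              | FP32      | 16 GB            | 2.52 GB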

Detailed Instructions

This section provides additional details beyond the scope of the Quickstart Guide.

Pull the NIM Container

  1. List the available container image tags with the command below, as with other container images on NGC.

    ngc registry image info nvcr.io/nvidia/nim/bionemo_diffdock_nim
    
    Image Repository Information
       Name: bionemo_diffdock_nim
       Display Name: bionemo_diffdock_nim
       Short Description: Diffdock predicts the 3D structure of the interaction between a molecule and a protein.
       Built By:
       Publisher: MIT
       Multinode Support: False
       Multi-Arch Support: False
       Logo: https://assets.ngc.nvidia.com/products/api-catalog/images/diffdock.jpg
       Labels: NVIDIA AI Enterprise  Supported, NVIDIA NIM
       Public: No
       Last Updated: May 03, 2024
       Latest Image Size: 10.84 GB
       Signed Tag?: False
       Latest Tag: 24.03.04
       Tags:
           24.03.04
           ...
    
  2. Pull the container image with either Docker or the NGC CLI.

    docker pull nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04
    
    ngc registry image pull nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04
    

Pull the DiffDock Model

  1. Model tags follow the repository:version format. The model is called bionemo-diffdock, and the version follows the naming pattern <VARIANT>_<GPU_TYPE>x<NUM_GPUS>_<PRECISION>_YY.MM.x. Additional versions are available and can be listed by running the following NGC CLI command:

    ngc registry model list "nvidia/nim/bionemo-diffdock:*"
    
  2. Download the model.

    ngc registry model download-version "nvidia/nim/bionemo-diffdock:molecular-docking_noarchx1_fp32_24.03.04"
    
  3. After downloading the model, you should see the model folder named bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/ in your working directory. This folder contains the assets required for running the service.
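
The version naming pattern described in step 1 can be split into its fields with a regular expression. This is a minimal sketch, assuming bash with [[ =~ ]] and BASH_REMATCH support. The function name parse_model_version and its output format are hypothetical, and the regular expression is inferred from the single version string shown on this page rather than from an official grammar.

```shell
#!/bin/bash
# Hypothetical helper: split a version string of the form
# <VARIANT>_<GPU_TYPE>x<NUM_GPUS>_<PRECISION>_YY.MM.x into its fields.
parse_model_version() {
    local re='^(.+)_([^_]+)x([0-9]+)_([^_]+)_([0-9]{2}\.[0-9]{2}\.[0-9]+)$'
    if [[ "$1" =~ $re ]]; then
        echo "variant=${BASH_REMATCH[1]}"
        echo "gpu_type=${BASH_REMATCH[2]}"
        echo "num_gpus=${BASH_REMATCH[3]}"
        echo "precision=${BASH_REMATCH[4]}"
        echo "release=${BASH_REMATCH[5]}"
    else
        echo "unrecognized version: $1" >&2
        return 1
    fi
}

parse_model_version "molecular-docking_noarchx1_fp32_24.03.04"
```

For the version used on this page, the fields are the variant molecular-docking, the GPU type noarch, one GPU, fp32 precision, and release 24.03.04.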

Launch Microservice

Launch the container. Start-up may take a couple of minutes until the service is available.

Note

In this example, the inference endpoint and health check are hosted on port 8008. After you run the Docker command below, you may open another terminal session on the same host and proceed to the next step.

    docker run --rm -it --name diffdock \
       --runtime=nvidia -e CUDA_VISIBLE_DEVICES=0 \
       --shm-size=2G \
       --ulimit memlock=-1 \
       --ulimit stack=67108864 \
       -v $(pwd)/bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04:/nim_model \
       -e MODEL_PATH=/nim_model \
       -p 8008:8008 \
       nvcr.io/nvidia/nim/bionemo_diffdock_nim:24.03.04
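
The -v option mounts the downloaded model folder from the current directory, and Docker typically creates a missing host directory for such a bind mount, so a mistyped path can start the container with an empty /nim_model. A small pre-flight check can catch this. The sketch below is an assumption-laden example: the function name check_model_dir is hypothetical, and the diffdock/ subfolder check is inferred from the sample-data paths used elsewhere on this page.

```shell
#!/bin/bash
# Model folder as produced by the download step, relative to the current directory.
MODEL_DIR="$(pwd)/bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04"

# Hypothetical helper: verify the model folder before mounting it.
check_model_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "Model folder not found: $dir" >&2
        return 1
    fi
    # Assumption: the model assets live under a diffdock/ subfolder.
    if [ ! -d "$dir/diffdock" ]; then
        echo "Expected subfolder missing: $dir/diffdock" >&2
        return 1
    fi
    echo "Model folder looks usable: $dir"
}

# Usage: run the check first, and launch the container only if it passes.
#   check_model_dir "$MODEL_DIR" && docker run ... -v "$MODEL_DIR":/nim_model ...
```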

Health and Liveness Checks

The container exposes a health endpoint for integration into existing systems such as Kubernetes. This endpoint returns an HTTP 200 OK status code only when the service is ready.

    curl localhost:8008/v2/health/ready
    ...
    true
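
Because start-up may take a couple of minutes, scripts usually poll the readiness endpoint instead of checking it once. The following is a minimal polling sketch. The function name wait_until_ready, the default of 60 attempts, and the 5-second delay are illustrative assumptions; only the URL comes from this page.

```shell
#!/bin/bash
# Hypothetical helper: poll the readiness endpoint until it answers with a
# success status, or give up after a fixed number of attempts.
#   $1: URL (default: the readiness endpoint used on this page)
#   $2: maximum number of attempts (assumed default: 60)
#   $3: seconds to sleep between attempts (assumed default: 5)
wait_until_ready() {
    local url="${1:-http://localhost:8008/v2/health/ready}"
    local max_attempts="${2:-60}"
    local delay="${3:-5}"
    local attempt
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        # --fail makes curl exit non-zero on HTTP error responses.
        if curl --silent --fail --output /dev/null "$url"; then
            echo "Service ready after ${attempt} attempt(s)."
            return 0
        fi
        sleep "$delay"
    done
    echo "Service not ready after ${max_attempts} attempts." >&2
    return 1
}

# Usage:
#   wait_until_ready && echo "Safe to send inference requests."
```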

Run Inference

  1. Open a new terminal and change to the directory that contains the downloaded model folder.

  2. Create a new blank file in the current folder (where the bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04 folder is located), name it diffdock.sh, and copy the content below into it.

    #!/bin/bash

    # Script: diffdock.sh
    # Usage: ./diffdock.sh bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/protein.pdb bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/ligand.sdf
    # Doc: http://localhost:8008/docs#/Default/molecular_docking

    # Input files: protein structure (PDB) and ligand structure (SDF).
    protein_file=$1
    ligand_file=$2

    # The next two commands convert each multi-line text file into a single line for JSON encoding.
    # The file names are quoted so that paths containing spaces still work.
    protein_bytes=`sed -z 's/\n/\\\n/g' "$protein_file"`
    ligand_bytes=`sed -z 's/\n/\\\n/g' "$ligand_file"`

    # Write the request body to diffdock.json.
    echo "{
       \"ligand\": \"${ligand_bytes}\",
       \"ligand_file_type\": \"sdf\",
       \"protein\": \"${protein_bytes}\",
       \"num_poses\": 1,
       \"time_divisions\": 20,
       \"steps\": 18,
       \"save_trajectory\": false,
       \"is_staged\": false
    }" > diffdock.json

    # Send the request and save the response to output.json.
    curl --header "Content-Type: application/json" \
       --request POST \
       --data @diffdock.json \
       --output output.json \
       http://localhost:8008/molecular-docking/diffdock/generate
    
  3. Run the commands below to invoke the DiffDock model. The script generates the input JSON file diffdock.json and saves the inference result, in JSON format, to output.json.

    chmod +x diffdock.sh
    ./diffdock.sh bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/protein.pdb bionemo-diffdock_vmolecular-docking_noarchx1_fp32_24.03.04/diffdock/1/data_sample/ligand.sdf
    
  4. The output file output.json contains JSON-formatted content with the predicted docking poses (coordinates of ligand atoms), structured as shown below:

    {
    "trajectory":"", // This is for intermediate poses during the diffusion sampling. It's empty here because we didn't save the trajectory.
    "ligand_positions": // This is the most important part: the predicted docking pose(s) in SDF format.
    ["
        7RWO_7WN_A_3001_rank1
        RDKit          3D

        35 37  0  0  0  0  0  0  0  0999 V2000
           -2.2332    0.8777   -4.4707 N   0  0  0  0  0  0  0  0  0  0  0  0
           -3.0333   -0.3266   -4.2170 C   0  0  0  0  0  0  0  0  0  0  0  0
           -3.3251   -0.4852   -2.7199 C   0  0  0  0  0  0  0  0  0  0  0  0
           -2.0866   -0.2877   -1.8852 C   0  0  0  0  0  0  0  0  0  0  0  0
           -2.0960   -0.6741   -0.5378 C   0  0  0  0  0  0  0  0  0  0  0  0
           -0.9775   -0.4557    0.2690 C   0  0  0  0  0  0  0  0  0  0  0  0
            1.7812    0.5416    2.9476 C   0  0  0  0  0  0  0  0  0  0  0  0
            1.8978   -2.1976    3.2004 C   0  0  0  0  0  0  0  0  0  0  0  0
            1.6589   -1.6404    1.9373 C   0  0  0  0  0  0  0  0  0  0  0  0
            1.5768   -0.2441    1.8142 C   0  0  0  0  0  0  0  0  0  0  0  0
            0.1740    0.1524   -0.2611 C   0  0  0  0  0  0  0  0  0  0  0  0
            0.1733    0.5644   -1.6014 C   0  0  0  0  0  0  0  0  0  0  0  0
           -0.9527    0.3624   -2.4140 C   0  0  0  0  0  0  0  0  0  0  0  0
           -0.8963    0.7499   -3.8739 C   0  0  0  0  0  0  0  0  0  0  0  0
            2.3100   -1.9356    5.6138 C   0  0  0  0  0  0  0  0  0  0  0  0
            1.3387    0.3920    0.5497 N   0  0  0  0  0  0  0  0  0  0  0  0
            2.0085   -0.0501    4.1471 N   0  0  0  0  0  0  0  0  0  0  0  0
            2.0591   -1.3779    4.2848 N   0  0  0  0  0  0  0  0  0  0  0  0
            1.9693   -3.4511    3.3302 O   0  0  0  0  0  0  0  0  0  0  0  0
            1.5365   -2.6902    0.5250 Cl  0  0  0  0  0  0  0  0  0  0  0  0
           -2.7161    1.7061   -4.0482 H   0  0  0  0  0  0  0  0  0  0  0  0
           -3.9937   -0.2503   -4.7704 H   0  0  0  0  0  0  0  0  0  0  0  0
           -2.5001   -1.2263   -4.5979 H   0  0  0  0  0  0  0  0  0  0  0  0
           -4.0828    0.2681   -2.4134 H   0  0  0  0  0  0  0  0  0  0  0  0
           -3.7483   -1.4972   -2.5394 H   0  0  0  0  0  0  0  0  0  0  0  0
           -2.9746   -1.1413   -0.1100 H   0  0  0  0  0  0  0  0  0  0  0  0
           -1.0179   -0.7548    1.3080 H   0  0  0  0  0  0  0  0  0  0  0  0
            1.7451    1.6212    2.8812 H   0  0  0  0  0  0  0  0  0  0  0  0
            1.0570    1.0349   -2.0157 H   0  0  0  0  0  0  0  0  0  0  0  0
           -0.3146   -0.0168   -4.4301 H   0  0  0  0  0  0  0  0  0  0  0  0
           -0.3617    1.7187   -3.9839 H   0  0  0  0  0  0  0  0  0  0  0  0
            2.3844   -1.1262    6.3720 H   0  0  0  0  0  0  0  0  0  0  0  0
            1.4777   -2.6116    5.9042 H   0  0  0  0  0  0  0  0  0  0  0  0
            3.2645   -2.5036    5.6106 H   0  0  0  0  0  0  0  0  0  0  0  0
            1.9946    1.1533    0.2617 H   0  0  0  0  0  0  0  0  0  0  0  0
        1  2  1  0
        2  3  1  0
        3  4  1  0
        4  5  1  0
        5  6  2  0
        8  9  1  0
        7 10  1  0
        9 10  2  0
        6 11  1  0
       11 12  2  0
        4 13  2  0
       12 13  1  0
        1 14  1  0
       13 14  1  0
       11 16  1  0
       10 16  1  0
        7 17  2  0
       17 18  1  0
        8 18  1  0
       15 18  1  0
        8 19  2  0
        M  END
        $$
    "],

    "position_confidence": [-0.6001075506210327], // The confidence estimation from the DiffDock confidence model.

    "protein": "CRYST1   45.330   73.690   52.740  90.00 109.70  90.00 P 1 21 1      0..." // The remainder is the content of the input protein file.
    }
    
  5. The ligand_positions field is a list of predicted poses, each of which is a text block in SDF format.

    Note

    In the actual output, the JSON content is in compact (minified) form rather than pretty-printed as shown above.
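
Each entry in ligand_positions can be written to its own SDF file for use in downstream tools. The sketch below assumes that jq is installed (it is not among the prerequisites listed on this page) and that position_confidence is index-aligned with ligand_positions, as the single-pose example above suggests. The function name extract_poses and the pose_N.sdf file naming are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write each predicted pose in output.json to
# <out_dir>/pose_<N>.sdf and print its confidence score. Requires jq.
#   $1: path to output.json
#   $2: output directory (assumed default: poses)
extract_poses() {
    local output_json="$1"
    local out_dir="${2:-poses}"
    local count i
    mkdir -p "$out_dir"
    # Number of poses returned by the service.
    count=$(jq '.ligand_positions | length' "$output_json") || return 1
    for ((i = 0; i < count; i++)); do
        # -r prints the raw string, turning the JSON "\n" escapes back into newlines.
        jq -r --argjson i "$i" '.ligand_positions[$i]' "$output_json" \
            > "$out_dir/pose_$((i + 1)).sdf"
        printf 'pose_%d.sdf\tconfidence=%s\n' "$((i + 1))" \
            "$(jq -r --argjson i "$i" '.position_confidence[$i]' "$output_json")"
    done
}

# Usage:
#   extract_poses output.json poses
```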

Stopping the Container

When you’re done testing the endpoint, you can bring down the container by running docker stop diffdock in a new terminal.