Model Overview
Description:#
DSMBind [1,2] is an energy-based model that has been trained on protein-ligand complexes to predict binding affinities. The model produces comparative values that are useful for ranking protein-ligand binding affinities. This model is for research and development only.
Third-Party Community Consideration#
This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the model card by the Broad Institute of MIT and Harvard.
License#
DSMBind is provided under the Apache License 2.0.
References:#
[1] Wengong Jin, Siranush Sarkizova, Xun Chen, Nir Hacohen, and Caroline Uhler. “Unsupervised protein-ligand binding energy prediction via neural Euler’s rotation equation.” Advances in Neural Information Processing Systems 36 (2024).
[2] Wengong Jin, Xun Chen, Amrita Vetticaden, Siranush Sarkizova, Raktima Raychowdhury, Caroline Uhler, and Nir Hacohen. “DSMBind: SE(3) denoising score matching for unsupervised binding energy prediction and nanobody design.” bioRxiv (2023): 2023-12.
Model Architecture:#
Architecture Type: Energy-Based Model (EBM)
Network Architecture: SE(3)-Invariant Neural Network
Input:#
Input Type(s): Text (PDB, SDF)
Input Format(s): Protein Data Bank (PDB) Structure files for proteins, Structural Data Files (SDF) for ligands
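As a rough illustration of what the PDB side of the input contains, the sketch below pulls atom coordinates out of the fixed-width ATOM/HETATM records of a PDB file. The helper name `parse_pdb_coordinates` is hypothetical and is not part of BioNeMo or DSMBind; it only shows the column layout, and it does not cover SDF ligand files.

```python
from typing import List, Tuple

Coordinate = Tuple[float, float, float]


def parse_pdb_coordinates(pdb_text: str) -> List[Coordinate]:
    """Extract (x, y, z) from ATOM/HETATM records of PDB-formatted text.

    Hypothetical helper for illustration; not part of BioNeMo or DSMBind.
    """
    coords: List[Coordinate] = []
    for line in pdb_text.splitlines():
        # Skip non-coordinate records and lines too short to hold x, y, z.
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 54:
            # Fixed-width columns (1-indexed): x = 31-38, y = 39-46, z = 47-54.
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x, y, z))
    return coords
```

In practice a structure-parsing library would normally handle this; the point here is only that the protein input is plain text with coordinates at fixed positions.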
Output:#
Output Type(s): Numerical scores (indicating binding affinities)
Output Format: List of scalar values
Other Properties Related to Output: Only the rank of the predicted values matters because the model produces comparative values instead of absolute binding energies.
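Because only the ordering of the scores is meaningful, a typical downstream step is to sort candidates by score rather than interpret the values themselves. The sketch below assumes the scores have already been collected into a dictionary; the function name `rank_ligands`, the ligand names, and the score values are all hypothetical. Whether a lower or a higher score indicates stronger binding is not stated here, so the sort direction is left as a required argument.

```python
from typing import Dict, List


def rank_ligands(scores: Dict[str, float], *, descending: bool) -> List[str]:
    """Return ligand names ordered by predicted score.

    `descending` has no default on purpose: which direction means
    "stronger binding" is an assumption the caller must confirm.
    """
    return sorted(scores, key=scores.__getitem__, reverse=descending)


# Hypothetical scores for three ligands against one protein.
example_scores = {"ligand_a": -1.2, "ligand_b": 0.4, "ligand_c": -3.5}
ranking = rank_ligands(example_scores, descending=False)
```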
Software Integration:#
Runtime Engine(s):
BioNeMo (1.7), NeMo
Supported Hardware Microarchitecture Compatibility:
Ampere
Hopper
Preferred/Supported Operating System(s):
Linux
Model Version(s):#
dsmbind.pth, version: 1.7
Training & Evaluation:#
Training Dataset:#
Link: a subset from PDB
Data Collection Method by dataset
Human
Properties (Quantity, Dataset Descriptions, Sensor(s)): Our DSMBind checkpoint was trained using a subset of PDB. This subset includes a total of 25,561 samples, each representing a unique protein-ligand complex.
Evaluation Dataset:#
Link: CASF-16
Data Collection Method by dataset
Human
Labeling Method by dataset
Hybrid: Human & Automated
Properties (Quantity, Dataset Descriptions, Sensor(s)): CASF-16 is an open challenge for comparative assessment of scoring functions. This benchmark has 285 protein-ligand complexes with binding affinity labels.
Evaluation Results#
We use Gaussian noise to perturb the ligand coordinates during training. We evaluate the trained DSMBind model on the CASF-16 benchmark, measuring the Pearson correlation coefficient to assess the linear relationship between the predicted scalar values and the experimental binding affinities. The trained checkpoint achieves a Pearson correlation coefficient of 0.64.
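The two operations mentioned above, perturbing coordinates with Gaussian noise and computing a Pearson correlation, can be sketched in plain Python. This is a minimal illustration, not the DSMBind training or evaluation code: the function names, the `sigma` and `seed` parameters, and the choice of independent per-coordinate noise are assumptions, and the noise scale used for the released checkpoint is not given here.

```python
import math
import random
from typing import List, Sequence, Tuple

Coordinate = Tuple[float, float, float]


def perturb_coordinates(
    coords: Sequence[Coordinate], sigma: float, seed: int = 0
) -> List[Coordinate]:
    """Add independent zero-mean Gaussian noise to every coordinate.

    `sigma` (standard deviation, same unit as the coordinates) and
    `seed` are hypothetical parameters chosen for this sketch.
    """
    rng = random.Random(seed)
    return [
        (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma), z + rng.gauss(0.0, sigma))
        for x, y, z in coords
    ]


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sums of centered products and squares; the 1/n factors cancel in the ratio.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0.0 or var_m == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)
```

A call such as `pearson_r(predicted_scores, measured_affinities)` over the benchmark complexes would yield the kind of figure reported above, where both argument names stand for lists the reader would have to supply.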
Limitations#
DSMBind produces comparative values that are useful for ranking complexes, but it does not provide absolute measures that are directly comparable to experimental ground-truth affinities.