ProteinMPNN (Latest)

Benchmarking

Accuracy benchmarking measures how well ProteinMPNN can predict amino acid sequences from the 3D structures of known proteins. The metric it reports is called the sequence recovery rate.

The sequence recovery rate is a critical metric for evaluating the performance of ProteinMPNN, as it quantifies the neural network’s ability to generate protein sequences that closely resemble the native sequences found in nature. This metric is calculated by comparing the designed amino acid sequences produced by ProteinMPNN to the corresponding native sequences of the target proteins.

The sequence recovery rate is the percentage of positions at which the residue in the designed sequence is identical to the residue in the native sequence. A higher recovery rate indicates that the model better captures the intrinsic sequence features and evolutionary constraints that define the native protein, which suggests that the designed sequence is more likely to adopt the intended structure and function.
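As a minimal sketch, assuming the designed and native sequences are already aligned position by position and given as equal-length one-letter strings (which holds when a sequence is designed on the native backbone), the per-protein recovery rate can be computed as follows. The function name `sequence_recovery` is hypothetical and is not part of the NIM or its benchmark script.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Return the percentage of positions where designed and native residues match.

    Assumes a one-to-one alignment: both strings have the same length and
    position i in one corresponds to position i in the other.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    # Count positions where the designed residue equals the native residue.
    matches = sum(1 for d, n in zip(designed, native) if d == n)
    return 100.0 * matches / len(native)


# Usage: 4 of the 5 positions match, so recovery is 80%.
print(sequence_recovery("MKTAY", "MKTAW"))  # 80.0
```

A benchmark over many proteins would then average these per-protein values across the test set.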

For example, ProteinMPNN achieves an average sequence recovery rate of 52.4%, substantially higher than the 32.9% achieved by traditional methods such as Rosetta, which highlights its ability to design sequences that resemble those of natural proteins.[1, 2]

ProteinMPNN run time depends on many factors, such as the number of atoms in the input, the number of chains, the number of sequences to generate, and the list of sampling temperatures provided.

As a representative measure of overall performance, we report the average number of amino acids per second.
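The exact timing methodology of the benchmark script is not described here, so the following is only a sketch under an assumption: throughput is taken as the total number of designed residues across all requests (residues per structure times sequences generated) divided by the total wall-clock time. `BenchmarkRun` and `residues_per_second` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One hypothetical benchmark request: residues designed and time taken."""
    num_residues: int       # residues in the input structure
    num_sequences: int      # sequences generated for that structure
    elapsed_seconds: float  # wall-clock time for the request


def residues_per_second(runs: list[BenchmarkRun]) -> float:
    """Aggregate throughput: total designed residues / total elapsed time."""
    total_residues = sum(r.num_residues * r.num_sequences for r in runs)
    total_time = sum(r.elapsed_seconds for r in runs)
    if total_time <= 0:
        raise ValueError("total elapsed time must be positive")
    return total_residues / total_time


runs = [
    BenchmarkRun(num_residues=100, num_sequences=8, elapsed_seconds=2.0),
    BenchmarkRun(num_residues=250, num_sequences=4, elapsed_seconds=3.0),
]
print(residues_per_second(runs))  # (800 + 1000) / 5.0 = 360.0
```

Dividing totals (rather than averaging per-request rates) weights large, slow requests proportionally to the work they represent, so a few tiny inputs cannot inflate the figure.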

This NIM includes a simple benchmarking script that measures both accuracy and performance. It is useful for verifying that the neural network produces sensible results on known proteins.

The script is packaged in the NIM Docker image. You can view its source with the following command:


docker run --entrypoint cat nvcr.io/nim/ipd/proteinmpnn:1.0.0 /opt/benchmark.py

To run the benchmark, follow these steps:

  1. Make sure the NIM is running as described in the Quickstart Guide.

  2. The benchmark script automatically downloads a test dataset. To save time and bandwidth, we recommend providing a local cache directory so that the script can reuse previously downloaded data. Run the following command to set up the cache directory:


export LOCAL_NIM_CACHE=~/.cache/nim

  3. Execute the benchmark.


docker run -it --net host -v "$LOCAL_NIM_CACHE":/home/nvs/.cache/nim \
    nvcr.io/nim/ipd/proteinmpnn:1.0.0 /opt/benchmark.py
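The cache directory helps because test data that is already present does not need to be downloaded again. The benchmark's actual download logic is not shown in this guide; the following is only a minimal sketch of that reuse pattern, assuming files are keyed by name inside the cache directory. `cached_fetch`, its `download` callback, and the file name `example.pdb` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(name: str, cache_dir: Path, download: Callable[[], bytes]) -> Path:
    """Return the cached file for `name`, calling `download` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / name
    if not path.exists():
        # Cache miss: fetch the data once and store it for later runs.
        path.write_bytes(download())
    return path


calls = []


def fake_download() -> bytes:
    """Stand-in for a network download; records how often it is called."""
    calls.append(1)
    return b"placeholder structure data"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    cached_fetch("example.pdb", cache, fake_download)
    cached_fetch("example.pdb", cache, fake_download)  # served from the cache
    print(len(calls))  # 1
```

A more robust version would write to a temporary file and rename it, so an interrupted download never leaves a partial file that later runs mistake for a cache hit.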


© Copyright © 2024, NVIDIA Corporation. Last updated on Aug 26, 2024.