ProteinMPNN (Latest)

ProteinMPNN NIM endpoints

The model provides the following endpoints. The input and output parameters correspond to properties in a JSON object submitted to or received from an endpoint.

Endpoint path: biology/ipd/proteinmpnn/predict

Input parameters

  • input_pdb (string, null): Optional. Input protein structure, in PDB format, for which amino acid sequences are to be designed.

  • input_pdb_asset (string, null): Optional. ID of a pre-uploaded NVCF asset. If you use this field, provide the original file name via the input_pdb argument.

  • input_pdb_chains (array, null): Optional. Chains in the input protein for which the model designs amino acid sequences. If not specified, sequences are designed for all chains in the protein.

  • ca_only (boolean, null): Optional. Default is False. If True, the CA-only model is used, which takes only the alpha-carbon (CA) coordinates of the backbone as input. This is useful when full backbone coordinates are unavailable or unreliable.

  • use_soluble_model (boolean, null): Optional. Default is False. If True, the model uses weights trained only on soluble proteins; otherwise it uses the default weights, whose training data also includes membrane proteins. The soluble model suits designs that must be highly soluble, such as those for biotechnological processes, pharmaceutical development, and biochemical assays. The default model is preferable for membrane protein studies, structural biology, and industrial applications where solubility is less critical or where proteins must function in hydrophobic environments.

  • random_seed (integer, null): Optional. Seed for the random number generator used during sampling. Set a fixed seed when reproducibility matters; leave it unset when you want different samples on each run to explore sequence diversity.

  • num_seq_per_target (integer, null): Optional. Default is 1. Number of sequences to generate per target protein structure, i.e. how many distinct sequences the model designs for the given protein backbone.

  • sampling_temp (array, null): Optional. One or more sampling temperatures. Temperatures are dimensionless values between 0 and 1 that rescale the probabilities of the 20 amino acids at each position and so control the diversity of the designs: higher values produce more diverse sequences, lower values more conservative ones. The recommended range is 0.1 to 0.3.

  • pssm_jsonl (string, null): Optional. Position-specific scoring matrix (PSSM), supplied as a JSONL string. A PSSM brings evolutionary information into the design process: the conservation patterns observed in homologous protein sequences guide which amino acids are chosen at each position, which can make the designed proteins more likely to be stable and functional.

  • pssm_multi (number, null): Optional. Default is 0.0. Controls how strongly the PSSM influences sequence design, balancing evolutionary data against the model’s predictions. At 0.0 the PSSM is not used at all and the design relies entirely on ProteinMPNN’s predictions. At 1.0 the model’s predictions are ignored and the design relies solely on the PSSM. Intermediate values blend the two.

  • pssm_threshold (number, null): Optional. Default is 0.0. Can be any real number. Only amino acids whose PSSM scores exceed the threshold are allowed in the design, so higher values are more restrictive and lower values less so. A very low threshold (e.g., negative infinity) effectively allows all amino acids, while a very high one (e.g., positive infinity) could exclude all of them.

  • pssm_bias_flag (boolean, null): Optional. Default is False. Determines whether a bias based on the position-specific scoring matrix (PSSM) is applied during protein sequence design.

  • pssm_log_odds_flag (boolean, null): Optional. Default is False. Controls whether PSSM values are transformed into log-odds scores, which express the likelihood of observing a particular amino acid at a given position relative to a background distribution. This transformation makes PSSM values more interpretable and more useful for guiding the design process.

  • fixed_positions_jsonl (string, null): Optional. Specifies which residues remain unchanged during the design process, letting users enforce constraints based on experimental or functional requirements. Note: fixed positions are indexed starting from 1, relative to the new sequence.

  • omit_AAs (array, null): Optional. Amino acids to exclude from the designed sequences, specified as one-letter codes (as used in FASTA).

  • omit_AA_jsonl (string, null): Optional. Excludes specific amino acids from the designed sequences at designated chain positions, giving users finer control over the properties and functionality of the generated proteins. Example: '{"input": {"A": [[[1], "V"]]}}' omits valine at the first position of chain A (indexing starts from 1).

  • bias_AA_jsonl (string, null): Optional. A bias dictionary that fine-tunes the amino acid composition of the designed sequences. This helps achieve specific design goals, such as avoiding amino acids that might lead to undesirable properties or promoting amino acids that enhance desired characteristics of the protein. The dictionary is specified as a JSON object; e.g. {"A": -1.1, "F": 0.7} makes alanine less likely and phenylalanine more likely to appear in the designed protein.

  • bias_by_res_jsonl (string, null): Optional. By providing a position-specific bias dictionary, users can fine-tune the amino acid composition of the designed sequences at specific residue positions. This can help in achieving specific design goals, such as promoting amino acids that enhance the desired characteristics of the protein at particular sites or avoiding amino acids that might lead to undesirable properties.

  • tied_positions_jsonl (string, null): Optional. By providing a dictionary of tied positions, users can ensure that specific residues are identical across different positions or chains. This is particularly important for designing proteins with internal repeats, cyclic symmetries, or multi-chain assemblies where certain residues must be the same to maintain the desired structure and function.
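The following minimal Python sketch shows how the input parameters above might be assembled into a JSON request body for biology/ipd/proteinmpnn/predict. It assumes that input_pdb carries the PDB file contents as text and that the "input" key in omit_AA_jsonl (taken from the example above) names the input structure. The helper names build_predict_request and post_predict, and any base URL you pass in, are hypothetical. Note that the *_jsonl parameters are strings containing JSON, so they are serialized separately rather than nested as objects.

```python
import json
import urllib.request
from typing import Optional, Sequence


def build_predict_request(
    pdb_text: str,
    chains: Optional[Sequence[str]] = None,
    num_seq_per_target: int = 1,
    sampling_temp: Sequence[float] = (0.1,),
    random_seed: Optional[int] = None,
    omit_aa_by_position: Optional[dict] = None,
    aa_bias: Optional[dict] = None,
) -> dict:
    """Assemble a JSON body for the predict endpoint (hypothetical helper)."""
    body = {
        "input_pdb": pdb_text,
        "num_seq_per_target": num_seq_per_target,
        "sampling_temp": list(sampling_temp),  # array field: one or more temperatures
    }
    # Optional fields are only included when set, so server defaults apply otherwise.
    if chains is not None:
        body["input_pdb_chains"] = list(chains)
    if random_seed is not None:
        body["random_seed"] = random_seed
    # *_jsonl parameters are strings holding JSON, so serialize them here.
    if omit_aa_by_position is not None:
        body["omit_AA_jsonl"] = json.dumps(omit_aa_by_position)
    if aa_bias is not None:
        body["bias_AA_jsonl"] = json.dumps(aa_bias)
    return body


def post_predict(base_url: str, body: dict, timeout_s: float = 600.0) -> dict:
    """POST the body to the predict endpoint and decode the JSON reply."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/biology/ipd/proteinmpnn/predict",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_predict_request(
    pdb_text="<contents of a .pdb file>",  # placeholder, not a real structure
    chains=["A"],
    num_seq_per_target=4,
    sampling_temp=[0.1, 0.2],
    random_seed=42,
    omit_aa_by_position={"input": {"A": [[[1], "V"]]}},  # omit Val at chain A, position 1
    aa_bias={"A": -1.1, "F": 0.7},  # discourage Ala, favor Phe
)
print(json.dumps(request, indent=2))
```

Against a running NIM, post_predict(base_url, request) would return the decoded response, which should contain the mfasta, scores, and probs outputs described below; the network call is kept in a separate function so the body can be inspected or logged before sending.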
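To make sampling_temp, pssm_multi, and pssm_threshold more concrete, here is a minimal Python sketch of the arithmetic they describe, on a toy three-residue distribution instead of the full 20 amino acids. It assumes temperature is applied as a standard softmax over logits divided by the temperature, that pssm_multi is a plain linear mix of the model and PSSM distributions, and that pssm_threshold acts as a strict greater-than filter. The service's actual implementation may differ (for example, by weighting the mix per position), and all function names here are hypothetical.

```python
import math
from typing import List, Sequence


def apply_temperature(logits: Sequence[float], temp: float) -> List[float]:
    """Softmax of logits / temp. Lower temp concentrates probability mass."""
    if temp <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def blend_with_pssm(model_probs: Sequence[float],
                    pssm_probs: Sequence[float],
                    pssm_multi: float) -> List[float]:
    """Linear mix: 0.0 keeps the model only, 1.0 keeps the PSSM only."""
    if not 0.0 <= pssm_multi <= 1.0:
        raise ValueError("pssm_multi must lie in [0, 1]")
    return [(1.0 - pssm_multi) * p + pssm_multi * q
            for p, q in zip(model_probs, pssm_probs)]


def allowed_by_threshold(pssm_scores: Sequence[float], threshold: float) -> List[bool]:
    """True where a PSSM score exceeds the threshold, i.e. the residue stays allowed."""
    return [s > threshold for s in pssm_scores]


# Toy example with three residue types instead of 20.
logits = [2.0, 1.0, 0.0]
cold = apply_temperature(logits, 0.1)  # nearly all mass on the first type
warm = apply_temperature(logits, 1.0)  # noticeably flatter distribution
mixed = blend_with_pssm(warm, [0.2, 0.2, 0.6], pssm_multi=0.5)
mask = allowed_by_threshold([1.5, -0.3, 0.8], threshold=0.0)
print(cold, warm, mixed, mask)
```

Because both inputs to blend_with_pssm are probability distributions, any mix with pssm_multi in [0, 1] is also a valid distribution, which matches the description of intermediate values as a blend of the two sources.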

Outputs

  • mfasta (string): Required. This output contains the designed protein sequences in a multi-FASTA format, which is a standard text-based format for representing multiple sequences of amino acids.

  • scores (array): Required. This output provides the log-probabilities of the designed sequences, which indicate the likelihood of each sequence given the input structure, helping to assess the quality and confidence of the design.

  • probs (array): Required. This output includes the predicted probabilities for each amino acid at each position in the sequence, offering detailed insights into the model’s predictions and the variability at each site.
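Since mfasta is returned as a single multi-FASTA string, a client typically splits it into individual records. The sketch below is a generic multi-FASTA parser in Python; it makes no assumption about what the service writes in the header lines, and the function name parse_mfasta is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_mfasta(text: str) -> List[Tuple[str, str]]:
    """Split multi-FASTA text into (header, sequence) pairs, in file order."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequences may wrap across several lines
        # Sequence lines appearing before the first header are dropped.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">design_1\nMKV\nLLA\n>design_2\nGSSG\n"  # made-up toy records
for name, seq in parse_mfasta(example):
    print(name, len(seq), seq)
```

Joining wrapped lines keeps the parser correct whether or not sequences are split across multiple lines.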

Endpoint path: v1/health/ready

Input parameters

None.

Outputs

The output of the endpoint is a JSON response containing a value indicating the readiness of the microservice. When the NIM is ready, it returns the response {"status":"ready"}.
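A client usually polls the readiness endpoint before sending predict requests. The following is a minimal Python sketch using only the standard library; it checks for the {"status":"ready"} body documented above. The base URL you pass in (for example, a local address and port) is a hypothetical deployment detail, and the helper names are illustrative.

```python
import json
import time
import urllib.request


def is_ready(body: str) -> bool:
    """True if a v1/health/ready response body reports {"status": "ready"}."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("status") == "ready"


def wait_until_ready(base_url: str, timeout_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Poll the readiness endpoint until it reports ready or the timeout expires."""
    url = base_url.rstrip("/") + "/v1/health/ready"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=poll_s) as resp:
                if is_ready(resp.read().decode("utf-8")):
                    return True
        except OSError:
            pass  # URLError, HTTPError, and timeouts all subclass OSError; retry
        time.sleep(poll_s)
    return False
```

Keeping the response check in is_ready separates the parsing logic from the network loop, so the check can be reused with any HTTP client.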

© Copyright 2024, NVIDIA Corporation. Last updated on Aug 26, 2024.