AlphaFold2 NIM Endpoints

AlphaFold2 NIM provides the following endpoints:

protein-structure/alphafold2/predict-structure-from-sequence - Predict a protein structure given an input amino acid sequence.
protein-structure/alphafold2/predict-msa-from-sequence - Perform a Multiple Sequence Alignment (MSA) and return the MSA and templates for AlphaFold2 inference. This endpoint is useful for batching long-running MSA inference.
protein-structure/alphafold2/predict-structure-from-msa - Perform structural prediction from an input MSA and templates. This is useful when using a pre-computed MSA.

Usage

Below, we outline the three endpoints of the API. We give real examples of requests that should run when the NIM is correctly configured.

Predict structure from an input sequence

The predict-structure-from-sequence endpoint provides a full end-to-end structural prediction pipeline. The only required input is an amino acid sequence, though there are many tunable parameters:

sequence: A valid amino acid sequence. Refer to the table of amino acid codes if you are unsure if your sequence is valid.
databases: A list containing any of “uniref90”, “mgnify”, and “small_bfd”. These databases contain sequences used to generate a Multiple Sequence Alignment that is used as input to the structural prediction neural network in AlphaFold2. In general, passing all three will provide the most accurate structural prediction at the cost of requiring the longest runtime.
algorithm: The algorithm used for Multiple Sequence Alignment. The available options are jackhmmer and mmseqs2. mmseqs2 runs significantly faster (especially for long sequences), whereas jackhmmer matches the tool whose outputs the AlphaFold2 model was trained on.
e_value: The sequence e-value threshold for filtering sequences in the MSA. Smaller values are stricter: fewer sequences are included, which also reduces the sensitivity of the MSA. The default value is generally a good choice. This value ranges from 0 to 1.
bit_score: The sequence bit-score threshold to use for filtering before MSA. If passed, this is used in place of e_value for filtering. A good starting point is around 200. This value must be greater than zero.
iterations: The number of MSA iterations to perform. In general, the default iterations=1 is sufficient and takes the least amount of time.
relax_prediction: Set to True to run structural relaxation after prediction. This is set to True by default and helps fix clashes in the predicted structure.
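
The constraints listed above can be checked on the client before a long-running request is submitted. The following is a minimal sketch, not part of the NIM itself: the helper name build_structure_request is hypothetical, and it assumes that a valid sequence contains only the 20 standard one-letter amino acid codes in upper case, which may be stricter than what the service actually accepts.

```python
# Hypothetical client-side helper; the checks mirror the parameter
# descriptions above and are not an official validation routine.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard codes only
KNOWN_DATABASES = {"uniref90", "mgnify", "small_bfd"}
KNOWN_ALGORITHMS = {"jackhmmer", "mmseqs2"}


def build_structure_request(sequence, databases, algorithm=None, e_value=None,
                            bit_score=None, iterations=None, relax_prediction=None):
    """Return a JSON-serializable payload, raising ValueError on bad input."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"invalid amino acid codes: {invalid}")
    unknown = sorted(set(databases) - KNOWN_DATABASES)
    if unknown:
        raise ValueError(f"unknown databases: {unknown}")
    if algorithm is not None and algorithm not in KNOWN_ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    if e_value is not None and not 0 <= e_value <= 1:
        raise ValueError("e_value must be between 0 and 1")
    if bit_score is not None and bit_score <= 0:
        raise ValueError("bit_score must be greater than zero")

    payload = {"sequence": sequence, "databases": list(databases)}
    # Only send optional parameters that were set, so the service
    # defaults apply to everything else.
    optional = {"algorithm": algorithm, "e_value": e_value, "bit_score": bit_score,
                "iterations": iterations, "relax_prediction": relax_prediction}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_structure_request("MNVIDIAIAMAI", ["uniref90", "mgnify"],
                                  algorithm="mmseqs2", relax_prediction=False)
print(payload)
```

The resulting dictionary can be serialized with json.dumps and posted exactly like the hand-written payloads in the request examples.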

Here’s an example query for a sequence and a full set of databases using cURL:

curl -X 'POST' \
    -i \
    "http://localhost:8000/protein-structure/alphafold2/predict-structure-from-sequence"  \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequence": "MNVIDIAIAMAI", "databases": ["uniref90", "mgnify", "small_bfd"]}'

Here is the same example, this time using the Python requests module:

import requests
import json

url = "http://localhost:8000/protein-structure/alphafold2/predict-structure-from-sequence"  
sequence = "MNVIDIAIAMAI"

headers = {
    "content-type": "application/json"
}

data = {
    "sequence": sequence,
    "databases": ["uniref90", "mgnify", "small_bfd"]
}

response = requests.post(url, headers=headers, data=json.dumps(data))

# Check if the request was successful
if response.ok:
    print("Request succeeded:", response.json())
else:
    print("Request failed:", response.status_code, response.text)

The output of this endpoint is a PDB file. The PDB format can be viewed using PyMOL and other molecular viewers; see the PyMOL website for documentation and usage.
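
A quick way to sanity-check a returned structure is to count its alpha-carbon (CA) atoms, which for a single-model, single-chain file should normally match the sequence length. The sketch below assumes you already hold the PDB text as a string; how that text is extracted from the response body is not specified here, and the helper name count_ca_atoms is hypothetical. It relies only on the fixed-column layout of PDB ATOM records.

```python
def count_ca_atoms(pdb_text):
    """Count alpha-carbon ATOM records in PDB-formatted text."""
    count = 0
    for line in pdb_text.splitlines():
        # Columns 1-6 hold the record name, columns 13-16 the atom name.
        if line.startswith("ATOM  ") and line[12:16].strip() == "CA":
            count += 1
    return count


# Two-residue toy fragment, hand-written for illustration (not real NIM output).
example_pdb = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
    "ATOM      3  N   ASN A   2      25.112  24.880   3.649  1.00  9.62           N",
    "ATOM      4  CA  ASN A   2      24.100  23.900   4.100  1.00  9.50           C",
    "END",
])
print(count_ca_atoms(example_pdb))
```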

Predict MSA from an input sequence

The predict-msa-from-sequence endpoint generates the Multiple Sequence Alignments and templates used for structural prediction. This is useful if you want to batch prediction on different nodes.

Below is an example query using cURL:

curl -X 'POST' \
    -i \
    "http://localhost:8000/protein-structure/alphafold2/predict-msa-from-sequence"  \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"sequence": "MNVIDIAIAMAI", "databases": ["uniref90", "mgnify", "small_bfd"]}'

Here is an equivalent query in Python using the requests module, this time with a different example sequence:

import requests
import json

url = "http://localhost:8000/protein-structure/alphafold2/predict-msa-from-sequence"  # Replace with the actual URL
sequence = "STARWARSNVIDIAAAAAA"  # Replace with the actual sequence value

headers = {
    "content-type": "application/json"
}

data = {
    "sequence": sequence,
    "databases": ["uniref90", "mgnify", "small_bfd"]
}

response = requests.post(url, headers=headers, data=json.dumps(data))

# Check if the request was successful
if response.ok:
    print("Request succeeded:", response.json())
else:
    print("Request failed:", response.status_code, response.text)

The predict-msa-from-sequence endpoint takes the following parameters:

sequence: A valid amino acid sequence. Refer to the table of amino acid codes if you are unsure if your sequence is valid.
databases: A list containing any of “uniref90”, “mgnify”, and “small_bfd”. These databases contain sequences used to generate a Multiple Sequence Alignment that is used as input to the structural prediction neural network in AlphaFold2. In general, passing all three will provide the most accurate structural prediction at the cost of requiring the longest runtime. If you must pick only one, uniref90 is considered the best choice, though it is still recommended to run with all three.
algorithm: The algorithm used for Multiple Sequence Alignment. The available options are jackhmmer and mmseqs2. mmseqs2 runs significantly faster (especially for long sequences), whereas jackhmmer matches the tool whose outputs the AlphaFold2 model was trained on.
e_value: The sequence e-value threshold for filtering sequences in the MSA. Smaller values are stricter: fewer sequences are included, which also reduces the sensitivity of the MSA. The default value is generally a good choice. This value ranges from 0 to 1.
bit_score: The sequence bit-score threshold to use for filtering before MSA. If passed, this is used in place of e_value for filtering. A good starting point is around 200. This value must be greater than zero.
iterations: The number of MSA iterations to perform. In general, the default iterations=1 is sufficient and takes the least amount of time.
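
Because MSA generation is the long-running step, it can be convenient to submit several sequences at once and collect the results. The following is a rough sketch under stated assumptions: batch_msa and fake_fetch_msa are hypothetical names, and the fetch function is supplied by the caller (for example, a wrapper around the requests.post call shown earlier, possibly targeting different nodes). Whether a single NIM instance processes concurrent requests in parallel is not stated in this section.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_msa(sequences, fetch_msa, max_workers=2):
    """Run fetch_msa(sequence) for every sequence and map sequence -> result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with sequences.
        results = list(pool.map(fetch_msa, sequences))
    return dict(zip(sequences, results))


def fake_fetch_msa(sequence):
    # Stand-in for a real request so the sketch runs without a NIM;
    # a real implementation would POST to predict-msa-from-sequence.
    return {"length": len(sequence)}


msas = batch_msa(["MNVIDIAIAMAI", "STARWARSNVIDIAAAAAA"], fake_fetch_msa)
print(msas)
```

Threads are used here because each call spends its time waiting on the network rather than computing locally.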

Predict structure from an input MSA

The predict-structure-from-msa endpoint takes the results of the predict-msa-from-sequence endpoint and runs structural prediction.

Note: We do not recommend calling the predict-structure-from-msa endpoint with cURL, because the inputs contain characters that require careful escaping in bash. For the best user experience, we recommend interacting with this endpoint via the Python requests module.
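
The escaping problem comes from the newlines and quotes embedded in the alignment strings. As an illustration, json.dumps from the Python standard library escapes these characters automatically. If cURL must be used, one possible workaround (a suggestion, not something this documentation prescribes) is to write the serialized payload to a file and pass it with cURL's --data-binary @payload.json option instead of quoting it inline. The file name, the shortened alignment, and the empty template list below are purely illustrative.

```python
import json
import os
import tempfile

# Shortened, illustrative alignment entry containing raw newlines.
alignments = {"uniref90": ["uniref90", "# STOCKHOLM 1.0\n\nquery STARWARS\n//\n", "sto"]}
payload = {"sequence": "STARWARS", "alignments": alignments, "templates": []}

body = json.dumps(payload)
# The serialized body is a single line: each raw newline became the two
# characters backslash + n, so no manual shell escaping is needed.
print("\n" in body)

# Write the body to a file that cURL could read with --data-binary @<path>.
path = os.path.join(tempfile.gettempdir(), "payload.json")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(body)

# Reading the file back recovers the original payload unchanged.
with open(path, encoding="utf-8") as handle:
    restored = json.load(handle)
print(restored == payload)
```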

The predict-structure-from-msa endpoint takes the following arguments:

sequence: A valid amino acid sequence. Refer to the table of amino acid codes if you are unsure if your sequence is valid.
alignments: The MSA results from predict-msa-from-sequence. This is a dictionary keyed by database name, where each value is a list of the form [<db name>, <MSA output>, <MSA output format>], that is, {<db name>: [<db name>, <MSA output>, <MSA output format>]}.
templates: Templates from the structural database search. These are in a format specific to the internals of AlphaFold2; more details of the fields can be found here.
relax_prediction: Set to True to run structural relaxation after prediction. This is set to True by default and helps fix clashes in the predicted structure.
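
Since this endpoint consumes the output of predict-msa-from-sequence, the two calls can be chained. The sketch below shows one way to build the second request from the first result. It assumes the MSA result is a dictionary with "alignments" and "templates" keys, which is not confirmed by this section; the helper names check_alignments and structure_request_from_msa are hypothetical, and the toy data is hand-written.

```python
def check_alignments(alignments):
    """Verify that each entry has three fields and starts with its own key."""
    for db_name, entry in alignments.items():
        if len(entry) != 3 or entry[0] != db_name:
            raise ValueError(f"malformed alignment entry for {db_name!r}")
    return alignments


def structure_request_from_msa(sequence, msa_result, relax_prediction=True):
    """Build a predict-structure-from-msa payload from an MSA result.

    Assumption: msa_result is a dict with "alignments" and "templates"
    keys; the actual response schema is not shown in this section.
    """
    return {
        "sequence": sequence,
        "alignments": check_alignments(msa_result["alignments"]),
        "templates": msa_result["templates"],
        "relax_prediction": relax_prediction,
    }


# Toy MSA result for illustration only (not real NIM output).
toy_msa_result = {
    "alignments": {"uniref90": ["uniref90", "# STOCKHOLM 1.0\n//\n", "sto"]},
    "templates": [],
}
request = structure_request_from_msa("STARWARSNVIDIAAAAAA", toy_msa_result)
print(sorted(request))
```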

Here is an example of a request to the predict-structure-from-msa endpoint using the Python requests module:

import requests
import json

url = "http://localhost:8000/protein-structure/alphafold2/predict-structure-from-msa"  # Replace with the actual URL
sequence = "STARWARSNVIDIAAAAAA"  # Replace with the actual sequence value


alignments = {'uniref90':
    ['uniref90', '# STOCKHOLM 1.0\n\n-151285509650596177 STARWARSNVIDIAAAAAA\n#=GC RF             xxxxxxxxxxxxxxxxxxx\n//\n', 'sto'],
    'small_bfd': ['small_bfd', '# STOCKHOLM 1.0\n\n-151285509650596177 STARWARSNVIDIAAAAAA\n#=GC RF             xxxxxxxxxxxxxxxxxxx\n//\n', 'sto']}
templates = [
    {'index': 1, 'name': '5X6U_E Ragulator complex protein LAMTOR3, Ragulator; Ragulator complex, scaffold, roadblock, lysosome; 2.4A {Homo sapiens}', 'aligned_cols': 10, 'sum_probs': 0.0, 'query': 'RSNVIDIAAA', 'hit_sequence': 'ASNIIDVSAA', 'indices_query': [6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [23, 24, 25, 26, 27, 28, 29, 30, 31, 32]},
    {'index': 2, 'name': '5X6V_E Ragulator complex protein LAMTOR3, Ragulator; Ragulator Rag GTPase complex, scaffold; 2.02A {Homo sapiens}', 'aligned_cols': 10, 'sum_probs': 7.9, 'query': 'RSNVIDIAAA', 'hit_sequence': 'ASNIIDVSAA', 'indices_query': [6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [23, 24, 25, 26, 27, 28, 29, 30, 31, 32]},
    {'index': 3, 'name': '6EHP_E Ragulator complex protein LAMTOR3, Ragulator; Scaffolding complex, Rag-GTPase, mTOR, Ragulator; 2.3A {Homo sapiens}', 'aligned_cols': 10, 'sum_probs': 0.0, 'query': 'RSNVIDIAAA', 'hit_sequence': 'ASNIIDVSAA', 'indices_query': [6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [45, 46, 47, 48, 49, 50, 51, 52, 53, 54]},
    {'index': 4, 'name': '6EHR_E Ragulator complex protein LAMTOR3, Ragulator; Scaffolding complex, Rag-GTPases, mTOR, Ragulator; 2.898A {Homo sapiens}', 'aligned_cols': 10, 'sum_probs': 7.8, 'query': 'RSNVIDIAAA', 'hit_sequence': 'ASNIIDVSAA', 'indices_query': [6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [45, 46, 47, 48, 49, 50, 51, 52, 53, 54]},
    {'index': 5, 'name': '6CTD_B Large-conductance mechanosensitive channel; Channel Mechanosensitive Mycobacterium tuberculosis, MEMBRANE; 5.8A {Mycobacterium tuberculosis (strain ATCC 25177 / H37Ra)}', 'aligned_cols': 11, 'sum_probs': 8.7, 'query': 'ARSNVIDIAAA', 'hit_sequence': 'ARGNIVDLAVA', 'indices_query': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]},
    {'index': 6, 'name': '3HZQ_A Large-conductance mechanosensitive channel; intermediate state Mechanosensitive channel osmoregulation; 3.82A {Staphylococcus aureus subsp. aureus MW2}', 'aligned_cols': 11, 'sum_probs': 8.6, 'query': 'ARSNVIDIAAA', 'hit_sequence': 'LKGNVLDLAIA', 'indices_query': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]},
    {'index': 7, 'name': '6B9X_A Ragulator complex protein LAMTOR1, Ragulator; Ragulator, Lamtor, SIGNALING PROTEIN; 1.42A {Homo sapiens}', 'aligned_cols': 12, 'sum_probs': 0.0, 'query': 'WARSNVIDIAAA', 'hit_sequence': 'KTASNIIDVSAA', 'indices_query': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70]},
    {'index': 8, 'name': '4V7H_BM Ribosome; eukaryotic ribosome, 80S, RACK1 protein; HET: OMC, PSU, 5MU, 1MA, OMG, 5MC, YYG, 7MG, 2MG, H2U, M2G; 8.9A {Thermomyces lanuginosus}', 'aligned_cols': 15, 'sum_probs': 9.1, 'query': 'RWARSNVIDIAAAAA', 'hit_sequence': 'GWKAAAAAAAAAAAA', 'indices_query': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17], 'indices_hit': [139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153]},
    {'index': 9, 'name': '6QKP_A Nucleoid-associated protein Lsr2; Tuberculosis, DNA organisation, Transcriptional regulator; NMR {Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv)}', 'aligned_cols': 12, 'sum_probs': 9.2, 'query': 'RWARSNVIDIAA', 'hit_sequence': 'EWARRNGHNVST', 'indices_query': [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], 'indices_hit': [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]},
    {'index': 10, 'name': "1QGN_F CYSTATHIONINE GAMMA-SYNTHASE; METHIONINE BIOSYNTHESIS, PYRIDOXAL 5'-PHOSPHATE, GAMMA-FAMILY; HET: PLP; 2.9A {Nicotiana tabacum} SCOP: c.67.1.3", 'aligned_cols': 10, 'sum_probs': 0.0, 'query': 'NVIDIAAAAA', 'hit_sequence': 'KAVDAAAAAA', 'indices_query': [8, 9, 10, 11, 12, 13, 14, 15, 16, 17], 'indices_hit': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]},
    {'index': 11, 'name': '2OAR_E Large-conductance mechanosensitive channel; stretch activated ion channel mechanosensitive; 3.5A {Mycobacterium tuberculosis H37Ra} SCOP: f.16.1.1', 'aligned_cols': 11, 'sum_probs': 8.9, 'query': 'ARSNVIDIAAA', 'hit_sequence': 'ARGNIVDLAVA', 'indices_query': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'indices_hit': [32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42]},
    {'index': 12, 'name': '5XKX_A Flavin-containing monooxygenase; Dimethylsulfoniopropionate (DMSP) lyase, LYASE; 1.5A {Acinetobacter bereziniae NIPH 3}', 'aligned_cols': 10, 'sum_probs': 8.0, 'query': 'ARWARSNVID', 'hit_sequence': 'TVWARTTAQD', 'indices_query': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 'indices_hit': [356, 357, 358, 359, 360, 361, 362, 363, 364, 365]},
]

headers = {
    "content-type": "application/json"
}

data = {
    "sequence": sequence,
    "alignments": alignments,
    "templates": templates
}

response = requests.post(url, headers=headers, data=json.dumps(data))

# Check if the request was successful
if response.ok:
    print("Request succeeded:", response.json())
else:
    print("Request failed:", response.status_code, response.text)

The runtime of the structural prediction module scales quadratically with sequence length, so long sequences can take several hours to predict.
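
To make the quadratic scaling concrete, the following back-of-the-envelope sketch estimates how much longer one sequence takes relative to another. It is only an illustration of the scaling law: the function name relative_runtime is hypothetical, and the estimate ignores constant overheads, MSA time, relaxation, and hardware differences.

```python
def relative_runtime(length, reference_length):
    """Runtime of `length` relative to `reference_length` under O(n^2) scaling."""
    return (length / reference_length) ** 2


print(relative_runtime(200, 100))   # 4.0: doubling the length quadruples the cost
print(relative_runtime(1000, 100))  # 100.0: a 10x longer sequence costs about 100x
```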