Evo 2 NIM endpoints#

The NIM provides endpoints that generate DNA sequences, run the model's forward pass and save layer outputs, and perform readiness checks. The input and output parameters of each endpoint correspond to the properties of the JSON object that the endpoint receives or returns.

Generate DNA sequences#

Endpoint path: /biology/arc/evo2/generate

Input parameters#

  • sequence (string): Required. The input DNA sequence to use as the prompt for generation.

  • num_tokens (integer, null): Optional (default: 100). Number of tokens to be generated.

  • temperature (number, null): Optional (default: 0.7). Scale of randomness in the temperature sampling process. Values lower than 1.0 produce a sharper distribution, which is less random. Values higher than 1.0 produce a flatter, more uniform distribution, which is more random.

  • top_k (integer, null): Optional (default: 3). Specifies the number of highest probability tokens to consider for sampling. When set to 1, it selects only the token with the highest probability. The higher the value, the more diverse the sampling will be. If set to 0, all tokens are considered.

  • top_p (number, null): Optional (default: 0.0). Specifies the top-p threshold, between 0 and 1, for nucleus sampling. Sampling is restricted to the smallest set of tokens whose cumulative probability exceeds the top_p threshold; all other tokens are filtered out. Setting this to 0.0 disables top-p sampling.

  • random_seed (integer, null): Optional. Makes the Evo 2 model deterministic: a given input DNA sequence and a fixed seed always produce the same output. This argument should only be used for development purposes.

  • enable_logits (boolean): Optional (default: False). Enables or disables logits reporting in the output response.

  • enable_sampled_probs (boolean): Optional (default: False). Enables or disables the reporting of sampled token probabilities. When enabled, the response includes a list of probability values, between 0 and 1, corresponding to each token in the output sequence. These probabilities represent the model’s confidence in each token selection during the generation process. The resulting list has the same length as the output sequence and provides insight into the model’s decision-making at each generation step.

  • enable_elapsed_ms_per_token (boolean): Optional (default: False). Enables or disables the reporting of per-token timing statistics, which are useful for benchmarking.
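
The parameters above can be exercised with a plain HTTP POST. The following is a minimal sketch in Python, assuming the NIM is reachable at http://localhost:8000 without authentication; the host, port, and any required headers depend on your deployment.

```python
# Minimal request sketch for the generate endpoint (assumed local deployment).
import requests

payload = {
    "sequence": "ACTGACTGACTGACTGACTG",  # input DNA prompt
    "num_tokens": 50,                    # generate 50 new tokens
    "temperature": 0.7,
    "top_k": 3,
    "enable_sampled_probs": True,        # also return per-token probabilities
}

response = requests.post(
    "http://localhost:8000/biology/arc/evo2/generate",  # assumed host and port
    json=payload,
    timeout=300,
)
response.raise_for_status()
result = response.json()

print(result["sequence"])       # generated DNA sequence
print(result["sampled_probs"])  # per-token probabilities (enabled above)
print(result["elapsed_ms"], "ms on the server side")
```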

Outputs#

  • sequence (string): Required. The generated DNA sequence.

  • logits (array, null): Optional. The logits for the generated sequence, with shape [num_tokens, 512], returned when enable_logits is set to True in the input. See the decoding sketch after this output list.

    Note: Evo 2’s vocabulary size is 512, but for DNA sequence generation, only 4 tokens (A, C, T, G) are meaningful in the output of the pretrained model. The remaining tokens are present in the vocabulary for technical reasons (for example, input prompts, tokenizer padding, or future fine-tuning), but are not expected to be generated as output nucleotides.

    The mapping from logit indices to DNA bases is as follows:

    • A: ASCII 65

    • C: ASCII 67

    • T: ASCII 84

    • G: ASCII 71

    The top_k parameter is allowed to be up to 6 for compatibility, but in practice, only the 4 DNA bases are relevant for sequence generation.

  • sampled_probs (array, null): Optional. A list of probabilities corresponding to each token in the generated output sequence. Each value ranges from 0 to 1 and represents the model’s confidence in selecting that token during the generation process. The list length matches the output sequence length. To receive this output, enable_sampled_probs must be set to True. This provides insight into the model’s decision-making at each step of sequence generation.

  • elapsed_ms (integer): Required. The time elapsed on the server side, in milliseconds.

  • elapsed_ms_per_token (array, null): Optional. The time elapsed on the server side, in milliseconds, for each generated token.
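
As referenced in the logits output above, the following sketch shows one way to turn the returned logits into per-base probabilities. It assumes result is the parsed JSON response from a request with enable_logits set to True and that the logits arrive as a nested list of shape [num_tokens, 512].

```python
# Sketch: per-base probabilities for the last generated token, assuming
# `result` comes from a request with "enable_logits": True.
import numpy as np

logits = np.asarray(result["logits"])              # shape: [num_tokens, 512]
base_index = {"A": 65, "C": 67, "T": 84, "G": 71}  # ASCII code -> logit index

last = logits[-1]                                  # logits for the final token
base_logits = np.array([last[i] for i in base_index.values()])
base_probs = np.exp(base_logits - base_logits.max())  # softmax over the 4 bases
base_probs /= base_probs.sum()

for base, prob in zip(base_index, base_probs):
    print(f"{base}: {prob:.3f}")
```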

Run model forward pass and save layer outputs#

Endpoint path: /biology/arc/evo2/forward

Input parameters#

  • sequence (string): Required. The input DNA sequence to run the forward pass on.

  • output_layers (array): Required. List of layer names from which to capture and save output tensors.

    The Evo 2 model architecture consists of two types of layers:

    • HyenaLayer: Uses Hyena mixer for efficient long-range modeling

    • TransformerLayer: Uses multi-head self-attention mechanism

    Layer distribution by model size:

    • 7B models: 32 layers total

      • HyenaLayers: all layers except 3, 10, 17, 24, 31

      • TransformerLayers: layers 3, 10, 17, 24, 31

    • 40B models: 50 layers total

      • HyenaLayers: all layers except 3, 10, 17, 24, 31, 35, 42, 49

      • TransformerLayers: layers 3, 10, 17, 24, 31, 35, 42, 49

    Model-level layers:

    • embedding: Input token embeddings (typically layer 0). Note: This refers to the static token embeddings before any model computation, so this layer is rarely useful for downstream analysis. Instead, use the output of an intermediate decoder layer or its final MLP sublayer (as listed below), since these capture context-dependent features.

    • decoder.final_norm: Final model layer normalization

    • output_layer: Final output/logits of shape [seq_len, batch_size, 512], where 512 is the padded vocabulary size of the tokenizer.

    HyenaLayer components (available in HyenaLayers only):

    • decoder.layers.[n].mixer: Output of the Hyena mixer submodule in layer [n]. This captures the short, medium, and long-range sequence modeling capabilities of the Hyena mechanism, depending on the layer index.

    TransformerLayer components (available in TransformerLayers only):

    • decoder.layers.[n].self_attention: Multi-head attention output in layer [n]

    • decoder.layers.[n].self_attention.linear_qkv: Combined query, key, and value projection in layer [n]

    • decoder.layers.[n].self_attention.core_attention: The self-attention mechanism

    • decoder.layers.[n].self_attention.linear_proj: Output projection layer

    MLP components (available in both layer types):

    • decoder.layers.[n].mlp: Complete MLP output in layer [n]

    • decoder.layers.[n].mlp.linear_fc1: First MLP layer in layer [n]

    • decoder.layers.[n].mlp.linear_fc2: Second MLP layer in layer [n]

    Where [n] is the layer index:

    • For 7B models: 0 to 31

    • For 40B models: 0 to 49

    For example: ["output_layer", "decoder.layers.20.mlp.linear_fc2", "decoder.layers.3.self_attention"]

    Note: The output_layer logits use the same 512-token padded vocabulary and ASCII-based index mapping (A: 65, C: 67, T: 84, G: 71) described for the generate endpoint’s logits output above.
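
A forward-pass request has the same shape as a generate request, just with output_layers instead of the sampling parameters. The following is a minimal sketch, again assuming a local deployment at http://localhost:8000 and reusing the example layer list from above.

```python
# Minimal request sketch for the forward endpoint (assumed local deployment).
import requests

payload = {
    "sequence": "ACTGACTGACTGACTGACTG",
    "output_layers": [
        "output_layer",
        "decoder.layers.20.mlp.linear_fc2",
        "decoder.layers.3.self_attention",
    ],
}

response = requests.post(
    "http://localhost:8000/biology/arc/evo2/forward",  # assumed host and port
    json=payload,
    timeout=300,
)
response.raise_for_status()
result = response.json()
```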

Outputs#

  • data (string): Required. The tensors of the requested layers, serialized in NumPy Zipped (NPZ) format and Base64 encoded. See the decoding sketch after this output list.

  • elapsed_ms (integer): Required. The time elapsed on the server side, in milliseconds.
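
Because data is a Base64-encoded NPZ archive, it can be decoded with the standard library and loaded with NumPy. The following sketch assumes result is the parsed JSON response from the forward request above.

```python
# Sketch: decode the Base64-encoded NPZ payload from the forward response above.
import base64
import io

import numpy as np

npz_bytes = base64.b64decode(result["data"])
with np.load(io.BytesIO(npz_bytes)) as arrays:
    for name in arrays.files:
        # e.g. "output_layer" with shape [seq_len, batch_size, 512]
        print(name, arrays[name].shape, arrays[name].dtype)
```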

Readiness check#

Endpoint path: /v1/health/ready

Input parameters#

None.

Outputs#

The output of the endpoint is a JSON response with a value that indicates the readiness of the microservice. When the NIM is ready, it returns the response {"status":"ready"}.
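
A minimal readiness probe, assuming the same local deployment as in the examples above:

```python
# Readiness probe sketch (assumed host and port).
import requests

resp = requests.get("http://localhost:8000/v1/health/ready", timeout=10)
print(resp.status_code, resp.json())  # expect 200 and {"status": "ready"}
```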