Hyena#

Introduction to Hyena and Evo 2#

Introduction#

The Hyena architecture is a convolutional multi-hybrid neural network design. As described in the Hyena paper, these architectures achieve substantial efficiency gains through co-designed convolution operators and hardware-aware algorithms, enabling faster training and inference than traditional Transformers. At the 40-billion-parameter scale, Hyena-based models train 1.2 to 2.9 times faster than optimized Transformers, and the StripedHyena 2 architecture achieves a two-fold throughput improvement over linear attention and state-space models on H100 GPUs.
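
To make the core idea concrete, the sketch below shows the kind of FFT-based long convolution that Hyena-style layers use in place of attention, applied in O(L log L) time. This is an illustrative toy only, not the NeMo or Hyena implementation; the function name, shapes, and the explicit filter tensor are assumptions (in Hyena the filter is parameterized implicitly by a small network).

import torch

def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal long convolution via FFT in O(L log L).
    u: (batch, channels, L) input sequence; k: (channels, L) filter."""
    L = u.shape[-1]
    # Zero-pad to length 2L so the circular FFT convolution acts as a linear (causal) one.
    u_f = torch.fft.rfft(u, n=2 * L)
    k_f = torch.fft.rfft(k, n=2 * L)
    return torch.fft.irfft(u_f * k_f, n=2 * L)[..., :L]

# Toy usage: a batch of 2 sequences, 8 channels, length 1024.
u = torch.randn(2, 8, 1024)
k = torch.randn(8, 1024)  # stands in for an implicitly parameterized filter
y = fft_long_conv(u, k)   # y has shape (2, 8, 1024)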

Evo 2 is a powerful Transformer-Hyena hybrid model designed for biological sequence modeling. Trained on 9.3 trillion DNA base pairs spanning all domains of life, Evo 2 features an unprecedented 1 million token context window with single-nucleotide resolution. Available in 1B, 7B, and 40B parameter versions, it can accurately predict functional impacts of genetic variation without task-specific fine-tuning, autonomously learning biological features including exon-intron boundaries, transcription factor binding sites, and protein structural elements. The model also enables controllable generation of genomic sequences and epigenomic structure through inference-time search.

Hyena-Based Models#

Available Models#

The Hyena architecture is utilized in various models, with Evo 2 being a prominent example. Evo 2 is available in the following configurations:

| Model     | Status |
|-----------|--------|
| Evo 2 1B  | Yes    |
| Evo 2 7B  | Yes    |
| Evo 2 40B | Yes    |

Training Recipes#

We provide pre-defined recipes for pre-training and fine-tuning Hyena-based models using NeMo 2.0 and NeMo-Run. Each recipe configures a run.Partial for one of the nemo.collections.llm API functions introduced in NeMo 2.0. The recipes are hosted in the recipes folder (for example, hyena_1b.py).

Pre-Training:

from nemo.collections import llm

# For 1B model
pretrain_1b = llm.hyena_1b.pretrain_recipe(
    name="hyena_1b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallel_size=1,
    global_batch_size=8,
    micro_batch_size=1,
    vocab_file="/path/to/vocab.json",
)

# For 7B model
pretrain_7b = llm.hyena_7b.pretrain_recipe(
    name="hyena_7b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# For 40B model
pretrain_40b = llm.hyena_40b.pretrain_recipe(
    name="hyena_40b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# Configure and assign your dataloader
dataloader = a_function_that_configures_your_custom_dataset(
    gbs=8,  # Adjust as needed for your model
    mbs=1,  # Adjust as needed for your model
    seq_length=pretrain_1b.model.config.seq_length,  # Use appropriate model
)
pretrain_1b.data = dataloader  # Assign to whichever model you're using
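
The placeholder function above stands in for whatever builds your datamodule. A minimal sketch is shown below, assuming your corpus is already preprocessed into NeMo's indexed Megatron format and that the generic llm.PreTrainingDataModule fits your data; Evo 2 genomic pipelines may use a specialized datamodule instead, and the path below is a placeholder.

import nemo_run as run
from nemo.collections import llm

def a_function_that_configures_your_custom_dataset(gbs, mbs, seq_length):
    # Hypothetical sketch: wrap a generic pre-training datamodule in run.Config so
    # NeMo-Run can serialize it. You may also need to pass a tokenizer matching your vocab.
    return run.Config(
        llm.PreTrainingDataModule,
        paths=["/path/to/preprocessed/data_text_document"],
        seq_length=seq_length,
        global_batch_size=gbs,
        micro_batch_size=mbs,
    )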

Fine-Tuning:

from nemo.collections import llm

# For 1B model
finetune_1b = llm.hyena_1b.finetune_recipe(
    resume_path="/path/to/nemo/checkpoint",
    name="hyena_1b_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallel_size=1,
    global_batch_size=8,
    micro_batch_size=1,
    vocab_file="/path/to/vocab.json",
)

# For 7B model
finetune_7b = llm.hyena_7b.finetune_recipe(
    resume_path="/path/to/nemo/checkpoint",
    name="hyena_7b_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# For 40B model
finetune_40b = llm.hyena_40b.finetune_recipe(
    resume_path="/path/to/nemo/checkpoint",
    name="hyena_40b_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# Configure and assign your dataloader
dataloader = a_function_that_configures_your_custom_dataset(
    gbs=8,  # Adjust as needed for your model
    mbs=1,  # Adjust as needed for your model
    seq_length=finetune_1b.model.config.seq_length,  # Use appropriate model
)
finetune_1b.data = dataloader  # Assign to whichever model you're using
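
Because each recipe is a run.Partial, individual fields can also be overridden after construction rather than only through the factory arguments. The attribute names below (trainer.max_steps, trainer.val_check_interval, optim.config.lr) are typical of NeMo 2.0 recipes; confirm them against the recipe you are using.

# Adjust selected fields on the configured recipe before launching.
finetune_1b.trainer.max_steps = 1000          # e.g. shorten the fine-tuning run
finetune_1b.trainer.val_check_interval = 200  # validate more frequently
finetune_1b.optim.config.lr = 1e-5            # e.g. lower the learning rate for fine-tuning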

Note

For pre-training and fine-tuning, the recipes use placeholder datamodules for the data argument. You are expected to replace these with your custom dataset.

Note

The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.
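
As a brief, generic illustration of the difference (not specific to Hyena): run.Config describes how to construct an object, while run.Partial captures a function call whose arguments can still be edited before anything executes. The function below is a made-up example.

import nemo_run as run

def train(model_name: str, lr: float = 1e-4):
    print(f"training {model_name} with lr={lr}")

# run.Partial records the function and its arguments without calling it.
task = run.Partial(train, model_name="hyena_1b")
task.lr = 3e-4  # fields can still be overridden before execution

run.run(task, direct=True)  # now actually calls train("hyena_1b", lr=3e-4)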

Running the Training:

Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors:

import nemo_run as run

# For pre-training - choose the appropriate model
run.run(pretrain_1b, executor=run.LocalExecutor())  # For 1B model
# or
run.run(pretrain_7b, executor=run.LocalExecutor())  # For 7B model
# or
run.run(pretrain_40b, executor=run.LocalExecutor())  # For 40B model

# For fine-tuning - choose the appropriate model
run.run(finetune_1b, executor=run.LocalExecutor())  # For 1B model
# or
run.run(finetune_7b, executor=run.LocalExecutor())  # For 7B model
# or
run.run(finetune_40b, executor=run.LocalExecutor())  # For 40B model

Alternatively, you can run it directly in the same Python process:

# Choose the appropriate model
run.run(pretrain_1b, direct=True)  # For 1B pre-training
# or
run.run(finetune_7b, direct=True)  # For 7B fine-tuning

BioNeMo Integration with Evo 2#

NVIDIA’s BioNeMo Framework provides specialized support for Evo 2 models in genomics and biological applications. BioNeMo adapts the Hyena architecture specifically for biological sequence modeling tasks.

The BioNeMo Evo 2 documentation provides comprehensive details about:

  • Model architecture and capabilities

  • Available model variants (1B, 7B, and 40B)

  • Training diagnostics and benchmarks

  • Performance characteristics across different context lengths and cluster sizes

  • Zero-shot variant effect prediction for BRCA1 variants

For users interested in applying Evo 2 to their biological data, BioNeMo provides a fine-tuning tutorial that walks through:

  • Data preparation for genomic sequences

  • Fine-tuning process with biological datasets

  • Evaluation of model performance on biological tasks

  • Best practices for biological sequence modeling

The BioNeMo implementation achieves comparable or better accuracy than the original models, with the BioNeMo Evo 2 7B model reaching an AUROC of 0.87 on BRCA1 variant effect prediction tasks.