Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.

Mamba 2#

State Space Models (SSMs) have recently emerged as a promising alternative to transformers. SSMs offer advantages such as linear time complexity relative to sequence length and a constant cache size for inference. These features enable the processing of longer sequences and higher throughput. Despite these benefits, SSMs alone may fall short compared to transformers on tasks that demand strong copying or in-context learning capabilities.

To harness the strengths of both approaches, SSM-Hybrid models incorporate MLP, Transformer, and SSM blocks in their architecture. As highlighted in a study by NVIDIA, these hybrid models achieve faster inference than traditional Transformers of the same size thanks to their SSM blocks. Experimental results show that Mamba2-Hybrid models not only surpass Transformer baselines in performance but also benefit from increased computational efficiency.

The Mamba2 models discussed in the "Transformers are SSMs" paper are available in five sizes: 130 million, 370 million, 780 million, 1.3 billion, and 2.7 billion parameters. The Mamba2-Hybrid models, along with their Mamba2 baseline released by NVIDIA, are provided at an 8 billion parameter size.

NeMo 2.0 Pre-Training Recipes#

We provide pre-defined recipes for pre-training Mamba2 and Mamba2-Hybrid models in the following sizes: 130M, 370M, 780M, 1.3B, 2.7B, 8B, and Hybrid-8B, using NeMo 2.0 and NeMo-Run. These recipes configure a run.Partial for one of the nemo.collections.llm API functions introduced in NeMo 2.0. The recipes are hosted in the recipes folder (https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes), for example mamba2_130m.py.

Note

The pre-training recipes use the MockDataModule for the data argument. You are expected to replace the MockDataModule with your custom dataset.

We provide an example below on how to invoke the default recipe and override the data argument:

from nemo.collections import llm

pretrain = llm.mamba2_130m.pretrain_recipe(
    tokenizer_model="/path/to/tokenizer/model",
    name="mamba2_130m_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
)

dataloader = a_function_that_configures_your_custom_dataset(
    gbs=gbs,
    mbs=mbs,
    seq_length=pretrain.model.config.seq_length,
)
pretrain.data = dataloader
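
For reference, the sketch below shows one way to implement such a helper using NeMo's PreTrainingDataModule wrapped in a run.Config. It is a minimal sketch under assumptions: the dataset path is a placeholder for a pre-tokenized dataset prefix, and the exact constructor arguments may differ across NeMo versions.

import nemo_run as run
from nemo.collections import llm

# Hypothetical helper: returns a run.Config so the data module can be attached
# to the run.Partial recipe. The dataset path below is a placeholder.
def custom_pretraining_data(gbs, mbs, seq_length):
    return run.Config(
        llm.PreTrainingDataModule,
        paths=["/path/to/tokenized/dataset_text_document"],  # pre-tokenized data prefix (placeholder)
        seq_length=seq_length,
        global_batch_size=gbs,
        micro_batch_size=mbs,
    )

pretrain.data = custom_pretraining_data(
    gbs=8,
    mbs=1,
    seq_length=pretrain.model.config.seq_length,
)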

Note

For Mamba2 and Hybrid models, you should provide a path to the tokenizer model (defaults to None) if the tokenizer is not available on the Hugging Face model card. This is the case for the 8B and Hybrid 8B models; for the other variants, leave it set to None. The tokenizer model is located here.
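
For example, the sketch below invokes the 8B pre-training recipe with a local tokenizer model. It assumes that llm.mamba2_8b.pretrain_recipe accepts the same arguments as the 130M example above, and the tokenizer path is a placeholder.

# The 8B tokenizer is not available on the Hugging Face model card, so a local
# tokenizer model path is required (placeholder path shown).
pretrain_8b = llm.mamba2_8b.pretrain_recipe(
    tokenizer_model="/path/to/tokenizer/model",
    name="mamba2_8b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
)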

Note

The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.
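
Because the recipe is a run.Partial, individual fields can also be overridden as plain attributes before execution. The attribute paths below are typical for NeMo 2.0 recipes but are shown only as an illustrative sketch; inspect your recipe object to confirm them.

# Illustrative overrides on the configured recipe (attribute paths assumed).
pretrain.trainer.max_steps = 1000          # shorten the training run
pretrain.trainer.val_check_interval = 100  # validate more frequently
pretrain.optim.config.lr = 3e-4            # adjust the peak learning rate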

Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pre-training locally in a separate process. You can use it as follows:

import nemo_run as run

run.run(pretrain, executor=run.LocalExecutor())

Alternatively, you can run it directly in the same Python process as follows:

run.run(pretrain, direct=True)
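
For multi-node jobs, a cluster executor such as run.SlurmExecutor can be used instead of the local executor. The sketch below is illustrative only: the account, partition, and time limit are placeholders, and the available executor options depend on your NeMo-Run version and cluster setup.

# Hypothetical Slurm configuration; replace the placeholders with your cluster details.
executor = run.SlurmExecutor(
    account="your_account",      # placeholder Slurm account
    partition="your_partition",  # placeholder Slurm partition
    nodes=1,
    ntasks_per_node=8,
    time="04:00:00",
)
run.run(pretrain, executor=executor)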

NeMo 2.0 Fine-Tuning Recipes#

Similar to pre-training, we provide fine-tuning recipes for the aforementioned Mamba2 and Hybrid models. The fine-tuning recipes are hosted in the same recipes folder (https://github.com/NVIDIA/NeMo/tree/main/nemo/collections/llm/recipes), for example mamba2_130m.py.

Note

The fine-tuning recipes use the SquadDataModule (designed for the SQuAD dataset) for the data argument. You are expected to replace the SquadDataModule with your custom dataset.

We provide an example below on how to invoke the default recipe and override the data argument:

from nemo.collections import llm

finetune = llm.mamba2_130m.finetune_recipe(
    resume_path="/path/to/nemo/checkpoint",
    tokenizer_model="/path/to/tokenizer/model",
    name="mamba2_130m_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
)

dataloader = a_function_that_configures_your_custom_dataset(
    gbs=gbs,
    mbs=mbs,
    seq_length=finetune.model.config.seq_length,
)
finetune.data = dataloader
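
As with pre-training, the sketch below shows one possible implementation of the helper, here using NeMo's FineTuningDataModule. It is a minimal sketch under assumptions: the dataset root is a placeholder for preprocessed fine-tuning data, and the constructor arguments may vary across NeMo versions.

import nemo_run as run
from nemo.collections import llm

# Hypothetical helper: expects preprocessed fine-tuning data under dataset_root.
def custom_finetuning_data(gbs, mbs, seq_length):
    return run.Config(
        llm.FineTuningDataModule,
        dataset_root="/path/to/your/finetuning/data",  # placeholder location
        seq_length=seq_length,
        global_batch_size=gbs,
        micro_batch_size=mbs,
    )

finetune.data = custom_finetuning_data(
    gbs=8,
    mbs=1,
    seq_length=finetune.model.config.seq_length,
)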

Note

For Mamba2 and Hybrid models, you should provide a NeMo checkpoint to resume_path for all models. In addition, if the tokenizer is not available on the Hugging Face model card (which is the case for the 8B and Hybrid 8B models), you should provide the tokenizer model path to the tokenizer_model argument; for the other variants, set it to None. The tokenizer model is located here.

Note

The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.

Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the fine-tuning locally in a separate process. You can use it as follows:

import nemo_run as run

run.run(finetune, executor=run.LocalExecutor())

Alternatively, you can run it directly in the same Python process as follows:

run.run(finetune, direct=True)

A comprehensive list of pre-training and fine-tuning recipes that we currently support or plan to support soon is provided below for reference:

Recipe              Status
Mamba2 130M         Yes
Mamba2 370M         Yes
Mamba2 780M         Yes
Mamba2 1.3B         Yes
Mamba2 2.7B         Yes
Mamba2 8B           Yes
Mamba2 Hybrid-8B    Yes