Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Llama

Meta’s Llama builds on the general transformer decoder framework with some key additions, such as pre-normalization, SwiGLU activations, and Rotary Positional Embeddings (RoPE). More information is available in the companion paper, “LLaMA: Open and Efficient Foundation Language Models”. With a wide variety of model sizes, Llama has options for every inference budget.

NeMo 2.0 Pretraining Recipes

We provide pre-defined recipes for pretraining Llama 3 models in two sizes, 8B and 70B, using NeMo 2.0 and NeMo-Run. These recipes configure a run.Partial for one of the nemo.collections.llm API functions introduced in NeMo 2.0. The recipes are hosted in the llama3_8b and llama3_70b files.

Note

The pretraining recipes use the MockDataModule for the data argument. You are expected to replace the MockDataModule with your custom dataset.

Below is an example of how to invoke the default recipe and override the data argument:

from nemo.collections import llm

pretrain = llm.llama3_8b.pretrain_recipe(
    name="llama3_8b_pretraining",
    ckpt_dir=f"/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
)

# Placeholder: swap in a function that builds your custom data module.
# gbs and mbs stand for your global and micro batch sizes.
dataloader = a_function_that_configures_your_custom_dataset(
    gbs=gbs,
    mbs=mbs,
    seq_length=pretrain.model.config.seq_length,
)
pretrain.data = dataloader
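
For a more concrete picture, the sketch below replaces the placeholder with NeMo's PreTrainingDataModule and wraps it in run.Config so it remains a serializable configuration. The paths, batch sizes, and the choice of PreTrainingDataModule itself are illustrative assumptions; substitute whatever data module and values match your dataset:

import nemo_run as run
from nemo.collections import llm

# Illustrative sketch only: the dataset paths and batch sizes are placeholders.
pretrain.data = run.Config(
    llm.PreTrainingDataModule,
    paths=["/path/to/tokenized/data"],            # pre-tokenized dataset prefix(es)
    seq_length=pretrain.model.config.seq_length,  # keep in sync with the model config
    global_batch_size=512,
    micro_batch_size=1,
)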

Note

The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.
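
Because pretrain is a run.Partial, you can also override its nested attributes directly before execution. The fields touched below (trainer.max_steps and trainer.val_check_interval) are ordinary Trainer arguments chosen purely to illustrate this pattern, not settings the recipe requires:

# Example attribute-style overrides on the run.Partial configuration.
pretrain.trainer.max_steps = 1000          # shorten the run for a quick experiment
pretrain.trainer.val_check_interval = 100  # validate every 100 steps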

Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pretraining locally in a separate process. You can use it as follows:

import nemo_run as run

run.run(pretrain, executor=run.LocalExecutor())

Alternatively, you can run it directly in the same Python process as follows:

run.run(pretrain, direct=True)

A comprehensive list of pretraining recipes that we currently support or plan to support soon is provided below for reference:

Recipe                        Status
----------------------------  ------
Llama 3 8B                    Yes
Llama 3 8B FP8                N/A
Llama 3 8B 16k seq length     Yes
Llama 3 8B 64k seq length     Yes
Llama 3 70B                   Yes
Llama 3 70B FP8               N/A
Llama 3 70B 16k seq length    Yes
Llama 3 70B 64k seq length    Yes