Llama
Meta’s Llama builds on the general transformer decoder framework with some key additions such as pre-normalization, SwiGLU activations, and Rotary Positional Embeddings (RoPE). More information is available in the companion paper “LLaMA: Open and Efficient Foundation Language Models”. With a wide variety of model sizes, Llama has options for every inference budget.
We provide pre-defined recipes for pretraining and finetuning Llama 3 models in two sizes (8B and 70B), as well as Llama 3.1 models in three sizes (8B, 70B, and 405B). The recipes use NeMo 2.0 and NeMo-Run.
These recipes configure a run.Partial for one of the nemo.collections.llm API functions introduced in NeMo 2.0, which means any field of a recipe can be overridden before execution (see the example after the file list below). The recipes are hosted in the following files:
llama3_8b, llama3_70b, llama31_8b, llama31_70b, and llama31_405b.
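Because each recipe is a run.Partial, every field of the underlying task can be inspected or overridden after the recipe object has been created (as shown in the sections below) and before it is executed. A minimal sketch; the attribute values here are illustrative:

# `pretrain` is a recipe object such as the one created in the pretraining example below.
pretrain.trainer.max_steps = 100          # e.g. shorten the run for a quick smoke test
pretrain.trainer.val_check_interval = 50  # validate more often on the shortened run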
NeMo 2.0 Pretraining Recipes
Note
The pretraining recipes use the MockDataModule for the data argument. You are expected to replace the MockDataModule with your custom dataset.
We provide an example below of how to invoke the default recipe and override the data argument:
from nemo.collections import llm

pretrain = llm.llama3_8b.pretrain_recipe(
    name="llama3_8b_pretraining",
    ckpt_dir=f"/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
)

# To override the data argument:
# dataloader = a_function_that_configures_your_custom_dataset(
#     gbs=gbs,
#     mbs=mbs,
#     seq_length=pretrain.model.config.seq_length,
# )
# pretrain.data = dataloader
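For reference, here is a minimal sketch of what such a function could look like, assuming NeMo 2.0's PreTrainingDataModule (from nemo.collections.llm.gpt.data) and a corpus that has already been preprocessed into Megatron's indexed (.bin/.idx) format. The dataset path and batch sizes are placeholders:

import nemo_run as run
from nemo.collections.llm.gpt.data import PreTrainingDataModule

def a_function_that_configures_your_custom_dataset(gbs, mbs, seq_length):
    # Return a config, not a live object, so NeMo-Run can build it when the job starts.
    return run.Config(
        PreTrainingDataModule,
        paths=["/path/to/my_corpus_text_document"],  # placeholder preprocessed dataset prefix
        seq_length=seq_length,
        global_batch_size=gbs,
        micro_batch_size=mbs,
    )

pretrain.data = a_function_that_configures_your_custom_dataset(
    gbs=512,  # illustrative batch sizes
    mbs=1,
    seq_length=pretrain.model.config.seq_length,
)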
NeMo 2.0 Finetuning Recipes
Note
The finetuning recipes use the SquadDataModule for the data argument. You can replace the SquadDataModule with your custom dataset.
To import the HF model and convert it to NeMo 2.0 format, run the following command (this only needs to be done once):
from nemo.collections import llm
llm.import_ckpt(model=llm.LlamaModel(llm.Llama3Config8B()), source='hf://meta-llama/Meta-Llama-3-8B')
By default, the non-instruct version of the model is loaded. To load a different model, set finetune.resume.restore_config.path=nemo://<hf model id> or finetune.resume.restore_config.path=<local model path>.
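For example, to finetune the instruct variant instead of the base model (assuming the instruct checkpoint has already been imported with import_ckpt; the model ID below is illustrative, and finetune stands for the recipe object returned by finetune_recipe, created as recipe in the example that follows):

# Point the finetuning recipe at a different imported checkpoint.
finetune.resume.restore_config.path = "nemo://meta-llama/Meta-Llama-3-8B-Instruct"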
We provide an example below of how to invoke the default recipe and override the data argument:
from nemo.collections import llm

recipe = llm.llama3_8b.finetune_recipe(
    name="llama3_8b_finetuning",
    ckpt_dir=f"/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    peft_scheme='lora',  # 'lora', 'none'
    packed_sequence=False,
)

# To override the data argument:
# dataloader = a_function_that_configures_your_custom_dataset(
#     gbs=gbs,
#     mbs=mbs,
#     seq_length=recipe.model.config.seq_length,
# )
# recipe.data = dataloader
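The override follows the same pattern as in the pretraining example. For finetuning you would typically point a FineTuningDataModule-style data module at your own dataset; a minimal sketch, with an illustrative class, path, and batch sizes (FineTuningDataModule expects JSONL splits under dataset_root):

import nemo_run as run
from nemo.collections.llm.gpt.data import FineTuningDataModule

recipe.data = run.Config(
    FineTuningDataModule,
    dataset_root="/path/to/my_finetuning_dataset",  # placeholder directory with JSONL splits
    seq_length=recipe.model.config.seq_length,
    global_batch_size=128,  # illustrative values
    micro_batch_size=1,
)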
By default, the finetuning recipe runs LoRA finetuning, with LoRA applied to all linear layers in the language model. To finetune the entire model without LoRA, set peft_scheme='none' in the recipe arguments. To finetune with sequence packing for higher throughput, set packed_sequence=True. Note that you may need to tune the global batch size in order to achieve similar convergence.
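For example, after creating the recipe with packed_sequence=True, you might lower the global batch size; the value below is illustrative:

# Each packed sequence already contains several training examples, so fewer
# sequences are needed per step to keep the effective batch size comparable.
recipe.data.global_batch_size = 8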
Note
The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.
Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pretraining locally in a separate process. You can use it as follows:
import nemo_run as run
run.run(pretrain, executor=run.LocalExecutor())
You can also run it directly in the same Python process as follows:
run.run(pretrain, direct=True)
A comprehensive list of pretraining recipes that we currently support or plan to support soon is provided below for reference:
| Recipe | Status |
|---|---|
| Llama 3 8B | Yes |
| Llama 3 8B FP8 | N/A |
| Llama 3 8B 16k seq length | Yes |
| Llama 3 8B 64k seq length | Yes |
| Llama 3 70B | Yes |
| Llama 3 70B FP8 | N/A |
| Llama 3 70B 16k seq length | Yes |
| Llama 3 70B 64k seq length | Yes |
| Llama 3.1 8B | Yes |
| Llama 3.1 70B | Yes |
| Llama 3.1 405B | Yes |