Llama Nemotron#

NVIDIA’s Llama Nemotron is a family of advanced models excelling in reasoning and a diverse set of agentic AI tasks.

The models are optimized to run across a range of platforms, from data centers to PCs, and excel in graduate-level scientific reasoning, advanced math, coding, instruction following, and tool calling.

We provide pre-defined recipes for finetuning Llama Nemotron models in various sizes: 8B, 49B, 70B, and 253B. The recipes use NeMo 2.0 and NeMo-Run. Each recipe configures a run.Partial for one of the nemo.collections.llm API functions introduced in NeMo 2.0. The recipes are hosted in the following folder: llm/recipes.

NeMo 2.0 Pretraining Recipes#

We do not provide Llama Nemotron pretraining recipes, since all of the models are based on the Llama 3 family and are produced either by direct finetuning or by finetuning after distillation. Refer to Llama3 for pretraining recipes for the base models.

NeMo 2.0 Finetuning Recipes#

Note

The finetuning recipes use the SquadDataModule for the data argument. You can replace the SquadDataModule with your custom dataset.

To import the Hugging Face model and convert it to NeMo 2.0 format, run the following command (this only needs to be done once):

from nemo.collections import llm
llm.import_ckpt(
    model=llm.LlamaNemotronModel(llm.Llama31NemotronUltra253BConfig()),
    source='hf://nvidia/Llama-3_1-Nemotron-Ultra-253B-v1',
)
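
If you plan to use the Super 49B recipe shown below, the import is analogous. The config class and Hugging Face repository names in this sketch follow the naming pattern of the Ultra example above and are assumptions; verify them against your NeMo version before running.

# Assumed class and repo names, following the pattern above; confirm before use.
llm.import_ckpt(
    model=llm.LlamaNemotronModel(llm.Llama33NemotronSuper49BConfig()),
    source='hf://nvidia/Llama-3_3-Nemotron-Super-49B-v1',
)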

We provide an example below of how to invoke the default recipe and override the data argument:

from nemo.collections import llm

recipe = llm.llama33_nemotron_super_49b.finetune_recipe(
    name="llama33_nemotron_super_49b_finetuning",
    dir=f"/path/to/checkpoints",
    num_nodes=4,
    num_gpus_per_node=8,
    peft_scheme='lora',  # 'lora', 'none'
    packed_sequence=False,
)

# # To override the data argument
# dataloader = a_function_that_configures_your_custom_dataset(
#     gbs=gbs,
#     mbs=mbs,
#     seq_length=recipe.model.config.seq_length,
# )
# recipe.data = dataloader
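
As a concrete sketch of the override above, the following configures a generic finetuning data module in place of SquadDataModule. The llm.FineTuningDataModule class and its arguments here are assumptions based on NeMo 2.0 conventions; the dataset path is a placeholder, and the batch sizes are example values to tune for your setup.

import nemo_run as run

# Sketch only: assumes your dataset is in a format llm.FineTuningDataModule can read.
dataloader = run.Config(
    llm.FineTuningDataModule,
    dataset_root="/path/to/your/dataset",  # placeholder path
    seq_length=recipe.model.config.seq_length,
    global_batch_size=128,  # example value; tune for your hardware and convergence
    micro_batch_size=1,
)
recipe.data = dataloader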

By default, the finetuning recipe will run LoRA finetuning with LoRA applied to all linear layers in the language model. To finetune the entire model without LoRA, set peft_scheme='none' in the recipe argument.
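
For example, a full-parameter finetuning run reuses the same recipe call with only the peft_scheme argument changed (the run name below is a placeholder):

full_ft_recipe = llm.llama33_nemotron_super_49b.finetune_recipe(
    name="llama33_nemotron_super_49b_full_ft",  # placeholder run name
    dir="/path/to/checkpoints",
    num_nodes=4,
    num_gpus_per_node=8,
    peft_scheme='none',  # full finetuning, no LoRA adapters
)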

To finetune with sequence packing for higher throughput, set packed_sequence=True. Note that you may need to tune the global batch size to achieve similar convergence.
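
For example (a sketch; the data.global_batch_size attribute path assumes the default SquadDataModule configuration):

packed_recipe = llm.llama33_nemotron_super_49b.finetune_recipe(
    name="llama33_nemotron_super_49b_packed",  # placeholder run name
    dir="/path/to/checkpoints",
    num_nodes=4,
    num_gpus_per_node=8,
    peft_scheme='lora',
    packed_sequence=True,  # pack multiple sequences per sample for higher throughput
)
# Each packed sample contains several sequences, so you may need to lower the
# global batch size to keep convergence behavior similar.
# packed_recipe.data.global_batch_size = ...  # tune for your setup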

Note

The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.
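
As a quick illustration, you can override nested attributes of the returned run.Partial before execution. The trainer and optimizer attribute paths below are assumptions based on typical NeMo 2.0 recipes and may differ in your version:

# Override nested configuration values on the recipe (a run.Partial) before running.
recipe.trainer.max_steps = 100   # assumed attribute: total training steps
recipe.optim.config.lr = 1e-5    # assumed attribute path: optimizer learning rate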

Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the finetuning locally in a separate process. You can use it as follows:

import nemo_run as run

run.run(recipe, executor=run.LocalExecutor())
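
To use multiple GPUs with the local executor, you can launch via torchrun; the arguments below follow the NeMo-Run local-executor pattern and assume 8 GPUs on the node:

executor = run.LocalExecutor(ntasks_per_node=8, launcher="torchrun")
run.run(recipe, executor=executor)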

Alternatively, you can run it directly in the same Python process as follows:

run.run(recipe, direct=True)

A comprehensive list of finetuning recipes that we currently support or plan to support soon is provided below for reference:

Recipe                                    Status
Llama 3.1 Nemotron Nano 8B                Yes
Llama 3.3 Nemotron Super 49B              Yes
Llama 3.1 Nemotron Ultra 253B             Yes
Llama 3.1 Nemotron Instruct/Reward 70B    Yes