Qwen2/2.5#

Qwen2 is a series of large language models from the Qwen team, released in sizes including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, and Qwen2-72B.

Qwen2.5 is the latest model series from the Qwen team, with improved performance over Qwen2. It is available in 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B sizes, each with base and instruct variants.

We provide recipes for pretraining and fine-tuning Qwen2/2.5 models in the following sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B, using NeMo 2.0 and NeMo-Run. Each recipe configures a run.Partial for one of the nemo.collections.llm API functions introduced in NeMo 2.0. The recipes are hosted in qwen2_500m, qwen2_1p5b, qwen2_7b, qwen2_72b, qwen25_500m, qwen25_1p5b, qwen25_7b, qwen25_14b, qwen25_32b, and qwen25_72b.

NeMo 2.0 Pretraining Recipes#

Note

The pretraining recipes use the MockDataModule for the data argument. You are expected to replace the MockDataModule with your own custom dataset.

We provide an example below on how to invoke the default recipe and override the data argument:

from nemo.collections import llm

pretrain = llm.qwen2_7b.pretrain_recipe(
    name="qwen2_7b_pretraining",
    dir=f"/path/to/checkpoints",
    num_nodes=2,
    num_gpus_per_node=8,
)

pretrain = llm.qwen25_72b.pretrain_recipe(
    name="qwen25_72b_pretraining",
    dir=f"/path/to/checkpoints",
    num_nodes=8,
    num_gpus_per_node=8,
)

# # To override the data argument
# dataloader = a_function_that_configures_your_custom_dataset(
#     global_batch_size=global_batch_size,
#     micro_batch_size=micro_batch_size,
#     seq_length=pretrain.model.config.seq_length,
# )
# pretrain.data = dataloader
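
As a concrete illustration, the commented block above can be filled in with NeMo's PreTrainingDataModule. The following is a minimal sketch, assuming your corpus has already been preprocessed into the Megatron indexed (.bin/.idx) format; the data path, batch sizes, and tokenizer name are placeholders to adapt to your setup:

import nemo_run as run

from nemo.collections.common.tokenizers.huggingface.auto_tokenizer import AutoTokenizer
from nemo.collections.llm.gpt.data import PreTrainingDataModule

# Placeholder path: expects /path/to/my_corpus_text_document.bin and .idx files.
pretrain.data = run.Config(
    PreTrainingDataModule,
    paths=["/path/to/my_corpus_text_document"],
    seq_length=pretrain.model.config.seq_length,
    global_batch_size=512,  # placeholder; tune for your cluster
    micro_batch_size=1,     # placeholder
    tokenizer=run.Config(AutoTokenizer, pretrained_model_name="Qwen/Qwen2.5-72B"),
)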

NeMo 2.0 Fine-tuning Recipes#

Note

The fine-tuning recipes use the SquadDataModule for the data argument. You can replace the SquadDataModule with your custom dataset.

Warning

When using import_ckpt in NeMo 2.0, ensure your script includes an if __name__ == "__main__": guard. Without it, Python’s multiprocessing may not initialize its worker processes properly, causing a “Failure to acquire lock” error.

To import the HF model and convert to NeMo 2.0 format, run the following command (this only needs to be done once):

from nemo.collections import llm

if __name__ == "__main__":
    # For Qwen2
    llm.import_ckpt(model=llm.Qwen2Model(llm.Qwen2Config500M()), source='hf://Qwen/Qwen2-0.5B')

    # For Qwen2.5
    llm.import_ckpt(model=llm.Qwen2Model(llm.Qwen25Config500M()), source='hf://Qwen/Qwen2.5-0.5B')

By default, the non-instruct (base) version of the model is loaded. To load a different variant, set finetune.resume.restore_config.path=nemo://<hf model id> or finetune.resume.restore_config.path=<local model path>.
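
For example, to fine-tune from an instruct checkpoint instead, you can override the resume path on a recipe. This is a sketch: the recipe name and checkpoint directory are placeholders, and Qwen/Qwen2.5-0.5B-Instruct is used as the Hugging Face model ID.

recipe = llm.qwen25_500m.finetune_recipe(
    name="qwen25_500m_instruct_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
)
# Resume from the instruct variant on the Hugging Face Hub instead of the base model.
recipe.resume.restore_config.path = "nemo://Qwen/Qwen2.5-0.5B-Instruct"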

We provide an example below on how to invoke the default recipe and override the data argument:

from nemo.collections import llm

recipe = llm.qwen2_500m.finetune_recipe(
    name="qwen2_500m_finetuning",
    dir=f"/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    peft_scheme='lora',  # 'lora', 'none'
    packed_sequence=False,
)

# # To override the data argument
# dataloader = a_function_that_configures_your_custom_dataset(
#     gbs=gbs,
#     mbs=mbs,
#     seq_length=recipe.model.config.seq_length,
# )
# recipe.data = dataloader

By default, the fine-tuning recipe runs LoRA fine-tuning, with LoRA applied to all linear layers in the language model. To fine-tune the entire model without LoRA, set peft_scheme='none' in the recipe arguments.

To fine-tune with sequence packing for higher throughput, set packed_sequence=True. Note that you may need to tune the global batch size to achieve similar convergence.
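
For instance, here is a minimal sketch that combines full-parameter fine-tuning with sequence packing; the global batch size below is a placeholder you should tune for your data and hardware.

recipe = llm.qwen25_7b.finetune_recipe(
    name="qwen25_7b_full_sft_packed",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    peft_scheme='none',    # full-parameter fine-tuning, no LoRA
    packed_sequence=True,  # pack multiple examples into each sequence
)
# Each packed sample already contains several examples, so a smaller global
# batch size is usually needed for comparable convergence (placeholder value).
recipe.data.global_batch_size = 8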

Note

The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.

Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors. The simplest is the local executor, which just runs the pretraining locally in a separate process. You can use it as follows:

import nemo_run as run

run.run(pretrain, executor=run.LocalExecutor())

You can also run it directly in the same Python process as follows:

run.run(pretrain, direct=True)

A comprehensive list of pretraining recipes that we currently support or plan to support soon is provided below for reference:

Recipe          Status
Qwen2-0.5B      Yes
Qwen2-1.5B      Yes
Qwen2-7B        Yes
Qwen2-72B       Yes
Qwen2.5-0.5B    Yes
Qwen2.5-1.5B    Yes
Qwen2.5-7B      Yes
Qwen2.5-14B     Yes
Qwen2.5-32B     Yes
Qwen2.5-72B     Yes